maha.parsers#

Package Contents#

Classes#

Expression

Regex pattern holder.

Functions#

compile_rules()

compile_numeral_rules()

compile_ordinal_rules()

compile_time_rules()

compile_duration_rules()

get_fractions_of_unit_pattern(unit)

Returns the fractions of a unit pattern.

get_fractions_of_pattern(pattern)

Returns the fractions of a pattern.

wrap_pattern(pattern)

Adds start and end expression to the pattern.

spaced_patterns(*patterns)

Returns a regex pattern that matches any of the given patterns, separated by spaces.

parse_duration(match)

Parse duration.

parse_ordinal(match)

parse_time(match)

Attributes#

THIRD

Pattern that matches the pronunciation of third in Arabic

QUARTER

Pattern that matches the pronunciation of quarter in Arabic

HALF

Pattern that matches the pronunciation of half in Arabic

THREE_QUARTERS

Pattern that matches the pronunciation of three quarters in Arabic

WAW_CONNECTOR

Pattern that matches WAW as a connector between two words

WORD_SEPARATOR

Pattern that matches the word separator between numerals in Arabic

ALL_ALEF

Pattern that matches all possible forms of the ALEF in Arabic

TWO_SUFFIX

Pattern that matches the two-suffix of words in Arabic

SUM_SUFFIX

Pattern that matches the sum-suffix of words in Arabic

EXPRESSION_START

Pattern that matches the start of a rule expression in Arabic

EXPRESSION_END

Pattern that matches the end of a rule expression in Arabic

FRACTIONS

TEH_OPTIONAL_SUFFIX

AFTER

BEFORE

PREVIOUS

NEXT

AFTER_NEXT

BEFORE_PREVIOUS

IN_FROM_AT

FROM

TO

RULE_DURATION_SECONDS

RULE_DURATION_MINUTES

RULE_DURATION_HOURS

RULE_DURATION_DAYS

RULE_DURATION_WEEKS

RULE_DURATION_MONTHS

RULE_DURATION_YEARS

RULE_DURATION

RULE_NAME

RULE_NUMERAL_ONES

RULE_NUMERAL_TENS

RULE_NUMERAL_HUNDREDS

RULE_NUMERAL_THOUSANDS

RULE_NUMERAL_MILLIONS

RULE_NUMERAL_BILLIONS

RULE_NUMERAL_TRILLIONS

RULE_NUMERAL_INTEGERS

RULE_NUMERAL

RULE_ORDINAL_ONES

RULE_ORDINAL_TENS

RULE_ORDINAL_HUNDREDS

RULE_ORDINAL_THOUSANDS

RULE_ORDINAL_MILLIONS

RULE_ORDINAL_BILLIONS

RULE_ORDINAL_TRILLIONS

RULE_ORDINAL

RULE_TIME_YEARS

RULE_TIME_MONTHS

RULE_TIME_WEEKS

RULE_TIME_DAYS

RULE_TIME_HOURS

RULE_TIME_MINUTES

RULE_TIME_AM_PM

RULE_TIME_NOW

RULE_TIME

compile_rules()#
compile_numeral_rules()#
compile_ordinal_rules()#
compile_time_rules()#
compile_duration_rules()#
get_fractions_of_unit_pattern(unit)#

Returns the fractions of a unit pattern.

Parameters

unit (str) – The unit pattern.

Returns

Pattern for the fractions of the unit.

Return type

str

get_fractions_of_pattern(pattern)#

Returns the fractions of a pattern.

Parameters

pattern (str) – The pattern.

Returns

Pattern for the fractions of the input pattern.

Return type

str

wrap_pattern(pattern)#

Adds start and end expression to the pattern.

Parameters

pattern (str) – The pattern to wrap.

Return type

str
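
As a rough illustration of what "adds start and end expression" can mean, the sketch below wraps a pattern between two boundary expressions using only the standard `re` module. `START_SKETCH`, `END_SKETCH` and `wrap_pattern_sketch` are hypothetical names, and the lookaround boundaries are an assumption: the actual values of EXPRESSION_START and EXPRESSION_END are not shown in this reference.

```python
import re

# Hypothetical stand-ins: the real EXPRESSION_START / EXPRESSION_END values
# are not shown in this reference.
START_SKETCH = r"(?<!\w)"  # no word character immediately before
END_SKETCH = r"(?!\w)"     # no word character immediately after


def wrap_pattern_sketch(pattern: str) -> str:
    """Surround a pattern with start and end boundary expressions."""
    # The non-capturing group keeps any alternation inside `pattern` local.
    return START_SKETCH + "(?:" + pattern + ")" + END_SKETCH


half = wrap_pattern_sketch("نصف")

# Matches the standalone word ...
standalone = re.search(half, "بعد نصف ساعة")
# ... but not the same letters embedded in a longer word.
embedded = re.search(half, "منصفة")
```

Wrapping this way keeps a rule from firing in the middle of a longer word, which is presumably why the package applies start and end expressions to its patterns.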

spaced_patterns(*patterns)#

Returns a regex pattern that matches any of the given patterns, separated by spaces.

Parameters

patterns – The patterns to match.

Return type

str
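
The sketch below shows one plausible reading of this description: the given patterns must appear in order, with whitespace between them. `spaced_patterns_sketch` is a hypothetical helper written with the standard `re` module; it is not the library's implementation, and the real function may group or separate the patterns differently.

```python
import re


def spaced_patterns_sketch(*patterns: str) -> str:
    """Join patterns so they must appear in order, separated by whitespace."""
    # Each part gets its own group so alternations inside it stay local.
    parts = ["(?:" + p + ")" for p in patterns]
    return "(?:" + r"\s+".join(parts) + ")"


pattern = spaced_patterns_sketch("بعد", "ساعة|دقيقة")

# Any amount of whitespace between the two parts is accepted.
hit = re.search(pattern, "سأصل بعد   دقيقة")
# Without whitespace between the parts there is no match.
miss = re.search(pattern, "بعددقيقة")
```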

THIRD#

Pattern that matches the pronunciation of third in Arabic

QUARTER#

Pattern that matches the pronunciation of quarter in Arabic

HALF#

Pattern that matches the pronunciation of half in Arabic

THREE_QUARTERS#

Pattern that matches the pronunciation of three quarters in Arabic

WAW_CONNECTOR#

Pattern that matches WAW as a connector between two words

WORD_SEPARATOR#

Pattern that matches the word separator between numerals in Arabic

ALL_ALEF#

Pattern that matches all possible forms of the ALEF in Arabic

TWO_SUFFIX#

Pattern that matches the two-suffix of words in Arabic

SUM_SUFFIX#

Pattern that matches the sum-suffix of words in Arabic

EXPRESSION_START#

Pattern that matches the start of a rule expression in Arabic

EXPRESSION_END#

Pattern that matches the end of a rule expression in Arabic

FRACTIONS#
TEH_OPTIONAL_SUFFIX = [ةه]?#
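
To see what the value above accepts, the following sketch appends it to a word stem using the standard `re` module. The stem and the variable names are illustrative assumptions; the rules in this package may compose their patterns differently.

```python
import re

# Value as listed in this reference.
TEH_OPTIONAL_SUFFIX = "[ةه]?"

# Illustrative stem only; not taken from the package's rules.
minute = re.compile("دقيق" + TEH_OPTIONAL_SUFFIX)

# The word is accepted with either suffix letter, or with no suffix at all.
forms = ["دقيقة", "دقيقه", "دقيق"]
results = [minute.fullmatch(form) is not None for form in forms]

# Any other trailing letter is rejected.
other = minute.fullmatch("دقيقي")
```
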
AFTER#
BEFORE#
PREVIOUS#
NEXT#
AFTER_NEXT#
BEFORE_PREVIOUS#
IN_FROM_AT#
FROM#
TO#
RULE_DURATION_SECONDS#
RULE_DURATION_MINUTES#
RULE_DURATION_HOURS#
RULE_DURATION_DAYS#
RULE_DURATION_WEEKS#
RULE_DURATION_MONTHS#
RULE_DURATION_YEARS#
RULE_DURATION#
parse_duration(match)#

Parse duration.

class Expression(pattern, pickle=False)#

Regex pattern holder.

Parameters
  • pattern (str) – Regular expression pattern.

  • pickle (bool) – If True, the compiled pattern will be pickled. This is useful to save compilation time for large patterns.

pattern :str#

Regular expression(s) to match

compile(self)#

Compile the regular expression.

classmethod from_cache(cls, cache)#

Load an expression from cache.

Parameters

cache (str) – Name of the cache file.

Returns

Expression.

Return type

Expression

search(self, text)#

Search for the pattern in the input text.

Parameters

text (str) – Text to search in.

Returns

Matched object.

Return type

regex.Match

match(self, text)#

Match the pattern at the beginning of the input text.

Parameters

text (str) – Text to match in.

Returns

Matched object.

Return type

Match[str]

fullmatch(self, text)#

Match the pattern against the entire input text.

Parameters

text (str) – Text to match in.

Returns

Matched object.

Return type

Match[str]
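
The three lookup methods share their names with the pattern methods of the standard `re` module (and of the third-party `regex` module). Assuming Expression follows the same semantics, which this reference does not state explicitly, the difference between them can be seen with plain `re`:

```python
import re

pattern = re.compile(r"\d+")
text = "abc 123"

found = pattern.search(text)         # scans the whole string for a match
at_start = pattern.match(text)       # only tries to match at position 0
whole = pattern.fullmatch("123")     # must consume the entire string
not_whole = pattern.fullmatch(text)  # fails: "abc " is not part of the match
```

Here `search` finds the digits, `match` returns `None` because the text does not start with a digit, and `fullmatch` succeeds only when nothing is left over.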

sub(self, repl, text)#

Replace all occurrences of the pattern in the input text.

Parameters
  • repl (str) – Replacement string.

  • text (str) – Text to replace.

Returns

Text with replaced occurrences.

Return type

str

parse(self, text)#

Extract values from the input text.

Parameters

text (str) – Text to extract the value from.

Yields

ExpressionResult – Extracted value.

Return type

Iterable[maha.rexy.templates.expression_result.ExpressionResult]
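
Because `parse` yields its results, callers iterate over it rather than receiving a list. The sketch below mimics that shape with the standard `re` module. `ResultSketch` and `parse_sketch` are hypothetical, and the fields on `ResultSketch` are assumed for illustration; the real ExpressionResult may carry different attributes.

```python
import re
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ResultSketch:
    """Hypothetical stand-in for ExpressionResult; the field names are assumed."""
    start: int
    end: int
    text: str


def parse_sketch(pattern: str, text: str) -> Iterator[ResultSketch]:
    """Lazily yield one result per non-overlapping match."""
    for m in re.finditer(pattern, text):
        yield ResultSketch(m.start(), m.end(), m.group())


source = "من 10 إلى 25"
# Materialize the generator to inspect every result at once.
results = list(parse_sketch(r"\d+", source))
```

A generator lets the caller stop early or stream through long texts without building every result up front.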

RULE_NAME#
RULE_NUMERAL_ONES#
RULE_NUMERAL_TENS#
RULE_NUMERAL_HUNDREDS#
RULE_NUMERAL_THOUSANDS#
RULE_NUMERAL_MILLIONS#
RULE_NUMERAL_BILLIONS#
RULE_NUMERAL_TRILLIONS#
RULE_NUMERAL_INTEGERS#
RULE_NUMERAL#
RULE_ORDINAL_ONES#
RULE_ORDINAL_TENS#
RULE_ORDINAL_HUNDREDS#
RULE_ORDINAL_THOUSANDS#
RULE_ORDINAL_MILLIONS#
RULE_ORDINAL_BILLIONS#
RULE_ORDINAL_TRILLIONS#
RULE_ORDINAL#
parse_ordinal(match)#
RULE_TIME_YEARS#
RULE_TIME_MONTHS#
RULE_TIME_WEEKS#
RULE_TIME_DAYS#
RULE_TIME_HOURS#
RULE_TIME_MINUTES#
RULE_TIME_AM_PM#
RULE_TIME_NOW#
RULE_TIME#
parse_time(match)#