maha.parsers.functions
Submodules
Package Contents
Functions
parse_dimension() | Extract dimensions from a given text.
parse() | Extracts certain characters/patterns from the given text.
parse_expression() | Extract matched strings in the given text using the input patterns.
- parse_dimension(text, amount_of_money=None, duration=None, distance=None, numeral=None, ordinal=None, quantity=None, temperature=None, time=None, volume=None, names=None)
Extract dimensions from a given text.
- Parameters
text (str) – Text to extract dimensions from
amount_of_money (bool, optional) – Extract amount of money using the rule RULE_AMOUNT_OF_MONEY, by default None
duration (bool, optional) – Extract duration using the rule RULE_DURATION, by default None
distance (bool, optional) – Extract distance using the rule RULE_DISTANCE, by default None
numeral (bool, optional) – Extract numeral using the rule RULE_NUMERAL, by default None
ordinal (bool, optional) – Extract ordinal using the rule RULE_ORDINAL, by default None
quantity (bool, optional) – Extract quantity using the rule RULE_QUANTITY, by default None
temperature (bool, optional) – Extract temperature using the rule RULE_TEMPERATURE, by default None
time (bool, optional) – Extract time using the rule RULE_TIME, by default None
volume (bool, optional) – Extract volume using the rule RULE_VOLUME, by default None
names (bool | None) –
- Returns
List of Dimension objects extracted from the text
- Return type
List[Dimension]
- Raises
ValueError – If no argument is set to True
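The flag-based selection and the ValueError described above can be illustrated with a small standard-library sketch. It does not import Maha and is not Maha's implementation: `parse_dimension_sketch`, `Found`, and `SKETCH_RULES` are hypothetical names, and the two regular expressions are simplified stand-ins for the real RULE_* objects.

```python
import re
from typing import List, NamedTuple, Optional


class Found(NamedTuple):
    """Hypothetical stand-in for a Dimension result."""

    rule: str
    body: str
    start: int
    end: int


# Hypothetical, heavily simplified patterns; not Maha's RULE_* definitions.
SKETCH_RULES = {
    "distance": re.compile(r"\d+(?:\.\d+)?\s?(?:km|m)\b"),
    "numeral": re.compile(r"\d+(?:\.\d+)?"),
}


def parse_dimension_sketch(
    text: str,
    distance: Optional[bool] = None,
    numeral: Optional[bool] = None,
) -> List[Found]:
    """Run only the rules whose flag is truthy; fail if none is selected."""
    flags = {"distance": distance, "numeral": numeral}
    selected = [name for name, enabled in flags.items() if enabled]
    if not selected:
        # Mirrors the documented behaviour: no flag set to True is an error.
        raise ValueError("At least one argument should be True")

    results: List[Found] = []
    for name in selected:
        for match in SKETCH_RULES[name].finditer(text):
            results.append(Found(name, match.group(), match.start(), match.end()))
    return results


print(parse_dimension_sketch("walk 5 km", distance=True))
```

Defaulting every flag to None and rejecting a call with nothing selected keeps an accidental "extract nothing" call from silently returning an empty list.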
- parse(text, arabic=False, english=False, arabic_letters=False, english_letters=False, english_small_letters=False, english_capital_letters=False, numbers=False, harakat=False, all_harakat=False, tatweel=False, punctuations=False, arabic_numbers=False, english_numbers=False, arabic_punctuations=False, english_punctuations=False, arabic_ligatures=False, arabic_hashtags=False, arabic_mentions=False, emails=False, english_hashtags=False, english_mentions=False, hashtags=False, links=False, mentions=False, emojis=False, custom_expressions=None, include_space=False)
Extracts certain characters/patterns from the given text.
To add a new parameter, make sure that its name is the same as the corresponding constant. For the patterns, drop the EXPRESSION_ prefix from the constant name to get the parameter name (for example, EXPRESSION_EMAILS becomes emails).
TODO
Add the ability to combine all expressions before parsing.
- Parameters
text (str) – Text to be processed
arabic (bool, optional) – Extract ARABIC characters, by default False
english (bool, optional) – Extract ENGLISH characters, by default False
arabic_letters (bool, optional) – Extract ARABIC_LETTERS characters, by default False
english_letters (bool, optional) – Extract ENGLISH_LETTERS characters, by default False
english_small_letters (bool, optional) – Extract ENGLISH_SMALL_LETTERS characters, by default False
english_capital_letters (bool, optional) – Extract ENGLISH_CAPITAL_LETTERS characters, by default False
numbers (bool, optional) – Extract NUMBERS characters, by default False
harakat (bool, optional) – Extract HARAKAT characters, by default False
all_harakat (bool, optional) – Extract ALL_HARAKAT characters, by default False
tatweel (bool, optional) – Extract TATWEEL character, by default False
punctuations (bool, optional) – Extract PUNCTUATIONS characters, by default False
arabic_numbers (bool, optional) – Extract ARABIC_NUMBERS characters, by default False
english_numbers (bool, optional) – Extract ENGLISH_NUMBERS characters, by default False
arabic_punctuations (bool, optional) – Extract ARABIC_PUNCTUATIONS characters, by default False
english_punctuations (bool, optional) – Extract ENGLISH_PUNCTUATIONS characters, by default False
arabic_ligatures (bool, optional) – Extract ARABIC_LIGATURES words, by default False
arabic_hashtags (bool, optional) – Extract Arabic hashtags using the expression EXPRESSION_ARABIC_HASHTAGS, by default False
arabic_mentions (bool, optional) – Extract Arabic mentions using the expression EXPRESSION_ARABIC_MENTIONS, by default False
emails (bool, optional) – Extract emails using the expression EXPRESSION_EMAILS, by default False
english_hashtags (bool, optional) – Extract English hashtags using the expression EXPRESSION_ENGLISH_HASHTAGS, by default False
english_mentions (bool, optional) – Extract English mentions using the expression EXPRESSION_ENGLISH_MENTIONS, by default False
hashtags (bool, optional) – Extract hashtags using the expression EXPRESSION_HASHTAGS, by default False
links (bool, optional) – Extract links using the expression EXPRESSION_LINKS, by default False
mentions (bool, optional) – Extract mentions using the expression EXPRESSION_MENTIONS, by default False
emojis (bool, optional) – Extract emojis using the expression EXPRESSION_EMOJIS, by default False
custom_expressions (Union[ExpressionGroup, Expression], optional) – Include any other string(s), by default None
include_space (bool, optional) – Include the space expression EXPRESSION_SPACE with all characters, by default False
- Returns
List of dimensions extracted from the text
- Return type
List[Dimension]
- Raises
ValueError – If no argument is set to True
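The naming convention for parameters (same name as the constant, with the EXPRESSION_ prefix dropped for patterns) suggests a lookup along the following lines. This is a standard-library sketch under stated assumptions, not Maha's code: the `constants` namespace, its three values, `resolve_pattern`, and `parse_sketch` are all hypothetical, and the function returns plain strings rather than Dimension objects.

```python
import re
from types import SimpleNamespace
from typing import List

# Hypothetical stand-ins for the library's constants; values are simplified.
constants = SimpleNamespace(
    ENGLISH_NUMBERS="0123456789",        # a character set
    ENGLISH_SMALL_LETTERS="abcdefghijklmnopqrstuvwxyz",
    EXPRESSION_HASHTAGS=r"#\w+",         # a ready-made pattern
)


def resolve_pattern(flag: str) -> str:
    """Map a parameter name to a regex via the naming convention."""
    upper = flag.upper()
    if hasattr(constants, upper):
        # Character constants: build a character class from the set.
        return "[" + re.escape(getattr(constants, upper)) + "]+"
    if hasattr(constants, "EXPRESSION_" + upper):
        # Pattern constants: the parameter name omits the EXPRESSION_ prefix.
        return getattr(constants, "EXPRESSION_" + upper)
    raise ValueError(f"No constant matches parameter {flag!r}")


def parse_sketch(text: str, **flags: bool) -> List[str]:
    """Return the matched strings for every flag set to True."""
    enabled = [name for name, on in flags.items() if on]
    if not enabled:
        raise ValueError("At least one argument should be True")
    found: List[str] = []
    for name in enabled:
        found.extend(m.group() for m in re.finditer(resolve_pattern(name), text))
    return found


print(parse_sketch("call 911 #help", english_numbers=True, hashtags=True))
```

With a convention like this, adding a parameter needs no extra mapping table: the parameter name alone identifies the constant to use.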
- parse_expression(text, expressions, dimension_type=DimensionType.GENERAL)
Extract matched strings in the given text using the input patterns
- Parameters
text (str) – Text to check
expressions (Union[ExpressionGroup, Expression]) – Expression(s) to use
dimension_type (DimensionType) – Dimension type of the input expressions, by default DimensionType.GENERAL
- Returns
List of extracted dimensions
- Return type
List[Dimension]
- Raises
ValueError – If expressions are invalid
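As a rough illustration of the contract above (one expression or a group of them in, a list of typed results out, ValueError on invalid input), here is a standard-library sketch. Plain regex strings stand in for Expression and ExpressionGroup, and `DimensionTypeSketch`, `DimensionSketch`, and `parse_expression_sketch` are hypothetical names; what counts as "invalid" here (empty or non-string input) is also an assumption.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Union


class DimensionTypeSketch(Enum):
    """Hypothetical stand-in for DimensionType."""

    GENERAL = "general"
    NUMBERS = "numbers"


@dataclass(frozen=True)
class DimensionSketch:
    """Hypothetical stand-in for Dimension."""

    body: str
    start: int
    end: int
    dimension_type: DimensionTypeSketch


def parse_expression_sketch(
    text: str,
    expressions: Union[str, Sequence[str]],
    dimension_type: DimensionTypeSketch = DimensionTypeSketch.GENERAL,
) -> List[DimensionSketch]:
    """Match each pattern against text and tag results with dimension_type."""
    # Accept a single pattern or a group of patterns.
    if isinstance(expressions, str):
        expressions = [expressions]
    # Assumed validity check: a non-empty collection of non-empty strings.
    if not expressions or not all(isinstance(e, str) and e for e in expressions):
        raise ValueError("expressions are invalid")

    results: List[DimensionSketch] = []
    for pattern in expressions:
        for m in re.finditer(pattern, text):
            results.append(
                DimensionSketch(m.group(), m.start(), m.end(), dimension_type)
            )
    return results


print(parse_expression_sketch("a1 b22", r"\d+", DimensionTypeSketch.NUMBERS))
```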