maha.parsers.functions.parse_fn

Functions that extract values from text

Module Contents

Functions

parse(text[, arabic, english, ...])

Extracts certain characters/patterns from the given text.

parse_expression(text, expressions[, ...])

Extracts matched strings in the given text using the input patterns.

parse(text, arabic=False, english=False, arabic_letters=False, english_letters=False, english_small_letters=False, english_capital_letters=False, numbers=False, harakat=False, all_harakat=False, tatweel=False, punctuations=False, arabic_numbers=False, english_numbers=False, arabic_punctuations=False, english_punctuations=False, arabic_ligatures=False, arabic_hashtags=False, arabic_mentions=False, emails=False, english_hashtags=False, english_mentions=False, hashtags=False, links=False, mentions=False, emojis=False, custom_expressions=None, include_space=False)

Extracts certain characters/patterns from the given text.

To add a new parameter, make sure that its name matches the name of the corresponding constant. For patterns, the parameter name is the constant name with the EXPRESSION_ prefix removed.
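The naming rule above can be illustrated with a small sketch. The helper `constant_name` is hypothetical and not part of maha; it only shows how a keyword argument name relates to the constant it refers to, assuming the constants are upper-case versions of the parameter names.

```python
def constant_name(parameter: str, is_expression: bool = False) -> str:
    """Return the constant name that a parse() keyword argument refers to.

    Hypothetical helper for illustration only; it is not part of maha.
    """
    name = parameter.upper()
    # Pattern constants carry the EXPRESSION_ prefix, character constants do not.
    return f"EXPRESSION_{name}" if is_expression else name


print(constant_name("arabic_letters"))                       # ARABIC_LETTERS
print(constant_name("arabic_hashtags", is_expression=True))  # EXPRESSION_ARABIC_HASHTAGS
```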

TO DO

Add the ability to combine all expressions before parsing.

Parameters
  • text (str) – Text to be processed

  • arabic (bool, optional) – Extract ARABIC characters, by default False

  • english (bool, optional) – Extract ENGLISH characters, by default False

  • arabic_letters (bool, optional) – Extract ARABIC_LETTERS characters, by default False

  • english_letters (bool, optional) – Extract ENGLISH_LETTERS characters, by default False

  • english_small_letters (bool, optional) – Extract ENGLISH_SMALL_LETTERS characters, by default False

  • english_capital_letters (bool, optional) – Extract ENGLISH_CAPITAL_LETTERS characters, by default False

  • numbers (bool, optional) – Extract NUMBERS characters, by default False

  • harakat (bool, optional) – Extract HARAKAT characters, by default False

  • all_harakat (bool, optional) – Extract ALL_HARAKAT characters, by default False

  • tatweel (bool, optional) – Extract TATWEEL character, by default False

  • punctuations (bool, optional) – Extract PUNCTUATIONS characters, by default False

  • arabic_numbers (bool, optional) – Extract ARABIC_NUMBERS characters, by default False

  • english_numbers (bool, optional) – Extract ENGLISH_NUMBERS characters, by default False

  • arabic_punctuations (bool, optional) – Extract ARABIC_PUNCTUATIONS characters, by default False

  • english_punctuations (bool, optional) – Extract ENGLISH_PUNCTUATIONS characters, by default False

  • arabic_ligatures (bool, optional) – Extract ARABIC_LIGATURES words, by default False

  • arabic_hashtags (bool, optional) – Extract Arabic hashtags using the expression EXPRESSION_ARABIC_HASHTAGS, by default False

  • arabic_mentions (bool, optional) – Extract Arabic mentions using the expression EXPRESSION_ARABIC_MENTIONS, by default False

  • emails (bool, optional) – Extract emails using the expression EXPRESSION_EMAILS, by default False

  • english_hashtags (bool, optional) – Extract English hashtags using the expression EXPRESSION_ENGLISH_HASHTAGS, by default False

  • english_mentions (bool, optional) – Extract English mentions using the expression EXPRESSION_ENGLISH_MENTIONS, by default False

  • hashtags (bool, optional) – Extract hashtags using the expression EXPRESSION_HASHTAGS, by default False

  • links (bool, optional) – Extract links using the expression EXPRESSION_LINKS, by default False

  • mentions (bool, optional) – Extract mentions using the expression EXPRESSION_MENTIONS, by default False

  • emojis (bool, optional) – Extract emojis using the expression EXPRESSION_EMOJIS, by default False

  • custom_expressions (Union[ExpressionGroup, Expression], optional) – Include any other string(s), by default None

  • include_space (bool, optional) – Include the space expression EXPRESSION_SPACE with all characters, by default False

Returns

List of dimensions extracted from the text

Return type

List[Dimension]

Raises

ValueError – If no argument is set to True
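The flag-driven behaviour documented above can be sketched with the standard library alone. This is not maha's implementation: `parse_sketch`, `Match`, and the three entries in `PATTERNS` are hypothetical stand-ins, and the regular expressions are far simpler than the library's constants. The sketch only shows the documented contract, where each flag set to True selects a pattern and a call with no flag set raises ValueError.

```python
import re
from typing import List, NamedTuple


class Match(NamedTuple):
    """Simplified stand-in for a Dimension: the matched text and its span."""
    body: str
    start: int
    end: int


# Illustrative patterns only (assumption); maha's constants are more thorough.
PATTERNS = {
    "english_letters": r"[a-zA-Z]+",
    "english_numbers": r"[0-9]+",
    "hashtags": r"#\w+",
}


def parse_sketch(text: str, **flags: bool) -> List[Match]:
    """Extract matches for every pattern whose flag is True."""
    selected = [PATTERNS[name] for name, enabled in flags.items() if enabled]
    if not selected:
        raise ValueError("At least one argument should be True")
    matches = []
    # Each selected pattern is applied separately, in the order the flags were given.
    for pattern in selected:
        for m in re.finditer(pattern, text):
            matches.append(Match(m.group(), m.start(), m.end()))
    return matches


for match in parse_sketch("call 911 #now", english_numbers=True, hashtags=True):
    print(match.body, match.start, match.end)
```

Results here are grouped by pattern rather than sorted by position in the text; whether maha orders its output the same way is not stated in this reference.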

parse_expression(text, expressions, dimension_type=DimensionType.GENERAL)

Extracts matched strings in the given text using the input patterns.

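Parameters
  • text – Text to be processed

  • expressions – Pattern(s) to match in the text

  • dimension_type – Dimension type of the given expressions, by default DimensionType.GENERAL
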
Returns

List of extracted dimensions

Return type

List[Dimension]

Raises

ValueError – If expressions are invalid
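As a rough illustration of matching one or more patterns and validating the input first, the following sketch uses plain regular-expression strings in place of maha's Expression and ExpressionGroup objects. The name `parse_expression_sketch` is hypothetical, and the choice to treat an empty or non-string pattern as invalid is an assumption; the reference above does not say what maha considers invalid.

```python
import re
from typing import List, Sequence, Tuple, Union


def parse_expression_sketch(
    text: str, expressions: Union[str, Sequence[str]]
) -> List[Tuple[str, int, int]]:
    """Return (matched text, start, end) for every match of the given patterns."""
    if isinstance(expressions, str):
        expressions = [expressions]
    # Assumption: "invalid" means no patterns, or a pattern that is empty or not a string.
    if not expressions or not all(isinstance(e, str) and e for e in expressions):
        raise ValueError("'expressions' must be one or more non-empty patterns")
    found = []
    for expression in expressions:
        for m in re.finditer(expression, text):
            found.append((m.group(), m.start(), m.end()))
    return found


print(parse_expression_sketch("a1b22", r"\d+"))
```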