:py:mod:`maha.cleaners.functions.normalize_fn` ============================================== .. py:module:: maha.cleaners.functions.normalize_fn .. autoapi-nested-parse:: Special functions that convert similar characters into one common character (Characters that roughly have the same shape) Module Contents --------------- Functions ~~~~~~~~~ .. autosummary:: normalize normalize_lam_alef normalize_small_alef .. py:function:: normalize(text, lam_alef = None, alef = None, waw = None, yeh = None, teh_marbuta = None, ligatures = None, spaces = None, all = False) Normalizes characters in the given text :param text: Text to process :type text: str :param lam_alef: Normalize :data:`~.LAM_ALEF_VARIATIONS` characters to :data:`~.LAM` and :data:`~.ALEF`, by default None :type lam_alef: bool, optional :param alef: Normalize :data:`~.ALEF_VARIATIONS` characters to :data:`~.ALEF`, by default None :type alef: bool, optional :param waw: Normalize :data:`~.WAW_VARIATIONS` characters to :data:`~.WAW`, by default None :type waw: bool, optional :param yeh: Normalize :data:`~.YEH_VARIATIONS` characters to :data:`~.YEH` and :data:`~.ALEF`, by default None :type yeh: bool, optional :param teh_marbuta: Normalize :data:`~.TEH_MARBUTA` characters to :data:`~.HEH`, by default None :type teh_marbuta: bool, optional :param ligatures: Normalize :data:`~.ARABIC_LIGATURES` characters to the corresponding indices in :data:`~.ARABIC_LIGATURES_NORMALIZED`, by default None :type ligatures: bool, optional :param spaces: Normalize space variations using the expression :data:`~.EXPRESSION_ALL_SPACES`, by default None :type spaces: bool, optional :param all: Do all normalization except the ones that are set to False, by default False :type all: bool, optional :returns: Processed text :rtype: str :raises ValueError: If no argument is set to True .. rubric:: Examples .. code:: pycon >>> from maha.cleaners.functions import normalize >>> text = "عن أبي هريرة" >>> normalize(text, alef=True, teh_marbuta=True) 'عن ابي هريره' .. code:: pycon >>> from maha.cleaners.functions import normalize >>> text = "قال رسول الله ﷺ" >>> normalize(text, ligatures=True) 'قال رسول الله صلى الله عليه وسلم' .. code:: pycon >>> from maha.cleaners.functions import normalize >>> text = "قال مؤمن: ﷽ قل هو ﷲ أحد" ... # For space >>> normalize(text, all=True, waw=False) 'قال مؤمن: بسم الله الرحمن الرحيم قل هو الله احد' .. py:function:: normalize_lam_alef(text, keep_hamza = True) Normalize :data:`~.LAM_ALEF_VARIATIONS` to :data:`~.LAM_ALEF_VARIATIONS_NORMALIZED` If ``keep_hamza`` is True. Otherwise, normalize to :data:`~.LAM` and :data:`~.ALEF` :param text: Text to process :type text: str :param keep_hamza: True to preserve hamza and madda characters, by default True :type keep_hamza: bool, optional :returns: Normalized text :rtype: str .. rubric:: Examples .. code:: pycon >>> from maha.cleaners.functions import normalize_lam_alef >>> text = "السﻻم عليكم أحبتي، قالوا في صِفَةِ رَسُولِ الله يتَﻷلأ وَجْهُه" >>> normalize_lam_alef(text) 'السلام عليكم أحبتي، قالوا في صِفَةِ رَسُولِ الله يتَلألأ وَجْهُه' .. code:: pycon >>> from maha.cleaners.functions import normalize_lam_alef >>> text = "اﻵن يا أصحابي" >>> normalize_lam_alef(text, keep_hamza=False) 'الان يا أصحابي' .. py:function:: normalize_small_alef(text, keep_madda = True, normalize_end = False) Normalize :data:`~.ALEF_SUPERSCRIPT` to :data:`~.ALEF`. If ``keep_madda`` is True and :data:`~.ALEF_SUPERSCRIPT` is followed by :data:`HAMZA_ABOVE`, then normalize to :data:`~.ALEF_MADDA_ABOVE` :param text: Text to process :type text: str :param keep_madda: True to preserve madda character, by default True :type keep_madda: bool, optional :param normalize_end: True to normalize :data:`~.ALEF_SUPERSCRIPT` that appear at the end of a word, by default False :type normalize_end: bool, optional :returns: Normalized text :rtype: str .. rubric:: Example .. code:: pycon >>> from maha.cleaners.functions import normalize_small_alef >>> text = "وَٱلصَّٰٓفَّٰتِ صَفّٗا" >>> normalize_small_alef(text) 'وَٱلصَّآفَّاتِ صَفّٗا'