maha.cleaners.functions.keep_fn
Functions that operate on a string and remove all but certain characters.
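Module Contents
Functions
keep | Keeps only certain characters in the given text and removes everything else. |
keep_arabic_letters | Keeps only Arabic letters |
keep_arabic_characters | Keeps only common Arabic characters |
keep_arabic_with_english_numbers | Keeps only common Arabic characters and English numbers |
keep_arabic_letters_with_harakat | Keeps only Arabic letters and harakat |
keep_strings | Keeps only the input strings |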
- keep(text, arabic=False, english=False, arabic_letters=False, english_letters=False, english_small_letters=False, english_capital_letters=False, numbers=False, harakat=False, all_harakat=False, punctuations=False, arabic_numbers=False, english_numbers=False, arabic_punctuations=False, english_punctuations=False, use_space=True, custom_strings=None)
Keeps only certain characters in the given text and removes everything else.
To add a new parameter, make sure that its name is the same as the corresponding constant.
- Parameters
text (str) – Text to be processed
arabic (bool, optional) – Keep ARABIC characters, by default False
english (bool, optional) – Keep ENGLISH characters, by default False
arabic_letters (bool, optional) – Keep ARABIC_LETTERS characters, by default False
english_letters (bool, optional) – Keep ENGLISH_LETTERS characters, by default False
english_small_letters (bool, optional) – Keep ENGLISH_SMALL_LETTERS characters, by default False
english_capital_letters (bool, optional) – Keep ENGLISH_CAPITAL_LETTERS characters, by default False
numbers (bool, optional) – Keep NUMBERS characters, by default False
harakat (bool, optional) – Keep HARAKAT characters, by default False
all_harakat (bool, optional) – Keep ALL_HARAKAT characters, by default False
punctuations (bool, optional) – Keep PUNCTUATIONS characters, by default False
arabic_numbers (bool, optional) – Keep ARABIC_NUMBERS characters, by default False
english_numbers (bool, optional) – Keep ENGLISH_NUMBERS characters, by default False
arabic_punctuations (bool, optional) – Keep ARABIC_PUNCTUATIONS characters, by default False
english_punctuations (bool, optional) – Keep ENGLISH_PUNCTUATIONS characters, by default False
use_space (bool, optional) – False to not replace with space, check keep_strings() for more information, by default True
custom_strings (List[str], optional) – Include any other string(s), by default None
- Returns
Processed text
- Return type
str
- Raises
ValueError – If no argument is set to True
Example
>>> from maha.cleaners.functions import keep
>>> text = "بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ"
>>> keep(text, arabic_letters=True)
'بسم الله الرحمن الرحيم'
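The naming rule above (each boolean parameter shares its name with a constant) can be illustrated with a minimal standalone sketch. This is not the library's implementation: `CONSTANTS`, `keep_sketch`, and the three English character sets are hypothetical stand-ins, the regex-based filtering is an assumption, and harakat handling is not modeled.

```python
import re

# Hypothetical stand-ins for the library's constants; the real values differ.
CONSTANTS = {
    "ENGLISH_SMALL_LETTERS": list("abcdefghijklmnopqrstuvwxyz"),
    "ENGLISH_CAPITAL_LETTERS": list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
    "ENGLISH_NUMBERS": list("0123456789"),
}


def keep_sketch(text, custom_strings=None, **flags):
    """Keep the strings selected by the True flags; replace the rest with spaces."""
    strings = []
    for name, enabled in flags.items():
        if enabled:
            # The flag name, upper-cased, is the name of the constant to look up.
            strings.extend(CONSTANTS[name.upper()])
    strings.extend(custom_strings or [])
    if not strings:
        raise ValueError("At least one argument should be True")
    # Longest first, so a multi-character string wins over its own prefix.
    ordered = sorted(strings, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in ordered)
    # At each position: keep an allowed string, otherwise turn one character into a space.
    out = re.sub(f"({pattern})|.", lambda m: m.group(1) or " ", text, flags=re.DOTALL)
    # Collapse the runs of spaces left behind.
    return " ".join(out.split())


print(keep_sketch("Hello World 42!", english_small_letters=True))  # ello orld
print(keep_sketch("Hello World 42!", english_capital_letters=True, english_numbers=True))  # H W 42
```

With this layout, supporting a new flag only requires adding an entry to the constants table under the upper-cased flag name, which is the property the note on parameter names relies on.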
- keep_arabic_letters(text)
Keeps only Arabic letters (ARABIC_LETTERS) in the given text.
- Parameters
text (str) – Text to be processed
- Returns
Text that contains Arabic letters only.
- Return type
str
Example
>>> from maha.cleaners.functions import keep_arabic_letters
>>> text = " 1 يا أحلى mathematicians في العالم"
>>> keep_arabic_letters(text)
'يا أحلى في العالم'
- keep_arabic_characters(text)
Keeps only common Arabic characters (ARABIC) in the given text.
- Parameters
text (str) – Text to be processed
- Returns
Text that contains the common Arabic characters only.
- Return type
str
Example
>>> from maha.cleaners.functions import keep_arabic_characters
>>> text = "أَلمَانِيَا (بالألمانية: Deutschland) رسمِيّاً جُمهُورِيَّة أَلمَانِيَا الاِتِّحَاديَّة"
>>> keep_arabic_characters(text)
'أَلمَانِيَا بالألمانية رسمِيّاً جُمهُورِيَّة أَلمَانِيَا الاِتِّحَاديَّة'
- keep_arabic_with_english_numbers(text)
Keeps only common Arabic characters (ARABIC) and English numbers (ENGLISH_NUMBERS) in the given text.
- Parameters
text (str) – Text to be processed
- Returns
Text that contains the common Arabic characters and English numbers only.
- Return type
str
Example
>>> from maha.cleaners.functions import keep_arabic_with_english_numbers
>>> text = "تتكون من 16 ولاية تُغطي مساحة 357,021 كيلومتر Deutschland"
>>> keep_arabic_with_english_numbers(text)
'تتكون من 16 ولاية تُغطي مساحة 357 021 كيلومتر'
- keep_arabic_letters_with_harakat(text)
Keeps only Arabic letters (ARABIC_LETTERS) and harakat (HARAKAT) in the given text.
- Parameters
text (str) – Text to be processed
- Returns
Text that contains Arabic letters with harakat only.
- Return type
str
Example
>>> from maha.cleaners.functions import keep_arabic_letters_with_harakat
>>> text = "إنّ في التّركِ قوة…"
>>> keep_arabic_letters_with_harakat(text)
'إنّ في التّركِ قوة'
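As a rough illustration of what "letters with harakat" covers, the sketch below filters text with a plain character class. The code-point ranges are an assumption made for this example (basic letters U+0621 to U+063A and U+0641 to U+064A, harakat U+064B to U+0652) and may not match the library's ARABIC_LETTERS and HARAKAT constants. The names `LETTERS_SKETCH`, `HARAKAT_SKETCH`, and `keep_letters_with_harakat_sketch` are hypothetical.

```python
import re

# Assumed code-point ranges, chosen for illustration only:
#   U+0621-U+063A and U+0641-U+064A: basic Arabic letters
#   U+064B-U+0652: fathatan through sukun (the common harakat)
LETTERS_SKETCH = "\u0621-\u063A\u0641-\u064A"
HARAKAT_SKETCH = "\u064B-\u0652"

# Matches any run of characters outside both ranges (spaces included).
_UNWANTED = re.compile(f"[^{LETTERS_SKETCH}{HARAKAT_SKETCH}]+")


def keep_letters_with_harakat_sketch(text):
    """Replace every run of other characters with one space, then trim the ends."""
    return _UNWANTED.sub(" ", text).strip()
```

Applied to the example text above, this sketch drops the trailing ellipsis and leaves the letters and their diacritics in place, because each haraka sits in the kept range right after the letter it modifies.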
- keep_strings(text, strings, use_space=True)
Keeps only the input strings (strings) in the given text (text).
By default, this works by replacing all strings except the input strings with a space, which means space is kept. This is to help separate texts when unwanted strings are present without spaces. For example, ‘end.start’ will be converted to ‘end start’ if English letters (ENGLISH_LETTERS) are passed to strings. To disable this behavior, set use_space to False.
Note
Extra spaces (more than one space) are removed by default if use_space is set to True.
- Parameters
text (str) – Text to be processed
strings (Union[List[str], str]) – list of strings to keep
use_space (bool) – False to not replace with space, defaults to True
- Returns
Text that contains only the input strings.
- Return type
str
- Raises
ValueError – If no strings are provided
Example
>>> from maha.cleaners.functions import keep_strings
>>> text = "لا حول ولا قوة إلا بالله"
>>> keep_strings(text, "الله")
'الله'
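The effect of use_space described above can be reproduced with a small standalone sketch. It assumes a regex-based approach and is not the library's code: `keep_strings_sketch` is a hypothetical name, and how the real function treats existing spaces when use_space is False is not specified here.

```python
import re
import string


def keep_strings_sketch(text, strings, use_space=True):
    """Keep only the given strings; optionally put a space where text was dropped."""
    if not strings:
        raise ValueError("'strings' cannot be empty")
    if isinstance(strings, str):
        # A single string is treated as one item to keep, not as a set of characters.
        strings = [strings]
    # Longest first, so a multi-character string wins over its own prefix.
    ordered = sorted(strings, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in ordered)
    filler = " " if use_space else ""
    # At each position: keep an allowed string, otherwise replace one character.
    out = re.sub(f"({pattern})|.", lambda m: m.group(1) or filler, text, flags=re.DOTALL)
    if use_space:
        # Collapse extra spaces, as the note on use_space describes.
        out = " ".join(out.split())
    return out


letters = list(string.ascii_letters)
print(keep_strings_sketch("end.start", letters))                   # end start
print(keep_strings_sketch("end.start", letters, use_space=False))  # endstart
```

The first call shows why the space replacement is the default: without it, the two words on either side of the removed period would be fused into one.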