maha.processors.basic_processors
All basic processors
Module Contents
Classes
TextProcessor | For processing text input.
FileProcessor | For processing file input.
- class TextProcessor(text)
Bases: maha.processors.base_processor.BaseProcessor
For processing text input.
- Parameters
text (Union[List[str], str]) – A text or list of strings to process
- apply(self, fn)
Applies a function to each line
- Parameters
fn (Callable[[str], str]) – Function to apply
- filter(self, fn)
Keeps the lines for which the input function returns True
- Parameters
fn (Callable[[str], bool]) – Function to check
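The line-wise behaviour described for apply and filter can be illustrated without maha itself. The following is a minimal sketch in plain Python. apply_to_lines and filter_lines are hypothetical stand-ins written for this example, not the library's implementation.

```python
from typing import Callable, List


def apply_to_lines(lines: List[str], fn: Callable[[str], str]) -> List[str]:
    # Hypothetical stand-in for apply: transform every line with fn.
    return [fn(line) for line in lines]


def filter_lines(lines: List[str], fn: Callable[[str], bool]) -> List[str]:
    # Hypothetical stand-in for filter: keep only lines where fn returns True.
    return [line for line in lines if fn(line)]


lines = ["  first line  ", "", "second line"]
stripped = apply_to_lines(lines, str.strip)
non_empty = filter_lines(stripped, lambda line: len(line) > 0)
print(non_empty)  # ['first line', 'second line']
```

Note the differing callable types: apply expects a function that returns a string, while filter expects one that returns a bool.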
- get_lines(self, n_lines=100)
Returns a generator of lists of strings, each of length n_lines
- Parameters
n_lines (int) – Number of lines to yield. Defaults to 100
- Yields
List[str] – List of strings with length n_lines. The last list may be shorter than n_lines.
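The batching described for get_lines can be sketched as a plain generator. iter_batches is a hypothetical stand-in for illustration and is not taken from the maha source. It only assumes that lines are yielded in consecutive, non-overlapping groups.

```python
from typing import Iterator, List


def iter_batches(lines: List[str], n_lines: int = 100) -> Iterator[List[str]]:
    # Hypothetical stand-in for get_lines: yield consecutive slices of n_lines.
    for start in range(0, len(lines), n_lines):
        yield lines[start:start + n_lines]


batches = list(iter_batches(["a", "b", "c", "d", "e"], n_lines=2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

With five lines and n_lines=2, the final batch holds the single remaining line, which matches the note that the last list may be shorter.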
- set_lines(self, text)
Overrides text
- Parameters
text (Union[List[str], str]) – New text or list of strings
- property text(self)
Returns the processed text joined by the newline separator \n
- Returns
processed text
- Return type
str
- classmethod from_text(cls, text, sep=None)
Creates a new processor from the given text. Splits the text by the sep argument if provided.
- Parameters
text (str) – Text to process
sep (str, optional) – Separator used to split the given text, by default None
- Returns
New text processor
- Return type
TextProcessor
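The effect of the sep argument can be sketched with plain string operations. split_text is a hypothetical helper, and it assumes the text is kept as a single entry when sep is None, which the description suggests but does not state. The final join mirrors the newline join described for the text property.

```python
from typing import List, Optional


def split_text(text: str, sep: Optional[str] = None) -> List[str]:
    # Assumption: with no separator the text is kept as a single entry.
    if sep is None:
        return [text]
    return text.split(sep)


parts = split_text("one. two. three", sep=". ")
joined = "\n".join(parts)  # mirrors the newline join of the text property
print(parts)  # ['one', 'two', 'three']
```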
- class FileProcessor(path)
Bases: TextProcessor
For processing file input.
Note
For large files (>100 MB), use StreamFileProcessor.
- Parameters
path (Union[str, pathlib.Path]) – Path of the file to process.
- Raises
FileNotFoundError – If the file doesn’t exist.
ValueError – If the file is empty.
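The two documented error conditions can be illustrated with a small standard-library sketch. read_lines is a hypothetical stand-in, not FileProcessor's actual loading code. The UTF-8 encoding and the error messages are assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def read_lines(path: Union[str, Path]) -> List[str]:
    # Hypothetical stand-in: load a file's lines, mirroring the documented errors.
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"File {path} doesn't exist.")
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError(f"File {path} is empty.")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("first\nsecond\n", encoding="utf-8")
    loaded = read_lines(sample)

    # A path that was never created triggers the FileNotFoundError branch.
    missing_error = None
    try:
        read_lines(Path(tmp) / "missing.txt")
    except FileNotFoundError as exc:
        missing_error = exc

print(loaded)  # ['first', 'second']
```

This sketch reads the whole file into memory, which is why a streaming alternative is recommended for large files.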