maha.processors.stream_processors
Module Contents
Classes
| StreamTextProcessor | For processing a stream of text input. |
| StreamFileProcessor | For processing file stream input. |
- class StreamTextProcessor(lines)
Bases: maha.processors.base_processor.BaseProcessor
For processing a stream of text input.
- Parameters
lines (Iterable[str]) – An iterable of strings to process.
- apply(self, fn)
Applies a function to each line.
- Parameters
fn (Callable[[str], str]) – Function to apply
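The per-line behaviour of apply can be pictured as a lazy map over the stream. The sketch below does not use maha: apply_to_lines is a hypothetical stand-alone helper, and this page does not say whether maha applies fn immediately or defers it until the lines are read.

```python
from typing import Callable, Iterable, Iterator


def apply_to_lines(lines: Iterable[str], fn: Callable[[str], str]) -> Iterator[str]:
    """Hypothetical helper (not maha's API): lazily apply fn to every line."""
    for line in lines:
        # Each line is transformed only when the consumer asks for it,
        # so the full input never has to be held in memory.
        yield fn(line)


stream = iter(["  hello ", "world  "])
cleaned = list(apply_to_lines(stream, str.strip))
print(cleaned)  # ['hello', 'world']
```

Because the helper is a generator, it is equivalent to the built-in map(fn, lines); any str -> str callable can be passed as fn.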
- filter(self, fn)
Keeps only the lines for which fn returns True.
- Parameters
fn (Callable[[str], bool]) – Predicate evaluated on each line.
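A predicate passed to filter receives one line and returns a bool. The sketch below is a hypothetical stand-alone equivalent (filter_lines and has_arabic are illustrative names, not part of maha) that keeps only lines containing at least one character from the basic Arabic Unicode block.

```python
from typing import Callable, Iterable, Iterator


def filter_lines(lines: Iterable[str], fn: Callable[[str], bool]) -> Iterator[str]:
    """Hypothetical helper (not maha's API): lazily keep lines where fn(line) is True."""
    for line in lines:
        if fn(line):
            yield line


def has_arabic(line: str) -> bool:
    # True if any character falls in the basic Arabic block (U+0600 to U+06FF).
    return any("\u0600" <= ch <= "\u06ff" for ch in line)


arabic_only = list(filter_lines(["مرحبا", "hello", "سلام world"], has_arabic))
print(arabic_only)  # ['مرحبا', 'سلام world']
```

Lines for which the predicate returns False are dropped, and the order of the kept lines is preserved, just as with the built-in filter(fn, lines).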
- get_lines(self, n_lines=100)
Returns a generator of lists of strings, each of length n_lines.
- Parameters
n_lines (int) – Number of lines in each yielded list. Defaults to 100.
- Yields
List[str] – A list of n_lines strings. The last list may contain fewer than n_lines strings.
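The batching contract of get_lines (full lists of n_lines, with a possibly shorter final list) can be sketched with itertools.islice. This is a hypothetical free function mirroring the documented behaviour, not maha's code; the ValueError for a non-positive n_lines is an assumption added for safety and is not documented above.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def get_lines(lines: Iterable[str], n_lines: int = 100) -> Iterator[List[str]]:
    """Hypothetical sketch: yield consecutive lists of at most n_lines strings."""
    if n_lines < 1:
        # Assumption: not part of the documented API.
        raise ValueError("n_lines must be a positive integer")
    iterator = iter(lines)
    while True:
        # Take up to n_lines items; only the final chunk can be shorter.
        chunk = list(islice(iterator, n_lines))
        if not chunk:
            return
        yield chunk


chunks = list(get_lines(["a", "b", "c", "d", "e"], n_lines=2))
print(chunks)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Only one chunk is materialized at a time, so memory use is bounded by n_lines rather than by the size of the input.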
- class StreamFileProcessor(path, encoding='utf8')
Bases: StreamTextProcessor
For processing file stream input.
- Parameters
path (Union[str, pathlib.Path]) – Path of the file to process.
encoding (str) – File encoding. Defaults to 'utf8'.
- Raises
FileNotFoundError – If the file doesn’t exist.
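One way to get the documented FileNotFoundError at construction time while still reading the file lazily is to validate the path eagerly and return an inner generator. The sketch below is hypothetical (stream_file_lines is not a maha function), and stripping trailing newlines is an assumption about what each yielded line looks like.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Union


def stream_file_lines(path: Union[str, Path], encoding: str = "utf8") -> Iterator[str]:
    """Hypothetical helper: validate the path eagerly, then read lines lazily."""
    path = Path(path)
    if not path.is_file():
        # Checked up front, so a bad path fails at call time rather than
        # at the first iteration of the returned generator.
        raise FileNotFoundError(f"No such file: {path}")

    def _read() -> Iterator[str]:
        # The file is opened on first iteration and closed once the
        # generator is exhausted or closed.
        with path.open(encoding=encoding) as handle:
            for line in handle:
                yield line.rstrip("\n")  # assumption: trailing newlines are dropped

    return _read()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("first\nsecond\n", encoding="utf8")
    lines = list(stream_file_lines(sample))
    try:
        stream_file_lines(Path(tmp) / "missing.txt")
        raised = False
    except FileNotFoundError:
        raised = True

print(lines)   # ['first', 'second']
print(raised)  # True
```

Had the check been placed inside a single generator function, the error would only surface when iteration starts, which is later than the constructor-time behaviour described above.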
- get_lines(self, n_lines=100)
Returns a generator of lists of strings, each of length n_lines.
- Parameters
n_lines (int) – Number of lines in each yielded list. Defaults to 100.
- Yields
List[str] – A list of n_lines strings. The last list may contain fewer than n_lines strings.
- process_and_save(self, path, n_lines=100, override=False)
Processes the input file and saves the result to the given path.
- Parameters
path (Union[str, pathlib.Path]) – Path to save the file to.
n_lines (int, optional) – Number of lines to process at a time, by default 100.
override (bool, optional) – If True, overwrite the file if it already exists, by default False.
- Raises
FileExistsError – If the file already exists and override is False.
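The save step can be sketched as processing n_lines at a time and writing each batch before reading the next. This is a hypothetical stand-alone function, not maha's implementation: the fn parameter stands in for whatever transformations the processor has accumulated, and writing UTF-8 with one line per output line is an assumption.

```python
import tempfile
from itertools import islice
from pathlib import Path
from typing import Callable, Iterable, Union


def process_and_save(
    lines: Iterable[str],
    fn: Callable[[str], str],
    path: Union[str, Path],
    n_lines: int = 100,
    override: bool = False,
) -> None:
    """Hypothetical helper: apply fn to each line and write results chunk by chunk."""
    # "x" (exclusive creation) raises FileExistsError if the file exists;
    # "w" truncates an existing file, which is what override=True asks for.
    mode = "w" if override else "x"
    iterator = iter(lines)
    with Path(path).open(mode, encoding="utf8") as out:
        while True:
            chunk = list(islice(iterator, n_lines))
            if not chunk:
                break
            out.writelines(fn(line) + "\n" for line in chunk)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out.txt"
    process_and_save(["a", "b", "c"], str.upper, target, n_lines=2)
    first = target.read_text(encoding="utf8")
    try:
        process_and_save(["x"], str.upper, target)
        refused = False
    except FileExistsError:
        refused = True
    process_and_save(["x"], str.upper, target, override=True)
    second = target.read_text(encoding="utf8")

print(repr(first))   # 'A\nB\nC\n'
print(refused)       # True
print(repr(second))  # 'X\n'
```

Opening with mode "x" lets the operating system perform the existence check atomically, which avoids the race between a separate exists() test and the write.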