maha.processors.stream_processors#

Module Contents#

Classes#

StreamTextProcessor

For processing a stream of text input.

StreamFileProcessor

For processing file stream input.

class StreamTextProcessor(lines)[source]#

Bases: maha.processors.base_processor.BaseProcessor

For processing a stream of text input.

Parameters

lines (Iterable[str]) – An iterable of strings to process

apply(self, fn)[source]#

Applies a function to each line

Parameters

fn (Callable[[str], str]) – Function to apply

filter(self, fn)[source]#

Keeps only the lines for which the input function returns True

Parameters

fn (Callable[[str], bool]) – Function to check

get_lines(self, n_lines=100)[source]#

Returns a generator that yields lists of up to n_lines strings

Parameters

n_lines (int) – Number of lines in each yielded list. Defaults to 100

Yields

List[str] – List of strings with length of n_lines. The last list may be shorter than n_lines.
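The batching behavior described above can be sketched independently of the library. The following is a minimal illustration, not the implementation used by maha: the helper name batch_lines is hypothetical, and rejecting a non-positive n_lines is an assumption rather than documented behavior.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_lines(lines: Iterable[str], n_lines: int = 100) -> Iterator[List[str]]:
    """Yield consecutive lists of at most n_lines items (hypothetical helper)."""
    if n_lines < 1:
        # Assumption: a non-positive batch size is treated as an error here.
        raise ValueError("n_lines must be a positive integer")
    iterator = iter(lines)
    while True:
        # islice consumes up to n_lines items without reading the whole input.
        batch = list(islice(iterator, n_lines))
        if not batch:
            return
        yield batch


batches = list(batch_lines(["a", "b", "c", "d", "e"], n_lines=2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Only the final list is shorter than n_lines, and an empty input yields nothing at all.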

process(self, n_lines=100)[source]#

Applies all functions in sequence to the given iterable

Parameters

n_lines (int, optional) – Number of lines to process at a time, by default 100

Yields

List[str] – A list of processed lines. It can be empty.

Raises

ValueError – If no functions were selected.
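To illustrate how a yielded list can be empty and when a ValueError might arise, here is a rough sketch of a batch-wise pipeline. It does not use maha itself: run_pipeline and the ("apply", fn) / ("filter", fn) step representation are hypothetical names chosen for this example, and the real processor may store its functions differently.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical representation: each step is ("apply", fn) or ("filter", fn).
Step = Tuple[str, Callable]


def run_pipeline(batches: Iterable[List[str]], steps: List[Step]) -> Iterator[List[str]]:
    """Run every step, in order, over each batch and yield the result."""
    if not steps:
        # Mirrors the documented error for an empty set of functions.
        raise ValueError("No functions were selected.")
    for batch in batches:
        for kind, fn in steps:
            if kind == "apply":
                batch = [fn(line) for line in batch]
            else:
                batch = [line for line in batch if fn(line)]
        # A batch whose lines were all filtered out is still yielded, as [].
        yield batch


steps = [("apply", str.strip), ("filter", bool)]
result = list(run_pipeline([[" a ", "  "], ["   "], ["b"]], steps))
print(result)  # [['a'], [], ['b']]
```

The second batch contains only whitespace, so stripping and then filtering leaves an empty list for that batch.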

apply_functions(self, text)[source]#

Applies all functions in sequence to a given list of strings

Parameters

text (List[str]) – List of strings to process

class StreamFileProcessor(path, encoding='utf8')[source]#

Bases: StreamTextProcessor

For processing file stream input.

Parameters
  • path (Union[str, pathlib.Path]) – Path of the file to process.

  • encoding (str) – File encoding.

Raises

FileNotFoundError – If the file doesn’t exist.

get_lines(self, n_lines=100)[source]#

Returns a generator that yields lists of up to n_lines strings

Parameters

n_lines (int) – Number of lines in each yielded list. Defaults to 100

Yields

List[str] – List of strings with length of n_lines. The last list may be shorter than n_lines.

process_and_save(self, path, n_lines=100, override=False)[source]#

Processes the input file and saves the result to the given path

Parameters
  • path (Union[str, pathlib.Path]) – Path to save the file

  • n_lines (int, optional) – Number of lines to process at a time, by default 100

  • override (bool, optional) – True to overwrite the file if it exists, by default False

Raises

FileExistsError – If the file exists and override is False
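The interaction between override and FileExistsError can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: save_batches is a hypothetical helper, and writing one line per string with a trailing newline is assumed, since the output format is not specified above.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Union


def save_batches(batches: Iterable[List[str]], path: Union[str, Path],
                 override: bool = False, encoding: str = "utf8") -> None:
    """Write batches of lines to path, refusing to overwrite by default."""
    target = Path(path)
    if target.exists() and not override:
        raise FileExistsError(f"{target} already exists")
    with target.open("w", encoding=encoding) as f:
        for batch in batches:
            for line in batch:
                # Assumption: each processed line is written on its own line.
                f.write(line + "\n")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "out.txt"
    save_batches([["a", "b"], ["c"]], out)
    first = out.read_text(encoding="utf8")
    try:
        save_batches([["x"]], out)  # The file exists and override is False.
        refused = False
    except FileExistsError:
        refused = True
    save_batches([["x"]], out, override=True)  # Explicitly overwrite.
    second = out.read_text(encoding="utf8")
```

Checking for the file before any processing starts means an accidental overwrite is rejected before work is spent on the input.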