`sourced.ml.core.algorithms.token_parser`

Module Contents

class sourced.ml.core.algorithms.token_parser.TokenStyle

Bases: enum.Enum

Metadata that allows the initial identifier to be reconstructed from a list of tokens.

DELIMITER = 1

TOKEN_UPPER = 2

TOKEN_LOWER = 3

TOKEN_CAPITALIZED = 4
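To illustrate how per-token style metadata can make an identifier recoverable, here is a minimal, self-contained sketch. It re-declares an enum with the member names and values listed above; the helpers `classify`, `describe`, `restore`, and `rebuild` are hypothetical names introduced for this example and are not part of the module.

```python
from enum import Enum
from typing import List, Tuple


class TokenStyle(Enum):
    # Local re-declaration mirroring the documented members and values.
    DELIMITER = 1
    TOKEN_UPPER = 2
    TOKEN_LOWER = 3
    TOKEN_CAPITALIZED = 4


def classify(piece: str) -> TokenStyle:
    """Hypothetical helper: guess the style of one piece of an identifier."""
    if not piece.isalnum():
        return TokenStyle.DELIMITER
    if piece.isupper():
        return TokenStyle.TOKEN_UPPER
    if piece[0].isupper():
        return TokenStyle.TOKEN_CAPITALIZED
    return TokenStyle.TOKEN_LOWER


def describe(pieces: List[str]) -> List[Tuple[str, TokenStyle]]:
    """Lowercase every piece and remember the style it had."""
    return [(p.lower(), classify(p)) for p in pieces]


def restore(piece: str, style: TokenStyle) -> str:
    """Re-apply a remembered style to a lowercased piece."""
    if style is TokenStyle.TOKEN_UPPER:
        return piece.upper()
    if style is TokenStyle.TOKEN_CAPITALIZED:
        return piece.capitalize()
    return piece  # delimiters and lowercase tokens are kept as they are


def rebuild(styled: List[Tuple[str, TokenStyle]]) -> str:
    """Join the styled pieces back into a single identifier."""
    return "".join(restore(p, s) for p, s in styled)


if __name__ == "__main__":
    print(rebuild(describe(["HTTP", "_", "Server"])))
```

The sketch only round-trips pieces that are fully upper case, fully lower case, or capitalized; a mixed-case piece such as `iOS` would not survive, which is one reason the real parser may handle styles differently.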

class sourced.ml.core.algorithms.token_parser.TokenParser(stem_threshold=STEM_THRESHOLD, max_token_length=MAX_TOKEN_LENGTH, min_split_length=MIN_SPLIT_LENGTH, single_shot=False, save_token_style=False, attach_upper=True, use_nn=False, nn_model=None)

Common utilities for splitting and stemming tokens.

NAME_BREAKUP_RE

NAME_BREAKUP_KEEP_DELIMITERS_RE

STEM_THRESHOLD = 6

MAX_TOKEN_LENGTH = 256

MIN_SPLIT_LENGTH = 3

use_nn

stem_threshold

max_token_length

min_split_length

process_token(self, token)

stem(self, word)

split(self, token: str)

Splits a single identifier.

split_batch(self, tokens: [str])

Splits a batch of identifiers.
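As a rough picture of what splitting a single identifier and a batch of identifiers can look like, the following standalone sketch breaks names on non-letter characters and on case transitions, then lowercases the parts. The regular expressions and the function names `split_identifier` and `split_identifiers` are assumptions made for this example; they are not the module's `NAME_BREAKUP_RE` or its actual `split` and `split_batch` implementations, and the length and stemming parameters are ignored.

```python
import re
from typing import Iterable, List

# Assumed patterns for illustration only.
_NON_LETTERS = re.compile(r"[^a-zA-Z]+")
# An upper-case run not followed by a lower-case letter (acronym),
# or a capitalized word, or a lower-case run.
_WORDS = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+")


def split_identifier(token: str) -> List[str]:
    """Split one identifier into lowercased word parts."""
    parts: List[str] = []
    for chunk in _NON_LETTERS.split(token):
        parts.extend(m.group(0).lower() for m in _WORDS.finditer(chunk))
    return parts


def split_identifiers(tokens: Iterable[str]) -> List[List[str]]:
    """Apply split_identifier to every identifier in a batch."""
    return [split_identifier(t) for t in tokens]


if __name__ == "__main__":
    for name in ("parseHTTPResponse", "read_file_2"):
        print(name, split_identifier(name))
```

The batch variant here is a plain loop over the single-identifier function; a parser that uses a neural model (compare the `use_nn` and `nn_model` arguments) would likely benefit from processing the batch in one call instead.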

static reconstruct(tokens)

class sourced.ml.core.algorithms.token_parser.NoopTokenParser

One can use this class if one does not want to do any parsing.

process_token(self, token)
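A pass-through parser of this kind can be sketched in a few lines. The class name `PassThroughParser` is hypothetical, and the choice to yield the token from a generator (rather than return it) is an assumption, since the page does not state the return type of `process_token`.

```python
from typing import Iterator


class PassThroughParser:
    """Hypothetical stand-in for a parser that performs no parsing."""

    def process_token(self, token: str) -> Iterator[str]:
        # Assumption: the interface is a generator of sub-tokens,
        # so the unchanged token is yielded as the only item.
        yield token


if __name__ == "__main__":
    print(list(PassThroughParser().process_token("FooBar")))
```

Keeping the same method name as the full parser lets calling code swap one object for the other without branching.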
