sourced.ml.core.models.df
Module Contents
class sourced.ml.core.models.df.DocumentFrequencies
Bases: modelforge.Model
Document frequencies: the number of different repositories in which a source code identifier appeared. Each repository counts only once.
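The "each repository counts only once" rule can be illustrated with a minimal sketch in plain Python. The `repos` mapping and its contents are hypothetical sample data, and this is not the library's own extraction code; it only shows the counting rule.

```python
from collections import Counter

# Hypothetical input: repository name -> identifiers found in it (repeats allowed).
repos = {
    "repo_a": ["foo", "bar", "foo"],
    "repo_b": ["foo", "baz"],
    "repo_c": ["bar", "foo"],
}

df = Counter()
for identifiers in repos.values():
    # set() collapses repeats, so a repository adds at most 1 per identifier.
    df.update(set(identifiers))

# Number of documents (here, repositories) that were counted.
docs = len(repos)
```

Here "foo" appears four times in total but in three repositories, so its document frequency is 3.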
construct(self, docs: int, tokfreqs: Union[Iterable[Dict[str, int]], Dict[str, int]])
Initializes this model.
Parameters:
- docs – The number of documents.
- tokfreqs – The dictionary of token -> frequency, or an iterable collection of such dictionaries.
Returns: self
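The two accepted shapes of tokfreqs can be reduced to a single dictionary along the following lines. This is a sketch only: `normalize_tokfreqs` is a hypothetical helper, not part of the documented API, and the choice to let later dictionaries overwrite earlier ones on key collisions is an assumption, since the documentation above does not say how collisions are resolved.

```python
from typing import Dict, Iterable, Union


def normalize_tokfreqs(
    tokfreqs: Union[Iterable[Dict[str, int]], Dict[str, int]]
) -> Dict[str, int]:
    """Hypothetical helper: collapse either accepted input shape into one dict."""
    if isinstance(tokfreqs, dict):
        # Already a token -> frequency mapping; copy it to avoid aliasing.
        return dict(tokfreqs)
    merged: Dict[str, int] = {}
    for part in tokfreqs:
        # Assumption: on a key collision the later dictionary wins.
        merged.update(part)
    return merged


single = normalize_tokfreqs({"foo": 3, "bar": 2})
chunked = normalize_tokfreqs([{"foo": 3}, {"bar": 2}])
```

With disjoint chunks, as in this example, both call styles yield the same mapping.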
prune(self, threshold: int)
Removes tokens which occur fewer than threshold times. The operation is not in-place: a new model is returned.
Parameters:
- threshold – Minimum number of occurrences.
Returns: The new model if the current one had to be changed, otherwise self.
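The pruning contract can be sketched on a plain dictionary. `prune_freqs` is a hypothetical stand-in that mirrors the documented behavior (keep tokens with at least threshold occurrences, return the original object when nothing is removed); it is not the library's implementation.

```python
from typing import Dict


def prune_freqs(df: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Hypothetical stand-in for prune(), operating on a plain dict."""
    # Keep only tokens that occur at least `threshold` times.
    kept = {tok: n for tok, n in df.items() if n >= threshold}
    # Mirror the documented contract: same object back if nothing was removed.
    return df if len(kept) == len(df) else kept


df = {"foo": 3, "bar": 2, "baz": 1}
pruned = prune_freqs(df, 2)      # "baz" falls below the threshold
unchanged = prune_freqs(df, 1)   # nothing to remove
```

The input dictionary is never mutated, which matches the not-in-place wording above.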
greatest(self, max_size: int)
Truncates the model to the max_size most frequent tokens. The operation is not in-place: a new model is returned.
Parameters:
- max_size – The maximum vocabulary size.
Returns: The new model if the current one had to be changed, otherwise self.
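A comparable sketch for truncation to the most frequent tokens follows. `greatest_freqs` is a hypothetical stand-in working on a plain dictionary, and breaking ties alphabetically is an assumption made here for determinism; the documentation above does not specify how ties are handled.

```python
from typing import Dict


def greatest_freqs(df: Dict[str, int], max_size: int) -> Dict[str, int]:
    """Hypothetical stand-in for greatest(), operating on a plain dict."""
    if len(df) <= max_size:
        # Already small enough: return the same object, as documented.
        return df
    # Sort by descending frequency; ties broken alphabetically (assumption).
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:max_size])


df = {"foo": 3, "bar": 2, "baz": 1}
top2 = greatest_freqs(df, 2)
same = greatest_freqs(df, 10)
```

As with pruning, the original mapping is left untouched and a new one is returned only when truncation is actually needed.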