sourced.ml.core.models.bow

Module Contents

class sourced.ml.core.models.bow.BOW

Bases: modelforge.Model

Weighted bag-of-words model. Every word corresponds to an index, which is also its column in the matrix. A bag is a set of words taken from a repository, a file, or any other unit. A word is a source code identifier or a part of one. This model depends on sourced.ml.models.DocumentFrequencies.
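The word-to-column mapping described above can be illustrated with a minimal, library-free sketch. It does not use the BOW class itself: the input data, the variable names, and the use of raw counts as weights are all assumptions made for illustration, and the weighting the real model applies is not specified on this page.

```python
from collections import Counter

# Hypothetical input: each bag maps a document name to the identifiers found in it.
bags = {
    "repo_a": ["read", "file", "read", "path"],
    "repo_b": ["file", "write"],
}

# Give every distinct word a column index (sorted so the result is deterministic).
tokens = sorted({word for words in bags.values() for word in words})
token_index = {word: column for column, word in enumerate(tokens)}

# One sparse row per document, stored as {column index: weight}.
# Raw counts stand in for the weights here (an assumption, not the model's scheme).
documents = sorted(bags)
rows = [
    {token_index[word]: float(count) for word, count in Counter(bags[name]).items()}
    for name in documents
]
```

Columns that a document never uses are simply absent from its row, which is why a sparse representation suits this kind of data.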

NAME = bow
VENDOR = source{d}
DESCRIPTION = Model that contains source code as weighted bag of words.
LICENSE
matrix

Returns the bags as a sparse matrix. Each row is a document and each column is a token; the entries are the token weights.

documents

The list of documents in the model.

tokens

The list of tokens in the model.
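As a rough sketch of how the three properties relate, the example below looks up a weight by document name and token. It is plain Python rather than the real model: the sample data, the list-of-dicts stand-in for the sparse matrix, and the helper names `weight` and `bag_of` are hypothetical.

```python
# Hypothetical data in the shape the three properties describe.
documents = ["repo_a", "repo_b"]
tokens = ["file", "path", "read", "write"]
# Stand-in for the sparse matrix: one {column: weight} dict per document row.
matrix_rows = [{0: 1.0, 1: 1.0, 2: 2.0}, {0: 1.0, 3: 1.0}]

# Inverse lookups: document name -> row, token -> column.
document_row = {name: row for row, name in enumerate(documents)}
token_column = {token: column for column, token in enumerate(tokens)}


def weight(document: str, token: str) -> float:
    """Return the stored weight, or 0.0 when the entry is an implicit zero."""
    return matrix_rows[document_row[document]].get(token_column[token], 0.0)


def bag_of(document: str) -> dict:
    """Return the document's bag as a {token: weight} mapping."""
    row = matrix_rows[document_row[document]]
    return {tokens[column]: value for column, value in row.items()}
```

The position of a name in `documents` selects the row and the position of a token in `tokens` selects the column, so the two lists must stay aligned with the matrix.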

construct(self, documents: List[str], tokens: List[str], matrix: sparse.spmatrix)
dump(self)
save(self, output: str, series: str, deps: Iterable = tuple(), create_missing_dirs: bool = True)
convert_bow_to_vw(self, output: str)
documents_index(self)