:mod:`sourced.ml.core.models.bow`
=================================

.. py:module:: sourced.ml.core.models.bow


Module Contents
---------------

.. py:class:: BOW

   Bases: :class:`modelforge.Model`

   Weighted bag of words model.
   Each word corresponds to an index, which is also its column in the matrix.
   A bag is the set of words from a repository, a file, or any other unit.
   A word is a source code identifier or a part of one.
   This model depends on :class:`sourced.ml.models.DocumentFrequencies`.

   .. attribute:: NAME
      :annotation: = bow

   .. attribute:: VENDOR
      :annotation: = source{d}

   .. attribute:: DESCRIPTION
      :annotation: = Model that contains source code as weighted bag of words.

   .. attribute:: LICENSE

   .. attribute:: matrix

      Returns the bags as a sparse matrix. Rows are documents and columns are token weights.

   .. attribute:: documents

      The list of documents in the model.

   .. attribute:: tokens

      The list of tokens in the model.

   .. method:: construct(self, documents: List[str], tokens: List[str], matrix: sparse.spmatrix)

   .. method:: dump(self)

   .. method:: save(self, output: str, series: str, deps: Iterable = tuple(), create_missing_dirs: bool = True)

   .. method:: convert_bow_to_vw(self, output: str)

   .. method:: documents_index(self)
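
The layout described for ``matrix`` (rows are documents, columns are token weights) can be illustrated with a minimal pure-Python sketch. It does not use ``sourced.ml`` or ``modelforge``. The helpers ``build_bow`` and ``index_documents`` and the sample bags are hypothetical, and a list of ``{column: weight}`` dictionaries stands in for the SciPy sparse matrix that ``construct`` accepts. It also assumes that a document index maps each document name to its row number, which the source does not state.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a sparse row: {column index: weight}.
SparseRow = Dict[int, float]


def build_bow(
    bags: Dict[str, Dict[str, float]]
) -> Tuple[List[str], List[str], List[SparseRow]]:
    """Turn {document: {token: weight}} into (documents, tokens, rows)."""
    documents = sorted(bags)
    # Every distinct token gets one column; sorting keeps the order stable.
    tokens = sorted({token for bag in bags.values() for token in bag})
    column = {token: j for j, token in enumerate(tokens)}
    # Row i holds only the non-zero weights of documents[i].
    rows = [
        {column[token]: weight for token, weight in bags[doc].items()}
        for doc in documents
    ]
    return documents, tokens, rows


def index_documents(documents: List[str]) -> Dict[str, int]:
    """Assumed meaning of a document index: document name -> row number."""
    return {doc: i for i, doc in enumerate(documents)}


# Illustrative data only; the weights are made up.
bags = {
    "repo_a": {"get": 2.0, "name": 1.5},
    "repo_b": {"name": 0.5, "parse": 3.0},
}
documents, tokens, rows = build_bow(bags)
index = index_documents(documents)

# Look up the weight of the token "parse" in the document "repo_b".
weight = rows[index["repo_b"]].get(tokens.index("parse"), 0.0)
print(documents, tokens, weight)
```

Storing only the non-zero entries of each row mirrors why a sparse matrix suits this model: most identifiers do not occur in most documents.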