:mod:`sourced.ml.core.extractors`
=================================

.. py:module:: sourced.ml.core.extractors


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   bags_extractor/index.rst
   children/index.rst
   graphlets/index.rst
   helpers/index.rst
   id_sequence/index.rst
   identifier_distance/index.rst
   identifiers/index.rst
   literals/index.rst
   uast_random_walk/index.rst
   uast_seq/index.rst


Package Contents
----------------

.. function:: get_names_from_kwargs(f)


.. function:: register_extractor(cls)


.. function:: filter_kwargs(kwargs, func)


.. function:: create_extractors_from_args(args: argparse.Namespace)


.. py:class:: Extractor

   Bases: :class:`sourced.ml.core.utils.pickleable_logger.PickleableLogger`

   Converts a single UAST via `algorithm` to anything you need.
   It is a wrapper to use in the `Uast2Features` Transformer in a pipeline.

   .. attribute:: NAME

   .. attribute:: ALGORITHM

   .. attribute:: OPTS

   .. classmethod:: get_kwargs_fromcmdline(cls, args)

   .. method:: extract(self, uast: bblfsh.Node)


.. py:class:: BagsExtractor(docfreq_threshold=None, weight=None, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.bags_extractor.Extractor`

   Converts a single UAST into a weighted set (dictionary) whose keys are
   strings and whose values are floats. Derived classes must implement
   uast_to_bag().

   .. attribute:: DEFAULT_DOCFREQ_THRESHOLD
      :annotation: = 5

   .. attribute:: NAMESPACE

   .. attribute:: OPTS

   .. attribute:: docfreq_threhold

   .. attribute:: ndocs

   .. method:: extract(self, uast)

   .. method:: uast_to_bag(self, uast)


.. py:class:: RoleIdsExtractor

   Bases: :class:`sourced.ml.core.extractors.bags_extractor.Extractor`

   .. attribute:: NAME
      :annotation: = roleids

   .. attribute:: ALGORITHM


.. py:class:: IdentifiersBagExtractor(docfreq_threshold=None, split_stem=True, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.bags_extractor.BagsExtractor`

   .. attribute:: NAME
      :annotation: = id

   .. attribute:: NAMESPACE
      :annotation: = i.

   .. attribute:: OPTS

   .. method:: uast_to_bag(self, uast)


.. py:class:: LiteralsBagExtractor(docfreq_threshold=None, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.bags_extractor.BagsExtractor`

   .. attribute:: NAME
      :annotation: = lit

   .. attribute:: NAMESPACE
      :annotation: = l.

   .. attribute:: OPTS

   .. method:: uast_to_bag(self, uast)


.. py:class:: UastRandomWalkBagExtractor(docfreq_threshold=None, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.helpers.BagsExtractor`

   .. attribute:: NAME
      :annotation: = node2vec

   .. attribute:: NAMESPACE
      :annotation: = r.

   .. attribute:: OPTS

   .. method:: uast_to_bag(self, uast)


.. py:class:: UastSeqBagExtractor(docfreq_threshold=None, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.helpers.BagsExtractor`

   .. attribute:: NAME
      :annotation: = uast2seq

   .. attribute:: NAMESPACE
      :annotation: = s.

   .. attribute:: OPTS

   .. method:: uast_to_bag(self, uast)


.. py:class:: ChildrenBagExtractor(docfreq_threshold=None, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.bags_extractor.BagsExtractor`

   Converts a UAST to a bag of pairs (internal type, quantized number of children).

   .. attribute:: NAME
      :annotation: = children

   .. attribute:: NAMESPACE
      :annotation: = c.

   .. attribute:: OPTS

   .. attribute:: npartitions

   .. attribute:: levels

   .. method:: extract(self, uast)

   .. method:: quantize(self, frequencies: Iterable[Tuple[str, Iterable[Tuple[int, int]]]])


.. py:class:: GraphletBagExtractor(docfreq_threshold=None, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.bags_extractor.BagsExtractor`

   .. attribute:: NAME
      :annotation: = graphlet

   .. attribute:: NAMESPACE
      :annotation: = g.

   .. attribute:: OPTS

   .. method:: uast_to_bag(self, uast)


.. py:class:: IdentifierDistance(split_stem=False, type='tree', max_distance=DEFAULT_MAX_DISTANCE, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.bags_extractor.BagsExtractor`

   Extractor wrapper for the Uast2IdTreeDistance and Uast2IdLineDistance algorithms.
   Note that this is an unusual BagsExtractor since it returns an iterable instead of a bag.

   The class is not wrapped with @register_extractor because it does not produce
   bags as the others do, so no outside code will see it or use it directly.
   For the same reason we are free to override the NAMESPACE, NAME and OPTS fields
   with any value we want.

   TODO(zurk): Split BagsExtractor into two classes: Extractor and BagsExtractor(Extractor),
   re-inherit this class from Extractor, delete explanations from docstring.

   .. py:class:: DistanceType

      .. attribute:: Tree
         :annotation: = tree

      .. attribute:: Line
         :annotation: = line

      .. attribute:: All

      .. staticmethod:: resolve(type)


   .. attribute:: NAMESPACE
      :annotation: =

   .. attribute:: NAME
      :annotation: = Identifier distance

   .. attribute:: OPTS

   .. attribute:: DEFAULT_MAX_DISTANCE

   .. method:: extract(self, uast: bblfsh.Node)


.. py:class:: IdSequenceExtractor(split_stem=False, **kwargs)

   Bases: :class:`sourced.ml.core.extractors.bags_extractor.BagsExtractor`

   Extractor wrapper for the Uast2RoleIdPairs algorithm.
   Note that this is an unusual BagsExtractor since it returns an iterable instead of a bag.

   The class is not wrapped with @register_extractor because it does not produce
   bags as the others do, so no outside code will see it or use it directly.
   For the same reason we are free to override the NAMESPACE, NAME and OPTS fields
   with any value we want.

   TODO(zurk): Split BagsExtractor into two classes: Extractor and BagsExtractor(Extractor),
   re-inherit this class from Extractor, delete explanations from docstring.

   .. attribute:: NAMESPACE
      :annotation: =

   .. attribute:: NAME
      :annotation: = id sequence

   .. attribute:: OPTS

   .. method:: extract(self, uast: bblfsh.Node)
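
The contract described for ``BagsExtractor`` (a derived class implements
``uast_to_bag()`` and returns a dictionary of string keys and float weights)
and the role of ``register_extractor`` can be illustrated with a minimal,
self-contained sketch. It does not import ``sourced.ml`` and is not the
package's implementation: ``REGISTRY``, ``register``, ``ToyBagsExtractor`` and
``ToyTokenExtractor`` are hypothetical names, the "UAST" is a plain list of
tokens rather than a ``bblfsh.Node``, and prefixing keys with ``NAMESPACE`` is
an assumption suggested by values such as ``i.`` and ``l.`` above.

.. code-block:: python

   from collections import Counter

   # Hypothetical stand-in registry; the real package may store extractors differently.
   REGISTRY = {}


   def register(cls):
       """Toy analogue of a registering decorator: index the class by its NAME."""
       REGISTRY[cls.NAME] = cls
       return cls


   class ToyBagsExtractor:
       """Toy base class: subclasses supply uast_to_bag(), extract() adds the prefix."""

       NAME = None
       NAMESPACE = None

       def extract(self, tree):
           # Assumption: every key is prefixed with NAMESPACE so that bags
           # produced by different extractors cannot collide.
           for key, weight in self.uast_to_bag(tree).items():
               yield self.NAMESPACE + key, float(weight)

       def uast_to_bag(self, tree):
           raise NotImplementedError


   @register
   class ToyTokenExtractor(ToyBagsExtractor):
       NAME = "toy"
       NAMESPACE = "t."

       def uast_to_bag(self, tree):
           # Here the "tree" is just a list of tokens; count the occurrences.
           return dict(Counter(tree))


   # Look the extractor up by name, then turn the token list into a weighted bag.
   bag = dict(REGISTRY["toy"]().extract(["foo", "bar", "foo"]))

A class that is left undecorated, as the docstrings of ``IdentifierDistance``
and ``IdSequenceExtractor`` describe, would simply never appear in such a
registry and could only be used by code that imports it directly.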