sourced.ml.core.extractors

Package Contents

sourced.ml.core.extractors.get_names_from_kwargs(f)[source]
sourced.ml.core.extractors.register_extractor(cls)[source]
sourced.ml.core.extractors.filter_kwargs(kwargs, func)[source]
sourced.ml.core.extractors.create_extractors_from_args(args: argparse.Namespace)[source]
class sourced.ml.core.extractors.Extractor[source]

Bases: sourced.ml.core.utils.pickleable_logger.PickleableLogger

Converts a single UAST into whatever features you need by applying an algorithm. It is a wrapper intended for use by the Uast2Features Transformer in a pipeline.

NAME
ALGORITHM
OPTS
classmethod get_kwargs_fromcmdline(cls, args)
extract(self, uast: bblfsh.Node)
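
Example (illustrative only). The wrapper pattern described above can be sketched without the library. Everything here is an assumption: ToyExtractor is a hypothetical stand-in, the "<NAME>_<option>" naming of command-line options is a guess at how a method such as get_kwargs_fromcmdline might select its arguments, and a plain list replaces bblfsh.Node.

```python
import argparse


class ToyExtractor:
    """Hypothetical stand-in for an Extractor subclass; not the library class."""

    NAME = "toy"  # assumed prefix for this extractor's command-line options
    # Placeholder algorithm: emits (token, 1) for every item of the toy input.
    ALGORITHM = staticmethod(lambda uast: [(tok, 1) for tok in uast])

    @classmethod
    def get_kwargs_fromcmdline(cls, args: argparse.Namespace) -> dict:
        # Assumption: options for this extractor are named "<NAME>_<option>".
        prefix = cls.NAME + "_"
        return {key[len(prefix):]: value
                for key, value in vars(args).items()
                if key.startswith(prefix)}

    def extract(self, uast):
        # Delegate the actual conversion to the wrapped algorithm.
        yield from self.ALGORITHM(uast)


args = argparse.Namespace(toy_weight=2.0, toy_depth=3, other_flag=True)
kwargs = ToyExtractor.get_kwargs_fromcmdline(args)
features = list(ToyExtractor().extract(["a", "b"]))
```

In this sketch only the options carrying the "toy_" prefix reach the extractor, and extract() stays a thin generator around whatever ALGORITHM is set to.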
class sourced.ml.core.extractors.BagsExtractor(docfreq_threshold=None, weight=None, **kwargs)[source]

Bases: sourced.ml.core.extractors.bags_extractor.Extractor

Converts a single UAST into a weighted set (dictionary) whose elements are strings and whose values are floats. Derived classes must implement uast_to_bag().

DEFAULT_DOCFREQ_THRESHOLD = 5
NAMESPACE
OPTS
docfreq_threhold
ndocs
extract(self, uast)
uast_to_bag(self, uast)
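
Example (illustrative only). The contract above, where a subclass supplies uast_to_bag() and the base class turns the result into weighted string keys, might look like the sketch below. The class names, the nested-dict "UAST", and the weighting rule (prefix with NAMESPACE, multiply by weight) are assumptions made for this example. The sketch ignores docfreq_threshold and ndocs entirely.

```python
from collections import Counter


class ToyBagsExtractor:
    """Hypothetical base class: turns a tree into (namespaced key, weight) pairs."""

    NAMESPACE = ""

    def __init__(self, weight=1.0):
        self.weight = weight

    def uast_to_bag(self, uast):
        raise NotImplementedError("derived classes must implement uast_to_bag()")

    def extract(self, uast):
        # Assumption: keys are prefixed with NAMESPACE and values scaled by weight.
        for key, value in sorted(self.uast_to_bag(uast).items()):
            yield self.NAMESPACE + key, float(value) * self.weight


class ToyIdentifiersExtractor(ToyBagsExtractor):
    NAMESPACE = "i."

    def uast_to_bag(self, uast):
        # The toy "UAST" is a nested dict: {"token": str, "children": [...]}.
        bag = Counter()
        stack = [uast]
        while stack:
            node = stack.pop()
            if node.get("token"):
                bag[node["token"]] += 1
            stack.extend(node.get("children", []))
        return dict(bag)


tree = {"token": "", "children": [
    {"token": "foo", "children": []},
    {"token": "bar", "children": [{"token": "foo", "children": []}]},
]}
bag = dict(ToyIdentifiersExtractor(weight=0.5).extract(tree))
```

Here "foo" occurs twice and "bar" once, so the resulting dictionary maps "i.foo" to 1.0 and "i.bar" to 0.5.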
class sourced.ml.core.extractors.RoleIdsExtractor[source]

Bases: sourced.ml.core.extractors.bags_extractor.Extractor

NAME = roleids
ALGORITHM
class sourced.ml.core.extractors.IdentifiersBagExtractor(docfreq_threshold=None, split_stem=True, **kwargs)[source]

Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor

NAME = id
NAMESPACE = i.
OPTS
uast_to_bag(self, uast)
class sourced.ml.core.extractors.LiteralsBagExtractor(docfreq_threshold=None, **kwargs)[source]

Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor

NAME = lit
NAMESPACE = l.
OPTS
uast_to_bag(self, uast)
class sourced.ml.core.extractors.UastRandomWalkBagExtractor(docfreq_threshold=None, **kwargs)[source]

Bases: sourced.ml.core.extractors.helpers.BagsExtractor

NAME = node2vec
NAMESPACE = r.
OPTS
uast_to_bag(self, uast)
class sourced.ml.core.extractors.UastSeqBagExtractor(docfreq_threshold=None, **kwargs)[source]

Bases: sourced.ml.core.extractors.helpers.BagsExtractor

NAME = uast2seq
NAMESPACE = s.
OPTS
uast_to_bag(self, uast)
class sourced.ml.core.extractors.ChildrenBagExtractor(docfreq_threshold=None, **kwargs)[source]

Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor

Converts a UAST into a bag of pairs (internal type, quantized number of children).

NAME = children
NAMESPACE = c.
OPTS
npartitions
levels
extract(self, uast)
quantize(self, frequencies: Iterable[Tuple[str, Iterable[Tuple[int, int]]]])
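
Example (illustrative only). To make the idea of "(internal type, quantized number of children)" pairs concrete, the sketch below buckets each node's child count against a sorted list of boundaries. The bucketing rule (largest boundary not exceeding the count), the function names, the levels values, and the nested-dict tree are all assumptions for illustration; they are not taken from quantize() or from the levels attribute listed above.

```python
from bisect import bisect_right
from collections import Counter


def quantize_count(count, levels):
    """Map a raw child count to the largest level boundary not exceeding it."""
    index = bisect_right(levels, count) - 1
    return levels[max(index, 0)]


def children_bag(node, levels):
    """Count (internal type, quantized number of children) pairs in a toy tree."""
    bag = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        bag[(current["type"], quantize_count(len(children), levels))] += 1
        stack.extend(children)
    return bag


levels = [0, 1, 2, 4]  # hypothetical, sorted bucket boundaries
tree = {"type": "Block", "children": [
    {"type": "Call", "children": [
        {"type": "Ident"}, {"type": "Ident"}, {"type": "Ident"},
    ]},
    {"type": "Ident"},
]}
bag = children_bag(tree, levels)
```

With these boundaries a node with three children falls into the same bucket as a node with two, which is the point of quantizing: nearby counts collapse into one key and the vocabulary stays small.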
class sourced.ml.core.extractors.GraphletBagExtractor(docfreq_threshold=None, **kwargs)[source]

Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor

NAME = graphlet
NAMESPACE = g.
OPTS
uast_to_bag(self, uast)
class sourced.ml.core.extractors.IdentifierDistance(split_stem=False, type='tree', max_distance=DEFAULT_MAX_DISTANCE, **kwargs)[source]

Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor

Extractor wrapper for the Uast2IdTreeDistance and Uast2IdLineDistance algorithms. Note that this is an unusual BagsExtractor since it returns an iterable instead of a bag.

The class is not wrapped with @register_extractor because it does not produce bags as the others do, so no outside code will see it or use it directly. For the same reason we are free to override the NAMESPACE, NAME, and OPTS fields with any values we want.

TODO(zurk): Split BagsExtractor into two classes: Extractor and BagsExtractor(Extractor), re-inherit this class from Extractor, delete explanations from docstring.

class DistanceType
Tree = tree
Line = line
All
static resolve(type)
NAMESPACE =
NAME = Identifier distance
OPTS
DEFAULT_MAX_DISTANCE
extract(self, uast: bblfsh.Node)
class sourced.ml.core.extractors.IdSequenceExtractor(split_stem=False, **kwargs)[source]

Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor

Extractor wrapper for the Uast2RoleIdPairs algorithm. Note that this is an unusual BagsExtractor since it returns an iterable instead of a bag.

The class is not wrapped with @register_extractor because it does not produce bags as the others do, so no outside code will see it or use it directly. For the same reason we are free to override the NAMESPACE, NAME, and OPTS fields with any values we want.

TODO(zurk): Split BagsExtractor into two classes: Extractor and BagsExtractor(Extractor), re-inherit this class from Extractor, delete explanations from docstring.

NAMESPACE =
NAME = id sequence
OPTS
extract(self, uast: bblfsh.Node)