sourced.ml.core.extractors
Submodules
sourced.ml.core.extractors.bags_extractor
sourced.ml.core.extractors.children
sourced.ml.core.extractors.graphlets
sourced.ml.core.extractors.helpers
sourced.ml.core.extractors.id_sequence
sourced.ml.core.extractors.identifier_distance
sourced.ml.core.extractors.identifiers
sourced.ml.core.extractors.literals
sourced.ml.core.extractors.uast_random_walk
sourced.ml.core.extractors.uast_seq
Package Contents
class sourced.ml.core.extractors.Extractor
Bases: sourced.ml.core.utils.pickleable_logger.PickleableLogger
Converts a single UAST, via an algorithm, into whatever representation you need. It is a wrapper intended for use in the Uast2Features Transformer in a pipeline.
- NAME
- ALGORITHM
- OPTS
- classmethod get_kwargs_fromcmdline(cls, args)
- extract(self, uast: bblfsh.Node)
class sourced.ml.core.extractors.BagsExtractor(docfreq_threshold=None, weight=None, **kwargs)
Bases: sourced.ml.core.extractors.bags_extractor.Extractor
Converts a single UAST into a weighted set (dictionary) whose keys are strings and whose values are floats. Derived classes must implement uast_to_bag().
- DEFAULT_DOCFREQ_THRESHOLD = 5
- NAMESPACE
- OPTS
- docfreq_threhold
- ndocs
- extract(self, uast)
- uast_to_bag(self, uast)
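The contract described above (a derived class implements uast_to_bag() and the base class turns the result into weighted string keys) can be illustrated with a minimal, self-contained sketch. It imports neither sourced.ml nor bblfsh: SketchBagsExtractor, WordBagExtractor, and the list-of-tokens "UAST" are hypothetical stand-ins, and the way extract() prefixes keys with NAMESPACE and scales values by weight is an assumption made for illustration, not a statement about the library's actual behaviour.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


class SketchBagsExtractor:
    """Hypothetical stand-in for a bag-producing extractor base class."""

    NAME = None       # short identifier of the extractor
    NAMESPACE = None  # prefix assumed to keep keys of different extractors apart

    def __init__(self, weight: float = 1.0):
        self.weight = weight

    def uast_to_bag(self, uast) -> Dict[str, float]:
        # Derived classes must implement the actual conversion.
        raise NotImplementedError

    def extract(self, uast) -> Iterator[Tuple[str, float]]:
        # Assumption: keys are namespaced and values are scaled by the weight.
        for key, value in self.uast_to_bag(uast).items():
            yield self.NAMESPACE + key, value * self.weight


class WordBagExtractor(SketchBagsExtractor):
    """Hypothetical extractor: the 'UAST' here is just a list of tokens."""

    NAME = "word"
    NAMESPACE = "w."

    def uast_to_bag(self, uast: Iterable[str]) -> Dict[str, float]:
        # Count each token; the bag maps token -> occurrence count as a float.
        return {token: float(count) for token, count in Counter(uast).items()}


bag = dict(WordBagExtractor(weight=2.0).extract(["foo", "bar", "foo"]))
```

Keeping the namespacing in the base class means a derived class only has to return plain keys, which is one plausible reason for the NAMESPACE attribute listed on each concrete extractor.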
class sourced.ml.core.extractors.RoleIdsExtractor
Bases: sourced.ml.core.extractors.bags_extractor.Extractor
- NAME = roleids
- ALGORITHM
class sourced.ml.core.extractors.IdentifiersBagExtractor(docfreq_threshold=None, split_stem=True, **kwargs)
Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor
- NAME = id
- NAMESPACE = i.
- OPTS
- uast_to_bag(self, uast)
class sourced.ml.core.extractors.LiteralsBagExtractor(docfreq_threshold=None, **kwargs)
Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor
- NAME = lit
- NAMESPACE = l.
- OPTS
- uast_to_bag(self, uast)
class sourced.ml.core.extractors.UastRandomWalkBagExtractor(docfreq_threshold=None, **kwargs)
Bases: sourced.ml.core.extractors.helpers.BagsExtractor
- NAME = node2vec
- NAMESPACE = r.
- OPTS
- uast_to_bag(self, uast)
class sourced.ml.core.extractors.UastSeqBagExtractor(docfreq_threshold=None, **kwargs)
Bases: sourced.ml.core.extractors.helpers.BagsExtractor
- NAME = uast2seq
- NAMESPACE = s.
- OPTS
- uast_to_bag(self, uast)
class sourced.ml.core.extractors.ChildrenBagExtractor(docfreq_threshold=None, **kwargs)
Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor
Converts a UAST into a bag of pairs (internal type, quantized number of children).
- NAME = children
- NAMESPACE = c.
- OPTS
- npartitions
- levels
- extract(self, uast)
- quantize(self, frequencies: Iterable[Tuple[str, Iterable[Tuple[int, int]]]])
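The quantization scheme itself is not described here, so the following is only a sketch of one way a "quantized number of children" could be derived from input shaped like the frequencies argument: for each internal type, pairs of (number of children, how often it occurs). It assumes equal-frequency binning into npartitions bins; quantize_levels, quantize, and level_of are hypothetical helpers, not the library's functions.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple


def quantize_levels(pairs: Iterable[Tuple[int, int]], npartitions: int) -> List[int]:
    """Return sorted lower bounds of up to `npartitions` roughly equal-frequency bins."""
    pairs = sorted(pairs)
    total = sum(count for _, count in pairs)
    levels: List[int] = []
    seen = 0  # cumulative frequency of values strictly below the current one
    for value, count in pairs:
        # Open a new bin once the cumulative frequency reaches the next boundary.
        if len(levels) < npartitions and seen >= total * len(levels) / npartitions:
            levels.append(value)
        seen += count
    return levels


def quantize(
    frequencies: Iterable[Tuple[str, Iterable[Tuple[int, int]]]], npartitions: int
) -> Dict[str, List[int]]:
    """Compute bin boundaries separately for every internal type."""
    return {key: quantize_levels(pairs, npartitions) for key, pairs in frequencies}


def level_of(levels: List[int], n_children: int) -> int:
    """Map a concrete number of children to the index of its bin."""
    return max(bisect_right(levels, n_children) - 1, 0)


frequencies = [
    ("Identifier", [(0, 50), (1, 30), (2, 15), (10, 5)]),
    ("Block", [(1, 4), (3, 4)]),
]
levels = quantize(frequencies, npartitions=2)
```

With boundaries computed per internal type, a pair such as ("Identifier", 10) collapses to ("Identifier", level_of(levels["Identifier"], 10)), which keeps the bag small even when the raw child counts vary widely.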
class sourced.ml.core.extractors.GraphletBagExtractor(docfreq_threshold=None, **kwargs)
Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor
- NAME = graphlet
- NAMESPACE = g.
- OPTS
- uast_to_bag(self, uast)
class sourced.ml.core.extractors.IdentifierDistance(split_stem=False, type='tree', max_distance=DEFAULT_MAX_DISTANCE, **kwargs)
Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor
Extractor wrapper for the Uast2IdTreeDistance and Uast2IdLineDistance algorithms. Note that this is an unusual BagsExtractor since it returns an iterable instead of a bag.
The class is not wrapped with @register_extractor because it does not produce bags as the others do, so no outside code will see it or use it directly. For the same reason, we are free to override the NAMESPACE, NAME, and OPTS fields with any values we want.
TODO(zurk): Split BagsExtractor into two classes: Extractor and BagsExtractor(Extractor), re-inherit this class from Extractor, delete explanations from docstring.
- NAMESPACE =
- NAME = Identifier distance
- OPTS
- DEFAULT_MAX_DISTANCE
- extract(self, uast: bblfsh.Node)
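To make the "iterable instead of bag" distinction concrete, here is a small sketch of a line-distance style extraction. It is not the Uast2IdLineDistance implementation: identifier_line_distances is a hypothetical function, the (identifier, line) tuples stand in for UAST nodes, and treating max_distance as an exclusive upper bound is an assumption.

```python
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def identifier_line_distances(
    identifiers: Iterable[Tuple[str, int]], max_distance: int
) -> Iterator[Tuple[Tuple[str, str], int]]:
    """Yield ((id_a, id_b), distance) for identifier pairs closer than max_distance lines."""
    for (name_a, line_a), (name_b, line_b) in combinations(identifiers, 2):
        distance = abs(line_a - line_b)
        if distance < max_distance:
            # The same pair of names may be yielded several times with different
            # distances, so the result is a stream of items rather than a dict.
            yield (name_a, name_b), distance


pairs = list(
    identifier_line_distances([("foo", 1), ("bar", 2), ("baz", 9)], max_distance=5)
)
```

Because a pair of names can legitimately repeat, collapsing the output into a string-keyed dictionary would lose information, which is consistent with the docstring's remark that this extractor does not produce a bag.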
class sourced.ml.core.extractors.IdSequenceExtractor(split_stem=False, **kwargs)
Bases: sourced.ml.core.extractors.bags_extractor.BagsExtractor
Extractor wrapper for the Uast2RoleIdPairs algorithm. Note that this is an unusual BagsExtractor since it returns an iterable instead of a bag.
The class is not wrapped with @register_extractor because it does not produce bags as the others do, so no outside code will see it or use it directly. For the same reason, we are free to override the NAMESPACE, NAME, and OPTS fields with any values we want.
TODO(zurk): Split BagsExtractor into two classes: Extractor and BagsExtractor(Extractor), re-inherit this class from Extractor, delete explanations from docstring.
- NAMESPACE =
- NAME = id sequence
- OPTS
- extract(self, uast: bblfsh.Node)