sourced.ml.core.algorithms

Package Contents

sourced.ml.core.algorithms.log_tf_log_idf(tf, df, ndocs)[source]
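
The signature above carries no description, so the following is only a minimal sketch of one common log-TF, log-IDF weighting; it is not taken from the library's source. The function name `log_tf_log_idf_sketch` and the exact formula (log(1 + tf) multiplied by log(ndocs / df)) are assumptions made for illustration.

```python
import math


def log_tf_log_idf_sketch(tf: float, df: float, ndocs: int) -> float:
    """Hypothetical weighting: damped term frequency times log inverse document frequency.

    The formula is an assumption, not the confirmed library behaviour.
    """
    # log(1 + tf) keeps the weight at 0 when the term never occurs.
    # log(ndocs / df) is 0 when the term occurs in every document.
    return math.log(1 + tf) * math.log(ndocs / df)


# A term seen 3 times in a document and present in 10 of 1000 documents.
weight = log_tf_log_idf_sketch(tf=3, df=10, ndocs=1000)
```

Under this formulation, a term that appears in every document receives a weight of zero regardless of its frequency.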
class sourced.ml.core.algorithms.UastIds2Bag(token2index=None, token_parser=None)[source]

Bases: sourced.ml.core.algorithms.uast_ids_to_bag.UastTokens2Bag

Converts a UAST to a bag-of-identifiers.

XPATH = //*[@roleIdentifier]
sourced.ml.core.algorithms.uast2sequence(root)[source]
class sourced.ml.core.algorithms.UastRandomWalk2Bag(p_explore_neighborhood=0.79, q_leave_neighborhood=0.82, n_walks=2, n_steps=10, stride=1, seq_len=(2, 3), seed=42)[source]

Bases: sourced.ml.core.algorithms.uast_struct_to_bag.Uast2StructBagBase

class sourced.ml.core.algorithms.UastSeq2Bag(stride=1, seq_len=(3, 4), node2index=None)[source]

Bases: sourced.ml.core.algorithms.uast_struct_to_bag.Uast2StructBagBase

DFS traversal + preserves the order of node children.
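
As a rough illustration of the idea, the sketch below flattens a toy tree depth-first, visiting children left to right, and then counts sliding windows over the resulting label sequence. Everything here is hypothetical: the tuple-based tree, the helper names `dfs_labels` and `windows`, the choice of pre-order, and the way `seq_len` and `stride` are interpreted are assumptions, not the library's implementation.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def dfs_labels(node: tuple) -> List[str]:
    """Pre-order DFS over a (label, children) tuple; child order is preserved."""
    label, children = node
    labels = [label]
    for child in children:
        labels.extend(dfs_labels(child))
    return labels


def windows(seq: Sequence[str], seq_len=(3, 4), stride: int = 1) -> Iterator[Tuple[str, ...]]:
    """Yield every window of each length in seq_len, stepping by stride (assumed semantics)."""
    for n in seq_len:
        for start in range(0, len(seq) - n + 1, stride):
            yield tuple(seq[start:start + n])


# Toy tree: A has children B and C (in that order); B has child D.
tree = ("A", [("B", [("D", [])]), ("C", [])])
labels = dfs_labels(tree)
bag = Counter(windows(labels))
```

Because the children are never reordered, two trees that differ only in child order produce different sequences and therefore different bags.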

class sourced.ml.core.algorithms.Uast2QuantizedChildren(npartitions: int = 20)[source]

Bases: sourced.ml.core.algorithms.uast_to_bag.Uast2BagThroughSingleScan

Converts a UAST to a bag of children counts.

node2key(self, node: bblfsh.Node)

Return the key for a given Node.

Parameters: node – a node in UAST.
Returns: The string which consists of the internal type of the node and its number of children.
quantize(self, frequencies: Iterable[Tuple[str, Iterable[Tuple[int, int]]]])
quantize_unwrapped(self, children_freq: Iterable[Tuple[int, int]])

Builds the quantization partition P, a vector of length npartitions whose entries are in strictly ascending order. The quantization of x is defined as:

    0 if x <= P[0]
    m if P[m-1] < x <= P[m]
    n if P[n] <= x

Parameters: children_freq – distribution of the number of children.
Returns: The array with quantization levels.
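
The quantization rule can be sketched with a binary search over the partition. This is a minimal illustration rather than the library's code: the helper name `quantize_value` is hypothetical, and it assumes that the last case means "x greater than the last entry of P", so that the top level equals the length of P. How the partition itself is built from `children_freq` is not shown.

```python
from bisect import bisect_left
from typing import Sequence


def quantize_value(x: float, partition: Sequence[float]) -> int:
    """Return the quantization level of x for a strictly ascending partition.

    bisect_left returns the first index i with partition[i] >= x, which gives
    0 when x <= P[0], m when P[m-1] < x <= P[m], and len(P) when x > P[-1].
    """
    return bisect_left(partition, x)


# Hypothetical partition of children counts.
partition = [1, 3, 7]
levels = [quantize_value(x, partition) for x in (0, 1, 2, 3, 5, 7, 9)]
```

With this partition, node types with very different raw children counts (for example 8 and 80) fall into the same top level, which keeps the number of distinct keys bounded.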
class sourced.ml.core.algorithms.Uast2GraphletBag[source]

Bases: sourced.ml.core.algorithms.uast_ids_to_bag.Uast2BagBase

Converts a UAST to a bag of graphlets. The graphlet of a UAST node is composed of the node itself, its parent and its children. Each node is represented by the internal role string.

uast2graphlets(self, uast)
Parameters: uast – The UAST root node.
Yields: The nodes which compose the UAST. Node is used to access the nodes of the graphlets.
node2key(self, node)

Builds a string that joins the internal types of all the nodes in the node’s graphlet in the following order: parent_node_child1_child2_child3. The children are sorted alphabetically. The str format is required by BagsExtractor.

Parameters: node – a node of UAST
Returns: The string key of node
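
The key format described above can be sketched on a toy tree. `ToyNode` and `graphlet_key` are hypothetical stand-ins and do not reproduce the bblfsh node API; representing a missing parent by the literal string "None" is also an assumption.

```python
from typing import List, Optional


class ToyNode:
    """Hypothetical stand-in for a UAST node with a parent link."""

    def __init__(self, internal_type: str) -> None:
        self.internal_type = internal_type
        self.parent: Optional["ToyNode"] = None
        self.children: List["ToyNode"] = []

    def add(self, child: "ToyNode") -> "ToyNode":
        child.parent = self
        self.children.append(child)
        return child


def graphlet_key(node: ToyNode) -> str:
    """Join parent, node and alphabetically sorted child types with underscores."""
    parent_type = node.parent.internal_type if node.parent is not None else "None"
    parts = [parent_type, node.internal_type]
    parts.extend(sorted(child.internal_type for child in node.children))
    return "_".join(parts)


root = ToyNode("Module")
func = root.add(ToyNode("FunctionDef"))
for child_type in ("Name", "Arguments", "Body"):
    func.add(ToyNode(child_type))

key = graphlet_key(func)
```

Sorting the children makes the key independent of child order, so two nodes with the same parent type and the same multiset of child types map to the same bag entry.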
class sourced.ml.core.algorithms.Uast2RoleIdPairs(token2index=None, token_parser=None)[source]

Bases: sourced.ml.core.algorithms.uast_ids_to_bag.UastIds2Bag

Converts a UAST to a list of pairs. Each pair consists of an identifier and a role, where the role is that of the Node in which the identifier was found.

__call__ is overridden here and returns a list instead of a bag-of-words (dict).

static merge_roles(roles: Iterable[int])
class sourced.ml.core.algorithms.Uast2IdLineDistance[source]

Bases: sourced.ml.core.algorithms.uast_id_distance.Uast2IdDistance

Converts a UAST to a list of identifier pairs and the code line distance between them, where applicable.

__call__ is overridden here and returns a list instead of a bag-of-words (dict).

distance(self, point1, point2)
class sourced.ml.core.algorithms.Uast2IdTreeDistance[source]

Bases: sourced.ml.core.algorithms.uast_id_distance.Uast2IdDistance

Converts a UAST to a list of identifier pairs and the UAST tree distance between them.

__call__ is overridden here and returns a list instead of a bag-of-words (dict).

distance(self, point1, point2)
static calc_tree_distance(last_common_level, level1, level2)
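
The three arguments suggest the usual way of measuring the distance between two nodes in a tree through their last common ancestor. The sketch below shows that identity under stated assumptions: the function name `tree_distance_sketch` is hypothetical, levels are taken to be depths counted from the root, and the formula is not confirmed by the documentation above.

```python
def tree_distance_sketch(last_common_level: int, level1: int, level2: int) -> int:
    """Number of edges on the path between two nodes via their last common ancestor.

    Each node contributes the number of levels it lies below the common ancestor.
    This formula is an assumption, not the confirmed library behaviour.
    """
    return (level1 - last_common_level) + (level2 - last_common_level)


# Two identifiers at depths 3 and 2 whose paths from the root diverge at depth 1.
distance = tree_distance_sketch(last_common_level=1, level1=3, level2=2)
```

Under this definition, a node and itself are at distance zero, and a node and its direct parent are at distance one.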
class sourced.ml.core.algorithms.Uast2IdSequence[source]

Bases: sourced.ml.core.algorithms.uast_id_distance.Uast2IdLineDistance

Converts a UAST to a sorted sequence of identifiers. Identifiers are sorted by their position in the code. The order is left unchanged if positions are not present.

__call__ is overridden here and returns a list instead of a bag-of-words (dict).

static concat(id_sequence: Iterable)