:mod:`sourced.ml.core.algorithms`
=================================

.. py:module:: sourced.ml.core.algorithms


Subpackages
-----------
.. toctree::
   :titlesonly:
   :maxdepth: 3

   id_splitter/index.rst


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   id_embedding/index.rst
   swivel/index.rst
   tf_idf/index.rst
   token_parser/index.rst
   uast_id_distance/index.rst
   uast_ids_to_bag/index.rst
   uast_inttypes_to_graphlets/index.rst
   uast_inttypes_to_nodes/index.rst
   uast_struct_to_bag/index.rst
   uast_to_bag/index.rst
   uast_to_id_sequence/index.rst
   uast_to_role_id_pairs/index.rst


Package Contents
----------------

.. function:: log_tf_log_idf(tf, df, ndocs)


.. py:class:: UastIds2Bag(token2index=None, token_parser=None)

   Bases: :class:`sourced.ml.core.algorithms.uast_ids_to_bag.UastTokens2Bag`

   Converts a UAST to a bag-of-identifiers.

   .. attribute:: XPATH
      :annotation: = //*[@roleIdentifier]


.. function:: uast2sequence(root)


.. py:class:: UastRandomWalk2Bag(p_explore_neighborhood=0.79, q_leave_neighborhood=0.82, n_walks=2, n_steps=10, stride=1, seq_len=(2, 3), seed=42)

   Bases: :class:`sourced.ml.core.algorithms.uast_struct_to_bag.Uast2StructBagBase`


.. py:class:: UastSeq2Bag(stride=1, seq_len=(3, 4), node2index=None)

   Bases: :class:`sourced.ml.core.algorithms.uast_struct_to_bag.Uast2StructBagBase`

   DFS traversal that preserves the order of node children.


.. py:class:: Uast2QuantizedChildren(npartitions: int = 20)

   Bases: :class:`sourced.ml.core.algorithms.uast_to_bag.Uast2BagThroughSingleScan`

   Converts a UAST to a bag of children counts.

   .. method:: node2key(self, node: bblfsh.Node)

      Returns the key for a given Node.

      :param node: a node in the UAST.
      :return: A string consisting of the internal type of the node and its number of children.

   .. method:: quantize(self, frequencies: Iterable[Tuple[str, Iterable[Tuple[int, int]]]])

   .. method:: quantize_unwrapped(self, children_freq: Iterable[Tuple[int, int]])

      Builds the quantization partition P, a vector of length ``npartitions`` whose entries are in strictly ascending order. Quantization of x is defined as::

         0 if x <= P[0]
         m if P[m-1] < x <= P[m]
         n if P[n] <= x

      :param children_freq: distribution of the number of children.
      :return: The array with quantization levels.


.. py:class:: Uast2GraphletBag

   Bases: :class:`sourced.ml.core.algorithms.uast_ids_to_bag.Uast2BagBase`

   Converts a UAST to a bag of graphlets. The graphlet of a UAST node is composed of the node itself, its parent, and its children. Each node is represented by its internal role string.

   .. method:: uast2graphlets(self, uast)

      :param uast: The UAST root node.
      :generate: The nodes which compose the UAST. :class:`Node` is used to access the nodes of the graphlets.

   .. method:: node2key(self, node)

      Builds a string by joining the internal types of all the nodes in the node's graphlet in the following order: ``parent_node_child1_child2_child3``. The children are sorted alphabetically. The ``str`` format is required by BagsExtractor.

      :param node: a node of the UAST.
      :return: The string key of the node.


.. py:class:: Uast2RoleIdPairs(token2index=None, token_parser=None)

   Bases: :class:`sourced.ml.core.algorithms.uast_ids_to_bag.UastIds2Bag`

   Converts a UAST to a list of pairs. Each pair consists of an identifier and its role, where the role is that of the Node in which the identifier was found. ``__call__`` is overridden here and returns a list instead of a bag-of-words (dict).

   .. staticmethod:: merge_roles(roles: Iterable[int])


.. py:class:: Uast2IdLineDistance

   Bases: :class:`sourced.ml.core.algorithms.uast_id_distance.Uast2IdDistance`

   Converts a UAST to a list of identifier pairs and the code line distance between them, where applicable. ``__call__`` is overridden here and returns a list instead of a bag-of-words (dict).

   .. method:: distance(self, point1, point2)


.. py:class:: Uast2IdTreeDistance

   Bases: :class:`sourced.ml.core.algorithms.uast_id_distance.Uast2IdDistance`

   Converts a UAST to a list of identifier pairs and the UAST tree distance between them. ``__call__`` is overridden here and returns a list instead of a bag-of-words (dict).

   .. method:: distance(self, point1, point2)

   .. staticmethod:: calc_tree_distance(last_common_level, level1, level2)


.. py:class:: Uast2IdSequence

   Bases: :class:`sourced.ml.core.algorithms.uast_id_distance.Uast2IdLineDistance`

   Converts a UAST to a sorted sequence of identifiers. Identifiers are sorted by their position in the code; if positions are not present, the order is left unchanged. ``__call__`` is overridden here and returns a list instead of a bag-of-words (dict).

   .. staticmethod:: concat(id_sequence: Iterable)
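A minimal sketch of the weighting that ``log_tf_log_idf(tf, df, ndocs)`` computes, assuming the usual log-TF times log-IDF scheme, ``log(1 + tf) * log(ndocs / df)``. This is an illustration under that assumption, not the package's code: it works on scalars with the standard ``math`` module, whereas the real function may operate on NumPy arrays.

```python
import math


def log_tf_log_idf(tf: float, df: float, ndocs: int) -> float:
    # Assumed formula: log-scaled term frequency times log-scaled
    # inverse document frequency. A term present in every document
    # (df == ndocs) gets zero weight because log(1) == 0.
    return math.log(1 + tf) * math.log(ndocs / df)


# A term seen 3 times in one document and present in 10 of 1000 documents.
weight = log_tf_log_idf(3, 10, 1000)
```

The ``1 + tf`` inside the first logarithm keeps the weight defined (and equal to zero) for terms that do not occur in the document.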
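``UastSeq2Bag`` is described as a DFS traversal that preserves the order of node children, configured by ``stride`` and ``seq_len``. The sketch below shows one way such a bag could be formed: flatten the tree with a pre-order DFS, then count every window of each requested length, advancing by ``stride``. The ``Node`` dataclass, the ``>`` separator, and reading ``seq_len`` as the window lengths to use are all hypothetical, not the package's API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Node:
    """Hypothetical UAST node: only the internal type and ordered children."""
    internal_type: str
    children: List["Node"] = field(default_factory=list)


def dfs_sequence(root: Node) -> List[str]:
    """Pre-order DFS that visits children in their original order."""
    out: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        out.append(node.internal_type)
        # Push children reversed so that the first child is popped (visited) first.
        stack.extend(reversed(node.children))
    return out


def seq2bag(seq: List[str], seq_lens: Iterable[int], stride: int = 1) -> Counter:
    """Count every window of each length in seq_lens, stepping by stride."""
    bag: Counter = Counter()
    for n in seq_lens:
        for start in range(0, len(seq) - n + 1, stride):
            bag[">".join(seq[start:start + n])] += 1
    return bag


tree = Node("A", [Node("B", [Node("D")]), Node("C")])
bag = seq2bag(dfs_sequence(tree), seq_lens=(2, 3))
```

For the default ``seq_len=(3, 4)``, reading the pair as an explicit list of lengths or as an inclusive range yields the same lengths, 3 and 4.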
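The quantization rule stated for ``Uast2QuantizedChildren.quantize_unwrapped`` can be illustrated with ``bisect.bisect_left``: for a strictly ascending partition ``P``, the insertion point of ``x`` is 0 when ``x <= P[0]`` and ``m`` when ``P[m-1] < x <= P[m]``. This sketch only applies a given partition; it does not show how ``P`` is built from ``children_freq``. Mapping every ``x`` above the last boundary to ``len(P)`` is an assumption about the boundary case, and ``quantize_value`` is a hypothetical name.

```python
from bisect import bisect_left
from typing import Sequence


def quantize_value(x: float, partition: Sequence[float]) -> int:
    """Return the quantization level of x for a strictly ascending partition.

    0 if x <= P[0]; m if P[m-1] < x <= P[m]; len(P) if x > P[-1] (assumed).
    """
    # bisect_left returns the first index i with partition[i] >= x,
    # which is exactly the level defined above.
    return bisect_left(partition, x)


P = [1, 3, 5]  # hypothetical partition
levels = [quantize_value(x, P) for x in range(7)]  # [0, 0, 1, 1, 2, 2, 3]
```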
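The key format described for ``Uast2GraphletBag.node2key`` (parent type, node type, then the children's types sorted alphabetically, all joined with ``_``) can be sketched as follows. ``GraphletNode`` and ``graphlet_key`` are hypothetical stand-ins for the package's graphlet node and method, and omitting the parent part for a root node is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GraphletNode:
    """Hypothetical graphlet: a node's internal type plus its neighbours' types."""
    internal_type: str
    parent_type: Optional[str] = None  # None for the UAST root (assumed)
    children_types: List[str] = field(default_factory=list)


def graphlet_key(node: GraphletNode) -> str:
    """Build parent_node_child1_child2... with children sorted alphabetically."""
    parts = []
    if node.parent_type is not None:
        parts.append(node.parent_type)
    parts.append(node.internal_type)
    parts.extend(sorted(node.children_types))
    return "_".join(parts)


key = graphlet_key(GraphletNode("If", "Block", ["Call", "Assign"]))  # "Block_If_Assign_Call"
```

Sorting the children makes the key independent of child order, so graphlets with the same neighbour types land in the same bag entry.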
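For ``Uast2IdTreeDistance.calc_tree_distance(last_common_level, level1, level2)``, a natural reading is the number of edges on the path between two nodes through their deepest common ancestor, ``level1 + level2 - 2 * last_common_level``. The sketch assumes levels are counted from the root at 0; the helper ``common_ancestor_level`` and the root-to-node path representation are hypothetical.

```python
from typing import Hashable, Sequence


def calc_tree_distance(last_common_level: int, level1: int, level2: int) -> int:
    # Edges climbed from node 1 up to the common ancestor
    # plus edges descended from it to node 2.
    return (level1 - last_common_level) + (level2 - last_common_level)


def common_ancestor_level(path1: Sequence[Hashable], path2: Sequence[Hashable]) -> int:
    """Level of the deepest shared node of two root-to-node paths (root is level 0)."""
    level = -1
    for a, b in zip(path1, path2):
        if a != b:
            break
        level += 1
    return level


# Hypothetical node ids: root 0 -> 1 -> 4, and root 0 -> 1 -> 5 -> 7.
p1, p2 = [0, 1, 4], [0, 1, 5, 7]
distance = calc_tree_distance(common_ancestor_level(p1, p2), len(p1) - 1, len(p2) - 1)  # 3
```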
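The ordering rule of ``Uast2IdSequence`` (sort identifiers by their position in the code, but keep the original order when positions are not present) and a ``concat`` that joins the result into one string can be sketched as below. Representing positions as ``(line, column)`` tuples, treating any missing position as "positions not present", and joining with a single space are assumptions; ``sort_identifiers`` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Tuple

Position = Tuple[int, int]  # hypothetical (line, column) representation


def sort_identifiers(ids: List[Tuple[str, Optional[Position]]]) -> List[str]:
    """Sort identifiers by position; keep the input order if any position is missing."""
    if all(pos is not None for _, pos in ids):
        # sorted() is stable, so identifiers at equal positions keep their order.
        ids = sorted(ids, key=lambda item: item[1])
    return [name for name, _ in ids]


def concat(id_sequence: Iterable[str]) -> str:
    # Assumed separator: a single space.
    return " ".join(id_sequence)


ordered = sort_identifiers([("b", (2, 0)), ("a", (1, 4)), ("c", (1, 0))])  # ["c", "a", "b"]
text = concat(ordered)  # "c a b"
```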