sourced.ml.core.models.id_splitter

Module Contents

class sourced.ml.core.models.id_splitter.IdentifierSplitterBiLSTM(**kwargs)

Bases: modelforge.Model

Bidirectional LSTM model that splits source code identifiers without relying on a conventional naming pattern. Reference: https://arxiv.org/abs/1805.11651

NAME = id_splitter_bilstm
VENDOR = source{d}
DESCRIPTION = Weights of the BiLSTM network to split source code identifiers.
LICENSE
DEFAULT_MAXLEN = 40
DEFAULT_PADDING = post
DEFAULT_MAPPING
DEFAULT_BATCH_SIZE = 4096
model

Return the wrapped Keras model.

batch_size

Return the batch size used to run the model.

construct(self, model: keras.models.Model, maxlen: int = DEFAULT_MAXLEN, padding: str = DEFAULT_PADDING, mapping: Dict[str, int] = DEFAULT_MAPPING, batch_size: int = DEFAULT_BATCH_SIZE)

Construct IdentifierSplitterBiLSTM model.

Parameters:
  • model – Keras model used for identifier splitting.
  • maxlen – Maximum length of input identifiers.
  • padding – Which side to pad identifiers shorter than maxlen. Can be “left” or “right”; note that the default, DEFAULT_PADDING, is “post”.
  • mapping – Mapping of characters to integers.
  • batch_size – Batch size of input data fed to the model.
Returns:
  BiLSTM-based source code identifier splitter.
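
The batch_size parameter bounds how many encoded identifiers are fed to the model at once. The following is a minimal sketch of that batching idea, not the library's implementation: iter_batches is a hypothetical helper, and only the value 4096 is taken from the attribute list above.

```python
from typing import Iterator, List, Sequence

DEFAULT_BATCH_SIZE = 4096  # value documented for the class above


def iter_batches(rows: Sequence[List[int]],
                 batch_size: int = DEFAULT_BATCH_SIZE) -> Iterator[Sequence[List[int]]]:
    """Yield consecutive slices of at most batch_size rows (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# 10000 dummy rows stand in for encoded identifiers.
rows = [[i] for i in range(10000)]
batch_lengths = [len(batch) for batch in iter_batches(rows)]
print(batch_lengths)
```

Only the last batch may be shorter than batch_size; every row appears in exactly one batch.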

dump(self)

prepare_input(self, identifiers: Sequence[str])

Prepare the model input by converting a sequence of identifiers into a 2D array of the corresponding ASCII codes, together with the list of lowercased, cleaned identifiers.
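
As an illustration of this kind of preprocessing, the sketch below lowercases and cleans each identifier, maps its characters to integers, and pads every row to a fixed length. It is written under several assumptions that this page does not confirm: the character mapping ('a' to 1 through 'z' to 26, with 0 as padding), the cleaning rule (drop any unmapped character), truncation to maxlen, and the name prepare_input_sketch are all hypothetical.

```python
import string
from typing import Dict, List, Sequence, Tuple

MAXLEN = 40  # DEFAULT_MAXLEN documented above
# Assumed mapping: 'a' -> 1 ... 'z' -> 26; 0 is reserved for padding.
CHAR_TO_INT: Dict[str, int] = {
    c: i for i, c in enumerate(string.ascii_lowercase, start=1)
}


def prepare_input_sketch(identifiers: Sequence[str], maxlen: int = MAXLEN,
                         padding: str = "post") -> Tuple[List[List[int]], List[str]]:
    """Return (rows of character codes, cleaned identifiers)."""
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    rows: List[List[int]] = []
    cleaned: List[str] = []
    for identifier in identifiers:
        # Assumed cleaning: lowercase, keep mapped characters, truncate to maxlen.
        clean = "".join(c for c in identifier.lower() if c in CHAR_TO_INT)[:maxlen]
        codes = [CHAR_TO_INT[c] for c in clean]
        pad = [0] * (maxlen - len(codes))
        rows.append(codes + pad if padding == "post" else pad + codes)
        cleaned.append(clean)
    return rows, cleaned


codes, cleaned = prepare_input_sketch(["FooBar", "get_value2"], maxlen=10)
print(cleaned)   # ['foobar', 'getvalue']
print(codes[0])  # [6, 15, 15, 2, 1, 18, 0, 0, 0, 0]
```

Every row has the same length, so the result can be converted directly to a 2D array before being passed to a model.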

load_model_file(self, path: str)

Load a compatible Keras model file. Used for compatibility.

split(self, identifiers: Sequence[str])

Split identifiers in a list, using the model.
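
One way to picture the splitting step is as a per-character decision: a score for each character says whether a new token starts there. The sketch below turns such scores into tokens. The output format of the real model is not described on this page, so the per-character scores, the 0.5 threshold, and the helper name split_by_predictions are assumptions made for illustration only.

```python
from typing import List, Sequence


def split_by_predictions(identifier: str, probabilities: Sequence[float],
                         threshold: float = 0.5) -> List[str]:
    """Cut the identifier before each character whose score exceeds threshold."""
    if len(identifier) != len(probabilities):
        raise ValueError("one score per character is required")
    tokens: List[str] = []
    current = ""
    for char, prob in zip(identifier, probabilities):
        # Start a new token here, unless nothing has been collected yet.
        if prob > threshold and current:
            tokens.append(current)
            current = ""
        current += char
    if current:
        tokens.append(current)
    return tokens


# Hand-written scores standing in for model output.
scores = [0.9, 0.1, 0.0, 0.8, 0.2, 0.1]
tokens = split_by_predictions("foobar", scores)
print(tokens)  # ['foo', 'bar']
```

Because the cut points come from learned scores rather than from case changes or underscores, this scheme can separate identifiers such as "foobar" that carry no visible delimiter.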