keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32')
Transform a list of nb_samples sequences (lists of scalars) into a 2D numpy array of shape (nb_samples, nb_timesteps). nb_timesteps is either the maxlen argument if provided, or the length of the longest sequence otherwise. Sequences that are shorter than nb_timesteps are padded with zeros at the end.
Return: 2D numpy array of shape (nb_samples, nb_timesteps).
Arguments:
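The padding behaviour described above can be illustrated with a minimal NumPy sketch. This is not the library source: `pad_sequences_sketch` is a hypothetical name, and cutting sequences longer than maxlen down to their first maxlen items is an assumption, since the description above only covers shorter sequences.

```python
import numpy as np

def pad_sequences_sketch(sequences, maxlen=None, dtype='int32'):
    """Illustrative re-implementation of the behaviour described above."""
    nb_samples = len(sequences)
    # nb_timesteps is maxlen if provided, else the length of the longest sequence.
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    x = np.zeros((nb_samples, maxlen), dtype=dtype)
    for i, s in enumerate(sequences):
        # Assumption: sequences longer than maxlen keep only their first maxlen items.
        trimmed = list(s)[:maxlen]
        # Values go at the start of the row, so the zeros end up at the end.
        x[i, :len(trimmed)] = trimmed
    return x

padded = pad_sequences_sketch([[1, 2, 3], [4], [5, 6]])
print(padded)
```

Here the result has shape (3, 3), because the longest input sequence has three elements and no maxlen was given.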
keras.preprocessing.sequence.skipgrams(sequence, vocabulary_size,
window_size=4, negative_samples=1., shuffle=True,
categorical=False, sampling_table=None)
Transforms a sequence of word indexes (list of int) into couples of the form (word, word in the same window), with label 1, or (word, random word), with label 0.
Read more about Skipgram in this gnomic paper by Mikolov et al.: Efficient Estimation of Word Representations in Vector Space
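To make the transformation concrete, here is a simplified pure-Python sketch of how positive and negative couples could be generated. It is not the library implementation: `skipgrams_sketch` and its `seed` parameter are hypothetical, the `shuffle`, `categorical` and `sampling_table` options are ignored, and drawing negative words uniformly from the whole vocabulary is an assumption.

```python
import random

def skipgrams_sketch(sequence, vocabulary_size, window_size=4,
                     negative_samples=1.0, seed=0):
    """Illustrative skipgram couple generation (simplified)."""
    rng = random.Random(seed)  # hypothetical parameter, for reproducibility only
    couples, labels = [], []
    # Positive samples: pair each word with every other word in its window.
    for i, word in enumerate(sequence):
        start = max(0, i - window_size)
        stop = min(len(sequence), i + window_size + 1)
        for j in range(start, stop):
            if j != i:
                couples.append([word, sequence[j]])
                labels.append(1)
    # Negative samples: pair a word from the sequence with a random word index.
    # Assumption: random words are drawn uniformly from range(vocabulary_size).
    nb_negative = int(len(labels) * negative_samples)
    for _ in range(nb_negative):
        couples.append([rng.choice(sequence), rng.randrange(vocabulary_size)])
        labels.append(0)
    return couples, labels

couples, labels = skipgrams_sketch([1, 2, 3], vocabulary_size=10, window_size=1)
print(couples[:4])  # the positive couples
print(labels)
```

With negative_samples=1.0 the sketch produces as many negative couples as positive ones; a random word may by chance also occur in the window, which this sketch does not filter out.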
Return: tuple (couples, labels).
couples is a list of 2-element lists of int: [word_index, other_word_index]. labels is a list of 0 and 1, where 1 indicates that other_word_index was found in the same window as word_index, and 0 indicates that other_word_index was random.
Arguments:
sampling_table: numpy array of shape (vocabulary_size,) where sampling_table[i] is the probability of sampling the word with index i (assumed to be the i-th most common word in the dataset).
keras.preprocessing.sequence.make_sampling_table(size, sampling_factor=1e-5)
Used for generating the sampling_table argument for skipgrams. sampling_table[i] is the probability of sampling the i-th most common word in a dataset (more common words should be sampled less frequently, for balance).
Return: numpy array of shape (size,).
Arguments:
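As an illustration of what such a table might look like, the sketch below builds one under two explicit assumptions that are not stated in this documentation: word frequencies follow Zipf's law (frequency proportional to 1/rank), and the sampling probability follows the word2vec-style subsampling rule sqrt(sampling_factor / frequency), capped at 1. The name `make_sampling_table_sketch` is hypothetical and the formula may differ from the one the library uses.

```python
import numpy as np

def make_sampling_table_sketch(size, sampling_factor=1e-5):
    """Illustrative sampling table: rarer words get higher probabilities."""
    # Assumption: entry i of the table corresponds to the word of rank i + 1.
    ranks = np.arange(1, size + 1)
    # Assumption: Zipf's law, normalised so the frequencies sum to 1.
    freqs = (1.0 / ranks) / np.sum(1.0 / ranks)
    # Assumption: word2vec-style subsampling probability, capped at 1.
    return np.minimum(1.0, np.sqrt(sampling_factor / freqs))

table = make_sampling_table_sketch(1000)
print(table.shape)
print(table[0] < table[-1])  # the most common word is sampled least often
```

The resulting array has shape (size,) and is non-decreasing, so more common words are sampled less frequently, as described above.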