keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32')
Transform a list of nb_samples sequences (lists of scalars) into a 2D numpy array of shape (nb_samples, nb_timesteps). nb_timesteps is either the maxlen argument if provided, or the length of the longest sequence otherwise. Sequences that are shorter than nb_timesteps are padded with zeros at the end.
Return: 2D numpy array of shape (nb_samples, nb_timesteps).
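The padding rule described above can be illustrated with a small NumPy sketch. This is not the library's implementation: pad_sequences_sketch is a hypothetical name, and cutting over-long sequences at the end is an assumption, since the text above only covers sequences shorter than nb_timesteps.

```python
import numpy as np


def pad_sequences_sketch(sequences, maxlen=None, dtype="int32"):
    """Hypothetical stand-in mirroring the behavior described above."""
    if maxlen is None:
        # Fall back to the length of the longest sequence.
        maxlen = max(len(s) for s in sequences)
    padded = np.zeros((len(sequences), maxlen), dtype=dtype)
    for row, seq in enumerate(sequences):
        # Assumption: sequences longer than maxlen are cut off at the end.
        trimmed = seq[:maxlen]
        # Shorter sequences keep their trailing zeros (padding at the end).
        padded[row, :len(trimmed)] = trimmed
    return padded


batch = pad_sequences_sketch([[1, 2, 3], [4, 5], [6]])
short_batch = pad_sequences_sketch([[1, 2, 3], [4, 5], [6]], maxlen=2)
print(batch)
```

Here batch has shape (3, 3) because the longest input has three elements, while short_batch has shape (3, 2) because maxlen is given explicitly.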
Arguments:
keras.preprocessing.sequence.skipgrams(sequence, vocabulary_size,
window_size=4, negative_samples=1., shuffle=True,
categorical=False, sampling_table=None)
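Transforms a sequence of word indexes (list of int) into couples of the form (word, word in the same window), with label 1, and (word, random word from the vocabulary), with label 0.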
Read more about Skipgram in this gnomic paper by Mikolov et al.: Efficient Estimation of Word Representations in Vector Space
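A rough pure-Python sketch of this pairing scheme follows. It is an illustration under stated assumptions, not the library's code: skipgrams_sketch and its seed parameter are hypothetical, the categorical and sampling_table options are left out, and drawing negative words uniformly from the whole index range is an assumption.

```python
import random


def skipgrams_sketch(sequence, vocabulary_size, window_size=4,
                     negative_samples=1.0, shuffle=True, seed=None):
    """Hypothetical illustration of skipgram couple generation."""
    rng = random.Random(seed)  # seed is a hypothetical convenience parameter
    couples, labels = [], []
    # Positive samples: every other word inside the window gets label 1.
    for i, word in enumerate(sequence):
        start = max(0, i - window_size)
        end = min(len(sequence), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                couples.append([word, sequence[j]])
                labels.append(1)
    # Negative samples: pair a word from the sequence with a random index.
    # Assumption: any index in [0, vocabulary_size) may be drawn.
    nb_negative = int(len(labels) * negative_samples)
    for _ in range(nb_negative):
        couples.append([rng.choice(sequence), rng.randrange(vocabulary_size)])
        labels.append(0)
    if shuffle:
        # Shuffle couples and labels with the same permutation.
        order = list(range(len(couples)))
        rng.shuffle(order)
        couples = [couples[k] for k in order]
        labels = [labels[k] for k in order]
    return couples, labels


couples, labels = skipgrams_sketch([1, 2, 3, 4], vocabulary_size=10,
                                   window_size=1, shuffle=False, seed=0)
print(couples[:6], labels[:6])
```

With window_size=1 and the four-word sequence above, the sketch yields six positive couples, and negative_samples=1.0 adds six negative ones.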
Return: tuple (couples, labels).
couples is a list of 2-element lists of int: [word_index, other_word_index]. labels is a list of 0 and 1, where 1 indicates that other_word_index was found in the same window as word_index, and 0 indicates that other_word_index was random.
Arguments:
sampling_table: numpy array of shape (vocabulary_size,) where sampling_table[i] is the probability of sampling the word with index i (assumed to be the i-th most common word in the dataset).
keras.preprocessing.sequence.make_sampling_table(size, sampling_factor=1e-5)
Used for generating the sampling_table argument for skipgrams. sampling_table[i] is the probability of sampling the i-th most common word in a dataset (more common words should be sampled less frequently, for balance).
Return: numpy array of shape (size,).
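To make the idea of rank-based sampling probabilities concrete, here is a minimal sketch. The formula is an assumption and may differ from what the library computes: it supposes word frequencies follow Zipf's law (frequency proportional to 1/rank) and applies the common subsampling heuristic of keeping a word with probability sqrt(sampling_factor / frequency), capped at 1. The name make_sampling_table_sketch is hypothetical.

```python
import numpy as np


def make_sampling_table_sketch(size, sampling_factor=1e-5):
    """Hypothetical sampling table; not necessarily the library's formula."""
    ranks = np.arange(1, size + 1, dtype="float64")
    # Assumption: Zipf's law, so the r-th most common word has a relative
    # frequency proportional to 1 / r (normalized to sum to 1).
    frequencies = (1.0 / ranks) / np.sum(1.0 / ranks)
    # Frequent words get a low sampling probability, rare words a high one.
    return np.minimum(1.0, np.sqrt(sampling_factor / frequencies))


table = make_sampling_table_sketch(10)
print(table.shape, table[0], table[-1])
```

Under these assumptions the returned array has shape (size,), and its entries grow with rank, so the most common word (index 0) is sampled least often.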
Arguments: