keras.preprocessing.text.text_to_word_sequence(text, filters=base_filter(), lower=True, split=" ")
Split a sentence into a list of words.
Return: List of words (str).
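Arguments:
text: str.
filters: list (or concatenation) of characters to filter out, such as punctuation. Default: base_filter(), which includes basic punctuation, tabs, and newlines.
lower: boolean. Whether to convert the text to lowercase.
split: str. Separator for word splitting.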
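The behaviour described above can be approximated in plain Python. This is a minimal sketch, not the Keras implementation: the names `base_filter_sketch` and `text_to_word_sequence_sketch` are hypothetical, and the default filter set (ASCII punctuation without the apostrophe, plus tab and newline) is an assumption about what base_filter() returns.

```python
import string


def base_filter_sketch():
    # Assumed default: ASCII punctuation except the apostrophe, plus tab and newline.
    return string.punctuation.replace("'", "") + "\t\n"


def text_to_word_sequence_sketch(text, filters=None, lower=True, split=" "):
    if filters is None:
        filters = base_filter_sketch()
    if lower:
        text = text.lower()
    # Replace every filtered character with the separator, then split on it.
    table = str.maketrans({ch: split for ch in filters})
    # Drop the empty strings left behind by consecutive separators.
    return [w for w in text.translate(table).split(split) if w]


words = text_to_word_sequence_sketch("The cat sat on the mat, didn't it?")
print(words)
# ['the', 'cat', 'sat', 'on', 'the', 'mat', "didn't", 'it']
```

Filtered characters are turned into separators rather than deleted, so "mat,didn't" would still split into two words.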
keras.preprocessing.text.one_hot(text, n, filters=base_filter(), lower=True, split=" ")
One-hot encode a text into a list of word indexes in a vocabulary of size n.
Return: List of integers in [1, n]. Each integer encodes a word (uniqueness is not guaranteed: two different words may receive the same integer).
Arguments: Same as text_to_word_sequence above, plus n: int, the size of the vocabulary.
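The note on uniqueness is easier to see with a small hashing sketch. The function `one_hot_sketch` below is hypothetical and uses an MD5 digest from `hashlib` as a stand-in hash; the hash function Keras actually uses is not stated here. The sketch only illustrates how mapping words into [1, n] can produce collisions.

```python
import hashlib


def one_hot_sketch(text, n, lower=True, split=" "):
    if lower:
        text = text.lower()
    words = [w for w in text.split(split) if w]
    indexes = []
    for w in words:
        # Stable stand-in hash (assumption), reduced to the range [1, n].
        digest = hashlib.md5(w.encode("utf-8")).hexdigest()
        indexes.append(int(digest, 16) % n + 1)
    return indexes


encoded = one_hot_sketch("the cat sat on the mat", n=10)
# The same word always maps to the same integer.
print(encoded[0] == encoded[4])

# With a vocabulary of size 1, every word collides on index 1.
collapsed = one_hot_sketch("a b c", n=1)
print(collapsed)
```

A larger n makes collisions less likely but never rules them out, which is why the returned integers should not be treated as unique word identifiers.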
keras.preprocessing.text.Tokenizer(nb_words=None, filters=base_filter(), lower=True, split=" ")
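Class for vectorizing texts and/or turning texts into sequences (a sequence is a list of word indexes, where the word of rank i in the dataset, counting from 1, has index i).
Arguments: Same filters, lower, and split as text_to_word_sequence above, plus nb_words: None or int, the maximum number of words to work with (if set, tokenization is restricted to the top nb_words most common words in the dataset).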
Methods:
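fit_on_texts(texts): builds the word index from texts, a list of texts to train on.
texts_to_sequences(texts): turns texts, a list of texts, into a list of sequences (one per input text).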
texts_to_sequences_generator(texts): generator version of the above.
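texts_to_matrix(texts): returns a matrix of shape (len(texts), nb_words).
fit_on_sequences(sequences): fits on sequences, a list of sequences to train on.
sequences_to_matrix(sequences): returns a matrix of shape (len(sequences), nb_words).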
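The rank-based indexing and the matrix shape can be illustrated with a small pure-Python stand-in. `TokenizerSketch` is hypothetical and is not the Keras class. It makes several assumptions: ties in frequency are broken alphabetically, index 0 is left unused so that only indexes below nb_words are kept, punctuation filtering is omitted, and the matrix is binary (1 if the word occurs in the text).

```python
from collections import Counter


class TokenizerSketch:
    """Illustrative stand-in, not the Keras Tokenizer."""

    def __init__(self, nb_words=None, lower=True, split=" "):
        self.nb_words = nb_words
        self.lower = lower
        self.split = split
        self.word_index = {}  # word -> rank-based index, filled by fit_on_texts

    def _words(self, text):
        if self.lower:
            text = text.lower()
        return [w for w in text.split(self.split) if w]

    def fit_on_texts(self, texts):
        counts = Counter()
        for text in texts:
            counts.update(self._words(text))
        # Most frequent word gets index 1; ties broken alphabetically (assumption).
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        self.word_index = {w: i for i, w in enumerate(ranked, start=1)}

    def _keep(self, index):
        # Assumption: index 0 is unused, so only indexes below nb_words are kept.
        return self.nb_words is None or index < self.nb_words

    def texts_to_sequences_generator(self, texts):
        for text in texts:
            indexes = (self.word_index.get(w) for w in self._words(text))
            yield [i for i in indexes if i is not None and self._keep(i)]

    def texts_to_sequences(self, texts):
        return list(self.texts_to_sequences_generator(texts))

    def sequences_to_matrix(self, sequences):
        if self.nb_words is not None:
            width = self.nb_words
        else:
            width = len(self.word_index) + 1
        matrix = []
        for seq in sequences:
            row = [0] * width
            for i in seq:
                if i < width:
                    row[i] = 1  # binary indicator: word i occurs in this text
            matrix.append(row)
        return matrix

    def texts_to_matrix(self, texts):
        return self.sequences_to_matrix(self.texts_to_sequences(texts))


texts = ["the cat sat", "the dog sat", "the cat ran"]
tokenizer = TokenizerSketch(nb_words=4)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
matrix = tokenizer.texts_to_matrix(texts)
print(sequences)  # [[1, 2, 3], [1, 3], [1, 2]]
print(matrix)     # [[0, 1, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
```

Here "the" is the most frequent word and gets index 1, while "dog" and "ran" rank below the nb_words cutoff and are dropped. The resulting matrix has len(texts) rows and nb_words columns.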
Attributes: