Solved – Keras neural network input shapes

deep learning, word embeddings

In this example from the Keras repository, the following convolutional neural network is trained on the IMDB dataset:

>>> model.summary()
------------------------------------------
Initial input shape: (None, 5000)  -------
------------------------------------------
Layer (name)                  Output Shape                  Param #             
------------------------------------------
Embedding (embedding)         (None, 100, 100)              500000              
Dropout (dropout)             (None, 100, 100)              0                   
Convolution1D (convolution1d) (None, 98, 250)               75250               
MaxPooling1D (maxpooling1d)   (None, 49, 250)               0                   
Flatten (flatten)             (None, 12250)                 0                   
Dense (dense)                 (None, 250)                   3062750             
Dropout (dropout)             (None, 250)                   0                   
Activation (activation)       (None, 250)                   0                   
Dense (dense)                 (None, 1)                     251                 
Activation (activation)       (None, 1)                     0                   
------------------------------------------
Total params: 3638251
------------------------------------------
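
For reference, a network with exactly these output shapes and parameter counts can be built as follows. This is a sketch using the modern tensorflow.keras API; the original example used an older Keras API, and the dropout rates and activations here are assumptions based on the standard imdb_cnn example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Dropout, Conv1D,
                                     MaxPooling1D, Flatten, Dense, Activation)

model = Sequential([
    Embedding(5000, 100, input_length=100),  # 5000 * 100 = 500,000 params
    Dropout(0.25),                           # rate is an assumption
    Conv1D(250, 3, activation='relu'),       # 100 - 3 + 1 = 98 steps; 3*100*250 + 250 = 75,250 params
    MaxPooling1D(pool_size=2),               # 98 / 2 = 49 steps
    Flatten(),                               # 49 * 250 = 12,250 features
    Dense(250),                              # 12,250 * 250 + 250 = 3,062,750 params
    Dropout(0.25),
    Activation('relu'),
    Dense(1),                                # 250 + 1 = 251 params
    Activation('sigmoid'),
])
model.summary()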

The IMDB dataset contains reviews, each composed of different words. So, given a 5000-word vocabulary, the Initial input shape suggests that the first embedding layer should receive a vector of 5000 counters for each review, with counter i saying how many times the i-th word appears, or so I suppose.
However, when I print the shape of either the training or the test set, I get:

>>> X_train.shape
(20000, 100)
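
For context, this shape comes from how the example loads and pads the data. Here is a minimal sketch using the modern tensorflow.keras API (the original example used older function names, and the exact sample count depends on the version's train/test split):

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

vocab_size = 5000  # the 5000-word vocabulary from the summary above
maxlen = 100       # every review truncated or padded to 100 words

# Each review arrives as a list of integer word indices, not word counts
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)

print(X_train.shape)    # (n_reviews, 100)
print(X_train[0][:10])  # integer indices in [0, 5000), not counters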

Here 20000 is the number of reviews (which we can ignore) and 100 is the number of words per review. From there the embedding layer maps each review to

(None, 100, 100)

as seen in the Output Shape column. The first 100 is the number of words in each review, and the second 100 comes from mapping each word into a 100-dimensional space.
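
To see this mapping in isolation, here is a minimal sketch of an Embedding layer on its own (using tensorflow.keras; exact argument names can vary between Keras versions):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
# 5000 possible word indices, each mapped to a 100-dimensional vector
model.add(Embedding(input_dim=5000, output_dim=100))

batch = np.random.randint(0, 5000, size=(32, 100))  # (batch, words)
print(model.predict(batch).shape)                   # (32, 100, 100)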

I cannot understand the discrepancy between the Initial input shape and X.shape, where X is the training or test set. What is happening here?

Best Answer

As far as I can tell, your X datasets are 20K sequences of 100 "words" each, where each word is an integer index into the 5000-word vocabulary. The Embedding layer conceptually converts each index to a one-hot vector of length 5000 (that is the Initial input shape) and then maps it down to the 100-dimensional embedding space.
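
To make the one-hot picture concrete, here is a small numpy sketch of what the lookup is mathematically equivalent to. Note that the layer never materializes the one-hot vectors; it simply indexes into its weight matrix:

import numpy as np

vocab_size, embed_dim = 5000, 100
W = np.random.randn(vocab_size, embed_dim)  # the embedding weight matrix

idx = 42                        # one word index from a review
one_hot = np.zeros(vocab_size)  # the length-5000 "Initial input shape" view
one_hot[idx] = 1.0

# Row lookup and one-hot matrix product give the same 100-dim vector
assert np.allclose(W[idx], one_hot @ W)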