Option 1 (adding an unknown word token) is how most people solve this problem.
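To make option 1 concrete, here is a minimal sketch in plain Python. The token name `<unk>`, the helper names `build_vocab` and `encode`, and the `min_count` parameter are all my own illustrative choices, not anything from a particular library. Raising `min_count` maps rare training words to the unknown token as well, which is one common way to make sure that token actually appears during training.

```python
UNK = "<unk>"  # hypothetical name for the unknown-word token


def build_vocab(training_sentences, min_count=1):
    """Give each sufficiently frequent training word an integer id; id 0 is UNK."""
    counts = {}
    for sentence in training_sentences:
        for word in sentence.split():
            counts[word] = counts.get(word, 0) + 1
    vocab = {UNK: 0}
    for word, count in counts.items():
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Replace every out-of-vocabulary word with the UNK id instead of dropping it."""
    return [vocab.get(word, vocab[UNK]) for word in sentence.split()]


vocab = build_vocab(["the cat sat", "the dog sat"])
ids = encode("the zebra sat", vocab)  # "zebra" was never seen in training
print(ids)  # [1, 0, 3]
```

The sentence keeps its original length and word positions, which is the property that deleting unknown words would break.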
Option 2 (deleting the unknown words) is a bad idea because it transforms the sentence in a way that is not consistent with how the LSTM was trained.
Another option that has recently been developed is to create a word embedding on-the-fly for each word using a convolutional neural network or a separate LSTM that processes the characters of each word one at a time. Using this technique, your model will never encounter a word that it can't create an embedding for.
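As a rough illustration of the character-level idea, here is a toy sketch of the CNN variant in plain Python: embed each character, slide a fixed-width window over the characters, apply a few filters, and max-pool over positions. Everything here is an assumption made for illustration: the names, the sizes, the `^`/`$` boundary markers, and above all the random weights. In a real model the character embeddings and filters are learned jointly with the LSTM, so the vectors below carry no meaning; the point is only that any string yields a vector of the right size.

```python
import random

CHAR_DIM = 4      # size of each character embedding (illustrative)
WIDTH = 3         # convolution window, in characters
NUM_FILTERS = 5   # dimensionality of the resulting word embedding

rng = random.Random(0)
char_table = {}   # character embeddings; random here, learned in a real model
# One weight vector of length WIDTH * CHAR_DIM per filter; random here, learned in a real model.
filters = [[rng.uniform(-1, 1) for _ in range(WIDTH * CHAR_DIM)]
           for _ in range(NUM_FILTERS)]


def char_vector(ch):
    """Look up a character's embedding, creating a random one on first use."""
    if ch not in char_table:
        char_table[ch] = [rng.uniform(-1, 1) for _ in range(CHAR_DIM)]
    return char_table[ch]


def embed_word(word):
    """Convolve filters over character windows, then max-pool over positions."""
    padded = ("^" + word + "$").ljust(WIDTH, "$")  # boundary markers, at least WIDTH chars
    chars = [char_vector(c) for c in padded]
    windows = [
        [x for vec in chars[i:i + WIDTH] for x in vec]  # flatten WIDTH char vectors
        for i in range(len(chars) - WIDTH + 1)
    ]
    return [
        max(sum(w * x for w, x in zip(f, window)) for window in windows)
        for f in filters
    ]


vector = embed_word("zebra")  # works for any string, seen before or not
print(len(vector))  # 5
```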
One simple technique that seems to work reasonably well for short texts (e.g., a sentence or a tweet) is to compute the vector for each word in the document, and then aggregate them using the coordinate-wise mean, min, or max.
Based on results in one recent paper, it seems that using the min and the max works reasonably well. It's not optimal, but it's simple and about as good as or better than other simple techniques. In particular, if the vectors for the $n$ words in the document are $v^1,v^2,\dots,v^n \in \mathbb{R}^d$, then you compute $\min(v^1,\dots,v^n)$ and $\max(v^1,\dots,v^n)$. Here we're taking the coordinate-wise minimum, i.e., the minimum is a vector $u$ such that $u_i = \min(v^1_i, \dots, v^n_i)$, and similarly for the max.
The feature vector is the concatenation of these two vectors, so we obtain a feature vector in $\mathbb{R}^{2d}$. I don't know if this is better or worse than a bag-of-words representation, but for short documents I suspect it might perform better than bag-of-words, and it allows using pre-trained word embeddings.
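The min/max construction described above can be sketched in a few lines of plain Python. The function name `min_max_features` and the toy 3-dimensional vectors are made up for illustration; in practice the vectors would come from whatever pre-trained embedding you are using.

```python
def min_max_features(word_vectors):
    """Concatenate the coordinate-wise min and max of a list of d-dimensional vectors."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    columns = list(zip(*word_vectors))      # columns[i] holds coordinate i of every word
    u_min = [min(col) for col in columns]   # u_i = min(v^1_i, ..., v^n_i)
    u_max = [max(col) for col in columns]   # same, with max
    return u_min + u_max                    # feature vector in R^{2d}


# Toy example: n = 2 words, d = 3 dimensions.
vectors = [[0.5, -1.0, 2.0],
           [1.5, 0.0, -3.0]]
features = min_max_features(vectors)
print(features)  # [0.5, -1.0, -3.0, 1.5, 0.0, 2.0]
```

Note that the output length is $2d$ regardless of how many words the document has, which is what makes it usable as a fixed-size input to a classifier.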
TL;DR: Surprisingly, the concatenation of the min and max works reasonably well.
Reference:
Representation learning for very short texts using weighted word embedding aggregation. Cedric De Boom, Steven Van Canneyt, Thomas Demeester, Bart Dhoedt. Pattern Recognition Letters; arXiv:1607.00570. See especially Tables 1 and 2.
Credits: Thanks to @user115202 for bringing this paper to my attention.
Best Answer
Maybe the "From Words to Paragraphs, Attempt 2: Clustering" section of this article will help you. It uses word embeddings as inputs to the K-means clustering algorithm.