At least in the social sciences, you often have panel data with large-$N$, small-$T$ asymptotics, meaning that you observe many entities, each for a relatively short period of time. This is why applied work with panel data is often less concerned with the time-series component of the data.
Nevertheless, time-series elements are still important in the treatment of panel data. For instance, the degree of autocorrelation in the errors determines whether fixed effects or first differencing is the more efficient estimator. In difference-in-differences, proper treatment of the standard errors to account for autocorrelation is important for correct inference (see Bertrand et al., 2004). Dynamic panel estimators for small-$N$, large-$T$ asymptotics are also available; you often find such data in macroeconomics, where you may run into familiar time-series issues such as panel non-stationarity.
An excellent treatment of these topics is provided in Wooldridge (2010) "Econometric Analysis of Cross Section and Panel Data".
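To make the efficiency point concrete, here is a minimal numpy sketch (my own illustration, not from Wooldridge; all parameter values are arbitrary): in a static panel with i.i.d. errors the within (FE) estimator has the smaller spread, while with random-walk errors first differencing wins, because differencing then whitens the errors.

```python
import numpy as np

# Sketch: relative efficiency of fixed effects (within) vs. first
# differences in a static panel y_it = x_it + alpha_i + e_it (slope = 1).
rng = np.random.default_rng(0)
N, T, reps = 200, 6, 500

def sd_of_estimates(random_walk_errors):
    fe, fd = [], []
    for _ in range(reps):
        x = rng.normal(size=(N, T))
        alpha = rng.normal(size=(N, 1))
        e = rng.normal(size=(N, T))
        if random_walk_errors:
            e = np.cumsum(e, axis=1)               # serially correlated errors
        y = x + alpha + e
        xd = x - x.mean(axis=1, keepdims=True)     # within transformation
        yd = y - y.mean(axis=1, keepdims=True)
        fe.append((xd * yd).sum() / (xd ** 2).sum())
        dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)  # first differences
        fd.append((dx * dy).sum() / (dx ** 2).sum())
    return np.std(fe), np.std(fd)

fe_iid, fd_iid = sd_of_estimates(False)
fe_rw, fd_rw = sd_of_estimates(True)
# FE tighter with i.i.d. errors; FD tighter with random-walk errors
print(fe_iid < fd_iid, fd_rw < fe_rw)
```

Both estimators remain consistent here; the serial correlation only changes which one has the smaller sampling variance.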
You are right: with a lagged dependent variable, fixed effects and first differencing are inconsistent for fixed $T$, with substantial downward bias when $T$ is small.
The standard approach for a dynamic model and an unobserved fixed effect is to remove the fixed effect by first differencing and then finding instruments for the transformed regressors. All this assumes no serial correlation of the errors. If this is not the case, the parameters in your model are not identified and cannot be consistently estimated.
For your model, we get:
$s_{it} - s_{it-1} = \beta_0 s_{it-1} + \beta_1 ( s_{it-1}^2 + \text{ cross terms } ) + \epsilon_{it}-\epsilon_{it-1}$
As it is, both regressors must correlate with the error term. The only valid instruments will be $s_{it-2}$ or further back in time. Of course, instruments have to be good predictors of the regressors as well, otherwise you can have large biases ("weak instruments").
In theory you could use as many of the valid lags as instruments (or, in GMM terminology, moment conditions) as you want, and there are ways of doing that cleverly using GMM estimation that do not make your $T$ smaller than it already is (one observation is lost by first differencing alone).
References for these approaches would be the Arellano-Bond estimator and the Blundell-Bond estimator.
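As an illustrative sketch of the instrumenting idea (in its simplest Anderson-Hsiao form, with a single instrument per period rather than the full Arellano-Bond moment set; the parameter values are my own arbitrary choices):

```python
import numpy as np

# First-difference the AR(1) panel y_it = rho*y_it-1 + alpha_i + e_it,
# then instrument the endogenous regressor dy_it-1 with the level y_it-2.
rng = np.random.default_rng(1)
N, T, rho = 20_000, 7, 0.5

alpha = rng.normal(size=N)
y = np.zeros((N, T + 50))
for t in range(1, T + 50):                        # burn-in to stationarity
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)
y = y[:, -T:]

dy = np.diff(y, axis=1)                           # dy[:, k] = y_k+1 - y_k
num_iv = den_iv = num_ols = den_ols = 0.0
for t in range(2, T - 1):
    d, dlag = dy[:, t], dy[:, t - 1]
    z = y[:, t - 1]                               # level dated two periods back
    num_iv += (z * d).sum();    den_iv += (z * dlag).sum()
    num_ols += (dlag * d).sum(); den_ols += (dlag ** 2).sum()

print("IV :", num_iv / den_iv)    # consistent for rho
print("OLS:", num_ols / den_ols)  # OLS on differences is badly biased down
```

The IV estimate should land near $\rho=0.5$, while naive OLS on the differenced equation is pulled far below it, because $\Delta\epsilon_{it}$ contains $-\epsilon_{it-1}$, which also drives $\Delta y_{it-1}$.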
Best Answer
They deal with estimating different parameters but indeed share common features:
Nickell (Econometrica 1981) bias:
The time demeaning operation of fixed effects in a dynamic panel data model $$ y_{it}=\alpha_i+\beta y_{it-1}+\epsilon_{it} $$ leads to a transformed regression model $$ y_{it}-y_{i\cdot}=\beta (y_{it-1}-y_{i\cdot-1})+(\epsilon_{it}-\epsilon_{i\cdot}) $$ where dots indicate time averages. Here, the error terms $(\epsilon_{it}-\epsilon_{i\cdot})$ and regressors $(y_{it-1}-y_{i\cdot-1})$ are correlated even as $N\to\infty$, where $N$ is the number of units in the panel. This can be shown formally, but it essentially follows from the observation that $y_{i\cdot}$ contains future $y_{it}$, which are generated by past $y_{it}$, which, in turn, are generated by past $\epsilon_{it}$, which are contained in $\epsilon_{i\cdot}$.
Hence, even as $N\to\infty$, the FE estimator will not consistently estimate $\beta$.
Incidental parameter problem:
The classical Neyman and Scott (Econometrica 1948) case is an example that MLEs need not be consistent. Consider a random sample of size $N\equiv nr$, $$X_{11},\ldots,X_{1r},X_{21},\ldots,X_{2r},\ldots,X_{n1},\ldots,X_{nr},$$ where we have $n$ subsamples of size $r$, $X_{\alpha 1},\ldots,X_{\alpha r}$, $\alpha=1,\ldots,n$, which are distributed as $N(\theta_\alpha,\sigma^2)$. Hence, each subsample has a different mean $\theta_\alpha$, but a common variance $\sigma^2$.
It can be shown that the MLE for $\sigma^2$ is given by $$ \hat{\sigma}^2=\frac{1}{rn}\sum_{\alpha=1}^n\sum_{j=1}^r(X_{\alpha j}-X_{\alpha \cdot})^2 $$ One may show that $$\hat{\sigma}^2\to_pE(S_\alpha^2)=\frac{r-1}{r}\sigma^2\neq\sigma^2$$ Hence, the MLE is not consistent as $n\to\infty$.
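A quick numerical check of this limit (my own sketch; the values of `n`, `r`, and `sigma2` are arbitrary choices):

```python
import numpy as np

# Neyman-Scott example: n subsamples of size r, each N(theta_a, sigma^2).
# The MLE of sigma^2 converges to (r-1)/r * sigma^2 as n grows with r fixed.
rng = np.random.default_rng(2)
n, r, sigma2 = 200_000, 3, 1.0

theta = rng.normal(size=(n, 1))       # a different mean per subsample
x = theta + rng.normal(scale=np.sqrt(sigma2), size=(n, r))

# MLE: average squared deviation from each subsample's own mean
mle = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum() / (n * r)
print(mle)   # close to (r-1)/r * sigma2 = 2/3, not 1.0
```

Intuitively, each subsample mean $X_{\alpha\cdot}$ absorbs one degree of freedom out of $r$, and with $r$ fixed that loss never washes out, no matter how large $n$ gets.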
So they are related through the fact that both the FE estimator and $\hat{\sigma}^2$ are inconsistent estimators, yet both become consistent as the "other" dimension also goes to infinity: $r$ in the incidental parameter problem and $T$, the number of time-series observations per panel unit, in the Nickell bias case.
Nickell shows the inconsistency to be approximately equal to $$ -\frac{1+\beta}{T-1} $$ for $T$ "reasonably" large.
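A small simulation (my own sketch, with arbitrary $N$, $T$, and $\beta$) can check this approximation against the within estimator's actual bias:

```python
import numpy as np

# Within (FE) estimation of beta in y_it = alpha_i + beta*y_it-1 + e_it.
# Even with many units N, the bias is roughly -(1+beta)/(T-1).
rng = np.random.default_rng(3)
N, T, beta, reps = 4000, 10, 0.5, 40

est = []
for _ in range(reps):
    alpha = rng.normal(size=N)
    y = np.zeros((N, T + 51))
    for t in range(1, T + 51):                    # burn-in to stationarity
        y[:, t] = alpha + beta * y[:, t - 1] + rng.normal(size=N)
    y = y[:, -(T + 1):]                           # keep T usable pairs
    ylag, ycur = y[:, :-1], y[:, 1:]
    xd = ylag - ylag.mean(axis=1, keepdims=True)  # within transformation
    yd = ycur - ycur.mean(axis=1, keepdims=True)
    est.append((xd * yd).sum() / (xd ** 2).sum())

bias = np.mean(est) - beta
print(bias, -(1 + beta) / (T - 1))  # simulated bias tracks the approximation
```

With $\beta=0.5$ and $T=10$, the approximation gives $-1.5/9\approx-0.167$, and the simulated bias comes out nearby, well away from zero despite $N=4000$.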