It is quite hard to compare kNN and linear regression directly, as they are very different things. However, I think the key point here is the difference between "modelling $f(x)$" and "having assumptions about $f(x)$".
When doing linear regression, one specifically models $f(x)$, often as something along the lines of $f(x) = \mathbf{w}^\top\mathbf{x} + \epsilon$, where $\epsilon$ is a Gaussian noise term. One can work out that the maximum-likelihood model is then equivalent to the minimum sum-of-squares-error model.
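To make that least-squares/maximum-likelihood connection concrete, here is a minimal sketch in Python (numpy only; the synthetic data and the 1D model $f(x)=wx+b$ are my own illustration, not part of the question):

```python
import numpy as np

# Minimal sketch (assumed setup): fit f(x) = w*x + b by minimising the sum
# of squared errors, which is the maximum-likelihood fit when the noise
# term epsilon is Gaussian.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=50)  # true line plus Gaussian noise

# Design matrix with a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(X, y, rcond=None)[0]
print(w, b)  # close to the true slope 2.5 and intercept 1.0
```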
kNN, on the other hand, as your second point suggests, assumes that the function can be approximated locally by a constant, using only some distance measure between the $x$'s, without explicitly modelling the whole distribution.
In other words, linear regression can often make a good guess at $f(x)$ for an unseen $x$ from the value of $x$ alone, whereas kNN needs other information (namely the $k$ neighbours) to predict $f(x)$, because the value of $x$ by itself tells it nothing: there is no model for $f(x)$.
EDIT: restating this below to express it more clearly (see comments).
Both linear regression and nearest-neighbour methods aim at predicting the value of $y=f(x)$ for a new $x$, but they take two different approaches. Linear regression proceeds by assuming that the data fall on a straight line (plus or minus some noise), so the value of $y$ is the value of $x$ times the slope of the line (plus an intercept). In other words, linear regression models the data as a straight line.
Nearest-neighbour methods, by contrast, do not care what the data look like (they do not model the data): they do not care whether it is a line, a parabola, a circle, etc. All they assume is that $f(x_1)$ and $f(x_2)$ will be similar if $x_1$ and $x_2$ are similar. Note that this assumption is roughly true for pretty much any model, including all the ones mentioned above. However, a NN method cannot tell how the value of $f(x)$ is related to $x$ (whether it is a line, a parabola, etc.), because it has no model of that relationship; it simply assumes that $f(x)$ can be approximated by looking at nearby points.
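As a rough illustration of this difference, here is a small sketch using scikit-learn (the data and parameter choices are mine, purely for illustration): on data generated from a parabola, linear regression still insists on a straight line, while kNN follows the curve simply by averaging nearby points, without ever modelling the parabola.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Illustrative sketch: data from a parabola plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, size=200))
y = x**2 + rng.normal(scale=0.3, size=200)
X = x.reshape(-1, 1)

lin = LinearRegression().fit(X, y)          # assumes a straight line
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)  # just averages the 5 nearest points

x_new = np.array([[2.0]])
print("true f(2) ~", 4.0)
print("linear regression:", lin.predict(x_new)[0])  # badly off, the line cannot bend
print("5-NN average:     ", knn.predict(x_new)[0])  # close to 4, with no model of the parabola
```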
A learning curve is a plot of the training and cross-validation (test, in your case) error as a function of the number of training points, not of the share of data points used for training. So it shows how the train/test errors evolve as the training set grows. See here for examples and more detail.
The 'train error' is the error (according to your loss function) achieved on the training set, and the 'test error' is the same quantity for the test set. See here for more detail.
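A hand-rolled version of such a learning curve might look like the sketch below (under assumptions of mine: synthetic data, a default SVC, and a fixed held-out test set; nothing here comes from your setup), with the error printed for increasing training-set sizes:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Sketch of a learning curve: error as a function of the NUMBER of training
# points, evaluated on a fixed held-out test set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for n in [20, 50, 100, 200, 500]:
    model = SVC().fit(X_tr[:n], y_tr[:n])
    train_err = 1 - model.score(X_tr[:n], y_tr[:n])  # error on the points it was trained on
    test_err = 1 - model.score(X_te, y_te)           # error on the fixed test set
    print(f"n={n:4d}  train error={train_err:.3f}  test error={test_err:.3f}")
```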
If I interpret your chart correctly, the fraction of data you are using to test your model increases up to 90%; the error decreases for the 'test' data while it increases for the (simultaneously shrinking) training set.
In other words, as you train your SVM model on less and less data, the 'train error' increases, which makes sense. It is a bit odd that the test error would decrease as you shrink the training set, so perhaps I am misinterpreting your chart?
I will share a picture with you to clear up the ambiguity.
Assume you have training data in 2D space, labelled either red or green. In the left figure, you have a test data point (in grey). According to k-NN (the equation that you wrote), $$\hat{y}(x) = \frac{1}{k}\sum_{x_i\in N_k(x)}y_i$$ the $y_i$'s are the labels of the training points whose $x_i$ lie in the neighbourhood $N_k(x)$ of the test point $x$. So, after we compute this equation (see the right figure), we can judge where this point belongs (either red or green in our case).
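In code, the quoted formula is just an average over the labels of the $k$ nearest training points; a minimal numpy sketch (the toy points and the helper name `knn_predict` are mine, for illustration only) could be:

```python
import numpy as np

# Minimal sketch of the quoted formula: y_hat(x) is the average label of the
# k training points nearest to x (here in 2D, with Euclidean distance).
def knn_predict(x, X_train, y_train, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # distance from x to every training point
    neighbours = np.argsort(dists)[:k]           # indices of the k nearest points, N_k(x)
    return y_train[neighbours].mean()            # average of their labels

# Toy data: label 1 = "red", 0 = "green"; the grey test point sits near the reds.
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([1, 1, 1, 0, 0])
print(knn_predict(np.array([1.1, 1.0]), X_train, y_train, k=3))  # 1.0 -> judged red
```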