Let us for simplicity consider just classical point mechanics (i.e. a $0+1$ dimensional world volume) with only one variable $q(t)$. (The generalization to classical field theory on an $n+1$ dimensional world volume with several fields is straightforward.)
Let us reformulate the title(v1) as follows:
Why can't the Lagrangian $L$ always be written as a total derivative $\frac{dF}{dt}$?
In short, it is because:
In physics, the action functional $S[q]$ should be local, i.e. of the form
$S[q]=\int dt~L$, where the Lagrangian $L$ is of the form
$$L~=~L(q(t), \dot{q}(t), \ddot{q}(t), \ldots, \frac{d^Nq(t)}{dt^N};t),$$
and where $N\in\mathbb{N}_{0}$ is some finite order. (In most physics applications $N=1$, but this is not important in what follows. Note that
the Euler-Lagrange equations get modified with higher-order terms if $N>1$.)
Similarly, we demand that $F$ is of local form
$$F~=~F(q(t), \dot{q}(t), \ddot{q}(t), \ldots, \frac{d^{N-1}q(t)}{dt^{N-1}};t).$$
We stress that $L$ and $F$ only refer to the same time instant $t$. In other words, if $t$ is now, then $L$ and $F$ do not depend on the past or the future.
Note the special intermediate role played by the variable $q$ in between $L$ and $t$: there can be both implicit time-dependence (through $q$ and its derivatives) and explicit time-dependence of $L$ and $F$.
Counterexample: Consider
$$L~=~-\frac{k}{2}q(t)^2.$$
Then we can write $L=\frac{dF}{dt}$ as a total time derivative by defining
$$F=-\frac{k}{2}\int_0^t dt'~q(t')^2. $$
($F$ is unique up to a functional $K[q]$ that doesn't depend on $t$.) But this $F$ is not of local form, since it also depends on the past $t'<t$.
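The counterexample can be checked symbolically; here is a minimal sketch using sympy (the symbol `tp` stands in for the dummy variable $t'$), which relies only on the fundamental theorem of calculus:

```python
import sympy as sp

t, k = sp.symbols("t k")
tp = sp.Symbol("tp")   # dummy integration variable, standing in for t'
q = sp.Function("q")

# The nonlocal "antiderivative" F[q](t) = -(k/2) * int_0^t q(t')^2 dt'
F = -sp.Rational(1, 2) * k * sp.Integral(q(tp)**2, (tp, 0, t))

# By the fundamental theorem of calculus, dF/dt reproduces L = -(k/2) q(t)^2
L = sp.diff(F, t)
print(sp.simplify(L))   # -k*q(t)**2/2
```

Note that $F$ depends on the whole history $q(t')$ for $0 \leq t' \leq t$, which is exactly the nonlocality ruled out above.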
Finally, let us mention that one can prove (still under the assumption of locality in the above sense, plus the assumption that the configuration space is contractible, via an algebraic Poincaré lemma of the so-called variational bicomplex, see e.g. Ref. 1) that
$$ \text{The Lagrangian density is a total divergence} $$ $$\Updownarrow$$
$$\text{The Euler-Lagrange equations are identically satisfied}. $$
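The downward implication is easy to verify symbolically for $N=1$: for any local $F(q,t)$, the Lagrangian $L=dF/dt$ has identically vanishing Euler-Lagrange expression. A sketch with sympy, using a hypothetical $F$ of my own choosing:

```python
import sympy as sp

t = sp.Symbol("t")
q = sp.Function("q")(t)
qd = sp.diff(q, t)

# Any local F(q, t) will do; this particular choice is just an illustration
F = q**3 * sp.sin(t)

# L = dF/dt is a total time derivative
L = sp.diff(F, t)

# Euler-Lagrange expression: dL/dq - d/dt (dL/dq')
EL = sp.diff(L, q) - sp.diff(sp.diff(L, qd), t)
print(sp.simplify(EL))   # 0: the equations of motion are identically satisfied
```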
References:
- G. Barnich, F. Brandt and M. Henneaux, Local BRST cohomology in gauge theories, Phys. Rep. 338 (2000) 439, arXiv:hep-th/0002245.
Decomposing the E-field: since this is a vector, it can be expressed equivalently in terms of any three basis vectors that span the $\mathbb{R}^3$ vector space (which is intuitively just the set of "arrows" in three dimensions). The standard basis choice is $\{\hat{x},\hat{y},\hat{z}\}$, but any three linearly independent vectors will do. We shall almost always want to choose orthonormal vectors (mutually perpendicular, all having length 1). In this case, being that we are near a surface of charge, we want basis vectors that are useful in this small local region: one unit vector normal to the surface (usually denoted $\hat{n}$), and two that are tangent to the surface at this particular point (call them $\hat{s}_1$ and $\hat{s}_2$). So to actually perform the decomposition given $\vec{E}=E_x\hat{x}+E_y\hat{y}+E_z\hat{z}$, we would first need to find formulas for $\hat{n}$ and $\hat{s}_i$ in the xyz-basis also, then invert these formulas to solve for $\{\hat{x},\hat{y},\hat{z}\}$ in terms of $\{\hat{n},\hat{s}_i\}$, then simply plug these into the equation for $\vec{E}$ and regroup.
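For a concrete sense of this decomposition, here is a short numerical sketch (the field values are hypothetical, and the normal is chosen along $\hat{z}$ so that the tangent vectors are simply $\hat{x}$ and $\hat{y}$; for a general normal you would invert the basis change as described above):

```python
import numpy as np

# Hypothetical field in the xyz-basis, and a unit normal to the surface
E = np.array([3.0, -1.0, 2.0])
n = np.array([0.0, 0.0, 1.0])

# Normal component: orthonormality lets us project with a dot product
E_n = np.dot(E, n)          # scalar component along n-hat
E_normal = E_n * n          # vector part along n-hat

# Tangential part: whatever is left over
E_tangent = E - E_normal

# Reassembling the pieces recovers the original vector
assert np.allclose(E_normal + E_tangent, E)
print(E_n, E_tangent)       # 2.0 [ 3. -1.  0.]
```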
I am not sure how to answer "why" the normal component is discontinuous and the parallel (i.e. tangent) components are continuous. The explanation is given right there in Griffiths: it is due to Gauss' law and the vanishing of the curl. Perhaps you could elaborate on what you are not understanding about it.
You seem to be severely confused when you talk about the "direction" of a particular component of the E-field changing. Strictly speaking, each component is just a number, so it doesn't have any direction at all, let alone one that can change. For instance, in standard Cartesian xyz-coordinates, the component $E_x$ just tells us the amount of E-field that points in the x-direction. The direction corresponding to each component is fixed by the unit vector it is paired with, and the total E-field is the vector sum of all the basis vectors weighted by the components in each direction. This is why we talk about the normal component and the tangent components (note the plural, since there are two independent directions in the plane tangent to the charge surface at the point we are focusing on).
For continuity, I'm not sure where you're getting anything about $\mathbb{R}^2$ from. Continuity in this case just refers to each component as a function of a single variable (the distance from the surface). We are concerned with what happens above and below this surface, so imagine an axis that is normal to the surface, piercing it at the particular point we are interested in. Since this is a one-dimensional object, we can label points along it with a single variable which we'll call $n$ (pop quiz: what unit vector corresponds to this coordinate variable?). Since all the points on this axis are just points in space, the E-field is defined along the n-axis, and hence we can consider $\vec{E}(n)$, a vector function of a single variable. But "vector function" just means three regular functions, one for each component: $E_n(n), E_{||,1}(n),$ and $E_{||,2}(n)$. One normal component, and two tangent components, which despite the names are only actually normal/tangent AT the surface itself, it is just that we are still talking about the field in terms of the basis vectors we defined at the surface even though we are considering points which are above and below the surface. If we call $n=0$ where the normal axis intersects the surface, then all we are saying is that these three functions are (dis)continuous at $n=0$. Note that we can combine the two tangent components into a 2D vector $\vec{E}_{||}$, which is what Griffiths does.
The equation $$\Delta\vec{E}=\vec{E}_{above} - \vec{E}_{below} = \frac{\sigma}{\epsilon_0}\hat{n}$$ just summarizes everything Griffiths has just deduced about the three different components into a single vector equation. Remember that we're writing our vectors in terms of three orthonormal basis vectors, and due to the orthonormality you can find the components by dotting your vector with each basis unit (e.g. $\hat{x}\cdot\vec{E} = E_x$). This tells us that $\Delta\vec{E}$ has no tangent components (try dotting either $\hat{s}_i$ into it), i.e. the tangent components are continuous. Similarly, there is a difference in the normal component of the E-field given by $\sigma/\epsilon_0$.
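The boundary condition can be checked numerically for hypothetical fields constructed to satisfy it (the numbers below are illustrative, not taken from Griffiths):

```python
import numpy as np

eps0 = 8.854e-12     # vacuum permittivity, SI units
sigma = 1.0e-9       # hypothetical surface charge density (C/m^2)
n = np.array([0.0, 0.0, 1.0])   # unit normal, pointing from "below" to "above"
s1 = np.array([1.0, 0.0, 0.0])  # tangent vectors at the surface point
s2 = np.array([0.0, 1.0, 0.0])

# Hypothetical fields consistent with the boundary condition:
# tangential parts equal, normal parts differing by sigma/eps0
E_below = np.array([2.0, 5.0, 1.0])
E_above = E_below + (sigma / eps0) * n

dE = E_above - E_below
# Dotting with the tangent vectors kills dE: tangential components continuous
assert np.isclose(np.dot(dE, s1), 0.0) and np.isclose(np.dot(dE, s2), 0.0)
# Dotting with n-hat gives the normal jump sigma/eps0
assert np.isclose(np.dot(dE, n), sigma / eps0)
```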
We also have the same notions of derivative, curl, etc. for functions that are less regular. When you write Maxwell's equations, you are writing a system of partial differential equations.
To investigate them, you have to specify the type of solution you are looking for (in the language of PDEs: classical, mild, weak...) and the functional space you set your theory in. A natural space for the electric and magnetic fields is $L^2(\mathbb{R}^3)$, because this is the energy space (the space where the energy $\int_{\mathbb{R}^3}(E(x)^2+B(x)^2)dx$ is defined). More regular subspaces, such as the Sobolev spaces with positive index, or larger spaces, such as the Sobolev spaces with negative index, are also often considered.
These spaces rely on the concept of almost everywhere: their elements can behave badly, but only on a set of points with zero measure. The Sobolev spaces also generalize, roughly speaking, the concept of derivative. I suggest you take a look at an introductory course in PDEs and function spaces. A standard reference is the book by Evans, or the monumental work by Hörmander.
Comment to the edit: it is not true that
Consider, e.g. the static equation \begin{equation*} \nabla\cdot E=\rho \; . \end{equation*} To investigate this equation, you have to give it a precise meaning. What are $E$ and $\rho$? Let's assume, as you said, that $\rho$ is some discontinuous function. Then it is quite strange to look for solutions $E$ that are smooth and well behaved! We have mathematical objects that can behave even worse than discontinuous functions; they are called distributions. In particular, we are interested in the distributions dual to the functions of rapid decrease, denoted $\mathscr{S}'(\mathbb{R}^3)$. Without entering into details, all functions in $L^p(\mathbb{R}^3)$, $1\leq p \leq \infty$, are distributions in $\mathscr{S}'$, as are Dirac's delta function and its derivatives. And mathematically, it is perfectly legitimate to study the divergence equation above in the sense of distributions: i.e. to seek a distribution $E\in(\mathscr{S}'(\mathbb{R}^3))^3$ such that its distributional divergence $\nabla\cdot E \in \mathscr{S}'(\mathbb{R}^3)$ is equal to $\rho\in\mathscr{S}'(\mathbb{R}^3)$. If that equation admits a solution, the solution will not, in general, be a regular function, but a distribution. It may be, for example, a discontinuous function in $L^1$, or a sum of derivatives of the delta function.
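A one-dimensional toy version of this can be carried out with sympy's built-in distributions: a field that jumps across $x=0$ has a Dirac delta as its distributional derivative, so a discontinuous $E$ solves a divergence-type equation with a singular $\rho$. (The 1D setup below is my own simplification of the 3D problem.)

```python
import sympy as sp

x = sp.Symbol("x", real=True)

# 1D toy model of div E = rho: a field that jumps across x = 0,
# hence not a classical (smooth) solution
E = sp.Heaviside(x)

# Its distributional derivative is a Dirac delta
rho = sp.diff(E, x)
print(rho)   # DiracDelta(x)

# The singular density still carries a finite total "charge"
Q = sp.integrate(rho, (x, -sp.oo, sp.oo))
print(Q)     # 1
```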
Anyway, as I already wrote, to fully understand the machinery behind Maxwell's equations and the mathematical meaning of a solution to such a problem, you need a better grasp of Cauchy and boundary value problems for PDEs in functional spaces, as well as of the concepts of classical, mild, and weak solutions.