As someone who started out their career thinking of statistics as a messy discipline, I'd like to share my epiphany regarding the matter. For me, the insight came from Linear Algebra, so I would urge you to push in that direction.
Specifically, once you realize that the sum of squares, $\sum_i X_i^2$, and sum of products, $\sum_i X_i Y_i$, are both inner products (aka dot products), you realize that nearly all of statistics can be thought of as various operations from linear algebra.
If you sample $n$ values from a population, you have an $n$-dimensional vector. The sample mean comes from projecting this vector onto the $n$-dimensional all-ones vector. The standard deviation comes from the projection onto the $(n-1)$-dimensional hyperplane normal to the all-ones vector (finally an intuitive reason for the "$n-1$" in the denominator!). Specifically, for the sample variance $s^2$ of a sample $X$, here is the linear algebra:
First, we work with deviations from the mean. The mean in linear algebra terms is
$\bar{X}=\frac{\langle X,\mathbf{1}\rangle}{\langle \mathbf{1},\mathbf{1}\rangle} \mathbf{1}$
where $\langle \cdot, \cdot \rangle$ is the inner product and $\mathbf{1}$ is the $n$-dimensional ones vector. Then the deviation from the mean is
$x = X - \bar{X}$
Note that $x$ is constrained to an $(n-1)$-dimensional subspace, because $\langle x, \mathbf{1} \rangle = 0$. The usual equation for variance is
$s^2 = \dfrac{\sum_i (X_i - \bar{X})^2}{n-1}$
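For us, that's
$s^2 = \dfrac{\langle x, x \rangle}{\langle \mathbf{1}, \mathbf{1} \rangle} = \dfrac{\langle x, x \rangle}{n-1}$
which, without going into too much detail (too late), is the squared length of the deviation vector, normalized by the dimension of the subspace it lives in. The trick there is that the new $\mathbf{1}$ has dimension $n-1$, so $\langle \mathbf{1}, \mathbf{1} \rangle = n-1$.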
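To check the projection picture numerically, here is a minimal sketch in plain Python. The sample values and the helper `dot` are made up for illustration; the only claim is that the inner-product recipe agrees with the usual sample variance.

```python
import math
import statistics


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


X = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample, chosen only for illustration
n = len(X)
ones = [1.0] * n

# Project X onto the all-ones vector: every entry of X_bar is the sample mean.
coeff = dot(X, ones) / dot(ones, ones)
X_bar = [coeff * o for o in ones]

# Deviation vector; it is orthogonal to the all-ones vector.
x = [a - b for a, b in zip(X, X_bar)]

# Sample variance: squared length of x divided by n - 1.
s2 = dot(x, x) / (n - 1)

print(coeff, statistics.mean(X))   # the two means should agree
print(s2, statistics.variance(X))  # the two variances should agree
print(dot(x, ones))                # zero up to rounding error
```

The last line is the orthogonality constraint that removes one degree of freedom from the deviation vector.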
The other good example is that correlation between two samples is related to the angle between them in that $n$-dimensional space. To see this, consider that the angle between two vectors $v$ and $w$ is:
$\theta = \arccos \dfrac{\langle v, w \rangle}{\|v\|\|w\|}$
where $\|\cdot\|$ is vector length. Compare this to one of the forms for the Pearson correlation, taking $v$ and $w$ to be the deviation vectors of the two samples (each sample minus its mean), and you will see that $r = \cos \theta$.
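A small sketch of that comparison, again in plain Python with made-up data. The function names (`center`, `cos_angle`, `pearson_r`) are illustrative only; `pearson_r` uses the common computational formula for the correlation coefficient, and the cosine is taken between the mean-centered vectors.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def center(v):
    """Subtract the mean from every entry (the deviation vector)."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def cos_angle(v, w):
    """Cosine of the angle between v and w."""
    return dot(v, w) / math.sqrt(dot(v, v) * dot(w, w))


def pearson_r(xs, ys):
    """Pearson correlation via the usual sums-of-products formula."""
    n = len(xs)
    num = n * dot(xs, ys) - sum(xs) * sum(ys)
    den = math.sqrt((n * dot(xs, xs) - sum(xs) ** 2)
                    * (n * dot(ys, ys) - sum(ys) ** 2))
    return num / den


X = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical samples
Y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_geometric = cos_angle(center(X), center(Y))
r_textbook = pearson_r(X, Y)
print(r_geometric, r_textbook)  # the two values should agree
```

Note that the centering step matters: the cosine of the angle between the raw vectors `X` and `Y` is in general a different number.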
There are many other examples, and these have barely been explained here, but I just hope to give an impression of how you can think in these terms.
Mathematics, as you know, is an extremely broad subject with lots of different subfields to specialize in. Each branch of math is a life's work in itself. There's no real trick to understanding research papers - it's just time and effort devoted to understanding a field of math. So if you want to be able to understand research papers, I recommend you first try to understand research papers in one specific area by studying it intensely and reading books about it.
I faced this same problem. As a high school junior very eager to start math research, I started by tackling the hardest books at my disposal - Spivak's Calculus on Manifolds, Serre's A Course in Arithmetic, etc. These books were even recommended to me by a professor! But as I read them and pretended to understand them in periodic meetings I had with my mentor, I began to grow greatly discouraged, thinking that I'd never be able to understand these papers. For instance, in A Course in Arithmetic, I couldn't even read through the first page without having to look up terms that I never learned because I had never taken a formal class in algebra, like the characteristic of a finite field. I began to believe that math wasn't what I wanted to do with my life anymore.
But my feelings changed when my mentor referred me to a professor, and I enrolled in his 300-level undergraduate math class in number theory. Though it's a bit simple, it has really fortified my foundations in proof-writing, and I'm becoming more familiar with concepts that I need in order to read works like Serre's. I feel like I'm definitely making more progress than when I tackled those books cold. Unlike before, I feel like I'm on track to being able to understand the math papers that so discouraged me.
So my point is: start with what you understand, namely books and papers that are simpler, and if possible, enroll in a proof-based math class. Even if you consider such material too easy, it'll help for later, and you'll pick up the techniques of proofs on the way.
Also, if you're really intent on understanding research, pick one specific field to do so, and study it intensely from the ground up.
As this is a matter of opinion, I can only offer my opinion. If you write 5 pages for 6 pages of reading (as you mention in the comments), you should certainly change something. Personally, I always attempt to maximize the ratio content/length. Generally speaking, it also depends on what I intend to do; if it is supposed to be a conceptual summary, I would focus on intuition and some key examples, without proofs. Also I would attempt to write down the intuitive meaning of certain terms as precisely as possible, and omit the formal definition (which you have to know anyhow). As for proofs, I generally omit them. (Not because I think that they are unimportant, but because you can look them up easily.) Perhaps write a few things about some standard proof techniques, e.g. partition of unity in differential geometry, or some key ideas. In any case, don't attempt to rewrite the textbook; it is likely a waste of time. Also, always express things in your own words. If you want a concrete example, tell me about the 6 pages which you read, and I will make a one-page note.
I would summarize the content of chapter 2 as follows. Note that I did not include any definitions, but emphasized intuitive meaning, structures, and how the notions are in relation to one another. Also I emphasized what you can do with $\mathbf{R}$ (perform all kinds of operations, compare elements, etc.) and not "what $\mathbf{R}$ is" (i.e. how $\mathbf{R}$ is constructed). For thinking, this is often more convenient. Consider the related question: what are the natural numbers $\mathbf{N}$? The construction is: $0:=\emptyset$, $1:=\{\emptyset\}$, $2:=\{ \emptyset,\{\emptyset\}\}$,... but it is not useful to think about $\mathbf{N}$ in this way, because it wastes brain capacity and the only things you really use are the Peano axioms, the principle of induction, and the well-ordering principle.
The real numbers form a complete ordered field, and by this property they are determined up to unique order isomorphism (2-9). Thus the real numbers are endowed with the following structures: i) an algebraic structure (2-1 to 2-3), the field structure which governs the arithmetic ($+$ and $\cdot$) of $\mathbf{R}$; ii) an order structure which allows comparing the elements of $\mathbf{R}$. This order structure is compatible with the algebraic structure (this is the order axiom, 2-3). The order structure induces a metric space structure (via the absolute value), giving a notion of distance (2-5 to 2-6). By the order structure we have a notion of boundedness for subsets of $\mathbf{R}$, and for bounded above (resp. bounded below) sets one has the notion of a supremum/least upper bound (resp. infimum/greatest lower bound). There is a certain duality between the supremum and the infimum, replacing $\leqslant$ by $\geqslant$. [If you want to understand this in detail: see here.]
The completeness axiom states that every nonempty bounded above subset of $\mathbf{R}$ has a supremum. (Intuitively this means that $\mathbf{R}$ "has no holes", unlike $\mathbf{Q}$; this is related to the fact that $\sqrt{2}\notin\mathbf{Q}$. Learn the proof of this: every student has to know it.) There is a unique element of $\mathbf{R}$ that is positive and squares to $2$; it is denoted $\sqrt{2}$ (this is proven using the Archimedean principle; read the proof, but there is no need to learn it by heart). [Theorem 2.3.3 is quite important for technical purposes, but I would return to it when I need it.]
The real numbers contain the rational numbers, which do not form a complete field. In fact the real numbers are the completion of $\mathbf{Q}$ (i.e. "$\mathbf{R}$ is $\mathbf{Q}$ made complete"). Since $\sqrt{2}$ is irrational, $\mathbf{Q}$ is properly included in $\mathbf{R}$, and in fact there are more irrational numbers than rational numbers: $\mathbf{Q}$ is countable, but $\mathbf{R}$ is not (proof by Cantor's diagonal argument).
In response to the comments: When reading something new, I first try to figure out what the core of the text is. ("Try" because what you consider to be the core will necessarily depend on the level of your understanding, and thus the "core" is time-dependent.) Meaning, I understand the most important definitions first, then the main theorems (which form the core) without looking at proofs or anything. Then I consider more specific results. For instance, for chapter 3: Definitions: Sequence, Cauchy sequence, Convergence of a sequence, Subsequence. Theorems: Thm. 3.1.1, Cor. 3.1.3, Thm. 3.1.4, Thm. 3.1.5, Thm. 3.3.3, Thm. 3.4.1, Thm. 3.6.1. At a first reading I would omit sections 3.2 and 3.5 altogether, because they contain rather specific material (but useful, come back later!). Always try to make the link to what you have learned already. (E.g. how is the fact that every Cauchy sequence in $\mathbf{R}$ converges related to the completeness axiom?) Pictures can help, but shouldn't be taken literally. As for finding out what the core is, I cannot really tell you how to do it; it seems to be a matter of experience. I never said that you should not write something; you must write. But don't just copy all the theorems: try to understand them in several ways, see their relation to one another, and consider examples. At some point come back and ask yourself what you have learned, and write a short note as I did. It might also help if you know why you are reading the text: do you want to know something specific? Do you want to get a general overview? Do you want to calculate something?