As requested, I'll post the comment above as an answer:
The OP is right that there are inevitably patterns in large data sets, and in fact often ones of the sort that we want to find. Here are a couple of very common examples.
In statistics, traditionally you do "hypothesis testing", where you try to find evidence for or against the hypothesis that a parameter has a particular value, for example that the mean height of American males is 5'11''. You do this by measuring the mean height of a sample and then seeing whether it is "significantly" different from 5'11''. The problem is that, if your sample is big enough, the difference is almost always "significant", because the power of the test grows with the sample size: with enough data, even a tiny, practically meaningless deviation from 5'11'' registers as statistically significant.
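To see this effect concretely, here is a minimal simulation sketch in Python (my own illustration, assuming NumPy and SciPy are available; the numbers are made up): the true mean differs from the hypothesized 5'11'' (71 inches) by a practically negligible 0.05 inch, yet with a large enough sample a one-sample t-test flags the difference as significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

hypothesized_mean = 71.0   # 5'11'' in inches
true_mean = 71.05          # differs by a practically negligible 0.05 inch
true_sd = 3.0

for n in [100, 1_000, 10_000, 100_000, 1_000_000]:
    sample = rng.normal(true_mean, true_sd, size=n)
    t_stat, p_value = stats.ttest_1samp(sample, hypothesized_mean)
    print(f"n = {n:>9,}: p-value = {p_value:.4g}")

# Small samples: the tiny difference is invisible (large p-value).
# Big samples: the same tiny difference is always "statistically significant".
```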
Another example is finance, where people called technical analysts look for support and resistance levels: price levels that a stock repeatedly bounces back up from (support) or falls back down from (resistance). For example, if the price keeps turning back down whenever it reaches \$20, this is taken as evidence that people are selling and taking their profits once the price hits \$20. However, such patterns also appear very commonly in pure random walks, so it is often not clear whether they represent anything real about the underlying financial situation.
It is said that "if you torture the data for long enough, it will confess." Some people use the term Data Mining in a pejorative sense to refer to finding patterns which aren't really there, in the sense that all sufficiently large data sets will contain such patterns just by chance.
One of our main defences against this problem is to split the data randomly into subsets and look for patterns in one subset, then see whether they generalize to the other subsets. In machine learning, people say "training" instead of "fitting a statistical model" or "looking for patterns" and then talk about "validation". The idea is that the validation should show you how your model is likely to perform on unseen data. If your model learns a spurious pattern, then you hope that this will show up as a poor performance on the validation data. Such a model is said to be over-fitted.
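As a minimal sketch of the train/validation idea (my own illustration using scikit-learn; the data set and model here are placeholders, not anything specific from the answer):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Placeholder data; in practice this is whatever data you are looking for patterns in.
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# Hold out 25% of the rows to play the role of "unseen" data.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)

print("training   R^2:", model.score(X_train, y_train))
print("validation R^2:", model.score(X_val, y_val))
# An over-fitted model scores much better on the training data
# than on the validation data.
```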
Splitting the whole data set into $k$ subsets, fitting the model to all but one of them, validating it on the held-out subset, and doing this $k$ times, once for each subset, is called $k$-fold cross-validation, and it is a common way of estimating how your model will perform on new data. This is why the Stack Exchange statistics site is called Cross Validated.
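And a corresponding sketch of $k$-fold cross-validation, again with placeholder data and model, using scikit-learn's KFold and cross_val_score:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# 5-fold cross-validation: fit on 4 folds, validate on the held-out fold,
# and rotate so that each fold is used for validation exactly once.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kfold)

print("per-fold validation R^2:", scores)
print("estimated performance on new data:", scores.mean())
```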
The motivation for this procedure is that we really want to see how our patterns generalize to unseen data. If they are real patterns, they should show up in unseen data as well. But if we don't have any unseen data yet, we simply pretend that some of the data we already have is unseen, and use it to validate the model. And this makes perfect sense, because usually our reason for wanting to fit models/find patterns/learn is to make predictions about new data that we haven't seen yet.
Let me elaborate on my comment. Assume we have some polynomial-time algorithm 3COL which, given any triangle-free 3-colorable graph $G$, returns a valid 3-coloring 3COL(G).
As 3COL runs in polynomial time, there are some $a,b,c \in \mathbb{N}$ such that, on a triangle-free 3-colorable graph with $n$ vertices, 3COL will always terminate after at most $a n^b+c$ instructions.
I claim that there is a polynomial algorithm to decide if a given triangle-free graph is 3-colorable (in particular it will decide if the graph is not 3-colorable):
- Let $G$ be any triangle-free graph on $n$ vertices.
- We execute $a n^b+c+1$ instructions of 3COL(G). If the algorithm has not terminated within this bound, $G$ has no 3-coloring, since on 3-colorable triangle-free graphs 3COL always terminates within $a n^b+c$ instructions.
- Assume instead that the algorithm terminates with some coloring. Either the coloring is valid (then we are done, and the graph is 3-colorable) or it is invalid; if it is invalid, the graph cannot be 3-colorable, since 3COL is correct on 3-colorable triangle-free graphs. (A rough sketch of this procedure is given below.)
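Here is a rough Python sketch of the procedure above, purely to fix ideas. The step-bounded simulation of the assumed algorithm (run_3col_bounded, together with the constants a, b, c) is hypothetical, since 3COL itself is only assumed to exist; the coloring check is the only concrete part.

```python
def is_proper_3coloring(edges, coloring):
    """Check that a proposed coloring uses at most 3 colors and that
    no edge has both endpoints the same color."""
    return (len(set(coloring.values())) <= 3
            and all(coloring[u] != coloring[v] for u, v in edges))

def decide_3colorable(n_vertices, edges, run_3col_bounded, a, b, c):
    """Decide 3-colorability of a triangle-free graph, given a hypothetical
    step-bounded simulator of the assumed algorithm 3COL.

    run_3col_bounded(edges, max_steps) is assumed to return the coloring
    produced by 3COL if it terminates within max_steps instructions,
    and None otherwise; a, b, c are the constants of its running-time bound."""
    max_steps = a * n_vertices ** b + c
    result = run_3col_bounded(edges, max_steps + 1)
    if result is None:
        # 3COL exceeded the bound it satisfies on 3-colorable triangle-free
        # graphs, so the graph cannot be 3-colorable.
        return False
    # 3COL terminated; the graph is 3-colorable iff the returned coloring is valid.
    return is_proper_3coloring(edges, result)
```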
This algorithm runs in polynomial time and decides the $NP$-complete problem of whether a triangle-free graph is 3-colorable, so it cannot exist unless $P=NP$.
I haven't seen this kind of argument before, so let me know if you see a gap in my proof.
Best Answer
There are many ways to construct triangle-free graphs with large chromatic number. One well-known construction is Jan Mycielski's, which is described in the Wikipedia article that you cited; I guess you had trouble understanding it. Instead of explaining Mycielski's construction, I will describe another one, which is even less "efficient" (meaning that it produces a bigger graph for a given chromatic number) but maybe slightly simpler.
For a given natural number $n\in\mathbb N$, let $G_n$ be the following graph with $\binom n2$ vertices and $\binom n3$ edges: the vertices are the pairs $(x,y)$ of integers with $1\le x\lt y\le n$; for each triple $(x,y,z)$ with $1\le x\lt y\lt z\le n$, there is an edge joining vertex $(x,y)$ to vertex $(y,z)$. It is plain to see that $G_n$ is triangle-free. I will show that $G_n$ is $k$-colorable if and only if $n\le2^k$. Therefore, if $2^{1525}\lt n\le2^{1526}$, then $\chi(G_n)=1526$.
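For concreteness, here is a small Python sketch (my own illustration, not part of the argument) that builds $G_n$ for a small $n$ and checks that it is triangle-free:

```python
from itertools import combinations

def construct_G(n):
    """Vertices: pairs (x, y) with 1 <= x < y <= n.
    Edges: {(x, y), (y, z)} for every triple 1 <= x < y < z <= n."""
    vertices = list(combinations(range(1, n + 1), 2))
    edges = {frozenset({(x, y), (y, z)})
             for x, y, z in combinations(range(1, n + 1), 3)}
    return vertices, edges

def has_triangle(vertices, edges):
    return any(frozenset({u, v}) in edges and
               frozenset({v, w}) in edges and
               frozenset({u, w}) in edges
               for u, v, w in combinations(vertices, 3))

vertices, edges = construct_G(6)
print(len(vertices), len(edges))      # 15 = C(6,2) vertices, 20 = C(6,3) edges
print(has_triangle(vertices, edges))  # False: G_n is triangle-free
```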
First, suppose $G_n$ is $k$-colorable; I will show that $n\le2^k$. Let $f:V(G_n)\to[k]=\{1,\dots,k\}$ be a proper vertex-coloring of $G_n$; i.e., for $1\le x\lt y\le n$, the vertex $(x,y)$ gets color $f(x,y)\in[k]$. For each $x\in[n]=\{1,\dots,n\}$, let $S_x=\{f(u,x):1\le u\lt x\}\subseteq[k]$. Note that, since $f$ is a proper coloring, the sets $S_1,\dots,S_n$ must be distinct; namely, if $1\le x\lt y\le n$, then $f(x,y)\in S_y\setminus S_x$, showing that $S_x\ne S_y$. Thus there are $n$ distinct subsets of $[k]$, i.e., $n\le2^k$.
Now, suppose $n\le2^k$; I will show that $G_n$ is $k$-colorable. Let $S_1,\dots,S_n$ be $n$ distinct subsets of $[k]$, indexed in order of nondecreasing size, so that $|S_1|\le|S_2|\le\dots\le|S_n|$; it follows that $S_y\not\subseteq S_x$, i.e., $S_y\setminus S_x\ne\emptyset$, whenever $1\le x\lt y\le n$. Finally, we get a coloring $f:V(G_n)\to[k]$ by assigning to each vertex $(x,y)$ a color $f(x,y)\in S_y\setminus S_x$; this coloring is proper, since for an edge joining $(x,y)$ to $(y,z)$ we have $f(x,y)\in S_y$ while $f(y,z)\notin S_y$.
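And a companion sketch (again my own illustration) of the coloring just described: pick $n$ distinct subsets of $[k]$ ordered by size, color each vertex $(x,y)$ with an element of $S_y\setminus S_x$, and check that no edge of $G_n$ is monochromatic:

```python
from itertools import combinations

def color_G(n, k):
    """Properly k-color G_n, following the subset argument (requires n <= 2**k)."""
    assert n <= 2 ** k
    # n distinct subsets of [k] = {1, ..., k}, listed in order of nondecreasing size.
    subsets = [set(s) for r in range(k + 1)
               for s in combinations(range(1, k + 1), r)]
    S = dict(enumerate(subsets[:n], start=1))   # S[1], ..., S[n]
    # Color vertex (x, y) with any element of S[y] \ S[x]; this set is
    # nonempty because S[y] is never a subset of S[x] when x < y.
    return {(x, y): min(S[y] - S[x])
            for x, y in combinations(range(1, n + 1), 2)}

def is_proper(n, coloring):
    # The edges of G_n join (x, y) to (y, z) for x < y < z.
    return all(coloring[(x, y)] != coloring[(y, z)]
               for x, y, z in combinations(range(1, n + 1), 3))

n, k = 8, 3                          # the extreme case n = 2**k
coloring = color_G(n, k)
print(len(set(coloring.values())))   # at most 3 colors used
print(is_proper(n, coloring))        # True: the coloring is proper
```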