The simplest example used to illustrate this is the XOR problem (see image below). Imagine that you are given data consisting of $x$ and $y$ coordinates and a binary class to predict. You could expect your machine learning algorithm to find the correct decision boundary by itself, but if you generate the additional feature $z=xy$, the problem becomes trivial: $z>0$ gives you a nearly perfect decision criterion for classification, and you used just simple arithmetic!
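The XOR idea can be sketched in a few lines. This is a toy illustration (synthetic data, made-up seed), not code from the original problem: a linear classifier is near chance on $(x, y)$ alone, but becomes nearly perfect once the engineered feature $z = xy$ is added.

```python
# Toy XOR sketch: a linear model fails on (x, y) but succeeds with z = x*y.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))          # x and y coordinates
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # XOR-style labels

raw = LogisticRegression().fit(X, y)
acc_raw = raw.score(X, y)                      # near chance level

Xz = np.column_stack([X, X[:, 0] * X[:, 1]])   # add engineered feature z = x*y
eng = LogisticRegression().fit(Xz, y)
acc_eng = eng.score(Xz, y)                     # near perfect

print(f"raw: {acc_raw:.2f}, engineered: {acc_eng:.2f}")
```

The only change between the two fits is the extra column, which makes the classes linearly separable.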
So while in many cases you could expect the algorithm to find the solution on its own, feature engineering lets you simplify the problem instead. Simple problems are easier and faster to solve, and need less complicated algorithms. Simple algorithms are often more robust, their results are often more interpretable, and they are more scalable (requiring less computation, less training time, etc.) and more portable. You can find more examples and explanations in the wonderful talk by Vincent D. Warmerdam, given at the PyData conference in London.
Moreover, don't believe everything the machine learning marketers tell you. In most cases, the algorithms won't "learn by themselves". You usually have limited time, resources, and computational power, and the data is usually limited in size and noisy; none of this helps.
Taking this to the extreme, you could provide your data as photos of handwritten notes of the experiment results and pass them to a complicated neural network. It would first have to learn to recognize the data in the pictures, then learn to understand it, and only then make predictions. To do so, you would need a powerful computer, lots of time for training and tuning the model, and huge amounts of data, because you are using a complicated neural network. Providing the data in a computer-readable format (as tables of numbers) simplifies the problem tremendously, since you no longer need the character recognition. You can think of feature engineering as the next step, where you transform the data to create meaningful features so that your algorithm has even less to figure out on its own. To give an analogy, it is like reading a book in a foreign language, so that you need to learn the language first, versus reading it translated into a language you understand.
In the Titanic data example, your algorithm would need to figure out that summing family members makes sense, to get a "family size" feature (yes, I'm personifying it here). This is an obvious feature for a human, but it is not obvious if you see the data as just some columns of numbers. If you don't know which columns are meaningful when considered together with other columns, the algorithm could figure it out by trying every possible combination of such columns. Sure, we have clever ways of doing this, but it is still much easier if the information is given to the algorithm right away.
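For concreteness, here is a minimal sketch of the "family size" feature. `SibSp` (siblings/spouses aboard) and `Parch` (parents/children aboard) are the column names used in the Kaggle Titanic dataset; the rows below are made up:

```python
# Sketch of the "family size" feature on Titanic-style data (toy rows).
import pandas as pd

df = pd.DataFrame({
    "SibSp": [1, 0, 3],   # siblings/spouses aboard
    "Parch": [0, 0, 2],   # parents/children aboard
})
# The engineered feature: sum the family members, plus the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
print(df["FamilySize"].tolist())
```

One line of domain knowledge replaces whatever search the algorithm would need to discover that these two columns belong together.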
Best Answer
Your question title and content seem mismatched to me. If you are using a linear model, adding a total feature on top of attack and defense will make things worse.
First, I will answer why feature engineering works in general.
A picture is worth a thousand words. This figure may give you some insight into feature engineering and why it works (picture source):
The data in Cartesian coordinates is relatively complicated, and it is hard to write a rule / build a model to classify the two types.
The data in polar coordinates is much easier: we can write a simple rule on $r$ to classify the two types.
This tells us that the representation of the data matters a lot: in some spaces, certain tasks are much easier than in others.
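A small sketch of that figure's setup (the concentric-circle data here is synthetic, generated for illustration): in Cartesian $(x, y)$ the two rings interleave, but computing the radius $r=\sqrt{x^2+y^2}$ makes a single threshold sufficient.

```python
# Sketch: ring data that is hard in Cartesian (x, y) becomes trivially
# separable after computing the radius r (the polar-coordinate feature).
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
r_inner = rng.uniform(0.0, 1.0, 100)      # class 0: inside radius 1
r_outer = rng.uniform(2.0, 3.0, 100)      # class 1: between radii 2 and 3
r = np.concatenate([r_inner, r_outer])
labels = np.array([0] * 100 + [1] * 100)

x = r * np.cos(theta)                     # the Cartesian representation
y = r * np.sin(theta)

# Engineered feature: recover r from the Cartesian coordinates.
r_feat = np.sqrt(x**2 + y**2)
pred = (r_feat > 1.5).astype(int)         # one simple threshold on r
print((pred == labels).mean())
```

The single rule `r > 1.5` classifies every point, which no axis-aligned rule on $x$ or $y$ could do.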
Now I will answer the question from your example (the total of attack and defense).
In fact, the feature engineering mentioned in this example, the sum of attack and defense, will not work well for many models, such as linear models, and it will cause some problems. See Multicollinearity. On the other hand, such a feature may work for other models, such as decision trees / random forests. See @Imran's answer for details.
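The multicollinearity problem can be shown directly (with made-up stat values): the `total` column is an exact linear combination of `attack` and `defense`, so the design matrix of a linear model becomes rank-deficient and the coefficients are no longer uniquely determined.

```python
# Sketch: total = attack + defense creates exact multicollinearity,
# leaving the linear model's design matrix rank-deficient.
import numpy as np

rng = np.random.default_rng(0)
attack = rng.uniform(10, 100, 50)          # made-up attack stats
defense = rng.uniform(10, 100, 50)         # made-up defense stats

X = np.column_stack([attack, defense])
X_total = np.column_stack([attack, defense, attack + defense])

print(np.linalg.matrix_rank(X))        # two independent columns
print(np.linalg.matrix_rank(X_total))  # still 2: the third column adds nothing
```

Three columns but only rank 2: for a linear model the extra feature carries no new information and only destabilizes the fit, whereas a tree can still benefit from splitting on it directly.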
So the answer is: depending on the model you use, some feature engineering will help with some models, but not with others.