Max pooling loses information in the sense that it tells you whether a filtered feature was encountered, but forgets where in the data it occurred, how many times, and so on.
Suppose your filter is looking for vertical stripes in the image. Without max pooling, it will output all the stripes it finds. With max pooling, it will only tell you whether there were stripes in the filter output or not: pretty much zero-or-one outputs, as opposed to the whole image with the stripes marked on it with ones. Max pooling can be viewed as a very crude form of compression in this regard.
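To make the "crude compression" point concrete, here is a toy sketch in plain Python (not from the answer; `pool2x2` is a hypothetical helper): a binary map marking where vertical stripes were detected gets pooled down to a map that only says "a stripe was somewhere in this region".

```python
def pool2x2(feature_map):
    """2x2 max pooling with stride 2 on a list-of-lists feature map."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

# 4x4 filter output: 1 where a vertical stripe was detected, 0 elsewhere.
stripes = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled = pool2x2(stripes)
# The pooled map only records that a stripe existed somewhere in each
# 2x2 region -- not where exactly, and not how many times.
print(pooled)  # [[1, 0], [0, 0]]
```

The exact stripe positions in the top-left region are gone after pooling; only the fact that the feature fired there survives.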
It's quite surprising that max pooling actually works, given how crude it is. One reason it does work is that you usually run a battery of filters. For instance, you may run vertical, horizontal, -45 degree, and +45 degree stripe filters, then max pool their outputs. If you're looking for a rectangular box in the image, having ONE output from the -45 and +45 degree stripe filters and ZERO output from the vertical and horizontal stripe filters after max pooling may suggest that your box is inclined in the image.
Does the code below mean we are doing 2 convolutions before max pooling? Yes, it means you are doing two convolutions before pooling.
If so, why are we doing it twice and then pooling? Why not? This is just a different model. The results are not going to change a whole lot, and it's by no means wrong to do this. In fact, it will probably improve the accuracy of the model, since more convolutions before reducing the size of the feature maps with pooling can lead to more interesting representations of the data.
The intuition is: before pooling, you have more pixels than after (and before the first pooling you even have all the original pixels). Thus, the filters are able to slide more times along the image and perform more convolution operations, leading to a richer representation.
The trade-off of course is computational time. That is why more modern models started stacking many more convolutional layers before the pooling layers.
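The original question's code isn't reproduced here, but the "two convolutions, then pool" pattern and the sliding-window intuition can be sketched in plain Python with a 1-D signal (the `conv1d`/`pool1d` helpers and the filter values are illustrative, not from the question):

```python
def conv1d(signal, kernel):
    """Valid convolution: slide the kernel over every position where it fits."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def pool1d(signal, size=2):
    """Max pooling with stride equal to the pool size."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

x = [0, 1, 2, 3, 4, 5, 6, 7]   # 8 "pixels"
edge = [1, -1]                 # toy difference filter

h1 = conv1d(x, edge)           # 7 positions: filter slides over the full input
h2 = conv1d(h1, edge)          # 6 positions: still near full resolution
p = pool1d(h2)                 # 3 values: resolution is halved only now

print(len(h1), len(h2), len(p))  # 7 6 3
```

Both convolutions get to operate at (nearly) full resolution; only after the second one does pooling cut the size down. Pooling after the first convolution instead would have given the second filter only half as many positions to slide over.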
Max pooling layers (and pooling layers more generally) are used to make training deep convolutional nets easier. When you insert a max pooling layer after a convolutional layer, it effectively downsamples the output of the conv layer. In other words, it reduces the amount of data that will be sent to the next layer (by a factor of 4 if your pooling stride is 2 horizontally and 2 vertically). Since max pooling layers throw away information from the previous layers, if you insert a lot of pooling layers there's a good chance you will see a decrease in the performance of the network (although it will train faster).
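A quick sanity check of that factor-of-4 claim, again in plain Python (the `maxpool` helper is illustrative): a 6x6 feature map pooled with a 2x2 window and stride 2 in both directions keeps one value per four inputs.

```python
def maxpool(fm, size=2, stride=2):
    """Max pooling over a list-of-lists feature map."""
    h, w = len(fm), len(fm[0])
    return [
        [max(fm[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, w - size + 1, stride)]
        for i in range(0, h - size + 1, stride)
    ]

fm = [[i * 6 + j for j in range(6)] for i in range(6)]  # 6x6 = 36 values
out = maxpool(fm)                                       # 3x3 = 9 values

print(len(fm) * len(fm[0]), "->", len(out) * len(out[0]))  # 36 -> 9
```

36 values in, 9 out: each subsequent layer sees a quarter of the data, which is exactly the speed/information trade-off the answer describes.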
For this reason (amongst others) there's been a recent trend of throwing away max pooling layers (such as in convolutional nets with residual connections).