You are correct that XGBoost ('eXtreme Gradient Boosting') and sklearn's GradientBoosting are fundamentally the same, as they are both gradient boosting implementations.
However, there are significant practical differences under the hood. XGBoost is a lot faster than sklearn's (see http://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/). XGBoost is also quite memory-efficient and can be parallelized; I believe sklearn's cannot parallelize tree construction by default, and while I don't know sklearn's memory-efficiency exactly, I am fairly confident it is below XGBoost's.
Having used both, I can say that XGBoost's speed is quite impressive and that its performance is superior to sklearn's GradientBoosting.
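To give a feel for the difference, here is a minimal sketch of how one might compare the two (not a rigorous benchmark; the dataset size and hyperparameters are arbitrary illustrative choices):

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)

models = {
    "sklearn GradientBoosting": GradientBoostingClassifier(
        n_estimators=100, max_depth=3
    ),
    # n_jobs=-1 lets XGBoost use all available cores for tree construction;
    # sklearn's GradientBoostingClassifier offers no equivalent option.
    "XGBoost": XGBClassifier(n_estimators=100, max_depth=3, n_jobs=-1),
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```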
For a regression, the loss of each point in a node is
$\frac{1}{2}(y_i - \hat{y_i})^2$
The second derivative of this expression with respect to $\hat{y_i}$ is $1$. So when you sum the second derivative over all points in the node, you get the number of points in the node. Here, min_child_weight means something like "stop trying to split once your sample size in a node goes below a given threshold".
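To make that concrete, here is a sketch of a custom squared-error objective written in XGBoost's (grad, hess) convention (the data and parameter values are arbitrary illustrations): since the hessian is identically 1, the sum of instance hessians that min_child_weight thresholds is literally the node's sample count.

```python
import numpy as np
import xgboost as xgb

def squared_error(preds, dtrain):
    """Custom objective 0.5 * (y - yhat)^2 in XGBoost's (grad, hess) form."""
    labels = dtrain.get_label()
    grad = preds - labels        # d/dyhat of 0.5 * (y - yhat)^2
    hess = np.ones_like(preds)   # second derivative is identically 1
    return grad, hess

# Toy regression data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1_000)
dtrain = xgb.DMatrix(X, label=y)

# With an all-ones hessian, min_child_weight=10 simply forbids splits
# that would leave a child node with fewer than 10 training points.
bst = xgb.train({"min_child_weight": 10, "max_depth": 6}, dtrain,
                num_boost_round=10, obj=squared_error)
```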
For a binary logistic regression, the hessian for each point in a node is going to contain terms like
$\sigma(\hat{y_i})(1 - \sigma(\hat{y_i}))$
where $\sigma$ is the sigmoid function. Say you're at a pure node (e.g., all of the training examples in the node are 1's). Then all of the $\hat{y_i}$'s will probably be large positive numbers, so all of the $\sigma(\hat{y_i})$'s will be near 1, so all of the hessian terms will be near 0. Similar logic holds if all of the training examples in the node are 0. Here, min_child_weight means something like "stop trying to split once you reach a certain degree of purity in a node and your model can fit it".
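A quick numeric illustration of that collapse (plain numpy; the raw scores are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Raw (pre-sigmoid) scores for points in a hypothetical node,
# from uncertain (0.0) to very confident (8.0).
raw_scores = np.array([0.0, 2.0, 4.0, 8.0])
p = sigmoid(raw_scores)
hess = p * (1.0 - p)   # per-point hessian term for logistic loss

for z, h in zip(raw_scores, hess):
    print(f"yhat = {z:4.1f}  ->  hessian term = {h:.5f}")
# yhat =  0.0  ->  hessian term = 0.25000
# yhat =  2.0  ->  hessian term = 0.10499
# yhat =  4.0  ->  hessian term = 0.01766
# yhat =  8.0  ->  hessian term = 0.00034
```

So a nearly pure node holding 100 such confident points has a hessian sum of roughly 0.03, below even XGBoost's default min_child_weight of 1, and splitting stops despite the large sample size.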
The Hessian's a sane thing to use for regularization and limiting tree depth. For regression, it's easy to see how you might overfit if you're always splitting down to nodes with, say, just 1 observation. Similarly, for classification, it's easy to see how you might overfit if you insist on splitting until each node is pure.
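As a hedged illustration of that regularization effect (synthetic data, arbitrary values), raising min_child_weight should shrink the trees:

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=2_000) > 0).astype(int)

for mcw in (1, 10, 100):
    model = XGBClassifier(n_estimators=20, max_depth=6,
                          min_child_weight=mcw).fit(X, y)
    # trees_to_dataframe lists one row per node; leaves have Feature == "Leaf".
    nodes = model.get_booster().trees_to_dataframe()
    n_leaves = (nodes["Feature"] == "Leaf").sum()
    print(f"min_child_weight={mcw:3d}: {n_leaves} leaves across 20 trees")
```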
XGBoost does not produce decision trees with leaf-node values of 0 or 1.
Instead, it uses multiple regression trees with continuous-valued "weights" on their leaf nodes, so the final prediction for binary classification lands in the range 0 to 1. Hence, regularization is applied in the same manner as for comparable regression-based loss functions.
The tree boosting algorithm applies the trees additively, i.e. it sums the leaf weights across all trees to arrive at the final value for a given input.
This continuous-valued "score" has to be interpreted by you as a class label by applying a cutoff, e.g. predicted class = 1 if $\hat y > 0.5$ for a cutoff of 0.5.
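A sketch of that pipeline (illustrative data; output_margin=True exposes the raw additive score before the logistic link):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=20)

# The raw additive score: leaf weights summed over all trees.
margin = bst.predict(dtrain, output_margin=True)

# Map the raw score to a probability in (0, 1) with the sigmoid ...
prob = 1.0 / (1.0 + np.exp(-margin))

# ... and apply your own cutoff to turn the score into a class label.
pred_class = (prob > 0.5).astype(int)

# Sanity check: predict() without output_margin already applies the sigmoid.
assert np.allclose(prob, bst.predict(dtrain))
```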