There are two common ways of utilizing the maximum-margin hyperplane of a trained SVM.
(1) Prediction for new data points
Based on a given training dataset, an SVM hyperplane is fully specified by its weight (normal) vector $w$ and its intercept $b$. (These variable names derive from a tradition established in the neural-networks literature, where the two respective quantities are referred to as 'weight' and 'bias.') As noted before, a new data point $x \in \mathbb{R}^d$ can then be classified as
\begin{align}
f(x) = \textrm{sgn}\left(\langle w,x \rangle + b \right)
\end{align}
where $\langle w,x \rangle$ represents the inner product. Thanks to the Karush-Kuhn-Tucker complementarity conditions, the discriminant function can be rewritten as
\begin{align}
f(x) = \textrm{sgn}\left(\sum_{i \in SV} y_i \alpha_i \langle x_i, x \rangle + b\right),
\end{align}
where the hyperplane is implicitly encoded by the support vectors $x_i$, and where $\alpha_i$ are the support vector coefficients. The support vectors are those training points that lie on (or, in the soft-margin case, within) the margin, i.e., the points closest to the separating hyperplane. Thus, predictions can be made very efficiently, since only inner products (alternatively: kernel functions) between the support vectors and the test point need to be evaluated.
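As an illustration, here is a minimal sketch of both the dual-form and primal-form prediction. It assumes scikit-learn's `SVC` with a linear kernel; the attribute names (`dual_coef_`, `support_vectors_`, `coef_`, `intercept_`) are scikit-learn's, and the data are made up for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data (hypothetical example data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, size=(20, 2)), rng.normal(+1, 1, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear").fit(X, y)

x_new = np.array([0.3, -0.2])

# Dual form: sum over support vectors of (y_i * alpha_i) <x_i, x> + b.
# In scikit-learn, dual_coef_ already stores y_i * alpha_i.
dual_score = clf.dual_coef_ @ clf.support_vectors_ @ x_new + clf.intercept_

# Primal form: <w, x> + b, with w available for linear kernels via coef_.
primal_score = clf.coef_ @ x_new + clf.intercept_

print(np.sign(dual_score), np.sign(primal_score))  # identical predictions
```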
Some have suggested also considering the distance between a new data point $x$ and the hyperplane, as an indicator of how confident the model was in its prediction. However, it is important to note that hyperplane distance itself does not afford inference; there is no probability associated with a new prediction, which is why an SVM is sometimes referred to as a point classifier. If probabilistic output is desired, other classifiers may be more appropriate, e.g., the SVM's probabilistic cousin, the relevance vector machine (RVM).
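Continuing the sketch above, the distinction between the raw hyperplane score and a calibrated probability can be made concrete (again assuming scikit-learn; `probability=True` adds a post-hoc Platt calibration, which is not part of the SVM itself):

```python
# decision_function returns the signed score <w, x> + b, not a probability.
score = clf.decision_function(x_new.reshape(1, -1))

# If probabilistic output is desired, scikit-learn can fit a logistic (Platt)
# calibration on top of the SVM scores; this is post-hoc calibration rather
# than a property of the maximum-margin solution.
clf_prob = SVC(kernel="linear", probability=True).fit(X, y)
p = clf_prob.predict_proba(x_new.reshape(1, -1))
```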
(2) Reconstructing feature weights
There is another way of putting an SVM model to use. In many classification analyses it is interesting to examine which features drove the classifier, i.e., which features played the biggest role in shaping the separating hyperplane. Given a trained SVM model with a linear kernel, these feature coefficients $w_1, \ldots, w_d$ can be reconstructed easily using
\begin{align}
w = \sum_{i=1}^n y_i \alpha_i x_i
\end{align}
where $x_i$ and $y_i$ represent the $i^\textrm{th}$ training example and its corresponding class label, and where $\alpha_i = 0$ for all training points that are not support vectors.
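Continuing the scikit-learn sketch from part (1) (attribute names are scikit-learn's, not part of the text above), the weight vector can be reconstructed from the support vectors and checked against the fitted coefficients:

```python
# Reconstruct w = sum_i y_i * alpha_i * x_i from the support vectors.
# dual_coef_ stores y_i * alpha_i, so this is a single matrix product.
w_reconstructed = clf.dual_coef_ @ clf.support_vectors_

# For a linear kernel, scikit-learn also exposes w directly as coef_.
assert np.allclose(w_reconstructed, clf.coef_)
```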
An important caveat of this approach is that the resulting feature weights are simple numerical coefficients without inferential quality; there is no measure of confidence associated with them. Thus, we cannot readily argue that some features were 'more important' than others, and we cannot infer that a feature with a particularly low coefficient was 'not important' in the classification problem. In order to allow for inference on feature weights, we would need to resort to more general-purpose approaches, such as the bootstrap, a permutation test, or a feature-selection algorithm embedded in a cross-validation scheme.
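One of the general-purpose approaches mentioned above is a permutation test on the features. A minimal sketch, assuming scikit-learn's `permutation_importance` and the hypothetical data from the earlier sketch, could look like this (in practice one would use held-out data rather than the training set):

```python
from sklearn.inspection import permutation_importance

# Permute each feature in turn and measure the drop in accuracy; the spread
# across repeats gives a rough measure of uncertainty for each feature.
result = permutation_importance(clf, X, y, n_repeats=100, random_state=0)
print(result.importances_mean, result.importances_std)
```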
The RVM places an Automatic Relevance Determination (ARD) prior on the weights in a regularized regression/logistic regression setup. (The ARD prior is just a weak Gamma prior on the precision of a Gaussian random variable.) Marginalizing out the weights and maximizing the likelihood of the data with respect to the precisions causes many of the precision parameters to become large, which pushes the associated weights to zero. If the columns of the design matrix are basis functions centred on the training points (as in the original RVM), this strategy selects a small set of examples (the 'relevance vectors') that predict the target variable well.
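A rough illustration of the ARD pruning behaviour, using scikit-learn's `ARDRegression` on made-up data; note this is ARD-regularized linear regression rather than a full RVM, so it is only a sketch of the weight-pruning effect described above:

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features actually matter (hypothetical data).
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

ard = ARDRegression().fit(X, y)
# Most coefficients are driven (near) to zero: their precisions became large.
print(np.round(ard.coef_, 3))
```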
The IVM (Informative Vector Machine) strategy is fundamentally different from the RVM's strategy. The IVM is a Gaussian Process method that selects a small set of points from the training set using a greedy selection criterion (based on the change in entropy of the posterior GP) and combines this with standard GP regression/classification on the sparse set of points; a simplified sketch of the greedy step is given below.
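For intuition only, here is a heavily simplified sketch of that greedy step. It is not the full IVM (which uses assumed-density-filtering updates); instead, for GP regression it picks the candidate whose inclusion would most reduce the posterior entropy, which for a Gaussian posterior amounts to picking the point with the largest current posterior variance. All data and function names are hypothetical.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def greedy_informative_points(X, n_points, noise=0.1):
    """Greedily pick points with the largest GP posterior predictive variance
    (a simplified entropy-change criterion for GP regression)."""
    selected = []
    for _ in range(n_points):
        if not selected:
            var = np.diag(rbf(X, X)) + noise**2
        else:
            S = X[selected]
            K_ss = rbf(S, S) + noise**2 * np.eye(len(selected))
            K_xs = rbf(X, S)
            var = np.diag(rbf(X, X)) + noise**2 - np.einsum(
                "ij,ji->i", K_xs, np.linalg.solve(K_ss, K_xs.T))
        var[selected] = -np.inf          # do not pick the same point twice
        selected.append(int(np.argmax(var)))
    return selected

# Hypothetical usage on random inputs.
X = np.random.default_rng(0).normal(size=(200, 2))
print(greedy_informative_points(X, n_points=10))
```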
Unlike the SVM's support vectors, the IVM's informative vectors and the RVM's relevance vectors have no obvious geometric interpretation. Both algorithms find sparse solutions to regression/classification problems (the SVM and IVM are sparse in the dual, whereas the RVM is better thought of as sparse in the primal), but they arrive at that sparsity in different ways.
Best Answer
You are right if you are talking about a hard-margin SVM and the two classes are linearly separable. LR finds any solution that separates the two classes; the hard-margin SVM finds "the" one solution, among all separating hyperplanes, that has the maximum margin.
In the case of a soft-margin SVM, where the classes need not be linearly separable, you are still right with a slight modification: the error can no longer be driven to zero. LR finds a hyperplane that minimizes some error; the soft-margin SVM minimizes a different error (the hinge loss) and at the same time trades that error off against the margin via a regularization parameter, as written out below.
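To make the trade-off concrete, the two training objectives can be written in a standard form (with $C$ and $\lambda$ as the respective regularization parameters):
\begin{align}
\textrm{soft-margin SVM:} \quad \min_{w,b} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^n \max\left(0,\, 1 - y_i(\langle w, x_i\rangle + b)\right) \\
\textrm{regularized LR:} \quad \min_{w,b} \; \lambda \|w\|^2 + \sum_{i=1}^n \log\left(1 + e^{-y_i(\langle w, x_i\rangle + b)}\right)
\end{align}
The SVM's hinge loss is exactly zero for points beyond the margin, which is what makes the solution sparse in the training examples; the logistic loss never vanishes, so every training point influences the LR solution.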
One further difference between the two: the SVM is a hard classifier, whereas LR is a probabilistic one. The SVM is sparse: it chooses the support vectors (from the training samples) that have the most discriminatory power between the two classes. Since no other training points are kept at test time, we have no idea about the distribution of either of the two classes.
I have explained how the LR solution (using IRLS) breaks down when the two classes are linearly separable, and why it stops being a probabilistic classifier in that case: https://stats.stackexchange.com/a/133292/66491