What you are asking doesn't really fall into the standard SVM framework. There is some work on incorporating prior knowledge into SVMs (see e.g. here), but these approaches generally do not operate on an example-by-example basis.
I can think of one way in which you could approach this if you have a lot of samples. You could use the weights as probabilities for inclusion in random subsets: draw several subsets in which each example is included with probability proportional to its weight, learn an SVM on each subset, and take the final classifier to be a linear combination of the resulting SVMs. This is a variation on bagging (bootstrap aggregation), which normally resamples the examples uniformly (see e.g. here), and might be quite interesting to analyse.
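A rough sketch of that idea, assuming the per-example weights are in a NumPy array `weights` (a name I am making up) and using scikit-learn's `SVC`; each subset is drawn with inclusion probabilities proportional to the weights, and the ensemble averages the individual decision functions:

```python
import numpy as np
from sklearn.svm import SVC

def weighted_subset_ensemble(X, y, weights, n_models=25, subset_frac=0.5, C=1.0):
    """Train one linear SVM per random subset, drawn according to the weights."""
    rng = np.random.RandomState(0)
    p = weights / weights.sum()                  # inclusion probabilities
    size = int(subset_frac * len(y))
    models = []
    for _ in range(n_models):
        # note: a subset could end up containing a single class; a real
        # implementation would need to guard against that
        idx = rng.choice(len(y), size=size, replace=False, p=p)
        models.append(SVC(kernel="linear", C=C).fit(X[idx], y[idx]))
    return models

def ensemble_decision(models, X):
    """Final classifier: average of the individual decision values."""
    return np.mean([m.decision_function(X) for m in models], axis=0)
```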
[Edit 1]:
Based on the answers from Jeff and Dikran, it occurred to me that you can just incorporate the confidence values directly into the SVM objective. Normally the primal form looks like:
$\min_{\mathbf{w},\mathbf{\xi}, b } \left\{\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i \right\}$
subject to (for any $i=1,\dots n$)
$y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1 - \xi_i, ~~~~\xi_i \ge 0 .$
but you could just include another vector of confidence values, e.g. $0 < \delta_i \leq 1, ~~~~i=1,\dots n$:
$\min_{\mathbf{w},\mathbf{\xi}, b } \left\{\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^n \frac{\xi_i}{\delta_i} \right\}$
subject to (for any $i=1,\dots n$)
$y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1 - \xi_i, ~~~~\xi_i \ge 0 .$
which would mean that instances with low confidence receive a greater penalty on their slack in the objective. Note that the $C$ parameter now performs two roles: as a regulariser and as a scaling factor for the confidence scores. This may cause problems of its own, so it might be better to split it into two parameters, but then of course you would have an extra hyperparameter to tune.
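For what it's worth, scikit-learn's `SVC` already accepts a per-sample weight that rescales $C$ for each example, so a minimal sketch of the above, taking `delta` to be the vector of confidence values and weighting each example by $1/\delta_i$, could look like:

```python
import numpy as np
from sklearn.svm import SVC

# toy data; in practice X, y and the confidences delta come from your problem
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + 2, rng.randn(50, 2) - 2])
y = np.hstack([np.ones(50), -np.ones(50)])
delta = rng.uniform(0.2, 1.0, size=100)      # confidences in (0, 1]

# sample_weight rescales C per example, i.e. C_i = C * sample_weight_i,
# so 1/delta reproduces the C/delta_i weighting in the objective above
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=1.0 / delta)
```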
[Edit 2]:
This can be done with libSVM (MATLAB and Python interfaces are included). There is also code available in several languages for the SMO algorithm, which solves the SVM problem efficiently. Alternatively you could use an optimisation package, such as quadprog in MATLAB or CVX, to write a custom solver.
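To illustrate the custom-solver route, here is a rough sketch of the modified primal written with CVXPY (a Python relative of CVX); `delta` is again the assumed vector of confidence values:

```python
import cvxpy as cp
import numpy as np

def weighted_svm_primal(X, y, delta, C=1.0):
    """Solve the soft-margin primal with per-example penalty C * xi_i / delta_i."""
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    xi = cp.Variable(n, nonneg=True)             # slack variables, xi_i >= 0
    margin = cp.multiply(y, X @ w - b)           # y_i (w . x_i - b)
    objective = cp.Minimize(0.5 * cp.sum_squares(w)
                            + C * cp.sum(cp.multiply(1.0 / delta, xi)))
    problem = cp.Problem(objective, [margin >= 1 - xi])
    problem.solve()
    return w.value, b.value
```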
Classifiers usually try to find the best fit for all the data. In the case of imbalance, where you have many more negative than positive samples, the classifier will pay more attention to the negative class in order to obtain a small overall error. Imbalance can be intrinsic or extrinsic: intrinsic imbalance is caused directly by the nature of the data space (e.g. rare diseases), whereas extrinsic imbalance results from certain limitations (time, space, money, etc.) and the data space is in reality not imbalanced. In addition, it might happen that only the training or only the testing data set is imbalanced. Personally, I would start with stratified cross-validation, which ensures that the ratio between the positive and negative class is the same in each fold as in the overall data set.
There are several methods to address the imbalance itself. A simple one is to increase the weight of samples from the positive class relative to the negative class, which makes the classifier cost-sensitive (a sketch of this, together with stratified cross-validation, follows the references below). An introduction to the available methods can be found in:
- He, H., & Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263-1284.
- Guo, X., Yin, Y., Dong, C., Yang, G., & Zhou, G. (2008). On the Class Imbalance Problem. 2008 Fourth International Conference on Natural Computation (pp. 192-201).
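As a sketch of the two suggestions above (stratified folds plus a cost-sensitive SVM) using scikit-learn; the estimator and parameter choices are only placeholders:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score

def imbalanced_cv(X, y, n_splits=5, C=1.0):
    """Stratified CV with a heavier penalty on the minority class."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        # class_weight="balanced" sets class weights inversely
        # proportional to the class frequencies in the training fold
        clf = SVC(kernel="rbf", C=C, class_weight="balanced")
        clf.fit(X[train_idx], y[train_idx])
        scores.append(balanced_accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return np.mean(scores)
```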
Best Answer
If you are using a hard margin, there is no difference, because the maximum-margin separator is the same with or without the duplicates.
If you are using soft margins, then duplicating a data point can matter since the penalty is a sum over data points within the margin, and duplicating these data points affects the size of the penalty.
Here are $1$-dimensional pictures showing what might be the best soft-margin classifiers without and with duplication.
$XXX~~~~~~~~~~X~|~~~~~~~~~~~OOOO$
$XXX~~~~~~XXX~~~~~|~~~~~OOOO$
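A small numerical sketch of the same effect, using scikit-learn on made-up 1-D data: with an effectively hard margin (very large $C$) the boundary should stay put when the point nearest the margin is duplicated, whereas with a soft margin it should shift towards the positive class:

```python
import numpy as np
from sklearn.svm import SVC

def boundary_1d(X, y, C):
    """Fit a linear SVM on 1-D data and return the decision boundary."""
    clf = SVC(kernel="linear", C=C).fit(X.reshape(-1, 1), y)
    return -clf.intercept_[0] / clf.coef_[0, 0]

# negatives (X) and positives (O), mirror-symmetric about 1.5
X = np.array([0.0, 0.2, 0.4, 1.0, 2.0, 2.6, 2.8, 3.0])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# duplicate the negative point at 1.0 three more times
X_dup = np.concatenate([X, np.full(3, 1.0)])
y_dup = np.concatenate([y, np.full(3, -1)])

for C in (1.0, 1e6):                         # soft vs (effectively) hard margin
    print(f"C={C}: no duplication {boundary_1d(X, y, C):.3f}, "
          f"with duplication {boundary_1d(X_dup, y_dup, C):.3f}")
```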