What you are asking doesn't really fall within the standard SVM framework. There is some work on incorporating prior knowledge into SVMs (see e.g. here), but these approaches generally do not operate on an example-by-example basis.
I can think of one way in which you could approach this, if you have a lot of samples. You could use the weights as probabilities for inclusion in random subsets. You would then learn an SVM on each subset, and your final classifier is a linear combination of the classifiers trained on these subsets. This is a variation on bagging (bootstrap aggregating), which normally resamples the examples uniformly (see e.g. here), and might be quite interesting to analyse.
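The subsampling idea above can be sketched roughly as follows, using scikit-learn's `SVC`; the function names, subset size, and ensemble size are my own illustrative choices, not part of the answer:

```python
import numpy as np
from sklearn.svm import SVC

def weighted_subsample_ensemble(X, y, weights, n_models=25, subset_frac=0.5, seed=0):
    """Fit one SVM per random subset, sampling points with probability
    proportional to their weight (a bagging-style variation)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                        # normalise weights into probabilities
    n = len(y)
    k = max(2, int(subset_frac * n))
    models = []
    for _ in range(n_models):
        idx = rng.choice(n, size=k, replace=True, p=p)
        if len(np.unique(y[idx])) < 2:     # an SVM needs both classes present
            continue
        models.append(SVC(kernel="rbf").fit(X[idx], y[idx]))
    return models

def ensemble_predict(models, X):
    # Final classifier: sign of the averaged decision functions
    scores = np.mean([m.decision_function(X) for m in models], axis=0)
    return np.where(scores >= 0, 1, -1)
```

With uniform weights this reduces to ordinary bagging of SVMs; skewed weights bias the subsets towards high-confidence examples.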
[Edit 1]:
Based on the answers from Jeff and Dikran, it occurred to me that you can incorporate the weights directly into the SVM objective. Normally the primal form looks like:
$\min_{\mathbf{w},\mathbf{\xi}, b } \left\{\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i \right\}$
subject to (for any $i=1,\dots n$)
$y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1 - \xi_i, ~~~~\xi_i \ge 0 .$
but you could just include another vector of confidence values, e.g. $0 < \delta_i \leq 1, ~~~~i=1,\dots n$:
$\min_{\mathbf{w},\mathbf{\xi}, b } \left\{\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^n \frac{\xi_i}{\delta_i} \right\}$
subject to (for any $i=1,\dots n$)
$y_i(\mathbf{w}\cdot\mathbf{x_i} - b) \ge 1 - \xi_i, ~~~~\xi_i \ge 0 .$
which means that slack on instances with low confidence receives a greater penalty in the objective. Note that the $C$ parameter now performs two roles: as a regulariser and as a scaling factor for the confidence scores. This may cause problems of its own, so it might be better to split it into two parameters, but then of course you would have an extra hyperparameter to tune.
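As a practical shortcut, scikit-learn's `SVC.fit` accepts a `sample_weight` argument that rescales $C$ per instance, so passing $1/\delta_i$ reproduces the $C/\delta_i$ penalty above without writing a solver. The data here is synthetic, purely for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.5, 1.0, (40, 2)), rng.normal(1.5, 1.0, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
delta = rng.uniform(0.2, 1.0, size=80)   # per-instance confidences in (0, 1]

# sample_weight rescales C per instance (effective C_i = C * sample_weight[i]),
# so weight 1/delta_i gives the C/delta_i slack penalty from the objective.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=1.0 / delta)
```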
[Edit 2]:
This can be done with libSVM (MATLAB and Python interfaces are included). Code for the SMO algorithm, which solves the SVM problem efficiently, is also available in several languages. Alternatively, you could use an optimisation package, such as quadprog in MATLAB or CVX, to write a custom solver.
There are plenty of possibilities for constructing one-class classifiers. I wrote a number of simple algorithms in the context of authorship verification, where only positive samples of one author X are given and the task is to judge whether a given document was written by X. However, the approach can be adapted to fields other than authorship verification simply by adjusting the features. Here are two of my papers:
Oren Halvani, Lukas Graner, Inna Vogel. Authorship Verification in the Absence of Explicit Features and Thresholds. In: Pasi G., Piwowarski B., Azzopardi L., Hanbury A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_34
O. Halvani and M. Steinebach, "An Efficient Intrinsic Authorship Verification Scheme Based on Ensemble Learning," 2014 Ninth International Conference on Availability, Reliability and Security, 2014, pp. 571-578, doi: 10.1109/ARES.2014.84.
Best Answer
If the goal is to determine for new samples whether you can apply the classifier you've already built, then the correct answer is to use a one-class SVM (as implemented here). A one-vs-all scheme, which is what your second idea is known as, is useful for extending binary classifiers to a multiclass setting, but it cannot tell you whether you are seeing outliers.
edit: As for the why: there may be features of your data in these 10 classes that make them distinct from the unobserved classes but are not predictive of any single class. These would be missed by a one-vs-all scheme, but would be taken into account (depending on things like the kernel you use) by a one-class SVM.
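A minimal sketch of this check using scikit-learn's `OneClassSVM`; the data and hyperparameters here are illustrative assumptions, not from the answer:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (200, 2))            # samples from the known classes
X_new = np.vstack([rng.normal(0, 1, (5, 2)),    # in-distribution points
                   rng.normal(8, 1, (5, 2))])   # points from an unseen class

# nu upper-bounds the fraction of training points treated as outliers
occ = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
flags = occ.predict(X_new)                      # +1 = like training data, -1 = outlier
```

New samples flagged `-1` should not be passed to the existing 10-class classifier.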