The Spy EM algorithm solves exactly this problem.
S-EM is a text learning and classification system that learns from a set of positive and unlabeled examples (no negative examples). It is based on a "spy" technique, the naive Bayes classifier, and the EM algorithm.
The basic idea is to combine your positive set with a whole bunch of randomly crawled documents. You initially treat all the crawled documents as the negative class and learn a naive Bayes classifier on that set. Now, some of those crawled documents will actually be positive, so you can conservatively relabel any documents that score higher than the lowest-scoring true positive document. Then you iterate this process until it stabilizes.
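If you wanted to prototype this yourself, a minimal sketch of that relabeling loop might look like the following, using scikit-learn's MultinomialNB. The function name, the dense count matrices `X_pos` / `X_unl`, and the stopping rule are illustrative assumptions; the real S-EM system additionally plants "spy" documents in the unlabeled set to set the threshold, which this sketch skips.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def relabel_unlabeled(X_pos, X_unl, max_iter=20):
    """Iteratively relabel crawled documents that score at least as well as
    the worst-scoring true positive, roughly as described above.

    X_pos, X_unl: dense non-negative document-term count matrices
    (rows = documents, columns = vocabulary terms).
    Returns a boolean mask over X_unl marking documents relabeled as positive.
    """
    relabeled = np.zeros(X_unl.shape[0], dtype=bool)    # start: every crawled doc is "negative"

    for _ in range(max_iter):
        if relabeled.all():                              # degenerate case: no negatives left to train on
            break
        X = np.vstack([X_pos, X_unl])
        y = np.concatenate([np.ones(X_pos.shape[0], dtype=int),  # true positives
                            relabeled.astype(int)])              # current guesses for the rest

        clf = MultinomialNB().fit(X, y)
        p_pos = clf.predict_proba(X)[:, 1]               # classes_ is [0, 1], so column 1 is "positive"

        # Conservative threshold: the lowest score among the true positives.
        threshold = p_pos[:X_pos.shape[0]].min()
        new_labels = p_pos[X_pos.shape[0]:] >= threshold

        if np.array_equal(new_labels, relabeled):        # labels stopped changing: stabilized
            return new_labels
        relabeled = new_labels

    return relabeled
```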
If you're doing this yourself, as opposed to using a package, it's fairly straightforward to do all three of these things. If you're using an off-the-shelf implementation, whether it's possible depends on what you're using. In these explanations I'm going to assume the attributes take categorical values (as most simple versions of NB do), so a continuous-valued feature has to be discretized into bins. I'll describe a single continuous-valued feature (say $f_w$, the frequency of some word in your text, normalised by the document length), with three bins in the histogram:
very rare: $0 \le f_w < 0.001$
rare: $0.001 \le f_w < 0.01$
frequent: $0.01 \le f_w \le 1$
Thus for our word $w$, its feature value always falls in exactly one of these three intervals. Now to answer your questions:
1) The parameters can be updated as you see new examples: for each class, maintain counts over the three bins across all the documents you've seen. Each time you see a document, increment the count of the bin its feature value falls in; the probability of each bin for subsequent documents is then its count divided by the sum of the counts (see the sketch after this list).
2) Technically, the NB model is the likelihood: you train a model as above for each class. Multiply the likelihood by the class prior to get the posterior probability of the class, but be aware that in NB the likelihoods often swamp the priors, because the independence assumption leads to very sharp distributions (see this paper by Hand and Yu).
3) Easy: just change the feature $f_w$ from the normalised frequency of $w$ to the tf-idf of $w$. Be aware that you'll need to specify new, sensible bins for your histogram if you stick with the categorical approach (the alternative is to specify continuous distributions for your features, but it's tricky to come up with good ones).
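To make points 1) and 2) concrete, here's a minimal sketch in plain Python for the single binned feature above; the class labels, smoothing constant, and example frequencies are made up. For point 3) you'd keep the same machinery, feed in tf-idf values instead of normalised frequencies, and choose new bin edges.

```python
from collections import defaultdict

def bin_of(f):
    """Map a feature value in [0, 1] to one of the three bins defined above."""
    if not 0.0 <= f <= 1.0:
        raise ValueError(f"feature value {f} outside [0, 1]")
    if f < 0.001:
        return "very rare"
    if f < 0.01:
        return "rare"
    return "frequent"

class OneFeatureNB:
    """Naive Bayes over a single binned feature, updated one document at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                        # Laplace smoothing
        self.bin_counts = defaultdict(lambda: defaultdict(int))   # class -> bin -> count
        self.class_counts = defaultdict(int)                      # class -> number of docs seen

    def update(self, f_w, label):
        """Point 1): seeing a new document just increments two counters."""
        self.class_counts[label] += 1
        self.bin_counts[label][bin_of(f_w)] += 1

    def likelihood(self, f_w, label):
        """P(bin | class): count divided by the sum of counts (smoothed)."""
        b = bin_of(f_w)
        num = self.bin_counts[label][b] + self.alpha
        den = self.class_counts[label] + self.alpha * 3   # three bins
        return num / den

    def posterior(self, f_w):
        """Point 2): multiply likelihood by the class prior, then normalise."""
        total = sum(self.class_counts.values())
        scores = {c: (n / total) * self.likelihood(f_w, c)
                  for c, n in self.class_counts.items()}
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}

# Illustrative usage with made-up normalised frequencies:
model = OneFeatureNB()
model.update(0.0005, "neg")
model.update(0.02, "pos")
model.update(0.015, "pos")
print(model.posterior(0.03))
```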
Best Answer
Unfortunately, there is no set answer: you have to try what works for your given problem (start with whatever's easiest). What works can also vary by topic.
My favorite example of this is Joachims 98.
The paper compares algorithms and averages across several feature selection settings, but my point is that if you look at Figure 2, Naive Bayes works really well for some topics and really poorly for others.
I generally start by taking the top 20% of terms by TF-IDF across all classes and using Naive Bayes to get a baseline of performance for each class. This is quick, and often all I need in my domain, insurance. Then you may want to dig deeper on any classes that perform poorly; like you said, maybe compute TF-IDF within the class and look at the terms you can leverage.
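As a rough sketch of that kind of baseline with scikit-learn (the toy corpus, the 20% cut-off, and the scoring metric are just placeholders for your own data and evaluation):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# Tiny made-up corpus standing in for your real documents and labels.
texts = [
    "policy premium auto collision claim",
    "auto accident claim adjuster estimate",
    "auto glass windshield repair claim",
    "home water damage claim roof leak",
    "homeowner policy roof hail damage",
    "home fire damage smoke claim",
]
labels = ["auto", "auto", "auto", "home", "home", "home"]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(texts)

# Rank terms by mean TF-IDF over the whole corpus and keep the top 20%.
mean_tfidf = np.asarray(X.mean(axis=0)).ravel()
n_keep = max(1, int(0.2 * len(mean_tfidf)))          # keep at least one term
keep = np.argsort(mean_tfidf)[-n_keep:]
X_top = X[:, keep]

# Quick Naive Bayes baseline; inspect per-class scores to find the weak classes.
nb = MultinomialNB()
print(cross_val_score(nb, X_top, labels, cv=3, scoring="f1_macro").mean())
```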
Looking at the terms by class can really help. One time I noticed that medical terms were important in one particular class, so I downloaded a list of medical terms, turned it into a regular expression, and used it to set a flag on the documents, which really improved classification for that class.
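The flag itself can be as simple as a regex hit; something like the following, where the term list is obviously a made-up stand-in for whatever list you download:

```python
import re

# Hypothetical stand-in for a downloaded list of medical terms.
medical_terms = ["myocardial", "laceration", "contusion", "fracture", "sepsis"]
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, medical_terms)) + r")\b",
                     re.IGNORECASE)

def medical_flag(doc):
    """1 if the document mentions any term from the list, else 0; append it as an extra feature."""
    return int(bool(pattern.search(doc)))

print(medical_flag("Patient suffered a fracture of the left wrist."))  # -> 1
```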
Obviously, this is very domain- and topic-specific, and it also depends on your domain expertise. That's the way it goes with text classification in my experience: there is no standard answer for what will work for any given problem. You may have to try several solutions and stop when performance is adequate.