The SARSA algorithm is a stochastic approximation to the Bellman equations for Markov Decision Processes. One way of writing the Bellman equation for $q_{\pi}(s,a)$ is:
$$q_{\pi}(s,a) = \sum_{s',r}p(s',r|s,a)( r + \gamma \sum_{a'}\pi(a'|s') q_{\pi}(s',a'))$$
where $p(s',r|s,a)$ is the probability of transitioning to state $s'$ with reward $r$, given current state $s$ and action $a$.
In Dynamic Programming solutions to MDPs, the Bellman equation is used directly as an update mechanism that converges to the correct values of $q_{\pi}(s,a)$. For evaluating a fixed policy this works because the only stationary points of the updates are where the two sides are equal, and there is one equation for each $q_{\pi}(s,a)$ relating it linearly to one or more other values $q_{\pi}(s',a')$, so the system has a computable solution.
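As a rough illustration, here is a minimal sketch of iterative policy evaluation for $q_{\pi}$ using the Bellman equation as an update. The data structures are assumptions for the sake of the example: `dynamics[(s, a)]` is taken to be a list of `(prob, s_next, reward)` tuples describing $p(s',r|s,a)$, and `policy[s]` an array giving $\pi(\cdot|s)$.

```python
import numpy as np

def evaluate_policy_q(dynamics, policy, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation for q_pi, assuming known MDP dynamics.

    dynamics[(s, a)] : list of (prob, s_next, reward) tuples (hypothetical format)
    policy[s]        : array of probabilities pi(a | s) over actions
    """
    q = np.zeros((n_states, n_actions))
    while True:
        delta = 0.0
        for s in range(n_states):
            for a in range(n_actions):
                # Right-hand side of the Bellman equation for q_pi(s, a)
                new_q = sum(
                    prob * (reward + gamma * np.dot(policy[s_next], q[s_next]))
                    for prob, s_next, reward in dynamics[(s, a)]
                )
                delta = max(delta, abs(new_q - q[s, a]))
                q[s, a] = new_q
        if delta < tol:  # stationary point: both sides of the equation (approximately) agree
            return q
```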
When you add the search for an optimal policy, as in SARSA, the policy $\pi$ changes. There is a proof that changing the policy to pick the action $\pi'(s) = \arg\max_a q_{\pi}(s,a)$ will always either strictly improve the policy or leave it unchanged if it is already optimal. This is called the Policy Improvement Theorem, and it is based on the inequality $v_{\pi}(s) \le q_{\pi}(s,\pi'(s))$. There is an extension of the theorem that covers the $\epsilon$-greedy policies used in learners such as Monte Carlo Control or SARSA.
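A sketch of one improvement step on a tabular action-value array, assuming `q` has shape `(n_states, n_actions)`; the function name and $\epsilon$ handling are illustrative rather than a prescribed interface:

```python
import numpy as np

def improved_policy(q, epsilon=0.0):
    """Return a new policy that is epsilon-greedy with respect to q.

    With epsilon=0 this is exactly pi'(s) = argmax_a q(s, a) from the
    Policy Improvement Theorem; epsilon > 0 gives the epsilon-greedy
    variant covered by the extension mentioned above.
    """
    n_states, n_actions = q.shape
    policy = np.full((n_states, n_actions), epsilon / n_actions)
    greedy_actions = np.argmax(q, axis=1)
    policy[np.arange(n_states), greedy_actions] += 1.0 - epsilon
    return policy
```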
TD learning, including SARSA and Q-learning, uses the ideas of Dynamic Programming in a sample-based setting where the equalities hold in expectation. You can see how the Bellman equation $q_{\pi}(s,a) = \sum_{s',r}p(s',r|s,a)\left( r + \gamma \sum_{a'}\pi(a'|s') q_{\pi}(s',a')\right)$ turns into SARSA's update:

- The weighted sum over state transition and reward probabilities happens in expectation as you take many samples, so $Q(S,A) = \mathbb{E}\left[R + \gamma \sum_{a'}\pi(a'|S') Q(S',a')\right]$ (technically $R$ and $S'$ have to be sampled together).
- Likewise, the weighting by the current policy happens in expectation when $A'$ is sampled from $\pi(\cdot|S')$, so $Q(S,A) = \mathbb{E}\left[R + \gamma Q(S',A')\right]$.
- To turn this expectation into an incremental update, allowing for non-stationarity as the policy improves over time, we add a learning rate $\alpha$ and move each estimate towards the latest sampled value: $Q(S,A) \leftarrow Q(S,A) + \alpha\left[R + \gamma Q(S',A') - Q(S,A)\right]$
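Putting those pieces together, here is a minimal sketch of one tabular SARSA episode. It assumes a Gymnasium-style environment interface (`reset()` returning `(state, info)` and `step()` returning `(state, reward, terminated, truncated, info)`); the function names are illustrative only.

```python
import numpy as np

def epsilon_greedy(q, state, epsilon, rng):
    # Behaviour policy: random action with probability epsilon, otherwise greedy.
    if rng.random() < epsilon:
        return int(rng.integers(q.shape[1]))
    return int(np.argmax(q[state]))

def sarsa_episode(env, q, alpha=0.1, gamma=0.99, epsilon=0.1, rng=None):
    """Run one episode of tabular SARSA, updating q in place.

    env is assumed to expose a Gymnasium-style reset()/step() API.
    """
    rng = np.random.default_rng() if rng is None else rng
    state, _ = env.reset()
    action = epsilon_greedy(q, state, epsilon, rng)
    done = False
    while not done:
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        next_action = epsilon_greedy(q, next_state, epsilon, rng)
        # Sample-based version of the Bellman backup:
        # Q(S,A) <- Q(S,A) + alpha * [R + gamma * Q(S',A') - Q(S,A)]
        target = reward + (0.0 if done else gamma * q[next_state, next_action])
        q[state, action] += alpha * (target - q[state, action])
        state, action = next_state, next_action
    return q
```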
For a more thorough explanation of the building blocks of algorithms like SARSA and Q-learning, you can read *Reinforcement Learning: An Introduction* by Sutton and Barto. For a more concise and mathematically rigorous treatment, you can read *Algorithms for Reinforcement Learning* by Csaba Szepesvári.
Best Answer
I agree with what Sean said. I'll add a little to answer the question you asked more concretely:
The SARSA update rule can converge to different values than the Q-learning rule (which is, like Sean said, essentially what you suggested). This is due to the difference between on-policy and off-policy that he also described. An on-policy algorithm (like the SARSA update rule) converges to the optimal values for the policy that your agent is also using to gather experience. Off-policy algorithms converge to values for a policy that is different from the policy being followed by the agent to gather experience.
The behaviour policy (the one that the agent uses to gather experience) is typically going to be something like $\epsilon$-greedy, where with some nonzero probability you select suboptimal (random) actions. An on-policy algorithm like SARSA takes this into account: it converges to values that are still correct given the knowledge that your agent is sometimes going to be "stupid" and do something random. Q-learning (off-policy learning with a purely greedy "target policy", the policy you're computing values for) converges to values that only become correct later on, once your agent switches over to a completely greedy policy.
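The whole difference between the two update rules comes down to which action value is used in the backup target. A short sketch, with illustrative function names, contrasting the two targets for the same sampled transition $(S, A, R, S')$:

```python
import numpy as np

def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action A' actually chosen by the
    # (epsilon-greedy) behaviour policy.
    return reward + gamma * q[next_state, next_action]

def q_learning_target(q, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy action, i.e. the target policy,
    # regardless of what the agent will actually do next.
    return reward + gamma * np.max(q[next_state])
```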
This distinction can be important, for example, in situations where you care about learning "safe" behaviour during the learning process, not just about the optimal behaviour to run after learning is finished. Suppose, for example, that you have a robot that starts near a cliff and needs to walk to another point along the same cliff. With an $\epsilon$-greedy behaviour policy, SARSA learns to keep some distance from the edge, because its values account for the occasional random step, whereas Q-learning values the path right along the edge, which is only safe once exploration is switched off.