The false-position method is another name for regula falsi. The difference from the secant method is that regula falsi maintains a bracketing interval.
Yes, bracketing methods ensure convergence, as they shrink an interval that is known, via the intermediate value theorem, to contain a root.
The secant method, if it converges to a simple root at all, does so superlinearly with the golden ratio $\frac{\sqrt5+1}2=1.6180\ldots$ as its order of convergence.
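This order can be observed numerically. A minimal sketch (the test function $x^2-2$ and the starting points are my arbitrary choices, not from the answer):

```python
import math

def secant(f, x0, x1, steps):
    """Plain secant iteration without any bracketing; returns all iterates."""
    xs = [x0, x1]
    for _ in range(steps):
        f0, f1 = f(xs[-2]), f(xs[-1])
        xs.append(xs[-1] - f1 * (xs[-1] - xs[-2]) / (f1 - f0))
    return xs

root = math.sqrt(2)  # simple root of x^2 - 2
errs = [abs(x - root) for x in secant(lambda x: x * x - 2, 1.0, 2.0, 6)]

# Empirical order p_k = log(e_{k+1}/e_k) / log(e_k/e_{k-1}) drifts
# towards (sqrt(5)+1)/2 = 1.618... before rounding sets in.
orders = [math.log(errs[k + 1] / errs[k]) / math.log(errs[k] / errs[k - 1])
          for k in range(2, 6)]
print(orders)
```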
Bisection, which only uses the length of the bracketing interval, has convergence order $1$, that is, linear convergence, with convergence rate $0.5$ from the halving of the interval in every step.
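The rate $0.5$ can be read off directly from the bracket widths; a minimal sketch (test function again an arbitrary choice):

```python
def bisect_widths(f, a, b, steps):
    """Bisection: keep whichever half-interval still has the sign change."""
    widths = []
    for _ in range(steps):
        m = 0.5 * (a + b)
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
        widths.append(b - a)
    return widths

w = bisect_widths(lambda x: x * x - 2.0, 1.0, 2.0, 20)
print(w[0], w[-1], w[-1] / w[-2])  # each step halves the bracket: ratio 0.5
```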
Regula falsi, in most cases, does not converge in the sense of the interval length shrinking to zero. In such a situation one endpoint of the interval converges to the root linearly, with order $1$, and most likely with a very unfortunate convergence rate close to $1$.
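The one-sided behaviour is easy to reproduce. A sketch of plain false position (the convex test function $e^x-2$ and the interval $[0,2]$ are illustration choices of mine):

```python
import math

def regula_falsi(f, a, b, steps):
    """Plain false position; returns the list of brackets (a, b)."""
    hist = []
    for _ in range(steps):
        m = (a * f(b) - b * f(a)) / (f(b) - f(a))
        if f(a) * f(m) < 0:
            b = m
        else:
            a = m
        hist.append((a, b))
    return hist

f = lambda x: math.exp(x) - 2.0        # convex, root at ln 2
hist = regula_falsi(f, 0.0, 2.0, 30)
print("right endpoint after 30 steps:", hist[-1][1])  # never moved: 2.0

# The left endpoint approaches the root with a nearly constant ratio,
# i.e. plain linear convergence.
root = math.log(2.0)
ratios = [(hist[k + 1][0] - root) / (hist[k][0] - root) for k in range(25, 29)]
print("left-endpoint error ratios:", ratios)
```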
That is why anti-stalling variants of the method exist. The Illinois modification, for instance, shrinks the interval systematically towards zero length and reduces the error by the third power every $3$ steps in a fairly periodic pattern, so it has a superlinear convergence order of $\sqrt[3]3=1.4422\ldots$. One might claim that this is still faster than the $\sqrt2=1.4142\ldots$ per function evaluation of the Newton method.
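For contrast, a sketch of the Illinois modification (same arbitrary test function as before; the halving of the stored function value is the standard Illinois trick):

```python
import math

def illinois(f, a, b, steps):
    """Illinois variant of false position: if the same endpoint survives
    two steps in a row, halve its stored function value so that the
    stagnant side is eventually forced to move."""
    fa, fb, side = f(a), f(b), 0
    widths = []
    for _ in range(steps):
        m = (a * fb - b * fa) / (fb - fa)
        fm = f(m)
        if fm == 0.0:                    # landed exactly on the root
            a = b = m
        elif fa * fm < 0.0:              # sign change in [a, m]: replace b
            b, fb = m, fm
            if side == -1:
                fa *= 0.5
            side = -1
        else:                            # sign change in [m, b]: replace a
            a, fa = m, fm
            if side == +1:
                fb *= 0.5
            side = +1
        widths.append(b - a)
    return widths

widths = illinois(lambda x: math.exp(x) - 2.0, 0.0, 2.0, 30)
print(widths[:5], widths[-1])  # here the interval length really goes to zero
```

Note that halving `fa` or `fb` keeps its sign, so the sign test that maintains the bracket stays valid.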
See also the question *Accuracy of approximation using linear interpolation*.
Assume first that the function has its root at $x=0$ and looks locally, around $x=0$, like $f(x)=cx(x+d)$ with $c,d>0$, the bracketing interval $[a,b]$ satisfying $-d<a<0<b$.
Now show that the function value at the false-position point (which is not the midpoint of the interval) $$ m=\frac{af(b)-bf(a)}{f(b)-f(a)}=\frac{ab\,c(b-a)}{c(b-a)(a+b+d)}=\frac{ab}{a+b+d}<0 $$ is always negative (or show $-d<a<m=a\frac{b}{b+(a+d)}<0$), $$ f(m)=c\cdot\frac{ab}{a+b+d}\cdot\frac{ab+da+db+d^2}{a+b+d}=c\cdot\frac{ab(a+d)(b+d)}{((a+d)+b)^2}<0, $$ since $a+d>0$, $b+d>0$ and $ab<0$. This means that the point $m$ always replaces the left endpoint $a$. For $a$ sufficiently close to $0$, $$a_+=m\approx\beta a\quad\text{with}\quad\beta=\frac{b}{b+d},$$ which establishes the linear convergence.
As $d$ is determined by the curvature of the function $f$, the convergence speed depends only on $b$: the farther $b$ is from $0$, the closer $\beta$ is to $1$, and thus the slower the convergence. This also tells us that any measure, no matter how crude, that decreases $b$ will substantially increase the speed of convergence.
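The rate $\beta=\frac{b}{b+d}$ is easy to check numerically on the model function itself. A sketch (the values $c=d=1$, $a=-\frac12$ and the choices of $b$ are mine):

```python
def left_ratio(c, d, a, b, steps=40):
    """Plain false position on f(x) = c*x*(x+d). Since f(m) < 0 always,
    only the left endpoint a moves; return the last ratio a_{k+1}/a_k
    (the root sits at 0, so this is the error ratio)."""
    f = lambda x: c * x * (x + d)
    for _ in range(steps):
        a_next = (a * f(b) - b * f(a)) / (f(b) - f(a))
        ratio = a_next / a
        a = a_next
    return ratio

for b in (0.5, 1.0, 4.0):
    print(b, left_ratio(1.0, 1.0, -0.5, b), b / (b + 1.0))
# the observed ratio matches beta = b/(b+d) and creeps towards 1 as b grows
```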
To translate this to the more general case, compare the quadratic approximation $$cx(x+d)=cdx+cx^2$$ with the quadratic Taylor polynomial $$f(x^*+x)=f(x^*)+f'(x^*)x+\tfrac12f''(x^*)x^2+o(x^2)$$ at the root, where $f(x^*)=0$.
Matching coefficients gives $cd=f'(x^*)$ and $c=\tfrac12f''(x^*)$, thus $d=\frac{2f'(x^*)}{f''(x^*)}$, which of course only makes sense if both derivatives are nonzero. Apply reflections in the $x$ and $y$ axes, if necessary, to make both $c$ and $d$ positive.
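As a concrete check of this matching (my own illustration, not from the answer): for $f(x)=e^x-2$ at the root $x^*=\ln2$ one has $f'(x^*)=f''(x^*)=2$, hence $c=1$ and $d=2$, and with the fixed endpoint kept close to the root the observed one-sided rate should be near the predicted $\beta$:

```python
import math

f = lambda x: math.exp(x) - 2.0   # root at x* = ln 2, f'(x*) = f''(x*) = 2
root = math.log(2.0)

c = 2.0 / 2.0                     # c = f''(x*)/2
d = 2.0 / c                       # d = f'(x*)/c = 2 f'(x*)/f''(x*)

B = 0.9                           # fixed right endpoint, close to the root
beta_pred = (B - root) / ((B - root) + d)   # beta = b/(b+d), b = B - x*

a = 0.5                           # left endpoint, f(a) < 0
for _ in range(10):
    a_next = (a * f(B) - B * f(a)) / (f(B) - f(a))
    ratio = (a_next - root) / (a - root)
    a = a_next
print(beta_pred, ratio)  # agree up to the neglected higher-order terms
```

The agreement is only approximate, since the quadratic model ignores the cubic and higher terms of $f$; it improves as the fixed endpoint moves closer to the root.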