Summary
The slope of a surface is determined by the angle made by its normal (perpendicular) vector to the "straight up" direction. The relevant vectors are the 3D gradient of the surface (thought of as the zero set of a differentiable function) and the "straight up" vector (0,0,1). Angles are determined by dot products, leading to straightforward formulas. (The use of dot products is equivalent to the Pythagorean Theorem, which therefore seems to pop up everywhere.) Because gradients are involved, the DEM is used to provide discrete approximations to the derivatives. Different slope algorithms result from different approximations to the derivatives.
These ideas are easily extended to compute directional slopes, leading to algorithms for all the first-order topographic calculations performed by GISes, including contouring, flow tracking, and much more. Understanding these concepts gives you the ability to find efficient ways of solving many problems of topographic analysis and to create new algorithms where needed.
The conceptual setting
A DEM is a discrete representation of a surface which we may abstractly think of as the graph of a function f. This means f is thought of as associating a unique real number to each point within the raster's extent, not just the centers of its cells. To compute a slope for a DEM cell, we begin (at least conceptually) with this functional model to derive a formula and then hope to use information in the DEM to approximate the formula's results.
A technical requirement
Letting x and y be Cartesian coordinates for the DEM locations, we may think of f as being a function of those two variables. It has a slope if and only if it is differentiable with respect to (x,y): this rules out locations where folds, sharp peaks, cusps, cliffs, and edges appear.
How slopes are defined
The surface represented by f is, strictly speaking, a set of points in three dimensions. They are all of the form (x, y, f(x,y)). Equivalently, these points all have (Cartesian) coordinates (x,y,z) which are solutions to the equation
z - f(x,y) = 0.
Let's call the left hand side F(x,y,z). Its gradient is a vector with components given by the partial derivatives of F. From the definition of F, these components are (-fx, -fy, 1) where fx is the partial derivative of f with respect to x (conceptually, the slope in the x direction) and fy is the partial derivative of f with respect to y (the slope in the y direction). The gradient is always perpendicular to the surface and the slope is defined to be the angle between the gradient and "straight up", which is parallel to (and in the direction of) the unit vector (0,0,1).
Interpretation
All slope information is contained in this normal vector (-fx, -fy, 1). The third component represents a unit vertical change (of the normal vector, not of the surface!): its "rise". The first two components represent "runs" along the two coordinate axes: a distance of fx backwards along the x-axis and a distance of fy backwards along the y-axis. The Pythagorean Theorem tells us the total run within the XY plane must equal sqrt(fx^2 + fy^2).
When the slope is zero, the normal vector points straight up: it will be parallel to (0,0,1). It has no "run" at all. When the slope is extremely steep--almost a cliff--the normal vector will point almost sideways. To do so, the "run" sqrt(fx^2 + fy^2) must be much larger than the "rise" of 1. Evidently--because we are looking at the normal vector rather than at vectors lying within the surface itself--the usual definition of slope must be reversed: it equals the ratio of the run to the rise for the normal vector. Because we have a unit rise, we can read the slope right off the gradient, obtaining
slope = sqrt(fx^2 + fy^2).
That's the key result. The slope as an angle is the inverse tangent of this quantity (as proclaimed in the ArcGIS documentation, for instance).
Generalization to directional slopes
More generally, the slope in some direction given by a horizontal unit vector (u,v,0) in the XY plane is determined by the complement of the angle it makes to the normal vector (-fx, -fy, 1). The sine of this angle is given by the normalized dot product
sin(directional slope angle) = (u*fx + v*fy) / sqrt(1 + fx^2 + fy^2),
because (u,v,0) * (-fx, -fy, 1) = -(u*fx + v*fy) and the length of (-fx, -fy, 1) is sqrt(1 + fx^2 + fy^2). From this it follows immediately that the maximum slope is in the direction parallel to (fx, fy), the minimum is in the opposite direction (parallel to (-fx, -fy)), and halfway between (parallel to (-fy, fx) in either direction) the slope is always zero. Whence (-fy, fx) always points along a contour line and (fx, fy) always points outwards from a contour line (perpendicular to it) towards higher values of f. Given formulas for the derivatives fx and fy (see the next section), these results provide ways to track along contours or down/up from contours, as is done with contouring algorithms, stream flow and watershed calculations, pathdistance calculations based on slopes encountered along routes, and particle-tracking algorithms.
Computing derivatives
All methods of computing slopes of DEMs ultimately use the key result, typically by estimating the partial derivatives (that is, directional slopes) fx and fy by means of finite differences. For instance, a crude and obvious estimate of fx is to compute the ratio rise:run by looking from the immediate right neighbor of a cell to its immediate left neighbor, subtracting the values of the DEM at those points, and dividing by their distance apart (two cell lengths). Similarly, fy can be estimated by looking from the upper neighbor to the lower neighbor, subtracting the values of the DEM at those points, and dividing by two cell widths.
(ArcGIS uses a more refined estimate. It can be understood either as the slope obtained by approximating f by a biquadratic surface determined by all nine points in a 3 by 3 neighborhood, performing a least-squares fit of a plane to those points, or by using a weighted average of fx (as computed crudely) across the top, middle, and lower horizontal strips and, similarly, a weighted average of fy across the left, middle, and right vertical strips.)
I think this post in the SAGA GIS forum might prove useful in answering your question about how slope is calculated:
https://sourceforge.net/p/saga-gis/discussion/354013/thread/27ecdc6b/
Also, based on my understanding of TWI (as a PhD hydrology student involved in hydrologic modeling), the D-Inf (Tarboton), MFD-md (Qin), DEMON (Costa-Cabral), and MFD (Quinn) with the exponent p=1.1 (Freemann) are the best options for determining the accumulation area 'a' in the TWI calculation.
I think Sorensen's work, and Qin's work, lends credence to my own semi-professional opinion. However, Qin's improved algorithm (MFD-md) hasn't been as throughly tested and used as much as the others have.
When I have used SAGA to calculate TWI, I first calculate the slope using the slope commend in the Terrain Analysis / Morphometry / Slope, Aspect, Curvature module, using the default algorithm mentioned in the forum post. Then I calculate catchment area using the Terrain Analysis / Hydrology / Catchment Area / Catchment Area (Parallel) choice, using either the MFD algorithm or the D-INF algorithm with 1.1 as the convergence factor (p=1.1, from Freeman).
Then I run the TWI option, in Terrain Analysis / Hydrology / Topographic Indices / Topographic Wetness Index (TWI), with the options to convert area to "1/cell size" and use standard calculation. I convert to specific catchment area because this is what the original formulation by Beven and Kirkby called for. As for the difference between the "Standard" and "TOPMODEL", I am not sure what it is - looking into that myself right now.
Reference page 106 onward of the linked pdf for more specific help: http://sourceforge.net/projects/saga-gis/files/SAGA%20-%20Documentation/SAGA%
20Documents/SagaManual.pdf/download
I forgot to add, this all assumes that the DEM has been preprocessed by filling sinks. That is another complicated topic, with two (main) separate options.
I hope this helps!
Tom
P.S. Edits for @reima:
This is something I have only recently dug into, and I can admit I don't think I've reached the bottom yet! I prefer the method of Lindsay and Creed, the minimum impact approach that chooses breeching or filling based on minimizing total topographical impact (officially named "Impact Reduction Algorithm" - IRA) which I thought was implemented in his Terrain Analysis Software tool (formerly TAS, now WhiteBox GAT - link: http://www.uoguelph.ca/~hydrogeo/Whitebox/).
However, even his tool seems to implement other filling schemes:
The sink/depression filling algorithm (basic, but insanely fast) by Wang and Liu (2006) - which I don't believe operates in the IRA manner, but similar to the way ArcMap fills sinks/depressions, just straight up without any breeching.
And the sink/depression filling of Planchon and Darboux (2001), which floods a DEM and then removes the water bit by bit - it can enforce a slope on the filed area, which I think might improve TI calculations.
ArcMap has a new "de-pitting" add-on (http://blogs.esri.com/esri/arcgis/2013/03/05/optimized-tool-for-dem-pit-removal-now-available/) that seems similar to Lindsay and Creeds IRA, but I haven't read the cited paper yet to determine how similar. This method might be worth a look.
I'm also interested in scrutinizing my assumption that TI calculations need filled DEMs. I have three different sized watershed DEMs (<100 sq km, 100-1000 sq km, >1000 sq km), clipped using a shape file from 10 m NED data. These are not filled, since the shape file already provided the watershed delineation. I am going to run the SAGA GIS TI calculation (MFD, p=1.1) on all three watersheds, on both filled and unfilled DEMs, using ArcMaps filling scheme (old and new), and the Wang and Liu algorithm (in Whitebox, maybe in SAGA), and the Planchon and Darboux algorithm (in Whitebox, maybe in SAGA). I will also be calculating the TI values using the TI calculation embedded in my hydrological model.
If you want, I can share these results with you. I might not have them for a month or so though, as I have other more pertinent research that my focus is currently on, but I need to refine my TI calculation process by mid May at the latest.
Best Answer
You need to know something about the meaning, method of acquisition, and processing of the elevation measurements, because slope calculations are fairly sensitive to resolution. You will get lower average slopes, typically, with a coarser resolution or when cell values are cell average elevations rather than spot elevations. In particular, if your grid has been processed by any kind of resampling procedure, that will change the slopes (sometimes dramatically). Note, too, that the average slope within a region is not the same as the slope based on a comparable average of elevations within the same region: the former is going to be at least as great as the latter, and can be tremendously larger. As an extreme example, the average slope in the deeply incised plateaus of West Virginia is high, reflecting the rugged terrain, but a slope based on elevations averaged over large areas (say, over 15 minute intervals) will be almost zero.
Edit
A few years ago I obtained three DEMs of the same area (in Idaho) at 30m resolution, 10m resolution, and a LIDAR dataset (c. 1m resolution) and compared their slope distributions. Here is one graphic from that study:
It shows that as the resolution gets finer, the proportion of high-slope areas goes up. The change from 30m to LIDAR is substantial: the mean slope increases by about 10 degrees. This graph rewards a closer look, too: you can see little change in the low slope areas. Apparently, high-slope rugged areas in the LIDAR DEM get smoothed out in the 10m and 30m DEMs where they become medium-slope areas. Really extreme slopes (above 75 degrees or so) appear only in the LIDAR dataset. Although there may be questions as to which of these datasets is closer to the "truth," clearly the conclusions one draws about slope distributions will vary with resolution.