Under the assumption that buildings are quite a bit higher than their surroundings, you could perform a cluster analysis on your height data. Depending on your data, this could yield several clusters: high buildings, low buildings, and the surrounding landscape. There are some pitfalls; for example, a tall tree might be just as high as a low building.
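To illustrate the clustering idea, here is a minimal sketch of one-dimensional k-means on height values in plain Python. The function name `kmeans_1d`, the choice of k-means itself, and the sample heights are my own assumptions for illustration; any clustering method could be used, and real data would come from your raster.

```python
def kmeans_1d(heights, k=3, iterations=50):
    """Cluster 1-D height values into k groups with plain k-means."""
    values = sorted(heights)
    # Spread the initial centres evenly across the sorted values.
    centres = [values[(2 * i + 1) * len(values) // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for h in values:
            # Assign each height to the nearest centre.
            nearest = min(range(k), key=lambda c: abs(h - centres[c]))
            groups[nearest].append(h)
        # Recompute centres; keep the old centre if a cluster is empty.
        new_centres = [sum(g) / len(g) if g else centres[i]
                       for i, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, groups


# Made-up heights in metres: landscape, low buildings, high buildings.
sample_heights = [1.0, 1.5, 2.0, 2.2, 8.0, 9.0, 9.5, 10.0, 30.0, 32.0, 35.0]
centres, groups = kmeans_1d(sample_heights, k=3)
```

Note that a 9 m tree would land in the same cluster as the low buildings here, which is exactly the pitfall mentioned above.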
Alternatively, you could perform some kind of (un)supervised classification that uses the height information and possibly other sources of information, for example not only the height at the current location but also the height of its surroundings.
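As a rough sketch of what "height plus surrounding height" could look like as classifier input, the following computes two features per cell: its own height and its height relative to the mean of its neighbours. The function name `neighbourhood_features`, the square window, and the tiny grid are assumptions for illustration only; the classifier itself is left out.

```python
def neighbourhood_features(grid, row, col, radius=1):
    """Return (height, height minus mean of the surrounding cells)."""
    neighbours = []
    # Collect all cells in the square window, clipped at the grid edges.
    for r in range(max(0, row - radius), min(len(grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1)):
            if (r, c) != (row, col):
                neighbours.append(grid[r][c])
    centre = grid[row][col]
    # Assumes the grid has more than one cell, so neighbours is not empty.
    return centre, centre - sum(neighbours) / len(neighbours)


# Made-up height grid in metres with one elevated cell in flat terrain.
grid = [
    [2.0, 2.0, 2.0, 2.0],
    [2.0, 12.0, 2.0, 2.0],
    [2.0, 2.0, 2.0, 2.0],
]
features = neighbourhood_features(grid, 1, 1)
```

A cell that stands well above its neighbours gets a large second feature, which is the kind of signal a classifier could use to separate buildings from gently rising terrain.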
Once you've determined which area of the map could be classified as city, or urban area, you could provide statistics such as mean and variance to describe the height and variations in the height.
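A minimal sketch of that summary step, assuming you already have a height grid and a boolean mask of the same shape marking the urban cells (both made up here, as is the function name `urban_height_stats`):

```python
from statistics import mean, pvariance


def urban_height_stats(heights, urban_mask):
    """Mean and population variance of the heights flagged as urban."""
    urban = [h
             for row_h, row_m in zip(heights, urban_mask)
             for h, is_urban in zip(row_h, row_m)
             if is_urban]
    return mean(urban), pvariance(urban)


# Made-up 2x2 example: heights in metres and the matching urban mask.
heights = [[2.0, 10.0], [12.0, 3.0]]
urban_mask = [[False, True], [True, False]]
urban_mean, urban_var = urban_height_stats(heights, urban_mask)
```

Whether population variance (`pvariance`) or sample variance (`variance`) is more appropriate depends on whether the urban cells are treated as the whole population or as a sample.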
Which analysis works well also depends on the data you are going to use. Very high resolution LIDAR data supports different analyses than very coarse SRTM images. Also be aware that some height products have been corrected to remove buildings, because their producers were not interested in them.
Then there is the question of how to do this kind of analysis. I use R and other high-level programming languages for this. These tools have a steep learning curve but provide ultimate flexibility. I don't use GUI tools such as ArcGIS, so I'm not up to speed on how well they support the kind of analyses I suggested. You could also take a look at QGIS, GRASS, or SAGA, which are open source (and free) GIS tools.
I got good results with mDenoise. This tool uses Sun's denoising algorithm, which removes noise without smoothing away sharp edges such as ridges or peaks. It works well for mountainous areas, especially high mountains.
You can set the threshold and the number of iterations. You have to experiment a bit to get the best result.
Before denoising ASTER GDEM2:
After denoising ASTER GDEM2 (60 iterations [-n], threshold 0.98 [-t]):
Best Answer
As Felix suggested, the Zonal Statistics tool worked. The mistake made before was not using a unique ID for the polygon 'zones'. With unique IDs, the output worked well.
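To show why the unique ID matters, here is a small plain-Python sketch of the zonal statistics idea. It is not how the Zonal Statistics tool is implemented; the function name `zonal_mean` and the one-row rasters are hypothetical and only illustrate the grouping behaviour.

```python
from collections import defaultdict
from statistics import mean


def zonal_mean(zone_ids, values):
    """Group cell values by zone ID and return the mean per zone."""
    cells = defaultdict(list)
    for row_z, row_v in zip(zone_ids, values):
        for zone, value in zip(row_z, row_v):
            if zone is not None:  # None marks cells outside every polygon
                cells[zone].append(value)
    return {zone: mean(vals) for zone, vals in cells.items()}


# Made-up one-row raster covered by two polygons.
values = [[4.0, 6.0, 20.0, 22.0]]
unique_zones = [[1, 1, 2, 2]]   # each polygon has its own ID
shared_zones = [[1, 1, 1, 1]]   # both polygons carry the same ID

per_zone = zonal_mean(unique_zones, values)
merged = zonal_mean(shared_zones, values)
```

With unique IDs each polygon gets its own statistic; when two polygons share an ID, their cells are pooled into a single zone and the per-polygon result is lost.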