You don't have enough data.
@Michael Miles-Stimson pretty much covered it in his comment. You need a dense LiDAR survey and then a third-party plugin or software for feature extraction (there are no out-of-the-box tools for this in ArcGIS). Even then, the features will likely need to be vectorized and cleaned up manually in 3D editing software (again, ArcGIS has the ArcScene viewer and 3D geoprocessing functionality but no actual 3D editing features).
Photogrammetry may be an option if your imagery is of sufficient resolution, but you will need to use third-party software. Esri does not have any tools for this. Furthermore, from the little architectural photogrammetry I have done, I found it to be a very time-consuming process. There are white papers describing automated building extraction using photogrammetry, and probably specialized software that can achieve semi-automation, but LiDAR building extraction may have diminished these efforts.
What you describe is possible with typical urban vector datasets, provided they contain building heights, but without the roofs. In my city we have this type of data and I can extrude the entire city in one shot using 3D Analyst. Although the buildings are correctly extruded footprints and good enough for basic analysis, much manual work, as well as additional survey data for each building, is needed to build realistic rooftops.
With what you have (satellite and aerial imagery) you still don't have enough data, even if you were to digitize manually, because you still need to assign each footprint both an initial "ground" height and the actual height to extrude to. If you already have building heights, you still need to assign the initial "ground level" elevation, and for this you will need a surface DEM or TIN.
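To make the two required values concrete, here is a minimal sketch in plain Python (not arcpy, and not how ArcGIS does it internally). It assumes the DEM is a small in-memory grid with row 0 as the southernmost row, looks up the nearest cell for each footprint vertex, and takes the lowest sampled value as the ground level. The function names, the grid layout, and the choice of the minimum are all my own assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sample_dem(dem: Sequence[Sequence[float]], origin: Point,
               cell_size: float, pt: Point) -> float:
    """Return the DEM value of the cell containing pt.

    Assumption: origin is the lower-left corner of the grid and
    dem[0] is the southernmost row.
    """
    col = int((pt[0] - origin[0]) // cell_size)
    row = int((pt[1] - origin[1]) // cell_size)
    if not (0 <= row < len(dem) and 0 <= col < len(dem[0])):
        raise ValueError(f"point {pt} lies outside the DEM")
    return dem[row][col]


def extrusion_range(footprint: List[Point], building_height: float,
                    dem: Sequence[Sequence[float]], origin: Point,
                    cell_size: float) -> Tuple[float, float]:
    """Return (base_z, top_z) for one footprint.

    The base is the lowest ground elevation under the footprint vertices
    (an assumption, so the walls reach the ground on sloping terrain);
    the top is the base plus the stored building height.
    """
    ground = [sample_dem(dem, origin, cell_size, pt) for pt in footprint]
    base_z = min(ground)
    return base_z, base_z + building_height


# Tiny example: a 2 x 2 DEM with 10 m cells and an 8 m tall building.
dem = [[10.0, 11.0],
       [12.0, 13.0]]
footprint = [(2.0, 2.0), (12.0, 2.0), (12.0, 12.0), (2.0, 12.0)]
base_z, top_z = extrusion_range(footprint, 8.0, dem, (0.0, 0.0), 10.0)
```

Without the DEM you only have the 8 m height and no idea where the building sits vertically, which is the gap described above.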
Here is the typical workflow to create a 3D building in ArcGIS:
Digitize footprint to a vector polygon
Use DEM to obtain footprint ground level elevation OR
convert footprint to 3D polygon
Store building height in the footprint attributes
Extrude footprint and convert to a 3D multipatch feature
Convert 3D multipatch feature to a COLLADA file
Import COLLADA to SketchUp and draw a rooftop or other additional details (ensure all surfaces are closed and you have no dangling lines)
Back in ArcScene, replace the original multipatch feature with the edited SketchUp file
Repeat for every building in the city
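The extrusion step and the "all surfaces are closed" requirement can be sketched in plain Python. This is only an illustration of the geometry, not the 3D Analyst or SketchUp implementation: it assumes a counter-clockwise footprint without a repeated closing vertex, builds a flat-roofed prism, and checks that every edge is shared by exactly two faces. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def extrude_footprint(footprint: List[Point2D], base_z: float,
                      top_z: float) -> Tuple[List[Point3D], List[List[int]]]:
    """Extrude a footprint into a flat-roofed prism (vertices, faces).

    Assumption: the footprint is listed counter-clockwise and the first
    vertex is not repeated at the end.
    """
    n = len(footprint)
    if n < 3:
        raise ValueError("a footprint needs at least three vertices")
    if top_z <= base_z:
        raise ValueError("top_z must be above base_z")
    bottom = [(x, y, base_z) for x, y in footprint]
    top = [(x, y, top_z) for x, y in footprint]
    vertices = bottom + top
    faces = [list(range(n - 1, -1, -1)),  # floor, reversed so it faces down
             list(range(n, 2 * n))]       # flat roof
    for i in range(n):
        j = (i + 1) % n
        faces.append([i, j, n + j, n + i])  # one wall per footprint edge
    return vertices, faces


def is_closed(faces: List[List[int]]) -> bool:
    """True if every edge is shared by exactly two faces (no open surfaces)."""
    edges = Counter()
    for face in faces:
        for a, b in zip(face, face[1:] + face[:1]):
            edges[frozenset((a, b))] += 1
    return all(count == 2 for count in edges.values())


# A 10 m x 10 m footprint extruded from 10 m to 18 m elevation.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
vertices, faces = extrude_footprint(square, 10.0, 18.0)
closed = is_closed(faces)
roofless = is_closed([f for k, f in enumerate(faces) if k != 1])
```

Dropping the roof face leaves the top edges attached to only one face each, so the check fails, which is the kind of open surface you would want to catch before bringing the model back into ArcScene.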
NOTE: you might be able to find some older techniques for doing this with an ArcGIS plugin for SketchUp. This no longer applies: Esri no longer maintains this plugin, and if you can find it, it only works with ArcGIS version 9.3 and older and SketchUp version 6 and older. With the old Esri plugin it was possible to export TINs, polygons, polylines and points from ArcGIS to SketchUp and model entire city blocks in SketchUp. Esri removed the functionality of "easily exporting ArcGIS data to SketchUp" in version 10, and the software adopted the "COLLADA" workflow instead, which enables users to add detail to individual 3D features using an external editor (SketchUp). This forces ArcGIS users to build their entire 3D models, composed of many features, inside ArcGIS; it is no longer easy to export GIS data and build models from it outside of ArcGIS.
BTW, if you want to try photogrammetry, the free SketchUp version has architectural photogrammetry tools.
Best Answer
If you want to use QGIS I suggest to use this workflow:
Data needed:
Software needed:
Workflow:
You will have something similar to the first image
In the Qgis2threejs tool, choose COLLADA model as the object type for the point layer (see image 2).
- For the COLLADA file, choose either one exported from SketchUp or a downloaded file.
- For the scale and rotation values, I mostly use trial and error and then keep fixed values.
- As the DEM layer, use the file with the heights.
- When you click Run, you will get a web page that lets you fly through the model, as shown in image 3.
I've tested the workflow with QGIS 2.18.12, SRTM data, and quickly created COLLADA models.