I am working with a lot of old black and white maps extracted from scanned PDFs. When I use the QGIS Georeferencer, the resulting files are 10x to 100x larger than the originals. Is there a way to prevent this? The starting TIFFs are true black and white with 1 bit per pixel (CCITT Group 4 fax encoding), but the resulting files are greyscale with 16 bits per pixel. Is there a way to get QGIS to create and work with true B&W TIFFs?
Reduce file size of B&W GeoTIFFs
georeferencing geotiff-tiff qgis
Related Solutions
Manually rotating and cropping 500 images scares me a bit, and one factor needs to be considered in every solution: manual rotation and cropping will produce inconsistent image sizes unless you also scale each image to uniform dimensions as part of the process. Please also note that skewing an image is different from rotating it. I suspect your scans are not skewed but rotated.
The usual way of 'glueing' your images together is to merge them but that normally requires some georeferencing of the individual images beforehand (so the GIS knows how they align).
One solution is to glue the images together programmatically, since they are predictably named. You need the GDAL Python API with NumPy and SciPy. I would write a loop that iterates over the files in order, opens each one with GDAL to detect its size in pixels, uses SciPy to scale the image into a regularized array for the current tile, and then copies that array into a 'master array' at the appropriate location. The master array is ultimately written out as the final merged image. Creating the regular array first helps deal with the size inconsistencies that result from offset scanning.
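A minimal sketch of the 'master array' idea, not the answerer's actual script. It assumes the tiles are already in memory as 2-D arrays (for example from `gdal.Open(path).ReadAsArray()`), that they are listed in reading order (left to right, top to bottom), and that the grid has a known number of columns. The function names are hypothetical, and plain NumPy nearest-neighbour sampling stands in for the SciPy scaling step to keep the example short.

```python
import numpy as np


def resize_nearest(tile, rows, cols):
    """Resample a 2-D array to (rows, cols) by nearest-neighbour sampling."""
    src_rows, src_cols = tile.shape
    # Map every output row/column back to a source row/column index.
    r_idx = np.arange(rows) * src_rows // rows
    c_idx = np.arange(cols) * src_cols // cols
    return tile[r_idx[:, None], c_idx]


def build_mosaic(tiles, grid_cols, tile_rows, tile_cols):
    """Regularize each tile and copy it into its slot in one master array.

    tiles: list of 2-D arrays in reading order (an assumption about naming).
    """
    grid_rows = -(-len(tiles) // grid_cols)  # ceiling division
    master = np.zeros((grid_rows * tile_rows, grid_cols * tile_cols),
                      dtype=tiles[0].dtype)
    for i, tile in enumerate(tiles):
        r, c = divmod(i, grid_cols)
        regular = resize_nearest(tile, tile_rows, tile_cols)
        master[r * tile_rows:(r + 1) * tile_rows,
               c * tile_cols:(c + 1) * tile_cols] = regular
    return master
```

Nearest-neighbour sampling is a reasonable choice for black and white scans because it never invents intermediate grey values; for a large number of tiles the master array can become very large, so it may be necessary to build the mosaic in sections.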
Alternatively, if you are confident of your skewing and cropping operations, you could write a simpler script that georeferences all the images for you. This assumes you know where the first image is located and the extent of each subsequent image (I presume each covers a standard-sized area). The skewing and cropping mean you can't be certain of the pixel size of an individual image but, as part of your manual operation, you could scale each image to a standardized size as mentioned above. Once the size of all images is consistent, and since the naming is logical, writing a very simple script to auto-generate world files for each image is easy.
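As a rough illustration of such a script, the sketch below writes one world file per image. It assumes north-up sheets laid out in a regular grid in reading order, all scaled to the same pixel dimensions and covering the same ground extent. The function names, the grid layout, and the `.tfw` extension (the usual one for TIFF; other formats use other extensions such as `.jgw`) are assumptions for the example.

```python
from pathlib import Path


def world_file_text(ulx, uly, pixel_w, pixel_h):
    """Return the six lines of a world file for a north-up image.

    (ulx, uly) is the map coordinate of the outer corner of the upper-left
    pixel. A world file stores the centre of that pixel, hence the
    half-pixel shift below.
    """
    values = [
        pixel_w,              # A: pixel width in map units
        0.0,                  # D: rotation term (0 for north-up)
        0.0,                  # B: rotation term (0 for north-up)
        -pixel_h,             # E: pixel height, negative because rows run south
        ulx + pixel_w / 2.0,  # C: x of the upper-left pixel centre
        uly - pixel_h / 2.0,  # F: y of the upper-left pixel centre
    ]
    return "\n".join(f"{v:.6f}" for v in values) + "\n"


def write_world_files(image_paths, grid_cols, origin_x, origin_y,
                      sheet_w, sheet_h, width_px, height_px):
    """Write a .tfw next to each image; images are assumed to be in reading order."""
    pixel_w = sheet_w / width_px
    pixel_h = sheet_h / height_px
    written = []
    for i, path in enumerate(image_paths):
        row, col = divmod(i, grid_cols)
        ulx = origin_x + col * sheet_w   # sheets step east along a row
        uly = origin_y - row * sheet_h   # and south from row to row
        out = Path(path).with_suffix(".tfw")
        out.write_text(world_file_text(ulx, uly, pixel_w, pixel_h))
        written.append(out)
    return written
```

Here `origin_x` and `origin_y` are the map coordinates of the upper-left corner of the first sheet, and `sheet_w` and `sheet_h` are the ground dimensions covered by each sheet.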
And finally... you could assemble the final image, or sections of it, in the image processing package you are using to rotate and crop the scanned images. Both Photoshop and GIMP allow you to define a grid and snap layers to it. So, you need to normalize all your images (rotate, crop and scale to a uniform size). Then set your output image size and grid guides to a multiple of the individual image dimensions and turn snapping on. Import your images as layers and drag them to the right locations (the snapping ensures they align correctly). Flatten the image to merge the layers down to a single layer, export and georeference. This can be a quick operation and will be quicker than the initial rotation and cropping of each image, so if you have the patience to do that, this will be a breeze (though the most I have ever done in one go is 100)! The size of the final output is crucial here, as you could easily crash a computer with this number of images, depending on their size!
I agree with Vince. Use a mosaic dataset in ArcGIS. It does a good job of on-the-fly cropping of map marginalia by using the footprint of the actual data area. See here :
Best Answer
The right parameters for the compression that you want to use are documented in https://gdal.org/drivers/raster/gtiff.html and they are
According to the documentation, "The apparent pixel type should be Byte", and that can be forced with the gdalwarp parameter
-ot byte
https://gdal.org/programs/gdalwarp.html#cmdoption-gdalwarp-ot
The problem with the QGIS georeferencer is that it exposes only a small subset of the GDAL options in the GUI. For example, the user can select only these compression methods:
What you can do is to generate the GDAL commands with the QGIS georeferencer and edit them manually.
An example: QGIS creates commands like this:
Edited commands:
I also changed the file format of the gdal_translate output to a virtual raster (VRT, https://gdal.org/drivers/raster/vrt.html). That is not necessary, but there is no benefit at all in writing a physical temporary TIFF file.
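A hedged sketch of what such an edited command pair might look like, written as a small Python helper that only builds the argument lists and runs nothing. It assumes the creation options meant are `COMPRESS=CCITTFAX4` and `NBITS=1` from the GTiff driver page, combined with `-ot Byte`. The file names, the ground control point values, the target SRS and the function name are hypothetical placeholders, and the resampling and transformation flags would normally be whatever the georeferencer generated.

```python
import shlex


def build_commands(src, dst, gcps, t_srs, vrt="temp.vrt"):
    """Build gdal_translate and gdalwarp argument lists (nothing is executed).

    gcps: iterable of (pixel, line, easting, northing) tuples.
    """
    # Step 1: attach the GCPs and write a lightweight VRT, not a temporary TIFF.
    translate = ["gdal_translate", "-of", "VRT"]
    for pixel, line, easting, northing in gcps:
        translate += ["-gcp", str(pixel), str(line), str(easting), str(northing)]
    translate += [src, vrt]

    # Step 2: warp the VRT into a 1-bit, Group 4 compressed GeoTIFF.
    warp = [
        "gdalwarp",
        "-r", "near",                 # nearest neighbour keeps pixels strictly 0 or 1
        "-order", "1",                # first-order polynomial from the GCPs
        "-t_srs", t_srs,
        "-ot", "Byte",                # apparent pixel type used together with NBITS
        "-co", "NBITS=1",             # store one bit per sample
        "-co", "COMPRESS=CCITTFAX4",  # CCITT Group 4 fax compression
        vrt, dst,
    ]
    return translate, warp


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    t, w = build_commands("scan.tif", "scan_georef.tif",
                          [(0, 0, 500000, 4100000)], "EPSG:32633")
    print(shlex.join(t))
    print(shlex.join(w))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run` once the inputs are real.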
I ran a test with some real data, and the warped output is a 1-bit binary image. gdalinfo reports such images like this:
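The same check can be scripted. The sketch below inspects a report produced by `gdalinfo -json` and assumes that the JSON output places image-structure metadata under `metadata` → `IMAGE_STRUCTURE`, both at dataset level (`COMPRESSION`) and per band (`NBITS`). The function names are hypothetical, and the layout should be confirmed against the output of your own GDAL version.

```python
import json
import subprocess


def load_info(path):
    """Run `gdalinfo -json` on a file and return the parsed report."""
    result = subprocess.run(["gdalinfo", "-json", path],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def is_one_bit(info):
    """True if the first band is reported as Byte with NBITS=1."""
    band = info["bands"][0]
    nbits = band.get("metadata", {}).get("IMAGE_STRUCTURE", {}).get("NBITS")
    return band.get("type") == "Byte" and nbits == "1"


def compression(info):
    """Return the dataset-level COMPRESSION value, or None if absent."""
    return info.get("metadata", {}).get("IMAGE_STRUCTURE", {}).get("COMPRESSION")
```

Calling `is_one_bit(load_info("scan_georef.tif"))` on the warped file (a placeholder name) would then confirm whether the output stayed 1-bit.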