R – How to Split a Large .jp2 File into GeoTIFF Tiles Using R

geotiff-tiffjpeg 2000rrasterrgdal

I have the 2018 orthophoto of Luxembourg (RGB IR, entire country, 25 GB) and it is way too large to work with. Furthermore, it is in .jp2 format.

I want to translate it into smaller GeoTIFF tiles, preferably using R or Python.
I tried a few things, but they didn't work. First, I created a grid using a GADM shapefile of the country borders (just so you know where all the variables come from):

## Boundary Luxembourg (GADM)
Lux <- st_read(file.path(wd, "dat", "shp", "gadm36_LUX_0.shp"))
Lux <- st_transform(Lux, crs(landcover))

## create grid
grid <- sf::st_make_grid(Lux, cellsize = c(5000, 5000),
                 crs = proj4string(landcover), what = "polygons")
grid <- grid[sf::st_within(grid, Lux, sparse = FALSE),]

Now, my first attempt was to create virtual raster files to work with:

rast <- gdalbuildvrt(gdalfile = file.path(wd, "dat", "ortho2018_CIR_pays.jp2"),
                     output.vrt = "tmp.vrt",
                     te = st_bbox(grid[1]))
tile <- readGDAL("tmp.vrt")

resulting in

Error in getRasterData(x, band = band, offset = offset, region.dim = region.dim,  : 
  Failure during raster IO
In addition: Warning messages:
1: In getProjectionRef(x, OVERRIDE_PROJ_DATUM_WITH_TOWGS84 = OVERRIDE_PROJ_DATUM_WITH_TOWGS84,  :
  Discarded datum Luxembourg_1930 in CRS definition: +proj=tmerc +lat_0=49.8333333333333 +lon_0=6.16666666666667 +k=1 +x_0=80000 +y_0=100000 +ellps=intl +towgs84=-189.6806,18.3463,-42.7695,-0.33746,-3.09264,2.53861,0.4598 +units=m +no_defs
2: In showSRID(uprojargs, format = "PROJ", multiline = "NO") :
  Discarded datum Unknown based on International 1909 (Hayford) ellipsoid in CRS definition,
 but +towgs84= values preserved

I tried around a bit and when it didn't work, I tried to just translate the whole file into smaller tiles:

path0 <- file.path(wd, "dat", "ortho2018_CIR_pays.jp2")
path1 <- str_replace(path0, ".jp2", "")

for(i in 1:length(grid)){
  cell <- grid[i]
  gdal_translate(src_dataset = path0,
                 dst_dataset = paste0(path1, i, ".tif"),
                 projwin = st_bbox(cell),
                 projwin_srs = crs(landcover)) # landcover is another file that has the same CRS as the jp2
}

which also failed with the following warning (repeated for i in 1:length(grid)):

1: In system(cmd, intern = TRUE) :
  running command '"C:\Program Files\QGIS 3.14.16\bin\gdal_translate.exe" -projwin 69644.0729541355 62126.9030193782 74644.0729541355 67126.9030193782 -of "GTiff" -projwin_srs "+proj=tmerc +lat_0=49.8333333333333 +lon_0=6.16666666666667 +k=1 +x_0=80000 +y_0=100000 +ellps=intl +units=m +no_defs" "D:/Dateien/Projekt46/dat/ortho2018_CIR_pays.jp2" "D:/Dateien/Projekt46/dat/ortho2018_CIR_pays1.tif"' had status 1

Is there a way to get the second attempt working (or the first one, if easier)? I have no clue what the cause of the error could be (data size, data format, reference system, wrong usage of the R commands?)

Edit

gdalinfo on the original file gives:

 [1] "Driver: JP2ECW/ERDAS JPEG2000 (SDK 5.3)"                                                                                                   
 [2] "Files: D:/Dateien/Projekt46/dat/ortho2018_CIR_pays.jp2"     
 [3] "Size is 295000, 415000"                                                                                                                    
 [4] "Coordinate System is:"                                                                                                                     
 [5] "PROJCRS[\"Luxembourg_1930_Gauss\","                                                                                                        
 [6] "    BASEGEOGCRS[\"Luxembourg 1930\","                                                                                                      
 [7] "        DATUM[\"Luxembourg 1930\","                                                                                                        
 [8] "            ELLIPSOID[\"International 1924\",6378388,297.000000000005,"                                                                    
 [9] "                LENGTHUNIT[\"metre\",1]]],"                                                                                                
[10] "        PRIMEM[\"Greenwich\",0,"                                                                                                           
[11] "            ANGLEUNIT[\"degree\",0.0174532925199433]],"                                                                                    
[12] "        ID[\"EPSG\",4181]],"                                                                                                               
[13] "    CONVERSION[\"unnamed\","                                                                                                               
[14] "        METHOD[\"Transverse Mercator\","                                                                                                   
[15] "            ID[\"EPSG\",9807]],"                                                                                                           
[16] "        PARAMETER[\"Latitude of natural origin\",49.8333333333333,"                                                                        
[17] "            ANGLEUNIT[\"degree\",0.0174532925199433],"                                                                                     
[18] "            ID[\"EPSG\",8801]],"                                                                                                           
[19] "        PARAMETER[\"Longitude of natural origin\",6.16666666666667,"                                                                       
[20] "            ANGLEUNIT[\"degree\",0.0174532925199433],"                                                                                     
[21] "            ID[\"EPSG\",8802]],"                                                                                                           
[22] "        PARAMETER[\"Scale factor at natural origin\",1,"                                                                                   
[23] "            SCALEUNIT[\"unity\",1],"                                                                                                       
[24] "            ID[\"EPSG\",8805]],"                                                                                                           
[25] "        PARAMETER[\"False easting\",80000,"                                                                                                
[26] "            LENGTHUNIT[\"metre\",1],"                                                                                                      
[27] "            ID[\"EPSG\",8806]],"                                                                                                           
[28] "        PARAMETER[\"False northing\",100000,"                                                                                              
[29] "            LENGTHUNIT[\"metre\",1],"                                                                                                      
[30] "            ID[\"EPSG\",8807]]],"                                                                                                          
[31] "    CS[Cartesian,2],"                                                                                                                      
[32] "        AXIS[\"easting\",east,"                                                                                                            
[33] "            ORDER[1],"                                                                                                                     
[34] "            LENGTHUNIT[\"metre\",1]],"                                                                                                     
[35] "        AXIS[\"northing\",north,"                                                                                                          
[36] "            ORDER[2],"                                                                                                                     
[37] "            LENGTHUNIT[\"metre\",1]],"                                                                                                     
[38] "    ID[\"EPSG\",2169]]"                                                                                                                    
[39] "Data axis to CRS axis mapping: 1,2"                                                                                                        
[40] "Origin = (48000.000000000000000,139000.000000000000000)"                                                                                   
[41] "Pixel Size = (0.200000000000000,-0.200000000000000)"                                                                                       
[42] "Metadata:"                                                                                                                                 
[43] "  COLORSPACE=RGB"                                                                                                                          
[44] "  COMPRESSION_RATE_TARGET=0"                                                                                                               
[45] "Corner Coordinates:"                                                                                                                       
[46] "Upper Left  (   48000.000,  139000.000) (  5d43' 7.15\"E, 50d10'59.10\"N)"                                                                 
[47] "Lower Left  (   48000.000,   56000.000) (  5d43'31.76\"E, 49d26'12.84\"N)"                                                                 
[48] "Upper Right (  107000.000,  139000.000) (  6d32'40.85\"E, 50d11' 0.00\"N)"                                                                 
[49] "Lower Right (  107000.000,   56000.000) (  6d32'20.09\"E, 49d26'13.72\"N)"                                                                 
[50] "Center      (   77500.000,   97500.000) (  6d 7'54.97\"E, 49d48'39.07\"N)"                                                                 
[51] "Band 1 Block=256x256 Type=Byte, ColorInterp=Red"                                                                                           
[52] "  Description = Red"                                                                                                                       
[53] "  Overviews: 147500x207500, 73750x103750, 36875x51875, 18437x25937, 9218x12968, 4609x6484, 2304x3242, 1152x1621, 576x810, 288x405, 144x202"
[54] "Band 2 Block=256x256 Type=Byte, ColorInterp=Green"                                                                                         
[55] "  Description = Green"                                                                                                                     
[56] "  Overviews: 147500x207500, 73750x103750, 36875x51875, 18437x25937, 9218x12968, 4609x6484, 2304x3242, 1152x1621, 576x810, 288x405, 144x202"
[57] "Band 3 Block=256x256 Type=Byte, ColorInterp=Blue"                                                                                          
[58] "  Description = Blue"                                                                                                                      
[59] "  Overviews: 147500x207500, 73750x103750, 36875x51875, 18437x25937, 9218x12968, 4609x6484, 2304x3242, 1152x1621, 576x810, 288x405, 144x202"

Besides, when I try to load the entire file using readGDAL it returns Error: cannot allocate vector of size 1368.2 Gb while the file itself is about 25 Gb. Is this normal?

Edit #2

I also tried to use readGDAL setting an extent:

# file path
RGBIR_path.jp2 <- file.path(wd, "dat", "ortho2018_CIR_pays.jp2")
# gdal info for x, y resolution and origin
info <- rgdal::GDALinfo(RGBIR_path.jp2)
# function to get parameters for readGDAL
return_region <- function(gdal_info, extent, left = NA, right = NA,
                          top = NA, bottom = NA){
  if(!missing(extent)){
    left <- extent[1]
    right <- extent[2]
    bottom <- extent[3]
    top <- extent[4]
  }
  x_res <- gdal_info[6]
  y_res <- gdal_info[7]
  rast_bottomleft_x <- gdal_info[4]
  rast_bottomleft_y <- gdal_info[5]
  offset_x <- round((left - rast_bottomleft_x) / x_res, 0)
  offset_y <- round((bottom - rast_bottomleft_y) / y_res, 0)
  columns <- (right - left) / x_res
  rows <- (top - bottom) / y_res
  return(list(c(offset_y, offset_x), c(rows, columns)))
}

cell <- grid[1] # grid is the same as in the examples above
cell <- as_Spatial(cell)
ex <- extent(cell)

offs <- return_region(info, ex)[[1]]
dims <- return_region(info, ex)[[2]]
rast <- rgdal::readGDAL(fname = RGBIR_path.jp2,
                        offset = offs,
                        region.dim = dims)

This also resulted in Error in getRasterData(x, band = band, offset = offset, region.dim = region.dim, : Failure during raster IO

Edit #3

Also, when I try to open the file in Python using

def dataset(ds, wd = wd):
    names = {"IR": "ortho2018_CIR_pays.jp2",
             "RGB": "ortho2018_RGB_pays.jp2",
             "Landcover": "Landcover2018_raster" + os.sep + "LC_2018_20cm.tif"}
    return os.path.join(wd, "dat", names[ds])

from osgeo import gdal, ogr, osr
source_ds = dataset("IR")
rast = gdal.Open(source_ds)

It can tell me the reference system:

crs = rast.GetProjectionRef()

But when I run

stats = rast.GetRasterBand(1).GetStatistics(0, 1)

it returns

[0.0, 0.0, 0.0, -1.0]

for what should be the min, mean, max and stdev values for the first raster band. This is a bit strange. I can open the .jp2 file in programs such as ArcGIS Pro, though… and it has colours in it, not just zeros…

Edit #4

I tried the gdal_retile.py answer, which resulted in

ERROR 1: Marker is not compliant with its position

ERROR 1: opj_decode() failed
ERROR 1: ortho2018_CIR_pays.jp2, band 1: IReadBlock failed at X offset 0, Y offs
et 0: opj_decode() failed
ortho2018_CIR_pays.jp2, band 1: IReadBlock failed at X offset 0, Y offset 0: opj_decode() failed
Traceback (most recent call last):
  File "C:\Users\Manuel\Python\Scripts\gdal_retile.py", line 11, in <module>
    sys.exit(main(sys.argv))
  File "C:\Users\Manuel\Python\lib\site-packages\osgeo\utils\gdal_retile.py", line 920, in main
    dsCreatedTileIndex = tileImage(g, minfo, ti)
  File "C:\Users\Manuel\Python\lib\site-packages\osgeo\utils\gdal_retile.py", line 354, in tileImage
    createTile(g, minfo, offsetX, offsetY, width, height, tilename, OGRDS, feature_only)
  File "C:\Users\Manuel\Python\lib\site-packages\osgeo\utils\gdal_retile.py", line 521, in createTile
    s_fh = minfo.getDataSet(dec.ulx + offsetX * dec.scaleX, dec.uly + offsetY * dec.scaleY + height * dec.scaleY,
  File "C:\Users\Manuel\Python\lib\site-packages\osgeo\utils\gdal_retile.py", line 270, in getDataSet
    t_band.WriteRaster(tw_xoff, tw_yoff, tw_xsize, tw_ysize, data)
  File "C:\Users\Manuel\Python\lib\site-packages\osgeo\gdal.py", line 3518, in WriteRaster
    return _gdal.Band_WriteRaster(self, *args, **kwargs)
TypeError: not a unicode string or a bytes

Best Answer

The system gdal tools come with a python script for doing this. For example to split dem.tif into 512x512 tiles plus remainder:

$ gdal_retile.py -ps 512 512 -targetDir . dem.tif 
0...10...20...30...40...50...60...70...80...90...100 - done.

Giving a set of tiles:

$ ls -1hs
total 6.1M
1.1M dem_1_1.tif
632K dem_1_2.tif
892K dem_2_1.tif
548K dem_2_2.tif
3.1M dem.tif

If you really need to run this from R use the system call with a string constructed for your file name and tile size:

> system("gdal_retile.py -ps 512 512 -targetDir . dem.tif")
0...10...20...30...40...50...60...70...80...90...100 - done.

See: https://gdal.org/programs/gdal_retile.html