Shapely's affinity
module supports affine transformations on any geometry, for example:
from shapely import affinity
from shapely.geometry import LineString
# Example geometry
line = LineString([(1, 3), (1, 1), (4, 1)])
# Rotate 30 degrees CCW from origin at the center of bbox
line_rot_center = affinity.rotate(line, 30, 'center')
# Rotate from origin at (1, 1)
line_rot_11 = affinity.rotate(line, 30, (1, 1))
It is not clear precisely what form the data are in, but ultimately any solution will have to use the equations for the Mercator projection of an ellipsoid. It sounds like it's possible to identify the latitude and longitude for at least two pixels, such as a given one and one at an origin. Along with the datum, this will suffice.
The datum, among other things, describes the size and shape of the ellipsoid. We will need its eccentricity e, which is related to the flattening via
e^2 = f * (2 - f).
For WGS 84, for instance,
1 / f = 298.257 223 563
giving e^2 = 0.00669438 and e is approximately 0.08181919084. (For the sphere, f and e are both zero, considerably simplifying the following equations.)
We also need the equatorial radius (semi-major axis) of the ellipsoid, a; for WGS 84, a = 6,378,137.0 meters.
Every image or map has a (global) scale: it is the amount by which the projection coordinates are uniformly multiplied to place projected points (which are in meters) on the images (which is in pixels). Let's call this scale S; its unit of measurement is therefore pixels per meter. Then, relative to an origin on the image, the x (easting) and y (northing) coordinates of the Mercator projection of the point with longitude lambda and latitude phi are
x = S * a * lambda
y = S * a * [log(tan(pi/4 + phi/2)) + (e/2) * (log(1 - e * sin(phi)) - log(1 + e * sin(phi)))]
Although this may look nasty, notice that e and a are known constants, S is an unknown constant, and phi is known for each pixel (presumably, it is the latitude of the pixel's center). Let's encapsulate that information by abbreviating the formula for y as
y = S * a * M(phi; e, a)
(Incidentally, M(phi; 0, a) = log(tan(pi/4 + phi/2)) has a particularly simple form for the sphere.)
Suppose we have a pixel at row i0 and column j0 corresponding to (lambda0, phi0) on the earth, and another pixel at row i1 and column j1 corresponds to (lambda1, phi1). Then we can deduce S in two ways from the two equations. The x equation will work provided the pixels are in different columns and the y equation will work provided the pixels are in different rows:
S = (j1 - j0) / (lambda1 - lambda0) / a
S = (i1 - i0) / (M(phi1; e, a) - M(phi0; e, a)) / a
At least one of these equations will work. With S now known, we can compute the Mercator coordinates for any point (lambda, phi) on the globe directly. We can also invert the projection numerically or by using a (rapidly) iterative algorithm or with a trigonometric series, as described by Snyder on p. 44.
From the information in the question we might guess that S equals 1 pixel / 1 Km = 0.001 pixels per meter. However, that presumes the projection has true scale at the Equator. The allusion to a reference latitude in a comment to the question suggests that the projection has already been rescaled by some amount. That is why it appears necessary to compute S. Otherwise, we would not need information about two pixels: the latitude, longitude, row, and column of a single pixel would suffice.
Reference
Snyder, John P. Map Projections--A Working Manual. USGS professional paper 1395, 1987.
Best Answer
In other words, you want to create a World file from the coordinates of the 4 corners and the width and height of the image
You get the width and height of the image with osgeo.gdal, rasterio or any other libraries to open image files as Pillow and others.
you need to extract the x and y values of the corners from the XML file (I don't know the structure of the XML file)
My corners:
create a World file with a simple script (without gdal or rasterio) with only 2 corners points
with gdal and the 4 corners points as ground control points.
There are other solutions like tab2tfw.py of Michael Kalbermatten (this is exactly the same problem as MapInfo tab file) or using affine6p, nudged-py or Affine_Fit to estimate the affine transformation parameters between two sets of 2D points but but be careful that raster data, coming from its image processing origins, uses a different referencing system to access pixels (see Python affine transforms)
Example with Numpy and the 4 corners points (Affine transformation) ( 0,0 origin in the upper left )
Example of result with nugged ( 0,0 origin in the upper left )