[TeX/LaTeX] Pixel-perfect vertical alignment of image-rendered TeX snippets

css, html, png, rendering, text-mode

What is the most accurate way to automatically calculate the text baseline in an image that has been rendered from a TeX snippet—so that the rendered image can be given proper vertical alignment in a block of text?

My current approach (which isn’t working in all cases):

Here’s an example of what I’m trying to do. Note that these are screenshots of a web page (HTML + CSS + Latin Modern fonts for web) and not of a TeX document. The web page is mostly paragraphs of text, but contains embedded PNG images (rendered TeX snippets) for the formulas involving square roots. Here’s how I want it to look:

[Screenshot: big good]

But here’s what I’m getting…

[Screenshot: big bad]

The second (smaller) square root formula is aligned correctly, and it was done automatically. I’m calculating the baseline by first rendering a snippet consisting of only a “.” character, cropping away everything below the “.”, and then measuring the height of the resulting image. This tells me how much I need to lower the image (using CSS’s vertical-align style) in order to align it with the surrounding text’s baseline. This works well for formulas that aren’t too tall.

Where it fails is, well, taller formulas, as shown above. In the case of the first—the larger—square root formula, it needs to be lowered less than normal, because it extends higher than normal. My calculations for this are currently wrong, and I’m wondering how I can fix this.

Alternatives?

What are some ways of measuring the baseline (in pixels) of a snippet? I can’t really use \documentclass{standalone} for this because it crops the page as tightly as possible, which produces different image heights for $x$, $X$, and \sqrt{x}. I’m thinking I may need to render a calibration snippet consisting of two blank lines prior to a lone . (or perhaps a bottom-aligned horizontal rule) instead of just a single . character—but that seems a bit kludgey.

Is there a way to coax TeX into not placing a formula lower on the page when it is taller than standard text? That is, is there some way I can cause a formula at the top of a page to protrude upward into the top margin?

A second problem

I also noticed that I’m seeing sub-pixel alignment problems. Below are screenshots scaled up to 400% of actual size. This formula is ¾ pixel too low:

[Screenshot: too low]

At first, I thought I was calculating the vertical-align value wrong, so I manually moved it up by one pixel, but then it turns out that it's ¼ pixel too high—which means the problem lies within the image rather than the alignment value:

[Screenshot: too high]

I suspect this is fixable by making sure I round the image heights up to the nearest multiple of 4 before I downsample them for embedding in the page. Just wondering if anyone has tackled this problem before, and has any tips. I’m encouraged by these results so far, but doing this correctly turns out to be a lot more subtle than I expected it would be. Naïvely, when I first started this, I hadn’t considered tall formulas or even vertical alignment at all.
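
A minimal Python sketch of the rounding idea, assuming the image is rendered at 4x and then downsampled by a factor of 4. The names `round_up` and `leftover_fraction` are illustrative only and are not part of any tool mentioned here. If the rendered height is not a multiple of the downsampling factor, the leftover rows correspond to a fraction of an output pixel, which is one plausible source of quarter-pixel errors like the ones shown above.

```python
def round_up(num: int, multiple: int) -> int:
    """Round num up to the next multiple (unchanged if already a multiple)."""
    return -(-num // multiple) * multiple


def leftover_fraction(height: int, factor: int) -> float:
    """Fraction of one output pixel left over when height is downsampled by factor."""
    return (height % factor) / factor


if __name__ == "__main__":
    for height in (60, 61, 63):
        print(height, round_up(height, 4), leftover_fraction(height, 4))
```

Padding the image up to `round_up(height, 4)` before downsampling leaves no fractional remainder.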

Best Answer

Answering my own question after much research, experimentation, and testing. stevem's pointer to the Mac OS X TeX Toolbox approach (store the TeX snippet in a box and write the height, width, and depth to a file) was the crucial key to the puzzle. I followed that approach, made some adjustments and additions, and came up with a solution that is not only pixel-perfect but also subpixel-perfect, and that holds up under magnification.

First, a screenshot demonstration of the results before discussing the technique. The following paragraph is, by design, a very ugly mess. However, the baselines of all the rendered TeX snippets do all line up properly—which is the goal:

[Screenshot: Size +0]

I don’t actually use Times Roman in my HTML pages—I use HTML/CSS versions of TeX’s Latin Modern fonts—but I wanted to make the transitions between paragraph text and embedded TeX snippets here visually obvious.

Technique

Doing this correctly is not easy. There are many places where subtle errors can be introduced—especially if shortcuts are taken. Correct vertical alignment cannot be a single-step process. To achieve proper baseline alignment, it is necessary to measure, pad, re-measure, crop, re-measure, re-crop, re-measure, and finally re-pad the image before downsampling.

Here are the fundamental steps:

  1. Write the (La)TeX snippet to a file, encapsulated by a specially designed preamble/postamble which will cause TeX to write the width, height, and depth (in TeX points) of the snippet to a file. This preamble uses the geometry package to specify a specific page size with enough margin padding (I use 4pt) to avoid clipping anomalies with glyphs whose physical size exceeds their virtual size (a very common occurrence).
  2. Invoke pdflatex to compile the TeX file to PDF.
  3. Invoke gs (Ghostscript) to convert the PDF to a PNM image. Specify 4-bit anti-aliasing and a DPI representing 16x oversampling. The exact value of the DPI is not obvious and works out to 1850.112 dpi. (Ghostscript does take fractional DPI values on the command line.) I’ll explain the derivation of this number below.
  4. Read the width and height from the PNM image and determine the actual snippet depth in pixels, using the image height and the dimensions written by TeX.
  5. Crop whitespace from only the bottom of the image and re-measure the new height of the image. The difference is crucial later in calculating the exact proper value for the vertical-align property of the <img>.
  6. Now crop whitespace from the top and sides of the image.
  7. Now prepare to pad the image again with whitespace. This time, however, we won't add 4pt of padding (which would be a lot), but only just enough to round up the size to a multiple of 16 pixels, so that downsampling is perfect. First calculate the bottom padding amount as the snippet depth plus the page margin minus the amount cropped from the bottom, and round this up to the next multiple of 16. This does not necessarily result in an overall image height that is a multiple of 16; only the distance from the baseline to the bottom will be. Thus, we still need to pad the top, so compute that value after the bottom padding is added, and set the top padding such that overall image height will be a multiple of 16. Padding the left and right is easier: just calculate the new image width to be the next higher multiple of 16 and divide the difference between the two sides.
  8. Now pad the image with whitespace using the precise values calculated in the previous step. The size of the resulting image will be a multiple of 16 in both dimensions, and has the additional property that the baseline of the text is also an exact multiple of 16 pixels from the bottom of the image. OK, now all the hard work is done. The rest is easy.
  9. Downsample the PNM file by a factor of 4 (not 16!). Darken this slightly using a gamma curve adjustment. Convert the result to PNG.
  10. Read the PNG image and encode it as base-64 data inside an <img> tag for direct embedding in the HTML file. Set the height= and width= attributes to 1/4 of the size of the PNG image (which is 1/16 of the original image). This will cause the web browser to scale the image down on-the-fly to 1/4 actual size, but will also allow the user to magnify the font size in the web page and still have the TeX snippets look great. Set the vertical-align: property of the style= attribute to be the negation of the padded snippet depth divided by 16. This will raise or—more commonly—lower the image below the text baseline when the paragraph is rendered on the HTML page.
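
The padding arithmetic in steps 7, 8, and 10 can be sketched in Python as follows. This is an illustrative sketch only: the function and field names are hypothetical, all inputs are in rendering pixels (already converted from TeX points), and the sample numbers in the usage note are made up for the example.

```python
from dataclasses import dataclass

OVERSAMPLE = 16  # 4x render oversampling times 4x display oversampling


def round_up(num: int, multiple: int) -> int:
    """Round num up to the next multiple (unchanged if already a multiple)."""
    return -(-num // multiple) * multiple


@dataclass
class Padding:
    top: int
    bottom: int
    left: int
    right: int
    vertical_align_px: int  # value for the CSS vertical-align property


def compute_padding(depth_px: float, margin_px: float, bottom_crop: int,
                    cropped_w: int, cropped_h: int,
                    oversample: int = OVERSAMPLE) -> Padding:
    # Distance from the baseline to the bottom edge after the bottom crop.
    depth = int(depth_px + margin_px + 0.5) - bottom_crop
    # Make that distance a multiple of the oversampling factor.
    padded_depth = round_up(depth, oversample)
    bottom = padded_depth - depth
    # The top padding depends on the bottom padding: total height must be a multiple too.
    padded_h = round_up(cropped_h + bottom, oversample)
    top = padded_h - (cropped_h + bottom)
    # Split the horizontal padding between the two sides.
    padded_w = round_up(cropped_w, oversample)
    left = (padded_w - cropped_w) // 2
    right = (padded_w - cropped_w) - left
    return Padding(top, bottom, left, right, -padded_depth // oversample)
```

For instance, `compute_padding(166.4023, 102.4, 110, 3700, 500)` pads a 3700 × 500 cropped image to 3712 × 512 with the baseline 160 rendering pixels above the bottom edge, which corresponds to `vertical-align:-10px` on the page.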

Those are the fundamental steps. The details are a bit more subtle, so I’ll include at the bottom of this answer a Perl program which converts arbitrary blocks of text with $-delimited TeX snippets into HTML.
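
As a rough illustration of step 10, here is a Python sketch that builds the `<img>` tag from PNG bytes. The function name `img_tag` and its parameters are hypothetical, and the sketch assumes the PNG has already been produced by the earlier steps and that the padded depth is given in rendering pixels.

```python
import base64
import html


def img_tag(png_bytes: bytes, png_width: int, png_height: int,
            padded_depth_px: int, tex_source: str,
            display_oversample: int = 4, oversample: int = 16) -> str:
    # The browser scales the PNG down by the display oversampling factor.
    width = png_width // display_oversample
    height = png_height // display_oversample
    # Negative value lowers the image below the text baseline.
    valign = -padded_depth_px // oversample
    data = base64.b64encode(png_bytes).decode("ascii")
    title = html.escape(tex_source, quote=True)
    return (f'<img width="{width}" height="{height}" '
            f'style="vertical-align:{valign}px;" '
            f'title="{title}" '
            f'src="data:image/png;base64,{data}" />')
```

With a 928 × 128 PNG and a padded depth of 160 rendering pixels, this yields `width="232"`, `height="32"`, and `vertical-align:-10px;`.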

Why 1850.112 dpi?

The number 1850.112 is (96 × 12 ÷ 10) × (72.27 ÷ 72) × (4 × 4).

  • 96 is the screen dpi that modern web browsers assume.
  • 12÷10 is the ratio 12pt/10pt. Typical default font size in modern web browsers is 12pt, versus the TeX default of 10pt (which is in use in the snippet template).
  • 72.27÷72 is the ratio of TeX points to HTML points. This ratio is very close to 1, but without it there will be an error of about 0.375%, or roughly 1 pixel per 267 pixels.
  • 4×4 is the oversampling factor. The first 4 is for oversampling at the rendering step (PDF-to-PNG) and the second 4 is for oversampling at the display step (on-the-fly image scaling in the browser).

You could probably get away with omitting the 72.27/72 factor without anyone noticing (this would give 1843.2 dpi instead of 1850.112 dpi), but the important thing is not to settle for some arbitrarily chosen dpi like 1200 or 600. Good results depend on integer-multiple downsampling, and that means telling Ghostscript whatever weird dpi should happen to be necessary to make that happen.
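
The factors above can be multiplied out in a few lines of Python. This is only a sketch of the arithmetic; the function name `render_dpi` and its parameter names are made up for illustration.

```python
def render_dpi(screen_dpi: float = 96.0,
               browser_pt: float = 12.0,
               tex_pt: float = 10.0,
               tex_pt_per_inch: float = 72.27,
               css_pt_per_inch: float = 72.0,
               render_oversample: int = 4,
               display_oversample: int = 4) -> float:
    """Resolution to request from Ghostscript, in dots per inch."""
    return (screen_dpi * browser_pt / tex_pt
            * tex_pt_per_inch / css_pt_per_inch
            * render_oversample * display_oversample)


if __name__ == "__main__":
    print(render_dpi())                      # full value, about 1850.112
    print(render_dpi(tex_pt_per_inch=72.0))  # without the 72.27/72 factor, about 1843.2
```

Keeping the oversampling factors as parameters makes it easy to recompute the dpi if display-time oversampling is turned off (`display_oversample=1`).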

Wait, really?

Yup. And in fact, the 96 × 12 ÷ 10 portion is actually 96 × ((16 ÷ 96 × 72) ÷ 10). Here is the full derivation, with units:

96 Hpx/in × (((16 Hpx ÷ 96 Hpx/in) × 72 Hpt/in) ÷ 10 Tpt) × (72.27 Tpt/in ÷ 72 Hpt/in) × (4 Ppx/Hpx × 4 Rpx/Ppx)

where Tpt is TeX points (1/72.27 in), Hpt is HTML points (1/72 in), Hpx is HTML pixels, Ppx are PNG pixels, and Rpx are rendering pixels.

This reduces to:

96 Hpx/in × (16 ÷ 96 × 72 ÷ 10 Hpt/Tpt) × (72.27 ÷ 72 Tpt/Hpt) × (4 × 4 Rpx/Hpx)

or:

96 × 16 ÷ 96 × 72 ÷ 10 × 72.27 ÷ 72 × 4 × 4 Hpx/in Hpt/Tpt Tpt/Hpt Rpx/Hpx

Cancelling out terms and units gives:

1850.112 Rpx/in

or in other words 1850.112 dpi. Note that this is 115.632 dpi with 16x oversampling.
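
A quick sanity check of this result, sketched in Python with illustrative names: at 1850.112 dpi, TeX's 10pt font size comes out at exactly 256 rendering pixels, which is 16 HTML pixels after 16x downsampling, and 16 px at 96 dpi is the 12pt browser default assumed above.

```python
TEX_PT_PER_INCH = 72.27
RENDER_DPI = 1850.112
OVERSAMPLE = 16  # 4x render oversampling times 4x display oversampling


def tex_pt_to_render_px(length_pt: float) -> float:
    """Convert a length in TeX points to rendering pixels."""
    return length_pt / TEX_PT_PER_INCH * RENDER_DPI


def tex_pt_to_html_px(length_pt: float) -> float:
    """Convert a length in TeX points to HTML pixels on the page."""
    return tex_pt_to_render_px(length_pt) / OVERSAMPLE


if __name__ == "__main__":
    print(tex_pt_to_render_px(10.0), tex_pt_to_html_px(10.0))
```

The same conversion (divide by 72.27, multiply by the rendering dpi) is what turns the dimensions written by TeX into pixel counts.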

Stepping through font sizes from smallest to largest

Here is the same page from above but now shown in different font sizes. This is Safari on Mac OS X. The page was loaded with default settings, then Command - and Command + were used to shrink and grow the text size. The baseline alignment is correct at all sizes.

[Screenshots: Size -4, Size -3, Size -2, Size -1, Size +0, Size +2, Size +4, Size +6]

Program for automation of this technique

Below is a Perl program which converts an input paragraph of text containing $-delimited TeX snippets to an HTML page with embedded PNG images. It is assumed that you have pdflatex, Ghostscript, and the PNM tools (Netpbm) installed.

#!/usr/bin/perl -w
#==============================================================================
#
#   CONVERT SIMPLE PLAIN TEXT TO HTML WITH TEX MATH SNIPPETS
#
#   This program takes on standard input a simple text file containing
#   arbitrary TeX math snippets (delimited by '$'s) and produces on standard
#   output an HTML document with PNG images embedded in <IMG> tags.
#
#   This program demonstrates conversion techniques and is not intended for
#   production use.
#
#   Todd S. Lehman
#   February 2012
#

use strict;


#------------------------------------------------------------------------------
#
#   RUN EXTERNAL COMMAND VIA BOURNE SHELL
#

sub run_command (@) {
    my $origcmdline = join(" ", grep {defined} @_);
    return if $origcmdline eq "";

    my $cmdline = $origcmdline;
    $cmdline =~ s/(["\\])/\\$1/g;
    $cmdline = qq{/bin/sh -c "($cmdline) 2>&1"};

    my $output = `$cmdline`;

    my ($exit_value, $signal_num, $dumped_core) = ($?>>8, $?&127, $?&128);
    $exit_value == 0 or die
      "FAILED: $origcmdline\n" .
      "   \$! = $!\n" .
      "   \$@ = $@\n" .
      "   EXIT_VALUE = $exit_value\n" .
      "   SIGNAL_NUM = $signal_num\n" .
      "   DUMPED_CORE = $dumped_core\n" .
      "   OUTPUT = $output\n";

    return $output;
}


#------------------------------------------------------------------------------
#
#   ROUND NUMBER UP TO THE NEXT HIGHER MULTIPLE
#

sub round_up ($$) {
    my ($num, $mod) = @_;
    return $num + ($num % $mod == 0?  0 : ($mod - ($num % $mod)));
}


#------------------------------------------------------------------------------
#
#   FETCH WIDTH AND HEIGHT FROM PNM FILE
#

sub pnm_width_height ($) {
    my ($filename) = @_;
    $filename =~ m/\.pnm$/ or die "$filename: not .pnm";

    open(PNM, '<', $filename) or die "$filename: can't read";
    my $line = <PNM>;  # Skip first line.
    do { $line = <PNM> }
        while $line =~ m/^#/;  # Read next line, skipping comments
    close(PNM);

    my ($width, $height) = ($line =~ m/^(\d+)\s+(\d+)$/);
    defined($width) && defined($height)
        or die "$filename: Couldn't read image size";
    return ($width, $height);
}


#------------------------------------------------------------------------------
#
#  COMPILE LATEX SNIPPET INTO HTML
#
#  This routine caches results in the /tmp directory.  Snippets are named and
#  indexed by their SHA-1 hash.
#

sub tex_to_html ($$) {
    my ($tex_template, $tex_snippet) = @_;

    my $render_antialias_bits = 4;
    my $render_oversample = 4;
    my $display_oversample = 4;
    my $oversample = $render_oversample * $display_oversample;
    my $render_dpi = 96*1.2 * 72.27/72 * $oversample;  # This is 1850.112 dpi.


    # --- Generate SHA-1 hash of TeX input for caching.

    (my $tex_input = $tex_template) =~ s{<SNIPPET>}{$tex_snippet};
    my $hash = do { use Digest::SHA; uc(Digest::SHA::sha1_hex($tex_input)); };
    my $file = "/tmp/tex-$hash";


    # --- If the image has already been compiled, then simply return the
    #     cached result.  Otherwise, continue and create the image.

    if (open(HTML, '<', "$file.html")) {
        my $html = do { local $/; <HTML> };
        close(HTML);
        return $html;
    }


    # --- Write TeX source and compile to PDF.

    open(TEX, '>', "$file.tex") and print TEX $tex_input and close(TEX)
        or die "$file.tex: can't write";

    run_command(
        "pdflatex",
        "-halt-on-error",
        "-output-directory=/tmp",
        "-output-format=pdf",
        "$file.tex",
        ">$file.err 2>&1"
    );


    # --- Convert PDF to PNM using Ghostscript.

    run_command(
        "gs",
        "-q -dNOPAUSE -dBATCH",
        "-dTextAlphaBits=$render_antialias_bits",
        "-dGraphicsAlphaBits=$render_antialias_bits",
        "-r$render_dpi",
        "-sDEVICE=pnmraw",
        "-sOutputFile=$file.pnm",
        "$file.pdf"
    );

    my ($img_width, $img_height) = pnm_width_height("$file.pnm");
    #print "# img_width=$img_width\n";
    #print "# img_height=$img_height\n";
    #print "# \n";


    # --- Read dimensions file written by TeX during processing.
    #
    #     Example of file contents:
    #       snippetdepth = 6.50009pt
    #       snippetheight = 13.53899pt
    #       snippetwidth = 145.4777pt
    #       pagewidth = 153.4777pt
    #       pageheight = 28.03908pt
    #       pagemargin = 4.0pt

    my $dimensions = {};
    do {
        open(DIMENSIONS, '<', "$file.dimensions")
            or die "$file.dimensions: can't read";
        while (<DIMENSIONS>) {
            if (m/^(\S+)\s+=\s+(-?[0-9\.]+)pt$/) {
                my ($value, $length) = ($1, $2);
                $length = $length / 72.27 * $render_dpi;
                $dimensions->{$value} = $length;
            } else {
                die "$file.dimensions: invalid line: $_";
            }
        }
        close(DIMENSIONS);
    };

    #foreach (keys %$dimensions) { print "# $_=$dimensions->{$_}px\n"; }
    #print "# \n";


    # --- Crop bottom, then measure how much was cropped.

    run_command("pnmcrop -white -bottom $file.pnm >$file.bottomcrop.pnm");

    my ($img_width_bottomcrop, $img_height_bottomcrop) =
        pnm_width_height("$file.bottomcrop.pnm");

    my $bottomcrop = $img_height - $img_height_bottomcrop;
    #printf "# Cropping bottom:  %d pixels - %d pixels = %d pixels cropped\n",
    #    $img_height, $img_height_bottomcrop, $bottomcrop;


    # --- Crop top and sides, then measure how much was cropped from the top.

    run_command("pnmcrop -white $file.bottomcrop.pnm >$file.crop.pnm");

    my ($cropped_img_width, $cropped_img_height) =
        pnm_width_height("$file.crop.pnm");

    my $topcrop = $img_height_bottomcrop - $cropped_img_height;
    #printf "# Cropping top:  %d pixels - %d pixels = %d pixels cropped\n",
    #    $img_height_bottomcrop, $cropped_img_height, $topcrop;


    # --- Pad image with specific values on all four sides, in preparation for
    #     downsampling.

    # Calculate bottom padding.
    my $snippet_depth =
        int($dimensions->{snippetdepth} + $dimensions->{pagemargin} + .5)
            - $bottomcrop;
    my $padded_snippet_depth = round_up($snippet_depth, $oversample);
    my $increase_snippet_depth = $padded_snippet_depth - $snippet_depth;
    my $bottom_padding = $increase_snippet_depth;
    #printf "# Padding snippet depth:  %d pixels + %d pixels = %d pixels\n",
    #    $snippet_depth, $increase_snippet_depth, $padded_snippet_depth;


    # --- Next calculate top padding, which depends on bottom padding.

    my $padded_img_height = round_up(
        $cropped_img_height + $bottom_padding,
        $oversample);
    my $top_padding =
        $padded_img_height - ($cropped_img_height + $bottom_padding);
    #printf "# Padding top:  %d pixels + %d pixels = %d pixels\n",
    #    $cropped_img_height, $top_padding, $padded_img_height;


    # --- Calculate left and right side padding.  Distribute padding evenly.

    my $padded_img_width = round_up($cropped_img_width, $oversample);
    my $left_padding = int(($padded_img_width - $cropped_img_width) / 2);
    my $right_padding = ($padded_img_width - $cropped_img_width)
                        - $left_padding;
    #printf "# Padding left = $left_padding pixels\n";
    #printf "# Padding right = $right_padding pixels\n";


    # --- Pad the final image.

    run_command(
        "pnmpad",
        "-white",
        "-bottom=$bottom_padding",
        "-top=$top_padding",
        "-left=$left_padding",
        "-right=$right_padding",
        "$file.crop.pnm",
        ">$file.pad.pnm"
    );


    # --- Sanity check of final size.

    my ($final_pnm_width, $final_pnm_height) =
        pnm_width_height("$file.pad.pnm");
    $final_pnm_width % $oversample == 0
        or die "$final_pnm_width is not a multiple of $oversample";
    $final_pnm_height % $oversample == 0
        or die "$final_pnm_height is not a multiple of $oversample";


    # --- Convert PNM to PNG.

    my $final_png_width  = $final_pnm_width  / $render_oversample;
    my $final_png_height = $final_pnm_height / $render_oversample;

    run_command(
        "cat $file.pad.pnm",
        "| ppmtopgm",
        "| pamscale -reduce $render_oversample",
        "| pnmgamma .3",
        "| pnmtopng -compression=9",
        "> $file.png"
    );


    # --- Convert PNG to HTML.

    my $html_img_width  = $final_png_width  / $display_oversample;
    my $html_img_height = $final_png_height / $display_oversample;

    my $html_img_vertical_align = sprintf("%.0f",
        -$padded_snippet_depth / $oversample);

    (my $html_img_title = $tex_snippet) =~
        s{([&<>'"])}{sprintf("&#%d;",ord($1))}eg;

    my $png_data_base64 = do {
        open(PNG, '<', "$file.png") or die "$file.png: can't open";
        binmode PNG;
        my $png_data = do { local $/; <PNG> };
        close(PNG);
        use MIME::Base64;
        MIME::Base64::encode_base64($png_data);
    };
    #$png_data_base64 =~ s/\s+//g;

    my $html =
        qq{<img\n} .
        qq{ width="$html_img_width"} .
        qq{ height="$html_img_height"} .
        qq{ style="vertical-align:${html_img_vertical_align}px;"} .
        qq{ title="$html_img_title"} .
        qq{ src="data:image/png;base64,\n$png_data_base64" />};

    open(HTML, '>', "$file.html") and print HTML $html and close(HTML)
        or die "$file.html: can't write";


    # --- Clean up and return result to caller.

    run_command(
        "rm -f",
        "${file}{.*,}.{tex,aux,dvi,err,log,dimensions,pdf,pnm,png}"
    );

    return $html;
}



#------------------------------------------------------------------------------
#
#   MAIN CONTROL
#

binmode(STDIN,  ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");

my $tex_template = do { local $/; <DATA> };
my $input = do { local $/; <STDIN> };

(my $html = $input) =~ s{\$(.*?)\$}{tex_to_html($tex_template,$1)}seg;

$html =~ s{([^\s<>]*<img.*?>[^\s<>]*)}
          {<span style="white-space:nowrap;">$1</span>}sg;

print <<EOT;
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html 
 PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title></title>
</head>
<body>
<p>
$html
</p>
</body>
</html>
EOT

exit(0);


#------------------------------------------------------------------------------
#
#   LATEX TEMPLATE
#

__DATA__
\documentclass[10pt]{article}
\pagestyle{empty}
\setlength{\topskip}{0pt}
\setlength{\parindent}{0pt}
\setlength{\abovedisplayskip}{0pt}
\setlength{\belowdisplayskip}{0pt}

\usepackage{geometry}

\usepackage{amsmath}

\newsavebox{\snippetbox}
\newlength{\snippetwidth}
\newlength{\snippetheight}
\newlength{\snippetdepth}
\newlength{\pagewidth}
\newlength{\pageheight}
\newlength{\pagemargin}

\begin{lrbox}{\snippetbox}%
$<SNIPPET>$%
\end{lrbox}

\settowidth{\snippetwidth}{\usebox{\snippetbox}}
\settoheight{\snippetheight}{\usebox{\snippetbox}}
\settodepth{\snippetdepth}{\usebox{\snippetbox}}

\setlength\pagemargin{4pt}

\setlength\pagewidth\snippetwidth
\addtolength\pagewidth\pagemargin
\addtolength\pagewidth\pagemargin

\setlength\pageheight\snippetheight
\addtolength{\pageheight}{\snippetdepth}
\addtolength\pageheight\pagemargin
\addtolength\pageheight\pagemargin

\newwrite\foo
\immediate\openout\foo=\jobname.dimensions
  \immediate\write\foo{snippetdepth = \the\snippetdepth}
  \immediate\write\foo{snippetheight = \the\snippetheight}
  \immediate\write\foo{snippetwidth = \the\snippetwidth}
  \immediate\write\foo{pagewidth = \the\pagewidth}
  \immediate\write\foo{pageheight = \the\pageheight}
  \immediate\write\foo{pagemargin = \the\pagemargin}
\immediate\closeout\foo

\geometry{paperwidth=\pagewidth,paperheight=\pageheight,margin=\pagemargin}

\begin{document}%
\usebox{\snippetbox}%
\end{document}

Update to code: I just added -compression=9 to the pnmtopng command line and added ppmtopgm (convert to grayscale) in the final conversion pipeline. These together reduce the PNG image sizes by 20%. By the way, the average file size of the 24 PNG images in the sample screenshots shown above is 3534.83 bytes. The HTML document is 120,053 bytes. Keep in mind that these PNG images are 4 times larger (in each dimension, height and width) than what appears on the screen at the default font size. If display-time oversampling is disabled, then the PNG images average 732.8 bytes each and the HTML document goes down to 29,209 bytes. I’m not particularly worried about HTML and image file sizes anymore like I was in the 1990s, but I thought this was worth noting anyway. (Note: A pnmgamma adjustment of .5 or so should be used instead of .3 if display-time oversampling is disabled.)
