[Tex/LaTex] Steganography in LaTeX

I am a teacher. An ongoing problem is that students post assignments to websites like chegg.com to have other people write projects for them. I don't know whether the students are uploading JPGs or PDFs, but what I see on the website is a text version of the assignment. See, for example:
https://www.chegg.com/homework-help/questions-and-answers/part-2-confidence-intervals-recovery-great-recession-2007-2009-economic-situation-many-fam-q37153607

My assignments are written in LaTeX. I'm wondering how I might embed a unique identifier in each student's copy so that, by looking at a posted assignment, I can tell who posted it. Let's imagine that I'm teaching 500 students (9 bits, since 2^9 = 512), and it would be good to also encode the semester and year (5 bits), plus a parity bit for good measure. That makes 15 bits, so let's just say I need to encode 16 bits.
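One possible way to pack those fields into a single number, offered only as an illustration (the field order and the even-parity convention are assumptions, not part of the question): put the student number s in the high bits, the semester/year code t below it, and the parity bit p at the bottom.

```latex
\[
  \mathit{id} = 2^{6}s + 2t + p,
  \qquad 0 \le s < 2^{9},\quad 0 \le t < 2^{5},\quad p \in \{0,1\}
\]
% Assumed convention: p is chosen so that id has an even number of 1 bits.
% Example: s = 37 (100101, three 1 bits), t = 5 (101, two 1 bits):
% five 1 bits is odd, so p = 1 and id = 2368 + 10 + 1 = 2379.
```

The largest possible value is 511*64 + 31*2 + 1 = 32767, so this layout uses 15 bits and leaves the sixteenth bit spare.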

Requirements:

- Cannot use font changes: the font information is lost in the Chegg posting.
- Cannot use watermarks or other imagery: again, this information has been stripped from the posting.
- Cannot use spacing changes, mainly because this runs counter to how LaTeX formats the page, and I'm not sure that subtle spacing changes would survive to the posting (in other words, no stegsnow).
- The optimal answer would embed the identifier multiple times, so that even if only a portion of the assignment is posted, it is still possible to identify the source.

What this means, I believe, is that I basically need something that is visible, but easily overlooked in the text itself. Ideas:

- Double a particular word that that a reader is likely to overlook; depending on which word is doubled, the identity is known (as with my doubling of the word "that" in the preceding text).
- Use extra punctuation marks that might be easily overlooked.. (see the double period after "overlooked").
- Insert occasoinal letter flips that look like sloppy spell checking but actually encode the identifier (see the spelling of "occasional").
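The three ideas above can be prototyped directly. The following is a minimal sketch of a different technique from the answer below: store the student's number in a TeX count register and test individual bits of it with \divide (which truncates) and \ifodd. The names \studentid, \idbit, \bittmp, and \bitplace are hypothetical, and the value 37 is only an example.

```latex
\documentclass{article}
% Hypothetical per-student number; change it for each student's copy.
\newcount\studentid
\studentid=37 % 37 = binary 100101
% Scratch registers used by \idbit.
\newcount\bittmp
\newcount\bitplace
% \idbit{<k>}{<text if bit k is 0>}{<text if bit k is 1>}
% Shifts the number right k times (\divide truncates), then
% \ifodd checks the lowest remaining bit, i.e. bit k of \studentid.
\newcommand\idbit[3]{%
  \bittmp=\studentid
  \bitplace=#1\relax
  \loop\ifnum\bitplace>0 \divide\bittmp by 2 \advance\bitplace by -1 \repeat
  \ifodd\bittmp #3\else #2\fi}
\begin{document}
Double a particular word that\idbit{0}{}{ that} a reader is likely to
overlook\idbit{1}{.}{..} Insert occas\idbit{2}{io}{oi}nal letter flips
that look like sloppy spell checking.
\end{document}
```

With \studentid=37 (binary 100101), bit 0 is 1, bit 1 is 0, and bit 2 is 1, so "that" is doubled, the sentence ends with a single period, and "occasional" comes out as "occasoinal". Because \idbit does not consume anything, the same bit can be tested again later in the document, which makes repeating the identifier easy; the trade-off is that each bit index has to be written by hand.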

Does anyone know of something that has already been implemented? Any suggestions about the easiest way to do this?

Here's a potential starting point for a solution. In the optimal implementation, \stenagbox could contain "complicated" material such as \begin{enumerate}...\end{enumerate}, multiple paragraphs with formatting, etc.

\documentclass[10pt]{article}

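% Placeholder: #1 is the identifier to encode (ignored for now);
% #2 is the text, which is typeset unchanged.
\newcommand{\stenagbox}[2]{#2}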

\begin{document}

  \stenagbox{16000}{When moving to a new area, it is important to
    understand the climate that you will be living in.  Does it rain
    more or less than you are used to?  Will it typically be hotter or
    colder than the city that you are coming from?  Just knowing where
    a city is located on a map is not sufficient.  In some areas,
    nearby mountains may block the wind and make the climate hotter or
    colder than expected.  In other areas, the ocean may keep the
    region cool in the summer time and warm in the winter.  Using
    statistics, the climate in two areas can be compared to determine
    what to expect.}

\end{document}

Best Answer

I create a macro \bitstream[<total tests>]{<test number>} (default: 256 total tests) that fills the token register \bits with the binary digits of the test number, most significant bit first. I demonstrate how it works at the top of the MWE; you don't need that part in your test preparation, it is only there for demonstration.
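As a worked example of what \bitstream computes, here is a hand trace of \bitstream[8]{5}. The macro starts with a place value of 8/2 = 4, subtracts each place value in turn, appends 1 when the remainder stays non-negative, and otherwise appends 0 and undoes the subtraction before halving the place value:

```latex
\[
\begin{array}{ccc}
  \mbox{place value} & \mbox{remainder} & \mbox{bit appended}\\
  4 & 5 - 4 = 1 & 1\\
  2 & 1 - 2 < 0,\ \mbox{so keep } 1 & 0\\
  1 & 1 - 1 = 0 & 1
\end{array}
\qquad\Longrightarrow\qquad
5 = 1\cdot 4 + 0\cdot 2 + 1\cdot 1 = \mathtt{101}_2
\]
```

So \bitstream[8]{5} \the\bits prints 101.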

Then, to encode your test versions, use \dobit{<output A>}{<output B>} to place slight differences into the output stream (i.e., the printed test). Each time it is invoked, it removes the leading (high-order) bit from the \bits token register and uses it to choose between output A (bit 0) and output B (bit 1).

In the MWE, I created an 8-test matrix, requiring 3 bits (2^3=8), and so 3 \dobit choices are encountered to create 8 unique versions of the test. The versions have a comma included or not, spell "versions" correctly or not, and repeat the word "the" or not.

I just use \bigskip to separate the test versions; in practice, one would presumably use \clearpage so that each test appears on its own page.

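\documentclass{article}
\usepackage{pgffor}% for \foreach in the demo loop
\newcounter{bitreg}% remainder of the test number still to encode
\newcounter{bitval}% current place value (a power of 2)
\newtoks\bits      % the bit string, consumed by \dobit
% \addtotoks\toks{x}: append x to the token register \toks
\newcommand{\addtotoks}[2]{#1\expandafter{\the#1#2}}
% \bitstream[<total tests>]{<test number>}: fill \bits, MSB first
\newcommand\bitstream[2][256]{%
  \setcounter{bitreg}{#2}%
  \setcounter{bitval}{\the\numexpr#1/2\relax}%
  \bits{}%
  \bitstreamaux%
}
% One step: subtract the place value; record 1 if the remainder stays
% non-negative, else record 0 and undo the subtraction; then halve.
\newcommand\bitstreamaux{%
  \addtocounter{bitreg}{-\thebitval}%
  \ifnum\thebitreg>-1\relax\addtotoks\bits{1}\else
    \addtotoks\bits{0}\addtocounter{bitreg}{\thebitval}\fi
  \ifnum\thebitval=1\relax\else%
    \setcounter{bitval}{\the\numexpr\thebitval/2\relax}%
    \expandafter\bitstreamaux%
  \fi
}
% \dobit{<output A>}{<output B>}: pop the next bit; A for 0, B for 1
\newcommand\dobit[2]{%
  \expandafter\checkbit\the\bits\relax
  \ifnum\thisbit=1\relax#2\else#1\fi
}
% Split \bits into its first token (saved in \thisbit) and the rest
\def\checkbit#1#2\relax{\gdef\thisbit{#1}\bits{#2}}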
\begin{document}
\bitstream{255} \the\bits

\bitstream{128} \the\bits

\bitstream{53} \the\bits

\bitstream[8]{5} \the\bits

\foreach\x in{0,...,7}{\bitstream[8]{\x}
Test \x:
This is a test\dobit{}{,} of % COMMA IN OR NOT
multiple vers\dobit{io}{oi}ns. The test %VERSIONS MISSPELLED OR NOT
is for all \dobit{the the}{the} marbles.\bigskip\par% THE REPEATED OR NOT.
}
\end{document}
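Going the other way, from a posted text back to a test number, amounts to reading each \dobit site as one bit, in the order the sites appear. For the demo loop above (MSB first), a sketch of the decoding rule, where c, m, and s are hypothetical names for the three observations:

```latex
\[
  x = 4c + 2m + s
\]
% c = 1 if there is a comma after "test", else 0
% m = 1 if "versions" is misspelled as "versoins", else 0
% s = 1 if "the" is NOT repeated, else 0
% Example: comma, correct spelling, single "the":
% x = 4*1 + 2*0 + 1 = 5, i.e. Test 5.
```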

SUPPLEMENT

One additional note of interest: while <total tests> is normally expected to be a power of 2, it seems that unique tests are still generated when it is not. However, the bits produced by \bitstream will not correspond to the binary representation of <test number>. For example, the following code for 9 total tests

\bitstream[9]{0}\the\bits\par
\bitstream[9]{1}\the\bits\par
\bitstream[9]{2}\the\bits\par
\bitstream[9]{3}\the\bits (4 not 3)\par
\bitstream[9]{4}\the\bits (5 not 4)\par
\bitstream[9]{5}\the\bits (8 not 5)\par
\bitstream[9]{6}\the\bits (9 not 6)\par
\bitstream[9]{7}\the\bits (10 not 7)\par
\bitstream[9]{8}\the\bits (12 not 8)\par

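produces 9 unique results, just not the bit streams corresponding to the numbers 0 through 8. The artifact arises because \numexpr rounds the result of each /2 division to the nearest integer instead of truncating it. For 9 total tests, the place values therefore become 5, 3, 2, 1 instead of 8, 4, 2, 1, so, for example, 3 is encoded as 0100 (0·5 + 1·3 + 0·2 + 0·1), which reads as 4 in ordinary binary.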
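If you want the bits to be the ordinary binary representation, one option is to round the total up to the next power of 2 before calling \bitstream; for 500 students that gives 512, i.e. 9 bits. A minimal sketch of such a helper (the names \roundpow and \testcap are hypothetical):

```latex
\documentclass{article}
% \roundpow{<n>}: set \testcap to the smallest power of 2 that is >= n
\newcount\testcap
\newcommand\roundpow[1]{%
  \testcap=1
  \loop\ifnum\testcap<#1 \multiply\testcap by 2 \repeat}
\begin{document}
\roundpow{500}\the\testcap\par % typesets 512
\roundpow{9}\the\testcap       % typesets 16
\end{document}
```

The result can then be passed as the optional argument, e.g. \bitstream[\the\testcap]{<test number>}.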

DOUBLE SUPPLEMENT

One possible gotcha (user error) is issuing fewer \dobit calls than the number of bits allocated by your \bitstream. Then the trailing, low-order digits, which are the ones that differ between nearby test numbers, never make it into the test, so some versions may not be differentiated.

A fix for that user error is to build the \bitstream starting with the LSB (least significant bit) rather than the MSB (most significant bit). That way, k \dobit invocations still distinguish the first 2^k test numbers, even if the bit stream is longer.
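To make the difference concrete: with an N-bit stream and only k < N \dobit calls, two test numbers x and x' produce identical printouts exactly when the k bits that are actually read agree. A sketch of the two conditions:

```latex
\[
  \mbox{MSB first: } \left\lfloor x/2^{N-k} \right\rfloor = \left\lfloor x'/2^{N-k} \right\rfloor,
  \qquad
  \mbox{LSB first: } x \equiv x' \pmod{2^{k}}
\]
% Example: default 256 tests (N = 8) with only k = 3 \dobit calls.
% MSB first: tests 0 to 31 all print identically.
% LSB first: tests 0 to 7 all differ; 0 and 8 collide.
```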

Here is a version of the answer that builds the \bitstream from LSB to MSB, rather than the opposite.

\documentclass{article}
\usepackage{pgffor}
\newcounter{bitreg}
\newcounter{bitval}
\newtoks\bits
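% \apptoks\toks{x}: despite its name, PREPENDS x to \toks, which reverses
% the bit order so that \dobit reads the LSB first
\newcommand{\apptoks}[2]{#1\expandafter{\expandafter#2\the#1}}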
\newcommand\bitstream[2][256]{%
  \setcounter{bitreg}{#2}%
  \setcounter{bitval}{\the\numexpr#1/2\relax}%
  \bits{}%
  \bitstreamaux%
}
\newcommand\bitstreamaux{%
  \addtocounter{bitreg}{-\thebitval}%
  \ifnum\thebitreg>-1\relax\apptoks\bits{1}\else
    \apptoks\bits{0}\addtocounter{bitreg}{\thebitval}\fi
  \ifnum\thebitval=1\relax\else%
    \setcounter{bitval}{\the\numexpr\thebitval/2\relax}%
    \expandafter\bitstreamaux%
  \fi
}
\newcommand\dobit[2]{%
  \expandafter\checkbit\the\bits\relax
  \ifnum\thisbit=1\relax#2\else#1\fi
}
\def\checkbit#1#2\relax{\gdef\thisbit{#1}\bits{#2}}
\begin{document}
\bitstream{255} \the\bits

\bitstream{128} \the\bits

\bitstream{53} \the\bits

\bitstream[8]{5} \the\bits

\foreach\x in{0,...,7}{\bitstream[8]{\x}
Test \x:
This is a test\dobit{}{,} of % COMMA IN OR NOT
multiple vers\dobit{io}{oi}ns. The test %VERSIONS MISSPELLED OR NOT
is for all \dobit{the the}{the} marbles.\bigskip\par% THE REPEATED OR NOT.
}
\end{document}
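With this LSB-first version, the first \dobit site carries bit 0, so the three sites in the demo loop are read in the opposite order. Writing c, m, and s (hypothetical names) for 1 if the comma is present, 1 if "versions" is misspelled, and 1 if "the" is not repeated, the test number is recovered as:

```latex
\[
  x = c + 2m + 4s
\]
% Example: Test 6 (binary 110) gives c = 0, m = 1, s = 1:
% no comma, "versoins", and a single "the"; 0 + 2 + 4 = 6.
```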

Related Solutions

[Tex/LaTex] How to attach AND HIDE a file in a PDF using LaTeX

The "hidden" embedded file in the blog post is not an embedded file in the sense of the PDF standard, so the question is what you really want:

If you only want to include the content of the file in the generated PDF, you can add a raw PDF object: if you write \immediate\pdfobj file{some-filename.tex}, the contents of some-filename.tex are copied verbatim into the PDF as the body of an object. If you want to see this without writing a PDF parser, you can use

\documentclass{article}
\pdfobjcompresslevel=0% Don't hide the objects
...
\begin{document}
...
% Disable compression for this one object
{\pdfcompresslevel=0\immediate\pdfobj file{some-filename.tex}}
...
\end{document}

If you open the resulting PDF file in a text editor, somewhere you will find something like the following (the object number may vary):

11 0 obj
<Here comes the content of some-filename.tex>
endobj

This object will not be visible in any PDF viewer.
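A variant worth considering, sketched under the assumption that you compile with pdfLaTeX: adding the stream keyword to \pdfobj wraps the file in a proper stream object, so the PDF stays syntactically valid even if the file content is not PDF syntax. For a stream, \pdfcompresslevel is what decides whether the content is readable in an editor. The file name hidden-note.txt and its text are placeholders.

```latex
% Create a placeholder file to hide (written if it does not already exist).
\begin{filecontents*}{hidden-note.txt}
This text ends up in the PDF but is never displayed.
\end{filecontents*}
\documentclass{article}
\pdfobjcompresslevel=0 % keep objects out of compressed object streams
\begin{document}
Visible text.
% Leave this one stream uncompressed so it can be read in an editor.
{\pdfcompresslevel=0 \immediate\pdfobj stream file{hidden-note.txt}}
\end{document}
```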

Of course, this isn't really embedded. A second attempt: embed the file, but do not list it in /EmbeddedFiles. You can use

\documentclass{article}
\usepackage{embedfile}
\pdfobjcompresslevel=0% Don't hide the objects
...
\begin{document}
...
{\pdfcompresslevel=0\embedfile{some-filename.tex}}
\makeatletter
\global\let\EmFi@list\empty
\makeatother
...
\end{document}

I partially disabled compression again so that you can find the file in the resulting PDF. The \global\let\EmFi@list\empty makes the embedfile package forget about all the files up to this point, so they will never be written into the list of embedded files, but the /EmbeddedFile PDF object with the file content and some metadata is still written. You can't easily make this visible, because the catalog entries are missing.

If you want to reproduce what the blog post you referenced does, i.e., change the case of /EmbeddedFiles, you have to replace the macro \embedfilefinish, which writes the list of embedded files:

\documentclass{article}
\usepackage{embedfile}
\pdfobjcompresslevel=0% Don't hide the objects, otherwise you can't see
                      % /Embeddedfiles, so you also can't change it back
\makeatletter
% The following is mostly copied from embedfile.sty, (C) by Heiko Oberdiek
% But all the errors are probably introduced by me
\def\embedfilefinish{%
  \ifEmFi@finished
    \EmFi@Error{%
      Too many invocations of \string\embedfilefinish
    }{%
      The list of embedded files is already written.%
    }%
  \else
    \ifx\EmFi@list\empty
    \else
      \global\EmFi@finishedtrue
      \begingroup
        \def\do##1##2{%
          (##1)##2%
        }%
        \immediate\pdfobj{%
          <<%
            /Names[\EmFi@list]%
          >>%
        }%
        \pdfnames{%
          % Changed name to make this invalid
          /Embeddedfiles \the\pdflastobj\space 0 R%
        }%
      \endgroup
    \fi
  \fi
}
\makeatother
\begin{document}
...
\embedfile{hidden.tex}
...
\end{document}