[TeX/LaTeX] Steganography in LaTeX

steganography

I am a teacher. An ongoing problem is that students post assignments to websites like chegg.com to have other people write projects for them. I don't know if the students are providing JPGs or PDFs — but what I see on the website is a text version of the assignment. See for example:
https://www.chegg.com/homework-help/questions-and-answers/part-2-confidence-intervals-recovery-great-recession-2007-2009-economic-situation-many-fam-q37153607

For my assignments, the assignment is written in LaTeX. I'm wondering how I might embed a unique identifier in each student's assignment so that I can tell by looking at the assignment who posted it. Let's imagine that I'm teaching 500 students (9 bits), and it would be good to encode the semester and year (5 bits), plus a parity bit for good measure. That is 15 bits, so let's just say I need to encode 16 bits.

Requirements:

  1. Cannot use font changes — the font information is lost in the
    chegg posting.
  2. Cannot use watermarks or other imagery — again, this information
    has been stripped from the posting.
  3. Cannot use spacing changes — mainly because this runs counter to
    how LaTeX is formatting the page, and I'm not sure that subtle
    spacing changes would survive to the posting. (in other words, no
    stegsnow)
  4. The optimal answer would embed the identifier multiple times so that
    even if only a portion of the assignment is posted, it is possible to
    identify the source.

What this means, I believe, is that I basically need something that is visible, but easily overlooked in the text itself. Ideas:

  1. Double a particular word that that a reader is likely to overlook; depending on which word is doubled, the identity is known (such as my doubling the word "that" in the preceding text).
  2. Using extra punctuation marks that might be easily overlooked.. (see double periods at the end).
  3. Inserting occasoinal letter flips that look like sloppy spell checking but actually encode the identifier. (see spelling of occasional)

Anyone know of something that has already been implemented? Any suggestions about the easiest way to do this?

Here's a potential starting point for a solution. In the optimal implementation, the \stenagbox could include "complicated" text such as begin/end{enumerate}, multiple paragraphs with formatting, etc.

\documentclass[10pt]{article}

\newcommand{\stenagbox}[2]{#2}

\begin{document}

  \stenagbox{16000}{When moving to a new area, it is important to
    understand the climate that you will be living in.  Does it rain
    more or less than you are used to?  Will it typically be hotter or
    colder than the city that you are coming from?  Just knowing where
    a city is located on a map is not sufficient.  In some areas,
    nearby mountains may block the wind and make the climate hotter or
    colder than expected.  In other areas, the ocean may keep the
    region cool in the summer time and warm in the winter.  Using
    statistics, the climate in two areas can be compared to determine
    what to expect.}

\end{document}

Best Answer

I create a macro \bitstream[<total tests>]{<test number>} (default: 256 total tests) that writes the binary digits of the test number into the token register \bits. The first few lines of the MWE body demonstrate how it works (you don't need them in your test preparation; they are only for demonstration).

Then, to encode your test versions, one uses \dobit{<output A>}{<output B>} to place slight differences into the output stream (i.e., the printed test). Each time it is invoked, it removes the leading (high-order) bit from the \bits token register and uses it to choose between output A (bit 0) and output B (bit 1).

In the MWE, I created an 8-test matrix, requiring 3 bits (2^3=8), and so 3 \dobit choices are encountered to create 8 unique versions of the test. The versions have a comma included or not, spell "versions" correctly or not, and repeat the word "the" or not.

I just use a \bigskip to separate the test versions; in practice, one would presumably use \clearpage so that each test appears on its own page.

\documentclass{article}
\usepackage{pgffor}
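% bitreg: remaining value still to be converted; bitval: place value of the current bit
\newcounter{bitreg}
\newcounter{bitval}
% Token register that holds the bit string
\newtoks\bits
% Append #2 to token register #1
\newcommand{\addtotoks}[2]{#1\expandafter{\the#1#2}}
% \bitstream[<total tests>]{<test number>}: fill \bits with the bits of <test number>
\newcommand\bitstream[2][256]{%
  \setcounter{bitreg}{#2}%
  \setcounter{bitval}{\the\numexpr#1/2\relax}%
  \bits{}%
  \bitstreamaux%
}
% Emit one bit, halve the place value, and recurse until the place value is 1
\newcommand\bitstreamaux{%
  \addtocounter{bitreg}{-\thebitval}%
  \ifnum\thebitreg>-1\relax\addtotoks\bits{1}\else
    \addtotoks\bits{0}\addtocounter{bitreg}{\thebitval}\fi
  \ifnum\thebitval=1\relax\else%
    \setcounter{bitval}{\the\numexpr\thebitval/2\relax}%
    \expandafter\bitstreamaux%
  \fi
}
% \dobit{<output A>}{<output B>}: consume one bit; typeset A for 0, B for 1
\newcommand\dobit[2]{%
  \expandafter\checkbit\the\bits\relax
  \ifnum\thisbit=1\relax#2\else#1\fi
}
% Save the first bit of \bits in \thisbit and put the remaining bits back in \bits
\def\checkbit#1#2\relax{\gdef\thisbit{#1}\bits{#2}}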
\begin{document}
\bitstream{255} \the\bits

\bitstream{128} \the\bits

\bitstream{53} \the\bits

\bitstream[8]{5} \the\bits

\foreach\x in{0,...,7}{\bitstream[8]{\x}
Test \x:
This is a test\dobit{}{,} of % COMMA IN OR NOT
multiple vers\dobit{io}{oi}ns. The test %VERSIONS MISSPELLED OR NOT
is for all \dobit{the the}{the} marbles.\bigskip\par% THE REPEATED OR NOT.
}
\end{document}
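
As a reading aid, here is a hand-worked trace of what the MWE should typeset. It is derived from the definitions above rather than taken from the original answer, and the fragment assumes the MWE preamble (it is not a standalone document).

```latex
% Body fragment only: assumes \bitstream and \dobit from the MWE preamble above.
% Bit streams written by the four demonstration calls (MSB first):
\bitstream{255} \the\bits      % 11111111
\bitstream{128} \the\bits      % 10000000
\bitstream{53} \the\bits       % 00110101  (53 = 32 + 16 + 4 + 1)
\bitstream[8]{5} \the\bits     % 101

% Test 5 of 8: each \dobit consumes the leading bit of 101.
\bitstream[8]{5}%
This is a test\dobit{}{,} of % bit 1 -> second argument: comma inserted
multiple vers\dobit{io}{oi}ns. The test % bit 0 -> first argument: "versions"
is for all \dobit{the the}{the} marbles.% bit 1 -> second argument: single "the"
% Typeset: This is a test, of multiple versions. The test is for all the marbles.
```

Reading a posted copy back is the reverse process: note which alternative appears at each \dobit position, write down the bits in order, and convert the bit string back to the test number.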



SUPPLEMENT

One additional note of interest. While <total tests> is normally expected to be a power of 2, unique tests still appear to be generated when it is not. However, the resulting \bits will not be the binary representation of the <test number>. For example, the following bit streams for 9 total tests,

\bitstream[9]{0}\the\bits\par
\bitstream[9]{1}\the\bits\par
\bitstream[9]{2}\the\bits\par
\bitstream[9]{3}\the\bits (4 not 3)\par
\bitstream[9]{4}\the\bits (5 not 4)\par
\bitstream[9]{5}\the\bits (8 not 5)\par
\bitstream[9]{6}\the\bits (9 not 6)\par
\bitstream[9]{7}\the\bits (10 not 7)\par
\bitstream[9]{8}\the\bits (12 not 8)\par

produce 9 unique results, just not the bit streams corresponding to the numbers 0 through 8. The artifact arises from the /2 division on numbers that are not powers of 2: \numexpr rounds the quotient to the nearest integer, so repeatedly halving 9 yields the place values 5, 3, 2, 1 instead of powers of 2.
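
The rounding can be checked in isolation. The following standalone sketch prints the successive halvings of 9 and, in comments, works through how the place values 5, 3, 2, 1 reproduce the annotations in the listing above. The hand-worked decompositions are mine, not part of the original answer.

```latex
\documentclass{article}
\begin{document}
% e-TeX \numexpr division rounds to the nearest integer rather than truncating,
% so halving 9 repeatedly gives the place values 5, 3, 2, 1 (not 4, 2, 1).
\the\numexpr 9/2\relax,  % 5  (4.5 rounds up)
\the\numexpr 5/2\relax,  % 3  (2.5 rounds up)
\the\numexpr 3/2\relax,  % 2  (1.5 rounds up)
\the\numexpr 2/2\relax   % 1

% With place values 5, 3, 2, 1 the greedy subtraction in \bitstreamaux gives:
% 3 = 3        -> 0100      4 = 3 + 1  -> 0101      5 = 5      -> 1000
% 6 = 5 + 1    -> 1001      7 = 5 + 2  -> 1010      8 = 5 + 3  -> 1100
\end{document}
```

Read as ordinary binary, those bit strings are 4, 5, 8, 9, 10, and 12, which matches the "(4 not 3)" style annotations. Passing a power of 2 as <total tests> avoids the issue entirely.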



DOUBLE SUPPLEMENT

One possible gotcha (user error) is issuing fewer \dobit calls than the number of bits allocated to your \bitstream. Then, the low-order digits that differentiate the cases never make it into the test, and so some cases might not be differentiated.

A fix for that user error is to build the \bitstream starting with the LSB (least significant bit), rather than the MSB (most significant bit). That way, even an incomplete number of \dobit invocations would still provide differentiation.
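
To make the difference concrete, the fragment below allocates three bits but issues only two \dobit calls per test. It is a sketch that assumes \bitstream and \dobit are already defined (MSB-first as in the first listing, or LSB-first as in the version that follows); the carrier text "A B" and the punctuation choices are placeholders of mine.

```latex
% Body fragment only: assumes \bitstream and \dobit are already defined.
% Three bits are allocated, but only two \dobit calls are issued per test.
\bitstream[8]{0}A\dobit{}{,} B\dobit{}{;}\par
\bitstream[8]{1}A\dobit{}{,} B\dobit{}{;}\par
% MSB-first: \bits is 000 and 001; both tests consume 00 -> "A B" twice.
% LSB-first: \bits is 000 and 100; test 1 consumes 10   -> "A B" and "A, B".
% Caveat: LSB-first only moves the collision; tests 0 and 4 (000 vs 001)
% now share their first two bits, so matching the \dobit count is still best.
```

In other words, with k \dobit calls the LSB-first ordering distinguishes test numbers that differ in their lowest k bits, which covers consecutive test numbers but not every pair.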

Here is a version of the answer that builds the \bitstream from LSB to MSB, rather than the opposite.

\documentclass{article}
\usepackage{pgffor}
\newcounter{bitreg}
\newcounter{bitval}
\newtoks\bits
% Prepend the single token #2 to token register #1, so the last bit computed (the LSB) comes first
\newcommand{\apptoks}[2]{#1\expandafter{\expandafter#2\the#1}}
\newcommand\bitstream[2][256]{%
  \setcounter{bitreg}{#2}%
  \setcounter{bitval}{\the\numexpr#1/2\relax}%
  \bits{}%
  \bitstreamaux%
}
\newcommand\bitstreamaux{%
  \addtocounter{bitreg}{-\thebitval}%
  \ifnum\thebitreg>-1\relax\apptoks\bits{1}\else
    \apptoks\bits{0}\addtocounter{bitreg}{\thebitval}\fi
  \ifnum\thebitval=1\relax\else%
    \setcounter{bitval}{\the\numexpr\thebitval/2\relax}%
    \expandafter\bitstreamaux%
  \fi
}
\newcommand\dobit[2]{%
  \expandafter\checkbit\the\bits\relax
  \ifnum\thisbit=1\relax#2\else#1\fi
}
\def\checkbit#1#2\relax{\gdef\thisbit{#1}\bits{#2}}
\begin{document}
\bitstream{255} \the\bits

\bitstream{128} \the\bits

\bitstream{53} \the\bits

\bitstream[8]{5} \the\bits

\foreach\x in{0,...,7}{\bitstream[8]{\x}
Test \x:
This is a test\dobit{}{,} of % COMMA IN OR NOT
multiple vers\dobit{io}{oi}ns. The test %VERSIONS MISSPELLED OR NOT
is for all \dobit{the the}{the} marbles.\bigskip\par% THE REPEATED OR NOT.
}
\end{document}
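
The question also asks for the identifier to be embedded more than once, so that a partial posting can still be traced. One possible way to do that with these macros is to reload the token register before each problem and reuse the same bits with different carrier text. The names \studentnumber and \reloadbits, and the problem wording, are hypothetical and not part of the original answer.

```latex
% Body fragment only: assumes \bitstream and \dobit from either preamble above.
% \studentnumber and \reloadbits are hypothetical helper names.
\newcommand\studentnumber{5}% identifier for this copy, 0..7 in this 3-bit demo
\newcommand\reloadbits{\bitstream[8]{\studentnumber}}

\reloadbits % first copy of the identifier
Problem 1. Compute the mean and\dobit{}{,} if possible\dobit{}{,} the
median of \dobit{the the}{the} sample.\par

\reloadbits % second copy: same three bits, different carrier text
Problem 2. State \dobit{whether}{if} the 95\% confidence interval contains
zero\dobit{}{,} and explain \dobit{your}{the} reasoning.\par
```

Each problem then carries the full identifier on its own, provided it contains as many \dobit calls as there are bits. Issuing more \dobit calls than bits between reloads should be avoided, since \checkbit expects at least one bit to remain in \bits.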
