We implemented an online service that generates PDFs with a predefined structure. The user chooses a LaTeX template and then compiles it with the appropriate inputs.

Our concern is security: a malicious user must not be able to gain shell access by injecting special instructions into the LaTeX document.

We need some workaround for this or at least a list of special characters that we should strip from the input data.

The preferred language would be PHP, but any suggestions, constructions and links are very welcome.

PS. In a few words, we're looking for a mysql_real_escape_string for LaTeX.

A: 

You'd probably want to make sure that \write18 is disabled.

See http://www.fceia.unr.edu.ar/lcc/cdrom/Instalaciones/LaTex/MiKTex/doc/ch04s08.html and http://www.texdev.net/2009/10/06/what-does-write18-mean/

Mica
Useful links! Thanks.
Igor
+2  A: 

The only way (AFAIK) to perform harmful operations using LaTeX is to call external commands using \write18. This only works if you run LaTeX with the --shell-escape or --enable-write18 argument (depending on your distribution).

So as long as you do not run it with one of these arguments you should be safe without the need to filter out any parts.

Besides that, one is still able to write other files using the \newwrite, \openout and \write commands. Letting the user create and (over)write files might be unwanted, so you could filter out occurrences of these commands. But a blacklist of certain commands is prone to fail, since someone with bad intentions can easily hide the actual command by obfuscating the input document.
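To illustrate why a blacklist is fragile, here is a minimal sketch in Python (chosen only for illustration; the question prefers PHP). The names BLACKLIST and naive_filter_passes are hypothetical. The obfuscated input relies on \csname ... \endcsname, standard TeX that builds a command name at run time, so the text "\write" never appears literally.

```python
import re

# Hypothetical naive blacklist built from the command names mentioned above.
# "\write" also covers "\write18".
BLACKLIST = re.compile(r"\\(?:newwrite|openout|write)")


def naive_filter_passes(source: str) -> bool:
    """Return True if the blacklist finds nothing to object to."""
    return BLACKLIST.search(source) is None


# The literal command is caught by the filter.
plain = r"\write18{ls}"

# TeX assembles the name "write" between \csname and \endcsname,
# so this has the same effect as \write18{ls} but contains no "\write".
obfuscated = r"\csname write\endcsname18{ls}"

print(naive_filter_passes(plain))       # the filter rejects this input
print(naive_filter_passes(obfuscated))  # the filter lets this input through
```

This is only one of many ways to disguise a command, which is why disabling the dangerous feature is more robust than searching for its name.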

Edit: Running the LaTeX command under a limited account (i.e. one with no write access to directories unrelated to LaTeX or the project), in combination with disabling \write18, might be easier and more secure than keeping a blacklist of 'dangerous' commands.

Veger
Thank you Veger! Your answer, combined with Geoff Reedy's post, gives the perfect intrusion-proof recipe.
Igor
+1  A: 

According to http://www.tug.org/tutorials/latex2e/Special_Characters.html the special characters in LaTeX are # $ % & ~ _ ^ \ { }. Most can be escaped with a simple backslash, but ~ ^ and \ need special treatment.

For caret use \^{} (or \textasciicircum), for tilde use \~{} (or \textasciitilde), and for backslash use \textbackslash.
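The escaping rules above can be sketched as a small function. This is a minimal sketch in Python (the question prefers PHP, but the same lookup table carries over directly); latex_escape is a hypothetical name, not an existing library function. One addition of mine: an empty group {} follows \textbackslash, so that letters after it are not read as part of the command name.

```python
# Replacement for each LaTeX special character, following the rules above.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",  # {} is an addition: it terminates the command name
    "{": r"\{",
    "}": r"\}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "^": r"\^{}",
    "~": r"\~{}",
}


def latex_escape(text: str) -> str:
    """Escape user input so LaTeX typesets it as plain text.

    The input is processed in a single pass, character by character, so
    the backslashes and braces introduced by one replacement are never
    escaped a second time.
    """
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


print(latex_escape("50% of $5 & #1_a"))
print(latex_escape("\\write18{ls}"))
```

Escaping of this kind only makes sense for values inserted into a template as text. It does not replace disabling shell escapes.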

If you want the user input to appear as typewriter text, there is also the \verb command, which can be used like \verb+asdf$$&\~^+. The + delimiter can be any character, as long as it does not occur in the text itself.

Geoff Reedy
True, but these characters do not pose a security threat to the OP's online service.
Veger
If you escape these characters, particularly \, then you prevent users from inserting any markup. That's the closest thing to a `mysql_real_escape_string` equivalent.
staticsan
@Veger: Yes, just as the "'" symbol does no harm in an SQL query when it is in the correct place. But if you don't want to allow injection of LaTeX-specific special characters, you have to escape them in the same manner as you do for SQL queries. This is what I was searching for, and I find the answer very appropriate!
Igor
+1  A: 

In general, achieving security purely through escaping command sequences (cs's) is hard to do without drastically reducing expressivity, since there is no principled way to distinguish safe cs's from unsafe ones: Tex is just not a clean enough programming language to allow this. I'd say abandon this approach in favour of eliminating the security holes themselves.

Veger's summary of the security holes in Latex agrees with mine: i.e., the issues are shell escapes and file creation/overwriting, though he has missed a shell escape vulnerability. Some additional points follow, then some recommendations:

  1. It is not enough to avoid actively invoking --shell-escape, since it can be implicitly enabled in texmf.cnf. You should explicitly pass --no-shell-escape to override texmf.cnf;
  2. \write18 is a primitive of Etex, not Knuth's Tex. So you can avoid Latexes that implement it (which, unfortunately, is most of them);
  3. If you are using Dvips, there is another risk: \special commands can create .dvi files that ask dvips to execute shell commands. So, if you use dvips, you should pass the -R2 option to forbid the invocation of shell commands;
  4. texmf.cnf allows you to specify where Tex can create files;
  5. You might not be able to disable font creation if you want to give your clients much freedom in which fonts they use. Take a look at the notes on security for Kpathsea; the default behaviour seems reasonable to me, but you could have a per-user font tree, to prevent one user stepping on another user's toes.
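Points 1, 3 and 4 can be sketched as command lines built in code. This is a minimal sketch in Python, assuming a Kpathsea-based distribution such as TeX Live; the function names are hypothetical. --no-shell-escape and -R2 come from the list above. The -interaction and -halt-on-error options are my additions for unattended runs, and setting openout_any through the environment assumes the distribution lets environment variables override texmf.cnf, so check your own installation.

```python
from typing import Dict, List


def build_latex_command(tex_file: str) -> List[str]:
    """Command line for latex with shell escapes explicitly forbidden."""
    return [
        "latex",
        "--no-shell-escape",          # overrides any shell_escape setting in texmf.cnf
        "-interaction=nonstopmode",   # addition: never wait for terminal input
        "-halt-on-error",             # addition: stop at the first error
        tex_file,
    ]


def build_dvips_command(dvi_file: str) -> List[str]:
    """Command line for dvips at its strictest security level."""
    return ["dvips", "-R2", dvi_file]


def build_env(base: Dict[str, str]) -> Dict[str, str]:
    """Copy of the environment with file creation restricted.

    Assumption: openout_any=p ("paranoid") is honoured when set as an
    environment variable, limiting where TeX may create files.
    """
    env = dict(base)
    env["openout_any"] = "p"
    return env


print(build_latex_command("document.tex"))
print(build_dvips_command("document.dvi"))
```

The lists are meant to be passed to a process launcher as argument vectors, without going through a shell, so the file name is never interpreted as shell syntax.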

Options:

  1. Sandbox your client's Latex invocations, and allow them freedom to misbehave in the sandbox;
  2. Trust in kpathsea's defaults, and forbid shell escapes in latex and any other executables used to build the PDF output;
  3. Drastically reduce expressivity, forbidding your clients the ability to create font files or any new client-specified files. Run latex as a process that can only write to certain already existing files;
  4. You can create a format file in which the \write18 cs and the file-creation cs's are not bound, and only macros that invoke them safely, such as for font/toc/bbl creation, exist. This means you have to decide what functionality your clients have: they would not be able to freely choose which packages they import, but must make use of the choices you have imposed on them. Depending on what kind of 'templates' you have in mind, this could be a good option, allowing use of packages that use shell escapes, but you will need to audit the Tex/Latex code that goes into your format file.
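As a rough illustration of option 1, the sketch below runs a command in a fresh scratch directory with a time limit. It is a minimal sketch in Python, not a real sandbox: confinement to that directory still has to come from the operating system (a limited account, a container, or similar). The name run_in_scratch_dir, the file name input.tex and the 30-second default are all hypothetical choices of mine.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Tuple


def run_in_scratch_dir(
    command: List[str], source: str, timeout: float = 30.0
) -> Tuple[int, List[str]]:
    """Run `command` in a fresh directory that contains input.tex.

    Returns the exit code and the sorted names of the files present
    afterwards. The directory is deleted on return, so a real service
    would copy the PDF out before leaving the `with` block.
    """
    with tempfile.TemporaryDirectory() as scratch:
        Path(scratch, "input.tex").write_text(source, encoding="utf-8")
        result = subprocess.run(
            command,
            cwd=scratch,                # relative paths resolve inside the scratch dir
            stdin=subprocess.DEVNULL,   # the process can never wait for input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,            # raises subprocess.TimeoutExpired on overrun
        )
        files = sorted(p.name for p in Path(scratch).iterdir())
        return result.returncode, files
```

A call might look like run_in_scratch_dir(["pdflatex", "--no-shell-escape", "input.tex"], source), assuming pdflatex is installed and on the PATH.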

Postscript

There's a TUGboat article, Server side PDF generation based on LaTeX templates, which takes a different approach from mine to this question, namely generating PDFs from form input using Latex.

Charles Stewart
Thank you Charles! Your explanation goes beyond my LaTeX experience. The last link was extremely useful for me, and the references at the end of that article give a lot of sources to read about this topic.
Igor