I'd like to take user input (sometimes large paragraphs) and generate a LaTeX document. I'm considering a couple of simple regular expressions that replace all instances of "\" with "\textbackslash " and all instances of "{" or "}" with "\{" or "\}".

I doubt this is sufficient. What else do I need to do? Note: In case there is a special library made for this, I'm using python.

Update: To clarify, I do not want anything to be parsed or treated as LaTeX syntax. $a$ should be replaced with \$a\$.

+5  A: 

If your input is plain text and you are in a normal catcode regime, you must do the following substitutions:

  • \ → \textbackslash{} (note the empty group!)
  • { → \{
  • } → \}
  • $ → \$
  • & → \&
  • # → \#
  • ^ → \textasciicircum{} (requires the textcomp package)
  • _ → \_
  • ~ → \textasciitilde{}
  • % → \%

In addition, the following substitutions are useful at least when using the OT1 encoding (and harmless in any case):

  • < → \textless{}
  • > → \textgreater{}
  • | → \textbar{}

And these three disable the curly quotes:

  • " → \textquotedbl{}
  • ' → \textquotesingle{}
  • ` → \textasciigrave{}
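
Since the question mentions Python, the substitutions listed above could be applied roughly as follows. This is a minimal sketch, not a library API: the names `LATEX_ESCAPES` and `escape_latex` are made up for illustration. It does all replacements in a single pass, so the backslashes and braces it inserts are not escaped a second time (which would happen if the substitutions were chained one after another).

```python
import re

# Substitution table copied from the list above.
# LATEX_ESCAPES and escape_latex are illustrative names, not a library API.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "^": r"\textasciicircum{}",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "%": r"\%",
    "<": r"\textless{}",
    ">": r"\textgreater{}",
    "|": r"\textbar{}",
    '"': r"\textquotedbl{}",
    "'": r"\textquotesingle{}",
    "`": r"\textasciigrave{}",
}

# An alternation of all keys; re.escape protects regex metacharacters.
_PATTERN = re.compile("|".join(re.escape(ch) for ch in LATEX_ESCAPES))


def escape_latex(text: str) -> str:
    """Replace every special character in one pass over the input."""
    return _PATTERN.sub(lambda m: LATEX_ESCAPES[m.group(0)], text)
```

For example, `escape_latex("$a$ costs 5% of {x}")` returns `\$a\$ costs 5\% of \{x\}`.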
Philipp
Will a space suffice in place of the empty group?
Conley Owens
Also what about the `---` that Mike Graham mentioned?
Conley Owens
Notice that `\textasciitilde` is actually really ugly because it’s too high and that is rarely what is wanted. Similarly, `\texttildelow` is too low. The best workaround that I know is posted here: http://stackoverflow.com/questions/256457/how-does-one-insert-a-backslash-or-a-tilde-into-latex/2037332#2037332
Konrad Rudolph
@Conley Owens: No, a space won't suffice; it will be gobbled by the input processor. The empty group is the easiest solution; you could also check whether a space follows in the input text and insert a control space (`\ `, backslash–space) in that case.
Philipp
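The control-space alternative from the comment above might look like this in Python. It is only a sketch under assumptions: `escape_backslashes` is a hypothetical helper that handles the backslash alone, so the other special characters would still need their own substitutions. A backslash followed by a space gets a control space in place of that space; otherwise a plain space ends the control word and is gobbled by TeX.

```python
import re


def escape_backslashes(text: str) -> str:
    """Escape backslashes without the empty group (hypothetical helper)."""

    def repl(match: re.Match) -> str:
        if match.group(1):
            # A space followed the backslash: emit a control space instead,
            # because a normal space after a control word would be swallowed.
            return r"\textbackslash\ "
        # No space followed: this space only terminates the control word.
        return r"\textbackslash "

    # Match a backslash plus an optional single space directly after it.
    return re.sub(r"\\( ?)", repl, text)
```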
@Conley Owens: What do you mean by `---`? The dash is implemented as a ligature in (pdf)TeX. If you don't want “---” converted to “—”, you must replace it explicitly (e.g., with `-{}-{}-`). The opposite direction is unproblematic: if you use a Unicode-capable engine (XeTeX, LuaTeX) or load the `inputenc` package with an appropriate encoding, you can use typographic characters like — or “ directly.
Philipp
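The explicit replacement suggested in the last comment (`-{}-{}-`) could be automated along these lines. This is a sketch with a hypothetical function name, `break_dash_ligatures`. It would have to run after the braces in the user's text have been escaped, since it inserts real LaTeX groups.

```python
import re


def break_dash_ligatures(text: str) -> str:
    """Insert an empty group between consecutive hyphens (hypothetical helper).

    This stops (pdf)TeX from forming the -- and --- ligatures.
    Run it only on text whose braces are already escaped.
    """
    # The lookahead matches each hyphen that is directly followed by another.
    return re.sub(r"-(?=-)", "-{}", text)
```

For example, `break_dash_ligatures("a---b")` returns `a-{}-{}-b`, while a single hyphen is left untouched.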