Problem

VerbatimOut from the “fancyvrb” package doesn’t play nicely with UTF-8 characters.

Minimal working example:

\documentclass{minimal}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{fancyvrb}

\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}

\input{\jobname.test}
\end{document}

Error message

When compiled using pdflatex mini, this gives the error

File ended while scanning use of \UTFviii@three@octets.

A different error occurs when the sole occurrence of é above is replaced by something else, e.g. é */:

Package inputenc Error: Unicode char \u8:### not set up for use with LaTeX.

This indicates that in this case LaTeX succeeds in reading a multi-byte UTF-8 character but doesn’t know what to do with it (i.e. it’s the wrong character).

In fact, when I open the produced .test file manually, it contains the character é, but in Latin-1 encoding!

Proof: when I open the files in a hex editor, I get the following:

  • Original file: C3 A9 (corresponds to LATIN SMALL LETTER E WITH ACUTE in UTF-8)
  • Written file: E9 (corresponds to é in Latin-1)
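The byte values also suggest why the first error names \UTFviii@three@octets even though é is only two bytes long in UTF-8. Written out in binary (a worked sketch, not taken from the original post):

```latex
% UTF-8 lead bytes: 110xxxxx starts a 2-byte sequence, 1110xxxx a 3-byte sequence.
\begin{align*}
  \texttt{C3} &= 1100\,0011_2 && \text{2-byte lead (original file)}\\
  \texttt{A9} &= 1010\,1001_2 && \text{continuation byte}\\
  \texttt{E9} &= 1110\,1001_2 && \text{looks like a 3-byte lead (written file)}
\end{align*}
```

So when the Latin-1 file is read back as UTF-8, the lone E9 presumably makes inputenc wait for two continuation bytes that never arrive, which would be consistent with both error messages.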

Question

How do I set up VerbatimOut correctly?

filecontents* (from “filecontents”) shows that it can work. Unfortunately, I don’t understand either code so I cannot fix fancyvrb’s code by replicating the logic from filecontents manually.
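For comparison, the filecontents* variant of the minimal example might look like the following. This is a sketch that assumes the filecontents package is installed and that its file name argument accepts \jobname.test in the same way VerbatimOut does:

```latex
\documentclass{minimal}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{filecontents} % lets the environment overwrite an existing file

% The starred form writes the body without the usual header comment lines.
\begin{filecontents*}{\jobname.test}
é
\end{filecontents*}

\begin{document}
\input{\jobname.test}
\end{document}
```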

I also cannot use filecontents* instead of VerbatimOut because the former doesn’t work within a \newenvironment, while the latter does.
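To illustrate the \newenvironment case: the fancyvrb documentation describes \VerbatimEnvironment for wrapping its environments inside a new one. A minimal sketch follows; the environment name savecode is hypothetical, and the body is deliberately plain ASCII because this wrapper does not address the encoding problem itself:

```latex
\documentclass{minimal}
\usepackage{fancyvrb}

% Hypothetical wrapper; "savecode" is an illustrative name only.
% \VerbatimEnvironment tells fancyvrb to stop at the \end of the
% outer environment rather than at \end{VerbatimOut}.
\newenvironment{savecode}
  {\VerbatimEnvironment
   \begin{VerbatimOut}{\jobname.test}}
  {\end{VerbatimOut}}

\begin{document}
\begin{savecode}
plain ASCII content
\end{savecode}
\end{document}
```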

(Oh, by the way: vanilla Verbatim instead of VerbatimOut also works as expected. The error seems to occur when writing the file, not when reading the verbatim input)

+1  A: 

Is your end goal to write symbols and accents in Verbatim? Because you can do that like this:

\documentclass{article}
\usepackage{fancyvrb}
\begin{document}
\begin{Verbatim}[commandchars=\\\{\}]
\'{e} \~{e} \`{e} \^{e}
\end{Verbatim}
\end{document}

The commandchars option allows the \ { } characters to work as they normally would.

Source: http://www.tex.ac.uk/tex-archive/macros/latex/contrib/fancyvrb/fancyvrb.pdf

Steve
Thanks for the hint but that solution isn’t usable because the saved verbatim code will be further processed by another program that doesn’t know about LaTeX – so I really need to be able to use Unicode characters directly.
Konrad Rudolph
Ah, okay. Then I am not quite sure. Good luck.
Steve
+1  A: 

This is still unfixed? I'll take another look. What exactly do you want: your package to use VerbatimOut, or for it not to interfere with it?

Tests

TeX Live 2009's xelatex compiles the example fine. With pdflatex, version

This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009)

I get an error message that is rather more useful than the one you got:


! Argument of \UTFviii@three@octets has an extra }.
 
                \par 
l.8 é

? i \makeatletter\show\UTFviii@three@octets
! Undefined control sequence.
\GenericError  ...                                
                                                    #4  \errhelp \@err@     ...
l.8 é

If I were to make a wild guess, I'd say that inputenc with pdftex uses the pdftex primitives to do some hairy storing and restoring of character tables, and some table somewhere has got a rarely triggered mistake in it.
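Since the \show attempt at the error prompt above only produced “Undefined control sequence”, a less fragile way to look at the macro may be to print its meaning from the preamble, where the normal category codes are still in effect. A sketch, assuming inputenc with the utf8 option defines \UTFviii@three@octets as the error message suggests:

```latex
\documentclass{minimal}
\usepackage[utf8]{inputenc}

\makeatletter
% Write the macro's definition to the terminal and the log
% without stopping the run (unlike \show).
\typeout{\meaning\UTFviii@three@octets}
\makeatother

\begin{document}
x
\end{document}
```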

Possibly related

I saw a post by Vladimir Volovich in the pdf-tex mailing list archives, all the way back from 2003, that discusses a conflict between inputenc & fancyvrb, and posts a patch to "solve the problem". Who knows, maybe he faced the same problem? It might be worth emailing him.

Charles Stewart
Yes, the file encoding is definitely UTF-8 encoded.
Konrad Rudolph
(Yes, this is still unfixed.) That’s indeed a completely different error – although I’d suspect that a `}` is missing solely because the UTF-8 parser has already read one char too many. But why are you getting “undefined control sequence” when trying to show the definition of the macro?
Konrad Rudolph
@Konrad: I'm afraid debugging problems throwing up \GenericError is something that I have had bad experiences with. I plan on trying again sometime, but it won't be in the next few days.
Charles Stewart
@Charles: No worries. It’s a pretty big problem but unfortunately I don’t really have time to spend on it either at the moment. The easiest course would probably be to contact the maintainers of the packages involved (i.e. fancyvrb and inputenc), so I’ll try that once I get the leisure to spend more time on this bug.
Konrad Rudolph
+1  A: 

XeTeX has much better Unicode support. The following run through xelatex produces “é” both in \jobname.test and the output PDF.

\documentclass{minimal}
\usepackage{fontspec}
\tracingonline=1
\usepackage{fancyvrb}

\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}

\input{\jobname.test}
\end{document}

fontspec loads the Latin Modern fonts, which have Unicode support. The standard TeX Computer Modern fonts don’t have the right tables for Unicode support.

If you use a character that does not have a glyph in the current font, by default XeTeX writes a blank space to the PDF and prints a warning in the log but not on the terminal. \tracingonline=1 prints the warning to the terminal.

andrew
Yes, I know about XeTeX and I use it exclusively. But I need this for a general-purpose package and since accented characters **do** work in normal LaTeX I don’t really want to break what little Unicode support works. This *isn’t* a Computer Modern font problem.
Konrad Rudolph