Hi

I am using LaTeX and I have a problem concerning string manipulation. I want to have an operation applied to every character of a string, specifically I want to replace every character "x" with "\discretionary{}{}{}x". I want to do this because I have a long string (DNA) which I want to be able to separate at any point without hyphenation.

Thus I would like to have a command called "myDNA" that will do this for me instead of inserting manually \discretionary{}{}{} after every character.

Is this possible? I have looked around the web and there wasn't much helpful information on this topic (at least none that I could understand), and I hoped that you could help.

--edit To clarify: What I want to see in the finished document is something like this:


    the dna sequence is CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA
    AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG
    ATCAGTGGGATACAACAGGCCTTTACAGCTTCTCTGAACAAACCAGGTCTCTTGATGGTCGTCTCCAGGT
    ATCCCATCGAAAAGGATTGCCACATGTTATATATTGCCGATTATGGCGCTGGCCTGATCTTCACAGTCAT
    CATGAACTCAAGGCAATTGAAAACTGCGAATATGCTTTTAATCTTAAAAAGGATGAAGTATGTGTAAACC
    CTTACCACTATCAGAGAGTTGAGACACCAGTTTTGCCTCCAGTATTAGTGCCCCGACACACCGAGATCCT
    AACAGAACTTCCGCCTCTGGATGACTATACTCACTCCATTCCAGAAAACACTAACTTCCCAGCAGGAATT

just plain line breaks, without any hyphens. The DNA sequence will be one long string without any spaces or anything else, but it can break at any point. This is why my idea was to insert a "\discretionary{}{}{}" after every character, so that it can break at any point without inserting any hyphens.

A: 
  1. Assuming your string is always the same, define it once in your preamble with \newcommand, like this: \newcommand{\myDNA}{blah blah blah}

  2. If that doesn't satisfy your requirements, break the string down into smaller portions, define a command for each with \newcommand, and then call the new commands in sequence: \myDNAa \myDNAb. (Command names may contain only letters, so names like \myDNA1 would not work.)

If that still doesn't work, you might want to look at writing a perl script to satisfy your string replacement needs.

Mica
I basically have one long string without spaces (see above for an example). I want to apply a command (like "insert this text") to every character. I have thought about a perl script but I hoped I could do without it. Preprocessing every time before compilation is not much fun...
hroest
Have you considered looking at some kind of fancy verbatim environment? I don't have time to look at it, but you could always just change the font, open an inline fancy verbatim environment, and try to let the environment do the work. I have no idea if this will work, but, as a last-ditch effort, it might be worth a try.
Mica
+4  A: 

This takes a string as an argument and inserts \discretionary{}{}{} after each character. The dollar sign is used as the end marker, so the input stops at the first dollar sign and the string must not contain one.

\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString}

\def\xHyphenate#1#2\wholeString {\if#1$%
\else\say{#1}\discretionary{}{}{}%
\takeTheRest#2\ofTheString
\fi}

\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}

\def\say#1{#1}

You’d call it like \hyphenateWholeString{CTAAAGAAAACAGGACG}.
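
For reference, here is a minimal sketch of a complete document that uses the macros above. The wrapper name \myDNA is taken from the question and is not part of the answer's code; the sequence is the start of the example from the question. The % at the end of each sequence line keeps the line end from being read as a space.

```latex
\documentclass{article}

% Macros from the answer, unchanged.
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\say{#1}\discretionary{}{}{}%
\takeTheRest#2\ofTheString
\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\say#1{#1}

% Wrapper with the name asked for in the question (an assumption here);
% it simply forwards its argument.
\newcommand{\myDNA}[1]{\hyphenateWholeString{#1}}

\begin{document}
The DNA sequence is
\myDNA{CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA%
AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG}
\end{document}
```

Since \discretionary{}{}{} adds no stretchable space, TeX may report underfull or overfull boxes for the paragraph; the document should still compile.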

Instead of \discretionary{}{}{} you can also try \hspace{0pt}, if you like that more (and are using LaTeX rather than plain TeX). In order to align the right margin, I think you’d need to do some more fine tuning (but see below). The effect is of course minimised by using a font of fixed width.

Revision:

\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString\unskip}

\def\xHyphenate#1#2\wholeString {\if#1$%
\else\transform{#1}%
\takeTheRest#2\ofTheString\fi}

\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}

\def\transform#1{#1\hskip 0pt plus 1pt}

Steve’s suggestion of using \hskip sounds like a very good idea to me, so I made a few corrections. Note that I’ve renamed the \say macro and made it more useful in that it now actually does the transformation. (However, if you remove the \hskip from \transform, you’ll also need to remove the \unskip in the main macro definition.)
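
To make the recursion easier to follow, here is a simplified hand trace of what the revised macros do with the two-character input AB. It is written as comments only, and the exact order in which TeX interleaves expansion and typesetting is glossed over; it is meant as an illustration, not a precise log.

```latex
% \hyphenateWholeString{AB}
%
% -> \xHyphenate AB$\wholeString \unskip
%      parameter matching: #1 = A, #2 = B$
%
% -> \transform{A}\takeTheRest B$\ofTheString \fi \unskip
%      \if A$ is false, so the \else branch is used
%
% -> A\hskip 0pt plus 1pt \xHyphenate B$\wholeString \unskip
%      \takeTheRest closes the \fi and calls \xHyphenate on the rest;
%      now #1 = B, #2 = $
%
% -> A\hskip 0pt plus 1pt B\hskip 0pt plus 1pt \xHyphenate $\wholeString \unskip
%      now #1 = $, #2 is empty
%
% -> A\hskip 0pt plus 1pt B\hskip 0pt plus 1pt \unskip
%      \if $$ is true, so the recursion stops
%
% Finally \unskip removes the glue after the last character.
```

The "pop" happens in the parameter text of \xHyphenate: #1 is undelimited and grabs a single token, while #2 is delimited by \wholeString and takes everything else.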


Edit:

There is also the seqsplit package which seems to be made for printing DNA data or long numbers. They also bring a few options for nicer output, so maybe that is what you’re looking for…
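
A minimal sketch of how seqsplit can be used, assuming the package is installed; \seqsplit is the command it provides, and the sequence is again the start of the example from the question.

```latex
\documentclass{article}
\usepackage{seqsplit}

\begin{document}
The DNA sequence is
\seqsplit{CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA%
AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG}
\end{document}
```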

Debilski
But you can change the discretionary to \hspace{0pt} and it'll definitely work! Way to actually know TeX!
Jefromi
Works like a charm, although unfortunately I do not understand what it does or how it works. Thanks a lot.
hroest
I’ve just learned that myself from the TeX by Topic book, although the example in there is slightly more complicated and it took me a while to adapt it… Basically what it does is pattern matching on a list. So it takes the first character, transforms it and then calls itself with the rest of the string.
Debilski
OK, I see that \say{#1}\discretionary{}{}{}% is the part where it transforms it. I don't quite see the part where it pops one character from the list; is that in the function call itself?
hroest
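I transformed it a little to get rid of the superfluous `\else`… It’s all a little complicated because the macros take delimiting tokens which are needed for syntactical reasons but don’t have any semantic value. Well, you see that `\xHyphenate` takes two arguments, but when it’s called from inside `\takeTheRest` it is handed only one piece of text, the rest of the string. TeX’s parameter matching then splits that text into its first character (`#1`) and everything after it (`#2`), and that is where the character gets popped.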
Debilski
+1  A: 

Debilski's post is definitely a solid way to do it, although the \say is not necessary. Here's a shorter way that makes use of some LaTeX internal shortcuts (\@gobble and \@ifnextchar):

\makeatletter
\def\hyphenatestring#1{\xHyphen@te#1$\unskip}
\def\xHyphen@te{\@ifnextchar${\@gobble}{\sw@p{\hskip 0pt plus 1pt\xHyphen@te}}}
\def\sw@p#1#2{#2#1}
\makeatother

Note the use of \hskip 0pt plus 1pt instead of \discretionary - when I tried your example I ended up with a ragged margin because there's no stretchability. The \hskip adds some stretchable glue in between each character (and the \unskip afterwards cancels the extra one we added). Also note the LaTeX style convention that "end user" macros are all lowercase, while internal macros have an @ in them somewhere so that users don't accidentally call them.

If you want to figure out how this works, \@gobble just eats whatever's in front of it (in this case the $, since that branch is only run when a $ is the next char). The main point is that \sw@p is only given one argument in the "else" branch, so it swaps that argument with the next char (that isn't a $). We could just as well have written \def\hyphenate@next#1{#1\hskip...\xHyphen@te} and put that with no args in the "else" branch, but (in my opinion) \sw@p is more general (and I'm surprised it's not in standard LaTeX already).
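
The alternative without \sw@p mentioned above could be sketched like this. The helper name \hyphenate@next is chosen here for illustration; the rest follows the code in this answer.

```latex
\makeatletter
% User-level macro: append $ as the end marker, then remove the last glue.
\def\hyphenatestring#1{\xHyphen@te#1$\unskip}
% If the next character is the $ marker, drop it and stop;
% otherwise hand the next character to \hyphenate@next.
\def\xHyphen@te{\@ifnextchar${\@gobble}{\hyphenate@next}}
% Typeset one character, add stretchable glue, and continue with the rest.
\def\hyphenate@next#1{#1\hskip 0pt plus 1pt\xHyphen@te}
\makeatother
```

It is called in the same way, for example \hyphenatestring{CTAAAGAAAACAGGACG}. Note that \@ifnextchar skips spaces, so spaces inside the argument are dropped in both variants.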

Steve
Yes, `\hskip` seems like the way to do it.
Debilski
A: 

There is a contrib package on CTAN that deals with typesetting DNA sequences. It does a little more than just line-breaking, for example, it also supports colouring. I'm not sure if it is possible to get the output you are after though, and I have no experience in the DNA-sequence-typesetting area, but is one long string the most readable representation?

dreamlax
A: 

I have a problem that I think uses a similar solution, but I cannot get my code to work, despite having tried to hack the solution posted here. I posted a question in the forum:

http://stackoverflow.com/questions/3930765/parsing-through-arguments-in-latex

Does anyone have any suggestions on how to fix my recursion problem? Thanks! ERM

ERM