I have a dictionary in .txt format, which looks like this:

term 1
    definition 1
    definition 2

term 2
    definition 1
    definition 2
    definition 3
etc.

There is always a tab before each definition, so it really looks like this:

term 1
[tab]definition 1
[tab]definition 2
etc.

Now I need to wrap every term and its definitions in a <term> tag, i.e.:

<term>
term 1
    definition 1
    definition 2
</term>

I tried using regular expressions to match a term together with its definitions, but with no luck. Could you please help me with this?

Thank you for any suggestions!

A: 

Try this regular expression:

(^|\n).+(\n[ \t]+.+)*

This assumes that ^ marks the start of the string, \n is the line-break character, and . does not match line breaks.
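
For illustration, here is a minimal Python sketch of how such a pattern could be used to do the wrapping. Python's re module happens to satisfy the assumptions stated for the pattern. The function name wrap_terms and the extra capture group around the term and its definitions are my own additions, not part of the expression above.

```python
import re

# The pattern from above, with the "term plus definitions" part captured
# separately from the leading (^|\n) so that the preceding line break can
# be put back unchanged. Without re.MULTILINE, ^ matches only at the start
# of the string, and . does not match line breaks.
TERM_PATTERN = re.compile(r"(^|\n)(.+(?:\n[ \t]+.+)*)")


def wrap_terms(text):
    """Wrap each term and its indented definitions in <term>...</term>."""
    return TERM_PATTERN.sub(r"\1<term>\n\2\n</term>", text)


if __name__ == "__main__":
    sample = "term 1\n\tdefinition 1\n\tdefinition 2\n\nterm 2\n\tdefinition 1\n"
    print(wrap_terms(sample))
```

Note that because of the trailing *, a term line with no definitions under it would also be wrapped.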

Gumbo
A: 

Assuming an implementation that

  1. Matches multiple lines (/.../m)
  2. Uses ^ to indicate the start of a line (\A would anchor only at the start of the whole string)

this should match one "term":

^[^\t][^\n]+\n(\t[^\n]+\n)+
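
As a rough sketch of the same idea in Python: with re.MULTILINE, ^ is the start-of-line anchor, so that is what is used here. The first character class is tightened to [^\t\n] so that the empty line between two terms is not absorbed into the match. The name find_term_blocks is hypothetical, and the sketch assumes every line, including the last one, ends with a line break.

```python
import re

# One term line (not starting with a tab) followed by one or more
# tab-indented definition lines. re.MULTILINE makes ^ match at the
# start of every line rather than only at the start of the string.
TERM_BLOCK = re.compile(r"^[^\t\n][^\n]*\n(?:\t[^\n]+\n)+", re.MULTILINE)


def find_term_blocks(text):
    """Return each term together with its definitions as one string."""
    return [match.group(0) for match in TERM_BLOCK.finditer(text)]


if __name__ == "__main__":
    sample = "term 1\n\tdefinition 1\n\tdefinition 2\n\nterm 2\n\tdefinition 1\n"
    for block in find_term_blocks(sample):
        print("<term>\n" + block + "</term>\n")
```

Unlike a pattern that allows zero definition lines, this one skips a term that has no definitions under it.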
calmh
A: 

Match a line with a leading non-whitespace character followed by one or more lines with leading TABs:

$ perl -0777 -pe 's/^(\S.+\n(^\t.+\n)+)/<term>\n$1<\/term>\n/mg' dict
<term>
term 1
        definition 1
        definition 2
</term>

<term>
term 2
        definition 1
        definition 2
        definition 3
</term>
Greg Bacon