Hello, I'm having some trouble with regular expressions in Ruby. I need to categorize some files in which the first line is followed by two newlines, as in this example:

GIOVIANA

Si scrivono miliardi di poesie
sulla terra ma in Giove è ben diverso.
Neppure una se ne scrive. E certo
la scienza dei gioviani è altra cosa.
Che cosa sia non si sa. È assodato
che la parola uomo lassù desta
ilarità.

Empty lines can also occur elsewhere in the file, as can double empty lines. I tried the following regexp (and many others):

/\A.*\n\n/

but I'm not getting the desired result.
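
For reference, here is a minimal sketch of how such a title test could look in Ruby, assuming the whole file is already in a string. The helper name `split_title` is hypothetical, and the optional `\r` is only a guess at one possible cause of the failure (Windows line endings), not a confirmed diagnosis.

```ruby
# Hypothetical helper: returns [title, body]. title is nil when the
# first line is not immediately followed by an empty line.
def split_title(text)
  # \A anchors at the start of the string, [^\r\n]+ is the first line,
  # and the two \r?\n match its line ending plus one empty line.
  if (m = text.match(/\A([^\r\n]+)\r?\n\r?\n/))
    [m[1].strip, m.post_match]
  else
    [nil, text]
  end
end

title, body = split_title("GIOVIANA\n\nSi scrivono miliardi di poesie\n")
puts title.inspect
puts body.inspect
```

When `title` is nil, the caller can fall back to treating the first line of `body` as both title and first line of the poem.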

I'll explain my whole project too, in case someone has a better idea of how to do it. I need to automatically mark up textual structures in plain text. I can do it fairly well with regular expressions to identify lines, sentences, and so on, but I can't tell my program this:

If the first line is followed by two newlines, it is a title: mark it up with the title markup and continue from the third line.

If the first line is not followed by two newlines, the poem has no title: mark up the first line as the title, and then mark up the whole poem (including the first line).

In the first case, the desired result is:

[poem}[title}GIOVIANA{title]

[line}[sentence}Si scrivono miliardi di poesie{line]
[line}sulla terra ma in Giove è ben diverso.{sentence]{line]
[line}[sentence}Neppure una se ne scrive.{sentence][sentence} E certo{line]
[line}la scienza dei gioviani è altra cosa.{sentence]{line]
[line}[sentence}Che cosa sia non si sa.{sentence] [sentence}È assodato{line]
[line}che la parola uomo lassù desta{line]
[line}ilarità.{sentence]{line]
{poem]

For a poem without a title, like

Ora sia il tuo passo
più cauto: a un tiro di sasso
di qui ti si prepara
una più rara scena.

the desired result is

[poem}[title}[line}[sentence}Ora sia il tuo passo{line]{title]
[line}più cauto: a un tiro di sasso{line]
[line}di qui ti si prepara{line]
[line}una più rara scena.{line]{sentence]{poem]

Thanks

+1  A: 

You don't need (sophisticated) regular expressions for that, just write a parser:

First get the text as an array of lines, with either lines = string.split(/\r?\n/) or lines = File.readlines(fname),

then something like this:

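in_sentence = false   # a local variable; reassigning a constant triggers a warning
if lines[1] =~ /\S/
  # The second line has text, so the poem has no separate title:
  # the first line is marked up as both title and first line.
  puts "[poem}[title}[line}[sentence}#{lines[0].strip}{line]{title]"
  in_sentence = true
  start = 1
else
  # The second line is empty, so the first line is the title.
  puts "[poem}[title}#{lines[0].strip}{title]"
  start = 2
end
# to_a turns a nil slice (file shorter than start) into an empty array
lines[start..-1].to_a.each do |line|
  # process line
end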
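
As a rough illustration of what the "process line" step might do, here is a sketch that emits only the title and line tags. The helper name `markup_poem` is hypothetical, sentence markup is deliberately left out, the closing poem tag always goes on its own line, and the input is assumed to contain at least one line.

```ruby
# Hypothetical helper: takes an array of lines (as from File.readlines)
# and returns the marked-up poem as one string. Only [title} and [line}
# tags are produced; sentence markup is not handled in this sketch.
def markup_poem(lines)
  lines = lines.map(&:chomp)
  out = []
  if lines[1].to_s =~ /\S/
    # Second line has text: the first line doubles as title and line.
    out << "[poem}[title}[line}#{lines[0].strip}{line]{title]"
    rest = lines[1..-1].to_a
  else
    # Second line is empty (or missing): standalone title, then a gap.
    out << "[poem}[title}#{lines[0].strip}{title]"
    out << ""
    rest = lines[2..-1].to_a
  end
  # Empty lines inside the poem are kept as they are.
  rest.each do |line|
    text = line.strip
    out << (text.empty? ? "" : "[line}#{text}{line]")
  end
  out << "{poem]"
  out.join("\n")
end

puts markup_poem(["GIOVIANA\n", "\n", "ilarità.\n"])
```

Building the result as an array and joining it at the end keeps the function free of side effects, so the same code can feed a file, a test, or a later pass that adds the sentence tags.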
klochner
Thanks for the answer, I'll try this approach.
Marek