tags:

views:

210

answers:

6

File1:

hello
- dictionary definitions:
hi
hello
hallo
greetings
salutations
no more hello for you
-
world
- dictionary definitions:
universe
everything
the globe
the biggest tree
planet
cess pool of organic life
-

I need to format this (for a huge list of words) into a term-to-definition format, one line per term. How can one achieve this? The words all differ; only the structure shown above is constant. The resulting file would look something like this:

hello    - dictionary definitions:    hi    hello    hallo    greetings    salutations    no more hello for you    -
world    - dictionary definitions:    universe    everything    the globe    the biggest tree    planet    cess pool of organic life    -

Awk/Sed/Grep/Cat are the usual contenders.

+1  A: 

I'm not sure which scripting language you will be using, so here is some pseudocode:

for each line
 if line is "-"
  create new line
 else
  append separator to previous line
  append line to previous line
 end if
end for loop
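
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration: the function name `join_entries`, the four-space separator and the shortened sample data are assumptions, and unlike the pseudocode it keeps the closing `-` on each output line, to match the output requested in the question.

```python
def join_entries(lines, sep="    "):
    """Collapse each entry (terminated by a lone "-") into a single line."""
    current = []  # pieces of the entry being assembled
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines, e.g. at the end of the file
        current.append(line)
        if line == "-":
            # a lone "-" closes the entry: emit it and start a new one
            yield sep.join(current)
            current = []
    if current:
        # assumption: an unterminated final entry is still emitted
        yield sep.join(current)


# shortened sample in the same structure as File1
sample = """hello
- dictionary definitions:
hi
hallo
-
world
- dictionary definitions:
universe
planet
-
"""

for entry in join_entries(sample.splitlines()):
    print(entry)
```

For a real file, an open file object can be passed instead of the sample lines, since the function only iterates over its argument.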
o.k.w
+2  A: 
awk 'BEGIN {FS="\n"; RS="-\n"}{for(i=1;i<=NF;i++) printf("%s   ",$i); if($1)print"-";}' dict.txt

outputs:

hello   - dictionary definitions:   hi   hello   hallo   greetings   salutations   no more hello for you   -
world   - dictionary definitions:   universe   everything   the globe   the biggest tree   planet   cess pool of organic life   -
RC
I love you man! That is a friggin' huge command - and it works perfectly.
Note: if you need to handle an empty line at the end of the file, you will need to add an `if`: `awk 'BEGIN {FS="\n"; RS="-\n"}{if(NF>2){for(i=1;i<=NF;i++)printf("%s ",$i);print("-");}}' dict.txt`
RC
Thanks for the tip RC.
+2  A: 

A perl one-liner:

perl -pe 'chomp;s/^-$/\n/;print " "' File1

gives

 hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
 world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life

This is 'something like' your required output.

pavium
Nice! More elegant than the one above. I've heard that perl is awesome for its text manipulation abilities.
Perl is awesome, so is awk, its grand-daddy :)
ghostdog74
Yes, Larry Wall certainly gave credit to `awk`, no doubt about that.
pavium
+1  A: 

Try this one-liner. It works on the condition that there will always be 6 definitions per word (9 lines per entry):

sed 'N;N;N;N;N;N;N;N;s/\n/ /g' test_3
Vijay Sarathi
Not flexible enough. You will never know how many definitions there are.
ghostdog74
+2  A: 

And who says only Perl can do it elegantly? :)

$ gawk -vRS="-\n" '{gsub(/\n/," ")}1' file
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life

OR

$ gawk 'BEGIN{RS="-\n";FS="\n";OFS=" "}{$1=$1}1' file
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life
ghostdog74
You have to be careful of lines that end in `-` with RS set like that...
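
To illustrate the caveat: a separator of `-\n` matches anywhere in the input, so a definition that itself ends in `-` gets cut into two records. The sketch below imitates the record splitting in Python; the sample data and the helper names `split_naive` and `split_anchored` are made up for illustration. Anchoring the separator to a whole line avoids the problem, which is roughly what a separator of `\n-\n` achieves.

```python
import re

# hypothetical sample: the definition "self-" ends in a dash
text = (
    "prefix\n- dictionary definitions:\nself-\nauto\n-\n"
    "world\n- dictionary definitions:\nplanet\n-\n"
)


def split_naive(data):
    # like RS="-\n": splits at every "-" that is followed by a newline
    records = data.split("-\n")
    return [r.replace("\n", " ").strip() for r in records if r.strip()]


def split_anchored(data):
    # splits only where "-" stands alone on its line
    records = re.split(r"^-\n", data, flags=re.MULTILINE)
    return [r.replace("\n", " ").strip() for r in records if r.strip()]


print(split_naive(text))     # the first entry is broken in two
print(split_anchored(text))  # one record per entry
```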
ephemient
don't understand. is it <start><space>bar<space>? or just <bar><space>?
ghostdog74
+1  A: 
sed -ne'1{x;d};/^-$/{g;s/\n/ /g;p;n;x;d};H'
awk -v'RS=\n-\n' '{gsub(/\n/," ")}1'
ephemient