Format:

[Headword]{}"UC(icl>restriction)"(Attributes);(gloss)

The testme.txt file has 2 lines:

[testme] {} "acetify" (V,lnk,CJNCT,AJ-V,VINT,VOO,VOO-CHNG,TMP,Vo) <H,0,0>; 
[newtest] {} "acid-fast" (ADJ,DES,QUAL,TTSM) <H,0,0>;

The expected output is this:

testme = acetify
newtest = acid-fast

What I have achieved so far is:

cat testme.txt | sed 's/\[//g' | sed 's/]//g' | sed 's/{}/=/g' | sed 's/\"//'

testme = acetify" (V,lnk,CJNCT,AJ-V,VINT,VOO,VOO-CHNG,TMP,Vo) <H,0,0>;
newtest = acid-fast" (ADJ,DES,QUAL,TTSM) <H,0,0>;

How do I remove all the text from the second " to the end of the line?

+1  A: 

Remove everything from the double quote, space, open parenthesis sequence " ( to the end of the line:

sed 's/" (.*//g'
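Appended to the steps from the question, this substitution yields the expected output. The following is a minimal sketch rather than a definitive version: it assumes the sample file is recreated with printf, merges the separate sed calls into one invocation with -e, and escapes the opening bracket as \[ so that sed does not read it as the start of a bracket expression. The variable name result is only there for illustration.

```shell
# Recreate the two-line sample file from the question.
printf '%s\n' \
  '[testme] {} "acetify" (V,lnk,CJNCT,AJ-V,VINT,VOO,VOO-CHNG,TMP,Vo) <H,0,0>; ' \
  '[newtest] {} "acid-fast" (ADJ,DES,QUAL,TTSM) <H,0,0>;' > testme.txt

# Same steps as in the question, plus the final truncation at '" ('.
result=$(sed -e 's/\[//g' -e 's/]//g' -e 's/{}/=/g' -e 's/"//' -e 's/" (.*//' testme.txt)
printf '%s\n' "$result"
```

The trailing g flag is dropped here because .* already consumes the rest of the line, so the pattern can match at most once per line.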
Konerak
+1  A: 

The whole process might be a little quicker with awk:

awk 'NF > 0 { print $1 " = " $3 }' testme.txt | tr -d '[]"'
David Zaslavsky
+1  A: 

You can do this with awk instead of chaining all those sed commands, which is unnecessary. What you want is field 1 and field 3; use gsub() to remove the quotes and brackets:

$ awk '{gsub(/\"/,"",$3);gsub(/\]|\[/,"",$1);print $1" = "$3}' file
testme = acetify
newtest = acid-fast
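To see why fields 1 and 3 are the ones to keep, it can help to print how awk splits a line on whitespace by default. This is a small sketch using the first sample line; the variable names line and fields are illustrative only. Note that it relies on the headword and the quoted word containing no spaces, which holds for the sample data but is an assumption about the rest of the file.

```shell
# First sample line from the question.
line='[testme] {} "acetify" (V,lnk,CJNCT,AJ-V,VINT,VOO,VOO-CHNG,TMP,Vo) <H,0,0>; '

# Print the first three whitespace-separated fields with their numbers.
fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= 3; i++) print "$" i " = " $i }')
printf '%s\n' "$fields"
# $1 = [testme]
# $2 = {}
# $3 = "acetify"
```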
ghostdog74
+1  A: 

Your whole sequence of multiple calls to sed can be replaced by:

sed 's/\[\([^]]*\)][^"]*"\([^"]*\).*/\1 = \2/' inputfile
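The pattern is dense, so here is a commented breakdown applied to the second sample line. It is a sketch for reading the expression, not a different method; the variable names line and result are illustrative only.

```shell
# Pieces of the pattern, left to right:
#   \[           literal opening bracket
#   \([^]]*\)    group 1: everything up to the next ]  (the headword)
#   ]            literal closing bracket
#   [^"]*"       skip ahead to, and past, the first double quote
#   \([^"]*\)    group 2: everything up to the second double quote
#   .*           the rest of the line, which is discarded
line='[newtest] {} "acid-fast" (ADJ,DES,QUAL,TTSM) <H,0,0>;'
result=$(printf '%s\n' "$line" | sed 's/\[\([^]]*\)][^"]*"\([^"]*\).*/\1 = \2/')
printf '%s\n' "$result"   # newtest = acid-fast
```

Because the two groups are delimited by the brackets and the quotes rather than by whitespace, this form should also cope with a headword or quoted word that contains spaces.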
Dennis Williamson