I am trying to write an awk script to convert a CSV formatted spreadsheet into XML for Bugzilla bugs. The format of the input CSV is as follows (created from an XLS spreadsheet and saved as CSV):

tag_1,tag_2,...,tag_N
value1_1,value1_2,...,value1_N
value2_1,value2_2,...,value2_N
valueM_1,valueM_2,...,valueM_N

The header row gives the names of the XML tags. The above file converted to XML should look as follows:

<element>
    <tag_1>value1_1</tag_1>
    <tag_2>value1_2</tag_2>
    ...
    <tag_N>value1_N</tag_N>
</element>
<element>
    <tag_1>value2_1</tag_1>
    <tag_2>value2_2</tag_2>
    ...
    <tag_N>value2_N</tag_N>
</element>
...

The awk script I have to accomplish this follows:

BEGIN {OFS = "\n"}
NR == 1 {for (i = 1; i <=NF; i++)
            tag[i]=$i
         print "<bugzilla version=\"3.4.1\" urlbase=\"http://mozilla.com/\" maintainer=\"[email protected]\" exporter=\"[email protected]\">"}
NR != 1 {print "   <bug>"
         for (i = 1; i <= NF; i++)
            print "      <" tag[i] ">" $i "</" tag[i] ">"
         print "   </bug>"}
END {print "</bugzilla>"}

The actual CSV file is:

cf_foo,cf_bar,short_desc,cf_zebra,cf_pizza,cf_dumpling ,assigned_to,bug_status,cf_word,cf_caslte
ABCD,A-BAR-0032,A NICE DESCRIPTION - help me,pretty,Pepperoni,,,NEW,,

The actual output is:

$ awk -f csvtobugs.awk bugs.csv

<bugzilla version="3.4.1" urlbase="http://mozilla.com/" maintainer="[email protected]" exporter="[email protected]">
   <bug>
      <cf_foo,cf_bar,short_desc,cf_zebra,cf_pizza,cf_dumpling>ABCD,A-BAR-0032,A</cf_foo,cf_bar,short_desc,cf_zebra,cf_pizza,cf_dumpling>
      <,assigned_to,bug_status,cf_word,cf_caslte>NICE</,assigned_to,bug_status,cf_word,cf_caslte>
      <>DESCRIPTION</>
      <>-</>
      <>help</>
      <>me,pretty,Pepperoni,,,NEW,,</>
   </bug>
   <bug>
   </bug>
</bugzilla>

Clearly, not the intended result (I admit, I copy-pasted this script from this forum: http://www.unix.com/shell-programming-scripting/21404-csv-xml.html). The problem is that it's been SOOOOO long since I've looked at awk scripts and I have NO IDEA what the syntax means.

+4  A: 

You need to set FS = "," in the BEGIN rule to use a comma as your field separator. As shown, the code uses awk's default separator (runs of spaces and tabs), so it would only work on tab-separated input whose values contain no spaces; tab separation is a different (also popular) convention in files that are often still called "CSV" even when commas aren't used ;-).

Alex Martelli
beat me to it, so i'll accept yours!
LES2
You can also use `-F,` as an option to `awk`
Dennis Williamson
A: 

I was able to fix it by changing the FS (field separator):

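BEGIN {
    FS = ","      # split input fields on commas
    OFS = "\n"
}
# Header row: remember each column name as a tag and open the root element.
NR == 1 {
    for (i = 1; i <= NF; i++)
        tag[i] = $i
    print "<bugzilla version=\"3.4.1\" urlbase=\"http://mozilla.com/\" maintainer=\"[email protected]\" exporter=\"[email protected]\">"
}
# Data rows: emit one <bug> element with one child element per column.
NR != 1 {
    print "   <bug>"
    for (i = 1; i <= NF; i++)
        print "      <" tag[i] ">" $i "</" tag[i] ">"
    print "   </bug>"
}
END { print "</bugzilla>" }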

Output:

<bugzilla version="3.4.1" urlbase="http://mozilla.com/" maintainer="[email protected]" exporter="[email protected]">
   <bug>
      <cf_foo>ABCD</cf_foo>
      <cf_bar>A-BAR-0032</cf_bar>
      <short_desc>A NICE DESCRIPTION - help me</short_desc>
      <cf_zebra>pretty</cf_zebra>
      <cf_pizza>Pepperoni</cf_pizza>
      <cf_dumpling ></cf_dumpling >
      <assigned_to></assigned_to>
      <bug_status>NEW</bug_status>
      <cf_word></cf_word>
      <cf_caslte></cf_caslte>
   </bug>
</bugzilla>
LES2
+1  A: 

Use a tool that you do know:)

That awk script does not look like it deals with quotes (") and other CSV oddities. (I think it just splits on whitespace by default; as the other answers note, it needs to be changed to split on commas.) Python, Perl, .NET, etc. have libraries that deal fully with CSV and XML, and you could probably write the solution in as few characters as the awk script and, MORE importantly, understand it.
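
For illustration only, here is a rough sketch of what that could look like in Python with the standard csv and xml.etree.ElementTree modules. The function name csv_to_bugzilla_xml and the sample data are made up for this example, and only the version attribute from the question's root element is reproduced; treat it as a starting point under those assumptions, not a tested Bugzilla importer.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_bugzilla_xml(csv_text):
    """Convert CSV text (header row = tag names) into a <bugzilla> XML string.

    Hypothetical helper for illustration; not part of any Bugzilla tooling.
    """
    reader = csv.reader(io.StringIO(csv_text))
    # Header cells become tag names; strip stray spaces such as "cf_dumpling ".
    tags = [name.strip() for name in next(reader)]
    root = ET.Element("bugzilla", version="3.4.1")
    for row in reader:
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        bug = ET.SubElement(root, "bug")
        for tag, value in zip(tags, row):
            # ElementTree escapes &, < and > in the text when serializing.
            ET.SubElement(bug, tag).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample: a quoted field containing a comma and an ampersand.
sample = (
    "cf_foo,short_desc,bug_status\n"
    'ABCD,"A NICE DESCRIPTION, with a comma & an ampersand",NEW\n'
)
print(csv_to_bugzilla_xml(sample))
```

The csv module takes care of quoted fields, and ElementTree takes care of escaping, which are the two things the awk script does not attempt.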

Mark
hey, it didn't take long, did it? i had figured out the answer on my own, but could only post 2 seconds after the first answer (my answer is arguably better since i include more info) :)
LES2
A: 

You can use various tricks like setting FS. More tricks can be found at the Awk news group. There are also parsers like mine: http://lorance.freeshell.org/csv/

LoranceStinson
+1  A: 

Remember that splitting by comma in a csv is fine until you get the following scenario:

1997,Ford,E350,"Super, luxurious truck"

In which case it will split "Super, luxurious truck" into two items which is incorrect. I would recommend using the csv libs in another language as 'Mark' states in the above post.
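
As a small sketch of the difference, assuming Python and its standard csv module (any language with a real CSV parser would do), compare a plain split on commas with a proper parse of the line above:

```python
import csv

line = '1997,Ford,E350,"Super, luxurious truck"'

# Naive approach: split on every comma, as awk does with FS=",".
naive = line.split(",")

# CSV-aware approach: csv.reader accepts any iterable of lines.
parsed = next(csv.reader([line]))

print(len(naive), naive)    # the quoted field is broken into two pieces
print(len(parsed), parsed)  # the quoted field stays together as one value
```

The naive split yields five pieces with the quote characters still attached, while the parser returns the four intended fields.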

Jamie
I got around this problem by switching to a "TSV" export (tab-separated values). The master file was an Excel worksheet, and I won't need to do this all the time. I was migrating the team from an Excel-based tracker (for 'stories' in an 'agile' methodology) to Bugzilla. Each story is now kept as a bug in Bugzilla. We are using the Eclipse Mylyn plugin to pull the stories into the IDE as tasks. Much better than the Excel solution, IMO. Anyway, this initial import only needed to happen once, and I didn't want to learn Perl for that. The AWK script worked well enough :)
LES2