If I had a string with escaped commas like so:

a,b,{c\,d\,e},f,g

How might I use awk to parse that into the following items?

a
b
{c\,d\,e}
f
g
+2  A: 
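{
  gsub("\\\\,", "!Q!")            # hide each escaped comma behind a placeholder
  n = split($0, a, ",")           # split on the remaining (unescaped) commas
  for (i = 1; i <= n; ++i) {
    gsub("!Q!", "\\,", a[i])      # put the escaped commas back
    print a[i]
  }
}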
DigitalRoss
This will work as long as you **never** have `!Q!` in your text.
system PAUSE
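One way to make that collision much less likely is to use a control byte as the placeholder instead of printable text. The sketch below assumes the byte \001 never occurs in the input, which is an assumption about your data and not a guarantee. The names split_with_sentinel, restore and sentinel are made up for illustration. The placeholder is restored with index() and substr() instead of a second gsub(), because awk implementations differ in how they treat backslashes in a gsub() replacement string.

```sh
split_with_sentinel() {
  awk '
  # Put "\," back wherever the placeholder byte appears in s.
  function restore(s,    out, p) {
    out = ""
    while ((p = index(s, sentinel)) > 0) {
      out = out substr(s, 1, p - 1) "\\,"
      s = substr(s, p + 1)
    }
    return out s
  }
  BEGIN { sentinel = "\001" }   # assumption: this byte never occurs in the input
  {
    gsub(/\\,/, sentinel)       # hide each escaped comma
    n = split($0, a, ",")       # split on the remaining commas
    for (i = 1; i <= n; ++i)
      print restore(a[i])
  }'
}

printf '%s\n' 'a,b,{c\,d\,e},f,g' | split_with_sentinel
```

With this version, text that happens to contain !Q! passes through untouched.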
+1  A: 

I don't think awk has any built-in support for something like this. Here's a solution that's not nearly as short as DigitalRoss's, but it has no danger of accidentally colliding with a made-up placeholder string (!Q!). Since it tests each field with an if, you could also extend it to check whether a field actually ends in \\, in which case the comma follows an escaped backslash and is not itself escaped.

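BEGIN {
 FS = ","
}

{
 split("", fields)   # clear merged fields left over from the previous record
 curfield=1
 for (i=1; i<=NF; i++) {
  if (substr($i,length($i)) == "\\") {
   # field ends in a backslash: the comma was escaped, so drop the
   # backslash, put the comma back, and keep building the same field
   fields[curfield] = fields[curfield] substr($i,1,length($i)-1) FS
  } else {
   fields[curfield] = fields[curfield] $i
   curfield++
  }
 }
 nf = curfield - 1
 for (i=1; i<=nf; i++) {
  printf("%d: %s   ",i,fields[i])
 }
 printf("\n")
}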
Jefromi
I assumed that this was all the splitting you were interested in doing, and therefore set FS to split on commas. If you're trying to do this just on a substring, use system PAUSE's version of the same method.
Jefromi
+2  A: 
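{
   n = split($0, a, /,/)    # split() returns the item count
   j = 1
   b[1] = ""                # reset the blank first slot for every record
   for(i=1; i<=n; ++i) {
      if(match(b[j], /\\$/)) {
         # previous item ended in a backslash: the comma was escaped
         b[j]=b[j] "," a[i]
      } else {
         b[++j] = a[i]
      }
   }
   # stop at j, not length(b), so stale items from a longer record are skipped
   for(k=2; k<=j; ++k) {
      print b[k]
   }
}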
  1. Split the record into array a, using ',' as the delimiter
  2. Build array b from a, merging each item that ends in '\' with the item that follows it
  3. Print array b (note: printing starts at index 2, since b[1] is left blank)

This solution presumes (for now) that ',' is the only character that is ever escaped with '\'. That is, it does not handle \\ in the input, nor odd combinations such as \\\,\\,\\\\,,\,.

system PAUSE
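If the input can also contain escaped backslashes (\\), one possible approach is to scan the record one character at a time instead of splitting on every comma first. The sketch below is not any of the answers above. It assumes that a backslash always escapes exactly the one character after it, and that escape sequences should be printed unchanged. The function name split_scan is made up for illustration.

```sh
split_scan() {
  awk '
  {
    field = ""
    len = length($0)
    for (i = 1; i <= len; ++i) {
      c = substr($0, i, 1)
      if (c == "\\" && i < len) {
        # A backslash escapes the next character: copy both verbatim.
        field = field c substr($0, i + 1, 1)
        ++i
      } else if (c == ",") {
        # An unescaped comma ends the current field.
        print field
        field = ""
      } else {
        field = field c
      }
    }
    print field    # the last field has no trailing comma
  }'
}

printf '%s\n' 'a,b,{c\,d\,e},f,g' | split_scan
```

For the input x\\,y this prints x\\ and y as two separate items, since the comma follows an escaped backslash and is therefore not escaped itself.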