I am trying to write a ksh script that processes a file consisting of name-value pairs, several of them on each line.

Format is:

NAME1 VALUE1,NAME2 VALUE2,NAME3 VALUE3, etc

Suppose I write:

read l
IFS=","
set -A nvls $l
echo "${nvls[1]}"   # braces are required; ksh arrays are zero-based, so index 1 is the second pair

This gives me the second name-value pair, nice and easy. Now suppose the task is extended so that values may include commas, which are escaped like this:

NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3, etc

Obviously, my code no longer works: "read" strips the backslashes, so the second element of the array is just "NAME2 VALUE2_1".
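To make the effect concrete, here is a minimal sketch that feeds the same line to "read" with and without -r. The sample line and the variable names plain and raw are made up for illustration; the snippet uses only POSIX features, so it should behave the same in ksh and other Bourne-style shells.

```shell
# Hypothetical sample line; the backslash escapes the comma inside VALUE2.
line='NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'

# Plain read treats the backslash as an escape character and removes it.
plain=$(printf '%s\n' "$line" | { read l; printf '%s' "$l"; })

# read -r keeps the backslash, so the escaped comma stays recognizable.
raw=$(printf '%s\n' "$line" | { read -r l; printf '%s' "$l"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```

After a plain "read" the escaped comma is indistinguishable from a separator, which is why splitting on IFS="," cuts the second value in two.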

I'm stuck with an older ksh that does not have "read -A array". I tried various tricks with "read -r" and "eval set -A ....", to no avail. I can't use "read nvl1 nvl2 nvl3" to do the unescaping and splitting inside read, since I don't know beforehand how many name-value pairs are on each line.

Does anyone have a useful trick up their sleeve for me?

PS I know that I could do this in no time in Perl, Python, or even awk. However, I have to do it in ksh (... or die trying ;)

A: 

I think you've just run out of steam in ksh and need to move to a full-powered regex language, possibly Perl or Python. There are then related questions to help you. See, for example, "Regular expression for parsing name value pairs". The situation there is slightly different, but could be adapted.

Jonathan Leffler
+1  A: 

As often happens, I devised an answer minutes after asking the question in a public forum :(

I worked around the quoting/unquoting issue by piping the input file through the following sed script:

sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'

It converted the input into:

NAME1.1 VALUE1.1
NAME1.2 VALUE1.2_1\,VALUE1.2_2
NAME1.3 VALUE1.3
<empty line>
NAME2.1 VALUE2.1
<second record continues>

Now, I can parse this input like this:

while read name value ; do
  echo "$name => $value"
done

"read" removes the backslash in front of each escaped comma in the value, and I can stuff "name" and "value" into some associative array, if I like.
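Put together, the whole pipeline might look like the sketch below. The function names split_pairs and print_pairs, the sample input, and the "end of record" marker are invented for illustration; the sed script is the one shown above.

```shell
# split_pairs (hypothetical name): put each name-value pair on its own line
# and add an empty line after every record.
split_pairs() {
  sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'
}

# print_pairs (hypothetical name): read records from standard input and
# print one "name => value" line per pair.
print_pairs() {
  split_pairs |
  while read name value; do
    if [ -z "$name" ]; then
      echo '-- end of record --'   # the empty line marks the record boundary
    else
      printf '%s => %s\n' "$name" "$value"
    fi
  done
}

# Hypothetical two-record input.
printf '%s\n' 'A 1,B 2_1\,2_2,C 3' 'D 4' | print_pairs
```

One limitation of this sed pattern: a comma is treated as a separator only when some non-backslash character precedes it, so an empty field (two commas in a row, or a comma at the start of a line) is not split.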

PS Since I can't accept my own answer, should I delete the question, or ...?

ADEpt
Does using sed count? You could also use awk or perl or ... to do the munging. The sed regex surprises me slightly; I would have used two backslashes inside the square brackets, but I guess that is not actually necessary.
Jonathan Leffler
As to deleting the question - I don't know what the recommended procedure is, but I doubt that destroying your words of wisdom is really what they want. If the worst comes to the worst, I could copy your answer for you and let you select that - but it is a complete cheat.
Jonathan Leffler
Oh. I just stumbled upon http://stackoverflow.com/questions/209329/stackoverflow-should-i-answer-my-own-question-or-not. Seems like it's better to leave it as it is. Maybe someone will find this useful and upvote it :)
ADEpt
A: 

You can also change the \, pattern to something else that is known not to appear in any of your strings, and then change it back after you've split the input into an array. The ksh builtin pattern-substitution syntax can do this, so you don't need sed or awk or anything else.

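read -r l                 # -r keeps the backslashes, so "\," is still there to replace
l=${l//\\,/!!}
IFS=","
set -A nvls $l
unset IFS
echo "${nvls[1]//!!/,}"   # second pair (zero-based); // restores every placeholder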
Bill Karwin
The only caveat here is that older KSH (as still found on SunOS, for example) does not have that nifty substitution function.
ADEpt
SunOS? That really is old.
Bill Karwin
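For a shell without the ${var//pattern/replacement} syntax, the same placeholder trick can probably be built from the older # and %% operators. The sketch below is an assumption-laden illustration, not something tested on an old SunOS ksh: replace_all is a made-up helper name, "!!" is assumed never to occur in the data, and positional parameters stand in for the array so that the snippet runs in any Bourne-style shell.

```shell
# replace_all STRING FROM TO (hypothetical helper): replace every literal FROM
# with TO using only the # and %% operators.
replace_all() {
  rest=$1
  result=
  while :; do
    case $rest in
      *"$2"*)
        result=$result${rest%%"$2"*}$3   # text before the first FROM, then TO
        rest=${rest#*"$2"}               # continue after that FROM
        ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$result$rest"
}

# Sample line as "read -r" would deliver it (backslash still present).
line='NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'
line=$(replace_all "$line" '\,' '!!')    # hide the escaped commas

oldifs=$IFS
IFS=,
set -f            # no filename expansion while splitting
set -- $line      # in ksh this could be: set -A nvls $line
set +f
IFS=$oldifs

second=$(replace_all "$2" '!!' ',')      # restore the commas in one element
printf '%s\n' "$second"
```

Each call to replace_all costs a subshell because of the command substitution, so for large files the sed approach is likely to be faster.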