ansaurus

Question

Wrap Field in a Line with Variable Number of Separators

Answer 1

+1 A:

For your sample data:

sed -i '/\([^|]*|\)\{7\}/{s/\([^|]*|\)/"\1/4;s/\(|[^|]*\)/"\1/6}' inputfile

For your real data:

sed -i '/\([^|]*|\)\{17\}/{s/\([^|]*|\)/"\1/14;s/\(|[^|]*\)/"\1/16}' inputfile

Edit:

(I added a pair of missing braces in each example so the second s command (actually both) only operate when the address matches. I also removed the -n and the p. Removing the p eliminates the duplication. Sorry for the errors.)

The part before the s command is called an "address". It selects only the lines that have 7 (or 17) pipe characters, thus excluding the s command from operating on lines with a different number of pipe characters.

// - The delimiters for the address
\(\) - grouping parentheses (escaped)
[^|]* - zero or more (*) non-pipe (^|) characters ([] - character list delimiters)
| - and the pipe character we're interested in
\{7\} - repeat the group seven times
{command; command} - these braces delimit a block of commands that will be executed when the address is matched - together the address and the braces act like an if statement and its associated block

So that address matches lines that have seven groups of zero or more non-pipe characters that are each followed by a pipe character.

Then the first s command says to replace the 4th (or 14th) pipe character and its preceding non-pipe characters with a quotation mark followed by the matched characters.

The semicolon is a command separator. Some versions of sed require that, instead of semicolons, you use the `sed -e 'command' -e 'command' form for multicommand one-liner scripts.

By the way, the s is a command and not part of the regex. The part between the initial pair of slashes in the commands above and the part between the initial pair of slashes in an s/// command are regexes.

Please let me know if you have additional questions. The second s command looks for the 6th (or 16th) pipe character and the zero or more non-pipes that follow it and replaces that with itself (the matched characters) and a quotation mark.

Dennis Williamson 2010-08-02 16:57:04

Thanks very much! Can you explain it a tad bit? I've never seen a RegExp that has anything before the s/ . Also, does the semicolon allow for more than one sed expression to be executed or is that just part of the RegExp? And lastly, what is the /14 and /16 for? Thanks again!

Topher Fangio 2010-08-02 18:14:59

Sorry, one last question: this doesn't actually change the file, it just outputs the correct lines. When I replaced `sed -n` with `sed -i` (for inline) it worked, but it duplicated the fixed lines. How would you suggest I actually run this to fix the lines in the existing file?

Topher Fangio 2010-08-02 18:21:02

@Topher: See my edit. Briefly, the `14`, `16`, `4` or `6` are qualifiers for the `s///` command that select the `nth` occurrence of the regex (between the first pair of slashes in the `s` command) to perform the substitution on. The semicolon is a command separator. Remove the `p` at the end to eliminate the duplication and change the `-n` to a `-i`. Be sure to note the addition of a pair of braces.

Dennis Williamson 2010-08-02 19:30:40

Answer 2

+1 A:

KISS. When you are working with distinct fields and field separators, use a tool like awk which is totally meant for the job.

$ cat file
Field 1|Field 2|Field 3|Field 4|Field 5|Field 6
Field 1|Field 2|Field 3|Field | with | pipes|Field 5|Field 6

$ awk -F"|" 'NF>6{$4="\042"$4 ; $(NF-2)=$(NF-2)"\042";}1' OFS="|"  file
Field 1|Field 2|Field 3|Field 4|Field 5|Field 6
Field 1|Field 2|Field 3|"Field | with | pipes"|Field 5|Field 6

the above says, if the number of fields are more than 6 (ie NF>6), then add a double quote (\042) to the 4th field, as well as the last 2nd field. (or change it accordingly depending on your data. )

No need to use complicated regular expression.

ghostdog74 2010-08-03 00:24:18

+1 - Thanks for this answer. This is what I was hoping someone would post. It's quite elegant and makes a tad bit more sense to me than the sed solution above. I decided to go ahead and accept this as the solution since it is the most elegant and most correct.

Topher Fangio 2010-08-03 16:10:55

ansaurus

tags:

views:

answers:

Wrap Field in a Line with Variable Number of Separators

related questions