views:

156

answers:

1

I have a file containing a lot of SQL statements, such as:

CREATE TABLE "USER" (
    "ID" INTEGER PRIMARY KEY,
    "NAME" CHARACTER VARYING(50) NOT NULL,
    "AGE" INTEGER NOT NULL
);

COPY "USER" (id, name, age) FROM stdin;
1   Skywalker   19
2   Kenobi      57

I want the column names in the COPY statements to be uppercased and quoted:

COPY "USER" ("ID", "NAME", "AGE") FROM stdin;

Using sed, I found the following regexp:

sed -r 's/([( ])(\w+)([,)])/\1"\U\2\E"\3/g'

It does replace the column names, but it is not selective enough, and replaces other words in the file:

~/test]$sed -r 's/([( ])(\w+)([,)])/\1"\U\2\E"\3/g' star_wars_example
CREATE TABLE "USER" (
  "ID" INTEGER PRIMARY "KEY",
  "NAME" CHARACTER VARYING("50")NOT "NULL",
  "AGE" INTEGER NOT NULL
);

COPY "USER" ("ID", "NAME", "AGE") FROM stdin;
1   Skywalker   19
2   Kenobi      57

To avoid this problem, I want sed to only apply my regexp to the lines starting with COPY and ending with FROM stdin;.

I have looked into lookahead / lookbehind, but they are not supported in sed. They seem to be supported in super-sed, but I am currently using Cygwin (Windows is mandatory here...) and it does not seem available in the package list.

Is there a way to force sed to only consider specific line?

I've considered piping my file through grep before applying sed, but other lines will then disappear from the output.

Am I missing something obvious?

It would be great if the answer was easily applicable on a default Cygwin install. I guess I could try installing super-sed on cygwin, but I'd like to know if there are more obvious ideas

+2  A: 

Since I have no sed available to me at the moment, and have never actually used grouping, this command may or may not work (at all, or as intended) =)

Try

sed -r '/^COPY /{ s/([( ])(\w+)([,)])/\1"\U\2\E"\3/g }'

If I understand the manual correctly, this will execute the substitution on any line starting with COPY.

Another approach would be to use branching. This would look a lot more complicated, but is more flexible.

Jens
Wow, thanks a lot! This worked flawlessly. While this solves the problem I had, I'm curious about the branching method: does it involve grepping the file, looking for "^COPY", and then "branching" the result either to sed or to the standard output? How would one do that? (maybe post it in another answer so people may vote on it too)
eneveu
@eneveu: Here you are. I edited this into my original answer. Untested, too, though. sed seems to be turing-complete. Scary.
Jens
@Jens: I get an error from your second example: "sed: -e expression #1, char 9: unknown command: 'C'"
Dennis Williamson
@Dennis: Yeah, I found it. Correcting it.
Jens
@Jens: Now it gives: "sed: -e expression #1, char 18: unknown option to 's'". Clue: most of the places you have spaces their should be semicolons instead. Second clue: your "branching" version is not doing anything that the other one is not except needlessly doing a null-net-effect replacement. The selector in your first example is much simpler and more direct. Third clue: Since your first replacement is anchored to the beginning of the line, the `g` option is useless (and unwanted). And fourth clue: you can do a `b` without a target label and it will branch to the end.
Dennis Williamson
@Dennis: Thanks. I guess writing these without being able to test them does not work out. I'll remove the branching part to avoid confusing any more people. =)
Jens