Really would appreciate help on this.

I am using sed to create a CSV file. Essentially, multiple HTML files are merged into a single HTML file, and sed is then used to remove all the junk (pictures etc.) to get to the raw columnar data.

I have all this working but am stuck on the last bit.

What I want to do is very basic - I want to replace the following lines:

"a variable string"
"end td"
"begin td"

with a single line:

"a variable string" 

(with a tab character at the end of this line)

I'M USING DOS.

As you can see, I'm new to all this. If I could get this working it would save me a lot of time in the future, so I would appreciate the help. At the moment I have to inject some HTML headers back into the text file, open it in an HTML editor, select the table and then paste it into a spreadsheet, which is a bit of a pain.

P.S. Is there an easy way to get sed to remove the parentheses '(' and ')' from a given line?

+1  A: 

I doubt that this is what you really want, but it's what you asked for.

sed "s/\"a variable string\"/&\t/; s/\"end td\"//; s/\"begin td\"//" inputfile

What you probably want to do is replace them when they appear consecutively. Here's how you might do that:

sed "1{N;N}; /\"a variable string\"\n\"end td\"\n\"begin td\"/ s/\n.*$/\t/;ta;bb;:a;N;N;:b;$!P;N;D" inputfile
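Note that the pattern above matches the literal text "a variable string", so it would need adapting to the real data. For anyone who finds the multi-line sed logic hard to follow, here is a minimal Python sketch of the same merge. It is not sed and only illustrates the idea: it assumes the marker lines are literally `"end td"` and `"begin td"` as written in the question, and the function name `merge_cells` is made up for this example.

```python
def merge_cells(lines, end_marker='"end td"', begin_marker='"begin td"'):
    """Collapse each (text, end_marker, begin_marker) run of three lines
    into a single line: the text followed by a tab character."""
    out = []
    i = 0
    while i < len(lines):
        followed_by_markers = (
            i + 2 < len(lines)
            and lines[i + 1] == end_marker
            and lines[i + 2] == begin_marker
        )
        if followed_by_markers:
            out.append(lines[i] + "\t")  # keep the text, add the tab
            i += 3                       # skip the two marker lines
        else:
            out.append(lines[i])         # any other line passes through
            i += 1
    return out


if __name__ == "__main__":
    sample = ['"first cell"', '"end td"', '"begin td"', '"second cell"']
    for line in merge_cells(sample):
        print(repr(line))
```

Lines are expected without their trailing newlines (for example from `text.splitlines()`).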

This will remove all parentheses in a file:

sed "s/[()]//g" inputfile

To select particular lines, you could do something like this:

sed "/foo/ s/[()]//g" inputfile

which will only make the replacement on lines that contain the string "foo".

Edit: Changed single quotes to double quotes to accommodate GNUWin32 and CMD.EXE.

Dennis Williamson
cmd.exe hates single quotes. `sed " ... " file`
@user229426: I just tried Cygwin `sed` from a `CMD` prompt and it worked just fine with single quotes. When I tried it with GNUWin32 `sed` I got an error complaining about the single quotes. I'll edit my answer.
Dennis Williamson
Wow - thank you for taking all the time to reply.
Rhys
A: 

A previous comment I left doesn't appear to have been saved, so I will try again.

The code to remove the ( and ) worked perfectly, thanks.

You are right - I was looking to merge the 3 lines into one line, so the second example you gave, where it looks like it's reading the next two lines into the pattern space, looks more promising. The output wasn't what I was expecting, however.

I now realize the code is going to have to be more complicated, and I don't want to trouble you any more: my manual method of injecting some HTML back into the text file, opening it in OpenOffice and pasting into a spreadsheet only takes a few seconds, and I have a feeling that producing the sed code for this by hand would be a nightmare.

Essentially, the rules for converting the HTML would need to be as follows [each tag has been formatted so it appears on its own line]. I have given an example of an input file and the desired output file below for reference.

1) if < tr > is followed by < td > on the next line, completely remove the < tr > and < td > lines [i.e. do not output a carriage return] and on the NEXT line stick a " at the start of that line [it doesn't matter about a carriage return at the end of this line, as it is going to be edited later]

2) if < /td > is followed by < td >, completely remove both of these lines [again, do not output a carriage return after these lines], on the PREVIOUS line output a ", [do not output a carriage return] and on the NEXT line stick a " at the start of the line [don't worry about the ending carriage return, as it will be edited later]

3) if < /td > is followed by < /tr >, delete both of these lines and on the PREVIOUS line add a " to the end of the line and a final carriage return.
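The three rules above can be sketched in a few lines of Python, in case the sed version turns out to be too awkward. This is only a rough illustration under assumptions: the tags are taken to appear exactly as `<tr>`, `<td>`, `</td>` and `</tr>`, each alone on its own line, the function name `table_lines_to_csv` is invented here, and the sample data is made up rather than taken from the real input file.

```python
def table_lines_to_csv(lines):
    """Turn one-tag-per-line table markup into quoted CSV rows,
    following rules 1) to 3) above."""
    rows = []
    row = None  # None while we are outside a <tr> ... </tr> block
    i = 0
    while i < len(lines):
        here = lines[i].strip()
        nxt = lines[i + 1].strip().lower() if i + 1 < len(lines) else None
        pair = (here.lower(), nxt)
        if pair == ("<tr>", "<td>"):
            row = '"'                 # rule 1: open the first cell
            i += 2
        elif row is not None and pair == ("</td>", "<td>"):
            row += '","'              # rule 2: close one cell, open the next
            i += 2
        elif row is not None and pair == ("</td>", "</tr>"):
            rows.append(row + '"')    # rule 3: close the last cell, end the row
            row = None
            i += 2
        elif row is not None:
            row += here               # cell text, appended with no line break
            i += 1
        else:
            rows.append(lines[i])     # anything outside a row passes through
            i += 1
    if row is not None:
        rows.append(row)              # keep an unfinished row rather than lose it
    return rows


if __name__ == "__main__":
    sample = ["<tr>", "<td>", "Dr Smith", "</td>", "<td>", "1 High St", "</td>", "</tr>"]
    print("\n".join(table_lines_to_csv(sample)))
```

Because cell text is appended without a line break, a cell that spans several lines ends up on one line with no separator between the pieces.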

I have given an example of what the input and desired output would be:

input: http://medinfo.redirectme.net/input.txt

[The wanted file will be posted in the next message - this board will not allow new users to post a message with more than one hyperlink!]

There is an added issue: the address column is spread over multiple lines in the input file. This could be reduced to one line by checking whether the first character of the NEXT line is a ". If it isn't, then do not output the carriage return at the end of the current line.

Phew, that was a nightmare just to type out, never mind actually code. But thanks again for all your help in getting this far! :-)
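The line-joining idea for the address column could look something like the Python sketch below. It is a guess at the intent rather than a tested solution: it assumes every new record starts with a " character, the name `join_continuations` is invented for illustration, and the optional `sep` argument (not part of the rule as stated) lets you put a space between the joined pieces.

```python
def join_continuations(lines, sep=""):
    """Append every line that does not start with a double quote
    to the line before it, so each record ends up on one line."""
    out = []
    for line in lines:
        if out and not line.startswith('"'):
            out[-1] += sep + line   # continuation: no line break before it
        else:
            out.append(line)        # a new record (or the very first line)
    return out


if __name__ == "__main__":
    sample = ['"Dr Smith","1 High St', 'Townville', 'AB1 2CD"', '"Dr Jones","2 Low Rd"']
    for line in join_continuations(sample, sep=" "):
        print(line)
```

With the default `sep=""` the pieces are joined with nothing in between, which matches "do not output the carriage return" literally.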

Rhys
A: 

wanted: http://medinfo.redirectme.net/wanted.txt

P.S. I wanted to give you a "nice answer" badge, Dennis, but I'm not sure how, with me not being registered.

Rhys