Take the following file...

ABCD,1234,http://example.com/mpe.exthttp://example/xyz.ext
EFGH,5678,http://example.com/wer.exthttp://example/ljn.ext

Note that "ext" is a constant file extension throughout the file.

I am looking for an expression to turn that file into something like this...

ABCD,1234,http://example.com/mpe.ext
ABCD,1234,http://example/xyz.ext
EFGH,5678,http://example.com/wer.ext
EFGH,5678,http://example/ljn.ext

In a nutshell, I need to capture everything up to the URLs, then capture each URL and put it on its own line, prefixed with that leading capture.

I am working with sed to do this and I cannot figure out how to make it work correctly. Any ideas?

A: 

I have no sed available to me at the moment.

Wouldn't

sed -r 's/(....),(....),(.*\.ext)(http.*\.ext)/\1,\2,\3\n\1,\2,\4/g' 

do the trick?

Edit: removed the lazy quantifier
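
For anyone who wants to check this against the sample data from the question, here is a minimal sketch. It assumes GNU sed, since both -r and "\n" in the replacement are GNU extensions; the variable names sample and result are only for illustration.

```shell
# Assumes GNU sed: -r enables extended regexes and "\n" in the
# replacement inserts a newline. Other sed implementations may differ.
sample='ABCD,1234,http://example.com/mpe.exthttp://example/xyz.ext
EFGH,5678,http://example.com/wer.exthttp://example/ljn.ext'

# The greedy (.*\.ext) backtracks until (http.*\.ext) can still match,
# so it stops at the end of the first URL on each line.
result=$(printf '%s\n' "$sample" |
    sed -r 's/(....),(....),(.*\.ext)(http.*\.ext)/\1,\2,\3\n\1,\2,\4/g')

printf '%s\n' "$result"
```

Note that the four dots mean the first two fields must be exactly four characters wide, as they are in the sample.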

Jens
Very good idea (I hope the part before the URLs is that constant). But I thought that sed doesn't support lazy quantifiers.
Tim Pietzcker
It does not? *sigh* Let me think...
Jens
Well, I think it should work without the laziness, too.
Jens
+1  A: 

If the number of URLs in each line is guaranteed to be two, you can use:

sed -r "s/([A-Z0-9,]{10})(.+\.ext)(.+\.ext)/\1\2\n\1\3/" < input
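
If that guarantee does not hold, one possible extension is a loop that keeps splitting until no ".ext" is directly followed by "http". This is only a sketch and not part of the command above: it assumes GNU sed, keys on the first two comma-delimited fields rather than a fixed width, and uses a made-up helper name (split_urls) and a made-up three-URL input line.

```shell
# Sketch only: assumes GNU sed (-E, "\n" in the replacement and inside
# a bracket expression are GNU behaviour).
# split_urls is a hypothetical helper name used for illustration.
split_urls() {
    # :a / ta form a loop. While a ".ext" on the first line of the
    # pattern space is directly followed by "http", break the line
    # there and repeat the two leading comma-delimited fields.
    sed -E -e ':a' \
           -e 's/^(([^,]*,){2})([^\n]*\.ext)(http)/\1\3\n\1\4/' \
           -e 'ta'
}

# Made-up sample line with three concatenated URLs.
printf '%s\n' 'ABCD,1234,http://example.com/a.exthttp://example.com/b.exthttp://example.com/c.ext' |
    split_urls
```

Each pass peels one URL off the first line, so the URLs come out in their original order.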
Amarghosh
+1  A: 

This does not require the first two fields to be a particular width or limit the set of (non-comma) characters between the commas. Instead, it keys on the commas themselves.

sed 's/\(\([^,]*,\)\{2\}\)\(.*\.ext\)\(http:.*\)/\1\3\n\1\4/' inputfile.txt

You could change the "2" to however many comma-delimited fields precede the URLs.
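
As a rough illustration of changing the field count, the sketch below wraps the same expression in a small function that takes the count as an argument. The function name (split_after_fields) and the three-field input line are made up for the example, and GNU sed is assumed because of "\n" in the replacement.

```shell
# Sketch only: assumes GNU sed ("\n" in the replacement is a GNU extension).
# split_after_fields is a hypothetical wrapper; $1 is the number of
# comma-delimited fields that precede the URLs.
split_after_fields() {
    sed 's/\(\([^,]*,\)\{'"$1"'\}\)\(.*\.ext\)\(http:.*\)/\1\3\n\1\4/'
}

# Made-up input with three leading fields instead of two.
printf '%s\n' 'ABCD,1234,north,http://example.com/mpe.exthttp://example/xyz.ext' |
    split_after_fields 3
```

Calling it with 2 reproduces the command above on the original sample lines.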

Dennis Williamson