I'm trying to use a regexp with sed. I've tested my regex with kiki, a GNOME application for testing regexps, and it works there.

date: 2010-10-29 14:46:33 -0200;  author: 00000000000;  state: Exp;  lines: +5 -2;  commitid: bvEcb00aPyqal6Uu;

I want to replace author: 00000000000; with nothing. So I wrote this regexp, which works when I test it in kiki:

author:\s[0-9]{11};

But it doesn't work when I try it in sed:

sed -i "s/author:\s[0-9]{11};//g" /tmp/test_regex.txt

I know regex implementations differ, and this could be the issue. My question is: how do I at least try to "debug" what's happening with sed? Why is it not working?

A: 

The s before the first / already tells sed that you are substituting author: 00000000000.

Alberto Zaccagni
But it's not working. author: 00000000000 doesn't get substituted with the line I provided.
Somebody still uses you MS-DOS
What is not working? I did not provide an example but an answer on why your regexp did not work. In paxdiablo's answer you will find the right command for sed.
Alberto Zaccagni
author: 00000000000 doesn't get substituted, so the regex is not working. Thanks anyway.
Somebody still uses you MS-DOS
+1  A: 

My version of sed doesn't like the {11} bit. Processing the line with:

sed 's/author: [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9];//g'

works fine.

And the way I debug it is exactly what I did here. I just constructed a command:

echo 'X author: 00000000000; X' | sed ...

and removed the more advanced regex things one at a time:

  • used <space> instead of \s, didn't fix it.
  • replaced [0-9]{11} with 11 copies of [0-9], that worked.

It pretty much had to be one of those since I've used every other feature of your regex before with sed successfully.

But, in fact, this will work without the hideous 11 copies of [0-9]; you just have to escape the braces: [0-9]\{11\}. I have to admit I didn't get around to trying that at first, since it worked okay with the multiples and I generally don't concern myself too much with brevity in sed; I tend to use it more for quick'n'dirty jobs :-)

But the brace method is a lot more concise and adaptable and it's good to know how to do it.
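As a minimal sketch of that comparison (the sample line is a shortened stand-in for the one in the question, and a literal space is used instead of \s to keep it portable), the two forms can be run side by side. In sed's default basic regular expressions, an unescaped { is an ordinary character, so only the escaped version should match:

```shell
# Shortened sample line in the style of the question.
line='X author: 00000000000; X'

# Bare braces are literal in basic regular expressions, so this pattern
# looks for the text "{11}" and leaves the line untouched.
unescaped=$(printf '%s\n' "$line" | sed 's/author: [0-9]{11};//g')

# Escaped braces form the interval operator: exactly 11 digits.
escaped=$(printf '%s\n' "$line" | sed 's/author: [0-9]\{11\};//g')

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
```

If the first result still contains author: and the second does not, the braces were the problem.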

paxdiablo
I think leaving off \s as space also helped this.
Peer Stritzinger
I tried to escape '{' and '}': sed -i "s/author:\s[0-9]\{11\};//g" /tmp/test_regex.txt. It worked. Would you mind testing?
Somebody still uses you MS-DOS
@Peer, actually, I put that back in and it still worked. That's not to say it will work on _every_ `sed` (mine is the CygWin one).
paxdiablo
@Somebody: yes, that does work with the escaped braces.
paxdiablo
Thanks for answering and helping me out. I used to hate these tools, I think I just need some more training and study and get used to these little issues, and learn to do baby steps.
Somebody still uses you MS-DOS
+1  A: 

You are using the -i flag incorrectly. You need to give it a string to put on the temporary file. You also need to escape your curly braces.

sed -ibak -e "s/author:\s[0-9]\{11\};//g" /tmp/test_regex.txt
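A hedged sketch of what the suffix does, run against a throwaway file (the temporary directory and file contents are made up for illustration). It assumes a sed that accepts -i with an attached suffix, as GNU and BSD sed do; -i is not part of POSIX:

```shell
# Work on a throwaway copy so nothing real is modified.
tmpdir=$(mktemp -d)
file="$tmpdir/test_regex.txt"
printf 'date: 2010-10-29;  author: 00000000000;  state: Exp;\n' > "$file"

# -i.bak edits the file in place and keeps the original as test_regex.txt.bak.
sed -i.bak -e 's/author: [0-9]\{11\};//g' "$file"

edited=$(cat "$file")
backup=$(cat "$file.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

Keeping the .bak copy around makes it easy to diff the result against the original while the expression is still being tuned.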

I usually debug my statement by starting with a regex I know will work (like 's/author//g' in this case). When that works I know that I have the right arguments. Then I expand the regex incrementally.
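One way to sketch that incremental approach is a small loop that tries progressively longer patterns against a sample line and reports the first one that stops matching. The pattern list and the sample line below are illustrative assumptions, not a fixed recipe:

```shell
line='date: 2010-10-29 14:46:33 -0200;  author: 00000000000;  state: Exp;'
misses=0

# Each pattern extends the previous one by a single regex feature.
for pattern in \
    'author' \
    'author: ' \
    'author: [0-9]' \
    'author: [0-9]\{11\}' \
    'author: [0-9]\{11\};'
do
    result=$(printf '%s\n' "$line" | sed "s/$pattern//g")
    if [ "$result" = "$line" ]; then
        # An unchanged line means this step introduced something sed
        # does not interpret the way the pattern intends.
        printf 'NO MATCH  %s\n' "$pattern"
        misses=$((misses + 1))
    else
        printf 'matched   %s\n' "$pattern"
    fi
done
```

Swapping an unescaped {11} into the list should make the loop print NO MATCH at exactly that step, which pinpoints the offending construct.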

Brian Clements
I tested with \{ in paxdiablo's answer. It's working; I just wanted to know if it's going to work in his environment too. I understood the incremental regex approach from paxdiablo as well, and it seems to be a good one.
Somebody still uses you MS-DOS
`-i` does not _need_ a suffix and, when you use it, it's either `-ibak` or `--in-place=bak`, never `-i=bak`. I won't downvote since it's trivial but you may want to fix it.
paxdiablo
Fixed the -i suffix. The old way works fine; it just puts some extra chars on the suffix. There are some versions of sed that will not make a temporary file if you do not supply a suffix. This is dangerous and can cause data corruption.
Brian Clements
+5  A: 

In sed you need to escape the curly braces. "s/author:\s[0-9]\{11\};//g" should work.

Sed has no debug capability. To test you simplify at the command line iteratively until you get something to work and then build back up.

$ echo 'xx a: 00123 b: 5432' | sed -e 's/a:\s[0-9]\{5\}//'
xx  b: 5432
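For comparison, here is a sketch of the same substitution in both regex flavours. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed do; older GNU releases spell it -r), and it uses the POSIX class [[:space:]] in place of \s, which is a GNU extension:

```shell
line='xx a: 00123 b: 5432'

# Basic regular expressions (the default): the interval needs backslashes.
bre=$(printf '%s\n' "$line" | sed 's/a:[[:space:]][0-9]\{5\}//')

# Extended regular expressions (-E): braces are operators without backslashes.
ere=$(printf '%s\n' "$line" | sed -E 's/a:[[:space:]][0-9]{5}//')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should produce the same line, so the choice is mostly about which escaping style is easier to read.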
verisimilidude
The posting software ate my backslashes! Put a backslash before your opening and closing curly braces.
verisimilidude
@veri, I fixed up your answer. If you put backticks around code, it will leave it alone (first line above). If you indent lines with four spaces, you get the same effect for code blocks (bottom section).
paxdiablo
+1 for good answer and sweet username
Platinum Azure
+1  A: 

That looks more like a perl regex than it does a sed regex. Perhaps you would prefer using

perl -pi.orig -e 's/author:\s[0-9]{11};//g' file1 file2 file3

At least that way you could always add -Mre=debug to debug the regex.

tchrist
+1  A: 

There is a Python script called sedsed by Aurelio Jargas which will show the stepwise execution of a sed script. A debugger like this isn't going to help much in the case of characters being taken literally (e.g. {) versus having special meaning (e.g. \{), especially for a simple substitution, but it will help when a more complex script is being debugged.

The latest SVN version.
The most recent stable release.
Disclaimer: I am a minor contributor to sedsed.

sedsed example

Another sed debugger, sd by Brian Hiles, written as a Bourne shell script (I haven't used this one).

Dennis Williamson