I am trying to run some regular expressions (grep) on a text file of about 4K lines. The main portion that I need to replace looks like this:

1,"An Internet-Ready Resume",1,2,"","

And I need it to look like this:

<item>
<title>An Internet-Ready Resume</title>
<category>1</category>
<author>2</author>
<content>

So far, this is what I have tried, to no avail:

[0-9]{1}\,\"*\"\,[0-9]\,[0-9]\,\"\"\,\"
+1  A: 

You should start by doing a little reading on regular expressions. There are tons of useful resources online. Then you would see that:

  • you needn't escape everything (such as commas or quotes)
  • the asterisk * doesn't mean "anything"; it means "zero or more times" of whatever comes right before it
  • the "any character" symbol is the . character. .* means any character, any number of times (that is, anything)
  • if your substitution needs to reuse pieces of what you matched, you have to capture those pieces by wrapping them in parentheses: (<atom content>), where <atom content> is a bit of a regexp.

A tip to start: instead of \"*\" try ".*". Check the reference.

Also note that the part regarding the replacement will depend on the text editor/tool you're using. Usually a regexp such as (a)(b) (where a,b are regexp atoms) being replaced by x\1y\2z would produce xaybz.
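To make the \1, \2 idea concrete, here is a minimal sketch in Python (the question mentions grep, so Python and its re module are my assumption, not the asker's tool). It first reproduces the (a)(b) example, then applies the same idea to the sample line from the question. The use of [0-9]+ and the non-greedy .*? are my own choices so that multi-digit fields and several records per line don't break the match; they are not part of the original attempt.

```python
import re

# The generic example: (a)(b) replaced by x\1y\2z gives xaybz.
simple = re.sub(r'(a)(b)', r'x\1y\2z', 'ab')
print(simple)

# The sample line from the question.
line = '1,"An Internet-Ready Resume",1,2,"","'

# Three capture groups: title, category, author.
# The leading id and the empty "" field are matched but not captured.
# [0-9]+ and .*? are assumptions added for robustness.
pattern = re.compile(r'[0-9]+,"(.*?)",([0-9]+),([0-9]+),"","')

# \\1, \\2, \\3 become the backreferences \1, \2, \3 in the template.
replacement = (
    "<item>\n"
    "<title>\\1</title>\n"
    "<category>\\2</category>\n"
    "<author>\\3</author>\n"
    "<content>"
)

result = pattern.sub(replacement, line)
print(result)
```

The backreference syntax varies between tools (some editors use $1 instead of \1), so check what yours expects.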

Miguel Ventura
Thanks for the help. I'm not sure what is right or wrong, but this is working: [0-9],"(.*)",([0-9]),([0-9]),"","
Matt Blake
A: 

The error is the \"*\" part. When you use the * operator you need to tell it what is to be repeated. As written it is going to repeat the previous quote character. Instead of that you should tell it to repeat any character (.), thus: \".*\"
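A quick way to see the difference, sketched here with Python's re module (an assumption on my part, since the question only says grep), is to try both forms against the start of the sample line:

```python
import re

line = '1,"An Internet-Ready Resume",1,2,"","'

# "* repeats the quote itself, so "*" can only match a run of quotes.
# There is no digit, comma, quotes, comma, digit sequence in the line.
quotes_only = re.search(r'[0-9],"*",[0-9]', line)
print(quotes_only)

# .* repeats "any character", so the quoted title can be matched.
any_chars = re.search(r'[0-9],".*",[0-9]', line)
print(any_chars.group(0))
```

The first search finds nothing, while the second one matches from the leading id through the category digit.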

A secondary comment is that you have a lot of unnecessary backslashes. In fact, none of them are necessary as far as I can tell. Without them your regex looks like:

[0-9],".*",[0-9],[0-9],"","
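As a sanity check, here is a small sketch showing that the backslashed and the plain versions match the same text. It uses Python's re module, which is my assumption; other regex flavours (some grep modes in particular) can treat a backslash before certain characters differently, so this only demonstrates the point for Python.

```python
import re

line = '1,"An Internet-Ready Resume",1,2,"","'

# The original style, with every comma and quote escaped
# (and the "* mistake already corrected to .*).
escaped = re.compile(r'[0-9]{1}\,\".*\"\,[0-9]\,[0-9]\,\"\"\,\"')

# The same pattern with the unnecessary backslashes removed.
plain = re.compile(r'[0-9],".*",[0-9],[0-9],"","')

m_escaped = escaped.fullmatch(line)
m_plain = plain.fullmatch(line)

# Both patterns match the whole sample line.
print(m_escaped is not None, m_plain is not None)
```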
John Kugelman