I have the following string:

<div id="mydiv">This is a "div" with quotation marks</div>

I want to use regular expressions to return the following:

<div id='mydiv'>This is a "div" with quotation marks</div>

Notice how the id attribute in the div is now surrounded by apostrophes?

How can I do this with a regular expression?

Edit: I'm not looking for a magic bullet to handle every edge case in every situation. We should all be wary of using regex to parse HTML but, in this particular case and for my particular need, regex IS the solution... I just need a bit of help getting the right expression.

Edit #2: Jens Ameskamp helped me find a solution, but anyone randomly coming to this page should think long and hard before using it. In my case it works because I am very confident about the type of strings I'll be dealing with. I know the dangers and the risks; make sure you do too. If you're not sure whether you know, that probably means you don't and shouldn't use this method. You've been warned.

+3  A: 

This could be done in the following way: I think you want to replace every " that sits between a < and a > with a '.

So, you look for each " in your file, look behind it for a <, and look ahead for a >. The regex looks like this:

(?<=\<[^<>]*)"(?=[^><]*\>)

You can then replace the matched characters to your liking, maybe using Regex.Replace.
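For anyone wanting to try the same idea outside .NET: the pattern above relies on a variable-length lookbehind, which Python's `re` module rejects (it requires fixed-width lookbehinds). The sketch below therefore swaps in a different technique, not the regex from this answer: it matches each whole tag with `<[^<>]*>` and replaces the double quotes inside that match. The helper name `single_quote_tags` is made up for illustration, and the sketch carries the same assumption as the answer, namely that no < or > ever appears inside an attribute value.

```python
import re

# One whole tag: "<", then anything except angle brackets, then ">".
TAG = re.compile(r"<[^<>]*>")


def single_quote_tags(html: str) -> str:
    """Replace every double quote found inside a tag with an apostrophe.

    Hypothetical helper for illustration; text between tags is left alone.
    """
    # The callback receives each matched tag and rewrites only that slice.
    return TAG.sub(lambda m: m.group(0).replace('"', "'"), html)


if __name__ == "__main__":
    sample = '<div id="mydiv">This is a "div" with quotation marks</div>'
    print(single_quote_tags(sample))
    # prints: <div id='mydiv'>This is a "div" with quotation marks</div>
```

Note that this touches every quoted attribute in every tag, not just `id`, which matches what the regex in this answer would do as well.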

Note: While I have found the Stack Overflow community most friendly and helpful, these regex/HTML questions are answered with a little too much anger, in my opinion. After all, this question does not ask "What regex matches all valid HTML, and does not match anything else?".

Jens
Thanks. I'll give this a shot.
Cindyydnic
What if you have a `>` inside a quoted string in a tag? Before you start trying to modify the regex to anticipate every possibility, seems like you might want to have a look at http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
Jefromi
I made a small mistake. Editing to get the working version. =)
Jens
While there is certainly a lot of unnecessary anger, the points made in the multitude of regex/html questions are generally valid. Sure, if the question said "I need to do this once on some html which doesn't have anything crazy in it I promise", then regex would be a reasonable approach. If it's a lot of HTML, you're not sure everything is straightforward, you might apply this to new input... you want it to work. You want to parse the HTML.
Jefromi
@Jefromi: In that case, this regex breaks, but maybe that is not a problem for the OP. And I understand that people dislike having to point to the Agility Pack over and over again. I just think they do so too quickly sometimes. I'd frown at using a third-party parser for a small quick-and-dirty replace. =)
Jens
I won't have this situation. As I wrote in my comment at the top, the HTML is going to be "sanitized" before I apply the regex. This is running in a company intranet and we have a lot of control over what the string will look like. I understand the dangers of using regular expressions to parse HTML but, in this case, I'm confident that there is no need to worry about the edge cases.
Cindyydnic
@Jefromi, @Jens Ameskamp: Exactly. In this situation, the regex should work for all of my cases but not necessarily ALL cases for everyone else.
Cindyydnic
@Jens: My philosophy is that a solution which may work depending on the specifics of what the OP is doing (but hasn't told you) is not in the general case a good solution, though in this case it sounds like it will work out. @Cindyydnic: I know you haven't posted much, and you probably had no idea about the regex/HTML minefield here - the information in that comment is pretty essential to avoiding it (maybe even edit the question to put it in).
Jefromi
@Jefromi: I know more than enough about the regex minefield. I appreciate how that community is trying to discourage a newbie from playing with regex fire, but it feels more like smashing a dog on the nose with a rolled-up newspaper. However, you are absolutely correct that the question should be updated with an explanation that I am very much aware of the dangers of what I'm asking for.
Cindyydnic
A: 

You can match:

(<div.*?id=)"(.*?)"(.*?>)

and replace this with:

$1'$2'$3
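As a rough illustration, here is how that match-and-replace might look in Python. This is only a sketch: Python's `re.sub` writes backreferences as `\1` rather than `$1`, and the function name `requote_div_id` is hypothetical. It assumes the `id` value is double-quoted and contains no double quote itself.

```python
import re

# Group 1: everything from "<div" up to and including "id=".
# Group 2: the id value, without its surrounding double quotes.
# Group 3: the rest of the opening tag, up to and including ">".
DIV_ID = re.compile(r'(<div.*?id=)"(.*?)"(.*?>)')


def requote_div_id(html: str) -> str:
    """Swap the double quotes around a div's id value for apostrophes."""
    # Python spells the $1'$2'$3 replacement as \1'\2'\3.
    return DIV_ID.sub(r"\1'\2'\3", html)


if __name__ == "__main__":
    sample = '<div id="mydiv">This is a "div" with quotation marks</div>'
    print(requote_div_id(sample))
    # prints: <div id='mydiv'>This is a "div" with quotation marks</div>
```

Unlike a blanket replacement of quotes inside tags, this only rewrites the `id` attribute of `div` elements and leaves other attributes as they are.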
codaddict