I have this HTML, and I want to replace the numeric value in value="##" with the text between <option> and </option>.

For example, what regex would automatically change <option value="16">Accounting</option> into <option value="Accounting">Accounting</option>?

I plan on doing it to this entire list.

<option value="16">Accounting A.A.S.</option>
<option value="15">Accounting A.S.</option>
<option value="33">Art Studies</option>
<option value="18">Business Administration A.A.S.</option>
<option value="17">Business Administration A.S.</option>
<option value="20">Computer Network Technician</option>
<option value="21">Computer Support Specialist</option>
<option value="40">Criminal Justice A.A.S.</option>
<option value="39">Criminal Justice A.S.</option>
<option value="37">Criminal Justice: Corrections Certificate</option>
<option value="41">Criminal Justice: Cybersecurity</option>
<option value="42">Criminal Justice: Economic Crime</option>
<option value="43">Criminal Justice: Forensic Investigation</option>
<option value="34">Early Childhood</option>
<option value="22">Fashion Buying & Merchandising</option>
<option value="35">Fine Arts</option>
<option value="23">Health Services Management A.S.</option>
<option value="24">Health Services Management Technology A.A.S.</option>
<option value="92">Human Resource Management A.A.S.</option>
<option value="44">Human Services</option>
<option value="25">International Business</option>
<option value="36">Liberal Arts & Sciences: Childhood Education</option>
<option value="49">Liberal Arts & Sciences: Communication Arts</option>
<option value="50">Liberal Arts & Sciences: General Studies</option>
<option value="52">Liberal Arts & Sciences: Social Science</option>
<option value="51">Liberal Arts and Sciences: Humanities</option>
<option value="26">Marketing A.A.S.</option>
<option value="27">Medical Coder/Transcriptionist Certificate</option>
<option value="45">Music Industry</option>
<option value="28">Paralegal</option>
<option value="46">Photographic Technology</option>
<option value="47">Radio/Television Broadcasting</option>
<option value="91">Science A.S.</option>
<option value="29">Small Business Management</option>
<option value="30">Small Business Management: Certificate</option>
<option value="48">Teaching Assistant: Certificate</option>
<option value="31">Travel & Tourism: Hospitality & Events Management</option>
<option value="32">Website and E-Business Development</option>

EDIT: I want to use grep (regex) search/replace within TextWrangler.

+1  A: 

Under Linux:

sed 's|<option value="[^"]*">\([^<>]*\)</option>|<option value="\1">\1</option>|g'
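The same substitution can be sketched in JavaScript (for example under Node.js), assuming the option list is held in a string. The helper name `copyTextToValue` is hypothetical. JavaScript regexes write groups as plain `(...)` and refer to them as `$1` in the replacement, where sed's basic syntax uses `\(...\)` and `\1`.

```javascript
// Hypothetical helper: copy each option's text into its value attribute.
// Same pattern as the sed command; [^<>]* stops the capture at </option>.
function copyTextToValue(html) {
  return html.replace(
    /<option value="[^"]*">([^<>]*)<\/option>/g,
    '<option value="$1">$1</option>'
  );
}

const input = [
  '<option value="16">Accounting A.A.S.</option>',
  '<option value="22">Fashion Buying & Merchandising</option>',
].join("\n");

console.log(copyTextToValue(input));
// <option value="Accounting A.A.S.">Accounting A.A.S.</option>
// <option value="Fashion Buying & Merchandising">Fashion Buying & Merchandising</option>
```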
cadrian
+2  A: 

I assume this means you want:

<option value="14">Foobar</option>

To become:

<option value="Foobar">Foobar</option>

If so, here's the JavaScript. I assume the select variable holds the surrounding <select> DOM element, e.g. obtained via form.nameOfselect.

for (var i = 0; i < select.options.length; i++) {
   var option = select.options[i];
   option.value = option.text; // copy the visible text into the value
}
Jason Cohen
I am using TextWrangler on Mac OS X, and it supports regex in its search/replace function. Sorry I did not mention that.
Brad
+3  A: 

Just remove the value attributes. An <option> without a value attribute takes its content as its value by default.

<option value="Accounting">Accounting</option>

is equivalent to:

<option>Accounting</option>
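To illustrate the fallback, here is a rough JavaScript model of how the HTML standard defines an option's value: the value attribute if one is present (even an empty one), otherwise the option's text with leading and trailing whitespace stripped and inner runs of whitespace collapsed to one space. The function `optionValue` is hypothetical, and it assumes the text has already been extracted and entity-decoded.

```javascript
// Hypothetical model of the HTML fallback rule for <option> values.
// valueAttr: the value attribute, or null when it is absent.
// text: the option's decoded text content.
const ASCII_WS = /[\t\n\f\r ]+/g; // tab, LF, FF, CR, space

function optionValue(valueAttr, text) {
  if (valueAttr !== null) {
    return valueAttr; // an explicit value attribute always wins
  }
  // Otherwise collapse whitespace runs, then strip the ends
  return text.replace(ASCII_WS, " ").replace(/^ | $/g, "");
}

console.log(optionValue("16", "Accounting")); // 16
console.log(optionValue(null, "Accounting")); // Accounting
console.log(optionValue(null, "  Fine\n   Arts ")); // Fine Arts
```

One practical consequence: after removing the attributes, option text that spans lines or has extra spaces is submitted in its whitespace-normalized form.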

Edit: using sed you can do

sed -r 's/ value="[0-9]+"//g'
vartec
Quotes generally don't need to be escaped in a regex. This is a matter of the host language's string representation. For clarity, you should leave the escaping out if you are not referring to a specific implementation.
Tomalak
Ok ;-) Although I like the expression "better safe than sorry" ;-)
vartec
I think putting escape characters everywhere "just in case" is a bad thing, as it creates uncertainty about which part of the processing chain (in your case shell->sed->regex) gets to see what. I'd rather get the pure regex and do my own escaping than remove escaping someone else anticipated for me.
Tomalak
+1  A: 

If your HTML is well-formed, this will do the trick:

Regex:

(?<=<option) value="\d+"

Replace with the empty string.

In HTML, option values fall back automatically to the displayed text, if no value attribute is present.
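For comparison, the same lookbehind pattern can be sketched in JavaScript, assuming an engine with lookbehind support (ES2018 and later, such as current Node.js). The helper name `dropOptionValues` is hypothetical.

```javascript
// Hypothetical helper: delete value="<digits>" only where it directly
// follows "<option". The lookbehind (?<=<option) checks the preceding text
// without consuming it, so "<option" itself stays in the output.
function dropOptionValues(html) {
  return html.replace(/(?<=<option) value="\d+"/g, "");
}

const input = [
  '<option value="16">Accounting A.A.S.</option>',
  '<option value="33">Art Studies</option>',
].join("\n");

console.log(dropOptionValues(input));
// <option>Accounting A.A.S.</option>
// <option>Art Studies</option>
```

Without the lookbehind, the pattern would have to capture "<option" and write it back in the replacement.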

Tomalak
A: 

It can be made a bit more general, but it works when there are no other attributes in the option tag:

/(.*")([^"]+)(">)([^<]+)(.*)/

Then replace the captured string with this:

$1$4$3$4$5

If your tool uses backslashes for group references, replace each dollar sign with a backslash: \1\4\3\4\5
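A sketch of the same five-group pattern in JavaScript, which uses the $1 style in replacement strings. Like the answer, it assumes one <option> per line with no other attributes. It applies the pattern line by line because negated classes such as [^"] also match line breaks. The helper name `swapValueForText` is hypothetical.

```javascript
// Hypothetical helper. Groups: 1 = up to the opening quote, 2 = old value,
// 3 = '">', 4 = option text, 5 = rest of the line.
// The replacement $1$4$3$4$5 drops group 2 and puts the text in its place.
const OPTION_RE = /(.*")([^"]+)(">)([^<]+)(.*)/;

function swapValueForText(html) {
  // Work one line at a time so no match can span two <option> lines
  return html
    .split("\n")
    .map((line) => line.replace(OPTION_RE, "$1$4$3$4$5"))
    .join("\n");
}

const input = [
  '<option value="16">Accounting A.A.S.</option>',
  '<option value="28">Paralegal</option>',
].join("\n");

console.log(swapValueForText(input));
// <option value="Accounting A.A.S.">Accounting A.A.S.</option>
// <option value="Paralegal">Paralegal</option>
```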

Tip: there is a great online regexp tester at http://gskinner.com/RegExr/; check it out!

ciscoheat
A: 

You've since added that you want to perform the operation in a text editor. I wrote something assuming you'd want to use PHP, and I'll leave it here because I think it's important.

Anyway, using TextWrangler (or any other text editor), replace

<option value="[^"]*">([^<]*)</option>

with

<option value="\1">\1</option>


As you're mostly active in the php tag, I assume you're asking for a solution in PHP. However, let me begin this answer with a piece of advice: if you are looking for a way to do something, state your goal, not a technology you think is suitable. Regexps work, but this is far easier with SimpleXML:

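// Escape bare ampersands (e.g. "Fashion Buying & Merchandising") so the list parses as XML
$xml = str_replace("&", "&amp;", $xml);
// XML needs a single root element, so wrap the <option> list
$doc = simplexml_load_string("<select>" . $xml . "</select>");
foreach ($doc->xpath("//option") as $o) {
    $o["value"] = (string) $o; // copy the option's text into its value attribute
}
// Note: asXML() returns the <select> wrapper plus an XML declaration
$xml = $doc->asXML();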

If you insist on using a regexp, you can do so:

$xml = preg_replace(
    '#<option value="[^"]*">([^<]*)</option>#',
    '<option value="$1">$1</option>', $xml);
phihag
A: 

I haven't tested this, but I believe this will work, using jQuery.

 $('option').each(function () { $(this).val($(this).text()); }); // inside .each, this is the current option
James Curran