regex to extract names and values of attributes

I have the following possible strings that I need to turn into arrays so I can feed them into an HTML generator. I am not starting with HTML or XML; I am trying to create a shorthand that will let me populate my HTML objects more simply and quickly, with more readable code.

id='moo'
id = "foo" type= doo    value ='do\"o'
on_click='monkeys("bobo")'

I need to pull out the attribs and their corresponding values. These attrib strings are not associated with an HTML or XML tag, and I would like to do it with one to three regular expressions.

The value may be encapsulated by either single or double quotes, or not quoted at all.
If the value is encapsulated by quotes, it may also contain whitespace, quotes of the other kind, or escaped quotes of the same kind as the encapsulating quotes.
There may or may not be whitespace between the attrib and the =, and between the = and the value.
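
The three rules above can be folded into a single pattern. What follows is only a sketch, written in Python purely for illustration; the names `ATTR_RE`, `unescape` and `parse_attributes` are made up for this example, and it assumes that attribute names consist of word characters and hyphens and that a backslash escapes whatever character follows it inside quotes.

```python
import re

# One alternation branch per value style: double-quoted, single-quoted, bare.
ATTR_RE = re.compile(r"""
    ([\w-]+)                      # attribute name (assumed: word chars and hyphens)
    \s*=\s*                       # equals sign with optional whitespace on both sides
    (?:
        "((?:[^"\\]|\\.)*)"       # double-quoted value, backslash escapes allowed
      | '((?:[^'\\]|\\.)*)'       # single-quoted value, backslash escapes allowed
      | ([^\s"']+)                # bare value: runs up to the next whitespace
    )
""", re.VERBOSE)


def unescape(value):
    """Drop the backslash from every backslash-escaped character."""
    return re.sub(r"\\(.)", r"\1", value)


def parse_attributes(text):
    """Return a dict mapping each attribute name to its unquoted value."""
    attrs = {}
    for match in ATTR_RE.finditer(text):
        name, double_quoted, single_quoted, bare = match.groups()
        if double_quoted is not None:
            attrs[name] = unescape(double_quoted)
        elif single_quoted is not None:
            attrs[name] = unescape(single_quoted)
        else:
            attrs[name] = bare
    return attrs


print(parse_attributes("id='moo'"))
print(parse_attributes(r"""id = "foo" type= doo    value ='do\"o'"""))
print(parse_attributes("""on_click='monkeys("bobo")'"""))
```

Because each quoted value is consumed as part of its own match, an = sign or a quote of the other kind inside a value is never mistaken for the start of a new attribute.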

The eventual results should look like:

array(1) {
  [id] => moo
}
array(3) {
  [id] => foo
  [type] => doo
  [value] => do"o
}
array(1) {
  [on_click] => monkeys("bobo")
}

But if it turns out like this:

array(2) {
  [0] => id
  [1] => moo
}
array(6) {
  [0] => id
  [1] => foo
  [2] => type
  [3] => doo
  [4] => value
  [5] => do"o
}
array(2) {
  [0] => on_click
  [1] => monkeys("bobo")
}

I can re-arrange it from there.
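
For reference, the re-arranging step is small. A minimal sketch in Python (used here only for illustration; `pairs_to_dict` is a hypothetical helper name), assuming the flat list always alternates name, value, name, value:

```python
def pairs_to_dict(flat):
    """Turn [name, value, name, value, ...] into {name: value, ...}."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of items (name/value pairs)")
    # Even positions hold the names, odd positions hold the values.
    return dict(zip(flat[0::2], flat[1::2]))


print(pairs_to_dict(["id", "moo"]))
print(pairs_to_dict(["on_click", 'monkeys("bobo")']))
```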

Some previous regexes I have tried to use and their issues:

/[\s]+/ - Returns attrib/value pairs only if there is no whitespace around the =.
/(?<==)(\".*\"|'.*'|.*)$/ - Returns the value including the encapsulating quotes. It does ignore escaped quotes within the value, though.
/^[^=]*/ - Returns the attribute just fine, regardless of whitespace between the attrib and the =.

Any suggestions on how I should be going about this?

Tyson of the Northwest 2010-09-10 19:37:29

Use a state machine: parse "tokens" and know what to expect next. Start by looking for an identifier, then (skipping spaces) an '='. Then expect either an opening quote (' or "), the quoted token, and the same closing quote, OR just a word token without quotes. Repeat as needed.
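
A rough sketch of that state-machine idea, in Python for illustration only. The function name `scan_attributes` is invented for this example, and two behaviours are assumptions rather than anything stated in the thread: a backslash inside quotes escapes the next character, and malformed input raises `ValueError`.

```python
def scan_attributes(text):
    """Scan name=value pairs one character at a time and return them as a dict."""
    attrs = {}
    i, n = 0, len(text)
    while i < n:
        # Skip whitespace before the identifier.
        while i < n and text[i].isspace():
            i += 1
        if i >= n:
            break
        # Read the identifier up to whitespace or '='.
        start = i
        while i < n and not text[i].isspace() and text[i] != "=":
            i += 1
        name = text[start:i]
        # Skip spaces, then require '='.
        while i < n and text[i].isspace():
            i += 1
        if i >= n or text[i] != "=" or not name:
            raise ValueError(f"expected '=' after {name!r} at position {i}")
        i += 1
        while i < n and text[i].isspace():
            i += 1
        if i < n and text[i] in "'\"":
            # Quoted value: read until the same quote character closes it.
            quote = text[i]
            i += 1
            chars = []
            while i < n and text[i] != quote:
                if text[i] == "\\" and i + 1 < n:
                    i += 1  # drop the backslash, keep the escaped character
                chars.append(text[i])
                i += 1
            if i >= n:
                raise ValueError(f"unterminated quote in value of {name!r}")
            i += 1  # consume the closing quote
            value = "".join(chars)
        else:
            # Bare value: a word token that ends at the next whitespace
            # (empty if nothing follows the '=').
            start = i
            while i < n and not text[i].isspace():
                i += 1
            value = text[start:i]
        attrs[name] = value
    return attrs


print(scan_attributes(r"""id = "foo" type= doo    value ='do\"o'"""))
```

Compared with a single regular expression, this version is longer, but it can report where the input stops making sense instead of silently skipping it.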

zigdon 2010-09-10 21:23:58

Unfortunately no, I am putting a framework over the DOM to facilitate faster generation of XML content. I am familiar with the DOM and I am trying to make a tool that will parse an attribute string into an array so I can feed it into my DOM objects.

Tyson of the Northwest 2010-09-10 19:35:08
