I am trying to determine what input the following regex pattern allows me to enter:

\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)

From my attempt to decipher it (by looking at the regexes I do understand), it seems to say that I can start with any character sequence, then I must have a brace followed by alphanumerics, then another sequence followed by braces, one initial single quote, and no backslashes, closed by a brace?

Sorry if I have got this completely muddled. Any help is appreciated.

Regards, Pablo

A: 

RegexBuddy says:

\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)

Options: case insensitive

Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 1 «([\w\.-]+)»
   Match a single character present in the list below «[\w\.-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A word character (letters, digits, etc.) «\w»
      A . character «\.»
      The character “-” «-»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “=” literally «=»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 2 «('[^']*'|"[^"]*"|[^\s]+)»
   Match either the regular expression below (attempting the next alternative only if this one fails) «'[^']*'»
      Match the character “'” literally «'»
      Match any character that is NOT a “'” «[^']*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character “'” literally «'»
   Or match regular expression number 2 below (attempting the next alternative only if this one fails) «"[^"]*"»
      Match the character “"” literally «"»
      Match any character that is NOT a “"” «[^"]*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character “"” literally «"»
   Or match regular expression number 3 below (the entire group fails if this one fails to match) «[^\s]+»
      Match a single character that is a “non-whitespace character” «[^\s]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»


Created with RegexBuddy
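
As a rough sketch, the same breakdown can be written as a commented pattern. This assumes Python's `re` module and its `re.VERBOSE` flag, neither of which appears in the question; the name `PAIR` is made up for illustration.

```python
import re

# The pattern from the question, spread over several lines with comments.
# re.VERBOSE ignores whitespace and "#" comments outside character classes.
PAIR = re.compile(r"""
    \s*                 # optional leading whitespace
    ([\w\.-]+)          # group 1: one or more word characters, dots or hyphens
    \s*=\s*             # a literal "=", optionally surrounded by whitespace
    (                   # group 2: the value, one of three alternatives
        '[^']*'         #   single-quoted text with no single quote inside
      | "[^"]*"         #   double-quoted text with no double quote inside
      | [^\s]+          #   a run of non-whitespace characters
    )
""", re.VERBOSE)

match = PAIR.match('  bar-bar = "a double-quoted string" trailing text')
name, value = match.group(1), match.group(2)
print(name)   # bar-bar
print(value)  # "a double-quoted string"
```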
Scott Evernden
+2  A: 

The square brackets are character classes, and the parens are for grouping. I'm not sure what you mean by "braces".

This basically matches a name=value pair where the name consists of one or more "word", dot or hyphen characters, and the value is either a single-quoted string, a double-quoted string, or a run of non-whitespace characters. Single-quoted strings cannot contain a single quote, and double-quoted strings cannot contain a double quote (both arguably minor flaws in whatever syntax this is from). There's also arguably some ambiguity, since the last option ("a run of non-whitespace characters") could match something starting with a single or double quote.

Also, zero or more whitespace characters may appear around the equals sign or at the beginning (that's the \s* bits).
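
A small sketch of this behaviour, assuming Python's `re` module (the question does not say which language is in use); `parse_pair` is a hypothetical helper name. The last call shows the ambiguity: an unterminated quote falls through to the non-whitespace alternative.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"""\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)""")

def parse_pair(text):
    """Return (name, value) for the first pair at the start of text, or None."""
    m = PATTERN.match(text)
    return (m.group(1), m.group(2)) if m else None

# The captured value keeps its surrounding quotes.
print(parse_pair("  user.name = 'Pablo'"))     # ('user.name', "'Pablo'")
print(parse_pair('title="two words"'))         # ('title', '"two words"')
# No closing quote: the third alternative matches up to the first space.
print(parse_pair("msg='unterminated value"))   # ('msg', "'unterminated")
```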

Laurence Gonsalves
It looks like a crude attempt to pluck an *attribute=value* sequence out of an HTML tag. If that's the case, I would change the last alternative to `[^\s>]+`. Aside from that it will probably work 99+% of the time, simplistic though it is.
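
To illustrate the suggested change, here is a sketch in Python (an assumption; the thread names no language) comparing the original pattern with one whose last alternative is `[^\s>]+`. The names `ORIGINAL` and `ADJUSTED` are made up for the example.

```python
import re

ORIGINAL = re.compile(r"""\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)""")
# Same pattern, but an unquoted value also stops at ">".
ADJUSTED = re.compile(r"""\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s>]+)""")

tag = "<font color=#ff3300 size=18>"

original_pairs = ORIGINAL.findall(tag)
adjusted_pairs = ADJUSTED.findall(tag)

print(original_pairs)  # [('color', '#ff3300'), ('size', '18>')]
print(adjusted_pairs)  # [('color', '#ff3300'), ('size', '18')]
```

With the original pattern, the closing `>` of the tag ends up inside the last value.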
Alan Moore
A: 

Let us break \s*([\w\.-]+)\s*=\s*('[^']*'|\"[^\"]*\"|[^\s]+) apart:

\s*([\w\.-]+)\s*:

  • \s* means 0 or more whitespace characters
  • [\w.-]+ means 1 or more of the following characters: A-Za-z0-9_.-

('[^']*'|\"[^\"]*\"|[^\s]+):

  • Zero or more non-' characters enclosed in ' and '.
  • Zero or more non-" characters enclosed in " and ".
  • One or more non-whitespace characters.

So basically, you can mostly ignore the \s*'s when trying to understand the expression; they just absorb optional whitespace.

Sebastian P.
+1  A: 

It's looking for strings of text which are basically

<identifier> = <value>
  • identifier is made up of letters, digits, '_', '-' and '.'

  • value can be a single-quoted string, a double-quoted string, or any other sequence of characters (as long as it doesn't contain whitespace).

So it would match lines that look like this:

foo = 1234
bar-bar= "a double-quoted string"
bar.foo-bar ='a single quoted string'
   .baz      =stackoverflow.com this part is ignored

Some things to note:

  • There's no way to put a quote inside a quoted string (such as using \" inside "...").
  • Anything after the quoted string is ignored.
  • If a quoted string isn't used for value, then everything from the first space onwards is ignored.
  • Whitespace before the identifier and around the = sign is optional.
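
The sample lines above can be checked with a short sketch, assuming Python's `re` module (not something the question specifies). `PATTERN` and `results` are illustrative names.

```python
import re

PATTERN = re.compile(r"""\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)""")

lines = [
    "foo = 1234",
    'bar-bar= "a double-quoted string"',
    "bar.foo-bar ='a single quoted string'",
    "   .baz      =stackoverflow.com this part is ignored",
]

# Collect (identifier, value) for each line; value keeps any quotes.
results = [PATTERN.match(line).groups() for line in lines]
for identifier, value in results:
    print(identifier, "->", value)
```

In the last line only `stackoverflow.com` is captured as the value, because an unquoted value ends at the first whitespace character.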
too much php
A: 

Yes, you have got it completely muddled. :P For one thing, there are no braces in that regex; that word usually refers to the curly brackets: {}. That regex only contains square brackets and parentheses (aka round brackets), and they're all regex metacharacters--they aren't meant to match those characters literally. The same goes for most of the other characters.

You might find this site useful. Very good tutorial and reference site for all things regex.

Alan Moore
A: 

What would be the correct regex pattern for the following scenario?

Find the = sign and only the value next to it. For example, this:

> <font color=#ff3300
> size=18><b>original
> flavor</b></font><br><font
> color=#666666 size=14>mint
> flavor</font>

has to become the following:

> <font color='#ff3300'
> size='18'><b>Original
> Flavor</b></font><br /><font
> color='#666666' size='14'>Mint
> Flavor</font>

Any help will be greatly appreciated. Thanks
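
One possible sketch of the attribute-quoting part, assuming Python's `re.sub`; the pattern and the name `quote_attrs` are illustrative, not taken from this thread. It only wraps unquoted values in single quotes. It does not change capitalization or turn `<br>` into `<br />`, and it would also rewrite `name=value` text that appears outside a tag.

```python
import re

# An attribute name, "=", then an unquoted value: a run of characters
# that are not whitespace, quotes, or ">".
UNQUOTED = re.compile(r"""(\w+)=([^\s'">]+)""")

def quote_attrs(html):
    """Wrap unquoted attribute values in single quotes."""
    return UNQUOTED.sub(r"\1='\2'", html)

before = "<font color=#ff3300 size=18><b>original flavor</b></font>"
after = quote_attrs(before)
print(after)
# <font color='#ff3300' size='18'><b>original flavor</b></font>
```

Values that are already quoted are left alone, because the value part must start with a character that is not a quote.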