ansaurus

Question

PHP regex to match all-caps lines with occasional hyphens.

Answer 1

+2 A:

Along the lines of (don't forget the "u" flag for Unicode regexes):

^(?:\*\*)?(?=[^*]{4,})(\p{Lu}+)(?:\s*-\s*(\p{Lu}+))?(?:\*\*)?\s*$

^               # start of line
(?:\*\*)?       # two stars, optional
(?=[^*]{4,})    # followed by at least 4 non-star characters
(\p{Lu}+)       # group 1, Unicode upper case letters
(?:             # start no capture group
  \s*-\s*       #   space*, dash, space*
  (\p{Lu}+)     #   group 2, Unicode upper case letters
)?              # end no capture group, make optional
(?:\*\*)?       # two stars, optional
\s*             # optional trailing spaces
$               # end of line
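
As a rough illustration of how the pattern above might be applied to one line at a time in PHP, here is a minimal sketch. The function name matchHeader() and the returned array keys are made up for this example; only the pattern itself comes from the answer.

```php
<?php
// Sketch only: matchHeader() is a hypothetical helper, not part of the answer.
function matchHeader(string $line): ?array
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8, so that
    // \p{Lu} is tested against whole characters rather than individual bytes.
    $pattern = '/^(?:\*\*)?(?=[^*]{4,})(\p{Lu}+)(?:\s*-\s*(\p{Lu}+))?(?:\*\*)?\s*$/u';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // not a header line (or the subject was not valid UTF-8)
    }

    // Group 2 is absent from $m when the optional "- SUBHEADER" part is missing.
    return ['header' => $m[1], 'subheader' => $m[2] ?? null];
}
```

Under these assumptions, matchHeader('**HEADER - SUB**') should yield 'HEADER' and 'SUB', while a mixed-case line such as 'Header' returns null.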

EDIT: Simplified, as per the comments:

^(?=[A-Z -]{4,})([A-Z ]+)(?:-([A-Z ]+))?\s*$

^               # start of line
(?=[A-Z -]{4,}) # followed by at least 4 upper case characters, spaces or dashes
([A-Z ]+)       # group 1, upper case letters or space
(?:             # start no capture group
  -             #   a dash
  ([A-Z ]+)     #   group 2, upper case letters or space
)?              # end no capture group, make optional
\s*             # optional trailing spaces
$               # end of line

Contents of groups 1 and 2 must be trimmed before use.
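
The trimming step can be sketched like this. It uses the simplified pattern with the lookahead as written in the commented breakdown; the helper name splitHeader() is hypothetical and chosen only for illustration.

```php
<?php
// Sketch only: splitHeader() is a hypothetical helper name.
function splitHeader(string $line): ?array
{
    // Simplified ASCII-only pattern, as given in the commented breakdown.
    $pattern = '/^(?=[A-Z -]{4,})([A-Z ]+)(?:-([A-Z ]+))?\s*$/';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // the line is not an all-caps header
    }

    // [A-Z ]+ also swallows the spaces around the dash, so trim both groups.
    $header    = trim($m[1]);
    $subheader = isset($m[2]) ? trim($m[2]) : null;

    return [$header, $subheader];
}
```

For the input 'MAIN HEADER - SUB HEADER', the raw groups are 'MAIN HEADER ' and ' SUB HEADER'; after trimming, the sketch returns 'MAIN HEADER' and 'SUB HEADER'.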

Tomalak 2010-04-20 13:14:08

This is good, but is there a simpler expression, since the asterisks will no longer be used, and all chars are English uppercase letters? I appreciate your detail, but I'm also trying to learn by example. Thanks.

Yaaqov 2010-04-20 13:22:50

Well done, and clarified. Thank you. By the way, is there a tool you're aware of that automatically comments on the components of regular expressions? This is very helpful for a beginner like myself.

Yaaqov 2010-04-20 13:47:06

@Yaaqov: Maybe. I don't know any such tool, though.

Tomalak 2010-04-20 14:03:57

@Yaaqov: I would highly recommend RegexBuddy (http://www.regexbuddy.com/). It's not free but it's in my list of must-have tools when dealing with regex stuff.

Amry 2010-04-21 00:58:58

Answer 2

A:

So all you need to know is that the header starts with four uppercase ASCII letters? This should work:

'#^([A-Z]{4}[^-]*)(?:-(.*))?$#'
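
A possible way to use this pattern, assuming the text is processed one line at a time. The sample text and variable names below are invented for the example.

```php
<?php
// Sketch only: the sample text and variable names are made up for illustration.
$text = "INTRODUCTION\nSome body text.\nMETHODS - DATA COLLECTION\nmore text";

$headers = [];
foreach (explode("\n", $text) as $line) {
    // Pattern from the answer: four ASCII capitals, then anything up to an optional dash.
    if (preg_match('#^([A-Z]{4}[^-]*)(?:-(.*))?$#', $line, $m) === 1) {
        // Group 2 is missing from $m when the line has no dash.
        $headers[] = [trim($m[1]), trim($m[2] ?? '')];
    }
}
```

Note that only the first four characters are checked, so a line such as 'ABCDefgh' would also be accepted by this pattern.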

Alan Moore 2010-04-20 13:32:47

Answer 3

+1 A:

^([A-Z]{4,}(?:[A-Z ]*[A-Z])?)(?:\s*-\s*([A-Z]{4,}(?:[A-Z ]*)?))?$

What about this one? It matches a header made of uppercase words whose first word has at least 4 letters, plus an optional subheader that again starts with at least 4 uppercase letters.
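
A short sketch of how this pattern behaves on two invented sample lines. The variable names are assumptions for the example only.

```php
<?php
// Sketch only: sample inputs and variable names are made up for illustration.
$pattern = '/^([A-Z]{4,}(?:[A-Z ]*[A-Z])?)(?:\s*-\s*([A-Z]{4,}(?:[A-Z ]*)?))?$/';

// The first word has four or more capitals, so the line matches.
$longFirstWord = preg_match($pattern, 'MAIN HEADER - SUBHEADER TEXT', $m);

// The first word "THE" has only three capitals, so the line is rejected.
$shortFirstWord = preg_match($pattern, 'THE END');
```

Because the pattern consumes the whitespace around the dash outside the capture groups, group 1 here comes back as 'MAIN HEADER' without needing a trim.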

Aurril 2010-04-20 13:35:20

Answer 4

+1 A:

The regular expression:

^(?=.{4})([^-]+)(?:-(.*))?$

The explanation:

^          # start of line
(?=.{4})   # look ahead to make sure there are at least 4 characters
([^-]+)    # capture everything up to the first dash, if there is one
(?:-(.*))? # optional: skip the dash and capture the rest of the line
$          # end of line

I assumed you were only interested in lines that have at least 4 characters.

Also, I cheated a bit: the regex matches any characters, not just English uppercase letters, because that leads to a simpler expression. If you want to make sure it only accepts uppercase letters, this should do it:

^(?=.{4})([A-Z\s]+)(?:-([A-Z\s]+))?$
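
To make the difference between the two patterns concrete, here is a small comparison sketch on a few invented lines. The variable names and sample data are assumptions.

```php
<?php
// Sketch only: compares the two patterns from this answer on made-up lines.
$permissive = '/^(?=.{4})([^-]+)(?:-(.*))?$/';
$strict     = '/^(?=.{4})([A-Z\s]+)(?:-([A-Z\s]+))?$/';

$lines = ['CHAPTER ONE - THE START', 'Chapter one', 'ABC'];

$results = [];
foreach ($lines as $line) {
    $results[$line] = [
        'permissive' => preg_match($permissive, $line) === 1,
        'strict'     => preg_match($strict, $line) === 1,
    ];
}
```

In this sketch the all-caps line passes both patterns, the mixed-case line passes only the permissive one, and 'ABC' fails both because the lookahead requires at least 4 characters.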

Amry 2010-04-20 14:28:26
