Greetings.

I've been tasked with debugging part of an application that involves a Regex, but I have never dealt with Regex before. Two questions:

1) I know that the regexes are supposed to be testing whether or not two strings are equivalent, but what specifically do the two regex statements, below, mean in plain English?

2) Does anyone have a recommendation on websites / sources where I can learn more about Regexes? (preferably in C#)

if (Regex.IsMatch(testString, @"^(\s*?)(" + tag + @")(\s*?),", RegexOptions.IgnoreCase))
{
    result = true;
}
else if (Regex.IsMatch(testString, @",(\s*?)(" + tag + @")(\s*?),", RegexOptions.IgnoreCase))
{
    result = true;
}
+1  A: 

One word - Cribsheet (or is that two?) :)

David Neale
+1  A: 

Using The Regex Coach

The regular expression is a sequence consisting of the expression '(\s*?)', the expression '(tag)', the expression '(\s*?)', and the character ','.

Here (\s*?) is described as a repetition which matches a whitespace character as often as necessary (that is, as few times as possible).

The second one also matches a ',' at the start.

As for good learning websites, I like www.regular-expressions.info/

Super simple version:

At the start of the string: 0 or more whitespace characters, whatever tag is, 0 or more whitespace characters, then a comma.

The second one is:

A comma, 0 or more whitespace characters, whatever tag is, 0 or more whitespace characters, then a comma.
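To make that concrete, here is a minimal sketch of both checks. The helper names and sample strings are made up for illustration; the patterns are the ones from the question, and it is written as a C# 9+ top-level program.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers wrapping the two patterns from the question.
// Note: tag is inserted unescaped here, exactly as in the original code.
bool MatchesAtStart(string testString, string tag) =>
    Regex.IsMatch(testString, @"^(\s*?)(" + tag + @")(\s*?),", RegexOptions.IgnoreCase);

bool MatchesAfterComma(string testString, string tag) =>
    Regex.IsMatch(testString, @",(\s*?)(" + tag + @")(\s*?),", RegexOptions.IgnoreCase);

Console.WriteLine(MatchesAtStart("  Red , green, blue", "red"));  // True: tag is the first item
Console.WriteLine(MatchesAtStart("green, red, blue", "red"));     // False: tag is not at the start
Console.WriteLine(MatchesAfterComma("green, red, blue", "red"));  // True: tag sits between two commas
Console.WriteLine(MatchesAfterComma("green, blue, red", "red"));  // False: no comma after the last item
```

Neither pattern matches the tag when it is the last item, since both require a trailing comma.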

Scott Chamberlain
You're missing the negation '^' operator on the first capture. It means that the string cannot start with "0 or more whitespace characters"
mjmarsh
Thanks corrected.
Scott Chamberlain
Wrong. The "^" is only negation within a character set (so [^e] would match anything _except_ e). "^" at the beginning of a regex matches the beginning of the string.
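A quick sketch of the two meanings of ^, using made-up sample strings (C# 9+ top-level program):

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text.RegularExpressions;

// Outside brackets, ^ anchors the match to the start of the string.
bool anchoredHit = Regex.IsMatch("tag, other", @"^tag");
bool anchoredMiss = Regex.IsMatch("other, tag", @"^tag");

// Inside brackets, a leading ^ negates the character set.
bool negatedOnE = Regex.IsMatch("e", @"[^e]");
bool negatedOnF = Regex.IsMatch("f", @"[^e]");

Console.WriteLine(anchoredHit);   // True: the string starts with "tag"
Console.WriteLine(anchoredMiss);  // False: "tag" is present, but not at the start
Console.WriteLine(negatedOnE);    // False: the only character is e
Console.WriteLine(negatedOnF);    // True: f is a character other than e
```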
Chris B.
A: 

It looks like they are trying to match some kind of list of words delimited by colons. The first one probably matches the first item, and the second one matches any item after the first, excluding the last. I hope that makes sense :).

A good source of information about regular expressions is at http://www.regular-expressions.info/

ZuseX4
Delimited by commas, not colons, you mean.
Chris B.
Yes, I meant commas. My bad. Sorry.
ZuseX4
+2  A: 

It's going to be difficult to tell what that regex means, without knowing what's in tag. In fact, it looks like that regex is broken (or, at least, doesn't properly escape inputs).

Roughly speaking, for the first regex:

  • The ^ says to match at the beginning of the string.
  • The (...) sets up a capturing group (which is available, although this example apparently doesn't use it).
  • The \s matches any white space characters (spaces, tabs, etc.)
  • The *? matches zero or more of the previous character (in this case, whitespace), and because it has a question-mark, it matches the minimum number of characters needed to make the rest of the expression work.
  • The (" + tag + @") inserts the contents of the tag into the regex. As I mention, that's dangerous, without escaping.
  • The (\s*?) matches the same as before (the minimum number of whitespace characters).
  • The , matches a trailing comma.

The second regex is very similar, but looks for a starting comma (rather than the beginning of the string).
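Here is a minimal sketch of why the missing escaping matters, assuming the tag can contain regex metacharacters such as '.'. Regex.Escape is the standard .NET call for this; the helper names and sample strings are hypothetical (C# 9+ top-level program).

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text.RegularExpressions;

// Unescaped, as in the original code: metacharacters in tag are interpreted as regex syntax.
bool MatchesAtStartRaw(string testString, string tag) =>
    Regex.IsMatch(testString, @"^(\s*?)(" + tag + @")(\s*?),", RegexOptions.IgnoreCase);

// Escaped: Regex.Escape makes every character in tag match literally.
bool MatchesAtStartEscaped(string testString, string tag) =>
    Regex.IsMatch(testString, @"^(\s*?)(" + Regex.Escape(tag) + @")(\s*?),", RegexOptions.IgnoreCase);

Console.WriteLine(MatchesAtStartRaw("v120, v1.0", "v1.0"));      // True: '.' matched the '2' (false positive)
Console.WriteLine(MatchesAtStartEscaped("v120, v1.0", "v1.0"));  // False: '.' must now be a literal dot
Console.WriteLine(MatchesAtStartEscaped("v1.0, v2.0", "v1.0"));  // True: literal match on the first item
```

Some tags are worse than a false positive: an unescaped tag such as "c++" is not a valid .NET pattern at all and makes Regex.IsMatch throw.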

I like the Python documentation for Regular Expressions, but it looks like this site has a pretty good, basic introduction, with C# examples.

Chris B.
Thanks. Escaping the input fixed the code. I'll be sure to scold the programmer who worked on this project before me!
Raven Dreamer
A: 

Once you have the very basic idea about regex (there are plenty of resources out there), I recommend using Expresso for creating your regular expressions.

Expresso editor is equally suitable as a teaching tool for the beginning user of regular expressions or as a full-featured development environment for the experienced programmer or web designer with an extensive knowledge of regular expressions.

Claudio Redi
A: 

Your premise is not correct. Regular expressions are not used to tell if two strings are equivalent, but rather if the input string matches a certain pattern.
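A small sketch of the difference, using a simplified stand-in pattern with a hard-coded tag "red" (pattern and sample strings are made up for illustration; C# 9+ top-level program):

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text.RegularExpressions;

// Simplified stand-in for the first regex in the question, with the tag fixed to "red".
string pattern = @"^\s*red\s*,";

// Equality compares two specific strings with each other.
bool equal = string.Equals("red,", " RED , green", StringComparison.OrdinalIgnoreCase);

// A pattern describes a whole family of strings; each input either fits it or not.
bool fitsPlain = Regex.IsMatch("red, green", pattern, RegexOptions.IgnoreCase);
bool fitsPadded = Regex.IsMatch("  RED  , green, blue", pattern, RegexOptions.IgnoreCase);
bool fitsWrongOrder = Regex.IsMatch("green, red", pattern, RegexOptions.IgnoreCase);

Console.WriteLine(equal);           // False: the two strings differ
Console.WriteLine(fitsPlain);       // True
Console.WriteLine(fitsPadded);      // True: extra whitespace and different casing still fit
Console.WriteLine(fitsWrongOrder);  // False: "red" is not the first item
```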

The first test above anchors at the start of the string, then matches "zero or more whitespace characters", searching "non-greedy". It then matches the text of the variable "tag" in the middle, then "zero or more whitespace characters, non-greedy" again, and finally a comma.

The second one is very similar, except that instead of anchoring at the start of the string it requires a comma before the optional whitespace.

It is hard to explain "non-greedy" in this context, especially involving whitespace characters, so look here for more information.
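As a rough illustration, here is a sketch comparing the lazy \s*? with the greedy \s* on a made-up input (C# 9+ top-level program). The captured group shows how much whitespace each variant consumed.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text.RegularExpressions;

string input = "tag   ,";  // "tag", three spaces, then a comma

// With nothing after the group, lazy takes as little as possible and greedy as much as possible.
int lazyAlone = Regex.Match(input, @"tag(\s*?)").Groups[1].Length;
int greedyAlone = Regex.Match(input, @"tag(\s*)").Groups[1].Length;

// When a comma must follow, both are forced to consume all three spaces.
int lazyBeforeComma = Regex.Match(input, @"tag(\s*?),").Groups[1].Length;
int greedyBeforeComma = Regex.Match(input, @"tag(\s*),").Groups[1].Length;

Console.WriteLine(lazyAlone);          // 0
Console.WriteLine(greedyAlone);        // 3
Console.WriteLine(lazyBeforeComma);    // 3
Console.WriteLine(greedyBeforeComma);  // 3
```

In the regexes from the question the whitespace group is always followed by something that must match (the tag or a comma), so for a plain IsMatch check the lazy and greedy forms give the same answer.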

mjmarsh
A: 

A regular expression is a way to describe a set of strings that have some particular characteristics.

They aren't just for comparing two strings. What you usually do is test whether a string matches a particular regular expression. They can also be used to do simple parsing of a string into tokens that follow some pattern.

The good thing about regexps is that they let you express constraints on a string while staying general enough to match a whole group of strings that satisfy those constraints. They also follow a formal specification that leaves no ambiguity.

Here you can find a comparison table of various regular expression languages in many different programming languages and a specific guide for C# if you follow its link.

Usually the implementations for the various languages are quite similar, since the syntax is somewhat standardized from the theory regexps come from. Any regexp tutorial will be fine; after that you'll just need to get into the C# API.

Jack
+1  A: 

I'm not C# savvy, but I can recommend an awesome guide to regular expressions that I use for Bash and Java programming. It applies to pretty much all languages:

http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=tmm_pap_title_0

It is totally worth $30 to own this book. It is VERY thorough and helped my fundamental understanding of Regex a lot.

-Ryan

SDGuero
I can second this recommendation. It's a fantastic book.
Chris B.
+1  A: 

Since you specifically tagged C#, I recommend the Regex Hero as a tool you can use to play around with them since it's running on .NET. It also lets you toggle the different RegexOptions flags as you would pass them into the constructor when creating a new Regex.
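For reference, a minimal sketch of passing RegexOptions flags to the Regex constructor. The tag value and sample string are hypothetical, and the pattern is a simplified version of the one in the question (C# 9+ top-level program).

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Text.RegularExpressions;

string tag = "red";  // hypothetical tag value

// The same pattern built with and without the IgnoreCase flag.
var caseInsensitive = new Regex(@"^\s*" + Regex.Escape(tag) + @"\s*,", RegexOptions.IgnoreCase);
var caseSensitive = new Regex(@"^\s*" + Regex.Escape(tag) + @"\s*,");

// Flags are combined with |. IgnorePatternWhitespace lets the pattern contain spaces for readability.
var combined = new Regex(@"^ \s* red \s* ,",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

bool insensitiveResult = caseInsensitive.IsMatch("RED, green");
bool sensitiveResult = caseSensitive.IsMatch("RED, green");
bool combinedResult = combined.IsMatch("RED, green");

Console.WriteLine(insensitiveResult);  // True
Console.WriteLine(sensitiveResult);    // False: "RED" differs from "red" in case
Console.WriteLine(combinedResult);     // True
```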

Also, if you're using a version of Visual Studio 2010 that supports extensions, I would take a look at the Regex Editor extension. It will pop up whenever you type new Regex( and offer you some guidance and autocomplete for your regex pattern.

John Rasch
A: 

1) The first regex is trying to do a case-insensitive match starting at the beginning of the test string. It then matches optional whitespace, followed by whatever is in tag, followed by optional whitespace then finally a comma.

The second matches a string containing a comma, followed by optional whitespace, followed by whatever is in tag, followed by optional whitespace then finally a comma.

Though you're working in C#, I recommend picking up the Perl Pocket Reference, which has a great Regex syntax reference. It helped me out a lot when I was learning regexes 14 years ago.

James O'Sullivan
A: 

http://www.myregextester.com/ is a decent regular expression tester that also has an explain option for C# regexps. For instance, check out this example:

The regular expression:

(?-imsx:^(\s*?)(tagtext)(\s*?),)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \s*?                     whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the least amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    tagtext                  'tagtext'
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    \s*?                     whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the least amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
gnarf
A: 

A regular expression does not tell you if two strings match, but rather if a given string matches a pattern.

This site is my favorite for learning and testing regular expressions:

http://gskinner.com/RegExr/

It allows you to interactively test regular expressions as you write them, and provides a built-in tutorial.

Eric J.
A: 

Although it doesn't use C#, Rejex is a simple tool for testing and learning about regular expressions, and it includes a quick reference for the special characters.

oliver