I'm starting to learn regular expressions and I want to know: in which cases is it better to use them?
When you are trying to find/replace/validate complicated string patterns.
Regular expressions are a form of pattern matching that you can apply to textual content. Take for example the DOS wildcards ? and *, which you can use when you're searching for a file. They are a very limited subset of what RegExp offers. For instance, if you want to find all files beginning with "fn", followed by 1 to 4 arbitrary characters, and ending with "ht.txt", you can't do that with the usual DOS wildcards. RegExp, on the other hand, can handle that and much more complicated patterns.
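To make the "fn" example concrete, here is a minimal sketch in Python (my choice of language; the file names are made up for illustration). It assumes "1 to 4 characters" means any characters at all, which is what `.{1,4}` expresses:

```python
import re

# Hypothetical file names, used only for illustration.
names = ["fnAht.txt", "fnABCDht.txt", "fnht.txt", "fnABCDEht.txt", "xfnAht.txt"]

# "fn", then 1 to 4 arbitrary characters, then "ht.txt".
# The dot before "txt" is escaped so it matches a literal dot.
pattern = re.compile(r"fn.{1,4}ht\.txt")

# fullmatch requires the whole name to fit the pattern.
matches = [name for name in names if pattern.fullmatch(name)]
print(matches)
```

Only the names with one to four characters between "fn" and "ht.txt" survive; the ones with zero or five characters in between, or with a leading "x", are rejected.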
Regular expressions are, in short, a way to effectively
- handle data
- search and replace strings
- provide extended string handling.
Often a single regular expression can do string handling that the built-in string methods and properties could only match if you wrapped them in a complicated function or loop.
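As a rough illustration of that point, here is a sketch in Python that pulls every run of digits out of a string, once with a regular expression and once with a hand-written loop. The sample text is invented, and the loop is just one way to write it:

```python
import re

text = "order 66 shipped 3 boxes on day 12"

# With a regular expression: one call.
numbers_re = re.findall(r"[0-9]+", text)

# Without one: a loop that tracks the current run of digits by hand.
numbers_loop = []
current = ""
for ch in text:
    if ch in "0123456789":
        current += ch
    elif current:
        numbers_loop.append(current)
        current = ""
if current:  # flush a run that reaches the end of the string
    numbers_loop.append(current)

print(numbers_re, numbers_loop)
```

Both approaches produce the same list, but the pattern states the intent in one line.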
There are some cases where, if you need better performance, you should avoid regular expressions in favor of writing code. An example of this is parsing very large CSV files.
Regular expressions are a DSL (domain-specific language) for matching text, just like XPath is a DSL for traversing XML. A regex is essentially a mini language inside a general-purpose language. You can accomplish quite a bit in a very small amount of code because it is specialized for a narrow purpose. One very common use for regular expressions is checking whether a string looks like an email address, phone number, SSN, etc.
I use regular expressions when comparing strings (preg_match), replacing substrings (sed, preg_replace), replacing characters (sed, preg_replace), searching for strings in files (grep), splitting strings (preg_split), etc.
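For readers who don't use PHP or the shell tools, the same tasks look roughly like this in Python's `re` module. This is only a sketch of the equivalents as I understand them, and the log line is made up:

```python
import re

line = "2024-01-15 ERROR  disk full; retrying"

# Matching (roughly what preg_match is used for).
is_error = re.search(r"\bERROR\b", line) is not None

# Replacing substrings (roughly preg_replace, or s/// in sed):
# collapse each run of whitespace into a single space.
squeezed = re.sub(r"\s+", " ", line)

# Splitting (roughly preg_split): break on runs of semicolons or whitespace.
parts = re.split(r"[;\s]+", line)

# Searching through lines (roughly what grep does with a file).
lines = [line, "2024-01-15 INFO ok"]
hits = [l for l in lines if re.search(r"ERROR", l)]

print(is_error, squeezed, parts, hits, sep="\n")
```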
It is a very flexible and widespread pattern expression language and it is very useful to know.
BUT! It's like they say about poker: it's very easy to learn, but very hard to master.
I just came across a question that I thought was perfect for a RegEx; have a look and decide for yourself.
There are also cases where regular expressions are >>NOT<< appropriate (in general; there are always exceptions).
- Parsing HTML
- Parsing XML
In the above cases a DOM parser is almost always a better choice. The grammars are complex and there are too many edge cases, such as nested tags.
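Here is a small sketch of the nested-tag problem in Python. It uses the standard library's `xml.etree.ElementTree` as a stand-in for "a DOM parser" (it builds an element tree rather than a W3C DOM, but the point is the same), and the markup is invented:

```python
import re
import xml.etree.ElementTree as ET

doc = "<div>outer <div>inner</div> tail</div>"

# A naive non-greedy regex stops at the first closing tag,
# so the outer element is cut short.
naive = re.search(r"<div>(.*?)</div>", doc).group(1)

# A real parser understands the nesting.
root = ET.fromstring(doc)
child = root.find("div")

print(repr(naive))       # the regex result, truncated at the inner </div>
print(repr(root.text))   # text of the outer element before the child
print(repr(child.text))  # text of the nested element
print(repr(child.tail))  # text after the nested element, still inside the outer one
```

The regex captures `outer <div>inner`, which is not the content of either element, while the parser keeps the outer text, the inner text, and the trailing text apart.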
Also be sure to consider future maintenance programmers (who may well be you). Comments and/or well-chosen method/constant/variable names can make a world of difference, especially for developers not fluent in regular expressions.
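One way to follow that advice, sketched in Python: give the pattern a descriptive constant name and use `re.VERBOSE` so each piece can carry its own comment. The phone format here (three digits, three digits, four digits) is just an assumed example, not a complete phone validator:

```python
import re

# Hypothetical format: a US-style number such as 555-123-4567.
US_PHONE = re.compile(
    r"""
    (?P<area>[0-9]{3})      # three-digit area code
    [-.\s]                  # separator: dash, dot, or whitespace
    (?P<exchange>[0-9]{3})  # three-digit exchange
    [-.\s]                  # separator
    (?P<line>[0-9]{4})      # four-digit line number
    """,
    re.VERBOSE,
)

m = US_PHONE.fullmatch("555-123-4567")
if m:
    print(m.group("area"), m.group("exchange"), m.group("line"))
```

The named groups also mean the calling code reads `m.group("area")` instead of `m.group(1)`.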
Regular expressions can be especially useful for validating the format of free text input. Of course they can't validate the correctness of data, just its format. And you have to keep in mind regional variations for certain types of values (phone numbers or postal codes for example). But for cases where valid input can be defined as a text pattern, regexes make quick work of the validation.
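The format-versus-correctness distinction can be sketched like this in Python. The YYYY-MM-DD date format and the helper names are my own choices for illustration:

```python
import re
from datetime import date

# Four digits, dash, two digits, dash, two digits.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def has_date_format(s: str) -> bool:
    """True if s merely looks like YYYY-MM-DD."""
    return ISO_DATE.fullmatch(s) is not None

def is_real_date(s: str) -> bool:
    """True if s has the right format and names a date that exists."""
    if not has_date_format(s):
        return False
    try:
        date.fromisoformat(s)
        return True
    except ValueError:
        return False

print(has_date_format("2023-02-31"), is_real_date("2023-02-31"))
```

February 31st passes the regex because the pattern only describes the shape of the text; deciding whether the date exists needs actual date logic.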