The reply to my post states that the pattern "*@he.com" is not a "correct regexp". Actually, it is not a regex at all but a wildcard pattern, like this one:

find . -iname ".gi*"

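The above example as a regex would be the following. It uses -iregex, because -iname accepts only wildcard patterns, and -iregex is matched against the whole path rather than the last component:

find . -iregex ".*/\.gi[^/]*"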

Questions about Regex and Wildcards

  1. How can I use only Regex instead of wildcards?
  2. Where do you really need wildcards and globbing if you can use Regex?
  3. Have Regexes evolved from wildcards or vice versa?
  4. Does the citation mean "Regex is the language, while wildcards are alphabets"?

Regular expressions

In many regular expression implementations, the period (.) is the wildcard character for a single character. (Source)

+3  A: 

Described in the man page:

-name pattern

True if the last component of the pathname being examined matches pattern. Special shell pattern matching characters ([, ], *, and ?) may be used as part of pattern. These characters may be matched explicitly by escaping them with a backslash (\).

So in other words, patterns that are usable in shell glob patterns are usable by find.
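As a rough illustration of the pattern characters the man page lists, here is a minimal Python sketch. It assumes Python's fnmatch module is a close enough stand-in for the shell patterns that find accepts, and the file names are made up for the example. One known difference: fnmatch has no backslash escape, so a literal metacharacter is written in brackets instead.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, chosen only for illustration.
names = [".gitignore", ".git", "notes.txt", "a1.log", "a2.log", "ab.log", "star*.txt"]


def glob_filter(pattern, candidates):
    """Return the candidates that match a shell-style pattern, case-sensitively."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


# '*' matches any run of characters (including none).
print(glob_filter(".gi*", names))
# '?' matches exactly one character.
print(glob_filter("a?.log", names))
# '[...]' matches one character from the given set or range.
print(glob_filter("a[0-9].log", names))
# fnmatch has no backslash escape; '[*]' matches a literal '*'.
print(glob_filter("star[*].txt", names))
```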

Man pages can generally tell you a lot. ;) Run

$ man find

for more information.

mipadi
The shell matching system is called `glob`.
dmckee
A: 

My initial question rested on a wrong premise: these patterns are wildcards, not regexes! Wildcards are handled by globbing.

Regular expressions

Note that wildcard patterns are not regular expressions, although they are a bit similar. First of all, they match filenames, rather than text, and secondly, the conventions are not the same: for example, in a regular expression '*' means zero or more copies of the preceding thing. Now that regular expressions have bracket expressions where the negation is indicated by a '^', POSIX has declared the effect of a wildcard pattern "[^...]" to be undefined.

The explanation is not 100% thorough. For example, you can easily match filenames with Regex.
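To make the difference in conventions concrete, here is a small Python sketch comparing the two on the same names. It assumes fnmatch behaves like shell wildcards for these simple cases; the helper name regex_matches is made up for the example.

```python
import re
from fnmatch import fnmatchcase


def regex_matches(pattern, name):
    """Return True if the whole name matches the regular expression."""
    return re.fullmatch(pattern, name) is not None


# In a wildcard pattern, '*' stands alone and matches any run of characters.
print(fnmatchcase("abc", "a*"))          # True

# In a regex, '*' repeats the preceding item, so 'a*' means "zero or more 'a'".
print(regex_matches(r"a*", "abc"))       # False
print(regex_matches(r"a*", "aaa"))       # True

# The regex counterpart of the wildcard pattern 'a*' is 'a.*'.
print(regex_matches(r"a.*", "abc"))      # True

# Negated sets: wildcards use '[!...]', regexes use '[^...]'.
print(fnmatchcase("b.txt", "[!a].txt"))        # True
print(regex_matches(r"[^a]\.txt", "b.txt"))    # True
```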

Masi
You can, but regexes tend to have a lot of overhead. I've found that, in general, sufficiently precise globs and brackets are more than I need in the shell. If I need to do something that actually requires regexes, it tends to be complicated enough that I whip up a quick Perl script rather than keep doing it through the shell.
Chris Lutz
Lutz: +1 Good point.
Masi
+3  A: 

I think your confusion comes from the difference between the shell-globbing wildcard (the * character) and the regular expression quantifier (also the * character). Regexes are not shell globbing. They are a lot more powerful and useful, but for everyday shell use, wildcards and shell globbing are "good enough."

  1. How can I use only Regex instead of wildcards?

Don't use the shell. Write a Perl/Python/Ruby/[your-choice-of-scripting-language-here] script to do the job for you. It'll probably be faster, since it won't have to fork so much.
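A minimal sketch of such a script in Python, assuming the goal is a regex version of find . -iname ".gi*". The function name find_by_regex and the demo file names are made up for the example; the demo runs in a temporary directory so it does not touch real files.

```python
import os
import re
import tempfile


def find_by_regex(root, pattern, ignore_case=True):
    """Yield paths under root whose last component fully matches the regex."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if compiled.fullmatch(name):
                yield os.path.join(dirpath, name)


# Demo on throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    for name in (".gitignore", ".GIT_notes", "README"):
        open(os.path.join(root, name), "w").close()
    found = sorted(os.path.basename(p) for p in find_by_regex(root, r"\.gi.*"))
    print(found)
```

To search a real tree, call find_by_regex(".", r"\.gi.*") and print each path it yields.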

  2. Where do you really need wildcards and globbing if you can use Regex?

Strictly speaking, you don't. But in most shells, you don't have regexes for matching filenames, so you have globs. Think of them as a poor man's regex.
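One way to see the "poor man's regex" relationship is that a glob can be translated mechanically into a regex. The sketch below uses Python's fnmatch.translate on the pattern from the question. The test addresses are made up, and the exact regex text that translate returns differs between Python versions, so it is printed without an expected value.

```python
import re
from fnmatch import translate

glob_pattern = "*@he.com"

# translate() returns the source of a regex equivalent to the glob.
regex_source = translate(glob_pattern)
compiled = re.compile(regex_source)

print(regex_source)  # exact text varies by Python version

# Hypothetical addresses, used only to exercise the pattern.
print(compiled.match("alice@he.com") is not None)   # True
print(compiled.match("alice@she.org") is not None)  # False
```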

  3. Have Regexes evolved from wildcards or vice versa?

Regexes came from formal language theory (Kleene's regular sets) and entered everyday use through early text editors: the early Unix editor ed had a regex feature, which was then reused in a little program called grep, which you might have heard of. I imagine wildcards have simply been a feature of the shell from early on. They can't be hard to implement, so shell writers would have added them fairly quickly and with little overhead.

Chris Lutz
History: I read in a BBC article that some neuroscientists came up with regexes before they were formulated mathematically. I can look for it if someone wants.
Masi
PS: It would be interesting to know more about the history of wildcards. The history of regexes is clear because a regex is just a CA.
Masi