I need this grep call:
grep "field3=highland" data_file
to return both results with "field3=highland" as well as "field3=chicago highland". How can I redesign the grep call to account for both scenarios?
If you mean to match the third field of the line against your string (rather than matching a literal "field3=highland"), grep is not the right tool for you. In that case consider awk:
awk '$3=="highland" { print $0 }' <input file>
for an exact match, or
awk '$3~".*highland.*" { print $0 }' <input file>
to match with a regular expression.
Note that awk assumes whitespace as the field separator, but you can use "-F <field separator>" to change it on the command line, so that
awk -F : '$1~".*oo.*" {print $0}' /etc/passwd
grabs the root line from the password file.
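For example, if the lines really do use "=" as the separator (an assumption on my part, since the question doesn't show a complete sample line), -F lets awk split on it and test just the value part:
$ awk -F = '$2 ~ /highland/ { print $0 }' << eof
> field3=highland
> field3=chicago highland
> field3=springfield
> eof
field3=highland
field3=chicago highland
$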
$ grep 'f=h\|f=c h' << eof
> f=c h
> f=h
> not
> going f= to
> match
> eof
f=c h
f=h
$
Or, if the idea is that c can be anything, perhaps something like:
$ grep 'f=.*h'
If you want to get all lines with 'field3=' followed by any characters followed by 'highland', you need:
grep 'field3=.*highland' data_file
The '.' means any character and the '*' means zero or more occurrences of the last pattern. So '.*' is effectively any string, including the empty one.
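A quick check against the two sample values from the question (plus one line that should not match):
$ grep 'field3=.*highland' << eof
> field3=highland
> field3=chicago highland
> field3=springfield
> eof
field3=highland
field3=chicago highland
$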
goe,
My advice would be to spend considerably more effort on composing your question.
You mention "grep tool (Linux)" and "SQL LIKE operator" ... in the subject ... then include a frankly unintelligible question which seems to be about matching two different variations of a sample line of input.
You're getting answers which are only guesses at what your actual question might be.
I think the question is something like:
"I have data which contains some lines like: field3=highland
and field3=other stuff highland
and I want to match all those lines (filtering out everything else)."
The simplest regular expression which might work would be:
grep "field3=.*highland
... but this would match things like "field3=highlands" and "field3=thighland" and "myfield3=...", etc. Also it would fail to match "field3 =..." (with the space between the field designator and the equal sign).
Is the "field3" supposed to be at the beginning of the line? Is the highland supposed to be anchored at the end of the line? Should "highland" only match if it's not a substring in a longer "word" (i.e. if the character before the "h" and after the "d" is non-alphabetic)?
There are a great many questions about your expected inputs and desired results ... which will have considerable effect on the sorts of regular expressions that will match or not.
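For instance, if (and this is only a guess about your data) "field3" is meant to start the line and "highland" should only match as a whole word, GNU grep's word-boundary escapes give a tighter pattern:
grep '^field3=.*\<highland\>' data_file
That version still matches "field3=highland" and "field3=chicago highland" but rejects "field3=highlands", "field3=thighland" and "myfield3=...".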
The reference to SQL LIKE expressions and its % tokens is mostly useless. For the most part a % token in an SQL LIKE expression is equivalent to the ".*" regular expression. If you have a snippet of SQL that works (over the same range of inputs) and you're trying to find a functionally equivalent regular expression ... then you should take the time to paste in the working SQL expression.
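Purely as an illustration (since no actual SQL was posted): a LIKE pattern such as '%highland%' corresponds to the regular expression '.*highland.*', so the rough grep equivalent would be
grep '.*highland.*' data_file
which is the same as plain
grep 'highland' data_file
because leading and trailing ".*" add nothing to an unanchored match.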
Also there's nothing particularly specific to grep (Linux or otherwise) in this question. It would be better tagged as a question about regular expressions.
In general there are three or four common abstractions for matching text against patterns: regular expressions (with many variants), "glob" and "wildmat" patterns (shell and MS-DOS like), and SQL LIKE expressions.
Of these, regular expressions are the most commonly used by programmers ... and they are, by far, the most complicated. They range from the oldest, simplest variations (as included in the historical UNIX ed line editors from which grep was originally excerpted), to the more powerful "extended" versions (typified by egrep or grep -E), and up to the insanely elaborate "Perl compatible regular expressions" (now widely used by other programming languages via the PCRE libraries).
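As one example of how the flavours differ, alternation needs a backslash in GNU grep's basic syntax (where it is a GNU extension) but not in the extended one; both of these match the same lines:
grep 'field3=\(highland\|chicago\)' data_file
grep -E 'field3=(highland|chicago)' data_file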
Glob patterns are far simpler. They support "shell wild cards" ... originally just ? and * (any single character, or any number of any characters, respectively). Later enhancements which are supported by modern shells and other tools include support for character classes (such as [0-9] for any digit and [a-zA-Z] for any letter, and so on). Some of these also support negated character classes.
Because glob patterns use special characters (? and *) which are similar to regular expression syntax, albeit for different purposes ... and because they use almost identical syntax for describing character classes and their complements, glob patterns are often mistaken for regular expressions. When I teach classes in systems administration I usually have to make this point so that students "unlearn" the sloppiness of terminology that's so common.
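A quick illustration of the difference (the file names here are made up):
ls report*.txt
ls | grep '^report.*\.txt$'
Both pick out names that start with "report" and end in ".txt", but in the glob the * by itself means "anything" and the dot is literal, while in the regular expression the dot is the "any character" token (so it has to be escaped) and the * only repeats whatever precedes it.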
The old MS-DOS "wildmat" or "wildcard matching" can be thought of as a variant of the original glob patterns. It only supports the ? and * meta-characters ... with mostly the same semantics as UNIX shell globbing. However, I counsel against thinking of them this way. The underlying semantics of how an MS-DOS command line handles arguments containing these patterns is sufficiently different that thinking of them as "globs" is a trap. (A command like: COPY *.TXT *.BAK is perfectly sensible under MS-DOS, while a UNIX command like: cp *.txt *.bak is wrong for almost any reasonable situation.)
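To see why, remember that the UNIX shell expands the globs before cp ever runs. In a directory containing a.txt and b.txt (and no .bak files), with the usual shell behaviour of leaving an unmatched pattern as-is, cp actually sees
cp a.txt b.txt *.bak
and fails with a "target is not a directory" error instead of renaming anything.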
Obviously, as I've described above, the SQL LIKE expression is quite similar to a UNIX glob. There are only two "special" or "meta" characters in most basic SQL LIKE implementations: % (analogous to *) and _ (analogous to ?).
Notice the weasel words here, though. I won't claim that % is the same as a glob * nor that _ is the same as a glob's ? character. There may be some corner cases (regarding how these might match at the beginnings or endings of strings, or adjacent to whitespace, etc.). There may be differences among different implementations of SQL, and there may even be some cruftier versions of the UNIX/Linux fnmatch (globbing) libraries that would make a difference if you tried to rely on such claims.