I need to select all lines containing words that start with a letter followed by zero or more letters or digits, with no special characters (basically, names that could be used for a C++ variable).

egrep  '^[a-zA-Z][a-zA-Z0-9]*'

This works fine for words such as "a" and "ab10", but it also includes words like "b.b". I understand that the * at the end of the expression is the problem. If I replace * with + (one or more), it skips words that contain only one letter, so that doesn't help.
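A small reproduction of the behaviour described above may help. This is only a sketch: the sample words and the `sample_words` helper are made up for illustration, and `grep -E` is used as the equivalent of `egrep`.

```shell
# Hypothetical sample input, one candidate word per line.
sample_words() { printf '%s\n' 'a' 'ab10' 'b.b' '1abc'; }

# Star version: prints a, ab10 and b.b.
# "b.b" is accepted because the leading "b" alone satisfies the pattern
# and nothing requires the match to extend to the end of the word.
sample_words | grep -E '^[a-zA-Z][a-zA-Z0-9]*'

# Plus version: prints only ab10.
# "b.b" is rejected, but so is the one-letter word "a".
sample_words | grep -E '^[a-zA-Z][a-zA-Z0-9]+'
```

In both cases grep prints a line as soon as any part of it matches, so the pattern has to say what is allowed to come after the word.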

EDIT: I should be more precise. I want to find lines with any number of possible words as described above. Here is an example:

int = 5;
cout << "hello";
//some comments

In that case it should print all of the lines above, as they all include at least one word that fits the described conditions, and a line does not have to begin with a letter.

+1  A: 

Assuming the line ends after the word:

'^[a-zA-Z][a-zA-Z0-9]+$|^[a-zA-Z]$'
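Under the same assumption (the word is the whole line), one possible alternative is a single pattern anchored at both ends. The sketch below uses made-up sample words and a hypothetical `one_word_lines` helper; `grep -E` stands in for `egrep`.

```shell
# Hypothetical sample input: each line holds exactly one candidate word.
one_word_lines() { printf '%s\n' 'a' 'ab10' 'b.b' 'ab.c'; }

# ^ and $ force the whole line to be one identifier-like word,
# and * (zero or more) keeps one-letter words such as "a".
# Prints a and ab10.
one_word_lines | grep -E '^[a-zA-Z][a-zA-Z0-9]*$'
```

With the end anchor in place, "b.b" and "ab.c" are rejected because the "." can no longer be left outside the match.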
reko_t
I didn't know you can use "OR" with regex. That makes it much easier. Thanks.
Mike55
A: 

You have to add something after the pattern that says what may follow the word. Either allow only whitespace after it, or just append the end-of-line anchor (AFAIR it is $).

kubal5003
+5  A: 

Your solution will look roughly like this example. In this case, the regex requires that the "word" be preceded by space or start-of-line and then followed by space or end-of-line. You will need to modify the boundary requirements (the parenthesized stuff) as needed.

'(^| )[a-zA-Z][a-zA-Z0-9]*( |$)'
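To see how the boundary groups behave, here is a rough check against the three lines from the question. The two extra lines ("b.b" and "5;"), the `question_lines` helper, and the `[[:space:]]` variant are assumptions added for illustration, and `grep -E` is used as the equivalent of `egrep`.

```shell
# The three lines from the question, plus two hypothetical lines
# that contain no space-delimited word.
question_lines() {
  printf '%s\n' 'int = 5;' 'cout << "hello";' '//some comments' 'b.b' '5;'
}

# Space-delimited boundaries, as in the pattern above.
# Prints the first three lines; "b.b" and "5;" are rejected.
question_lines | grep -E '(^| )[a-zA-Z][a-zA-Z0-9]*( |$)'

# Assumed variant: [[:space:]] also accepts tabs as delimiters.
# Prints the same three lines for this input.
question_lines | grep -E '(^|[[:space:]])[a-zA-Z][a-zA-Z0-9]*([[:space:]]|$)'
```

The matching words here are "int", "cout" and "comments"; "hello" is not counted because it is surrounded by quotes rather than spaces, which is the kind of boundary you may want to adjust.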
FM
This is exactly what I was looking for. Many thanks!!!
Mike55
A: 

Your problem lies in the ^ and $ anchors, which match the start and end of the line respectively. You want the line to match if it contains a word anywhere, and getting rid of the anchors does that:

egrep  '[a-zA-Z][a-zA-Z0-9]+'

Note that the + matches words of length 2 and higher; a * in that place would match single characters too.
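A quick way to compare the two quantifiers without anchors is sketched below. The sample lines and the `mixed_lines` helper are hypothetical, and `grep -E` is used as the equivalent of `egrep`.

```shell
# Hypothetical sample input mixing code-like lines and bare tokens.
mixed_lines() { printf '%s\n' 'int = 5;' 'b.b' '1abc' '123'; }

# + needs a letter plus at least one more letter or digit:
# prints "int = 5;" and "1abc".
mixed_lines | grep -E '[a-zA-Z][a-zA-Z0-9]+'

# * also accepts a single letter:
# prints "int = 5;", "b.b" and "1abc".
mixed_lines | grep -E '[a-zA-Z][a-zA-Z0-9]*'
```

Keep in mind that without anchors or explicit boundaries the pattern does not look at what surrounds the match, so "1abc" is accepted on the strength of "abc" even though it starts with a digit.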

rsp