Hi,
I have a word list, but it contains some words like East's.
I need to find the words in the list that contain only a-z and A-Z. How can I do that?
I am using grep. What should I put after grep?
grep *** myfile.txt
Thanks!
The regexp you want is ^[a-zA-Z]+$
For grep:
vinko@parrot:~$ more a.txt
Hi
Hi Dude
Hi's
vinko@parrot:~$ egrep ^[a-zA-Z]+$ a.txt
Hi
In pseudocode:
regexp = "^[a-zA-Z]+$";
foreach word in list
    if regexp.matches(word)
        do_something_with(word)
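The pseudocode above could be sketched as a shell loop, assuming a POSIX sh and a grep that supports -E. The sample words and the variable name `matched` are made up for illustration; appending to `matched` stands in for do_something_with(word).

```shell
# Collect the words that consist only of ASCII letters.
# The sample words and the variable `matched` are illustrative assumptions.
matched=""
while IFS= read -r word; do
    # grep -q prints nothing; its exit status says whether the line matched.
    if printf '%s\n' "$word" | grep -Eq '^[a-zA-Z]+$'; then
        matched="$matched$word "    # stand-in for do_something_with(word)
    fi
done <<'EOF'
Hi
Hi Dude
Hi's
East
EOF
echo "$matched"
```

For a plain filter, a single grep over the whole file is simpler and faster; the loop only pays off when each word needs extra processing.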
[a-z]+
using the case insensitive option, or
[A-Za-z]+
without the case insensitive option.
Post the data and the language for more help.
for grep
egrep -i '^[a-z]+$' wordlist.dat
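I can't remember which metacharacters need escaping. With egrep none of them do. If you use plain grep instead, only the plus needs a backslash in GNU grep: '^[a-z]\+$'. Leave the brackets unescaped, because \[ matches a literal bracket.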
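On the escaping question, here is a minimal sketch of the same match in extended and basic syntax. It assumes a POSIX grep and mktemp; the file name wordlist.dat follows the answer above, and its contents are invented.

```shell
# Sample data; the contents are made up for illustration.
dir=$(mktemp -d)
printf '%s\n' east "East's" West 'a+' > "$dir/wordlist.dat"

# Extended syntax (-E, same as egrep): + is a quantifier, no backslash needed.
ere=$(grep -Ei '^[a-z]+$' "$dir/wordlist.dat")

# Basic syntax: a bare + is a literal plus sign. A portable way to say
# "one or more" is to repeat the bracket expression and follow it with *.
bre=$(grep -i '^[a-z][a-z]*$' "$dir/wordlist.dat")

echo "$ere"
echo "$bre"
rm -r "$dir"
```

Both commands should print the same two words, east and West.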
Use fgrep
if you want to match against a word list. The -f option reads the patterns from a file:
fgrep -f word_list_file myfile.txt
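A hedged sketch of matching against a word list, using grep -F (the modern spelling of fgrep) with the POSIX options -f and -x. The file names follow the answer above; the contents are invented, and mktemp is assumed to be available.

```shell
# Sample files; the contents are made up for illustration.
dir=$(mktemp -d)
printf '%s\n' East West > "$dir/word_list_file"
printf '%s\n' East "East's" Eastern North West > "$dir/myfile.txt"

# -F: patterns are fixed strings; -f: read the patterns from a file.
# Without -x a pattern may match anywhere in the line.
loose=$(grep -F -f "$dir/word_list_file" "$dir/myfile.txt")

# -x: a pattern must match the whole line.
exact=$(grep -F -x -f "$dir/word_list_file" "$dir/myfile.txt")

echo "$loose"
echo "$exact"
rm -r "$dir"
```

Without -x, East also selects East's and Eastern, so -x is probably what you want when the file holds one word per line.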
The grep syntax is:
grep '^[[:alpha:]]\+$' input.txt
Documentation for grep's pattern syntax is here.
Or filter out all words that contain funnies
grep -v '[^a-zA-Z]'
Is there a prize for the shortest answer? :)
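One difference between the inverted filter and the anchored pattern is worth checking on your own data: an empty line contains no forbidden character, so grep -v keeps it. A small sketch, with an invented a.txt and assuming mktemp is available:

```shell
# Sample data with one empty line; the contents are made up for illustration.
dir=$(mktemp -d)
printf '%s\n' Hi '' "Hi's" Dude > "$dir/a.txt"

# Drop every line that contains a character outside a-z / A-Z.
# The empty line has no such character, so it survives.
inverted=$(grep -v '[^a-zA-Z]' "$dir/a.txt")

# Anchored positive match: needs at least one letter, so the empty line goes.
anchored=$(grep -E '^[a-zA-Z]+$' "$dir/a.txt")

printf '%s\n' "$inverted" | wc -l
printf '%s\n' "$anchored" | wc -l
rm -r "$dir"
```

The first count should be 3 (Hi, the empty line, Dude) and the second 2.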
Note that there are portability differences between [[:alpha:]] and [A-Za-z]. [A-Za-z] works in more versions of grep, but [[:alpha:]] takes account of wide character environments and internationalization (accented characters for example when they are included in the locale).
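To see the locale dependence, you can force the C locale, where only the ASCII letters count as [[:alpha:]]. A minimal sketch, assuming a reasonably recent grep and mktemp; words.txt is an invented file, and the second word is "café" written as UTF-8 bytes.

```shell
# Sample data: "cafe" and "café" (UTF-8 bytes written as octal escapes).
dir=$(mktemp -d)
printf 'cafe\ncaf\303\251\n' > "$dir/words.txt"

# In the C locale the bytes of the accented letter are not alphabetic,
# so only the plain ASCII word matches.
c_locale=$(LC_ALL=C grep -E '^[[:alpha:]]+$' "$dir/words.txt")

echo "$c_locale"
rm -r "$dir"
```

In a UTF-8 locale that is installed on your system, the same command without LC_ALL=C would normally accept the accented word as well. Whether you want that depends on what the word list is for.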