I have a collection of Unicode text files (exported from regedit) and I'd like to pull out all the lines that contain a certain text.

I've tried Grep for Windows and findstr, but neither seems able to handle the Unicode encoding. My results are empty, but when I use the -v option (show non-matching lines), the output shows a NUL between each character.

Are there any free options to perform a simple grep on Unicode files in Windows?
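
The NUL between each character suggests the files are UTF-16, where every ASCII character takes two bytes. A minimal sketch in Python of why a byte-oriented search comes back empty, assuming the export is UTF-16 little-endian (an assumption; the question does not state the exact encoding):

```python
# Illustrative only: why a byte-oriented grep misses text in UTF-16 files.
text = "Something else"

# Assumed encoding: UTF-16 little-endian, two bytes per ASCII character.
utf16_bytes = text.encode("utf-16-le")
print(utf16_bytes[:8])  # b'S\x00o\x00m\x00e\x00'

# The plain ASCII byte sequence never occurs, because NULs are interleaved.
print(b"Something" in utf16_bytes)  # False

# Decoding to text first makes the same search succeed.
print("Something" in utf16_bytes.decode("utf-16-le"))  # True
```

Any tool that decodes the file before matching avoids the problem.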

A: 

Is Cygwin an option for you? Maybe its built-in grep behaves better than the one you tried.

regards

Atmocreations
+1  A: 

Check out BareGrep. I think it will do what you want.

Muad'Dib
Pretty cool program, but it doesn't seem to work with Unicode text -- am I missing something?
jacobsee
I personally have not tried it with Unicode, but their sales propaganda says it will. They could, of course, be lying (and probably are).
Muad'Dib
+2  A: 

Well, while findstr can't read Unicode files directly, type can, and findstr handles the piped output of type without problems.

So all you need to do is

type myfile.txt | findstr "I'm searching for this"

> type uc-test.txt
Unicode test. äöüß
Another line
Something else
> findstr "Something" uc-test.txt

> findstr /v "Something" uc-test.txt
 ■U n i c o d e   t e s t .   õ ÷ ³ ▀
 A n o t h e r   l i n e
 S o m e t h i n g   e l s e
> type uc-test.txt | findstr "Another"
Another line
Joey
I've had no problem with findstr and Unicode; it seems to work fine. I should add that you can search with regular expressions by passing the /r switch. Like grep, it also has options to ignore case, list matching files only, and so on.
Chris J
This works for a single file -- I'm still looking for a grep replacement so that I can pick out a single line from each of many files, each in its own subdirectory.
jacobsee
You can easily combine this with `for /r` to walk a directory tree recursively.
Joey
Thank you, I did get this working: `FOR /R %%D IN (*.txt) do type "%%D" | findstr /c:"Search text" >> outFile.txt` (Now I'd love to figure out a way to prefix each line with the name and/or timestamp of the file it was in, similar to the default behavior of grep.)
jacobsee
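
For the filename and timestamp prefix, one option is a short script instead of the batch loop. The following is only a sketch, assuming Python 3 is installed and the files are UTF-16 with a byte order mark; `grep_tree`, `main`, and their parameters are hypothetical names chosen for illustration:

```python
import time
from pathlib import Path


def grep_tree(root, needle, pattern="*.txt", encoding="utf-16"):
    """Yield (path, modified, line) for every line containing `needle`.

    Walks `root` recursively. The default encoding is an assumption:
    UTF-16 with a byte order mark.
    """
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        # Format the file's modification time for the output prefix.
        modified = time.strftime(
            "%Y-%m-%d %H:%M:%S", time.localtime(path.stat().st_mtime)
        )
        try:
            with open(path, encoding=encoding) as handle:
                for line in handle:
                    if needle in line:
                        yield path, modified, line.rstrip("\n")
        except UnicodeError:
            # The file is not in the assumed encoding; skip the rest of it.
            continue


def main(root=".", needle="Search text"):
    # Print each hit as "path [timestamp]: line".
    for path, modified, line in grep_tree(root, needle):
        print(f"{path} [{modified}]: {line}")
```

Calling `main()` from the top of the directory tree prints one prefixed line per match; redirect the output to a file outside that tree so the results are not searched on the next run.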
Emacs 23 has Unicode support and a nice search system.
zdav
A: 

perl -CSD -ne 'print if m{\Qyour text here\E}' file.txt

wrang-wrang
+1  A: 

Definitely go with Cygwin (using its X server); the latest version supports UTF-8. At my last gig, I was doing a lot of work with CJK characters. Using Cygwin's X server, you can search for any characters and display any characters for which you have a fixed-width font. Also check out od and xxd, which make it easy to enter your searches using hex characters, e.g.:

$ echo '?' | grep $(echo '3f' | xxd -p -r)

andersonbd1
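
The same hex-to-search-string idea can be sketched in Python for anyone without xxd at hand. This is an illustration only; `needle_from_hex` and `matching_lines` are hypothetical helper names, and the default encoding of UTF-8 is an assumption:

```python
def needle_from_hex(hex_digits, encoding="utf-8"):
    """Turn hex digits (e.g. '3f') into the text they encode."""
    return bytes.fromhex(hex_digits).decode(encoding)


def matching_lines(lines, needle):
    """Return the lines that contain `needle`, in their original order."""
    return [line for line in lines if needle in line]


# '3f' is the byte for '?', as in the xxd example.
print(matching_lines(["what?", "nothing here"], needle_from_hex("3f")))
```

Characters that are hard to type can be given by their encoded bytes, for example `needle_from_hex("e4b8ad")` for the UTF-8 encoding of 中.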
+1  A: 

I have not used Windows for years, but I know two alternatives to grep which are written in an interpreted language and therefore should run on any platform:

Both are command-line tools, but I assume you already have a solution for this if you have used Grep for Windows.

Have a look at them. I am sorry I can't help a fellow grepper better than this.

dalloliogm
+1  A: 

Just ran across grepWin, which works perfectly for what I want here. Wish I had found it earlier!

jacobsee