
I am frustrated that grep fails to find a word like "hello" in my UTF-16 documents.

Can anyone recommend a version of grep that attempts to guess the file encoding and then properly handle it?

Thanks!

+1  A: 

Perl has a much more powerful regex syntax than grep, and it supports UTF-8 and UTF-16. I'm not sure how good it is at guessing the encoding, but if you tell it which encoding to use, it can read these files without any issues and run regexes over them. You'll have to write yourself a tiny Perl program for that (your own micro-grep implementation in Perl, so to speak), but that isn't too hard. Perl exists for all major operating systems.
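
A minimal sketch of such a micro-grep might look like this (the script name is made up, and it assumes your files are UTF-16 with a byte-order mark; adjust the encoding layer if yours differ):

 #!/usr/bin/env perl
 # minigrep.pl -- hypothetical name for a tiny grep-like script for UTF-16 files
 use strict;
 use warnings;
 
 die "usage: $0 PATTERN FILE...\n" if @ARGV < 2;
 my ($pattern, @files) = @ARGV;
 my $re = qr/$pattern/;
 
 # Print matches as UTF-8 so wide characters don't trigger warnings on output.
 binmode STDOUT, ':encoding(UTF-8)';
 
 for my $file (@files) {
     # Assumes the file starts with a BOM; the UTF-16 layer detects the byte
     # order from it. Use :encoding(UTF-16LE) or :encoding(UTF-16BE) explicitly
     # if your files have no BOM.
     open my $fh, '<:encoding(UTF-16)', $file
         or do { warn "$0: cannot open $file: $!\n"; next };
     while (my $line = <$fh>) {
         print "$file:$line" if $line =~ $re;
     }
     close $fh;
 }

You'd run it as, say, perl minigrep.pl hello *.txt (all of these names are just illustrative).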

Mecki
There are even a few examples of very basic grep replacements written in Perl throughout the Perldoc website. I believe they're generally about 5 or 6 lines, though they'd be longer if you wanted to add any sort of sophisticated command-line parsing.
Chris Lutz
+1  A: 

ack as a Perl-based grep replacement?

You'll definitely want to check out ack.

It supports Unicode encodings, and is basically grep, but better.

try a matching Unicode locale with grep

If you are on Linux, Unix, etc., you may want to change your LANG environment variable to an encoding that matches your documents.

Check your locale first. Here is what mine is set to by default on my MacBook Pro:

 $ locale 
 LANG="en_US.UTF-8"
 LC_COLLATE="en_US.UTF-8"
 LC_CTYPE="en_US.UTF-8"
 LC_MESSAGES="en_US.UTF-8"
 LC_MONETARY="en_US.UTF-8"
 LC_NUMERIC="en_US.UTF-8"
 LC_TIME="en_US.UTF-8" 
 LC_ALL=

For example, under bash:

$ LANG="foo" grep 'gotta be found now' file.name

Or something a little more permanent (be careful with this):

$ export LANG="foo"
$ grep 'bar' mitz.vah
popcnt