ansaurus

Question

C# regular expressions - matching whole words?

Answer 1

+2 A:

It would be helpful to see a sample of the source text, but maybe this helps:

var doc = @"asdfsdafjkj;lkjsadf asddf jsadf asdfj;lksdajf
sdafjkl;sjdfaas  sadfj;lksadf sadf jsdaf jf sda sdaf asdf sad
jasfd sdf sadf sadf sdajlk;asdf
this_file_name asdfsadf asdf asdf asdf 
asdf sadf asdfj asdf sdaf sadfsadf
sadf asdf this_file_name asdf asdf ";

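// Finds every occurrence of the name, ignoring case.
// Note: without word boundaries this also matches inside longer words.
var reg = new Regex("this_file_name", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var matches = reg.Matches(doc);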
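
Since the question is about whole words, here is a minimal sketch of one way to restrict the match, assuming the name starts and ends with a word character (letter, digit, or underscore). The class and method names (`WholeWordDemo`, `CountWholeWord`) are made up for illustration; `\b` and `Regex.Escape` are standard .NET regex features.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Counts occurrences of `word` that are not part of a longer word.
    public static int CountWholeWord(string text, string word)
    {
        // Regex.Escape protects against metacharacters in the name (e.g. '.').
        // \b asserts a boundary between a word character and a non-word character.
        var pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        var doc = "this_file_name my_this_file_name this_file_name2 THIS_FILE_NAME";
        // Only the first and last tokens are whole-word matches.
        Console.WriteLine(CountWholeWord(doc, "this_file_name")); // prints 2
    }
}
```

Note that `\b` treats the underscore as a word character, so `my_this_file_name` is not counted. If a name can begin or end with punctuation, `\b` will not behave as expected there and a lookaround-based boundary may be needed instead.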

bendewey 2009-02-07 02:16:42

The Multiline modifier is not needed.

Alan Moore 2009-03-24 11:22:16

@Alan M, why not?

bendewey 2009-03-24 18:13:11

As Alan pointed out, `RegexOptions.Multiline` is not needed. Read its documentation. It only makes a difference if you’re using `^` and/or `$`.
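
A small sketch that illustrates the point, using a made-up three-line sample string and a hypothetical helper `Count`: a literal pattern gives the same result with or without the option, while an anchored pattern does not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Returns how many times `pattern` matches in `text` under the given options.
    public static int Count(string text, string pattern, RegexOptions options)
    {
        return Regex.Matches(text, pattern, options).Count;
    }

    public static void Main()
    {
        var text = "foo bar\nfoo baz\nqux foo";

        // A literal pattern is unaffected by Multiline.
        Console.WriteLine(Count(text, "foo", RegexOptions.None));       // prints 3
        Console.WriteLine(Count(text, "foo", RegexOptions.Multiline));  // prints 3

        // With ^, Multiline lets the anchor match at the start of every line.
        Console.WriteLine(Count(text, "^foo", RegexOptions.None));      // prints 1
        Console.WriteLine(Count(text, "^foo", RegexOptions.Multiline)); // prints 2
    }
}
```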

Timwi 2010-10-13 14:46:36

Answer 2

A:

If I understand your problem correctly, I think a regular expression is the wrong tool for the job. I'll assume your file names are separated with some kind of delimiter (such as commas or newlines).

If this is the case, use String.Split to put all the file names into an array, sort the array, then run a binary search against the sorted array for each item in the "collection" you mentioned. Each lookup then costs O(log n) comparisons instead of a scan of the whole text, so I'm pretty sure this is an efficient way to perform the task.
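
The split, sort, and search steps could be sketched like this, assuming the names are separated by commas or line breaks and should be compared case-sensitively. `SortedLookupDemo`, `BuildIndex`, and `Contains` are illustrative names, not part of any library.

```csharp
using System;

public static class SortedLookupDemo
{
    // Splits the text on the assumed delimiters and returns the names sorted ordinally.
    public static string[] BuildIndex(string fileContents)
    {
        var names = fileContents.Split(
            new[] { ',', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        Array.Sort(names, StringComparer.Ordinal);
        return names;
    }

    // Binary search must use the same comparer that was used for sorting.
    public static bool Contains(string[] sortedNames, string name)
    {
        return Array.BinarySearch(sortedNames, name, StringComparer.Ordinal) >= 0;
    }

    public static void Main()
    {
        var index = BuildIndex("b.txt,a.txt\nthis_file_name,c.txt");
        Console.WriteLine(Contains(index, "this_file_name")); // prints True
        Console.WriteLine(Contains(index, "file_name"));      // prints False
    }
}
```

Because whole entries are compared, a partial name never matches. If only membership tests are needed, loading the names into a `HashSet<string>` is another option worth measuring.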

When you say "LARGE" text files, think about their size relative to the machines this program will be running on. A 1 MB text file may seem large, but it will easily fit into the memory of a machine with 2 GB RAM. If the file is considerably larger compared to the memory of your client machines, read the file in chunks instead of loading it all at once. This is called buffering.
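
One way to avoid holding the whole file in memory, sketched under the assumption that the file is line-oriented: `File.ReadLines` yields lines lazily, so only one line is in memory at a time. The class name, method name, and command-line path argument are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class StreamingSearchDemo
{
    // Counts whole-word occurrences of `word` across a sequence of lines.
    public static int CountInLines(IEnumerable<string> lines, string word)
    {
        var regex = new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
        var total = 0;
        foreach (var line in lines)
        {
            total += regex.Matches(line).Count;
        }
        return total;
    }

    public static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("usage: StreamingSearchDemo <path-to-text-file>");
            return;
        }
        // File.ReadLines enumerates the file lazily instead of reading it all up front.
        Console.WriteLine(CountInLines(File.ReadLines(args[0]), "this_file_name"));
    }
}
```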

James Jones 2009-02-07 03:26:34

Answer 3

+1 A:

Perhaps break your document into tokens by splitting on spaces or non-word characters first?

After that, I think a regex that might work for you would look something like this:

Regex r = new Regex(@"(\w+)"); // \w already includes the underscore
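
A possible sketch of the tokenizing idea: collect every run of word characters into a set, then test names against it. `TokenDemo` and `Tokenize` are made-up names, and the case-insensitive comparer is an assumption about the desired matching.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // Returns the distinct word tokens in `text`, compared case-insensitively.
    public static HashSet<string> Tokenize(string text)
    {
        var tokens = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        // \w+ matches runs of letters, digits, and underscores.
        foreach (Match m in Regex.Matches(text, @"\w+"))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        var tokens = Tokenize("asdf; this_file_name asdf sadf");
        Console.WriteLine(tokens.Contains("THIS_FILE_NAME")); // prints True
        Console.WriteLine(tokens.Contains("file"));           // prints False
    }
}
```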

Scott Hoffman 2009-02-07 03:29:32
