I have a huge file, and I want to blow away everything in the file except for what matches my regex. I know I can get matches and just extract those, but I want to keep my file and get rid of everything else.

Here's my regex:

"Id":\d+

How do I say "Match everything except "Id":\d+"? Something along the lines of

!("Id":\d+) (pseudo-regex)?

I want to use it with a Regex Replace function. In English, I want to say:

Get all text that isn't "Id":\d+ and replace it with an empty string.

+1  A: 

Sorry, but I totally don't get what your problem is. Shouldn't it be easy to grep the matches into a new file?

You wrote:

Get all text that isn't "Id":\d+ and replace it with an empty string.

A logical equivalent would be:

Get all text that matches "Id":\d+ and place it in a new file. Replace the old file with the new one.
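
For illustration, here is a rough Java sketch of that extract-and-replace approach (Java only because another answer here uses it). The names KeepMatches, extract and keepOnlyMatches are made up for this example, and it assumes the file is UTF-8 text. It reads the file line by line, writes each match on its own line to a temporary file, and then moves that file over the original:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepMatches {
    // The question's pattern: a quoted "Id" key, a colon, then digits.
    static final Pattern ID = Pattern.compile("\"Id\":\\d+");

    // Collect every match in the text, in order of appearance.
    static List<String> extract(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = ID.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Hypothetical helper: stream the file, keep only the matches,
    // then swap the temporary file in place of the original.
    static void keepOnlyMatches(Path file) throws IOException {
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "ids", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String match : extract(line)) {
                    out.write(match);
                    out.newLine();
                }
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Putting one match per line is just a choice made for the sketch; if the matches should stay on their original lines, join them with a separator instead.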

splash
Micah mentions in his question that he doesn't want to do that
Rohith
But he said "I want to blow away everything in the file except for what matches my regex", so I understand that he wants all lines that match his regex. I find it a little confusing.
splash
+1  A: 

Well, the opposite of \d is \D in Perl-style regexes. Does .NET have something similar?
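
As a quick illustration (in Java, since its regex syntax also has \D; the class and method names are invented for this sketch), note that \D negates a single character class, not a whole pattern. Replacing every \D therefore strips the "Id:" prefix as well and keeps digits from unrelated text:

```java
class NegatedClassDemo {
    // \D matches any single non-digit character, so every such
    // character is removed, including the letters and colon in "Id:".
    static String stripNonDigits(String s) {
        return s.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        // The stray "1" survives and runs together with the Id's digits.
        System.out.println(stripNonDigits("a1 Id:22 b")); // prints 122
    }
}
```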

bemace
Yes, .NET does support it.
Benjamin Anderson
A: 

I haven't used .NET before, but the following works in Java:

System.out.println("abcd Id:12351abcdf".replaceAll(".*(Id:\\d+).*","$1"));

This produces the output:

Id:12351

Strictly speaking, this doesn't match "everything except Id:\d+", but it does the job.
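
A small sketch of how this replacement behaves on other inputs (GreedyDemo and keepId are names made up for the example). Because the leading .* is greedy, only the last Id on a line survives, and a line with no Id at all is left unchanged rather than removed:

```java
class GreedyDemo {
    // Same replacement as above, wrapped in a helper for testing.
    static String keepId(String s) {
        return s.replaceAll(".*(Id:\\d+).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepId("abcd Id:12351abcdf")); // prints Id:12351
        System.out.println(keepId("Id:1 x Id:2 y"));      // prints Id:2
        System.out.println(keepId("no ids here"));        // prints no ids here
    }
}
```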

Hemang
Try and see what happens when you have two occurrences of `Id:234` in your string...
Tim Pietzcker
+1  A: 

Try this:

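using System.IO;
using System.Text.RegularExpressions;

string path = @"c:\temp.txt"; // your file here
// Group 1 captures each Id:Number (plus one optional trailing space);
// the .+ alternative consumes any text that contains no further Id.
string pattern = @".*?(Id:\d+\s?).*?|.+";
Regex rx = new Regex(pattern);

// Read everything first, because CreateText truncates the same file.
var lines = File.ReadAllLines(path);
using (var writer = File.CreateText(path))
{
    foreach (string line in lines)
    {
        // Replace each match with group 1, which is empty when only .+ matched.
        string result = rx.Replace(line, "$1");
        if (result == "")
            continue; // skip lines that contained no Id

        writer.WriteLine(result);
    }
}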

The pattern will preserve spaces between multiple Id:Number occurrences on the same line. If you only have one Id per line you can remove the \s? from the pattern. File.CreateText will open and overwrite your existing file. If a replacement results in an empty string it will be skipped over. Otherwise the result will be written to the file.

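The first alternative of the pattern matches each Id:Number occurrence together with any text in front of it. The second alternative, .+, matches whatever is left: lines where Id:Number does not appear, and any text after the last Id on a line. The replacement uses $1 to replace each match with the contents of the first group, which is the actual Id part: (Id:\d+\s?). When only .+ matched, that group is empty, so the text is removed.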
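
To see the pattern at work on a single line without touching a file, here is a small Java sketch using the same pattern and replacement (AlternationDemo and keepIds are invented names). Java's replaceAll also substitutes nothing for a group that did not take part in the match, so it should behave like the C# version for these inputs:

```java
class AlternationDemo {
    // Same pattern and replacement as above, applied to one line.
    static String keepIds(String line) {
        return line.replaceAll(".*?(Id:\\d+\\s?).*?|.+", "$1");
    }

    public static void main(String[] args) {
        // Both Ids are kept, each with the space that followed it.
        System.out.println("[" + keepIds("abc Id:12 def Id:34 xyz") + "]"); // prints [Id:12 Id:34 ]
        // A line without any Id collapses to an empty string.
        System.out.println("[" + keepIds("nothing here") + "]");            // prints []
    }
}
```

Note the trailing space left after the last Id; trimming the result would remove it if that matters.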

Ahmad Mageed