views: 226
answers: 2
Hi,

I have a file which represents items; each item has one line containing its GUID, followed by 5 lines describing the item.

Example:

Line 1: Guid=8e2803d1-444a-4893-a23d-d3b4ba51baee name= line1 
Line 2: Item details = bla bla   
.  
.  
Line 7: Guid=79e5e39d-0c17-42aa-a7c4-c5fa9bfe7309 name= line7 
Line 8: Item details = bla bla    
.  
. 

I am trying to access this file first to get the GUIDs of the items that meet the provided criteria, using LINQ, e.g. where line.Contains("line1"). That gives me the whole line, from which I extract the GUID. I then want to pass this GUID to another function which accesses the file "again", finds that line (where line.Contains("line1") && line.Contains("8e2803d1-444a-4893-a23d-d3b4ba51baee")) and reads the next 5 lines starting from that line.
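
Roughly, the first pass I have in mind looks something like this ("path" is just a placeholder, and the substring handling assumes the exact "Guid=... name=" layout shown above):

    var guids = File.ReadAllLines(path)
        .Where(line => line.Contains("line1"))
        .Select(line =>
        {
            // Assumes the line looks like "... Guid=<guid> name= ...";
            // a GUID in this format is 36 characters long.
            var start = line.IndexOf("Guid=") + "Guid=".Length;
            return line.Substring(start, 36);
        });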

Is there any efficient way to do so?

+3  A: 

I don't think it really makes sense to use LINQ entirely, given the requirements of what you need to do and given that the index of the line in the array is fairly integral. I would also recommend doing everything in one pass - opening the file multiple times won't be as efficient as just reading everything once and processing it immediately. As long as the file is structured as well as you describe, this won't be terribly difficult:

    private Dictionary<Guid, String[]> GetStuff()
    {
        var lines = File.ReadAllLines("foo.txt");
        var result = new Dictionary<Guid, String[]>();

        // Each item occupies 6 lines: the GUID line plus 5 detail lines.
        for (var index = 0; index < lines.Length; index += 6)
        {
            // Pull the GUID text out of the "... Guid=<guid> name= ..." line.
            var guidStart = lines[index].IndexOf("Guid=") + "Guid=".Length;

            var item = new
            {
                Guid = new Guid(lines[index].Substring(guidStart, 36)),
                Description = lines.Skip(index + 1).Take(5).ToArray()
            };
            result.Add(item.Guid, item.Description);
        }

        return result;
    }
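
With the dictionary returned, looking up the 5 detail lines for a known GUID is then just an indexer lookup, something like:

    var items = GetStuff();
    var details = items[new Guid("8e2803d1-444a-4893-a23d-d3b4ba51baee")];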
Daniel Schaffer
Thanks Daniel. I am a bit concerned about big files in this case, I mean reading the file from beginning to end; the files I am dealing with sometimes go up to 20MB. Are there any performance concerns with this approach?
OneDeveloper
The other thing is that each line with a GUID will have a "type", and each type will have a different set of actions. So I will find the type from that line, compare it to a list of types, and then proceed... I don't feel good doing that for every line in the file!
OneDeveloper
You should post a snippet of the raw data file if you need a solution more specific to your situation.
xcud
If you're concerned about performance, why do you think opening and accessing the file multiple times would *help* that?
Daniel Schaffer
+1  A: 

I tried a couple of different ways to do this with LINQ, but nothing allowed me to do a single scan of the file. For the scenario you're talking about, I would drop down to the enumerator level and use GetEnumerator, like this:

public IEnumerable<LogData> GetLogData(string filename)
{
    // Matches the GUID line, e.g. "Line 1: Guid=8e2803d1-... name= line1".
    var line1Regex = @"Line\s(\d+):\sGuid=([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\sname=\s(\w*)";
    int detailLines = 5;

    var lines = File.ReadAllLines(filename).GetEnumerator();
    while (lines.MoveNext())
    {
        var line = (string)lines.Current;
        var match = Regex.Match(line, line1Regex);
        if (!match.Success)
            continue;

        // Read the detail lines that follow the GUID line.
        var details = new string[detailLines];
        for (int i = 0; i < detailLines && lines.MoveNext(); i++)
        {
            details[i] = (string)lines.Current;
        }

        yield return new LogData
        {
            Id = new Guid(match.Groups[2].Value),
            Name = match.Groups[3].Value,
            LineNumber = int.Parse(match.Groups[1].Value),
            Details = details
        };
    }
}
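
LogData isn't defined in the snippet above; a minimal class with the properties used there, plus a usage sketch that filters by name in a single scan, could look like this:

public class LogData
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public int LineNumber { get; set; }
    public string[] Details { get; set; }
}

// Usage (from wherever GetLogData lives): scan the file once and keep
// only the items that match the criterion, then work with those.
var matches = GetLogData("foo.txt").Where(d => d.Name == "line1");
foreach (var d in matches)
{
    Console.WriteLine("{0}: {1} detail lines", d.Id, d.Details.Length);
}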
bendewey