I have this text file that contains approximately 22 000 lines, with each line looking like this:

12A4 (Text)

So the format is a 4-character hexadecimal value followed by text in parentheses. Sometimes the parentheses contain more than one value, separated by a comma: A34d (Text, Optional)

Is there any efficient way to search for the Hex and then return the first text in the parentheses? Would it be much more effective if I stored this data in SQLite?

+1  A: 
var lines = ...;

var item = (from line in lines
            where line.StartsWith("a34d", StringComparison.OrdinalIgnoreCase)
            select line).FirstOrDefault();

// item is null when no line starts with the key, so check before splitting
var firstText = item == null ? null : item.Split('(', ',', ')')[1];

This works. If you want to strip leading and trailing whitespace from firstText, add a .Trim() at the end.

For splitting a text into several lines, see my two answers here. http://stackoverflow.com/questions/3545833/how-can-i-convert-a-string-with-newlines-in-it-to-separate-lines/3545853#3545853

lasseespeholt
+1  A: 

Use a StreamReader and call ReadLine in a loop. For each line, check whether the first characters equal the value you are searching for, and if they do, extract the text with:

string yourresult = thereadline.Split(new string[]{" (", ",", ")"}, StringSplitOptions.RemoveEmptyEntries)[1];
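
A minimal sketch of the read-and-check loop described above, assuming one entry per line. The class name HexScan and the method name Find are made up for illustration, and ")" is included among the separators so the closing parenthesis does not end up in the result.

```csharp
using System;
using System.IO;

static class HexScan
{
    // Scans line by line and returns the first text for the given hex key,
    // or null when no line starts with that key.
    public static string Find(TextReader reader, string hex)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (!line.StartsWith(hex, StringComparison.OrdinalIgnoreCase))
                continue;

            // " (" separates the key from the text; "," and ")" end the first value.
            string[] parts = line.Split(new[] { " (", ",", ")" },
                                        StringSplitOptions.RemoveEmptyEntries);
            return parts.Length > 1 ? parts[1].Trim() : null;
        }
        return null;
    }
}
```

To use it on a file, wrap a StreamReader in a using block and pass it in, e.g. Find(new StreamReader("codes.txt"), "a34d"), where codes.txt is a placeholder file name.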
Wildhorn
+5  A: 

Example using substring and split.

        string value = "A34d (Text, Optional)";

        string hex = value.Substring(0, 4);
        string text = value.Split('(')[1];

        if (text.Contains(','))
            text = text.Substring(0, text.IndexOf(','));
        else
            text = text.Substring(0, text.Length-1);

For searching use a Dictionary.

Chris Persichetti
Marked this as answer. Pretty easy to understand, no regex :), and since my file is so small (282kb), StreamReader can easily iterate through the entire file in no time at all. I'll probably explore some different options later, especially with loading data into SQLite.
DMan
Bear in mind that if you need to query the data several times, you'll be reading the same information over and over. Just because it is easy doesn't mean you have to do it that way. Use a dictionary, keep that information in RAM, and access it whenever you want during program execution. See my answer for more details.
OscarRyz
@Oscar- I agree with you. I just mean his splitting code is easy to understand, which was the main thing I was looking for.
DMan
+1  A: 

If you want to search for the Hex value more than once, you definitely want to store this in a lookup table of some sort.

This could be as simple as a Dictionary<string, string> that you populate with the contents of your file on startup:

  • read each line (StreamReader.ReadLine)
  • hexString = substring of first 4 characters in line
  • store the rest of the string

To find the first part, create a function that retrieves "A" from "(A, B, C, ...)"

If you can rule out commas "," in "A", you are in luck: Remove the parentheses, split on "," and return first substring.
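
The steps above could look roughly like this. It is only a sketch: the names HexLookup, FirstText and Load are invented here, keys are compared case-insensitively (an assumption, since the question shows both "12A4" and "A34d"), and it relies on "A" never containing a comma.

```csharp
using System;
using System.Collections.Generic;

static class HexLookup
{
    // Returns "A" from a line such as "12A4 (A, B, C)", or null if the
    // line has no parenthesised part. Assumes "A" contains no comma.
    public static string FirstText(string line)
    {
        int open = line.IndexOf('(');
        int close = line.LastIndexOf(')');
        if (open < 0 || close <= open)
            return null;

        string inner = line.Substring(open + 1, close - open - 1);
        return inner.Split(',')[0].Trim();
    }

    // Builds the lookup table from any sequence of lines.
    public static Dictionary<string, string> Load(IEnumerable<string> lines)
    {
        var table = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            if (line.Length < 4)
                continue; // skip blank or malformed lines

            string text = FirstText(line);
            if (text != null)
                table[line.Substring(0, 4)] = text;
        }
        return table;
    }
}
```

On startup you would call something like HexLookup.Load(File.ReadLines("codes.txt")) once (the file name is a placeholder), then use table.TryGetValue("a34d", out text) for each lookup.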

Daren Thomas
So do you recommend I read the data every time I open the program and store it into a Dictionary, where I then search the dictionary?
DMan
That really depends on your program. I cannot really recommend anything. Just saying it "could be as simple as". It depends on how often you use the lookup table, how long it takes to load, how often it changes, how often your program runs etc. There is no "optimal solution" without answering these questions first.
Daren Thomas
+3  A: 

That's probably < 2 MB of data.

I think you can:

  1. Read the whole file
  2. Split each line into a key (the hex number) and a value (the remainder); Chris Persichetti's answer is excellent for that
  3. Store each line in a dictionary (using the number as an int, not as a string)

    var d = new Dictionary<int, string>();
    d[Convert.ToInt32(key, 16)] = value;  // parse the key as base 16
    
  4. Keep that dictionary in memory and then perform a very quick look up by the id
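
As a rough sketch of steps 1 to 4 with an int key: the class name HexIntLookup and the method name Load are invented for illustration, and lines whose first four characters are not valid hex are simply skipped, which is an assumption about how bad input should be handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class HexIntLookup
{
    // Builds a dictionary keyed by the numeric value of the 4-character hex prefix.
    public static Dictionary<int, string> Load(IEnumerable<string> lines)
    {
        var d = new Dictionary<int, string>();
        foreach (string line in lines)
        {
            if (line.Length < 4)
                continue; // skip blank or malformed lines

            int key;
            // NumberStyles.HexNumber parses "12A4" as base 16 without a "0x" prefix.
            if (!int.TryParse(line.Substring(0, 4), NumberStyles.HexNumber,
                              CultureInfo.InvariantCulture, out key))
                continue; // prefix is not a hex number

            d[key] = line.Substring(4).Trim(); // the remainder of the line
        }
        return d;
    }
}
```

A lookup then takes a number rather than a string, for example d.TryGetValue(0xA34D, out value), so "a34d" and "A34D" map to the same entry without any case handling.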

OscarRyz
Quick question- Since my keys are hex (letters mixed with numbers), doesn't that mean I can't store my key as an int?
DMan
Okay, I just decided to store it as string, string. I've implemented your solution, I'm sure I'll find some speed increases once I actually get the _other_ part of my program working!
DMan
Nope, they are actually quite parseable. I'm not sure, but probably adding "0x" would do ( at least in Java :P )
OscarRyz
+2  A: 

There are elegant answers posted already, but since you requested regex, try this:

var regex = @"^(?<hexData>.{4})\s(?<textData>.*)$";
var matches = Regex.Matches(textInput, regex, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

Then iterate over the matches object and read the hexData and textData groups to get whatever you want.
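
One way the iteration over matches might look. This is a sketch, not the pattern from this answer: it uses a stricter, hypothetical pattern that captures only the first text inside the parentheses (group name firstText is made up), and it assumes the whole file has been read into one string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class HexRegexSketch
{
    // 4 hex digits, spaces or tabs, "(", then everything up to the first ',' or ')'.
    // Multiline makes ^ match at the start of every line of the input.
    static readonly Regex LinePattern = new Regex(
        @"^(?<hexData>[0-9A-Fa-f]{4})[ \t]+\((?<firstText>[^,)\r\n]*)",
        RegexOptions.Multiline);

    // Returns a hex -> first text table built from all matching lines.
    public static Dictionary<string, string> Parse(string textInput)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in LinePattern.Matches(textInput))
        {
            result[m.Groups["hexData"].Value] = m.Groups["firstText"].Value.Trim();
        }
        return result;
    }
}
```

Lines that do not fit the pattern are silently ignored, which may or may not be what you want for a file you do not fully control.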

NeverDie