The primary cost of Substring is copying the substring into a new string. Using Reflector you can see this:
private unsafe string InternalSubString(int startIndex, int length, bool fAlwaysCopy)
{
    if (((startIndex == 0) && (length == this.Length)) && !fAlwaysCopy)
    {
        return this;
    }
    string str = FastAllocateString(length);
    fixed (char* chRef = &str.m_firstChar)
    {
        fixed (char* chRef2 = &this.m_firstChar)
        {
            wstrcpy(chRef, chRef2 + startIndex, length);
        }
    }
    return str;
}
Now, to get there (notice that this is InternalSubString(), not Substring()), the call first has to go through 5 checks on length and such.
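The early-return branch in the listing above can be observed from the outside. This is only a small sketch: SubstringDemo and ReturnsSameInstance are names I made up for illustration, and the behaviour shown is what I'd expect from the decompiled code rather than a documented guarantee.

```csharp
using System;

// Hypothetical helper, not part of the framework.
public static class SubstringDemo
{
    // True when Substring handed back the very same string object
    // instead of allocating and copying a new one.
    public static bool ReturnsSameInstance(string source, int startIndex, int length)
    {
        return ReferenceEquals(source, source.Substring(startIndex, length));
    }
}
```

Asking for the whole string (start 0, full length) should come back as the same instance; any genuine slice is a fresh allocation plus a copy of its characters.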
If you are referencing the same substring multiple times then it may well be worth pulling every field out once and discarding the giant string, although you will incur some overhead in the arrays that store all these substrings.
If it's generally a "one off" access then Substring it, otherwise consider partitioning up. Perhaps System.Data.DataTable would be of use? If you're doing multiple accesses and parsing to other data types then DataTable looks more attractive to me. If you only need one record in memory at a time then a Dictionary<string,object> should be sufficient to hold one record (field names have to be unique).
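As a rough illustration of the "pull everything out once" approach with a Dictionary<string,object>, here is a sketch. The layout (Name in columns 0-47, Age in columns 48-57, space padded) and the SingleRecordExample class are assumptions of mine, not something from your file format.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example class; the field layout is assumed for illustration.
public static class SingleRecordExample
{
    public static Dictionary<string, object> ReadRecord(string record)
    {
        var fields = new Dictionary<string, object>();
        // Each Substring call copies its slice once; after this method returns,
        // the big record string is no longer needed.
        fields["Name"] = record.Substring(0, 48).Trim();
        fields["Age"] = int.Parse(record.Substring(48, 10).Trim());
        return fields;
    }
}
```

The values are boxed as object, so callers cast them back, e.g. (int)fields["Age"].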
Alternatively, you could write a custom, generic class that handles fixed-length record reading for you. Indicate the start index and the type of each field. The length of a field is inferred from the start of the next field (the exception is the last field, whose length is inferred from the total record length). The types can be auto-converted using the likes of int.Parse(), double.Parse(), bool.Parse(), etc.
RecordParser r = new RecordParser();
r.AddField("Name", 0, typeof(string));
r.AddField("Age", 48, typeof(int));
r.AddField("SystemId", 58, typeof(Guid));
r.RecordLength(80);
Dictionary<string, object> data = r.Parse(recordString);
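A minimal sketch of what such a RecordParser might look like. Everything in it is hypothetical: it assumes fields are added in ascending start order, that values are space padded, and that Convert.ChangeType plus a special case for Guid is good enough for the conversions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch of the RecordParser described above; not a library class.
public class RecordParser
{
    private sealed class Field
    {
        public string Name;
        public int Start;
        public Type Type;
    }

    private readonly List<Field> fields = new List<Field>();
    private int recordLength;

    // Fields are assumed to be added in ascending start-index order.
    public void AddField(string name, int startIndex, Type type)
    {
        fields.Add(new Field { Name = name, Start = startIndex, Type = type });
    }

    public void RecordLength(int length)
    {
        recordLength = length;
    }

    public Dictionary<string, object> Parse(string record)
    {
        if (record.Length != recordLength)
            throw new ArgumentException("Unexpected record length.", "record");

        var data = new Dictionary<string, object>();
        for (int i = 0; i < fields.Count; i++)
        {
            Field field = fields[i];
            // A field ends where the next one starts; the last one ends with the record.
            int end = (i + 1 < fields.Count) ? fields[i + 1].Start : recordLength;
            string raw = record.Substring(field.Start, end - field.Start).Trim();
            data.Add(field.Name, ConvertValue(raw, field.Type));
        }
        return data;
    }

    private static object ConvertValue(string raw, Type type)
    {
        // Convert.ChangeType cannot produce a Guid from a string, so handle it explicitly.
        if (type == typeof(Guid))
            return new Guid(raw);
        return Convert.ChangeType(raw, type, CultureInfo.InvariantCulture);
    }
}
```

Using Convert.ChangeType keeps the sketch short; a table of per-type parse delegates would give you more control over formats and error handling.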
If reflection suits your fancy:
[RecordLength(80)]
public class MyRecord
{
    [RecordFieldOffset(0)]
    public string Name { get; set; }
    [RecordFieldOffset(48)]
    public int Age { get; set; }
    [RecordFieldOffset(58)]
    public Guid SystemId { get; set; }
}
Simply run through the properties, where PropertyInfo.PropertyType tells you how to deal with the substring from the record; pull out the offsets and total length from the attributes; and return an instance of your class with the data populated. Essentially, you could use reflection to gather the information needed to call RecordParser.AddField() and RecordLength() from my previous suggestion.
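Here is a sketch of how the attribute lookup could work. The two attribute classes, RecordLayout, and SampleRecord are all names I'm making up for illustration, and the sketch assumes the record members are public properties.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attributes matching the usage shown above.
[AttributeUsage(AttributeTargets.Class)]
public sealed class RecordLengthAttribute : Attribute
{
    public int Length { get; private set; }
    public RecordLengthAttribute(int length) { Length = length; }
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class RecordFieldOffsetAttribute : Attribute
{
    public int Offset { get; private set; }
    public RecordFieldOffsetAttribute(int offset) { Offset = offset; }
}

// Hypothetical record type used only to exercise the lookup.
[RecordLength(80)]
public class SampleRecord
{
    [RecordFieldOffset(0)] public string Name { get; set; }
    [RecordFieldOffset(48)] public int Age { get; set; }
    [RecordFieldOffset(58)] public Guid SystemId { get; set; }
}

public static class RecordLayout
{
    // Total record length declared on the class via [RecordLength].
    public static int GetRecordLength(Type recordType)
    {
        var attr = (RecordLengthAttribute)Attribute.GetCustomAttribute(
            recordType, typeof(RecordLengthAttribute));
        if (attr == null)
            throw new InvalidOperationException("Missing [RecordLength] on " + recordType.Name);
        return attr.Length;
    }

    // Public properties carrying [RecordFieldOffset], keyed and sorted by offset.
    public static SortedList<int, PropertyInfo> GetFields(Type recordType)
    {
        var fields = new SortedList<int, PropertyInfo>();
        foreach (PropertyInfo property in recordType.GetProperties())
        {
            var attr = (RecordFieldOffsetAttribute)Attribute.GetCustomAttribute(
                property, typeof(RecordFieldOffsetAttribute));
            if (attr != null)
                fields.Add(attr.Offset, property);
        }
        return fields;
    }
}
```

Each entry gives you the offset (the key), the field name (PropertyInfo.Name) and the target type (PropertyInfo.PropertyType); PropertyInfo.SetValue can then populate a new instance once the substring has been converted.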
Then wrap it all up into a neat little, no-fuss class:
RecordParser<MyRecord> r = new RecordParser<MyRecord>();
MyRecord data = r.Parse(recordString);
You could even go so far as to call r.EnumerateFile("path\to\file") and use the yield return enumeration syntax to parse out records:
RecordParser<MyRecord> r = new RecordParser<MyRecord>();
foreach (MyRecord data in r.EnumerateFile("foo.dat"))
{
    // Do stuff with record
}
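One way the enumeration could be sketched with yield return. FixedLengthFile is a hypothetical helper, it takes the parsing step as a delegate instead of living on RecordParser<T>, and it assumes records sit back to back in the file with no line terminators between them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; assumes records are stored back to back with no separators.
public static class FixedLengthFile
{
    // Lazily reads recordLength characters at a time and hands each chunk to parse.
    public static IEnumerable<T> EnumerateRecords<T>(
        TextReader reader, int recordLength, Func<string, T> parse)
    {
        char[] buffer = new char[recordLength];
        while (true)
        {
            int read = reader.ReadBlock(buffer, 0, recordLength);
            if (read < recordLength)
                yield break; // end of input; a short trailing chunk is ignored
            yield return parse(new string(buffer, 0, read));
        }
    }

    // Same as above, but opens the file and closes it when enumeration finishes.
    public static IEnumerable<T> EnumerateFile<T>(
        string path, int recordLength, Func<string, T> parse)
    {
        using (var reader = new StreamReader(path))
        {
            foreach (T record in EnumerateRecords(reader, recordLength, parse))
                yield return record;
        }
    }
}
```

Because the iterator is lazy, only one record's worth of characters is held in the buffer at a time, which fits the "one record in memory" case mentioned earlier.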