views:

68

answers:

5

I'm writing some code that takes a report from the mainframe and converts it to a spreadsheet.

They can't edit the code on the MF to give me a delimited file, so I'm stuck dealing with it as fixed width.

It's working okay now, but I need to get it more stable before I release it for testing.

My problem is that in any given line of data, say it could have three columns of numbers, each five chars wide at positions 10, 16, and 22. If on this one particular row, there's no data for the last two cols, it won't be padded with spaces; rather, the length of the string will be only 14. So, I can't just blindly have

dim s as string = someStream.readline
a = s.substring(10, 5)
b = s.substring(16, 5)
c = s.substring(22, 5)

because it'll choke when it substrings past the length of the string.

I know I could test the length of the string before processing each row, and I have automated the filling of some of the vsariables using a counter and a loop, and using the counter*theWidthOfTheGivenVariable to jump around, but this project was a dog to start with (come on! turning a report into a spreadsheet?), but there are many different types of rows (it's not just a grid), and the code's getting ugly fast. I'd like this to be clean, clear, and maintainable for the poor sucker that gets this after me.

If it matters, here's my code so far (it's really crufty at the moment). You can see some of my/its idiocy in the processSection#data subs

So, I'm wondering

1) is there a way baked in to .NET to have string.substring not error when reading past the end of a string without wrapping it in a try...catch?

and

2) would it be appropriate in this situation to write a new string class that inherits from string that has a more friendly substring function in it?

ETA: Thanks for all the advice and knowledge everyone. I'll go with the extension. Hopefully one of these years, I'll get my chops up enough to pay someone back in kind. :)

+2  A: 

You can't derive from String, so option 2 is out of the question. I'd argue that it would be a bad use of inheritance anyway, and ReadLine() wouldn't return an instance of your new string...

You could write an extension method though (assuming you're using VB9) - SafeSubstring or something like that. I don't believe there's anything in the framework already.

Jon Skeet
"...and ReadLine() wouldn't return an instance of your new string..."Arg. Totally right. I tend to jump right into things without thinking about them. Took me some tears to quit committing ad hoc db changes without double checking the results. This would have been similar.But it's working out great, and I learned more than one new thing, and am really excited about these extension methods. I had never heard of them before. Thanks again.
aape
+3  A: 

I would implement the whole thing a little differently -

Right now, you have to parse every character sent from the mainframe at least three times: once when you create the string, and once when you parse the string, and once as you create a string for each new variable. With fixed-width lines, I'd go for a state machine that reads from the stream character by character. This isn't as hard as it sounds, and it will perform loads better because it only processes each character once. It's how the .Net framework itself handles this kind of thing, if you look at functions like String.Format().

Joel Coehoorn
+1. Addresses the overall problem.
Patrick Karcher
Thanks. I'll look into this to learn how to do it. I'm more of a mechanic than a engineer when it comes to programming, if you know what I mean - I don't understand a lot of the "Predicate<(Of <(T>)>)" type of stuff, though I intend to learn.In that same vein, as soon as I send this, I'm going to have to go look up what a "state machine" is.Thanks again for the advice.
aape
+1  A: 

Option 2 seems like potentially dangerous obfuscation and serious pain to the next developer unless you make it VERY clear somehow that the new API will not choke on certain array operations. (AS per others' comments, you can't inherit the string class, but you could simply wrap it... still, i don't think that this is the recommended approach.)

Option 1, the try/catch, is ugly and will be slow if you hit that error frequently.

I honestly think that your best approach would be to explicitly check for the bounds:

To make it clear and easy to read you could use a static utility method (perhaps contained in a utility class.) This actually gives you the best of both worlds: Cleaner code with custom bounds checking and it will be very clear to whomever picks up the code after you.

Paul Sasik
+2  A: 

You could write an extension method that would take care of the logic you are trying to incorporate. I'm not a VB.NET guy at heart, but I think this is the correct syntax for an extension method:

''# Comment to fix broken prettify.js syntax highlighting for attributes in vb.net
<Extension()> _
Public Shared Function SafeSubstring(ByVal aString As String, ByVal startIndex As Integer, ByVal length As Integer) As String
    If (startIndex + length > aString.Length) Then
        Return aString.Substring(startIndex)
    Else
        Return aString.Substring(startIndex, length)
    End If
End Function
Nick
If you're going to use this function, you might want to add a check that startIndex is also within bounds.
Joel Coehoorn
@Joel - Agreed. Was just trying to bang something out quick for the guy so he got the idea.
Nick
+1  A: 

Maybe consider a different approach and use some kind of regular expression or sscanf-equivalent for parsing the lines, and avoid all the explicit substrings?

David Gelhar