Some clarification: these data classes sit one level above the primitive types, constraining data to make it meaningful. More precisely, they sit just above the RegexConstrainedString and ConstrainedNumber<T> classes, which is where we're talking about refactoring the constructor's validation code into a separate method.

The problem with refactoring the validation code is that the Regex needed for validation exists only in the subclasses of RegexConstrainedString, since each subclass has a different Regex. This means the validation data is only available to RegexConstrainedString's constructor, not to any of its methods. So, if I factor out the validation code, callers would need access to the Regex. Here's the Password class, for example:
public class Password: RegexConstrainedString
{
    internal static readonly Regex regex = CreateRegex_CS_SL_EC( @"^[\w!""#\$%&'\(\)\*\+,-\./:;<=>\?@\[\\\]\^_`{}~]{3,20}$" );

    public Password( string value ): base( value.TrimEnd(), regex ) {} //length enforced by regex, so no min/max values specified
    public Password( Password original ): base( original ) {}

    public static explicit operator Password( string value ) {return new Password( value );}
}
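To make the usage concrete (the call sites below are invented for illustration; textBox and reader aren't part of my actual code), constructing the object is all it takes to validate:

// Hypothetical call sites: any string entering the system is wrapped immediately.
// An invalid value never gets past this point, because the constructor throws.
Password fromUser = new Password( textBox.Text );     //from user input
Password fromDb   = (Password)reader.GetString( 0 );  //from the database, via the explicit cast operator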
So, when reading a value from the database or reading user input, the Password constructor forwards the Regex to the base class to handle the validation. Another trick is that it automatically trims trailing characters, in case the database type is char rather than varchar, so I don't have to remember to do it. Anyway, here is what the main constructor for RegexConstrainedString looks like:
protected RegexConstrainedString( string value, Regex subclasses_static_regex, int? min_length, int? max_length )
{
    _value = (value ?? String.Empty);

    if (min_length != null)
        if (_value.Length < min_length)
            throw new Exception( "Value doesn't meet minimum length of " + min_length + " characters." );

    if (max_length != null)
        if (_value.Length > max_length)
            throw new Exception( "Value exceeds maximum length of " + max_length + " characters." );

    value_match = subclasses_static_regex.Match( _value ); //Match.Synchronized( subclasses_static_regex.Match( _value ) );
    if (!value_match.Success)
        throw new Exception( "Invalid value specified (" + _value + "). \nValue must match regex:" + subclasses_static_regex.ToString() );
}
Since callers would need access to the subclass's Regex, I think my best bet is to implement an IsValueValid method in the subclass, which forwards the data to an IsValueValid method in the RegexConstrainedString base class. In other words, I would add this line to the Password class:
public static bool IsValueValid( string value ) {return IsValueValid( value.TrimEnd(), regex, min_length, max_length );}
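The base-class overload it forwards to isn't shown above; it would presumably have to mirror the constructor's checks, returning false instead of throwing. A minimal sketch (the signature is my inference from the call above):

// Sketch only: duplicates the constructor's validation, but reports instead of throwing.
protected static bool IsValueValid( string value, Regex subclasses_static_regex, int? min_length, int? max_length )
{
    value = (value ?? String.Empty);
    if (min_length != null && value.Length < min_length) return false;
    if (max_length != null && value.Length > max_length) return false;
    return subclasses_static_regex.Match( value ).Success;
}

Note how it has to repeat the constructor's logic line for line, which is exactly the replication I'm objecting to.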
I don't like this, however, because I'm replicating the subclass's constructor code: having to remember to trim the string again, and to pass the same min/max lengths when necessary. This requirement would be forced upon all subclasses of RegexConstrainedString, and it's not something I want to do. Data classes like Password are only this simple because RegexConstrainedString handles most of the work: implementing operators, comparisons, cloning, etc.
Furthermore, there are other complications with factoring out the code. The validation involves running a Regex match and storing it in the instance, since some data types have properties that report on specific elements of the string. For example, my SessionID class contains properties like TimeStamp, which returns a matched group from the Match stored in the instance. The bottom line is that the static method runs in an entirely different context. Since it's essentially incompatible with the constructor context, the constructor cannot use it, so I would end up replicating code once again.
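To illustrate what I mean (the format, regex, and group name below are invented; my real SessionID is more involved, and this assumes value_match is visible to subclasses), such a property just reaches into the stored Match:

// Sketch only: a made-up session ID format of "<hex id>-<ticks>".
public class SessionID: RegexConstrainedString
{
    internal static readonly Regex regex = CreateRegex_CS_SL_EC( @"^(?<id>[0-9a-f]{16})-(?<ticks>\d+)$" );

    public SessionID( string value ): base( value.TrimEnd(), regex ) {}

    //reads a named group from the Match that the base constructor stored
    public DateTime TimeStamp
    {
        get {return new DateTime( long.Parse( value_match.Groups["ticks"].Value ) );}
    }
}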
So... I could factor out the validation code by replicating it, tweaking it for a static context, and imposing requirements on subclasses, or I could keep things much simpler and just perform the object construction. The extra memory allocated would be minimal, as only a string reference and a Match reference are stored in the instance; the Match and the string themselves would be generated by the validation anyway, so there's no way around that.

I could worry about the performance all day, but my experience has been that correctness is more important, because correctness often leads to numerous other optimizations. For example, I never have to worry about improperly formatted or sized data flowing through my application, because only meaningful data types are used, which forces validation to the point of entry into the application from other tiers, be it database or UI. 99% of my validation code was removed as a no-longer-necessary artifact, and I find myself only checking for nulls nowadays.

Incidentally, having reached this point, I now understand why including nulls was the billion-dollar mistake. They seem to be the only thing I have to check for anymore, even though they are essentially non-existent in my system. Complex objects having these data types as fields cannot be constructed with nulls, but I have to enforce that in the property setters, which is irritating, because those setters would otherwise never need validation code... only code that runs in response to changes in values.
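To be concrete about that last irritation (the containing class here is hypothetical), each setter ends up with a guard like this:

// Sketch only: a hypothetical complex object using these data types as fields.
public class UserAccount
{
    private Password _password;

    public Password Password
    {
        get {return _password;}
        set
        {
            //the only validation left: the value itself is already guaranteed valid
            if (value == null) throw new ArgumentNullException( "value" );
            _password = value;
        }
    }
}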
UPDATE:
I simulated the CLR function calls both ways, and found that when all data is valid, the performance difference is only fractions of a millisecond per thousand calls, which is negligible. However, when roughly half the passwords are invalid, throwing exceptions in the "instantiation" version, it's three orders of magnitude slower, which equates to about one extra second per 1000 calls. The difference will of course multiply as multiple CLR calls are made for multiple columns in the table, but that's a factor of 3 to 5 for my project.

So, is an extra 3-5 seconds per 1000 updates acceptable to me, as a trade-off for keeping my code very simple and clean? Well, that depends on the update rate. If my application were getting 1000 updates per second, a 3-5 second delay would be devastating. If, on the other hand, I were getting 1000 updates a minute or an hour, it may be perfectly acceptable. In my situation, I can tell you now that it's quite acceptable, so I think I'll just go with the instantiation method and allow the errors through.

Of course, in this test I handled the errors in the CLR instead of letting SQL Server handle them. Marshalling the error info to SQL Server, and then possibly back to the application, could definitely slow things down much more. I guess I will have to fully implement this to get a real test, but from this preliminary test, I'm pretty sure what the results will be.
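For anyone curious, the shape of the timing loop was roughly the following (the loop count, input mix, and output are simplified stand-ins, not my actual harness):

// Sketch only: times the construct-and-catch approach with ~50% invalid input.
Stopwatch sw = Stopwatch.StartNew();
int failures = 0;
for (int i = 0; i < 1000; i++)
{
    string input = (i % 2 == 0) ? "validPass1" : "x"; //"x" fails the 3-20 length rule
    try {Password p = new Password( input );}
    catch (Exception) {failures++;} //the exception path is what dominates the cost
}
sw.Stop();
Console.WriteLine( "1000 constructions ({0} failed): {1} ms", failures, sw.ElapsedMilliseconds );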