I'm trying to build an efficient string matching algorithm. This will execute in a high-volume environment, so performance is critical.
Here are my requirements:
- Given a domain name, i.e. www.example.com, determine if it "matches" one in a list of entries.
- Entries may be absolute matches, i.e. www.example.com.
- Entries may include wildcards, i.e. *.example.com.
- Wildcard entries match from the most-defined level and up. For example, *.example.com would match www.example.com, example.com, and sub.www.example.com.
- Wildcard entries are not embedded, i.e. sub.*.example.com will not be an entry.
Language/environment: C# (.Net Framework 3.5)
I've considered splitting the entries (and domain lookup) into arrays, reversing the order, then iterating through the arrays. While accurate, it feels slow.
I've considered Regex, but am concerned about accurately representing the list of entries as regular expressions.
My question: what's an efficient way of finding if a string, in the form of a domain name, matches any one in a list of strings, given the description listed above?