People don't seem to realize that they don't have to use REs (or SQL, but that's another issue :-) for every task, especially those where procedural code is cleaner.
If you're limiting yourself to using REs, I think that's a lack of vision.
I would simply process the string, token by token, where a token is one of:
- a non-delimiter.
- a column delimiter.
- a row delimiter.
Start with an empty column list, then extract (using indexOf/substring stuff) the text up to the next row or column delimiter, and add that text to the column list.
If the delimiter is column, keep going.
If the delimiter is row, check the number of columns, process the list as required, and start again with an empty list.
If the input ends without a final row delimiter while the column list is non-empty, the format is invalid.
Sorry if you were really after an RE method but I don't believe it's required (or even desirable) here.
Pseudo-code (only a first cut, may be slightly buggy) follows:
    def processStr(s):
        if not s.endsWith ("|ROW-DELIM|"):
            error "Invalid format"
        columnList = []
        while not s.equals (""):
            nextRowDelim = s.indexOf ("|ROW-DELIM|")
            nextColDelim = s.indexOf ("|COL-DELIM|")
            if nextColDelim == NotFound:
                nextColDelim = nextRowDelim + 1
            nextDelim = minimumOf (nextRowDelim, nextColDelim)
            columnList.add (s.substring (0, nextDelim))
            s = s.substring (nextDelim)
            if nextDelim == nextRowDelim:
                s = s.substring (length ("|ROW-DELIM|"))
                processColumns (columnList)
                columnList = []
            else:
                s = s.substring (length ("|COL-DELIM|"))
You could easily add code to check for the correct number of columns, either in this code or in processColumns(), if that was your desire.
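For what it's worth, here is a rough runnable Python rendering of the pseudo-code above, with the column-count check folded in. Treat it as a sketch rather than a finished parser: the names parse_rows and expected_columns are my own inventions, it returns the rows instead of calling processColumns(), and it assumes an empty input simply means "no rows".

```python
ROW_DELIM = "|ROW-DELIM|"
COL_DELIM = "|COL-DELIM|"


def parse_rows(s, expected_columns=None):
    """Split s into a list of rows, each row being a list of column strings.

    Every row must be terminated by ROW_DELIM. If expected_columns is given
    (a hypothetical parameter, not in the pseudo-code), each row must have
    exactly that many columns.
    """
    if s and not s.endswith(ROW_DELIM):
        raise ValueError("Invalid format: missing final row delimiter")

    rows = []
    columns = []
    pos = 0  # scan by position instead of repeatedly slicing s
    while pos < len(s):
        next_row = s.find(ROW_DELIM, pos)
        next_col = s.find(COL_DELIM, pos)
        if next_row == -1:
            # Can happen when the two delimiters overlap on a shared "|".
            raise ValueError("Invalid format: unterminated row")

        if next_col != -1 and next_col < next_row:
            # Column delimiter comes first: store the text and keep going.
            columns.append(s[pos:next_col])
            pos = next_col + len(COL_DELIM)
        else:
            # Row delimiter comes first: finish this row and start a new one.
            columns.append(s[pos:next_row])
            pos = next_row + len(ROW_DELIM)
            if expected_columns is not None and len(columns) != expected_columns:
                raise ValueError(
                    "Expected %d columns, got %d" % (expected_columns, len(columns))
                )
            rows.append(columns)
            columns = []
    return rows
```

Calling parse_rows("a|COL-DELIM|b|ROW-DELIM|", expected_columns=2) gives a single row with the two columns "a" and "b", while input that doesn't end with the row delimiter raises ValueError.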