I'm having trouble with the regular expression I need. I suspect I should be using some combination of 'lookaround' or conditional expressions, but I'm at a loss.
I have a data string like:
pattern1 pattern2 pattern3 unwanted-groups pattern4 random number of tokens pattern5 optional1 optional2 more unknown unwanted junk separated with white spaces optional3 optional4 etc
where I have a matching expression for each of the 'pattern#' and 'optional#' groups (optional groups are not required in the data and are therefore not always present). For the other sections I have no pattern (the text is free-form) and no group count to skip; all I know is that the 'tokens' are separated by white space.
I've managed to figure out how to skip the unwanted stuff between the required groups, but when I hit the optional groups, I'm lost. Any suggestions on where I should be looking for hints/help?
Thanks
This is what I currently have:
import re

pattern = re.compile(r'(?:(METAR|SPECI)\s*)*(?P<ICAO>[\w]{4}\s)*'
r'(?P<NIL>(NIL)\s)*(?P<UTC>[\d]{6}Z\s)*(?P<AUTOCOR>(AUTO|COR)*\s)*'
r'(?P<WINDS>[\w]{5,6}G*[\d]{0,2}(MPS|KT|KMH)\s)\s*'
r'.*?\s'  # skip miscellaneous tokens between winds and thermal data
r'(?P<THERM>[\d]{2}/[\d]{2}\s)\s*(?P<PRESS>A[\d]{4}\s)\s*'
r'(?:RMK\s)\s*(?P<AUTO>AO\d\s)*'
r'(?P<PEAK>(PK\sWND\s[\d]{5,6}/[\d]{2,4}))*'
r'(?P<SLP>SLP[\d]{3}\s)*'
r'(?P<PRECIP>P[\d]{4}\s)*'
r'(?P<remains>.*)'
)
example = "METAR KCSM 162353Z AUTO 07011KT 10SM TS SCT100 28/19 A3000 RMK AO2 PK WND 06042/2325 WSHFT 2248 LTG DSNT ALQDS PRESRR SLP135 T02780189 10389 20272 53007="
data = pattern.match(example)
It seems to work for the first 10 groups, but that is about it: the optional groups that come after free-form tokens (SLP, for instance) are not captured.
Again, thanks everybody.
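A minimal sketch of one possible direction, assuming the optional groups only ever appear after the RMK token and can be searched for independently of one another: instead of a single expression, each optional group gets its own pattern and its own `re.search` call, so the free-form tokens in between never need to be skipped explicitly. The names `OPTIONAL_PATTERNS` and `find_optional_groups` are hypothetical, and the patterns are adapted from the expression above with `\b` word boundaries in place of the trailing `\s`.

```python
import re

# Hypothetical table of optional remark groups (an assumption, not a complete
# list): each pattern is searched for on its own.
OPTIONAL_PATTERNS = {
    "AUTO": re.compile(r"\bAO\d\b"),
    "PEAK": re.compile(r"\bPK\sWND\s\d{5,6}/\d{2,4}\b"),
    "SLP": re.compile(r"\bSLP\d{3}\b"),
    "PRECIP": re.compile(r"\bP\d{4}\b"),
}


def find_optional_groups(report):
    """Return a dict mapping each optional group name to its text, or None."""
    # Assumption: everything after the RMK token is the free-form remarks
    # section. If RMK is missing, remarks is empty and every group is None.
    _, _, remarks = report.partition(" RMK ")
    found = {}
    for name, pattern in OPTIONAL_PATTERNS.items():
        match = pattern.search(remarks)
        found[name] = match.group(0) if match else None
    return found


example = ("METAR KCSM 162353Z AUTO 07011KT 10SM TS SCT100 28/19 A3000 RMK AO2 "
           "PK WND 06042/2325 WSHFT 2248 LTG DSNT ALQDS PRESRR SLP135 "
           "T02780189 10389 20272 53007=")
optional = find_optional_groups(example)
```

With this approach the order of the optional groups does not matter, and a group that is absent simply comes back as `None`. The required groups before RMK could still be matched with the single expression.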