Hi, I have a difficult regex problem. I have a partial solution, but I haven't gotten it to work perfectly yet. Essentially, I have to parse a document that is in an outline format such as this:
1. HEY BUDDY
1.A. Your the best
1.A.1 And i know
1.A.2. this because
1.A.3 it is
1.A.4. the
1.A.5. truth i
1.A.6. tell ya.
1.B. so anyway
1.B.1. one two
1.B.2 three four!
2. i have
2.A. many numbers
2.A.1. hahaha
2.A.1.A. 3.2 ppl
2.A.1.B. are in
2.A.1.C my head.
2.A.1.D. yes exactly
2.A.2. 3.21
2.A.3. if you dont
2.A.4 trust me
2.B then
2.B.1. youre
2.B.2.soo wrong
2.C. its not
3. even funny.
3.A. but it
3.B. kind of
3.C. is a little
4. bit i
4.A. believe.
4.A.1. talk to me
4.A.2. more about
4.B. these ppl
4.B.2. in your head.
That is my test document. I need to find each of the new "bullets" in this document, save the text in between them, and then do more computation on it. The only thing I haven't figured out is how to accurately identify the different outline numbers using regex. (I know it could probably be done in other ways than regex, but I'm in the process of learning regex and I have my mind set on doing it this way.) What I've come up with so far is this:
(\b)(([1-9][0-9]?)(\.))([A-Z])?((\.)([1-9][0-9]?)((\.)([A-Z]))?)*(\.)?(\b)
The problem with this is that it isn't recognizing the top-level numbers 1., 2., 3., or 4., and it IS picking up "3." from the 3.2 and 3.21 in the text. (And yes, I will have decimal numbers like these in the text.) The format for the outline is always #.A-Z.#.A-Z.#.A-Z... where the trailing dot may be missing, and the numbers should never be higher than 99.
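To make the intended behavior concrete, here is a minimal Python sketch of one possible approach. It is not a definitive solution. It assumes that every outline label starts at the beginning of a line, which holds for my test document but may not hold in general. The names LABEL_RE, split_outline, and SAMPLE are made up for illustration.

```python
import re

# Assumption: every outline label starts at the beginning of a line.
LABEL_RE = re.compile(
    r"""^
    (?P<label>
        [1-9][0-9]?                  # top-level number, 1 to 99
        (?:\.[A-Z]\.[1-9][0-9]?)*    # repeated letter.number pairs
        (?:\.[A-Z])?                 # optional final letter
    )
    \.?                              # the trailing dot is optional
    (?![.0-9A-Z])                    # reject decimals (3.21) and numbers over 99
    """,
    re.MULTILINE | re.VERBOSE,
)


def split_outline(text):
    """Return (label, body) pairs, one per outline label found."""
    matches = list(LABEL_RE.finditer(text))
    items = []
    for i, m in enumerate(matches):
        # The body runs from the end of this label to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        items.append((m.group("label"), text[m.end():end].strip()))
    return items


# A few lines taken from the test document above.
SAMPLE = """1. HEY BUDDY
1.A. Your the best
1.A.1 And i know
2.A.1.A. 3.2 ppl
2.A.2. 3.21
2.B then
2.B.2.soo wrong"""

for label, body in split_outline(SAMPLE):
    print(label, "->", body)
```

Anchoring at the start of the line is what keeps "3.2" and "3.21" in the body text from being treated as labels. One known limitation of this sketch: if the body starts with a capital letter directly after the dot with no space (for example "2.B.2.Soo"), that letter would be read as part of the label.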
Thanks for any help.