I'm working with a doc file that, when copied and pasted into a text file, gives me the following sample 'output':
ARTA215 ADVANCED LIFE DRAWING (3 Cr) (2:2) + Studio 1 hr.
This advanced study in drawing with the life .... Prerequisite: ARTA150 Lab Fee Required
ARTA220 CERAMICS II (3 Cr) (2:2) + Studio 1 hr.
This course affords the student the opportunity to ex... Lab Fee Required
ARTA250 SPECIAL TOPICS IN ART
This course focuses on selected topic....
ARTA260 PORTFOLIO DEVELOPMENT (3 Cr) (3:0)
The purpose of this course is to pre....
BIOS010 INTRODUCTION TO BIOLOGICAL CONCEPTS (3IC) (2:2)
This course is a preparatory course designed to familiarize the begi....
BIOS101 GENERAL BIOLOGY (4 Cr) (3:3)
This course introduces the student to the principles of mo... Lab Fee Required
BIOS102 INTRODUCTION TO HUMAN BIOLOGY (4 Cr) (3:3)
This course is an introd.... Lab Fee Required
I want to parse it into 3 fields so that I can output the values to a .csv file.
The line breaks, spacing, etc. in this sample are typical of what can appear anywhere in the file.
My best guess is for a regex to find 4 capital letters followed by 3 digits, then check whether the next 2 letters (after the space) are also capitals. This identifies the course # while avoiding false matches in places like "Prerequisite: ARTA150" in the first entry. After that, the regex finds the first line break and takes everything after it until the next course #. The 3 fields would be a course number, a course title, and a course description. The course number and title are always on the same line, and the description is everything beneath.
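To make the idea concrete, here is a rough sketch in Python (the language is an arbitrary choice, and the names COURSE_START and parse_courses are made up for illustration). It assumes the course number is separated from the title by spaces or tabs on the same line, and it has only been reasoned through against the sample above, not a full file:

```python
import re

# Course number: 4 capital letters + 3 digits, then spaces/tabs, then two
# more capital letters (the start of the title). The lookahead rejects
# "ARTA150 Lab Fee Required", and requiring spaces/tabs (not a newline)
# rejects a prerequisite code that happens to end a line.
COURSE_START = re.compile(r"\b([A-Z]{4}\d{3})[ \t]+(?=[A-Z]{2})")


def parse_courses(text):
    """Return a list of (number, title, description) tuples."""
    matches = list(COURSE_START.finditer(text))
    courses = []
    for i, m in enumerate(matches):
        # An entry runs until the next course number, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        # Title is the rest of the first line; the description is everything after.
        title, _, description = body.partition("\n")
        # Collapse stray line breaks and runs of spaces in the description.
        description = " ".join(description.split())
        courses.append((m.group(1), title.strip(), description))
    return courses
```

One list of 3-tuples is used here instead of 3 separate arrays, only because it keeps each course's fields together; splitting it into 3 arrays afterwards would be straightforward.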
A sample end result would contain 3 fields, which I'm guessing could be stored in 3 arrays:
"ARTA215","ADVANCED LIFE DRAWING (3 Cr) (2:2) + Studio 1 hr.","This advanced study in drawing with the life .... Prerequisite: ARTA150 Lab Fee Required"
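For the output side, a minimal sketch of producing a line in exactly that quoted form, assuming Python and its standard csv module. The function name courses_to_csv and the shape of rows (a list of number/title/description tuples) are hypothetical, chosen just for this example:

```python
import csv
import io


def courses_to_csv(rows):
    """Render (number, title, description) rows as CSV text, quoting every field."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes; any quote characters
    # inside a field are doubled by the csv module.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

To write straight to a file instead, the same csv.writer call can be given a file opened with open(path, "w", newline="") in place of the StringIO buffer.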
Like I said, it's quite a nightmare, but I want to automate this instead of cleaning up after someone each time the file is generated.