I have a text file that I extracted from a PDF file. It's arranged in a tabular format; this is part of it:
DATE SESS PROF1 PROF2 COURSE SEC GRADE COUNT
2007/09 1 RODRIGUEZ TANIA DACSB 06500 001 A 3
2007/09 1 RODRIGUEZ TANIA DACSB 06500 001 A- 2
2007/09 1 RODRIGUEZ TANIA DACSB 06500 001 B 4
2007/09 1 RODRIGUEZ TANIA DACSB 06500 001 B+ 2
2007/09 1 RODRIGUEZ TANIA DACSB 06500 001 B- 1
2007/09 1 RODRIGUEZ TANIA DACSB 06500 001 WU 1
2007/09 1 NOOB ADRIENNE JOSH ROGER DBIOM 10000 125 C+ 1
2007/09 1 NOOB ADRIENNE JOSH ROGER DBIOM 10000 125 C+ 1
2007/09 1 FUENTES TANIA DACSB 06500 002 A 3
2007/09 1 FUENTES TANIA DACSB 06500 002 A- 8
2007/09 1 FUENTES ALEXA DACSB 06500 002 B 5
2007/09 1 FUENTES ALEXA DACSB 06500 002 B+ 3
2007/09 1 FUENTES ALEXA DACSB 06500 002 B- 1
2007/09 1 FUENTES ALEXA DACSB 06500 002 C 1
2007/09 1 FUENTES ALEXA DACSB 06500 002 C+ 1
2007/09 1 LIGGINS FREDER DACSB 06500 003 A 1
The first line holds the column names, and the remaining lines are the data.
There are 8 columns that I want to extract. At first this seemed very easy: split each line I read with split(/\s+/, ...).
But then I noticed that some lines contain additional spaces, for example:
2007/09 1 NOOB ADRIENNE JOSH ROGER DBIOM 10000 125 C+ 1
As you can see, the data for some columns, such as PROF2, is optional.
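
To illustrate the problem, here is a minimal sketch of the naive split, assuming Perl and two of the sample rows above. The helper name `naive_fields` is only illustrative. The first row splits into 9 tokens and the second into 11, so neither lines up with the 8 column names, and the two rows do not even agree with each other.

```perl
use strict;
use warnings;

# Two sample rows from the data above: one with a single professor,
# one with two professors.
my @lines = (
    '2007/09 1 RODRIGUEZ TANIA DACSB 06500 001 A 3',
    '2007/09 1 NOOB ADRIENNE JOSH ROGER DBIOM 10000 125 C+ 1',
);

# Naive whitespace split (hypothetical helper): returns the tokens of one row.
sub naive_fields {
    my ($line) = @_;
    return split /\s+/, $line;
}

# Print the token count and the tokens for each row, to show the mismatch.
for my $line (@lines) {
    my @fields = naive_fields($line);
    printf "%2d fields: %s\n", scalar @fields, join('|', @fields);
}
```

Because the names and the course code contain spaces themselves, the token count alone cannot tell which tokens belong to PROF1, PROF2, or COURSE.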