I am writing an HTML scraper. After it grabs the HTML from a page, I want to pick out the words that are in all capital letters and store them in a database. My problem right now is that I cannot write the algorithm to parse each line of the HTML I get back in order to store the words. Below is essentially the format I am working with. IMPORTANT: you will notice that the all-capital words always come first on their line, so essentially I only need to look at the first word of each line and decide whether the whole word is in capitals. If it is, I want to add the word to a list; if it isn't, I want to move on to the next line. So it would look like this:
list of names ----> This line should be skipped because the first word is not all CAPS
AARON ....
ABRAHAM ....
ANGELA ...
AMY ...
ASHLEY....
AARON through ASHLEY should be added to the list because the first word of each of those lines is all CAPS
I am able to get the HTML in the format above, but I am having a hard time writing the algorithm that gets the first word of each line and then checks whether it is all capitals.
Does anybody know how to do this without an external parsing library, using just loops and lists? Thanks, I appreciate your help.
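To make the expected behavior concrete, here is a rough sketch of what I have in mind. It assumes Python (the language is not fixed yet) and that the HTML is already available as one string with line breaks. The function name `first_words_in_caps` and the stripping of trailing dots are just placeholders for illustration, not something I have working in the real program.

```python
def first_words_in_caps(html_text):
    """Return the first word of every line whose first word is all capitals."""
    names = []
    for line in html_text.splitlines():
        words = line.split()          # split on whitespace
        if not words:                 # skip blank lines
            continue
        # Assumption: trailing dots such as "ASHLEY...." are not part of the word.
        first = words[0].rstrip(".")
        # isalpha() rejects empty strings and anything with digits or punctuation;
        # isupper() requires every letter to be a capital.
        if first.isalpha() and first.isupper():
            names.append(first)
    return names


sample = """list of names
AARON ....
ABRAHAM ....
ANGELA ...
AMY ...
ASHLEY....
"""

print(first_words_in_caps(sample))
```

With the sample above, the "list of names" line would be skipped and the five names would end up in the list, which is the result I am after.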