I am looking to extract specific items from a large pool of unstructured documents. Each document is typically 1-5 pages of text, formatted however the user chose, but in most cases it contains at least:
- Name
- Address (physical)
- Email Address
- Phone number
- Website URL
I'm looking for a semantic parser that can attempt to extract these elements from the documents so that I can load that information into a relational database and work with these records as contacts.
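To make the goal concrete, here is a minimal sketch of the pipeline I have in mind, using only Python's standard library. The regular expressions, the `contacts` table layout, and the function names are my own assumptions for illustration. The patterns are deliberately simple (the phone pattern only covers US-style 10-digit numbers) and will miss many real-world cases.

```python
import re
import sqlite3

# Hypothetical, deliberately simple patterns; they will miss edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")  # US-style only
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


def extract_contact(text):
    """Pull the pattern-like fields out of one document's text."""
    return {
        "email": first_match(EMAIL_RE, text),
        "phone": first_match(PHONE_RE, text),
        "url": first_match(URL_RE, text),
    }


def save_contact(conn, contact):
    """Store one extracted record in a (hypothetical) contacts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts (email TEXT, phone TEXT, url TEXT)"
    )
    conn.execute(
        "INSERT INTO contacts (email, phone, url) VALUES (:email, :phone, :url)",
        contact,
    )
    conn.commit()


# Made-up sample document.
sample = (
    "Jane Doe\n"
    "123 Main St, Springfield\n"
    "jane.doe@example.com\n"
    "(555) 123-4567\n"
    "https://example.com/jane\n"
)

conn = sqlite3.connect(":memory:")
contact = extract_contact(sample)
save_contact(conn, contact)
```

This handles the fields that follow a recognizable pattern. Names and physical addresses are the hard part, since they have no fixed shape, which is why I'm after a semantic parser rather than more regular expressions.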
The other services I've looked at, while valuable for other purposes, do not address this specific need.
Any thoughts, suggestions or leads?