My database stores URLs in text fields. Each URL encodes the date of a report, and that date is missing from the report itself.
So I need to parse the date out of the URL field into a String representation such as:
2010-10-12
2007-01-03
2008-02-07
What's the best way to extract the dates?
Some are in this format:
http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html
http://e.com/data/invoices/2010/09/invoices-report-thursday-september-2-2010.html
http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html
http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html
http://e.com/data/invoices/2010/08/invoices-report-monday-august-30th-2010.html
http://e.com/data/invoices/2009/05/invoices-report-friday-may-8th-2009.html
http://e.com/data/invoices/2010/10/invoices-report-wednesday-october-6th-2010.html
http://e.com/data/invoices/2010/09/invoices-report-tuesday-september-21-2010.html
Note the inconsistent use of an ordinal suffix (th, st) following the day of the month in cases such as these two:
http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html
http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html
Others are in this format (with three hyphens before the date starts, no year at the end, and an optional invoices- before report):
http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-1.html
http://e.com/data/invoices/2010/09/invoices-report---thursday-september-2.html
http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-15.html
http://e.com/data/invoices/2010/09/invoices-report---monday-september-13.html
http://e.com/data/invoices/2010/08/report---monday-august-30.html
http://e.com/data/invoices/2009/05/report---friday-may-8.html
http://e.com/data/invoices/2010/10/report---wednesday-october-6.html
http://e.com/data/invoices/2010/09/report---tuesday-september-21.html
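To make the two formats concrete, here is a minimal sketch in Python of one way a single regular expression could cover both. The names extract_date, URL_RE and MONTHS are hypothetical. It assumes that when the file name carries no year, the year directory in the path (e.g. /2010/09/) is the report's year, and it takes the month from the month name in the file name rather than from the path. I have not checked either assumption against the full data set.

```python
import re
from datetime import date

# Month name -> month number (assumes English, fully spelled-out month names).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

URL_RE = re.compile(
    r"/(?P<path_year>\d{4})/\d{2}/"          # year and month directories in the path
    r"(?:invoices-)?report-+"                # optional "invoices-", then one or more hyphens
    r"[a-z]+-"                               # weekday name (ignored)
    r"(?P<month>[a-z]+)-"                    # month name
    r"(?P<day>\d{1,2})(?:st|nd|rd|th)?"      # day, with an optional ordinal suffix
    r"(?:-(?P<file_year>\d{4}))?"            # optional year at the end of the file name
    r"\.html$"
)

def extract_date(url):
    """Return the report date as 'YYYY-MM-DD', or None if the URL does not match."""
    match = URL_RE.search(url.lower())
    if match is None:
        return None
    month = MONTHS.get(match.group("month"))
    if month is None:
        return None
    # Assumption: fall back to the year directory when the file name has no year.
    year = int(match.group("file_year") or match.group("path_year"))
    try:
        return date(year, month, int(match.group("day"))).isoformat()
    except ValueError:
        # For example, a day number that does not exist in that month.
        return None
```

URLs that fit neither pattern come back as None, so they can be listed and checked by hand instead of being silently given a wrong date.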