This finds all the dates in your example sentence:

    import re

    # subject is the text you want to search
    for match in re.finditer(
        r"""(?ix)                # case-insensitive, verbose regex
        \b                       # match a word boundary
        (?:                      # match the following three times:
          (?:                    # either
            \d+                  # a number,
            (?:\.|st|nd|rd|th)*  # followed by a dot, st, nd, rd, or th (optional)
          |                      # or a month name
            (?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)
          )
          [\s./-]*               # followed by a date separator or whitespace (optional)
        ){3}                     # do this three times
        \b                       # and end at a word boundary.""",
        subject):
        # match.start() is the start index, match.end() the (exclusive) end index,
        # and match.group() the matched text
        print(match.start(), match.end(), match.group())

It's definitely not perfect. It is liable to miss some dates (especially if they are not in English: "21. Mai 2006" would not be recognized, nor would "4ème décembre 1999"), and to match nonsense like "August Augst Aug" or even a bare number such as "2006" (which it reads as three short numbers with no separators between them). But since nearly everything is optional in your examples, there is not much you can do at the regex level.
The next step would be to feed all the matches into a parser and see if it can parse them into a sensible date.
The regex can't interpret context correctly. Imagine a (stupid) text like "You'll find it in box 21. August 3rd will be the shipping date." It will match "21. August 3rd", which of course can't be parsed.
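
To illustrate the "feed the matches into a parser" step, here is a minimal sketch that uses only the standard library. The helper names (parse_candidate, find_dates) and the FORMATS list are my own assumptions, not part of your code; the list only covers a few layouts, reads all-numeric dates day-first, and relies on %B/%b resolving to English month names (true for the default C locale). A dedicated date-parsing library may well do a better job.

```python
import re
from datetime import datetime

# Same pattern as above, compiled once.
DATE_RE = re.compile(
    r"""(?ix)
    \b
    (?:
      (?:
        \d+(?:\.|st|nd|rd|th)*
      |
        (?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)
      )
      [\s./-]*
    ){3}
    \b""")

# Hypothetical list of layouts to try, in order; extend it for your data.
FORMATS = ("%d %B %Y", "%d %b %Y", "%B %d %Y", "%b %d %Y",
           "%d %m %Y", "%Y %m %d")


def parse_candidate(text):
    """Return a datetime.date if text fits one of FORMATS, else None."""
    # Drop ordinal suffixes that directly follow a digit (3rd -> 3).
    cleaned = re.sub(r"(?i)(?<=\d)(?:st|nd|rd|th)\b", "", text)
    # Collapse every run of separators into a single space.
    cleaned = " ".join(re.split(r"[\s./-]+", cleaned)).strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def find_dates(subject):
    """Yield (matched_text, date) for every regex match that parses."""
    for match in DATE_RE.finditer(subject):
        parsed = parse_candidate(match.group())
        if parsed is not None:
            yield match.group(), parsed
```

With this filter, list(find_dates("It ships on 3rd August 2021.")) keeps the match "3rd August 2021", while the "box 21. August 3rd" sentence yields nothing because none of the formats accepts "21 August 3".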