I'd like to create a (PCRE) regular expression to match all commonly used numbered lists, and I'd like to share my thoughts and gather input on way to do this.
I've defined 'lists' as the set of canonical Anglo-Saxon conventions, i.e.
Numbers
1 2 3
1. 2. 3.
1) 2) 3)
(1) (2) (3)
1.1 1.2 1.2.1
1.1. 1.2. 1.3.
1.1) 1.2) 1.3)
(1.1) (1.2) (1.3)
Letters
a b c
a. b. c.
a) b) c)
(a) (b) (c)
A B C
A. B. C.
A) B) C)
(A) (B) (C)
Roman numerals
i ii iii
i. ii. iii.
i) ii) iii)
(i) (ii) (iii)
I II III
i. ii. iii.
i) ii) iii)
(i) (ii) (iii)
I'd like to know how strong a set of list this is, and if there are other numbering conventions that should be in there, and if any of these ought to be removed.
Here's a regular expression I've created to solve this problem (in Python):
numex = r'(?:\d{1,3}'\ # 1, 2, 3
'(?:\.\d{1,3}){0,4}'\ # 1.1, 1.1.1.1
'|[A-Z]{1,2}'\ # A. B. C.
'|[ivxcl]{1,6}' # i, iii, ...
rex = re.compile(r'(\(?%s\)|%s\.?)' % numex, re.I) # re.U?
rex.match("123. Some paragraph")
I'd like to know how adequate this regex is for this problem, and if there are other alternative (regex or otherwise) solutions.
Incidentally, for my particular use-case, I wouldn't expect list numbers of more than 25-50.
Thank you for reading.
Brian