Hi guys - I'm kinda a newbie at regular expressions, so would appreciate a bit of peer feedback on this one. It will be heavily used on my site, so any weird edge cases can totally wreak havoc. The idea is to type in an amount of an ingredient in a recipe in whole units or fractions. Due to my autocomplete mechanism, just a number is valid too (since it'll pop up a dropdown). These lines are valid:
1
1/2
1 1/2
4 cups
4 1/2 cups
10 3/4 cups sliced
The numeric part of the line should be its own group so I can parse that with my fraction parser. Everything after the numeric part should be a second group. At first, I tried this:
^\s*(\d+|\d+\/\d+|\d+\s*\d+\/\d+)\s*(.*)$
This almost works, but "1 1/2 cups" gets parsed as (1) and (1/2 cups) instead of (1 1/2) and (cups). After scratching my head a bit, I determined this was because of the ordering of my "OR" clause: alternatives are tried left to right, so the leading 1 satisfies the first alternative, \d+, and (.*) happily swallows the rest. So I changed this to:
^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*(.*)$
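To make the ordering effect concrete, here's a quick sketch in Python. Python is just my pick for illustration, since the regex flavor isn't tied to anything above, and `split_amount` is a made-up helper name. It runs the first attempt and a reordered version (same alternatives, longest forms first, with the catch-all `(.*)` tail kept so only the ordering differs) against the same line:

```python
import re

# First attempt: the bare \d+ alternative is tried first.
first = re.compile(r"^\s*(\d+|\d+\/\d+|\d+\s*\d+\/\d+)\s*(.*)$")

# Same alternatives, longest forms first; the (.*) tail is kept so that
# the ordering is the only difference between the two patterns.
reordered = re.compile(r"^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*(.*)$")


def split_amount(pattern, line):
    """Hypothetical helper: return (numeric part, remainder), or None if no match."""
    m = pattern.match(line)
    return m.groups() if m else None


print(split_amount(first, "1 1/2 cups"))      # ('1', '1/2 cups')
print(split_amount(reordered, "1 1/2 cups"))  # ('1 1/2', 'cups')
```

The engine takes the first alternative that lets the overall match succeed, so with `(.*)` at the end the shortest numeric form wins whenever it's listed first.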
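The reordered pattern is closer, but it still allows weirdness such as "1 1/2/4 cups" or "1/2 3 cups", where leftover pieces of a number spill into the second group. So I decided to require that whatever follows a valid numeric expression is either nothing at all or starts with a letter: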
^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*($|[a-z].*)$
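As a sanity check, here's a small Python sketch that runs this pattern against the valid lines listed earlier and against the two weird inputs. Python and the `parse` helper are assumptions for illustration only; `re.IGNORECASE` stands in for whatever case-insensitive switch the real engine uses.

```python
import re

# The pattern above, compiled case-insensitively.
amount = re.compile(
    r"^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*($|[a-z].*)$",
    re.IGNORECASE,
)


def parse(line):
    """Hypothetical helper: return (numeric part, remainder), or None if no match."""
    m = amount.match(line)
    return m.groups() if m else None


valid = ["1", "1/2", "1 1/2", "4 cups", "4 1/2 cups", "10 3/4 cups sliced"]
rejected = ["1 1/2/4 cups", "1/2 3 cups"]

for line in valid:
    print(repr(line), "->", parse(line))

for line in rejected:
    print(repr(line), "->", parse(line))  # None: no alternative fits
```

For a bare number or fraction the second group comes back as an empty string, because the `$` branch matches without consuming anything.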
Note I'm running this in case-insensitive mode. Here are my questions:
Can the expression be improved? I kinda don't like the "OR" list for number, fraction, and compound fraction, but I couldn't think of another way to allow all three.
It would be extra nice if I could return a group for each word after the numeric component: for example, a group for (10 3/4), a group for (cups), and a group for (sliced). There can be any number of words after the number. Is this possible?
Thanks!