views:

39

answers:

2

I'm fairly new to regexes in Ruby (or, I suppose, regexes in general), and I was wondering whether there is a pragmatic way to match a string against an array of values.

Let me explain. Say I have a list of ingredients, in this case:

1 1/3 cups all-purpose flour
2 teaspoons ground cinnamon
8 ounces shredded mozzarella cheese

Ultimately I need to split each ingredient into its "quantity and measurement" and its "ingredient name". For example, 2 teaspoons ground cinnamon would be split into "2 teaspoons" and "ground cinnamon".

So instead of having a hugely long regex like (cup\w*|teaspoon\w*|ounce\w* ....... ), how can I use an array to hold those values outside the regex?


update

I did this (thanks cwninja):

# I think all the units should be singular, then
# use a Ruby function to pluralize them.

units = [
  'tablespoon',
  'teaspoon',
  'cup',
  'can',
  'quart',
  'gallon',
  'pinch',
  'pound',
  'pint',
  'fluid ounce',
  'ounce'
  # ... shortened for brevity
]

# String#pluralize comes from ActiveSupport (e.g. in Rails), not core Ruby.
joined_units = (units.collect { |u| u.pluralize } + units).join('|')

# There are actually many ingredients, so this is actually an iterator
# but for example sake we are going to just show one.
ingredient = "1 (10 ounce) can diced tomatoes and green chilies, undrained"

arr = ingredient.split(/([\d\/\.\s]+(\([^)]+\))?)\s(#{joined_units})?\s?(.*)/i)

This gives me something close to what I want, so I think this is the direction I want to go.

puts "measurement: #{arr[1]}"
puts "unit: #{arr[-2] if arr.size > 3}"
puts "title: #{arr[-1].strip}"
+1  A: 

For an array a, something like this should work:

a.each { |line|
    parts = /^([\d\s\.\/]+)\s+(\w+)\s+(.*)$/.match(line)
    # Do something with parts[1 .. 3]
}

For example:

a = [
    '1 1/3 cups all-purpose flour',
    '2 teaspoons ground cinnamon',
    '8 ounces shredded mozzarella cheese',
    '1.5 liters brandy',
]
puts "amount\tunits\tingredient"
a.each { |line|
    parts = /^([\d\s\.\/]+)\s+(\w+)\s+(.*)$/.match(line)
    puts parts[1 .. 3].join("\t")
}
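
One caveat with the loop above: match returns nil when a line does not fit the pattern (for instance, a line with no leading quantity), and parts[1 .. 3] would then raise. A minimal sketch of a guard follows. The helper name split_ingredient and the constant INGREDIENT_RE are made up for illustration, and the extra sample line is an assumption.

```ruby
# Hypothetical helper (not part of the answer above): returns
# [amount, units, ingredient], or nil when the line does not match.
INGREDIENT_RE = /^([\d\s\.\/]+)\s+(\w+)\s+(.*)$/

def split_ingredient(line)
  parts = INGREDIENT_RE.match(line)
  return nil if parts.nil?
  parts[1..3].map(&:strip)
end

lines = [
  '1 1/3 cups all-purpose flour',
  'salt to taste' # no leading quantity, so the pattern does not match
]

lines.each do |line|
  fields = split_ingredient(line)
  if fields
    puts fields.join("\t")
  else
    puts "no match: #{line}"
  end
end
```

Returning nil keeps the decision about unmatched lines (skip, log, or fall back to treating the whole line as the name) with the caller.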
mu is too short
+1 Thanks for your answer. Oddly enough, it is right on for the way I described my problem. I don't think I was very clear, but your solution is really good for the problem as I described it.
Joseph Silvashy
+2  A: 

Personally I'd just build the regexp programmatically. You can do:

measurements = [...]
MEASUREMENTS_RE = Regexp.new(measurements.join("|"))

… then use the regexp.

As long as you save it and don't keep recreating it, it should be fairly efficient.
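
As a rough sketch of that idea, the unit list can live in a plain array and be turned into a regexp once, stored in a constant. Regexp.union is used here in place of join("|") because it escapes each string. The unit list, the INGREDIENT_RE pattern with its group names, and the parse_ingredient helper are all assumptions for illustration, not something taken from the question.

```ruby
# Illustrative unit list. Longer alternatives come first so that
# "fluid ounce" is tried before "ounce".
UNITS = ['tablespoon', 'teaspoon', 'fluid ounce', 'ounce', 'cup', 'can']

# Built once and reused. .source is interpolated (rather than the Regexp
# itself) so the outer /i flag also applies to the unit names.
UNIT_PATTERN = Regexp.union(UNITS).source

# The trailing s? only covers simple plurals ("cups", not "pinches").
INGREDIENT_RE =
  /\A(?<amount>[\d\s.\/]+?)\s+(?<unit>(?:#{UNIT_PATTERN})s?)\s+(?<name>.+)\z/i

# Hypothetical helper: returns a hash, or nil when the line does not match.
def parse_ingredient(line)
  m = INGREDIENT_RE.match(line)
  m && { amount: m[:amount], unit: m[:unit], name: m[:name] }
end

p parse_ingredient('2 teaspoons ground cinnamon')
p parse_ingredient('1 1/3 cups all-purpose flour')
```

Because the pattern is a constant, it is compiled when the file loads rather than on every ingredient.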

cwninja