I'm interested in building a DSL in Ruby for use in parsing microblog updates. Specifically, I thought that I could translate text into a string of Ruby code, in the same style that Rails allows "4.days.ago". I already have regex code that will translate the text

@USER_A: give X points to @USER_B for accomplishing some task
@USER_B: take Y points from @USER_A for not giving me enough points

into something like

Scorekeeper.new.give(x).to("USER_B").for("accomplishing some task").giver("USER_A")
Scorekeeper.new.take(y).from("USER_A").for("not giving me enough points").giver("USER_B")

It's acceptable to me to formalize the syntax of the updates so that only standardized text is provided and parsed, allowing me to smartly process updates. Thus, it seems it's more a question of how to implement the DSL class. I have the following stub class (removed all error checking and replaced some with comments to minimize paste):

class Scorekeeper

  attr_accessor :score, :user, :reason, :sender

  def give(num)
    # Can 'give 4' or can 'give a -5'; ensure 'to' called
    self.score = num
    self
  end

  def take(num)
    # ensure negative and 'from' called
    self.score = num < 0 ? num : num * -1
    self
  end

  def plus
    self.score > 0
  end

  def to(str)
    self.user = str
    self
  end

  def from(str)
    self.user = str
    self
  end

  def for(str)
    self.reason = str
    self
  end

  def giver(str)
    self.sender = str
    self
  end

  def command
    str = plus ? "giving @#{user} #{score} points" : "taking #{score * -1} points from @#{user}"
    "@#{sender} is #{str} for #{reason}"
  end

end

Running the following commands:

t = eval('Scorekeeper.new.take(4).from("USER_A").for("not giving me enough points").giver("USER_B")')
p t.command
p t.inspect

Yields the expected results (note that take stores the score as a negative number):

"@USER_B is taking 4 points from @USER_A for not giving me enough points"
"#<Scorekeeper:0x100152010 @reason=\"not giving me enough points\", @user=\"USER_A\", @score=-4, @sender=\"USER_B\">"

So my question is mainly, am I doing anything to shoot myself in the foot by building upon this implementation? Does anyone have any examples for improvement in the DSL class itself or any warnings for me?

BTW, to get the eval string, I'm mostly using sub/gsub and regex. I figured that's the easiest way, but I could be wrong.

+5  A: 

Am I understanding you correctly: you want to take a string from a user and cause it to trigger some behavior?

Based on the two examples you listed, you probably can get by with using regular expressions.

For example, to parse this example:

@USER_A: give X points to @USER_B for accomplishing some task

With Ruby:

input = "@abe: give 2 points to @bob for writing clean code"
PATTERN = /^@(.+?): give ([0-9]+) points to @(.+?) for (.+?)$/
input =~ PATTERN
user_a = $~[1] # => "abe"
x      = $~[2] # => "2"
user_b = $~[3] # => "bob"
why    = $~[4] # => "writing clean code"

But if there is more complexity, at some point you might find it easier and more maintainable to use a real parser. If you want a parser that works well with Ruby, I recommend Treetop: http://treetop.rubyforge.org/

The idea of taking a string and converting it to code to be evaled makes me nervous. Using eval is a big risk and should be avoided if possible. There are other ways to accomplish your goal. I'll be happy to give some ideas if you want.
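As one possible eval-free route, here is a minimal sketch that builds a plain object straight from the regex captures. ScoreUpdate, GIVE, TAKE, and parse_update are hypothetical names made up for illustration, and the patterns assume the formalized "give N points to @user for reason" syntax from the question:

```ruby
# Hypothetical value object; not part of the original Scorekeeper class.
ScoreUpdate = Struct.new(:sender, :score, :user, :reason)

# Assumed formalized syntax: one pattern per verb.
GIVE = /\A@(\w+): give (\d+) points to @(\w+) for (.+)\z/
TAKE = /\A@(\w+): take (\d+) points from @(\w+) for (.+)\z/

# Returns a ScoreUpdate, or nil when the text matches neither form.
# No user text is ever turned into Ruby code, so nothing needs to be eval'd.
def parse_update(text)
  if (m = GIVE.match(text))
    ScoreUpdate.new(m[1], m[2].to_i, m[3], m[4])
  elsif (m = TAKE.match(text))
    ScoreUpdate.new(m[1], -m[2].to_i, m[3], m[4])
  end
end

update = parse_update("@abe: give 2 points to @bob for writing clean code")
p update.score # => 2
```

Because the captures are only ever treated as data, a malicious update can at worst fail to match.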

A question about the DSL you suggest: are you going to use it natively in another part of your application? Or do you just plan on using it as part of the process to convert the string into the behavior you want? I'm not sure what is best without knowing more, but you may not need the DSL if you are just parsing the strings.

David James
I'm really just trying to come up with a robust method to smartly parse a string such as "Scorekeeper, give Y points to @USER_A and take X points from @USER_B for being a jerk." From this string, I need to pull out "+Y->USER_A" and "-X->USER_B for 'being a jerk'". Using regex was becoming unwieldy (i.e. I'd give USER_A X points b/c I parsed for "user + points" before "points + user" -- I'm not a regex guru, obviously). I just thought exploring the DSL option might be wise, but since I'm not actually using it elsewhere, perhaps robust regex is a better option. Would love some ideas. Thanks.
JohnMetta
I just posted an example regex in the answer, above. (I tried pasting it here but the formatting was not very nice.)
David James
@David Thanks! That's a much, much cleaner bit of regex than I have. A question I was trying to answer by creating a DSL is how to smartly parse through that and the last comment example. For instance, allowing points to come after user ("give @bob 4 points") should be valid too, "for" is optional, etc. But what I have now matches smaller bits, and matching full statements as you've written might allow me to match multiple possible patterns better without, say, scoring the wrong individual the wrong number of points. Thanks.
JohnMetta
@mettadore I would recommend creating an array of regex patterns and trying to match each. (Don't forget to write tests, otherwise this could get really ugly.) Hopefully this will get you going quickly. If you find that the number of regex patterns grows beyond, say, 5 or 10, then I would definitely give Treetop a try -- thinking about your problem like a parser does will make it more robust. Thinking in regexes, on the other hand, is more of a quick and dirty solution. In any case, I don't think you need a DSL for this. Just have a good OO design and you'll be fine.
David James
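The array-of-patterns idea from the comment above could look something like the following. This is only a sketch: PATTERNS and match_update are hypothetical names, and the three patterns are assumptions about which phrasings should be accepted.

```ruby
# Hypothetical pattern table: each entry pairs a regex with the sign to apply.
# Named captures keep "points then user" and "user then points" from being mixed up.
PATTERNS = [
  [/\bgive (?<points>\d+) points? to @(?<user>\w+)/i,   1],
  [/\bgive @(?<user>\w+) (?<points>\d+) points?/i,      1],
  [/\btake (?<points>\d+) points? from @(?<user>\w+)/i, -1]
].freeze

# Tries each pattern in order and returns [user, signed_points]
# for the first one that matches, or nil if none do.
def match_update(text)
  PATTERNS.each do |regex, sign|
    m = regex.match(text)
    return [m[:user], sign * m[:points].to_i] if m
  end
  nil
end

p match_update("give @bob 4 points")        # => ["bob", 4]
p match_update("take 7 points from @carol") # => ["carol", -7]
```

Each pattern matches a whole statement, so adding a new phrasing means adding one row and one test rather than touching the existing ones.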
+1  A: 

This echoes some of my thoughts on a tangential project (an old-style text MOO).

I'm not convinced that a compiler-style parser is going to be the best way for the program to deal with the vagaries of English text. My current thoughts have me splitting up the understanding of English into separate objects -- so a box understands "open box" but not "press button", etc. -- and then having the objects use some sort of DSL to call centralised code that actually makes things happen.

I'm not sure that you've got to the point where you understand how the DSL is actually going to help you. Maybe you need to look at how the English text gets turned into DSL first. I'm not saying that you don't need a DSL; you might very well be right.

As for hints as to how to do that? Well, I think if I were you I would be looking for specific verbs. Each verb would "know" what sort of thing it should expect from the text around it. So in your example "to" and "from" would expect a user immediately following.

This isn't especially divergent from the code you've posted here, IMO.
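To make the "each verb knows what to expect" idea a bit more concrete, here is a rough sketch. EXPECTS and read_keywords are hypothetical names, and the table only covers the four keywords from the question:

```ruby
# Hypothetical keyword table: each keyword names the kind of token
# it expects immediately after it.
EXPECTS = {
  "give" => :points,
  "take" => :points,
  "to"   => :user,
  "from" => :user
}.freeze

# Walks adjacent word pairs once; when a known keyword is followed by
# the kind of token it expects, the value is recorded in the result hash.
def read_keywords(text)
  result = {}
  text.split.each_cons(2) do |word, following|
    case EXPECTS[word.downcase]
    when :points
      next unless following.match?(/\A[+-]?\d+\z/)
      result[:verb] = word.downcase
      result[:points] = following.to_i
    when :user
      name = following[/\A@(\w+)/, 1]
      result[:user] = name if name
    end
  end
  result
end

p read_keywords("give 4 points to @USER_B for accomplishing some task")
```

A keyword that is not followed by what it expects is simply ignored here; a stricter version might raise or collect an error instead.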

You might get some mileage out of looking at the answers to my question. One commenter pointed me to the Interpreter Pattern, which I found especially enlightening: there's a nice Ruby example here.
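For a flavour of the pattern applied to this problem (this is not the linked example, just a small sketch with hypothetical class names Give, Take, and Sequence), each phrase becomes an object that knows how to interpret itself against a shared context, here a hash of scores:

```ruby
# Terminal expression: adds points to one user's score.
class Give
  def initialize(user, points)
    @user = user
    @points = points
  end

  def interpret(scores)
    scores[@user] += @points
  end
end

# Terminal expression: removes points from one user's score.
class Take
  def initialize(user, points)
    @user = user
    @points = points
  end

  def interpret(scores)
    scores[@user] -= @points
  end
end

# Nonterminal expression: interprets its sub-expressions in order.
class Sequence
  def initialize(*expressions)
    @expressions = expressions
  end

  def interpret(scores)
    @expressions.each { |expression| expression.interpret(scores) }
    scores
  end
end

scores = Hash.new(0)
Sequence.new(Give.new("USER_A", 4), Take.new("USER_B", 2)).interpret(scores)
p scores # => {"USER_A"=>4, "USER_B"=>-2}
```

The parsing step (regex or otherwise) would then only be responsible for building this tree of objects.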

Shadowfirebird
Geez. As soon as I read "interpreter pattern" I thought "Well, duh!" Now I feel just silly. Not really sure why I didn't explore that simpler solution in the first place. I'm even reading "Design Patterns in Ruby" at the moment. How funny. That's probably the route I should take if plain regex is not robust enough.
JohnMetta
Makes you smarter than me -- I'd never heard of it.
Shadowfirebird
A: 

Building on @David_James' answer, I've come up with a regex-only solution to this since I'm not actually using the DSL anywhere else to build scores and am merely parsing out points to users. I've got two patterns that I'll use to search:

SEARCH_STRING = "@Scorekeeper give a healthy 4 to the great @USER_A for doing something
really cool. Then give the friendly @USER_B a healthy five points for working on this.
Then take seven points from the jerk @USER_C."

PATTERN_A = /\b(give|take)[\s\w]*([+-]?[0-9]|one|two|three|four|five|six|seven|eight|nine|ten)[\s\w]*\b(to|from)[\s\w]*@([a-zA-Z0-9_]*)\b/i

PATTERN_B = /\bgive[\s\w]*@([a-zA-Z0-9_]*)\b[\s\w]*([+-]?[0-9]|one|two|three|four|five|six|seven|eight|nine|ten)/i

SEARCH_STRING.scan(PATTERN_A) # => [["give", "4", "to", "USER_A"],
                              #     ["take", "seven", "from", "USER_C"]]
SEARCH_STRING.scan(PATTERN_B) # => [["USER_B", "five"]]

The regex might be cleaned up a bit, but this allows me to have syntax that allows a few fun adjectives while still pulling the core information using both "name->points" and "points->name" syntaxes. It does not allow me to grab the reason, but that's so complex that for now I'm going to just store the entire update, since the whole update will be related to the context of each score anyway in all but outlier cases. Getting the "giver" username can be done elsewhere as well.
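As a rough follow-up sketch, the two scan results can be folded into a signed total per user. The two patterns are copied from above; WORDS, to_points, and score_changes are hypothetical helper names added for illustration:

```ruby
PATTERN_A = /\b(give|take)[\s\w]*([+-]?[0-9]|one|two|three|four|five|six|seven|eight|nine|ten)[\s\w]*\b(to|from)[\s\w]*@([a-zA-Z0-9_]*)\b/i
PATTERN_B = /\bgive[\s\w]*@([a-zA-Z0-9_]*)\b[\s\w]*([+-]?[0-9]|one|two|three|four|five|six|seven|eight|nine|ten)/i

# Number words in order, so that index + 1 is the numeric value.
WORDS = %w[one two three four five six seven eight nine ten].freeze

# Converts a captured amount ("4", "-5", "seven") to an Integer.
def to_points(token)
  index = WORDS.index(token.downcase)
  index ? index + 1 : token.to_i
end

# Returns a hash of user => net points for every score found in the text.
def score_changes(text)
  changes = Hash.new(0)
  text.scan(PATTERN_A) do |verb, amount, _preposition, user|
    sign = verb.casecmp("take").zero? ? -1 : 1
    changes[user] += sign * to_points(amount)
  end
  text.scan(PATTERN_B) do |user, amount|
    changes[user] += to_points(amount)
  end
  changes
end

text = "@Scorekeeper give a healthy 4 to the great @USER_A for doing something " \
       "really cool. Then give the friendly @USER_B a healthy five points for " \
       "working on this. Then take seven points from the jerk @USER_C."
p score_changes(text) # => {"USER_A"=>4, "USER_C"=>-7, "USER_B"=>5}
```

One caveat with this sketch: a sentence such as "give 4 to @bob for one thing" would match both patterns and be counted twice, so some de-duplication would be needed before relying on the totals.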

I've written up a description of these expressions as well, in hopes that other people might find that useful (and so that I can go back to it and remember what that long string of gobbledygook means :)

JohnMetta