I'm trying to clean up some code for the hell of it, and I'm looking for better ways of going about it. Instead of using regular expressions in my rules for parsing a string, I'd like to use something closer to the routes syntax, such as "/something/:searchitem/somethingelse", so that a string like "/something/FOUNDIT/somethingelse" gives the result "FOUNDIT".

Here's the example I'm refactoring. Given an input string, say "http://claimid.com/myusername", I want to run it against a number of possible patterns and return the "myusername" part from the one that matches.

The data to run it against might look like this:

PROVIDERS = [
  "http://openid.aol.com/:username",
  "http://:username.myopenid.com",
  "http://claimid.com/:username",
  "http://:username.livejournal.com"]

  something_here("http://claimid.com/myusername") # => "myusername"

Is there a good way to match a string like http://claimid.com/myusername against this list and make sense of the results? Or are there any techniques that make something like this easier? I was looking through the Rails routing code, since it does something like this, but that's not the easiest code to follow.


Right now I'm just doing this with regular expressions, but it seems like the above method would be MUCH easier to read:

PROVIDERS = [
  /http:\/\/openid\.aol\.com\/(\w+)/,
  /http:\/\/(\w+)\.myopenid\.com/,
  /http:\/\/(\w+)\.livejournal\.com/,
  /http:\/\/flickr\.com\/photos\/(\w+)/,
  /http:\/\/technorati\.com\/people\/technorati\/(\w+)/,
  /http:\/\/(\w+)\.wordpress\.com/,
  /http:\/\/(\w+)\.blogspot\.com/,
  /http:\/\/(\w+)\.pip\.verisignlabs\.com/,
  /http:\/\/(\w+)\.myvidoop\.com/,
  /http:\/\/(\w+)\.pip\.verisignlabs\.com/,
  /http:\/\/claimid\.com\/(\w+)/]

url = "http://claimid.com/myusername"
username = PROVIDERS.collect { |provider|
  url[provider, 1]
}.compact.first
+2  A: 

How about String#include? or String#index?

url.include? "myuserid"

Or do you want some positional thing? If so, then you could split the URL.

Yet a third thought: using your input form with the :username placeholder, construct and compile a Regexp for each such string, and use Regexp#match to get back a MatchData. If you kept pairs of the Regexp and the index of the :username field, you could pull the username out directly.
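
A minimal sketch of that third idea, assuming a Ruby version with named captures (1.9 or later), so the MatchData can be indexed by name and no separate index has to be stored. The names compile_route, ROUTES and find_username are made up for illustration, and \w+ is only a guess at what a username may contain (it rejects hyphens, for example).

```ruby
# Hypothetical helper: turn "http://:username.myopenid.com" into a Regexp
# whose :username placeholder becomes a named capture group.
def compile_route(route)
  # Regexp.escape leaves ':' and word characters untouched, so the
  # placeholder survives escaping and can be replaced afterwards.
  pattern = Regexp.escape(route).gsub(/:(\w+)/) { "(?<#{$1}>\\w+)" }
  # Anchor the pattern and tolerate one optional trailing slash.
  Regexp.new('\A' + pattern + '/?\z')
end

ROUTES = [
  "http://:username.myopenid.com",
  "http://claimid.com/:username"
].map { |route| compile_route(route) }

# Returns the captured username from the first matching route, or nil.
def find_username(url)
  ROUTES.each do |rx|
    m = rx.match(url)
    return m[:username] if m
  end
  nil
end
```

Calling find_username("http://claimid.com/myusername") would then go through the compiled routes in order and return the :username capture of the first one that matches.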

Charlie Martin
In this case I can't use plain old include?. I have the input "http://claimid.com/myusername" and from that I need the output "myusername". The problem is that the input could be something else, like "http://myusername.blogspot.com", and I'd still want the output "myusername". Basically I'm finding the username part of an OpenID URL, but the OpenID URL could be anything, and a match might not be found. It sounds like the "third thought" is what I'm already doing in the bottom example? It runs through all potential patterns, gets the "username" part of each, clears the nils and returns the first one.
AdamFortuna
+1  A: 

I still think regular expressions can be a solution here. However, you need to write code that creates a regexp out of a routing-like string. Some example code:

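class Router
  def initialize(routing_word)
    # Placeholder names (":username", ...) in order of appearance.
    @routes = routing_word.scan(/:\w+/)
    # Escape the literal text (Regexp.escape leaves ':' and '/' alone),
    # then turn each placeholder into a capture group.
    pattern = Regexp.escape(routing_word).gsub(/:\w+/) { '(\w+)' }
    # Treat a trailing slash in the route as optional.
    pattern = pattern.sub(%r{/\z}, '/?')
    @regex = Regexp.new('\A' + pattern + '\z')
  end

  # Returns a hash of placeholder => matched text, or nil if the URL does not match.
  def match(url)
    matches = url.match(@regex)
    return nil unless matches
    Hash[@routes.zip(matches.captures)]
  end
end

r = Router.new('|:as|:sa')
puts r.match('|a|b').map {|k,v| "#{k} => #{v}"}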

Use one Router for each routing string. Its match method should return a hash that maps each colon-prefixed placeholder to the actual URL component.

To recognize a given URL, go through all the Routers and find the one that accepts it.

class OpenIDRoutes
  def initialize
    # Keep the routers in an instance variable so that #route can see them.
    @routes = [
      "http://openid.aol.com/:username/",
      "http://:username.myopenid.com/",
      "http://:username.livejournal.com/",
      "http://flickr.com/photos/:username/",
      "http://technorati.com/people/technorati/:username/",
      "http://:username.wordpress.com/",
      "http://:username.blogspot.com/",
      "http://:username.pip.verisignlabs.com/",
      "http://:username.myvidoop.com/",
      "http://:username.pip.verisignlabs.com/",
      "http://claimid.com/:username/"
    ].map {|x| Router.new(x)}
  end

  # Given a URL, return the match from the first route that fits, or nil.
  def route(url)
    @routes.each do |r|
      res = r.match(url)
      return res if res
    end
    nil
  end
end

r = OpenIDRoutes.new
puts r.route("http://claimid.com/myusername")

I think that's a nice and easy implementation of the core of Rails routing.

Elazar Leibovich
+4  A: 

I think your best bet is to generate the regular expressions, as Elazar suggested previously. If you're just matching one field (:username) then something like this would work:

PROVIDERS = [
   "http://openid.aol.com/:username/",
   "http://:username.myopenid.com/",
   "http://:username.livejournal.com/",
   "http://flickr.com/photos/:username/",
   "http://technorati.com/people/technorati/:username/",
   "http://:username.wordpress.com/",
   "http://:username.blogspot.com/",
   "http://:username.pip.verisignlabs.com/",
   "http://:username.myvidoop.com/",
   "http://:username.pip.verisignlabs.com/",
   "http://claimid.com/:username/"
]

MATCHERS = PROVIDERS.collect do |provider|
  parts = provider.split(":username")
  Regexp.new(Regexp.escape(parts[0]) + '(.*)' + Regexp.escape(parts[1] || ""))
end

def extract_username(url)
  MATCHERS.collect {|rx| url[rx, 1]}.compact.first
end

It's very similar to your own code, only the list of providers is much cleaner, making it easier to maintain and add new providers as required.
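
One caveat, offered as a sketch rather than a correction: each provider string above ends in a slash, and that slash ends up in the generated regexp, so as far as I can tell a URL without a trailing slash (such as "http://claimid.com/myusername") would not match unless it is normalized first. The variant below assumes you want the trailing slash to be optional and the username limited to characters other than "/" and ".". The names build_matcher, SAMPLE_PROVIDERS, SAMPLE_MATCHERS and username_from are hypothetical.

```ruby
# Hypothetical variant: anchor the pattern, make a trailing slash optional,
# and capture only characters that cannot cross a '/' or '.' boundary.
def build_matcher(provider)
  # Drop the provider's own trailing slash, then split around the placeholder.
  prefix, suffix = provider.chomp("/").split(":username", 2)
  Regexp.new('\A' + Regexp.escape(prefix) + '([^/.]+)' +
             Regexp.escape(suffix.to_s) + '/?\z')
end

SAMPLE_PROVIDERS = [
  "http://:username.blogspot.com/",
  "http://claimid.com/:username/"
]
SAMPLE_MATCHERS = SAMPLE_PROVIDERS.map { |provider| build_matcher(provider) }

# Returns the first captured username, or nil when no provider matches.
def username_from(url)
  SAMPLE_MATCHERS.each do |rx|
    name = url[rx, 1]
    return name if name
  end
  nil
end
```

The provider list stays as readable as before; only the way each string is compiled changes. Whether "[^/.]+" is the right character set for usernames depends on the providers, so treat it as an assumption.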

tomafro
Awesome, works great! Lot less code and easier to read.
AdamFortuna
+1  A: 

It's a little URI specific, but the standard library has URI.split():

require 'uri'

URI.split("http://claimid.com/myusername")[5] # => "/myusername"

Might be able to use that somehow.
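
For instance, a rough sketch of how the parsed pieces could be used, assuming each provider keeps the username either in the leading part of the host name or in the last path segment. The HOST_RULES table and the username_for method are hypothetical names made up for illustration, and the table only covers a few of the providers from the question.

```ruby
require 'uri'

# Hypothetical table: host (or host suffix) => where the username lives.
HOST_RULES = {
  "claimid.com"      => :path,       # http://claimid.com/<username>
  "flickr.com"       => :path,       # http://flickr.com/photos/<username>
  ".myopenid.com"    => :subdomain,  # http://<username>.myopenid.com
  ".livejournal.com" => :subdomain   # http://<username>.livejournal.com
}

# Returns the username for a known provider, or nil otherwise.
# Note: URI.parse raises URI::InvalidURIError on malformed input.
def username_for(url)
  uri = URI.parse(url)
  host = uri.host.to_s
  HOST_RULES.each do |suffix, where|
    if where == :subdomain && host.end_with?(suffix)
      # Everything before the provider suffix is taken as the username.
      return host.chomp(suffix)
    elsif where == :path && host == suffix
      # Take the last non-empty path segment.
      return uri.path.split("/").reject(&:empty?).last
    end
  end
  nil
end
```

This trades the pattern strings for a lookup table, which may or may not be easier to maintain than generating regexps.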

C.J.

CJ