Does anyone know the rules for valid Ruby variable names? Can it be matched using a RegEx?

UPDATE: This is what I could come up with so far:

^[_a-z][a-zA-Z0-9_]+$

Does this seem right?

+1  A: 

Identifiers are pretty straightforward. They begin with a letter or an underscore and contain letters, underscores, and digits. Local variables can't begin with an uppercase letter (that would make the name a constant), so you could just use a regex like this:

/^[a-z_][a-zA-Z_0-9]*$/
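
For validating user input, a slightly stricter variant may be safer. The following is only a sketch under a few assumptions, not part of the original answer: it uses `\A` and `\z` instead of `^` and `$` (in Ruby, `^` and `$` match at line boundaries, so a multi-line string could slip through), it rejects reserved words, and it relies on `Regexp#match?` (Ruby 2.4 or later). The names `LOCAL_VAR`, `KEYWORDS`, and `valid_local_name?` are hypothetical.

```ruby
# Hypothetical helper; the names here are illustrative only.
LOCAL_VAR = /\A[a-z_][a-zA-Z0-9_]*\z/

# Reserved words that fit the pattern but cannot be used as variable names.
KEYWORDS = %w[
  alias and begin break case class def do else elsif end ensure false for
  if in module next nil not or redo rescue retry return self super then
  true undef unless until when while yield __FILE__ __LINE__ __ENCODING__
].freeze

def valid_local_name?(name)
  LOCAL_VAR.match?(name) && !KEYWORDS.include?(name)
end

valid_local_name?("foo_1")    # => true
valid_local_name?("_")        # => true
valid_local_name?("Foo")      # => false (would be a constant)
valid_local_name?("end")      # => false (reserved word)
valid_local_name?("foo\nbar") # => false; the ^...$ version accepts this
```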
AboutRuby
You forgot about ? and !
DigitalRoss
@DigitalRoss: Local variables cannot contain `?` or `!` to my knowledge.
Chuck
The `+` should be a `*`, because `a` and `_` are valid variable names.
Chuck
Yup: `!` and `?` are valid in method names but not in local vars AFAIK.
kolrie
@Chuck: correct - `+` should be `*`. I am adding that to my regex. Thanks!
kolrie
@Chuck: Good point, I'll fix that.
AboutRuby
Aren't Unicode letters allowed in variable names?
Andrew Grimm
I don't think they are in Ruby, but I haven't tried that either.
AboutRuby
Here it is: [Fun with Unicode](http://www.oreillynet.com/ruby/blog/2007/10/fun_with_unicode_1.html), though it uses examples of method names.
Andrew Grimm
I see. They can be, if you pass `-Ku` to switch to Unicode encoding.
AboutRuby
+1  A: 

I think /^(\$){0,1}[_a-zA-Z][a-zA-Z0-9_]*([?!]){0,1}$/ is a bit closer to what you will need...

It depends on whether you want to match method names as well.

If you are trying to match a name that might be encountered in an expression, then it might start with $ and it might end with ? or !. If you know for sure that it is just a local variable then the rule will be much simpler.
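
As a rough illustration of what the pattern above accepts, here is a sketch that keeps the same character rules but writes the optional parts with `?` and anchors with `\A`/`\z`. The constant name `NAME_IN_EXPR` is hypothetical, and `Regexp#match?` assumes Ruby 2.4 or later.

```ruby
# Hypothetical constant: optional leading $, an identifier body,
# and an optional trailing ? or ! (as used by method names).
NAME_IN_EXPR = /\A\$?[_a-zA-Z][a-zA-Z0-9_]*[?!]?\z/

%w[foo $stdout empty? save! 1abc foo?!].each do |name|
  puts "#{name}: #{NAME_IN_EXPR.match?(name)}"
end
# foo: true
# $stdout: true
# empty?: true
# save!: true
# 1abc: false
# foo?!: false
```

Note that the pattern is permissive: it also accepts combinations such as `$foo?`, which is not a valid global variable name.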

DigitalRoss
Local variables cannot begin with $, and matching 0 or 1 $'s is easier said as \$?. The same with [?!]? instead of ([?!]){0,1}, and variable names can't end in them anyway.
AboutRuby
+1  A: 

According to http://rubylearning.com/satishtalim/ruby_names.html, a Ruby name is formed as follows:

A name is an uppercase letter, lowercase letter, or an underscore ("_"), followed by Name characters (this is any combination of upper- and lowercase letters, underscore and digits).

In addition, global variables begin with a dollar sign, instance variables with a single at-sign, and class variables with two at-signs.

A regular expression to match all that would be:

%r{
  \A            # start of string
  (\$|@{1,2})?  # optional leading sigil: $, @, or @@
  [A-Za-z_]     # exactly one uppercase letter, lowercase letter, or underscore
  [A-Za-z0-9_]* # optional further characters (including digits)
  \z            # end of string
}x
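
To tell the different kinds of name apart, the same rules can be combined with a named capture for the sigil. This is a minimal sketch, not a definitive implementation; `VARIABLE` and `variable_kind` are hypothetical names, and `String#match?` assumes Ruby 2.4 or later.

```ruby
# Hypothetical names; anchored so the whole string must match.
VARIABLE = /\A(?<sigil>\$|@{1,2})?[A-Za-z_][A-Za-z0-9_]*\z/

def variable_kind(name)
  m = VARIABLE.match(name)
  return nil unless m

  case m[:sigil]
  when "$"  then :global
  when "@"  then :instance
  when "@@" then :class
  else
    # No sigil: an uppercase first letter makes the name a constant.
    name.match?(/\A[A-Z]/) ? :constant : :local
  end
end

variable_kind("$stdout") # => :global
variable_kind("@name")   # => :instance
variable_kind("@@count") # => :class
variable_kind("total")   # => :local
variable_kind("Total")   # => :constant
variable_kind("2fast")   # => nil
```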

Hope that helps.

rjk
Thanks a lot for this. I am accepting the first one, because I am looking for local variables only (no names, ivars or classvars).
kolrie
+1  A: 
Jörg W Mittag
+1  A: 

Variable names can also contain Unicode characters, in which case most of the existing regexes don't match.

varname = "\u2211" # => "∑" 
eval(varname + '= "Tony the Pony"') # => "Tony the Pony"
puts varname # => ∑
local_variable_identifier = /Insert large regular expression here/
varname =~ local_variable_identifier # => nil

See also "Fun with Unicode" in either the Ruby 1.9 Pickaxe or at [Fun with Unicode](http://www.oreillynet.com/ruby/blog/2007/10/fun_with_unicode_1.html).
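
If Unicode names do need to be accepted, one possible approach is to treat every non-ASCII character as an identifier character. The sketch below does that; it rests on an assumption about how Ruby's lexer treats non-ASCII characters, and the constant name `UNICODE_LOCAL` is hypothetical.

```ruby
# Hypothetical pattern: ASCII rules as before, plus any non-ASCII
# character allowed in every position (an assumption, see above).
UNICODE_LOCAL = /\A(?:[a-z_]|[^\x00-\x7F])(?:[a-zA-Z0-9_]|[^\x00-\x7F])*\z/

UNICODE_LOCAL.match?("\u2211")       # => true  ("∑")
UNICODE_LOCAL.match?("total_\u2211") # => true
UNICODE_LOCAL.match?("Total")        # => false
```

As far as I know, newer Ruby versions (2.6 and later) treat a name that starts with a non-ASCII uppercase letter as a constant, which this pattern does not exclude.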

Andrew Grimm
Thanks for bringing this to my attention. However, my goal is to make sure user input matches a variable name, so I am fine with a safe subset of the actual naming rules.
kolrie