First of all, in the example data you listed, there don't appear to be any lines that contain both "CLOWN" and "112". For the rest of this answer I'll assume the course number you are interested in is "110".
This line appears to be your problem:
line.scan~(/department/&&/classnumber/)
A useful debugging technique is to reduce your problem to a small test case. In Ruby and other scripting languages, it can help to experiment with that test case in an interactive shell like irb. Let's try that in irb, with some mock-up data so our variables are defined:
>> department = "CLOWN"
=> "CLOWN"
>> classnumber = "110"
=> "110"
>> line = "342 1936 CLOWN 110 ON HD CLOWN MAKE-CLASS 5.0 5.0 KRUSTY 798 MTWTh 7:30A 8:30A 24 13 11 4.3"
=> "342 1936 CLOWN 110 ON HD CLOWN MAKE-CLASS 5.0 5.0 KRUSTY 798 MTWTh 7:30A 8:30A 24 13 11 4.3"
>> line.scan~(/department/&&/classnumber/)
TypeError: wrong argument type nil (expected Regexp)
from (irb):4:in `scan'
from (irb):4
from :0
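The nil in that error comes from Regexp#~, which matches a regexp against $_ (the last line read by gets) and returns nil when $_ is nil, as it is in a fresh irb session. Here is a minimal sketch that reproduces the error. It assumes $_ is nil, and the variable name argument is just for illustration:

```ruby
# Regexp#~ is equivalent to `regexp =~ $_`; with $_ nil it returns nil.
$_ = nil

line = "342 1936 CLOWN 110 ON HD CLOWN MAKE-CLASS 5.0 5.0 KRUSTY 798 MTWTh 7:30A 8:30A 24 13 11 4.3"

# line.scan~(x) is parsed as line.scan(~(x)), so this is what scan receives.
argument = ~(/department/ && /classnumber/)
p argument # => nil

begin
  line.scan(argument)
rescue TypeError => e
  # scan only accepts a Regexp or a String, so nil raises a TypeError.
  puts "TypeError: #{e.message}"
end
```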
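OK, so there are a few problems. The first is that scan~ does not call the method you meant. Ruby treats the ~ as a separate unary operator applied to the argument, and that is where the nil comes from. The method is just scan: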
>> line.scan(/department/&&/classnumber/)
=> []
Hmm. Not an error this time, but still no result. Let's see what the parts of that expression are doing. This line computes /department/&&/classnumber/, and then passes the result to the scan method on the line string.
>> /department/&&/classnumber/
=> /classnumber/
Interesting. That just gives us the second regular expression we passed in. Why? The && operator takes two expressions and evaluates the first one. If the first is false or nil, && returns that value without evaluating the second. Otherwise, it evaluates the second expression and returns its value, whatever that is. In Ruby, every value except false and nil is treated as true. Neither regular expression is false or nil, so the first one counts as true, and the result of the whole expression is the second one, /classnumber/.
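In Ruby, a && b returns a if a is nil or false, and returns b otherwise. A quick check of that rule using the same two regexps plus a few other values:

```ruby
# a && b evaluates a; if a is falsy (nil or false) it returns a as-is,
# otherwise it evaluates and returns b.
combined = /department/ && /classnumber/
p combined                 # => /classnumber/

p(nil && /classnumber/)    # => nil   (first operand is falsy, returned unchanged)
p(false && /classnumber/)  # => false
p(0 && "zero is truthy")   # => "zero is truthy" (0 is not falsy in Ruby)
```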
But even given that the first regular expression is being ignored, and only the second is being used, why doesn't this work?
>> line.scan(/classnumber/)
=> []
When you write the regular expression /classnumber/, you are looking for the literal characters classnumber in your string. For instance:
>> "string containing classnumber".scan(/classnumber/)
=> ["classnumber"]
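Calling source on a regexp shows exactly what it will search for. The sketch below compares the literal /classnumber/ with a regexp built from the variable's value using Regexp.new, which is one alternative to the approaches below:

```ruby
classnumber = "110"
line = "342 1936 CLOWN 110 ON HD CLOWN MAKE-CLASS 5.0 5.0 KRUSTY 798 MTWTh 7:30A 8:30A 24 13 11 4.3"

literal  = /classnumber/            # the literal characters "classnumber"
from_var = Regexp.new(classnumber)  # built from the variable's value, "110"

p literal.source       # => "classnumber"
p from_var.source      # => "110"
p line.scan(literal)   # => []
p line.scan(from_var)  # => ["110"]
```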
What you want to be looking for, however, is the value of the variable classnumber. There are a couple of ways to go about this. You could just pass that string to scan:
>> line.scan(classnumber)
=> ["110"]
Or, you can build a regular expression by interpolating your classnumber variable into it:
>> line.scan(/#{classnumber}/)
=> ["110"]
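One caution about interpolation: characters such as +, . and ( are regex syntax, so a value containing them would not be matched literally. Regexp.escape quotes them first. The department "C++" and the line below are hypothetical, made up only to show the effect. Passing a plain String to scan avoids the issue entirely, because a String pattern is always matched literally:

```ruby
# Hypothetical department name and input line containing regex metacharacters.
department = "C++"
line = "101 2001 C++ 110 ON HD"

escaped = /#{Regexp.escape(department)}/
p escaped.source         # => "C\\+\\+"  (the + signs are now literal)
p line.scan(escaped)     # => ["C++"]
p line.scan(department)  # => ["C++"]   (String patterns are matched literally)
```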
Now, you have something working. But you still want to match against the department too. How can you combine the two? You could just interpolate them into the same regexp:
>> line.scan(/#{department} #{classnumber}/)
=> ["CLOWN 110"]
Note that I added a space in the middle to match the space between the department and course number in the input. Depending on your data format, you may want this to be /#{department} +#{classnumber}/ to indicate “one or more spaces,” or /#{department}.*#{classnumber}/ to indicate “any number of any character”; you'll have to make that call yourself.
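To see how the three variants differ, the sketch below runs each one against a few made-up sample strings. The samples are hypothetical and not taken from the question's data:

```ruby
department  = "CLOWN"
classnumber = "110"

patterns = {
  "single space" => /#{department} #{classnumber}/,
  "one or more"  => /#{department} +#{classnumber}/,
  "anything"     => /#{department}.*#{classnumber}/,
}

# Hypothetical inputs chosen to show where the patterns diverge.
samples = ["CLOWN 110", "CLOWN   110", "CLOWN MAKE-CLASS 110"]

patterns.each do |name, regexp|
  matched = samples.select { |s| regexp.match?(s) }
  puts "#{name}: #{matched.inspect}"
end
```

None of these patterns stops "110" from matching the start of a longer number such as "1100". Adding word boundaries, as in /\b#{department} +#{classnumber}\b/, may help, depending on your data.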
Oh, and if you want to get the whole line, you'll need to add something to match the text before and after the department and class number:
>> line.scan(/.*#{department} #{classnumber}.*/)
=> ["342 1936 CLOWN 110 ON HD CLOWN MAKE-CLASS 5.0 5.0 KRUSTY 798 MTWTh 7:30A 8:30A 24 13 11 4.3"]
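If your data holds many lines in a single string, here are two ways to pull out the whole matching lines. In this sketch only the first data line comes from the question; the second is hypothetical. Enumerable#grep selects whole lines without needing .*. The scan version also works per line, because . does not match a newline unless you add the /m flag:

```ruby
department  = "CLOWN"
classnumber = "110"

# Hypothetical multi-line input; the second line is made up.
data = <<~DATA
  342 1936 CLOWN 110 ON HD CLOWN MAKE-CLASS 5.0 5.0 KRUSTY 798 MTWTh 7:30A 8:30A 24 13 11 4.3
  343 1937 MIME 110 ON HD MIME SILENT-CLASS 3.0 3.0 MARCEL 101 TTh 9:00A 10:00A 20 10 10 2.0
DATA

pattern = /#{department} +#{classnumber}/

# Option 1: select whole lines, then strip the trailing newline.
p data.each_line.grep(pattern).map(&:chomp)

# Option 2: scan with .* on each side; each match stops at the end of its line.
p data.scan(/.*#{department} +#{classnumber}.*/)
```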
Anyhow, I think that's about it. You can now match against the department and class number that have been input; and if you followed the steps I used to deconstruct your problem, you might be able to use a similar technique to isolate and solve problems in the future.