
This is a Ruby design problem. How can I make a reusable flat file parser that can perform different data scrubbing operations per call, return the emitted results from each scrubbing operation to the caller and perform bulk SQL insertions?

Now, before anyone gets narky/concerned: I have already written this code, just in a very unDRY fashion, which is why I am asking any Ruby rockstars out there for some assistance.

Basically, every time I want to perform this logic, I create two nested loops with custom processing in between, buffer each processed line to an array, and output to the DB as a bulk insert when the buffer size limit is reached.

Although I have written lots of helpers, the main pattern gets copy-pasted every time. Not very DRY!

Here is a Ruby/pseudocode example of what I am repeating.

lines_from_file.each do |line|

  line.scan(/some regex/).each do |sub_str|  # scan, since a line may match more than once
    # Process substring into useful format
    #   EG1: Simple gsub() call
    #   EG2: Custom function call to do complex scrubbing
    #        and matching, emitting results to array
    #   EG3: Loop to match opening/closing/nested brackets
    #        or other delimiters and emit results to array
  end

  # Add processed lines to a buffer as SQL insert statement
  @buffer << PREPARED INSERT STATEMENT

  # Flush buffer when "buffer size limit reached" or "end of file"
  if sql_buffer_full || last_line_reached
    @dbc.insert(SQL INSERTS FROM BUFFER)
    @buffer = []  # reset the buffer instead of nilling it, so << keeps working
  end

end

I am familiar with Proc/lambda functions. However, because I want to pass two separate procs to the same method, I am not sure how to proceed. I have some idea of how to solve this, but I would really like to see what the real Rubyists suggest.

Over to you. Thanks in advance :D

+2  A: 

You can pass any number of proc objects into a method, but they become normal parameters, and lose the special block syntax. E.g.

def import(lines_from_file, insert_statement_maker, special_logic = nil)
  lines_from_file.each do |line|
    inserts = []

    line.scan(/some regex/).each do |sub_str|
      inserts << insert_statement_maker.call(sub_str)
    end

    inserts = special_logic.call(inserts) if special_logic

    # Add processed lines to a buffer as SQL insert statements
    @buffer.concat(inserts)

    # Flush buffer when "buffer size limit reached" or "end of file"
    if sql_buffer_full || last_line_reached
      @dbc.insert(SQL INSERTS FROM BUFFER)
      @buffer = []
    end
  end
end

To call it, you would...

build_m_and_m_insert = Proc.new {|sub_str| ..... }
take_out_the_brown_ones = Proc.new {|inserts| .... }

import lines, build_m_and_m_insert, take_out_the_brown_ones
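
One of the two callbacks can still be supplied with the ordinary block syntax by capturing it with an ampersand; here is a minimal sketch of that variant (the reshuffled signature is only illustrative, and the buffering works exactly as above):

def import(lines_from_file, special_logic = nil, &insert_statement_maker)
  lines_from_file.each do |line|
    # Build one insert per matched substring via the block
    inserts = line.scan(/some regex/).map { |sub_str| insert_statement_maker.call(sub_str) }
    inserts = special_logic.call(inserts) if special_logic
    # ... buffer and flush as above ...
  end
end

import(lines, take_out_the_brown_ones) { |sub_str| "INSERT ..." }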
jeem
+1  A: 

While jeem's solution is probably perfect for you, you can go even a bit further and create two classes, a reader and a writer, instead of the procs:

class InsertReader
  def initialize(lines)
    @lines = lines
  end

  def each
    @lines.each do |line|
      yield(make_insert(line))
    end
  end

  def make_insert line
    # return INSERT created for input line
  end
end

class InsertWriter
  def initialize
    @buff = []
    @buffsize = 100
  end

  def write(insert)
    @buff << insert
    flush if @buff.length >= @buffsize  # flush once the batch size is reached
  end

  def flush
    @buff.each do |insert|
      DbAdapter.insert(insert)
    end
    @buff = []
  end
end

and you use them like this:

reader = InsertReader.new(my_file)
writer = InsertWriter.new
reader.each do |insert|
  writer.write insert
end
writer.flush

You can then inherit from these classes and reimplement the appropriate methods for each specific case you need to cover.
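
For example, here is a minimal sketch of such a subclass; the regex, table and column names are made up purely for illustration:

class QuotedTokenReader < InsertReader
  # Pull the first quoted token out of each line and turn it into an INSERT.
  def make_insert(line)
    token = line[/"([^"]*)"/, 1]
    "INSERT INTO tokens (value) VALUES ('#{token}')"  # escape properly in real code
  end
end

reader = QuotedTokenReader.new(my_file)  # drops straight into the usage shown above

The same idea applies to InsertWriter: overriding flush is the natural place to replace the per-statement inserts with whatever bulk insertion mechanism your database adapter provides, which is where the bulk SQL insertions from the question would go.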

Mladen Jablanović
Thanks to Jeem and Mladen. Both of you get upvotes for a straightforward explanation. Design-wise, I preferred Mladen's approach, since it fits the needs of my project better. Nice work, Rubyistas! :D
crunchyt