As part of my Rails application, I've written a little importer that sucks in data from our LDAP system and crams it into a User table. Unfortunately, the LDAP-related code leaks huge amounts of memory while iterating over our 32K users, and I haven't been able to figure out how to fix the issue.

The problem seems to be related to the LDAP library in some way, as when I remove the calls to the LDAP stuff, memory usage stabilizes nicely. Further, the objects that are proliferating are Net::BER::BerIdentifiedString and Net::BER::BerIdentifiedArray, both part of the LDAP library.

When I run the import, memory usage eventually peaks at over 1GB. I need to find some way to correct my code if the problem is there, or to work around the LDAP memory issues if that's where the problem lies. (Or if there's a better LDAP library for large imports for Ruby, I'm open to that as well.)

Here's the pertinent bit of my code:

require 'net/ldap'
require 'pp'

class User < ActiveRecord::Base
  validates_presence_of :name, :login, :email

  # This method is responsible for populating the User table with the
  # login, name, and email of anybody who might be using the system.
  def self.import_all
    # initialization stuff. set bind_dn, bind_pass, ldap_host, base_dn and filter

    ldap = Net::LDAP.new
    ldap.host = ldap_host
    ldap.auth bind_dn, bind_pass
    ldap.bind

    begin
      # Build the list
      records = records_updated = new_records = 0
      ldap.search(:base => base_dn, :filter => filter ) do |entry|
        name = entry.givenName.to_s.strip + " " + entry.sn.to_s.strip
        login = entry.name.to_s.strip
        email = login + "@txstate.edu"
        user = User.find_or_initialize_by_login :name => name, :login => login, :email => email
        if user.name != name
          user.name = name
          user.save
          logger.info( "Updated: " + email )
          records_updated = records_updated + 1
        elsif user.new_record?
          user.save
          new_records = new_records + 1
        else
          # update timestamp so that we can delete old records later
          user.touch
        end
        records = records + 1
      end

      # delete records that haven't been updated for 7 days
      records_deleted = User.destroy_all( ["updated_at < ?", Date.today - 7 ] ).size

      logger.info( "LDAP Import Complete: " + Time.now.to_s )
      logger.info( "Total Records Processed: " + records.to_s )
      logger.info( "New Records: " + new_records.to_s )
      logger.info( "Updated Records: " + records_updated.to_s ) 
      logger.info( "Deleted Records: " + records_deleted.to_s )

    end

  end
end

Thanks in advance for any help/pointers!

By the way, I did ask about this in the net/ldap support forum as well, but didn't get any useful pointers there.

+6  A: 

One very important thing to note is that you never use the result of the method call. That means that you should pass :return_result => false to ldap.search:

ldap.search(:base => base_dn, :filter => filter, :return_result => false ) do |entry|

From the docs: "When :return_result => false, #search will return only a Boolean, to indicate whether the operation succeeded. This can improve performance with very large result sets, because the library can discard each entry from memory after your block processes it."

In other words, if you don't use this flag, all entries will be stored in memory, even if you do not need them outside the block! So, use this option.
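To make the difference concrete, here is a minimal sketch that needs no LDAP server. Note that `fake_search` is a hypothetical stand-in I made up for illustration, not net-ldap's actual implementation; it only mimics the contract quoted from the docs above (keep every entry and return them all, or keep nothing and return a Boolean). It uses modern Ruby keyword arguments rather than the hash style in the question.

```ruby
# Hypothetical stand-in for a search call. This is NOT net-ldap code; it only
# imitates the documented behaviour of the :return_result option.
def fake_search(count, return_result: true)
  kept = []
  count.times do |i|
    entry = "user#{i}"                 # a fresh object for every result
    yield entry if block_given?        # hand the entry to the caller's block
    kept << entry if return_result     # accumulating keeps every entry reachable
  end
  # With return_result false, nothing was accumulated, so each entry can be
  # garbage collected as soon as the block is done with it.
  return_result ? kept : true
end

seen = 0

# Default behaviour: the whole result set is retained and returned.
all = fake_search(1_000) { |_entry| seen += 1 }

# Streaming behaviour: only a Boolean comes back.
streamed = fake_search(1_000, return_result: false) { |_entry| seen += 1 }
```

In the first call, `all` holds a reference to every entry even though the block already processed each one, which is the pattern that lets memory grow with the size of the result set. In the second call nothing outlives the block.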

Daniel Abrahamsson
The block returns a set of integers. This is a good pointer but I doubt it's the big deal described.
bb
I rephrased the first sentence to "result of the method call" instead of "result of the block", as that is what is important. But I sincerely do think this will result in a great improvement.
Daniel Abrahamsson
Daniel, you're right. I just tested that with a query which returns ~50000 results. With :return_result => false the client stays at about 50MB of RAM while processing the results, whereas it goes up to ~600MB without this parameter.
bb
Daniel, you, sir, are the bomb. Thank you for this pointer -- it's brought down the memory usage to entirely reasonable levels. Brilliant! (And makes me feel a bit foolish for not Ring TFM better.) Official answer and an upvote to you!
Sean McMains