tags:

views:

81

answers:

4

I've got a Java application that, among other things, goes out to our Active Directory server every hour, and pulls down a list of all the accounts, and dumps them in a database; this work is done via a thread that's spawned new every hour, and the database interfacing is done via Hibernate. The thread's run method (essentially the only thing this thread does) looks like so:

public void run() {
    try {
        Thread.sleep(3600000); //we run once an hour, so we sleep for an hour
        Thread newHourlyRunThread = new Thread(new HourlyRunThread());
        newHourlyRunThread.start();
        LDAPNewUsersReport report = new LDAPNewUsersReport();
        Calendar calendar = Calendar.getInstance();
        calendar.set(0, 0, 0, 0, 0); //We tell the report to look for everything from 12AM Jan 1 0 AD, which should be sufficient to find all created AD objects.
        report.runReport(calendar.getTime(), new Date());
        HashSet<LDAPEntry> allEntries = report.getAllEntries();
        Iterator it = allEntries.iterator();
        while (it.hasNext()) {
            ContactParser.parseContact((LDAPEntry) it.next());
        }
}

The relevant methods from ContactParser are below:

public static void parseContact(LDAPEntry entry) {
    Contact chosenContact = null;
    Session session = HibernateUtil.getSessionFactory().getCurrentSession();
    session.beginTransaction();

    List contacts = session.getNamedQuery("ContactByCanonicalName").setString(0, entry.getDN()).list();
    Iterator it = contacts.iterator();
    if (it.hasNext()) {
        chosenContact = (Contact) it.next();
        chosenContact = ContactParser.fillContactFields(chosenContact, entry);
    } else {
        chosenContact = ContactParser.fillContactFields(new Contact(), entry);
    }
    session.saveOrUpdate(chosenContact);
    session.getTransaction().commit();
}

private static Contact fillContactFields(Contact chosenContact, LDAPEntry entry) {
    chosenContact.setCanonicalName(entry.getDN());
    chosenContact.setFirstName(ContactParser.getEntryField(entry, "givenName"));
    chosenContact.setLastName(ContactParser.getEntryField(entry, "sn"));
    chosenContact.setUserName(ContactParser.getEntryField(entry, "sAMAccountname"));
    chosenContact.setEmployeeID(ContactParser.getEntryField(entry, "employeeID"));
    chosenContact.setMiddleName(ContactParser.getEntryField(entry, "initials"));
    chosenContact.setEmail(ContactParser.getEntryField(entry, "mail"));
    if(chosenContact.getFirstSeen() == null){
        chosenContact.setFirstSeen(new Date());
    }
    chosenContact.setLastSeen(new Date());
    return chosenContact;
}

private static String getEntryField(LDAPEntry entry, String fieldName){
    String returnString = "";
    if(entry.getAttribute(fieldName) != null){
        returnString = entry.getAttribute(fieldName).getStringValue();
    }
    return returnString;
}

This all works very nicely if we're only running a single instance (so, no new threads are spawned after the fact), but if we run this thread more than once (IE, I speed up execution to ~30 seconds so that I can see issues), Hibernate is reporting a lack of Heap space. This doesn't seem like it's a particularly intense set of data (only about 6K entries), but I'm seeing that same error when we bump the code over to the staging error to prepare to push to production. I'm inexperienced when it comes to writing effective threads, and very inexperienced when it comes to Hibernate, so if anyone has an idea what might be exhausting our Heap space (the other major thread in this application isn't running at the same time, and takes up a few hundred kilobytes of memory total) from looking at the code, I'd greatly appreciate any suggestions.

Thanks in advance.

+1  A: 

Memory Analyzer is a free open source powerfull Java heap analyzer. I already used it several time to identify source of memory leaks. With this tool you will be able to quickly see if hibernate is the one to punish ;-)

Manuel Selva
@Manuel Selva: Thanks, I'll give that a shot.
EricBoersma
+3  A: 

You can re-write this using a ScheduledExecutorService, I suspect part of the problem is that you are creating lots of HourlyRunThread objects, when you only need one.

For example this test illustrates how to schedule a thread to run every second for 10 seconds

@Test(expected = TimeoutException.class)
public void testScheduledExecutorService() throws InterruptedException, ExecutionException, TimeoutException {
    final AtomicInteger id = new AtomicInteger();
    final ScheduledExecutorService service = Executors.newScheduledThreadPool(1);
    service.scheduleAtFixedRate(new Runnable() {
        public void run() {
            System.out.println("Thread" + id.incrementAndGet());
        }
    }, 1, 1, TimeUnit.SECONDS).get(10, TimeUnit.SECONDS);
}

This gives output you'd expect when running, where as this test creates almost 10k threads in its 10 second runtime

private static final class HourlyRunThread extends Thread {
    private static final AtomicInteger id = new AtomicInteger();
    private final int seconds;

    private HourlyRunThread(final int seconds) {
        super("Thread" + id.incrementAndGet());
        this.seconds = seconds;
    }

    public void run() {
        try {
            Thread.sleep(seconds);
            if (seconds < 10) {
                Thread newHourlyRunThread = new Thread(new HourlyRunThread(seconds));
                newHourlyRunThread.start();
            }
            // do stuff
            System.out.println(getName());
        } catch (InterruptedException e) {
        }
    }
}

@Test
public void testThreading() {
    final Thread t = new HourlyRunThread(1);
    t.start();
}
Jon Freedman
+2  A: 

It looks like you are doing batch insertions or updates, in which case you should be periodically flushing and clearing the Hibernate Session so that the Session-level cache does not fill up with more space than you have allocated.

See the chapter in the Hibernate manual about Batch Processing for advice on how to accomplish this.

In addition I'd strongly suggest finding another way to launch your tasks on a scheduled timeframe, either using a ScheduledExecutorService as suggested by Jon Freedman or using a library such as Quartz Scheduler. Sleeping the thread for 3600000 milliseconds before launching the actual thread to do the work seems like it'd be a highly problematic (and non-deterministic) way to handle this.

matt b
@matt b @jon freedman: Thanks for the suggestions on the thread launching. I had a feeling that this wasn't the best way to do it, and I'll look into moving that in a better direction than I currently am. I also really appreciate the information about batch processing for Hibernate, I'll see if that makes a difference.
EricBoersma
I'd also suggest looking into doing all of the work of the "task" running once an hour within a single transaction, rather than one transaction for each update - this may help the overall execution time, and also might be more semantically correct (that is, all of the job succeeds at once or none of it does). This depends on your desired level of isolation/atomicity though.
matt b
A: 

Thanks for the suggestions everyone, but as it turned out, the error we were receiving was actually caused by a configuration error between local testing and staging -- the database was new and the permissions weren't configured correctly to allow the staging area to speak to the created database. When run with the correct permissions, it works like a charm.

I will definitely look into setting up batch settings for Hibernate, and moving to a thread scheduler instead of my current hacked together system.

EricBoersma