tags:

views:

90

answers:

2

I have a class, the outline of which is basically listed below.

import org.apache.commons.math.stat.Frequency;
public class WebUsageLog {
    private Collection<LogLine> logLines;
    private Collection<Date> dates;

    WebUsageLog() {
        this.logLines = new ArrayList<LogLine>();
        this.dates = new ArrayList<Date>();
    }

    SortedMap<Double, String> getFrequencyOfVisitedSites() {
        SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
        Collection<String> domains = new HashSet<String>();
        Frequency freq = new Frequency();
        for (LogLine line : this.logLines) {
            freq.addValue(line.getVisitedDomain());
            domains.add(line.getVisitedDomain());
        }

        for (String domain : domains) {
            frequencyMap.put(freq.getPct(domain), domain);
        }

        return frequencyMap;
    }
}

The intention of this application is to allow our Human Resources folks to be able to view Web Usage Logs we send to them. However, I'm sure that over time, I'd like to be able to offer the option to view not only the frequency of visited sites, but also other members of LogLine (things like the frequency of assigned categories, accessed types [text/html, img/jpeg, etc...] filter verdicts, and so on). Ideally, I'd like to avoid writing individual methods for compilation of data for each of those types, and they could each end up looking nearly identical to the getFrequencyOfVisitedSites() method.

So, my question is twofold: first, can you see anywhere where this method should be improved, from a mechanical standpoint? And secondly, how would you make this method more generic, so that it might be able to handle an arbitrary set of data?

+1  A: 

I'd introduce an abstraction like "data processor" for each computation type, so you can just call individual processors for each line:

...
       void process(Collection<Processor> processors) {
            for (LogLine line : this.logLines) {
                for (Processor processor : processors) {
                    processor.process();
                }
            }

            for (Processor processor : processors) {
                processor.complete();
            }
        }
...

public interface Processor {
   public void process(LogLine line);
   public void complete();
}

public class FrequencyProcessor implements Processor {
    SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
    Collection<String> domains = new HashSet<String>();
    Frequency freq = new Frequency();

    public void process(LogLine line)
        String property = getProperty(line);
        freq.addValue(property);
        domains.add(property);
    }

    protected String getProperty(LogLine line) {
        return line.getVisitedDomain();
    }

    public void complete()
       for (String domain : domains) {
          frequencyMap.put(freq.getPct(domain), domain);
       }
    }
}

You could also change a LogLine API to be more like a Map, i.e. instead of strongly typed line.getVisitedDomain() could use line.get("VisitedDomain"), then you can write a generic FrequencyProcessor for all properties and just pass a property name in its constructor.

Eugene Kuleshov
I was probably a bit unclear in my question. All of the items would be frequency based, it's the question of how to find the frequency of those different items that I'm struggling with.Your answer as written still leaves me writing individual frequency methods for each type contained in LogLine (visitedDomain, Verdict, Category, etc).
EricBoersma
You could just pull out an AbstractProcessor that has an abstract method getProperty(LogLine line). Then where the code is calling line.getVisitedDomain() just replace it with getProperty(line).
Mike Deck
@EricBoersma I've updated suggested refactoring to accommodate your clarifications
Eugene Kuleshov
+2  A: 

This is basically the same thing as Eugene's solution, I just left all the frequency calculation stuff in the original method and use the strategy only for getting the field to work on.

If you don't like enums you could certainly do this with an interface instead.

public class WebUsageLog {
    private Collection<LogLine> logLines;
    private Collection<Date> dates;

    WebUsageLog() {
        this.logLines = new ArrayList<LogLine>();
        this.dates = new ArrayList<Date>();
    }

    SortedMap<Double, String> getFrequency(LineProperty property) {
        SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
        Collection<String> values = new HashSet<String>();
        Frequency freq = new Frequency();
        for (LogLine line : this.logLines) {
            freq.addValue(property.getValue(line));
            values.add(property.getValue(line));
        }

        for (String value : values) {
            frequencyMap.put(freq.getPct(value), value);
        }

        return frequencyMap;
    }

    public enum LineProperty {
        VISITED_DOMAIN {
            @Override
            public String getValue(LogLine line) {
                return line.getVisitedDomain();
            }
        },
        CATEGORY {
            @Override
            public String getValue(LogLine line) {
                return line.getCategory();
            }
        },
        VERDICT {
            @Override
            public String getValue(LogLine line) {
                return line.getVerdict();
            }
        };

        public abstract String getValue(LogLine line);
    }
}

Then given an instance of WebUsageLog you could call it like this:

WebUsageLog usageLog = ...
SortedMap<Double, String> visitedSiteFrequency = usageLog.getFrequency(VISITED_DOMAIN);
SortedMap<Double, String> categoryFrequency = usageLog.getFrequency(CATEGORY);
Mike Deck
Exactly what I was looking for. I knew there had to be a way to do it without explicitly building each method. Thanks a ton for your help.
EricBoersma