I need to gather some statistics in my software, and I am trying to make it fast and correct, which is not easy (for me!).
First, my code so far, with two classes: a StatsService and a StatsHarvester.
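```java
import java.util.HashMap;
import java.util.Map;

public class StatsService
{
    // key -> number of notify(key) calls since the last getStats()
    private final Map<String, Long> stats = new HashMap<String, Long>(1000);

    public void notify ( String key )
    {
        Long value = 1L;
        synchronized (stats)
        {
            if (stats.containsKey(key))
            {
                value = stats.get(key) + 1;
            }
            stats.put(key, value);
        }
    }

    // Returns the counts collected so far and resets them (copy and clear under one lock)
    public Map<String, Long> getStats ( )
    {
        Map<String, Long> copy;
        synchronized (stats)
        {
            copy = new HashMap<String, Long>(stats);
            stats.clear();
        }
        return copy;
    }
}
```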
This is my second class, a harvester which periodically collects the stats and writes them to a database.
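```java
import java.util.Map;

public class StatsHarvester implements Runnable
{
    private StatsService statsService;
    private Thread t;

    // statsService must be set before init() is called
    public void setStatsService ( StatsService statsService )
    {
        this.statsService = statsService;
    }

    public void init ( )
    {
        t = new Thread(this);
        t.start();
    }

    public synchronized void run ( )
    {
        while (true)
        {
            try
            {
                wait(5 * 60 * 1000); // 5 minutes
                collectAndSave();
            }
            catch (InterruptedException e)
            {
                // restore the interrupt flag and stop harvesting instead of ignoring it
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void collectAndSave ( )
    {
        Map<String, Long> stats = statsService.getStats();
        // do something like:
        // saveRecords(stats);
    }
}
```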
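An alternative to the hand-written `wait` loop is a `ScheduledExecutorService`, which owns the thread and the timing. A minimal sketch, assuming Java 8 or later; the class name `ScheduledStatsHarvester` and the `sink` callback (a stand-in for the database write) are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ScheduledStatsHarvester {
    private final Supplier<Map<String, Long>> source; // e.g. statsService::getStats
    private final Consumer<Map<String, Long>> sink;   // hypothetical stand-in for saveRecords(...)
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    ScheduledStatsHarvester(Supplier<Map<String, Long>> source,
                            Consumer<Map<String, Long>> sink) {
        this.source = source;
        this.sink = sink;
    }

    // Runs collectAndSave() every `period` units, the first time after one period
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(this::collectAndSave, period, period, unit);
    }

    // Cancels future runs; a run already in progress is allowed to finish
    void stop() {
        scheduler.shutdown();
    }

    private void collectAndSave() {
        try {
            sink.accept(source.get());
        } catch (RuntimeException e) {
            // If the task threw, the executor would silently suppress all later runs
            e.printStackTrace();
        }
    }
}
```

Usage would be something like `new ScheduledStatsHarvester(statsService::getStats, stats -> saveRecords(stats)).start(5, TimeUnit.MINUTES);`, where `saveRecords` is the placeholder from the original code.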
At runtime there will be about 30 concurrently running threads, each calling notify(key) about 100 times. Only one StatsHarvester calls statsService.getStats().
So I have many writers and only one reader. It would be nice to have accurate stats, but I don't mind if some records are lost under high concurrency.
The reader should run every 5 minutes, or at whatever interval is reasonable.
Writing should be as fast as possible. Reading should be fast too, but if it locks for about 300 ms every 5 minutes, that's fine.
I've read a lot of documentation (Java Concurrency in Practice, Effective Java and so on), but I have a strong feeling that I need your advice to get it right.
I hope I have stated my problem clearly and briefly enough to get useful help.
EDIT
Thanks, everyone, for your detailed and helpful answers. As I expected, there is more than one way to do it.
I tested most of your proposals (the ones I understood) and uploaded a test project (a Maven project) to Google Code for further reference:
http://code.google.com/p/javastats/
I have tested these implementations of my StatsService:
- HashMapStatsService (HMSS)
- ConcurrentHashMapStatsService (CHMSS)
- LinkedQueueStatsService (LQSS)
- GoogleStatsService (GSS)
- ExecutorConcurrentHashMapStatsService (ECHMSS)
- ExecutorHashMapStatsService (EHMSS)
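For readers who want the idea behind a queue-based service (what LQSS appears to stand for), here is a minimal sketch, assuming Java 8 or later: writers only append the key to a lock-free queue, and the single reader does all the counting. The class name `QueueStatsService` is hypothetical, and the actual LQSS in the project may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class QueueStatsService {
    // Writers only append; no lock is taken on the write path
    private final Queue<String> events = new ConcurrentLinkedQueue<>();

    void notify(String key) {
        events.offer(key);
    }

    // The single reader drains the queue and aggregates the counts itself.
    // Keys offered while draining are either counted now or left for the next call.
    Map<String, Long> getStats() {
        Map<String, Long> counts = new HashMap<>();
        String key;
        while ((key = events.poll()) != null) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

The trade-off is memory: the queue holds one entry per notify call until the next harvest, so it grows with the event rate rather than with the number of distinct keys.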
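I tested each implementation with *x* threads, each calling notify *y* times. The column headers below are *x*,*y*; results are in ms.

| Service | 10,100 | 10,1000 | 10,5000 | 50,100 | 50,1000 | 50,5000 | 100,100 | 100,1000 | 100,5000 | Sum |
|---|---|---|---|---|---|---|---|---|---|---|
| GSS | 1 | 5 | 17 | 7 | 21 | 117 | 7 | 37 | 254 | 466 |
| ECHMSS | 1 | 6 | 21 | 5 | 32 | 132 | 8 | 54 | 249 | 508 |
| HMSS | 1 | 8 | 45 | 8 | 52 | 233 | 11 | 103 | 449 | 910 |
| EHMSS | 1 | 5 | 24 | 7 | 31 | 113 | 8 | 67 | 235 | 491 |
| CHMSS | 1 | 2 | 9 | 3 | 11 | 40 | 7 | 26 | 72 | 171 |
| LQSS | 0 | 3 | 11 | 3 | 16 | 56 | 6 | 27 | 144 | 266 |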
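A measurement of this shape (*x* threads, each calling notify *y* times, wall-clock time in ms) could look roughly like the sketch below. It assumes Java 8 or later; `NotifyBenchmark`, the start latch and the 100-key pattern are assumptions for illustration, not necessarily what the javastats project does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class NotifyBenchmark {
    // Starts `threads` workers that each call notifier.accept(key) `callsPerThread`
    // times and returns the wall-clock time in ms until all workers are done
    static long run(Consumer<String> notifier, int threads, int callsPerThread)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);       // releases all workers at once
        CountDownLatch done = new CountDownLatch(threads);  // counts finished workers
        try {
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    try {
                        start.await();
                        for (int i = 0; i < callsPerThread; i++) {
                            notifier.accept("key" + (i % 100)); // hypothetical set of 100 keys
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            long begin = System.nanoTime();
            start.countDown();
            done.await();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `NotifyBenchmark.run(key -> statsService.notify(key), 10, 100)` would correspond to the first column. Single runs like this include JIT warm-up, so repeating each measurement (or using a harness such as JMH) gives more stable numbers.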
For now, I think I will use ConcurrentHashMap, as it had the lowest total in my tests (CHMSS) and is quite easy to understand.
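A minimal sketch of a ConcurrentHashMap-based StatsService, assuming Java 8 or later (for `merge` and method references). The class name `ConcurrentStatsService` is hypothetical, and this is not necessarily the CHMSS from the project.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentStatsService {
    // key -> count since the last harvest; no explicit locking anywhere
    private final ConcurrentHashMap<String, Long> stats = new ConcurrentHashMap<>(1000);

    void notify(String key) {
        // merge() runs atomically per key: inserts 1 or adds 1 to the current count
        stats.merge(key, 1L, Long::sum);
    }

    Map<String, Long> getStats() {
        Map<String, Long> snapshot = new HashMap<>();
        // The key set iterator is weakly consistent, so removing while iterating is allowed
        for (String key : stats.keySet()) {
            // remove() atomically takes the current count; a later notify(key)
            // starts a fresh count that the next harvest will pick up
            Long count = stats.remove(key);
            if (count != null) {
                snapshot.merge(key, count, Long::sum);
            }
        }
        return snapshot;
    }
}
```

Because each count is taken with an atomic `remove(key)` instead of copy-then-clear, an increment that races with the harvest is not lost: it either ends up in the current snapshot or in the next one. The snapshot is not a single point-in-time view across all keys, which seems acceptable for periodic statistics.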
Thanks for all your input! Janning