The lowest latency you can achieve depends on many factors your code has no control over, most of them on the network side.
Now, if I were the one doing this project, I'd start by seeing what algorithms and protocols are available for clock synchronization. Once the clocks agree, each host should probably just send time-stamped packets to the server. At the server end, you can combine the packets for a given time slot from each machine somehow (for uncompressed PCM audio the usual way is to add the samples together and clip the result; a bitwise OR over the bytes would not give a meaningful mix) and send the result out again via multicast.
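To make the two steps above a bit more concrete, here is a minimal sketch in Python. It assumes an NTP-style four-timestamp exchange for estimating the clock offset, and 16-bit signed PCM samples mixed by summing with clipping. The names `estimate_offset` and `mix_slot` are made up for illustration and are not from any library.

```python
def estimate_offset(t0, t1, t2, t3):
    """NTP-style clock offset estimate from one request/response exchange.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server send time (server clock)
    t3: client receive time (client clock)
    Returns (offset, round_trip_delay); offset is how far the server
    clock is ahead of the client clock, assuming symmetric network delay.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def mix_slot(frames, frame_len):
    """Mix one time slot: sum the samples from every host, then clip.

    frames: list of sample lists, one per host, for the same time slot.
    Samples are assumed to be 16-bit signed PCM (-32768..32767).
    """
    mixed = [0] * frame_len
    for frame in frames:
        for i, sample in enumerate(frame[:frame_len]):
            mixed[i] += sample
    # Clip so the sum still fits in 16 bits.
    return [max(-32768, min(32767, s)) for s in mixed]
```

Each host would add its estimated offset to its local timestamps before sending, so the server can group packets from different machines into the same slot.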
Trouble is, even with your own code you will have problems... you don't have a way to reliably get those packets to the server in real time. UDP will drop packets, so you'll have to build in a tolerance for late arrivals and no-shows. TCP is no better in this regard. Sure, the data is guaranteed to arrive, and in order, but at what cost in time? A single lost segment holds up everything behind it until it has been retransmitted. Additionally, compressing the sound at each host, then uncompressing it at the server, doing your combining, and recompressing... all while maintaining the feel of real time sounds awfully ambitious.
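The "tolerance for late arrivals and no-shows" could look roughly like the sketch below. It is only an illustration under assumptions: fixed-length time slots numbered from 0, a fixed grace period after each slot ends, and silence substituted for any host whose packet has not arrived by then. The class `SlotBuffer` and its parameters are hypothetical names, and the caller passes in the current time so the logic stays independent of any particular clock.

```python
class SlotBuffer:
    """Collects frames per time slot and releases a slot at its deadline."""

    def __init__(self, hosts, frame_len, slot_ms, max_wait_ms):
        self.hosts = list(hosts)
        self.frame_len = frame_len
        self.slot_ms = slot_ms          # length of one time slot
        self.max_wait_ms = max_wait_ms  # grace period after a slot ends
        self.next_slot = 0              # oldest slot not yet released
        self.pending = {}               # slot number -> {host: frame}

    def add(self, host, slot, frame):
        """Store a frame; return False if its slot was already released."""
        if slot < self.next_slot:
            return False  # late arrival, dropped
        self.pending.setdefault(slot, {})[host] = frame
        return True

    def pop_ready(self, now_ms):
        """Return [(slot, frames)] for every slot whose deadline has passed.

        frames has one entry per host, in the order given to __init__;
        hosts that never showed up are filled in with silence.
        """
        ready = []
        while (self.next_slot + 1) * self.slot_ms + self.max_wait_ms <= now_ms:
            got = self.pending.pop(self.next_slot, {})
            frames = [got[h] if h in got else [0] * self.frame_len
                      for h in self.hosts]
            ready.append((self.next_slot, frames))
            self.next_slot += 1
        return ready
```

The grace period is the trade-off in a nutshell: a longer wait means fewer gaps in the mix, but every millisecond of it is added to the latency everyone hears.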
I am by NO MEANS an expert, nor do I have ANY experience doing this type of thing, but it just SOUNDS tough.