views:

109

answers:

4

Hi,

I have a list of email address which I want to distribute evenly by domain.

For example:

let the list be,

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

The output should be

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

The source list is not sorted by domain as in example, but can be sorted by domain, if that can help. What would be an efficient (single/two pass?) algorithm of doing this?

raj

A: 

My starting attempt would be a hash map of linked lists, so that once all the domain collisions were grouped, you could iterate though the linked lists one at a time.

If that makes any sense.

The following code is completely UNTESTED, and I know there is a bunch of stuff not right in the second loop, but it was faster than trying to explain further.

$sortedList = array();
$tempList
$emailList = array('[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]');

$emailCount = 0;
foreach ( $emailList as $email ) {
    list($username, $domain) = explode('@', $email);
    $tempList[$domain][] = $user;
    $emailCount++;
}

for ( $i = 0; $i < $emailCount; $i++ ) {
    $listIndex = $i % count($tempList);
    if ( !empty($tempList[$listIndex]) ) {
        $sortedList[] = $tempList[$listIndex][0];
        unset($tempList[$listIndex][0]);
    } else {
        unset$tempList[$listIndex]);
    }
}
Xenph Yan
A: 

Essentially, for each email address, put the address into linked-list for given domain. That's an O(n) operation, then creating a new, balanced list is another roughly O(n) operation by cycling through the domain lists.

That's about 2-pass complexity, as ordered.

Dan Fego
+1  A: 

Beware of answers that assume that the number of email addresses per domain are the same (or similar).

I tried to solve essentially the same problem, and it received a lot of discussion on my blog: First Article, Second Article

We didn't find a fast, optimal solution, but the fastest, close-enough solution (including Perl source code) came from a comment from Aristotle Pagaltzis.

Kudos to Aristotle.

Oddthinking
A: 

Here is my solution to this problem. Given an input like

[["A", 13], ["B", 5], ["C", 3], ["D", 1]]

The output is

AABABAACABAACABAACABAD

Ruby source of the program is:

require "pp"

def shuffle (total, num)
  ret_arr = Array.new
  intervel = total/num.to_f
  0.upto(num-1) do |i|
    val = i * intervel
    ret_arr << val.floor
  end
  return ret_arr
end

freq_table = [["A", 13], ["B", 5], ["C", 3], ["D", 1]]

pp freq_table
total = 0
freq_table.collect {|i| total += i[1] }
final_array = Array.new(total,0)
print  final_array.to_s + "\n\n"
placed = 0

freq_table.each do |i|
  placements =  shuffle(total - placed, i[1])
  placements.each do |j|
    free_index = -1
    0.upto final_array.size do |k|
      free_index += 1 if (final_array[k] == 0 || final_array[k] == i[0])
      if j == free_index
        final_array[k] = i[0]
        break
      end
    end
  end
  print "Placing #{i[1]} #{i[0]}s over #{total - placed} positions\n"
  pp placements
  print  final_array.to_s + "\n\n"
  placed += i[1]
end

The idea is to take the alphabet with highest frequency and distribute it first across an array whose size is the total number of all elements. Next distribute the alphabet with second highest frequency and distribute it across the free spaces and so on.

If you have questions let me know and I am happy to answer.

raj

Rajkumar S