First of all, one million lines is not at all a huge file.
A simple Perl script can chew through a 2.7-million-line file in 6 seconds, without having to think much about the algorithm.
In any case, hashing is the way to go and, as shown, there's no need to bother with hashing over an integer representation.
If we were talking about a really huge file, then I/O would become the bottleneck, and the choice of hashing method would matter less and less as the file grows.
Theoretically, in a language like C it would probably be faster to hash an integer than a string, but I doubt that in a language suited to this task it would make a real difference. Things like how to read the file efficiently matter much, much more.
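To see why the integer representation buys you nothing here, consider this small sketch (my own illustration, not part of the solution below): you can pack a dotted-quad IP into a 32-bit integer with `pack`/`unpack`, but Perl stringifies hash keys anyway, so the extra step just adds work.

```perl
use strict;
use warnings;

# Convert "10.0.1.1" to its 32-bit big-endian integer form.
sub ip_to_int {
    my ($ip) = @_;
    return unpack( "N", pack( "C4", split /\./, $ip ) );
}

# Either form works as a hash key; Perl turns the integer
# back into a string internally, so nothing is gained.
my %seen;
$seen{ ip_to_int($_) } = 1 for ( "10.0.1.1", "10.0.1.1", "11.1.3.3" );
print scalar( keys %seen ), " unique IPs\n";    # prints "2 unique IPs"
```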
Code
vinko@parrot:~$ more hash.pl
use strict;
use warnings;

my %ip_hash;
my %product_hash;

open my $fh, "<", "log2.txt" or die $!;
while (<$fh>) {
    chomp;    # strip the newline so the last field is clean
    my ( $timestamp, $ip, $product ) = split /;/;
    $ip_hash{$ip} = 1;            # record the IP; the value doesn't matter
    $product_hash{$product}++;    # count occurrences of each product
}
close $fh;

for my $ip ( keys %ip_hash ) {
    print "$ip\n";
}

my @pkeys = sort { $product_hash{$b} <=> $product_hash{$a} } keys %product_hash;
print "Top product: $pkeys[0]\n";
Sample
vinko@parrot:~$ wc -l log2.txt
2774720 log2.txt
vinko@parrot:~$ head -1 log2.txt
1;10.0.1.1;DuctTape
vinko@parrot:~$ time perl hash.pl
11.1.3.3
11.1.3.2
10.0.2.2
10.1.2.2
11.1.2.2
10.0.2.1
11.2.3.3
10.0.1.1
Top product: DuctTape
real 0m6.295s
user 0m6.230s
sys 0m0.030s