views: 68
answers: 4

Let's say you have the following.

192.168.0.100
192.168.0.100
192.168.0.100
192.168.0.102
192.168.0.102
192.168.0.100

That should count as 3 unique hits: a run of consecutive identical IPs counts as a single hit. How would you loop through the file and count accordingly?

+1  A: 

I'm not familiar with bash scripting, but the idea would be to keep track of the previously checked IP: if previous == current, don't increment; otherwise, increment.
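A minimal Python sketch of that previous-versus-current idea, assuming one IP per line; `count_hits` is a hypothetical helper name used only for illustration:

```python
def count_hits(lines):
    """Count runs of consecutive identical lines; each run is one hit."""
    hits = 0
    prev = None  # nothing seen yet, so the first line always starts a run
    for line in lines:
        ip = line.strip()
        if ip != prev:  # the IP changed, so a new run (hit) begins
            hits += 1
            prev = ip
    return hits


sample = [
    "192.168.0.100",
    "192.168.0.100",
    "192.168.0.100",
    "192.168.0.102",
    "192.168.0.102",
    "192.168.0.100",
]
print(count_hits(sample))  # → 3
```

Only the previous value is kept, so memory use stays constant no matter how large the file is.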

Inf.S
+1: This would be a simple way to do it.
sheepsimulator
+2  A: 

I would avoid using bash for this. Use a real language like Python, awk or even Perl.

Python

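#!/usr/bin/env python
from __future__ import print_function
import fileinput

def combine(source):
    """Yield (count, line) for each run of consecutive identical lines."""
    prev = next(source, None)
    if prev is None:  # empty input: nothing to report
        return
    count = 1
    for line in source:
        if line == prev:
            count += 1
        else:
            yield count, prev
            count, prev = 1, line
    yield count, prev

# Strip newlines so a last line without "\n" still matches earlier ones.
lines = (line.rstrip("\n") for line in fileinput.input())
for count, text in combine(lines):
    print(count, text)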

Simple, and much faster than looping over the lines in bash.

Since this reads from the files named on the command line (or from stdin if none are given) and writes to stdout, you can use it as a simple command in a pipeline.

S.Lott
Elegant, but I'm not quite sure it solves the OP's problem. The above set of data should return three groups in the map, not two. My Python isn't super-great, but at first glance I'd say this would return two groups.
sheepsimulator
If you have two entries for the same IP with another IP in between, will this count things correctly? I believe you need to discriminate between the 1st, 2nd, ..., nth occurrence of the same IP, provided that they are not consecutive. Also, please specify which version of Python you are using, perhaps with a shebang at the top.
Hamish Grubijan
@Hamish Grubijan: This will work with any version that includes collections.defaultdict. That is >=2.5.
S.Lott
Corrected to handle the non-adjacency issue.
S.Lott
+7  A: 

If your uniq is like mine and only collapses identical lines that are adjacent, just don't sort before running uniq:

file foo.txt:

192.168.0.100
192.168.0.100
192.168.0.100
192.168.0.102
192.168.0.102
192.168.0.100

And:

$ cat foo.txt | uniq -c

edit: can I give myself a useless use of cat award?

$ uniq -c foo.txt

/edit
Output:

  3 192.168.0.100
  2 192.168.0.102
  1 192.168.0.100
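For comparison, roughly the same per-run counts can be produced in Python with `itertools.groupby`, which, like `uniq`, only groups adjacent equal items. This is a sketch; `uniq_c` is a hypothetical name, not part of any library:

```python
from itertools import groupby


def uniq_c(lines):
    """Return (count, line) pairs for each run of adjacent identical
    lines, similar to what `uniq -c` prints."""
    stripped = (line.rstrip("\n") for line in lines)
    return [(len(list(run)), key) for key, run in groupby(stripped)]


sample = [
    "192.168.0.100\n",
    "192.168.0.100\n",
    "192.168.0.100\n",
    "192.168.0.102\n",
    "192.168.0.102\n",
    "192.168.0.100\n",
]
for count, ip in uniq_c(sample):
    print(count, ip)
# 3 192.168.0.100
# 2 192.168.0.102
# 1 192.168.0.100
```

Sorting the input first would merge the two `192.168.0.100` runs, which is exactly the mistake to avoid here.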
Wrikken
This works; I had been sorting the file before counting.
luckytaxi
A: 

Similar to @Wrikken's answer, but I think you want the total count:

If your file containing the data above is foo.txt, then:

$ cat foo.txt | uniq | wc -l
3

Which is what you want, I think.
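The same total can be sketched in Python: `itertools.groupby` yields one group per run of adjacent equal lines, so counting the groups mirrors `uniq | wc -l`. The helper name `count_runs` is hypothetical, and an in-memory `io.StringIO` stands in for `foo.txt` so the example runs on its own:

```python
import io
from itertools import groupby


def count_runs(f):
    """Count runs of adjacent identical lines in a file-like object,
    like `uniq | wc -l`."""
    # Strip newlines so a final line without "\n" still matches the others.
    return sum(1 for _ in groupby(line.rstrip("\n") for line in f))


data = io.StringIO(
    "192.168.0.100\n"
    "192.168.0.100\n"
    "192.168.0.100\n"
    "192.168.0.102\n"
    "192.168.0.102\n"
    "192.168.0.100\n"
)
print(count_runs(data))  # → 3

# With a real file:
# with open("foo.txt") as f:
#     print(count_runs(f))
```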

Dean Povey