I need to read data from stdin and create an object.

The incoming data is between 5 and 10 lines long. Each line has a process number and either an IP address or a hash. For example:

pid=123 ip=192.168.0.1 - some data
pid=123 hash=ABCDEF0123 - more data
hash=ABCDEF123 - More data
ip=192.168.0.1 - even more data

I need to put this data into a class like:

class MyData():
  pid = None
  hash = None
  ip = None
  lines = []

I need to be able to look up the object by IP, HASH, or PID.

The tough part is that there are multiple streams of data intermixed coming from stdin. (There could be hundreds or thousands of processes writing data at the same time.)

I have regular expressions pulling out the PID, IP, and HASH that I need, but how can I access the object by any of those values?

My thought was to do something like this:

myarray = {}

for each line in sys.stdin.readlines():
  if pid and ip:  #If we can get a PID out of the line
     myarray[pid] = MyData().pid = pid #Create a new MyData object, assign the PID, and stick it in myarray accessible by PID.
     myarray[pid].ip = ip #Add the IP address to the new object
     myarray[pid].lines.append(data) #Append the data

     myarray[ip] = myarray[pid] #Take the object by PID and create a key from the IP.
  <snip>do something similar for pid and hash, hash and ip, etc...</snip>

This gives me an array with two keys (a PID and an IP) and they both point to the same object. But on the next iteration of the loop, if I find (for example) an IP and HASH and do:

myarray[hash] = myarray[ip]

The following is False:

myarray[hash] == myarray[ip]

Hopefully that was clear. I hate to admit that waaay back in the VB days, I remember being able to handle objects byref instead of byval. Is there something similar in Python? Or am I just approaching this wrong?

A: 

Python only has references.

Create the object once, and add it to all relevant keys at once.

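import sys

class MyData(object):
  def __init__(self, pid, ip, hash):
    self.pid = pid
    ...

mydict = {}

for line in sys.stdin:
  # process() stands in for the regex extraction; it returns None for any field the line lacks
  pid, ip, hash = process(line)
  obj = MyData(pid=pid, ip=ip, hash=hash)
  # Register the same object under every key that was found on this line
  if pid:
    mydict[pid] = obj
  if ip:
    mydict[ip] = obj
  if hash:
    mydict[hash] = obj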
Ignacio Vazquez-Abrams
I can't add it all at once. Each time I go through the loop, I get a different combination of pid, ip, and hash. Every time through the loop, I have to figure out which combination I have and look through the array for the object. Then I associate it with the newly found pid, ip, or hash. For example, the first line may be just a pid. The second line might be a pid and hash. The third line might be hash and ip. It appears that Python is creating copies of MyData when I do mydict[ip] = mydict[hash], for example. If I then do mydict[hash].blah = 1, then mydict[ip] isn't changed.
Aaron C. de Bruyn
That's why `process()` returns `None` for the bits it doesn't have. And no, it doesn't make copies.
Ignacio Vazquez-Abrams
A: 

Make two separate dicts (and don't call them arrays!), byip and byhash -- why do you need to smoosh everything together and risk conflicts?!
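A minimal sketch of the separate-dictionary idea, extended with a third index for the PID. The regular expressions, the `add_line` helper, and the dictionary names are assumptions for illustration; the question's own patterns are not shown. It also assumes a line never links two records that were already created separately (no merging is attempted).

```python
import re

class MyData:
    def __init__(self):
        self.pid = None
        self.hash = None
        self.ip = None
        self.lines = []  # created per instance, so objects do not share one list

# One index per kind of key, so a PID can never collide with a hash or an IP.
by_pid, by_ip, by_hash = {}, {}, {}

# Hypothetical patterns standing in for the question's regular expressions.
PID_RE = re.compile(r"\bpid=(\d+)")
IP_RE = re.compile(r"\bip=(\d+(?:\.\d+){3})")
HASH_RE = re.compile(r"\bhash=([0-9A-Fa-f]+)")

def _find(pattern, line):
    match = pattern.search(line)
    return match.group(1) if match else None

def add_line(line):
    pid = _find(PID_RE, line)
    ip = _find(IP_RE, line)
    hash_ = _find(HASH_RE, line)
    # Reuse the record already known under any key on this line, else start a new one.
    obj = by_pid.get(pid) or by_ip.get(ip) or by_hash.get(hash_)
    if obj is None:
        obj = MyData()
    # Register the same object (not a copy) under every key seen on this line.
    if pid is not None:
        obj.pid = pid
        by_pid[pid] = obj
    if ip is not None:
        obj.ip = ip
        by_ip[ip] = obj
    if hash_ is not None:
        obj.hash = hash_
        by_hash[hash_] = obj
    obj.lines.append(line)
    return obj
```

With this layout a line carrying only a hash still lands on the record first created from a pid and ip line, provided some earlier line tied that hash to the pid or ip.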

BTW, you can't possibly have the following two lines back to back:

myarray[hash] = myarray[ip]
assert not(myarray[hash] == myarray[ip])

To make the assert hold you must be doing something else in between (perturbing the misnamed myarray).
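One candidate for that "something else" is the chained assignment in the question's own snippet. This is a guess, not something confirmed in the thread. The sketch below uses a stripped-down `MyData` and made-up values to show what that line stores, and what happens when the object is created first.

```python
class MyData:
    def __init__(self):
        self.pid = None
        self.ip = None

myarray = {}
pid, ip = "123", "192.168.0.1"

# Chained assignment binds every target to the right-most value, so this
# stores the string pid in the dict and sets .pid on a throwaway object.
myarray[pid] = MyData().pid = pid
stored_is_string = isinstance(myarray[pid], str)

# Creating the object first and then registering it keeps one shared object.
obj = MyData()
obj.pid, obj.ip = pid, ip
myarray[pid] = obj
myarray[ip] = myarray[pid]
same_object = myarray[ip] is myarray[pid]
```

Here `stored_is_string` and `same_object` are both true: the first form never puts a `MyData` instance in the dictionary, while the second leaves both keys pointing at one object.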

BTW squared, assignments in Python are always by reference to an object -- if you want a copy you must explicitly ask for one.
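A short sketch of the difference, using a cut-down `MyData` and placeholder keys chosen only for illustration: plain assignment adds a second name for the same object, while the standard `copy` module produces a separate one.

```python
import copy

class MyData:
    def __init__(self):
        self.lines = []

mydict = {}
mydict["ip"] = MyData()
mydict["hash"] = mydict["ip"]         # second reference to the same object
mydict["hash"].lines.append("x")      # visible through both keys

shallow = copy.copy(mydict["ip"])     # new MyData, but it shares the inner list
deep = copy.deepcopy(mydict["ip"])    # new MyData with its own list
mydict["ip"].lines.append("y")

print(mydict["hash"] is mydict["ip"])  # True
print(shallow is mydict["ip"])         # False
print(shallow.lines)                   # ['x', 'y']
print(deep.lines)                      # ['x']
```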

Alex Martelli