views:

507

answers:

5

Could you guys please tell me how I can make the following code more pythonic?

The code is correct. Full disclosure - it's problem 1b in Handout #4 of this machine learning course. I'm supposed to use newton's algorithm on the two data sets for fitting a logistic hypothesis. But they use matlab & I'm using scipy

Eg one question i have is the matrixes kept rounding to integers until I initialized one value to 0.0. Is there a better way?

Thanks

import os.path
import math
from numpy import matrix
from scipy.linalg import inv #, det, eig

x = matrix( '0.0;0;1'  )
y = 11
grad = matrix( '0.0;0;0'  )
hess = matrix('0.0,0,0;0,0,0;0,0,0')
theta = matrix( '0.0;0;0'  ) 


# run until convergence=6or7
for i in range(1, 6):
  #reset
  grad = matrix( '0.0;0;0'  )
  hess = matrix('0.0,0,0;0,0,0;0,0,0')

  xfile = open("q1x.dat", "r")
  yfile = open("q1y.dat", "r")


  #over whole set=99 items  
  for i in range(1, 100):    
    xline = xfile.readline()
    s= xline.split("  ")
    x[0] = float(s[1])
    x[1] = float(s[2])
    y = float(yfile.readline())

    hypoth = 1/ (1+ math.exp(-(theta.transpose() * x)))

    for j in range(0,3):
      grad[j] = grad[j] + (y-hypoth)* x[j]      
      for k in range(0,3):
        hess[j,k] = hess[j,k] - (hypoth *(1-hypoth)*x[j]*x[k])


  theta = theta - inv(hess)*grad #update theta after construction

  xfile.close()
  yfile.close()

print "done"
print theta
A: 

You could make use of the with statement.

Geo
except that nobody *gets* the with statement yet...
Daren Thomas
-1: in what way? Just randomly? or with something as aPurpose?
S.Lott
for file handling.
Geo
yeah, he could also make use httplib to add some spice.. ?
yairchu
that's the best you could come up with?
Geo
@Geo: it was the best I could come up with. sorry for the sarcasm :)
yairchu
+9  A: 

One obvious change is to get rid of the "for i in range(1, 100):" and just iterate over the file lines. To iterate over both files (xfile and yfile), zip them. ie replace that block with something like:

 import itertools

 for xline, yline in itertools.izip(xfile, yfile):
    s= xline.split("  ")
    x[0] = float(s[1])
    x[1] = float(s[2])
    y = float(yline)
    ...

(This is assuming the file is 100 lines, (ie. you want the whole file). If you're deliberately restricting to the first 100 lines, you could use something like:

 for i, xline, yline in itertools.izip(range(100), xfile, yfile):

However, its also inefficient to iterate over the same file 6 times - better to load it into memory in advance, and loop over it there, ie. outside your loop, have:

xfile = open("q1x.dat", "r")
yfile = open("q1y.dat", "r")
data = zip([line.split("  ")[1:3] for line in xfile], map(float, yfile))

And inside just:

for (x1,x2), y in data:
    x[0] = x1
    x[1] = x2
     ...
Brian
that should be line.split(" ")[1:3] as yairchu did it. Split two spaces, this site edits my code.
MercerKernel
Oops, you're right - I missed that. Updated.
Brian
A: 

the code that reads the files into lists could be drastically simpler

for line in open("q1x.dat", "r"):
    x = map(float,line.split("  ")[1:])
y = map(float, open("q1y.dat", "r").readlines())
Nathan
you are overriding x all the time.perhaps you intended x = [map(float, line.split(" ")[1:] for line in open("q1x.dat", "r") ??
yairchu
wait doesn't his code do that too?
Nathan
@Nathan: to clear some confusion - lets call your "y" "ys" as it has a list of all "y"s in OP's code. your "x" however could not be renamed to "xs" as it has the same values of his "x". then it would seem odd that you calculate "x" and "ys" and not "xs" and "ys". I hope I'm clear..
yairchu
+3  A: 
x = matrix([[0.],[0],[1]])
theta = matrix(zeros([3,1]))
for i in range(5):
  grad = matrix(zeros([3,1]))
  hess = matrix(zeros([3,3]))
  [xfile, yfile] = [open('q1'+a+'.dat', 'r') for a in 'xy']
  for xline, yline in zip(xfile, yfile):
    x.transpose()[0,:2] = [map(float, xline.split("  ")[1:3])]
    y = float(yline)
    hypoth = 1 / (1 + math.exp(theta.transpose() * x))
    grad += (y - hypoth) * x
    hess -= hypoth * (1 - hypoth) * x * x.transpose()
  theta += inv(hess) * grad
print "done"
print theta
yairchu
I don't know, the code works fine. When they're matrices it seems to do inner product. I got the code from here (http://www.scipy.org/SciPy_Tutorial)
MercerKernel
@MercerKernel: cool - I learned something new! my use of arrays instead of matrices forced me do use dot. with matrices you can use "*"! I fixed the code and also allowed myself to make theta = -old_theta to simplify
yairchu
hmm, your map line complains about dimensions. I had to rewrite it as x[:2] = array([xline.split(" ")[1:3]], dtype=float).transpose() Same thing on the math.exp line i think you need theta.transpose()
MercerKernel
@MercerKernel: thanks. I think I fixed it. I don't have input files to run it with so I didn't test it.. :)
yairchu
hah, ok. It's still not very pythonic, but it's much more valuable :)
MercerKernel
several things make it *more* pythonic than your post: range(5) instead of range(1,6) because "real men" enumerate from 0. iterating files to get lines. zip. "a +=" instead of "a = a +". not closing the files and letting the garbage collector do it :)
yairchu
MercerKernel
This isn't so much an improvement in its "pythonicity" but you should never, ever, *ever* use inv(mat) * vector to solve a linear system. Use solve(mat, vector) - this does less floating point ops and generally leads to less error in the result due to rounding.
dwf
@dwf: I was refactoring the original code without changing it. and that's what the original code does.
yairchu
+3  A: 

the matrixes kept rounding to integers until I initialized one value to 0.0. Is there a better way?

At the top of your code:

from __future__ import division

In Python 2.6 and earlier, integer division always returns an integer unless there is at least one floating point number within. In Python 3.0 (and in future division in 2.6), division works more how we humans might expect it to.

If you want integer division to return an integer, and you've imported from future, use a double //. That is

from __future__ import division
print 1//2 # prints 0
print 5//2 # prints 2
print 1/2  # prints 0.5
print 5/2  # prints 2.5
Thomas Weigel