I'm writing a little tool to monitor class openings at my school.

I wrote a Python script that fetches the current availability of classes from each department every few minutes.

The script was functioning properly until the uni's site started returning this:

SIS Server is not available at this time

The uni must have blocked my server, right? Well, not really, because that is the same output I get when I go to the URL directly from other PCs. But if I go through the intermediary form on the uni's site, which does a POST, I don't get that message.

The URL I'm requesting is https://s4.its.unc.edu/SISMisc/SISTalkerServlet

This is what my python code looks like:

data = urllib.urlencode({"progname" : "SIR033WA", "SUBJ" : "busi", "CRS" : "", "TERM" : "20099"})
f = urllib.urlopen("https://s4.its.unc.edu/SISMisc/SISTalkerServlet", data)
s =  f.read()
print (s)

I am really stumped! It seems like Python isn't sending a proper request. At first I thought it wasn't sending proper POST data, but I changed the URL to point at my local box and the POST data Apache received seemed just fine.

If you'd like to see the system actually functioning, go to https://s4.its.unc.edu/SISMisc/browser/student_pass_z.jsp, click the "Enter as Guest" button, and then look for "Course Availability". (Now you know why I'm building this!)

The weirdest thing is that this was working until 11am! I've had the same error before, but it only lasted for a few minutes. This tells me it is more likely a problem on their end than the uni blocking my server.

Update: As suggested, I tried sending a more legitimate Referer and User-Agent. Same result. This is what I tried:

import httplib
import urllib
headers = {
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4",
    "Content-type": "application/x-www-form-urlencoded",
    "Accept": "text/plain",
    # HTTP spells this header "Referer" (single "r"); "Referrer" is not recognized
    "Referer": "https://s4.its.unc.edu/SISMisc/SISTalkerServlet",
}
data = urllib.urlencode({"progname" : "SIR033WA", "SUBJ" : "busi", "CRS" : "", "TERM" : "20099"})
c = httplib.HTTPSConnection("s4.its.unc.edu",443)
c.request("POST", "/SISMisc/SISTalkerServlet",data,headers)
r = c.getresponse()
print r.read()
A: 

After seeing multiple requests with an odd, non-browser User-Agent string, they may have started blocking requests that were not referred from their own site. PHP, for example, exposes the referring page as $_SERVER['HTTP_REFERER'], and a server can use it to check which page sent the user to the current one. Since your program accesses the URL directly and sends no Referer header, it is quite possible they are denying you access on that basis. Try adding a Referer header to your HTTP request (note the single-"r" spelling that HTTP uses), preferably set to a page that links to the one you're trying to access, and see how it goes.

http://whatsmyuseragent.com/ can assist you in building your spoofed user agent.

You then build the headers like so:

headers = {"Content-type": "application/x-www-form-urlencoded",
           "Accept": "text/plain"}

Then pass them as an additional parameter to your HTTPConnection request:

conn.request("POST", "/page/on/site", params, headers)

See the Python documentation on httplib for further reference and examples.
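The header suggestion above can be sketched as follows. This is a minimal Python 3 version (the thread itself uses Python 2's urllib and httplib), and it only builds the request without sending it. The helper name build_post_request is hypothetical, and using the guest entry page as the Referer is an assumption; the page that actually links to the servlet may be a different one.

```python
# Python 3 sketch; the code in the thread targets Python 2.
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, fields, referer, user_agent):
    """Build (but do not send) a form POST carrying browser-like headers."""
    body = urlencode(fields).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Referer": referer,  # HTTP spells this header with a single "r"
        "User-Agent": user_agent,
    }
    return Request(url, data=body, headers=headers, method="POST")


req = build_post_request(
    "https://s4.its.unc.edu/SISMisc/SISTalkerServlet",
    {"progname": "SIR033WA", "SUBJ": "busi", "CRS": "", "TERM": "20099"},
    # Assumption: the guest entry page stands in for the real referring page.
    referer="https://s4.its.unc.edu/SISMisc/browser/student_pass_z.jsp",
    user_agent="Mozilla/5.0",
)
# urllib.request.urlopen(req) would send the request; it is left out here.
```

Inspecting req.data and req.header_items() before sending is a cheap way to confirm what the script is about to put on the wire.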

John T
No go. I'll paste the code I tried in OP since comment box won't allow > 300 chars.
The best I can suggest is to replicate what the form on the actual preceding page does. See what it posts, including any hidden values. Alternatively, you could request that page first and then follow the URL you're trying to reach from there.
John T
Thanks. Including some more hidden fields (thanks to Tamper Data) and switching from urllib to wget worked. The mystery remains, though, as to why it was working fine until the afternoon. We'll see if this was a one-time change on their end or some bigger problem.
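The hidden-field approach discussed in the comments above could be sketched like this: a small Python 3 parser that collects every hidden input from a form's HTML so the values can be merged into the POST data. The names HiddenFieldParser and hidden_fields are hypothetical, and SAMPLE_FORM is made-up markup; the thread does not say which hidden fields the real form carries.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up form markup for illustration; the real page's fields are unknown.
SAMPLE_FORM = """
<form method="post" action="/example">
  <input type="hidden" name="token" value="abc123">
  <input type="HIDDEN" name="mode" value="guest" />
  <input type="text" name="SUBJ" value="busi">
</form>
"""

found = hidden_fields(SAMPLE_FORM)
print(found)
```

In practice the HTML would come from fetching the page that contains the form, and the resulting dict would be combined with the visible fields before encoding the POST body.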
+2  A: 

This post doesn't attempt to fix your code, but suggests a debugging tool.

Once upon a time I was coding a program to fill out online forms for me. To learn exactly how my browser was handling the POSTs, cookies, and whatnot, I installed Wireshark ( http://www.wireshark.org/ ), a network sniffer. It let me view, packet by packet, the data being sent and received, down to the IP and hardware levels.

You might consider trying out a similar program and comparing the network flow. This might highlight differences between what your browser is doing and your script is doing.
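Once both sides have been captured, the comparison itself can be as simple as the Python 3 sketch below. The function diff_requests and the sample dictionaries are hypothetical and stand in for whatever headers or form fields you recorded. Since the URL in question is HTTPS, a packet capture will show encrypted payloads unless it is set up to decrypt them, so a browser-side tool such as Tamper Data (mentioned in the comments above) may be the easier way to get the browser's values.

```python
def diff_requests(browser, script):
    """Compare two dicts of request fields or headers.

    Returns (missing, changed): keys the script does not send at all,
    and keys sent by both with different values as (browser, script) pairs.
    """
    missing = {k: v for k, v in browser.items() if k not in script}
    changed = {
        k: (browser[k], script[k])
        for k in browser
        if k in script and browser[k] != script[k]
    }
    return missing, changed


# Made-up captures for illustration only.
browser_fields = {"progname": "SIR033WA", "SUBJ": "busi", "token": "abc123"}
script_fields = {"progname": "SIR033WA", "SUBJ": "BUSI"}

missing, changed = diff_requests(browser_fields, script_fields)
print("missing from script:", missing)
print("different values:", changed)
```

Anything that shows up in missing is a candidate for a hidden field or header the script needs to start sending.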

Willi Ballenthin