Overall Plan

Get my class information to automatically optimize and select my uni class timetable

Overall Algorithm

  1. Log on to the website using its Enterprise Sign On Engine login
  2. Find my current semester and its related subjects (set up in advance)
  3. Navigate to the right page and get the data for each related subject (lecture, practical and workshop times)
  4. Strip the data of useless information
  5. Rank the classes that are closer to each other higher, and the ones on random days lower
  6. Solve for the best timetable
  7. Output a detailed list of the BEST CASE information
  8. Output a detailed list of the possible class information (some classes might be full, for example)
  9. Get the program to select the best classes automatically
  10. Keep checking to see if we can achieve the best case from step 7

Step 6 in detail: get all the classes, using the lectures as the focal point (highest ranked, and only one per subject), and try to arrange the other classes around them.
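Steps 1-4 might be sketched with the standard library alone. Everything below (the URLs and the form-field names) is a placeholder; you would need to inspect your university's actual login form with your browser's dev tools to find the real ones:

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Placeholder URLs and field names; replace with the real ones from
# your university's login form.
LOGIN_URL = "https://example.edu/sso/login"
TIMETABLE_URL = "https://example.edu/timetable/semester"

def fetch_timetable_html(username, password):
    """Log in (step 1) and fetch the raw timetable page (steps 2-3)."""
    # A cookie-aware opener keeps the session cookie between requests.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    form = urllib.parse.urlencode({"username": username, "password": password})
    opener.open(LOGIN_URL, data=form.encode("utf-8"))  # POST the login form
    return opener.open(TIMETABLE_URL).read().decode("utf-8")
```

The returned HTML would then go to a parser for step 4 (stripping the useless information).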

Questions

Can anyone supply me with links to something similar to this, hopefully written in Python? Regarding step 6: what data structure would you recommend for storing this information? A linked list, where each node is a uniclass object? Should I write all the information to a text file?

I am thinking of setting up uniclass with the following attributes:

  • Subject
  • Rank
  • Time
  • Type
  • Teacher
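
One way to model those attributes is a plain class (a dataclass in modern Python), with an ordinary Python list as the container; the field values below are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class UniClass:
    subject: str   # e.g. "COMP1001"
    rank: int      # higher = more preferred slot
    time: str      # e.g. "Mon 10:00-11:00"; a (day, start, end) tuple works too
    type: str      # "lecture", "practical" or "workshop"
    teacher: str

# A plain list is enough as the container:
timetable = [
    UniClass("COMP1001", 5, "Mon 10:00-11:00", "lecture", "Dr. Smith"),
    UniClass("COMP1001", 3, "Tue 14:00-15:00", "practical", "Ms. Jones"),
]
```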

I am hardly experienced in Python and thought this would be a good learning project to try to accomplish. Thanks for any help and links provided to get me started; I'm open to edits to tag this appropriately or whatever else is necessary (not sure what this falls under other than programming and Python).

EDIT: can't really get the proper formatting I want for this SO post ><

A: 

BeautifulSoup was mentioned here a few times, e.g. get-list-of-xml-attribute-values-in-python.

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:

  1. Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
  2. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
  3. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.

Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."

Valuable data that was once locked up in poorly-designed websites is now within your reach. Projects that would have taken hours take only minutes with Beautiful Soup.
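
A minimal sketch of that in code, using made-up HTML in place of a real timetable page (requires the beautifulsoup4 package):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Stand-in HTML; a real timetable page would be fetched after logging in.
html = """
<table>
  <tr><th>Type</th><th>Day</th><th>Time</th></tr>
  <tr><td>Lecture</td><td>Mon</td><td>10:00</td></tr>
  <tr><td>Practical</td><td>Wed</td><td>14:00</td></tr>
</table>
<a class="externalLink" href="http://foo.com/x">foo</a>
"""

soup = BeautifulSoup(html, "html.parser")

# "Find all the links of class externalLink":
links = soup.find_all("a", class_="externalLink")

# Pull the cell text out of each data row, skipping the header row:
rows = [[td.get_text() for td in tr.find_all("td")]
        for tr in soup.find_all("tr")[1:]]
```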

gimel
+2  A: 

Depending on how far you plan on taking #6, and how big the dataset is, it may be non-trivial; it certainly smacks of NP-hard global optimisation to me...

Still, if you're talking about tens (rather than hundreds) of nodes, a fairly dumb algorithm should give good enough performance.

So, you have two constraints:

  1. A total ordering on the classes by score; this is flexible.
  2. Class clashes; this is not flexible.

What I mean by flexible is that you can go to more spaced out classes (with lower scores), but you cannot be in two classes at once. Interestingly, there's likely to be a positive correlation between score and clashes; higher scoring classes are more likely to clash.

My first pass at an algorithm:

# Greedy pass: take the highest-scoring classes first and keep each one
# that doesn't clash with those already selected.
selected_classes = []
classes = sorted(classes, key=lambda c: c.score, reverse=True)
for clas in classes:
    if not clas.clashes_with(selected_classes):
        selected_classes.append(clas)

Working out clashes might be awkward if classes are of uneven lengths, start at strange times and so on. Mapping start and end times into a simplified representation of "blocks" of time (every 15 minutes / 30 minutes or whatever you need) would make it easier to look for overlaps between the start and end of different classes.
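
A minimal sketch of that block-mapping idea: represent each class as the set of (day, block-index) pairs it occupies, so a clash check is just a set intersection:

```python
BLOCK = 30  # minutes per block; shrink to 15 if classes start on the quarter-hour

def blocks(day, start_min, end_min):
    """Map a class on `day` running [start_min, end_min) to a set of block ids."""
    first = start_min // BLOCK
    last = (end_min - 1) // BLOCK
    return {(day, b) for b in range(first, last + 1)}

def clashes(a, b):
    """Two classes clash iff they occupy any common block."""
    return bool(a & b)

# Mon 10:00-11:00 vs Mon 10:30-11:30 overlap; classes on different days never do.
lecture = blocks("Mon", 10 * 60, 11 * 60)
prac = blocks("Mon", 10 * 60 + 30, 11 * 60 + 30)
```

Uneven lengths and odd start times then fall out for free, since each class just covers more or fewer blocks.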

Alabaster Codify
A: 

There are waaay too many questions here.

Please break this down into subject areas and ask specific questions about each one, focusing on one at a time. Please define your terms: "best" doesn't mean anything without some specific measurement to optimize.

Here's what I think I see in your list of topics.

  1. Scraping HTML

    1 Logon to the website using its Enterprise Sign On Engine login

    2 Find my current semester and its related subjects (pre setup)

    3 Navigate to the right page and get the data from each related subject (lecture, practical and workshop times)

    4 Strip the data of useless information

  2. Some algorithm to "rank" based on "closer to each other" looking for a "best time". Since these terms are undefined, it's nearly impossible to provide any help on this.

    5 Rank the classes which are closer to each other higher, the ones on random days lower

    6 Solve a best time table solution

  3. Output something.

    7 Output me a detailed list of the BEST CASE information

    8 Output me a detailed list of the possible class information (some might be full for example)

  4. Optimize something, looking for "best". Another undefinable term.

    9 Get the program to select the best classes automatically

    10 Keep checking to see if we can achieve 7.

BTW, Python has "lists". Whether or not they're "linked" doesn't really enter into it.
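
For example, ordinary list operations (append, sort by an attribute, filter) cover everything this timetable project needs; the dicts below are stand-ins for whatever class objects you end up with:

```python
classes = [
    {"subject": "COMP1001", "rank": 5, "type": "lecture"},
    {"subject": "MATH1002", "rank": 3, "type": "practical"},
]
classes.append({"subject": "COMP1001", "rank": 4, "type": "workshop"})

# Sort highest-ranked first; no pointer juggling required.
classes.sort(key=lambda c: c["rank"], reverse=True)
lectures = [c for c in classes if c["type"] == "lecture"]
```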

S.Lott