views:

84

answers:

2

Brent's answer suggests that he has built a database of SO questions so that he can analyze them quickly.

I am interested in making a similar database in MySQL so that I can practice MySQL with queries similar to Brent's.

The database should include at least the following fields (I am guessing here, since SO's API seems to be secret). I aim to list only the relevant variables that would allow me to do an analysis similar to Brent's.

  • Questions
      • Question_id (primary key)
      • Question_time

  • Comments
      • Comment_id (primary key)
      • Comment_time

  • Users
      • User_id (primary key)
      • User_name
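
As a rough sketch, the fields above could be written as table definitions like the ones below. The table names, the column types, and the split into three tables are my assumptions, not anything taken from Brent's database. The script runs against an in-memory sqlite3 database only so that it can be tried without a MySQL server; the same `CREATE TABLE` statements should also be accepted by MySQL.

```python
import sqlite3

# Hypothetical schema built from the fields listed above.
# Table names and column types are assumptions for illustration.
SCHEMA = [
    """CREATE TABLE questions (
           question_id   INTEGER PRIMARY KEY,
           question_time DATETIME NOT NULL
       )""",
    """CREATE TABLE comments (
           comment_id   INTEGER PRIMARY KEY,
           comment_time DATETIME NOT NULL
       )""",
    """CREATE TABLE users (
           user_id   INTEGER PRIMARY KEY,
           user_name VARCHAR(255) NOT NULL
       )""",
]


def create_schema(conn):
    """Run every CREATE TABLE statement on a DB-API connection."""
    cur = conn.cursor()
    for statement in SCHEMA:
        cur.execute(statement)
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)

# List the tables that now exist, to check the statements were accepted.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```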

Apparently we need to scrape the data with Python's Beautiful Soup, because Brent's database seems to be hidden.

How can you build such a MySQL database with Python's Beautiful Soup?

+1  A: 

I don't know the details of how to import the data into MySQL, but the raw data of Stack Overflow is freely available: http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/

There's no secret API, nor any need to use Beautiful Soup.
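
For reading the dump without holding it all in memory, something like the following might be a starting point. It is only a sketch: it assumes the posts file is a flat list of `<row>` elements with `Id`, `PostTypeId` and `CreationDate` attributes, with `PostTypeId="1"` marking a question, so check that against the actual dump. The function name `iter_questions` and the sample data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_questions(source):
    """Yield (id, creation_date) for each question row in a posts XML file.

    Assumes rows look like <row Id="..." PostTypeId="1" CreationDate="..."/>.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            yield int(elem.get("Id")), elem.get("CreationDate")
        # Drop the element's attributes and children once it has been handled.
        elem.clear()


# Made-up sample standing in for the real posts file.
sample = io.BytesIO(b"""<posts>
  <row Id="1" PostTypeId="1" CreationDate="2009-06-01T10:00:00" />
  <row Id="2" PostTypeId="2" CreationDate="2009-06-01T10:05:00" />
  <row Id="3" PostTypeId="1" CreationDate="2009-06-02T08:30:00" />
</posts>""")

questions = list(iter_questions(sample))
print(questions)
```

With the real dump you would pass the path of the posts file instead of the `io.BytesIO` sample.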

RichieHindle
+1  A: 

I'm sure it's possible to work directly with the XML data dump @RichieHindle mentions, but I was much happier with @nobody_'s sqlite version -- especially after adding the indices as the README file in that sqlite version says.

If you have the complete, indexed sqlite version and want to load the Python-tagged subset into a MySQL database, that can be seen as a simple but neat exercise in using two DB API instances: read from the sqlite one and write to the MySQL one. No Soup nor Soap is needed for the purpose. (Personally, I found the sqlite performance entirely satisfactory once the index-building was done, so I did no subset extraction nor any moving to other DB engines.) In any case, it was much simpler and faster for me than loading from XML directly, despite lxml and all.
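
The two-connection exercise could look roughly like this. It is a minimal sketch, not the schema of the actual sqlite version: the source table `posts` with columns `id`, `creation_date` and `tags`, the `<python>` tag format, and the destination table `questions` are all assumptions. Both ends are in-memory sqlite3 databases here so the example runs anywhere.

```python
import sqlite3


def copy_tagged_questions(src, dst, tag, placeholder="?"):
    """Copy questions whose tags contain `tag` from src to dst.

    src is a sqlite3 connection; dst is any DB-API connection.
    `placeholder` is the destination driver's parameter marker:
    "?" for sqlite3, "%s" for the usual MySQL drivers.
    Table and column names are hypothetical.
    """
    read = src.cursor()
    write = dst.cursor()
    read.execute(
        "SELECT id, creation_date FROM posts WHERE tags LIKE ?",
        ("%<" + tag + ">%",),
    )
    insert = (
        "INSERT INTO questions (question_id, question_time) "
        "VALUES ({0}, {0})".format(placeholder)
    )
    copied = 0
    while True:
        rows = read.fetchmany(1000)  # copy in batches, not all at once
        if not rows:
            break
        write.executemany(insert, rows)
        copied += len(rows)
    dst.commit()
    return copied


# Made-up source data standing in for the indexed sqlite version.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, creation_date TEXT, tags TEXT)")
src.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (1, "2009-06-01 10:00:00", "<python><mysql>"),
        (2, "2009-06-01 11:00:00", "<java>"),
        (3, "2009-06-02 08:30:00", "<python>"),
    ],
)

# Destination; a second sqlite3 database stands in for MySQL here.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE questions (question_id INTEGER PRIMARY KEY, question_time TEXT)")

copied = copy_tagged_questions(src, dst, "python")
ids = [row[0] for row in dst.execute("SELECT question_id FROM questions ORDER BY question_id")]
print(copied, ids)
```

To write to MySQL instead, `dst` would be a connection from a MySQL driver and `placeholder` would be `"%s"`; the reading side stays the same.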

Of course if you do still want to perform the subset-load, and if you experience any trouble at all coding it up, ask (with schema and code samples, error messages if any, etc) and SOers will try to answer, as usual!-)

Alex Martelli
Thank you for your answer! I will do my best to solve these small problems and post the solution to SO.
Masi