I have to do a class project for my data mining course. My topic is mining Stack Overflow's data for trending topics.
I have downloaded the data from here, but the data set is so large (posts.xml is 3 GB) that I cannot process it on my machine.
Is using AWS for the data processing a good option, or is it not worth it?
I have no prior experience with AWS, so how can it help me with a school project? How would you go about it?
UPDATE 1
So, my data processing will be in 3 stages:
1. Convert the XML (from the Stack Overflow dump) to .ARFF (for the Weka jar),
2. Mine the data using the algorithms in Weka,
3. Convert the output to GraphML format, which will be read by the prefuse library for visualization.
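For stage 1, here is a minimal sketch of converting the XML to ARFF in a streaming fashion, so the whole 3 GB file never has to sit in memory. It assumes each post is a `<row>` element with `Id`, `PostTypeId`, `CreationDate` and `Tags` attributes, and that `PostTypeId="1"` marks a question; those names are assumptions about the dump layout, and `posts_to_arff` and the chosen ARFF attributes are only illustrative.

```python
import io
import xml.etree.ElementTree as ET


def quote(value):
    """Quote a string for an ARFF data row, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def posts_to_arff(xml_source, arff_out):
    """Stream <row> elements from xml_source and write one ARFF row per question.

    xml_source: a file name or a binary file object.
    arff_out:   a text file object opened for writing.
    Returns the number of data rows written.
    """
    arff_out.write("@relation posts\n")
    arff_out.write("@attribute id numeric\n")
    arff_out.write("@attribute creation_date string\n")
    arff_out.write("@attribute tags string\n")
    arff_out.write("@data\n")

    written = 0
    context = ET.iterparse(xml_source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            # Assumption: PostTypeId "1" means the post is a question.
            if elem.get("PostTypeId") == "1":
                arff_out.write("%s,%s,%s\n" % (
                    elem.get("Id"),
                    quote(elem.get("CreationDate", "")),
                    quote(elem.get("Tags", "")),
                ))
                written += 1
            # Drop rows that were already processed so memory use stays flat.
            root.clear()
    return written
```

It could be called as `posts_to_arff("posts.xml", open("posts.arff", "w", encoding="utf-8"))`. Because it reads one element at a time, a step like this may well run on an ordinary laptop, which would leave only the mining stage as a candidate for AWS.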
So, where does AWS fit in here? I suppose there are two features in AWS which can help me:
1. EC2 and
2. Elastic MapReduce,
but I am not sure how MapReduce works or how I could use it in my project. Can I?
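To make the question concrete, here is a minimal single-machine sketch of the map/reduce pattern applied to counting tags per month. It is plain Python, not Elastic MapReduce or Hadoop API code. The function names are hypothetical, and the `<java><xml>` tag format is an assumption about the dump.

```python
import re
from collections import defaultdict


def map_post(creation_date, tags):
    """Map step: emit ((month, tag), 1) for every tag on one post."""
    month = creation_date[:7]  # "2012-01-05T10:00:00" -> "2012-01"
    for tag in re.findall(r"<([^<>]+)>", tags):
        yield (month, tag), 1


def reduce_counts(key, values):
    """Reduce step: sum all the counts emitted for one (month, tag) key."""
    return key, sum(values)


def run(posts):
    """Group mapped pairs by key (the "shuffle"), then reduce each group."""
    groups = defaultdict(list)
    for creation_date, tags in posts:
        for key, value in map_post(creation_date, tags):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())
```

On a cluster, the map calls would run in parallel on different chunks of the input and the framework would do the grouping, but the two functions would keep roughly this shape.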