Hi all
I'm interested in the problem of pattern mining among players of social networking games, for example detecting cheaters in a game given a company's user database. So far I have been following the usual recipe for a data mining project:
- construct a data warehouse that aggregates significant information
- select a classifier, and train it with a subset of records from the warehouse
- validate the classifier against a separate test set
- lather, rinse, repeat
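To make the train/validate steps above concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the data is synthetic, the two features (rewards per hour, prize transfers per day) and the cheater labels are hypothetical, and a hand-rolled logistic regression stands in for whatever classifier is actually chosen.

```python
import math
import random


def make_synthetic_players(n, seed=0):
    """Generate hypothetical (features, label) rows; label 1 means cheater."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_cheater = 1 if rng.random() < 0.2 else 0
        # Hypothetical features, invented purely for this sketch.
        rewards_per_hour = rng.gauss(8.0 if is_cheater else 3.0, 1.5)
        transfers_per_day = rng.gauss(5.0 if is_cheater else 1.0, 1.0)
        rows.append(([rewards_per_hour, transfers_per_day], is_cheater))
    return rows


def split(rows, train_fraction=0.7, seed=1):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, epochs=200, lr=0.01):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in rows:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Return 1 (cheater) if the predicted probability is at least 0.5."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for x, y in rows if predict(model, x) == y) / len(rows)


players = make_synthetic_players(1000)
train_rows, test_rows = split(players)
model = train_logistic(train_rows)
test_accuracy = accuracy(model, test_rows)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The point of the sketch is only the shape of the loop: the classifier never sees the held-out rows during training, so the reported accuracy is an estimate of how it would do on new players. With real data, where cheaters are rare, accuracy alone can be misleading and precision/recall on the cheater class would be worth tracking as well.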
Surprisingly, I've found very little literature or best-practice guidance in this area. I am hoping to crowdsource the information gathering here. Specifically, I'm looking for:
- What classifiers have worked well for this type of pattern mining? (It seems highly temporal: users playing games, users receiving rewards, users transferring prizes, etc.)
- Are there any widely agreed-upon attributes specific to social networking / gaming data?
- What is a practical amount of information that should be considered? One problem I've run into is data overload, where queries and data cleansing may take days to complete.
- Related to the point above, what hardware resources are required to produce results? I've found it difficult to estimate the amount of computing power I will require for production use. It has become apparent that a white box in the corner does not have enough horsepower for such a project. Are companies generally resorting to cloud solutions? Are they buying clusters?
Basically, any resources (theoretical, academic, or practical) about implementing a social networking / gaming pattern-mining program would be very much appreciated.
Thanks.