Hi guys, I am writing an application that computes all frequent itemsets of size 2 from a set of transactions. The input is a data file (a space-delimited text file with the items encoded as integers) and a percentage given as an integer (e.g. an input of 2 means 2%). The application writes to a separate file each pair of numbers that appears together in the same transaction (a transaction is one line of the file) in more than the given percentage of all transactions, 2% in this example. Each line of the output file contains one pair of items together with its support (the number of transactions in which both items appear). The application also outputs (on the screen or in a file) the duration, i.e. the time needed to execute the task.
The data file will look like this:
55 22 33 123 231 414
21 43 432 435 231 4324 534
22 21 33 123 231 534 666 222
...
Each line is called a transaction, and the input file contains thousands of transactions. I am thinking about using the data mining rule that a pair can only be frequent if both of its items are frequent. So first I find all the single numbers that appear in more than 2% of all transactions, then for each transaction I form pairs from only those frequent numbers, and finally I count how often each pair occurs and write the pairs that pass the threshold to the output file.
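A rough sketch of that two-pass plan in Java, in case it helps the discussion. Everything named here (`FrequentPairs`, `frequentPairs`, `isFrequent`, `run`) is made up for illustration, and it assumes the whole file fits in memory, that blank lines are not transactions, and that a number repeated within one line counts once. It has not been tried on a large file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper class; all names are illustrative.
class FrequentPairs {

    // Returns "a b" -> support for each pair (a < b) that appears
    // in more than `percent` percent of the transactions.
    static Map<String, Integer> frequentPairs(List<String> lines, int percent) {
        // Pass 1: parse every transaction and count each item once per transaction.
        Map<Integer, Integer> itemCount = new HashMap<>();
        List<TreeSet<Integer>> transactions = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // assumption: blank lines are skipped
            TreeSet<Integer> items = new TreeSet<>(); // sorted, no duplicates
            for (String tok : line.trim().split("\\s+")) {
                items.add(Integer.parseInt(tok)); // throws on non-integer tokens
            }
            transactions.add(items);
            for (int item : items) itemCount.merge(item, 1, Integer::sum);
        }
        int total = transactions.size();

        // Pass 2: build pairs only from items that are frequent on their own.
        Map<String, Integer> pairCount = new HashMap<>();
        for (TreeSet<Integer> items : transactions) {
            List<Integer> frequent = new ArrayList<>();
            for (int item : items) {
                if (isFrequent(itemCount.get(item), total, percent)) frequent.add(item);
            }
            for (int i = 0; i < frequent.size(); i++) {
                for (int j = i + 1; j < frequent.size(); j++) {
                    String key = frequent.get(i) + " " + frequent.get(j);
                    pairCount.merge(key, 1, Integer::sum);
                }
            }
        }

        // Keep only the pairs that pass the threshold themselves.
        pairCount.values().removeIf(c -> !isFrequent(c, total, percent));
        return pairCount;
    }

    // "More than percent%" as integer arithmetic: support / total > percent / 100.
    static boolean isFrequent(int support, int total, int percent) {
        return (long) support * 100 > (long) percent * total;
    }

    // Reads the input file, writes "a b support" lines, returns elapsed milliseconds.
    static long run(Path input, int percent, Path output) throws IOException {
        long start = System.nanoTime();
        Map<String, Integer> pairs = frequentPairs(Files.readAllLines(input), percent);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : pairs.entrySet()) {
            out.add(e.getKey() + " " + e.getValue());
        }
        Files.write(output, out);
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Sorting each transaction with a `TreeSet` means the pair (22, 33) always gets the key "22 33" no matter what order the numbers had in the line, so the same pair is never counted under two keys. With a very large number of frequent items the pair map could get big; a different key type or a triangular array of counters might be worth trying then.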
If anyone has ideas or code for this (preferably in Java), please help; it would be very useful. Thanks a lot.