Hey guys,
Please help me find optimized solutions to these interesting data structure questions:
1) Given a file containing approximately 10 million words, design a data structure for finding anagrams.
2) Write a program to display the ten most frequent words in a file, such that the program is efficient in all complexity measures.
3) You have a file with millions of lines of data. Only two lines are identical; the rest are all unique. Each line is so long that it may not even fit in memory. What is the most efficient way to find the identical lines?
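For questions 1 and 2, a common starting point is a hash map. Below is a minimal Python sketch, not a definitive answer; the names `signature`, `build_anagram_index`, `anagrams_of` and `top_k_words` are hypothetical. Anagrams contain the same letters, so sorting a word's letters gives a key that every member of an anagram class maps to. For the top-ten words, the sketch counts with a hash map and then keeps only the k largest counts with a heap, which is O(n) for counting plus O(d log k) for d distinct words. It assumes the words have already been split out of the file and that the distinct words fit in memory; a file too large for that would need the counts sharded across partial files, which this sketch does not do.

```python
import heapq
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def signature(word: str) -> str:
    """Anagrams share a signature: the word's letters in sorted order."""
    return "".join(sorted(word.lower()))


def build_anagram_index(words: Iterable[str]) -> Dict[str, List[str]]:
    """Group words by signature so each anagram class becomes one dict entry."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        index[signature(word)].append(word)
    return dict(index)


def anagrams_of(index: Dict[str, List[str]], word: str) -> List[str]:
    """Return every indexed word that is an anagram of `word`."""
    return index.get(signature(word), [])


def top_k_words(words: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Return the k most frequent words together with their counts."""
    counts = Counter(words)  # one pass over the input, O(n) expected time
    # nlargest keeps a heap of at most k items: O(d log k) for d distinct words.
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    idx = build_anagram_index(["listen", "silent", "enlist", "google", "tinsel"])
    print(anagrams_of(idx, "inlets"))  # ['listen', 'silent', 'enlist', 'tinsel']
    print(top_k_words("the cat and the dog and the bird".split(), 2))  # [('the', 3), ('and', 2)]
```

Because both functions accept any iterable, the words can be fed lazily from a file (for example a generator over `line.split()` for each line) instead of being loaded into a list first.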
Adding some more questions:
4) (asked by MS) You are given an array of strings, each of length 3. One of the strings in the array is marked as the Start string and another as the End string. You have to convert the start string into the end string, on the condition that each intermediate string you make differs from the previous one by exactly one character and is present in the input array. For example, if the input is Array: {"fat", "tab", "eat", "see", "tub", "fab", "rat", "sel"}, Start: "fat", End: "tub", then the output should be fat -> fab -> tab -> tub.
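Question 4 is essentially the classic "word ladder" problem. One standard approach, sketched here rather than claimed as the interviewer's expected answer, treats each word as a graph node with an edge between words that differ in exactly one position, then runs a breadth-first search from Start, which yields a shortest chain. Instead of comparing every pair of words, it generates a word's neighbours by trying every letter at each position and checking membership in a hash set. It assumes lowercase ASCII words; the name `word_ladder` is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: words are lowercase ASCII


def word_ladder(words: List[str], start: str, end: str) -> Optional[List[str]]:
    """Shortest chain start -> end where consecutive words differ in one letter.

    Returns None if either endpoint is missing or no chain exists.
    """
    dictionary = set(words)
    if start not in dictionary or end not in dictionary:
        return None
    parent: Dict[str, Optional[str]] = {start: None}  # also serves as "visited"
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == end:
            # Follow parent links back to start, then reverse to get the chain.
            path: List[str] = []
            node: Optional[str] = word
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for i in range(len(word)):
            for ch in ALPHABET:
                candidate = word[:i] + ch + word[i + 1:]
                if candidate in dictionary and candidate not in parent:
                    parent[candidate] = word
                    queue.append(candidate)
    return None


if __name__ == "__main__":
    words = ["fat", "tab", "eat", "see", "tub", "fab", "rat", "sel"]
    print(word_ladder(words, "fat", "tub"))  # ['fat', 'fab', 'tab', 'tub']
```

Generating neighbours costs 26 × L set lookups per word (L = 3 here), which is usually cheaper than comparing each word against the whole array when the array is large.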
I tried to solve the third one and came up with two possible approaches:
1) Read only the first word of every line, then eliminate every line whose first word does not match the first word of some other line. Keep reading successive words of the remaining lines in this manner until only two lines are left. You've got your answer!
2) Convert each line into a smaller representation. This can be done by encoding each word in a short binary form and then XORing the bits that represent each line.
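One caveat on approach 2: XOR is order-insensitive, so two lines containing the same words in a different order (or any words whose codes cancel) would produce the same value. A common alternative, swapped in here and not the approach described above, is to stream each line through a cryptographic hash such as SHA-256, group lines by digest, and then compare only the candidate pairs byte by byte to rule out a collision. The sketch below reads the file in fixed-size blocks, so no line is ever held in memory whole; the index it builds grows with the number of lines rather than their length. The names `line_digests`, `same_bytes`, `find_identical_lines` and the 64 KiB block size are hypothetical choices.

```python
import hashlib
import io
from collections import defaultdict
from typing import BinaryIO, Dict, List, Optional, Tuple

CHUNK = 1 << 16  # bytes per read (assumed size); no line is held in memory whole
Span = Tuple[int, int]  # (byte offset where a line starts, line length in bytes)


def line_digests(f: BinaryIO, chunk: int = CHUNK) -> Dict[bytes, List[Span]]:
    """Stream the file once, hashing each line incrementally; group spans by digest."""
    groups: Dict[bytes, List[Span]] = defaultdict(list)
    h = hashlib.sha256()
    start = pos = 0  # start offset of the current line, bytes consumed so far
    while True:
        block = f.read(chunk)
        if not block:
            break
        i = 0
        while True:
            nl = block.find(b"\n", i)
            if nl == -1:  # the current line continues into the next block
                h.update(block[i:])
                pos += len(block) - i
                break
            h.update(block[i:nl])
            pos += nl - i
            groups[h.digest()].append((start, pos - start))
            pos += 1  # step over the newline itself
            start = pos
            h = hashlib.sha256()
            i = nl + 1
    if pos > start:  # final line without a trailing newline
        groups[h.digest()].append((start, pos - start))
    return groups


def same_bytes(f: BinaryIO, a: Span, b: Span, chunk: int = CHUNK) -> bool:
    """Compare two lines chunk by chunk, so a hash collision cannot fool us."""
    if a[1] != b[1]:
        return False
    done = 0
    while done < a[1]:
        n = min(chunk, a[1] - done)
        f.seek(a[0] + done)
        x = f.read(n)
        f.seek(b[0] + done)
        if x != f.read(n):
            return False
        done += n
    return True


def find_identical_lines(f: BinaryIO, chunk: int = CHUNK) -> Optional[Tuple[int, int]]:
    """Return the start offsets of two identical lines, or None if all are unique."""
    f.seek(0)
    for spans in line_digests(f, chunk).values():
        for i in range(len(spans)):
            for j in range(i + 1, len(spans)):
                if same_bytes(f, spans[i], spans[j], chunk):
                    return spans[i][0], spans[j][0]
    return None


if __name__ == "__main__":
    sample = io.BytesIO(b"alpha beta\ngamma\nalpha beta\ndelta")
    print(find_identical_lines(sample, chunk=3))  # (0, 17)
```

With a real file it would be opened in binary mode, e.g. `with open(path, "rb") as f: find_identical_lines(f)`. The file is read sequentially once for hashing, and only lines that share a digest are reread for verification.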
Edit: I now have a good collection of data-structure problems. If anyone is interested in discussing them here, I can post some more.