Description of scenario for the project: To search for books and articles in a library computer system, one uses keywords in combination with Boolean operators such as 'and' (&) and 'or' (|). For example, to search for articles and books dealing with uses of nanotechnology and bridge construction, the query would be nanotechnology & bridge construction. In order to retrieve the books and articles properly, every document is represented by a set of keywords that describe the content of the document.
Assume that each document (books, articles, etc.) is represented by a document number which is unique. You will be provided with a set of documents represented by their numbers and keywords that are contained in that document as given below.
887 5
nanotechnology
bridge construction
carbon fiber
digital signal processing
wireless
The number 887 above corresponds to the document number and 5 is the number of keywords that are given for the document. Each keyword will be on a separate line. The input for your project will contain a set of document numbers and keywords for each document. The first line of the input will contain an integer that corresponds to the number of document records to process.
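A minimal sketch of reading this layout, written in Python only because the post does not say which language the project uses. The function name `read_records` and the sample text are hypothetical. The idea is that the keyword count on each header line says exactly how many lines to consume before the next header, so records with different numbers of keywords can follow one another without any separator.

```python
def read_records(lines):
    """Parse document records from an iterable of text lines.

    Assumed layout, taken from the description above: the first line is the
    number of records; each record is a header line
    "<document number> <keyword count>" followed by that many keyword lines.
    """
    it = iter(lines)
    record_count = int(next(it).strip())
    records = []
    for _ in range(record_count):
        doc_number, keyword_count = map(int, next(it).split())
        # A keyword may contain spaces, so each keyword is one whole line.
        keywords = [next(it).strip() for _ in range(keyword_count)]
        records.append((doc_number, keywords))
    return records


# Hypothetical sample input with two records of different lengths.
sample = """2
887 5
nanotechnology
bridge construction
carbon fiber
digital signal processing
wireless
107 4
ANALYSIS
CROSS-LAYER
GEOGRAPHIC FORWARDING
WIRELESS SENSOR NETWORKS
"""

records = read_records(sample.splitlines())
for doc_number, keywords in records:
    print(doc_number, len(keywords))
```

To read from standard input instead of the sample string, `sys.stdin` could be passed to `read_records` after `import sys`, since a file object is also an iterable of lines.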
An Inverted List is a data structure in which, for each keyword, we store the set of document numbers of the documents that contain that keyword. For example, for the keywords bridge construction and carbon fiber we will have the following:
bridge construction 887, 117, 665, 900
carbon fiber 887, 1098, 654, 665, 117
The documents numbered 887, 1098, 654, 665, and 117 all contain the keyword carbon fiber, and the keyword bridge construction is found in documents numbered 887, 117, 665, and 900. There are two main aspects to this project: first, I am required to read a file (using standard input) that contains the document information and build the Inverted List data structure; second, I have to apply Boolean queries to the Inverted List data structure.
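One way to picture the Inverted List is as a mapping from each keyword to a set of document numbers. The sketch below is in Python and assumes the records have already been parsed into (document number, keyword list) pairs. The function name `build_inverted_list` is hypothetical, and the keyword lists for documents 117 and 900 are invented for illustration; they are only chosen to agree with the example above.

```python
from collections import defaultdict


def build_inverted_list(records):
    """Map each keyword to the set of document numbers that contain it."""
    inverted = defaultdict(set)
    for doc_number, keywords in records:
        for keyword in keywords:
            inverted[keyword].add(doc_number)
    return inverted


# Illustrative records: 887 comes from the sample input above,
# while the keyword lists for 117 and 900 are made up for this sketch.
records = [
    (887, ["nanotechnology", "bridge construction", "carbon fiber",
           "digital signal processing", "wireless"]),
    (117, ["bridge construction", "carbon fiber"]),
    (900, ["bridge construction"]),
]

inverted = build_inverted_list(records)
print(sorted(inverted["bridge construction"]))  # [117, 887, 900]
print(sorted(inverted["carbon fiber"]))         # [117, 887]
```

In a language without a built-in set type, the custom set class mentioned in the question would take the place of `set`, with one set object stored per keyword.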
The Boolean queries are processed as illustrated in the following example. To obtain the documents containing the keywords bridge construction & carbon fiber, we perform a set intersection operation and get the documents 887, 117, and 665. The Boolean query bridge construction | carbon fiber results in a set union operation, and the documents for this query are 887, 117, 665, 900, 1098, and 654.
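The two operations can be checked directly against the example sets. This is a small Python sketch using the built-in set type; the variable names and the helper `run_query` are hypothetical, and it handles only a single operator between two keywords.

```python
# Document sets taken from the inverted list example above.
inverted = {
    "bridge construction": {887, 117, 665, 900},
    "carbon fiber": {887, 1098, 654, 665, 117},
}


def run_query(inverted, left, operator, right):
    """Evaluate 'left & right' or 'left | right' over the inverted list."""
    # A keyword that was never seen maps to the empty set.
    left_docs = inverted.get(left, set())
    right_docs = inverted.get(right, set())
    if operator == "&":
        return left_docs & right_docs   # set intersection
    if operator == "|":
        return left_docs | right_docs   # set union
    raise ValueError("unknown operator: " + operator)


both = run_query(inverted, "bridge construction", "&", "carbon fiber")
either = run_query(inverted, "bridge construction", "|", "carbon fiber")
print(sorted(both))    # [117, 665, 887]
print(sorted(either))  # [117, 654, 665, 887, 900, 1098]
```

The union holds every document number that appears in at least one of the two sets, so it contains six documents here.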
OK SO MY QUESTION IS:
How do I read the documents, given that my first class is a setClass that stores a set of document numbers?
My problem is that all the documents are in one text file, for example:
25 //first document number
329 7 //second document number
ARAMA
ROUTING ALGORITHM
AD-HOC
CSMA
MAC LAYER
JARA
MANET
107 4 //third document number
ANALYSIS
CROSS-LAYER
GEOGRAPHIC FORWARDING
WIRELESS SENSOR NETWORKS
So how can I read the document numbers, given that each document has a different number of keywords and the records come one right after another?
Please help me; anything is greatly appreciated.