Description of scenario for project: To search for books and articles in a library computer system, one uses keywords in combination with Boolean operators such as 'and' (&) and 'or' (|). For example, if you are going to search for articles and books dealing with uses of nanotechnology and bridge construction, the query would be nanotechnology & bridge construction. In order to retrieve the books and articles properly, every document is represented using a set of keywords that represent the content of the document.

Assume that each document (books, articles, etc.) is represented by a document number which is unique. You will be provided with a set of documents represented by their numbers and keywords that are contained in that document as given below.

887 5

nanotechnology

bridge construction

carbon fiber

digital signal processing

wireless

The number 887 above corresponds to the document number and 5 is the number of keywords that are given for the document. Each keyword will be on a separate line. The input for your project will contain a set of document numbers and keywords for each document. The first line of the input will contain an integer that corresponds to the number of document records to process.

An Inverted List is a data structure in which, for each keyword, we store the set of document numbers that contain the keyword. For example, for the keywords bridge construction and carbon fiber we will have the following:

bridge construction 887, 117, 665, 900

carbon fiber 887, 1098, 654, 665, 117

The documents numbered 887, 1098, 654, 665, and 117 all contain the keyword carbon fiber, and the keyword bridge construction is found in documents numbered 887, 117, 665, and 900. There are two main aspects to this project: first, I am required to read a file (using standard input) that contains the document information and build the Inverted List data structure; second, I have to apply Boolean queries to the Inverted List data structure.
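For illustration, here is a minimal sketch of the Inverted List in C++. It assumes standard containers are allowed; if the assignment requires your own setClass, that class would take the place of `std::set<int>`. The names `InvertedList` and `addKeyword` are made up for this example.

```cpp
#include <map>
#include <set>
#include <string>

// Inverted list: each keyword maps to the set of document numbers containing it.
// (Hypothetical alias; swap std::set<int> for your own set class if required.)
using InvertedList = std::map<std::string, std::set<int>>;

// Record that document `docNumber` contains `keyword`.
// operator[] creates an empty set the first time a keyword is seen,
// and std::set ignores duplicate insertions.
void addKeyword(InvertedList& index, const std::string& keyword, int docNumber) {
    index[keyword].insert(docNumber);
}
```

Calling `addKeyword(index, "carbon fiber", 887)` once per keyword line is enough to build the whole structure as the file is read.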

The Boolean queries are processed as illustrated in the following example. To obtain the documents containing the keywords bridge construction & carbon fiber, we perform a set intersection operation and get the documents 887, 117, and 665. The Boolean query bridge construction | carbon fiber will result in a set union operation, and the documents for this query are 887, 117, 665, 900, 1098, and 654.
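The two query operations can be sketched with the standard library as follows. This assumes the document numbers are kept in `std::set<int>` (which is always sorted, as `std::set_intersection` and `std::set_union` require); the function names `intersectDocs` and `uniteDocs` are invented for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Documents that contain both keywords (the & operator).
std::set<int> intersectDocs(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> result;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(result, result.begin()));
    return result;
}

// Documents that contain at least one of the keywords (the | operator).
std::set<int> uniteDocs(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> result;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(result, result.begin()));
    return result;
}
```

With the two example sets above, the intersection holds 117, 665, and 887, and the union holds 117, 654, 665, 887, 900, and 1098 (a `std::set` iterates in ascending order).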

OK SO MY QUESTION IS:

How do I read the document file, given that my first class is a setClass that stores a set of document numbers?

My problem is that all the documents are in one text file, for example:

25 //first document number

329 7 //second document number

ARAMA

ROUTING ALGORITHM

AD-HOC

CSMA

MAC LAYER

JARA

MANET

107 4 //third document number

ANALYSIS

CROSS-LAYER

GEOGRAPHIC FORWARDING

WIRELESS SENSOR NETWORKS

So how can I read the document numbers, since each document has a different number of keywords and the records come one right after another?

Please help me; anything is greatly appreciated.

+1  A: 

Is the "25" on the first line actually the number of documents in the file? I'll go with that assumption (if not, just read documents until you hit EOF).

Here is some pseudo-code for reading the file:

int numDocs = readLine // assuming first number is number of docs

for (int i = 0; i < numDocs; ++i)
{
    string line = readLine
    int docNumber = getFirstNumber(line)
    int numKeywords = getSecondNumber(line)

    for (int j = 0; j < numKeywords; ++j)
    {
        string keyword = readLine
        associate keyword with docNumber // however this works
    }
}
Andy White
I'm not sure if that first number is the number of documents, but it is a good observation. I emailed the professor to see what it was supposed to be but still have not gotten an answer.
OK, so I tried to use your code, and it keeps telling me that readLine is an undeclared identifier. Is there a header that I need to include to use readLine? Please let me know. Thanks.
Oh, that isn't a real function. It's just pseudo-code to give you an idea of what it might look like. For C++, a "readLine" might be "std::getline(std::cin, myString)" or similar. Note that "cin >> myString" stops at the first space, so it would split a multi-word keyword like ROUTING ALGORITHM.
Andy White
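For reference, the pseudo-code above might translate to C++ roughly like this. It is only a sketch under a few assumptions: the first line is the number of documents (still unconfirmed in this thread), the "//..." annotations in the question are not part of the real file, and standard containers are allowed. The names `InvertedList`, `nextLine`, and `readDocuments` are made up for this example.

```cpp
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical alias: keyword -> set of document numbers.
using InvertedList = std::map<std::string, std::set<int>>;

// Reads the next non-empty line; returns false at end of input.
bool nextLine(std::istream& in, std::string& line) {
    while (std::getline(in, line)) {
        if (!line.empty()) return true;
    }
    return false;
}

// Reads a count line, then for each document a "docNumber numKeywords"
// header line followed by one keyword per line.
InvertedList readDocuments(std::istream& in) {
    InvertedList index;
    std::string line;

    // Assumption: the first number in the file is the document count.
    int numDocs = 0;
    if (!nextLine(in, line) || !(std::istringstream(line) >> numDocs)) {
        return index;
    }

    for (int i = 0; i < numDocs; ++i) {
        if (!nextLine(in, line)) break;
        std::istringstream header(line);
        int docNumber = 0;
        int numKeywords = 0;
        if (!(header >> docNumber >> numKeywords)) break;

        // The header says how many keyword lines belong to this document,
        // so records with different keyword counts are read correctly.
        for (int j = 0; j < numKeywords && nextLine(in, line); ++j) {
            index[line].insert(docNumber);  // whole line, spaces included
        }
    }
    return index;
}
```

Since the project reads from standard input, the call would be `InvertedList index = readDocuments(std::cin);` with `<iostream>` included. If the first number turns out not to be a document count, the outer loop can instead keep reading header lines until `nextLine` returns false.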