I have a file with entries such as: 26 1 33 2 ...

and another file with sentences in English.

I have to write a script that prints the 1st word of sentence number 26 and the 2nd word of sentence 33. How do I do it?

A: 

Here's a general sketch:

  • Read the first file into a list (a numeric entry in each element)
  • Read the second file into a list (a sentence in each element)
  • Iterate over the entry list, for each number find the sentence and print its relevant word
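A minimal Python 3 sketch of those three steps. It assumes the numbers come in (sentence, word) pairs, that both are 1-based, and that a sentence ends at every '.'. The function names are hypothetical, and the file contents are passed in as strings so that the reading step stays separate.

```python
def read_entries(numbers_text):
    """Step 1: turn '26 1 33 2 ...' into a list of ints, whatever the line layout."""
    return [int(token) for token in numbers_text.split()]


def read_sentences(english_text):
    """Step 2: split the text at every '.' into sentences, each a list of words."""
    sentences = []
    for chunk in english_text.split('.'):
        words = chunk.split()
        if words:  # skip empty pieces, e.g. the one after the final '.'
            sentences.append(words)
    return sentences


def pick_words(entries, sentences):
    """Step 3: walk the entries two at a time and collect the requested words."""
    result = []
    for i in range(0, len(entries) - 1, 2):
        sentence_no, word_no = entries[i], entries[i + 1]
        # The entries are assumed to be 1-based; Python lists are 0-based.
        result.append(sentences[sentence_no - 1][word_no - 1])
    return result


text = "The cat sat. A dog barked loudly. Birds sing."
print(pick_words(read_entries("1 2\n3 1"), read_sentences(text)))
# prints ['cat', 'Birds']
```

With real files (names assumed), the call would look like `pick_words(read_entries(open('1.txt').read()), read_sentences(open('2.txt').read()))`.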

Now, if you show what you have tried so far in Python, you will probably get more help.

Eli Bendersky
A: 

The following code should do the task, assuming the files are not too large. You may have to modify it to deal with edge cases (double spaces, etc.).

# Get numbers from file
num = []
with open('1.txt') as file:
    num = file.readlines()

# Get text from file
text = []
with open('2.txt') as file:
    text = file.readlines()

# Parse text into words list.
data = []
for line in text:                       # For each paragraph in the text
    sentences = line.strip().split('.') # Split it into sentences
    words = []
    for sentence in sentences:          # For each sentence in the paragraph
        words = sentence.split(' ')     # Split it into a list of words
        if len(words) > 0:
            data.append(words)

# Get desired result
for i in range(0, len(num)/2):
    print data[num[i+1]][num[i]]
Findekano
Wrong Python syntax in the `with` statements.
Alex Martelli
Oops, I have fixed it. Thanks for pointing it out.
Findekano
Problem 2: You are assuming that there is one number per line; this does not correspond with the OP's example. Problem 3: `num` is a list of *str* objects. If you actually ran this code, it would blow up in the last line trying to use a str object as an index into a list. Problem 4: the arguments of `range()` are incorrect.
John Machin
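A sketch (Python 3, hypothetical helper name `pairs_from_lines`) of how the number-reading part could address Problems 2 to 4, assuming the numbers alternate sentence, word and are 1-based. With these pairs the lookup would be `data[s - 1][w - 1]`, sentence number first; the original last line appears to put the word number first.

```python
def pairs_from_lines(lines):
    """Build (sentence, word) pairs from readlines() output, however the numbers are laid out."""
    # Problem 2: don't assume one number per line; split every line into tokens.
    tokens = []
    for line in lines:
        tokens.extend(line.split())
    # Problem 3: readlines() yields str objects, but list indices must be ints.
    numbers = [int(token) for token in tokens]
    # Problem 4: step through the list two at a time instead of range(0, len(num)/2).
    return [(numbers[i], numbers[i + 1]) for i in range(0, len(numbers) - 1, 2)]


print(pairs_from_lines(["26 1 33\n", "2\n"]))
# prints [(26, 1), (33, 2)]
```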
A: 

The big issue is that you have to decide what separates "sentences". For example, is a '.' the end of a sentence? Or maybe part of an abbreviation, e.g. the one I've just used?-) Secondarily, and less difficult, what separates "words", e.g., is "TCP/IP" one word, or two?
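One way to make the sentence rule sharper is to end a sentence at a token ending in '.', '!' or '?' unless that token is a known abbreviation. This is a minimal Python 3 sketch under that assumption; `ABBREVIATIONS` and `split_sentences` are hypothetical names, and the abbreviation set is a tiny illustrative sample.

```python
# Hypothetical sample; a usable list would have to be much longer.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Mr.", "Dr."}


def split_sentences(text):
    """End a sentence at a token ending in '.', '!' or '?', unless it is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():  # words = whitespace-separated tokens
        current.append(token)
        if token.endswith(('.', '!', '?')) and token not in ABBREVIATIONS:
            sentences.append(current)
            current = []
    if current:  # keep any text after the last terminator
        sentences.append(current)
    return sentences


print(split_sentences("Use a tool, e.g. grep. It works!"))
# prints [['Use', 'a', 'tool,', 'e.g.', 'grep.'], ['It', 'works!']]
```

Note that this only moves the problem: an abbreviation that really ends a sentence is not split, punctuation stays attached to words ('grep.'), and "TCP/IP" counts as one word. Each of those is a rule that still has to be decided.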

Once you have sharply defined these rules, you can easily read the file of text into a list of "sentences", each of which is a list of "words". Then you read the other file as a sequence of pairs of numbers, and use them as indices into the overall list and inside the sublist thus identified. But the problem of sentence and word separation is really the hard part.

Alex Martelli
A: 

In the following code, I am assuming that sentences end with '. '. You can modify it easily to accommodate other sentence delimiters as well. Note that abbreviations will therefore be a source of bugs.

Also, I am going to assume that words are delimited by spaces.

sentences = []
queries = []
english = ""

for line in file2:
    english += line
while english:
    period = english.find('.')
    sentences.append(english[: period+1].split())
    english = english[period+1 :]
q=""
for line in file1:
    q += " " + line.strip()

q = q.split()
for i in range(0, len(q)-1, 2):
    sentence = q[i]
    word = q[i+1]
    queries.append((sentence, word))

for s, w in queries:
    print sentences[s-1][w-1]

I haven't tested this, so please let me know (preferably with the case that broke it) if it doesn't work, and I will look into the bugs.

Hope this helps

inspectorG4dget
The q thing can be built in one go: `q = map(int, file1.read().split())` ... The queries thing can be built in one go: `queries = [(q[i], q[i+1]) for i in xrange(0, len(q)-1, 2)]` ... Your bug will manifest itself in the last line, but I've already fixed it, way back :-) The queries thing and the print thing could also be combined. Above remarks have been tested to the same extent as your code :-)
John Machin
I understand your comments, and I would normally have used list comprehensions. However, this was tagged as possible homework, so I tried to make my code more transparent.
inspectorG4dget
Bah. You are teaching n00bs how to write fugly code which hides the essence of what is happening under a midden of gruesome detail. Overdone: `q += " " + line.strip()` when the next thing is `q.strip()`. Bad habit: re-cycling names: `q = q.split()` aarrgghh. ANOTHER BUG: consider what happens if there is text after the last '.'. More sources of bugs: numbers (1.23), URLs (http://www.thedailywtf.com), IP addresses, ...
John Machin
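The point about text after the last '.' can be checked directly: when no '.' is left, `find('.')` returns -1, so `english[: period+1]` is empty and `english[period+1 :]` is the whole string again, and the `while` loop never terminates. A minimal Python 3 sketch of the same loop with that case handled (the name `split_on_periods` is hypothetical; it still splits at every '.', so numbers such as 1.23 and URLs still break it):

```python
def split_on_periods(english):
    """The find('.') loop from the answer, with trailing text after the last '.' handled."""
    sentences = []
    while english:
        period = english.find('.')
        if period == -1:
            # No '.' left: treat the remainder as a final sentence and stop,
            # instead of looping forever on an unchanged string.
            words = english.split()
            if words:
                sentences.append(words)
            break
        words = english[:period + 1].split()
        if words:
            sentences.append(words)
        english = english[period + 1:]
    return sentences


print(split_on_periods("One two. Three four. trailing words"))
# prints [['One', 'two.'], ['Three', 'four.'], ['trailing', 'words']]
```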
First of all, I don't think anyone appreciates calling a user a n00b for asking a question that they have tagged as homework. I understand your concerns. He clearly is in the learning stages and is asking for help. I'm sorry to say this, but calling him names at this point is not a very nice thing to do. It will only make him feel bad for no good reason. Further, doing so is unbecoming of the community feeling that is nurtured here. Since this IS homework, I made a few assumptions about the input. Perhaps I should have made these clearer, but your assault on a beginner is uncalled for.
inspectorG4dget
I really hope that I misunderstood what you have said, because otherwise, you come across as very mean.
inspectorG4dget
Most of the comments here are legitimate technical points (albeit subjective perhaps) - I don't think I'd call them "mean".
Marc Gravell
@inspectorG4dget: You have misunderstood severely. A "n00b" is any new user whether they tag their question as homework or not. It is not namecalling of the OP. s/n00bs/new users/ and get over it.
John Machin