ansaurus

Question

Answer 1

+8 A:

Your indentation is possibly wrong: you should check count > 1 within the for j in i loop, not within the loop that checks every single character in j[1:].
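Here is a minimal sketch of that fix. It assumes the question's `li` is a list of sub-lists whose first string is the title line, which is the shape used in the third answer's sample data. The function name `ids_with_multiple_slashes` is hypothetical. The counter is also reset for every sub-list, as the other answers point out. The `count > 1` test now runs once per line rather than once per character.

```python
def ids_with_multiple_slashes(li):
    """Hypothetical helper: return the ID slice of every sub-list with more than one '/'."""
    result = []
    for i in li:
        count = 0  # reset the counter for every sub-list
        for j in i:
            for k in j[1:]:  # skip the first character of each line, as in the third answer's code
                if k == '/':
                    count += 1
            # check once per line (inside "for j in i"), not once per character
            if count > 1:
                result.append(i[0][1:10])
                break  # stop scanning this sub-list once it qualifies
    return result


# Sample data borrowed from the third answer
li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],
      ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
print(ids_with_multiple_slashes(li))  # ['012345678']
```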

Also, here's a much easier way to do the same thing:

def count_slashes(items):
    return sum(item.count('/') for item in items)

for item in li:
    if count_slashes(item[1:]) > 1:
        print(item[0][1:10])  # print() with one argument works on Python 2 and 3

Or, if you need the IDs in a list:

result = [item[0][1:10] for item in li if count_slashes(item[1:]) > 1]

Python list comprehensions and generator expressions are really powerful tools; try to learn how to use them, as they make your life much simpler. The count_slashes function above uses a generator expression, and my last code snippet uses a list comprehension to construct the result list in a nice, concise way.
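The difference between the two constructs is easier to see side by side. This sketch borrows two DNA lines from the third answer's sample data. A list comprehension builds the whole list at once. A generator expression produces values on demand and can only be consumed once.

```python
# Sample lines borrowed from the third answer's data
lines = ['ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n']

# List comprehension: square brackets, builds the whole list immediately
counts_list = [line.count('/') for line in lines]
print(counts_list)      # [1, 1]

# Generator expression: parentheses, yields values one at a time on demand
counts_gen = (line.count('/') for line in lines)
print(sum(counts_gen))  # 2
print(sum(counts_gen))  # 0 -> a generator is used up after one pass

# As the sole argument of a call the extra parentheses can be dropped, as in count_slashes
print(sum(line.count('/') for line in lines))  # 2
```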

Tamás 2010-06-04 12:17:03

Python surprises me again and again, how easy some things can be. Great answer +1

Felix Kling 2010-06-04 15:34:27

Answer 2

A:

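import glob

lis = []
with open('output.txt', 'w') as outfile:
    for path in glob.iglob('*.ext'):
        with open(path) as infile:  # close each input file after reading
            content = infile.read()
        # partition('\n')[2] is everything after the first (title) line,
        # so slashes in the title line are not counted
        if content.partition('\n')[2].count('/') > 1:
            lis.append(content[1:10])
            next_func(lis, outfile)  # next_func is the asker's own function from the question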
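To check what the `partition` test does without touching the filesystem, here is a small sketch on in-memory strings. The record contents are made-up examples in the same style as the question's data. `str.partition` splits at the first separator into `(before, separator, after)`, so taking index 2 drops the title line.

```python
# Hypothetical file contents in the question's format (not real data)
content = '>0123456789 mouse gene 1\nATGTTGGGTT/CTTAGTTG\nATGGGGTTCCT/A\n'

header, sep, body = content.partition('\n')
print(header)           # >0123456789 mouse gene 1
print(body.count('/'))  # 2
print(content[1:10])    # 012345678

# A title line that itself contains a slash is ignored by the partition test
tricky = '>0123456789 gene a/b\nACGT/ACGT\n'
print(tricky.count('/'))                     # 2, includes the title-line slash
print(tricky.partition('\n')[2].count('/'))  # 1, title line skipped
```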

The reason you get digits for all entries is that you're not resetting the counter.
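This simplified sketch shows why a counter that is never reset flags every later entry. It uses `str.count` instead of the question's character-by-character loop and borrows the third answer's sample data. The name `collect_ids` and its `reset_per_sublist` parameter are hypothetical.

```python
def collect_ids(li, reset_per_sublist):
    """Hypothetical helper: collect IDs of sub-lists whose sequence lines hold more than one '/'."""
    lis = []
    count = 0
    for i in li:
        if reset_per_sublist:
            count = 0  # the fix: start each sub-list from zero
        for j in i[1:]:
            count += j.count('/')
        if count > 1:
            lis.append(i[0][1:10])
    return lis


li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],
      ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
print(collect_ids(li, reset_per_sublist=False))  # ['012345678', '987654321']
print(collect_ids(li, reset_per_sublist=True))   # ['012345678']
```

Without the reset, gene 2 inherits the two slashes counted in gene 1, so it is reported even though it has none.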

SilentGhost 2010-06-04 12:26:31

Could you possibly tell me how I would reset the counter? This happens all the time to me so I generally run everything through a function to remove duplications. Thanks!

Seafoid 2010-06-04 12:37:18

@seafoid: you need to move `count = 0` to just after the `for i in li:` line, but you're better off using my code; it's more efficient and there's no need for all those nested loops.

SilentGhost 2010-06-04 12:41:14

@SilentGhost - Thanks! Can your code be modified to exclude counting '/' if present in the first string within each sublist?

Seafoid 2010-06-04 12:52:41

@seafoid: sure, see my edit

SilentGhost 2010-06-04 13:34:53

why the downvote?

SilentGhost 2010-06-04 13:51:45

It didn't come from me. Thanks for your help!

Seafoid 2010-06-04 13:55:41

oops, it wasn't a downvote. someone just took back his upvote.

SilentGhost 2010-06-04 14:00:37

Answer 3

+5 A:

Tamás has suggested a good solution, although it uses a very different style of coding than you do. Still, since your question was "I am having some trouble with a piece of code below", I think something more is called for.

How to avoid these problems in the future

You've made several mistakes in your approach to getting from "I think I know how to write this code" to having actual working code.

You are using meaningless names for your variables, which makes it nearly impossible to understand your code, even for yourself. The thought "but I know what each variable means" is obviously wrong; otherwise you would have managed to solve this yourself. Notice below, where I fix your code, how difficult it is to describe and discuss your code.

You are trying to solve the whole problem at once instead of breaking it down into pieces. Write small functions or pieces of code that do just one thing, one piece at a time. For each piece you work on, get it right and test it to make sure it is right. Then go on writing other pieces which perhaps use pieces you've already got. I'm saying "pieces" but usually this means functions, methods or classes.

Fixing your code

That is what you asked for and nobody else has done so.

First, you need to move the count = 0 line to just after the for i in li: line (indented appropriately). This resets the counter for every sub-list. Second, once you have appended to lis and run your next_func, you need to break out of both the for k in j[1:] loop and the enclosing for j in i: loop.

Here's a working code example (without the next_func, but you can add that next to the append):

>>> li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],   ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
>>> lis = []
>>> for i in li:
        count = 0
        for j in i:
            break_out = False
            for k in j[1:]:
                if k == '/':
                    count += 1
                if count > 1:
                    lis.append(i[0][1:10])
                    break_out = True
                    break
            if break_out:
                break

>>> lis
['012345678']
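The `break_out` flag works. A common alternative, which this answer does not use, is to move the per-sub-list scan into a function. The function can then `return` as soon as the threshold is hit, which leaves all the nested loops at once. The name `has_more_than_one_slash` is hypothetical.

```python
def has_more_than_one_slash(sublist):
    """Hypothetical helper: True if the lines of sublist contain more than one '/'."""
    count = 0
    for j in sublist:
        for k in j[1:]:  # same character scan as the loop above
            if k == '/':
                count += 1
                if count > 1:
                    return True  # exits both loops immediately, no flag needed
    return False


li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],
      ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
lis = [i[0][1:10] for i in li if has_more_than_one_slash(i)]
print(lis)  # ['012345678']
```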

Rewriting your code to make it readable

This is so you can see what I meant at the beginning of my answer.

>>> def count_slashes(gene):
    "Count the number of '/' characters in the DNA sequences of the gene."
    count = 0
    dna_sequences = gene[1:]
    for sequence in dna_sequences:
        count += sequence.count('/')
    return count
>>> def get_gene_name(gene):
    "get the name of the gene"
    gene_title_line = gene[0]
    gene_name = gene_title_line[1:10]
    return gene_name
>>> genes = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],   ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
>>> results = []
>>> for gene in genes:
        if count_slashes(gene) > 1:
            results.append(get_gene_name(gene))

>>> results
['012345678']
>>>
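The descriptive helpers also combine well with the list comprehension from the first answer. This sketch reuses this answer's `count_slashes` and `get_gene_name` names but condenses their bodies. `count_slashes` uses the `sum(...)` one-liner suggested in SilentGhost's comment.

```python
def count_slashes(gene):
    "Count the '/' characters in the DNA sequences of the gene (every line after the title)."
    return sum(sequence.count('/') for sequence in gene[1:])


def get_gene_name(gene):
    "Get the gene name: characters 1 to 9 of the title line, as in the original code."
    return gene[0][1:10]


genes = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],
         ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
results = [get_gene_name(gene) for gene in genes if count_slashes(gene) > 1]
print(results)  # ['012345678']
```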

taleinat 2010-06-04 13:16:48

`sum(seq.count('/') for seq in gene[1:])` would do the job just fine.

SilentGhost 2010-06-04 13:40:29

Great answer - I would have voted it up more than once if I could.

Tamás 2010-06-04 14:56:45

Python - making counters, making loops?