ansaurus

Question

Count Parenthesis in a file with a Python program?

Answer 1

A:

Do you want to ensure that they parens & braces are matched? That "[(])" fails? If not, you're along the right path, except that you need to change your "=" to "+=". You're discarding the values of previous lines.

Pestilence 2009-12-10 01:50:12

I think the big problem is actually comparing the line to a single character to decide whether to count it up or not.

Anon. 2009-12-10 01:52:06

yes that: "[(])" doesnt fail!where is the: "=" that you mean?yes...i wanna count the parentheses!if the right ones=left ones then its right if not it gives a message "invalid syntax"any ideas? thnxxxx

FILIaS 2009-12-10 01:57:17

In what world is `[(])` syntactically valid?

Anon. 2009-12-10 02:00:12

Pestilence 2009-12-10 02:00:50

Pestilence 2009-12-10 02:03:59

Answer 2

A:

I think if you change the:

            if line=='(':
                 c1=line.count('(')
            elif line==')':
                 c2=line.count(')')
            elif line=='[':
                 c3=line.count('[')
            elif line==']':
                 c4=line.count(']')
            elif line=='':
                break

to something like:

SearchFor = ['(', ')', '[', ']']
d = {}
for itm in SearchFor:
    d[itm] = line.count(itm)


# Then do the comparison
if d['['] == d[']'] and  d['('] == d[')']:
     print "Valid Syntax"
else:
     print "Invalid Syntax" #You could look at each to find the exact cause.

and the While True: as mentioned by others. I had missed that. :0)

Frank V 2009-12-10 01:55:18

i dont wanna count generally the '(', ')', '[', ']'but actually compare them...thats why i have the next loop body

FILIaS 2009-12-10 02:03:26

I'm sorry, I had completely mistyped the last part of the code. I've corrected it. This should better represent what you are trying to accomplish. The `d` variable should end up being something like `{'[':5, ']':5, '(':4, ')':4}` if both match.

Frank V 2009-12-10 14:48:06

Answer 3

+1 A:

Remove the 'while True' line and this bit:

elif line=='':
    break

and then replace this:

for line in filename:
    line=f.readline()

with this:

for line in f:

Now you'll be looping over the lines in the file.

Next, replace all of these sorts of things:

if line=='(':
    c1=line.count('(')

with:

    c1+=line.count('(')

The if and elif lines are just preventing you from counting when you should. If the line doesn't have what you're looking for the count will be 0, which is fine.

That should at least get you closer to a solution.

Laurence Gonsalves 2009-12-10 01:55:31

Answer 4

A:

My solution will try to help you understand a bit more precise way of doing this, and hopefully you'll learn a bit about data structures in the process.

To properly do this, you're going to want to use a stack. You'll want to pull out all instances of (, ), [, and ] (perhaps using a regular expression... hint) and go through the array that that generates:

Say your file is like this:

(this [is] foobar)

Your regular expression will yield this array:

['(', '[', ']', ')']

You will pop(0) off of this array into a stack.

Algorithmically:

1) Put all tags {(,),[,]} in an array.

2) For each element in the array, pop(0) from it and push it onto your stack. Test it against the element before it. If it closes the element before it, then pop() twice from the array (eg, if you have '(' on the stack, and the next element to be pushed onto the stack is a ')', ')' closes '(', so you pop them both.) If it doesn't, continue.

3) If your array is empty and your stack is empty when this is over, then your file is well formed. If it's not, then you have a poorly formed file {something like (foo[bar)] }.

Bonus: regular expression: REGEX = re.compile(r"\)\(\[\]"), REGEX.findall(your string to search). See more about regexes in Python here.

Isaac Hodes 2009-12-10 02:00:20

I see your stack based parser and raise you an LL* based parser.

Pestilence 2009-12-10 02:07:09

Fold! I'm not sure we're ready for that yet.

Isaac Hodes 2009-12-10 02:21:42

-1 ... pop(0) is a nonsense. Secondly, there is no point pushing the input onto the stack under conditions where it is immediately going to be pulled off again.

John Machin 2009-12-10 03:59:48

Actually, pop(0) pops the first item of the array. Look in the Python docs. Secondly, the way I described is a standard way to parse tag-based documents like XML. It's not only a decent way to do it, but it's the algorithmically standard way described in at least a few algorithm texts. Sure, it can be made more efficient, but that's not the point here. Thanks for the -1...

Isaac Hodes 2009-12-10 04:14:03

Of course pop(0) pops the first (0) item of the list. The point is that it is totally unnecessary to remove any item from the input list. All you need to do is something like `for input_item in input_list:`. Parsing tag-based documents is irrelevant. All the parsing algorithms I have seen that use a stack base action on some function of the input element and the current top element of the parse stack (if any) ... if none, it's easier to test for, which your pseudocode doesn't.

John Machin 2009-12-10 04:29:51

So a downvote for a correctly functioning algorithm that doesn't work the way you'd like it?

Isaac Hodes 2009-12-10 04:49:25

It doesn't work correctly; you don't test for an empty stack. However the downvote was for the sheer ludicrousness of telling a n00b to do pop(0) from an "array". It's not a matter of leaving it to later optimisation; nobody but nobody should write pop(0) first up in this scenario. Compare your answer with Thillakan's.

John Machin 2009-12-10 05:09:11

Just tested. Algorithm works. We can debate on how bad it is to show a newbie a non-trivial data structure (Python's wonky array), but the algorithm does work. Can you give me an input for which it fails? I will improve my question with your aid, if you're willing to help rather than simply criticize.

Isaac Hodes 2009-12-10 06:54:39

None of your "we can debate" sentence makes much sense. If you are talking about array as in the array module, you shouldn't be. If you mean a list, call it that, and explain "wonky". In any case, the point was that you introduced pop(0) which is utter overkill; all that you need demand from your input is that you can iterate over it in original order.

John Machin 2009-12-10 11:35:29

About your algorithm: I've already told you: "you don't test for an empty stack". Help: Read Thillakan's answer, it works properly. However, here's an example: **ANY INPUT**. You say to push the incoming element onto the parse stack. Then "Test it against the element before it". You DON'T specify what to do if the stack is empty. First time up, there is no element before the just-pushed input item. You try to access parse_stack[-2] and crash with an IndexError. If "Algorithm works", either your implementation doesn't follow your answer, or you have a strange definition of "works".

John Machin 2009-12-10 11:47:09

"If it closes the element before it, then pop() twice from the array (eg, if you have '(' on the stack, and the next element to be pushed onto the stack is a ')', ')' closes '(', so you pop them both.) If it doesn't, continue." If there is no element before it, it doesn't pop the element before it, right? But I appreciate your attempt at providing a substantive argument this time: this is the sort of constructive criticism that could have gotten us somewhere earlier.

Isaac Hodes 2009-12-10 17:08:49

Answer 5

+1 A:

I believe, You are looking for balanced symbol checker. It is better to use stack.

Make an empty stack,
For each symbol in the string:

2.1 If the symbol is an opening symbol, push it on the stack.

2.2 If it is a closing symbol, then

2.2.1 If the stack is empty, then it in false.

2.2.2 If the top of the stack does not match the closing symbol, return false. [check this step if you are checking for matched braces]

2.2.3 Pop the stack.
Return true if the stack is empty, otherwise false.

hth.

Thillakan 2009-12-10 02:08:06

Answer 6

A:

Ok, why should you be doing this on your own?? you should get a syntax checker. I am sure you can search for one in PyPi.

2009-12-10 02:22:00

for what u saw...i tried!:\

FILIaS 2009-12-10 02:26:09

Answer 7

A:

All of these answers are wrong and will not work in all cases, so either use python parser e.g. tokenize etc or just use this

count = min(text.count("("), text.count(")"))

2009-12-10 06:38:29

Answer 8

A:

Please excuse the length of this reply.

If I understand you correctly, you want to do simple syntax checking of parentheses to make sure they are balanced correctly. In your question you specify a test based on simple counting, but as others have pointed out, this does not catch things like "([)]".

I'd also like to offer some constructive criticism on other aspects of your code.

To start with, it is better to get the filename from the command line, and not to prompt for it. This is so that you can easily run the program repeatedly, when developing it, without having to type in the filename all the time. This is your way:

$ python foo.py
Give a file name:data
[some output]
$ python foo.py
Give a file name:data
[some output]
$ python foo.py
Give a file name:data
[some output]

You need to type in the filename every time. You don't need to type in the command to run the program more than once. After the first time, you can use the arrow key to get it from the shell's command history. If you get the filename from the command line, you can do this instead:

$ python foo.py testfile
[some output]
$ python foo.py testfile
[some output]
$ python foo.py testfile
[some output]

This way, when you test the second time, you don't need to type more than the up arrow and Enter keys. This is a small convenience, but it is important: when you're developing software, even small things can start annoying. It's like a large grain of sand under your foot when going on a long walk: you won't even notice it for the first couple of kilometers, but after a few more, you're bleeding.

In Python, to access the command line arguments, you need the sys.argv list. The relevant changes to your program:

import sys
filename = sys.argv[1]

If you do want to prompt, you should use something else than the built-in input function. It interprets whatever the user types as a Python expression, and that causes all sorts of problems. You could read using sys.stdin.readline.

Anyway, we've now got the name of the file safely stored in the filename variable. It's time to do something with it. Your parentheses function does pretty much everything, and experience has shown that that's often not the best way of doing things. Every function should, instead, do just one thing, but do that well.

I suggest that you should keep the parts of actually opening and closing a file separate from the actual counting. This will simplify the logic of the counting, since it does not need to worry about the rest. In code:

import sys

def check_parentheses(f):
    pass # we'll come to this later

def main():
    filename = sys.argv[1]
    try:
        f = file(filename)
    except IOError:
        sys.stderr.write('Error: Cannot open file %s' % filename)
        sys.exit(1)
    check_parentheses(f)
    f.close()

main()

I changed a couple of other things, too, in addition to rearranging things. First, I write the error message to the standard error output. This is the proper way to do it, and means fewer surprises to shell users to redirect the output. (If that doesn't make any sense to you, don't worry about it, just accept it as a given for now.)

Second, if there's an error, I exit the program with sys.exit(1). This tells whoever started the program that it failed. In Unix shell, this lets you do things like the following:

if python foo.py inputfile
then
    echo "inputfile is OK!"
else
    echo "inputfile is BAD!"
fi

The shell script might do something more interesting than just reporting success or failure, of course. It might, for example, remove all broken files, or e-mail whoever wrote them to ask them to fix them. The beauty is that you, who write the checker program, do not need to care. You just set the program exit code properly, and let whoever writes the shell script to worry about the rest.

The next step is to actually read the contents of the file. This can be done in various ways. The easiest way is to do it line-by-line, like this:

for line in f:
    # do something with the line

We then need to look at each character in the line:

for line in f:
    for c in line:
        # do something with the character

We're now ready to actually start checking parentheses. As suggested by others, a stack is the appropriate data structure for this. A stack is basically a list (or array) where you add items to one end, and take them out in reverse order. Think of it as a stack of coins: you can add a coin to the top, and you can remove the topmost coin, but you can't remove one from the middle or bottom.

(Well, you can, and it's a neat trick if you do, but computers are simple beasts and get upset by magic tricks.)

We will use a Python list as a stack. To add an item, we use the list's append method, and to remove we use the pop method. An example:

stack = list()
stack.append('(')
stack.append('[')
stack.pop() # this will return '['
stack.pop() # this will return '('

To look at the topmost item in the stack, we use stack[-1] (in other words, the last item in the list).

We use the stack as follows: when we find an opening parentheses ('('), bracket ('['), or brace ('{'), we put it on the stack. When we find a closing one, we check the topmost item on the stack, and make sure that it matches the closing one. If not, we print an error. Like this:

def check_parentheses(f):
    stack = list()
    for line in f:
        for c in line:
            if c == '(' or c == '[' or c == '{':
                stack.append(c)
            elif c == ')':
                if stack[-1] != '(':
                    print 'Error: unmatched )'
                else:
                    stack.pop()
            elif c == ']':
                if stack[-1] != '[':
                    print 'Error: unmatched ]'
                else:
                    stack.pop()
            elif c == '}':
                if stack[-1] != '{':
                    print 'Error: unmatched }'
                else:
                    stack.pop()

This now does find unmatched parentheses of various kinds. We can improve it a little bit by reporting the line and column where we find the problem. We need a line number and column number counter.

def error(c, line_number, column_number):
    print 'Error: unmatched', c, 'line', line_number, 'column', column_number

def check_parentheses(f):
    stack = list()
    line_number = 0
    for line in f:
        line_number = line_number + 1
        column_number = 0
        for c in line:
            column_number = column_number + 1
            if c == '(' or c == '[' or c == '{':
                stack.append(c)
            elif c == ')':
                if stack[-1] != '(':
                    error(')', line_number, column_number)
                else:
                    stack.pop()
            elif c == ']':
                if stack[-1] != '[':
                    error(']', line_number, column_number)
                else:
                    stack.pop()
            elif c == '}':
                if stack[-1] != '{':
                    error('}', line_number, column_number)
                else:
                    stack.pop()

Note also how I added a helper function, error, to do the actual printing of the error message. If you want to change the error message, you now only need to do it in one place.

Another thing to notice is that the cases for handling the closing symbols are all very similar. We could make that to a function, too.

def check(stack, wanted, c, line_number, column_number):
    if stack[-1] != wanted:
        error(c, line_number, column_number)
    else:
        stack.pop()

def check_parentheses(f):
    stack = list()
    line_number = 0
    for line in f:
        line_number = line_number + 1
        column_number = 0
        for c in line:
            column_number = column_number + 1
            if c == '(' or c == '[' or c == '{':
                stack.append(c)
            elif c == ')':
                check(stack, '(', ')', line_number, column_number)
            elif c == ']':
                check(stack, '[', ']', line_number, column_number)
            elif c == '}':
                check(stack, '{', '}', line_number, column_number)

The program can be refined further, but this should suffice for now. I'll include the whole code at the end.

Note that this program only cares about parentheses of various kinds. If you really want to check a whole Python program for syntactic correctness, you'll need to parse all of Python's syntax, and that is pretty complicated, and too much for one Stack Overflow answer. If that's what you really do want, please ask a followup question.

The whole program:

import sys

def error(c, line_number, column_number):
    print 'Error: unmatched', c, 'line', line_number, 'column', column_number

def check(stack, wanted, c, line_number, column_number):
    if stack[-1] != wanted:
        error(c, line_number, column_number)
    else:
        stack.pop()

def check_parentheses(f):
    stack = list()
    line_number = 0
    for line in f:
        line_number = line_number + 1
        column_number = 0
        for c in line:
            column_number = column_number + 1
            if c == '(' or c == '[' or c == '{':
                stack.append(c)
            elif c == ')':
                check(stack, '(', ')', line_number, column_number)
            elif c == ']':
                check(stack, '[', ']', line_number, column_number)
            elif c == '}':
                check(stack, '{', '}', line_number, column_number)

def main():
    filename = sys.argv[1]
    try:
        f = file(filename)
    except IOError:
        sys.stderr.write('Error: Cannot open file %s' % filename)
        sys.exit(1)
    check_parentheses(f)
    f.close()

main()

Lars Wirzenius 2009-12-10 07:48:05

ansaurus

tags:

views:

answers:

Count Parenthesis in a file with a Python program?

related questions