What happens internally when I press Enter?

My motivation for asking, besides plain curiosity, is to figure out what happens when you

from sympy import *

and enter an expression. How does it go from Enter to calling

__sympifyit_wrapper(a,b)

in sympy.core.decorators? (That's the first place winpdb took me when I tried inspecting an evaluation.) I would guess that there is some built-in eval function that gets called normally, and is overridden when you import sympy?

+5  A: 

I just inspected the code of sympy (at http://github.com/sympy/sympy ) and it looks like __sympifyit_wrapper comes from a decorator. The reason it gets called is that there is some code somewhere that looks like this:

class Foo(object):
    @_sympifyit
    def func(self):
        pass

And __sympifyit_wrapper is a wrapper that's returned by @_sympifyit. If you continued your debugging you may've found the function (in my example named func).
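
To make the wrapper relationship concrete, here is a minimal sketch. It is not sympy's code: log_calls and _wrapper are hypothetical names made up for illustration, and the snippet is written to run on Python 3. The point is only that after decoration the name func is bound to the inner wrapper, which is why a debugger lands there first.

```python
def log_calls(func):
    # Hypothetical decorator for illustration; not part of sympy.
    def _wrapper(*args, **kwargs):
        _wrapper.calls += 1           # runs before the real function body
        return func(*args, **kwargs)  # then hands off to the real function
    _wrapper.calls = 0
    return _wrapper

class Foo(object):
    @log_calls
    def func(self):
        return 'real func ran'

foo = Foo()
result = foo.func()
wrapper_name = Foo.func.__name__  # the name of the wrapper, not 'func'
print(result, wrapper_name, Foo.func.calls)
```

Stepping into foo.func() in a debugger would show _wrapper first and the original func one frame deeper.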

I gather in one of the many modules and packages imported in sympy/__init__.py some built in code is replaced with sympy versions. These sympy versions probably use that decorator.

The exec used by the >>> prompt won't have been replaced; the objects being operated on will have been.

Jerub
+4  A: 

All right after playing around with it some more I think I've got it.. when I first asked the question I didn't know about operator overloading.

So, what's going on in this python session?

>>> from sympy import *
>>> x = Symbol('x')
>>> x + x
2*x

It turns out there's nothing special about how the interpreter evaluates the expression; the important thing is that python translates

x + x

into

x.__add__(x)

and Symbol inherits from the Basic class, which defines __add__(self, other) to return Add(self, other). (These classes are found in sympy.core.symbol, sympy.core.basic, and sympy.core.add if you want to take a look.)
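
You can watch this dispatch happen with a throwaway class. This is only a sketch: Loud is a hypothetical class of mine that records which special method Python picked, and it is written for Python 3.

```python
import operator

class Loud(object):
    # Hypothetical class that records which special method was called.
    def __init__(self):
        self.log = []
    def __add__(self, other):
        self.log.append('__add__')
        return 'added'
    def __radd__(self, other):
        self.log.append('__radd__')
        return 'right-added'

a = Loud()
r1 = a + 1               # left operand is ours: __add__ is used
r2 = 1 + a               # int can't add a Loud, so Python falls back to __radd__
r3 = operator.add(a, a)  # the function form of +, same dispatch
print(a.log)
```

The second case is why the reflected methods like __radd__ matter once you mix your own objects with plain numbers.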

So as Jerub was saying, Symbol.__add__() has a decorator called _sympifyit which basically converts the second argument of a function into a sympy expression before evaluating the function, in the process returning a function called __sympifyit_wrapper which is what I saw before.
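
Roughly what such a decorator does can be sketched like this. I'm not reproducing sympy's implementation here: convert_other and Expr are hypothetical stand-ins, and the real _sympifyit may well differ in its signature and details.

```python
def convert_other(func):
    # Hypothetical analogue of _sympifyit: make sure `other` is an Expr
    # before the decorated method sees it.
    def _convert_wrapper(self, other):
        if not isinstance(other, Expr):
            other = Expr(other)
        return func(self, other)
    return _convert_wrapper

class Expr(object):
    # Hypothetical stand-in for a sympy expression wrapping a plain value.
    def __init__(self, value):
        self.value = value

    @convert_other
    def __add__(self, other):
        # By the time we get here, `other` is guaranteed to be an Expr.
        return Expr(self.value + other.value)

total = Expr(2) + 3  # the int 3 is wrapped before __add__ runs
print(total.value)
```

So the method body can assume it is always working with expression objects, and the conversion logic lives in one place.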

Using objects to define operations is a pretty slick concept; by defining your own operators and string representations you can implement a trivial symbolic algebra system quite easily:

symbolic.py --

class Symbol(object):
    def __init__(self, name):
        self.name = name
    def __add__(self, other):
        return Add(self, other)
    def __repr__(self):
        return self.name

class Add(object):
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def __repr__(self):
        # left and right are Symbol objects, not strings, so use their reprs
        return repr(self.left) + '+' + repr(self.right)

Now we can do:

>>> from symbolic import *
>>> x = Symbol('x')
>>> x+x
x+x

With a bit of refactoring it can easily be extended to handle all basic arithmetic:

class Basic(object):
    def __add__(self, other):
        return Add(self, other)
    def __radd__(self, other): # if other hasn't implemented __add__() for Symbols
        return Add(other, self)
    def __mul__(self, other):
        return Mul(self, other)
    def __rmul__(self, other):
        return Mul(other, self)
    # ...

class Symbol(Basic):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Operator(Basic):
    def __init__(self, symbol, left, right):
        self.symbol = symbol
        self.left = left
        self.right = right
    def __repr__(self):
        return '{0}{1}{2}'.format(self.left, self.symbol, self.right)

class Add(Operator):
    def __init__(self, left, right):
        # Operator.__init__ already stores left and right
        Operator.__init__(self, '+', left, right)

class Mul(Operator):
    def __init__(self, left, right):
        Operator.__init__(self, '*', left, right)

# ...

With just a bit more tweaking we can get the same behavior as the sympy session from the beginning. We'll modify Add so it returns a Mul instance if its arguments are equal. This is a bit trickier since we have to get to it before instance creation; we have to use __new__() instead of __init__():

class Add(Operator):
    def __new__(cls, left, right):
        if left == right:
            return Mul(2, left)
        return Operator.__new__(cls)
    ...

Don't forget to implement the equality operator for Symbols:

class Symbol(Basic):
    ...
    def __eq__(self, other):
        if type(self) == type(other):
            return repr(self) == repr(other)
        else:
            return False
    ...

And voila. Anyway, you can think of all kinds of other things to implement, like operator precedence, evaluation with substitution, advanced simplification, differentiation, etc., but I think it's pretty cool that the basics are so simple.
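
For anyone who wants to paste it in one go, here are the fragments above stitched into a single file. Treat it as a sketch: the stitching is mine, I compare names rather than reprs in __eq__, and I've only aimed for it to run on Python 3.

```python
class Basic(object):
    def __add__(self, other):
        return Add(self, other)
    def __radd__(self, other):  # used when the left operand can't add a Basic
        return Add(other, self)
    def __mul__(self, other):
        return Mul(self, other)
    def __rmul__(self, other):
        return Mul(other, self)

class Symbol(Basic):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __eq__(self, other):
        return type(self) == type(other) and self.name == other.name

class Operator(Basic):
    def __init__(self, symbol, left, right):
        self.symbol = symbol
        self.left = left
        self.right = right
    def __repr__(self):
        return '{0}{1}{2}'.format(self.left, self.symbol, self.right)

class Mul(Operator):
    def __init__(self, left, right):
        Operator.__init__(self, '*', left, right)

class Add(Operator):
    def __new__(cls, left, right):
        if left == right:
            return Mul(2, left)       # x + x collapses to 2*x
        return Operator.__new__(cls)  # otherwise build a normal Add
    def __init__(self, left, right):
        Operator.__init__(self, '+', left, right)

x, y = Symbol('x'), Symbol('y')
print(x + x, x + y, 2 + x)
```

Note that when __new__ returns a Mul, Python skips Add.__init__ entirely, because the returned object is not an instance of Add.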

Ahh, so your question was how sympy worked, not how the interpreter worked. You will get the bounty regardless, but I was really hoping for some detailed descriptions of how the interpreter actually did its job. I assume this isn't too complicated and probably involves `eval` and the readline library, but I was really curious. If that question of mine isn't answered, I will change the title of the question to be more accurate.
Omnifarious
Sorry about that.. after looking at a few of your posts you clearly know a lot more about python than I do and probably got nothing out of your bounty :/ I mistakenly thought sympy was doing something magical with how the interpreter evaluated expressions, but the 'magic' was just in the operators. I'm pretty new to python and programming in general, and I didn't know operator overloading existed. Why don't you try writing a python interpreter? I didn't understand sympy til I tried writing my own symbolic algebra system.
@secondbanana - That's good advice. :-) I find it really interesting how so much of learning is figuring out which question to ask. There are a lot of questions on StackOverflow where it's clear that someone actually had a different question and instead of asking that, they asked some other question because they were pre-supposing a solution. "Why do you want to do that?" is one of the best response/clarification questions.
Omnifarious
+3  A: 

This doesn't have much to do with secondbanana's real question - it's just a shot at Omnifarious' bounty ;)

The interpreter itself is pretty simple. As a matter of fact you could write a simple one (nowhere near perfect, doesn't handle exceptions, etc.) yourself:

print "Wayne's Python Prompt"

def getline(prompt):
    return raw_input(prompt).rstrip()

myinput = ''

while myinput.lower() not in ('exit()', 'q', 'quit'):
    myinput = getline('>>> ')
    if myinput:
        while myinput[-1] in (':', '\\', ','):
            myinput += '\n' + getline('... ')
        exec(myinput)

You can do most of the stuff you're used to in the normal prompt:

Wayne's Python Prompt
>>> print 'hi'
hi
>>> def foo():
...     print 3
>>> foo()
3
>>> from dis import dis
>>> dis(foo)
  2           0 LOAD_CONST               1 (3)
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> quit
Hit any key to close this window...

The real magic happens in the lexer/parser.

Lexical Analysis, or lexing is breaking the input into individual tokens. The tokens are keywords or "indivisible" elements. For instance, =, if, try, :, for, pass, and import are all Python tokens. To see how Python tokenizes a program you can use the tokenize module.

Put some code in a file called 'test.py' and run the following in that directory:

from tokenize import tokenize
f = open('test.py')
tokenize(f.readline)

For print "hello world" you get the following:

1,0-1,5: NAME 'print'
1,6-1,19: STRING '"hello world"'
1,19-1,20: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
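
The call above is the Python 2 form. On Python 3 the same idea can be sketched with tokenize.generate_tokens, which yields token tuples instead of printing them. The source string here is just an example of mine, fed in through io.StringIO so no file is needed.

```python
import io
import tokenize

source = 'x = 1 + 2\n'
# generate_tokens wants a readline-style callable that returns str lines
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok.type is a number; tok_name maps it back to NAME, OP, NUMBER, ...
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

Each token also carries its start and end positions (tok.start, tok.end), which is where the row,column ranges in the listing above come from.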

Once the code is tokenized, it's parsed into an abstract syntax tree. The end result is a python bytecode representation of your program. For print "Hello World!" you can see the result of this process:

from dis import dis
def heyworld():
    print "Hello World!"
dis(heyworld)
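
If you want to poke at the stages separately, the standard library exposes them. A small sketch, assuming Python 3 (the statement being compiled and the '<example>' filename are just placeholders of mine):

```python
import ast
import dis

source = 'result = 1 + 2'

tree = ast.parse(source)                        # source text -> syntax tree
code_obj = compile(tree, '<example>', 'exec')   # syntax tree -> bytecode

namespace = {}
exec(code_obj, namespace)                       # run the bytecode
print(namespace['result'])

dis.dis(code_obj)                               # show the bytecode instructions
```

The exact instructions dis prints vary between Python versions, so don't expect them to match the Python 2 listing earlier in this answer.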

Of course most language implementations lex and parse their programs before doing anything else with them. Python lexes, parses, and compiles to bytecode. Then the bytecode is executed by the interpreter's virtual machine, one instruction at a time; in the standard CPython implementation it is not translated to machine code. This is the main difference between interpreted and compiled languages - compiled languages are translated to machine code from the original source ahead of time, which means the lex/parse work happens once, before the program ever runs, and then you can directly execute the program. This means faster execution times (no lex/parse stage at run time), but it also means that to get to that initial execution you have to spend more time up front, because the entire program must be compiled first.

Wayne Werner
Thanks. :-) You know, this question ought to be split into two. :-)
Omnifarious
You're welcome and thank you. And you're right, perhaps someone with mod powers can do such a thing.
Wayne Werner
+1  A: 

The Python interactive interpreter doesn't do a lot that's any different from any other time Python code is getting run. It does have some magic to catch exceptions and to detect incomplete multi-line statements before executing them so that you can finish typing them, but that's about it.

If you're really curious, the standard code module is a fairly complete implementation of the Python interactive prompt. I think it's not precisely what Python actually uses (that is, I believe, implemented in C), but you can dig into your Python's system library directory and actually look at how it's done. Mine's at /usr/lib/python2.5/code.py
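
As a taste of what's in there, the module's compile_command function is the piece that tells "not finished yet" apart from "ready to run". A minimal sketch, assuming Python 3 (the input strings are my own examples):

```python
import code

# Incomplete input: compile_command returns None, so a prompt would show '...'
incomplete = code.compile_command('def foo():')

# Complete input: we get a code object back and can execute it
complete = code.compile_command('x = 1 + 1')
namespace = {}
exec(complete, namespace)

# Invalid input: a SyntaxError is raised instead
try:
    code.compile_command('x = = 1')
    syntax_ok = True
except SyntaxError:
    syntax_ok = False

print(incomplete, namespace['x'], syntax_ok)
```

A prompt loop built on this keeps reading lines while it gets None, and wraps the exec in a try/except so a bad statement prints a traceback instead of killing the session.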

Walter Mundt