For those who want to skip the text wall, the question is at the bottom.

Background

I'm a self-taught programmer, and I've found that I learn best using a top-down approach. For example, I started programming by setting up a PHP application, and then when I wanted to add some functionality, I started figuring out how it worked. The big epiphany came when I found I could print out data while the application was running and see what was happening to it. Suddenly web pages were no longer magic. I learned object-oriented principles the same way, by playing with others' software and watching how they used objects to organize and manipulate their data. To really learn MVC, I took apart an MVC framework I liked, and rewrote it into my own.

Even now that I'm exploring more and more of Python and Ruby, I find I understand new techniques much better with a syntax guide and playing around with code than I do by reading a traditional teaching book. However, I've been feeling lately that I have a big conceptual blank when it comes to what the scripting language processor is actually doing (exacerbated by my lack of a traditional computer science background). So that leads me to my...

Dilemma

I just can't seem to find much info on lower level concepts geared towards someone who is comfortable with advanced concepts in scripting languages, particularly for people who learn the way I do. I am aware that there are a lot of resources for those who want to learn from the bottom up, but that's not for me.

Why is this a problem? It doesn't really get in the way of day to day work, but I hate when I do things that feel like voodoo. If I can't write it myself, I don't really understand it. I've been using opcode caching with PHP deployments since forever. I vaguely understand what it does, and can even give a concise explanation, but if I had to write one myself, I'd barely know where to begin. Plus, every time I've taken a step lower into the libraries and tools I'm using, I always come out a better coder on the higher level.

I learn best by working on something that is applicable to what I'm already doing. What I'd like to be able to do is watch the data. Once I can really grok what happens as a request is received by a server, passed to my application, interpreted, and passed back, I'll be a happy feller. Well, until a few months later when I become disturbed that I don't get how the data in my stack translates itself into the CPU registers, or whatever.

Question

What tools/setup/process would you recommend for someone who is comfortable with very high level concepts to drill down into lower level concepts by observing what the program running your program is doing? I'm most interested in either general techniques, or Python, Ruby, and PHP.

Also, given the current approachability of modern scripting languages by non compsci/mathematical types, techniques for slowly transitioning from that background to a more fundamental understanding of computers would be relevant.

+2  A: 

If you'd like to analyze bytecode and see what the virtual machine is doing on a relatively low level, these might help:

They're fairly straightforward to use, and the documentation for both has examples. This is how I first learned how bytecode works (in Python). Or you can see how virtual machines work in the first place (the article uses Java, but it applies to almost any language).
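
As one illustration for Python, here is a minimal sketch of looking at bytecode. It assumes CPython and its standard-library `dis` module, which is not necessarily one of the tools referred to above. The function `add_tax` is a made-up example, not something from the question.

```python
import dis


def add_tax(price, rate=0.2):
    """Hypothetical function, used only to have something to disassemble."""
    total = price + price * rate
    return round(total, 2)


# Print a human-readable listing of the bytecode CPython compiled for add_tax.
dis.dis(add_tax)

# The same instructions are also available as objects, which is handy
# if you want to poke at them from your own code.
opnames = [ins.opname for ins in dis.get_instructions(add_tax)]
print(opnames)
```

The exact instruction names differ between Python versions, so the listing you get may not match someone else's. Changing the function a little and re-running is a quick way to see which source constructs turn into which instructions.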

For a more advanced topic, you can take a look at how JIT (just-in-time) compilers work, such as the ones in Java, .NET, etc., in this blog post. It's a bit harder to understand, but it's really interesting.

musicfreak
Thanks, sometimes half the battle is knowing the right terms. Looks like "bytecode" is one of them.
Dylan
Bytecode is just the instructions that the virtual machine (the program that actually runs the code) uses to execute commands. All virtual machines have a built-in compiler that takes your source code and compiles it down to bytecode, which is then executed (or in some cases turned into native code).
musicfreak
And when I drop it into a search engine, I find such wonderful things! Looks like _why is working on a [Ruby to Python bytecode translator](http://hackety.org/2008/05/05/sneakingRubyThroughGoogleAppEngine.html). Could provide good insights into both languages.
Dylan
The JVM doesn't have a built-in compiler for source code; you have to use javac. The JIT compiler simply watches your execution stream, identifies code you're using a lot, and re-compiles it from bytecode into native code. It's not really a "compiler" in the traditional sense, more of a translator.
TMN
@TMN: I was referring to high-level languages like Python and Ruby, but yeah, thanks for clarifying.
musicfreak
@musicfreak Ruby 1.8 is one example where bytecode is not used. Instead, the syntax tree is directly evaluated. Ruby 1.9 moved to a bytecode VM, though.
abababa22
+1  A: 

Start looking into writing your own interpreter or compiler. I'd go for an interpreter first. Interpreters don't have to be particularly long, but they give an effective insight into how a language does what it does.
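
To give a feel for how short one can be, here is a rough sketch of a tree-walking interpreter for a tiny arithmetic language with variables. It is only an illustration under some assumptions: it borrows Python's own parser through the standard `ast` module instead of writing one, and the names `evaluate`, `run`, and `BINARY_OPS` are made up for this example.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node, env):
    """Recursively evaluate one expression node against the variable table env."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]  # variable lookup
    if isinstance(node, ast.BinOp):
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        return BINARY_OPS[type(node.op)](left, right)
    raise NotImplementedError(f"unsupported syntax: {type(node).__name__}")


def run(source, env=None):
    """Run lines of `name = expression` or bare expressions; return the last value."""
    env = {} if env is None else env
    result = None
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            result = env[stmt.targets[0].id] = evaluate(stmt.value, env)
        elif isinstance(stmt, ast.Expr):
            result = evaluate(stmt.value, env)
        else:
            raise NotImplementedError(f"unsupported statement: {type(stmt).__name__}")
    return result


answer = run("x = 2 + 3\ny = x * 4\ny - 1")
print(answer)  # prints 19
```

The `env` dictionary is the whole "variable scoping" story here. Swapping it for a chain of dictionaries is one way to start experimenting with nested scopes.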

If you start using lower and lower level languages to implement the interpreter or compiler, then you'll gain an appreciation of how the lower-level aspects of the code work, for instance memory management or variable scoping.

There are plenty of great books available in these areas (authors like Appel), several detailing how to use various languages to create interpreters or compilers. I'd recommend looking at an online booklist for a university programming language design and implementation course.

Chris
Thanks for your answer. I've considered this in the past, and even read bits of SICP and AoP. It seems like a fantastic way to gain knowledge in this area. However, it's a world away from the work I currently do (web applications), and I would consider it a "bottom up" approach, whereas I'm looking for a top down approach. Part of the top down view is not reinventing the wheel (like writing an interpreter or compiler), and instead figuring out how the current compiler/interpreter is doing things, and playing with that.
Dylan