So I'm currently working on a new programming language. Inspired by ideas from concurrent programming and Haskell, one of the primary goals of the language is management of side effects. More or less, each module will be required to specify which side effects it allows. So, if I were making a game, the graphics module would have no ability to do IO. The input module would have no ability to draw to the screen. The AI module would be required to be totally pure. Scripts and plugins for the game would have access to a very restricted subset of IO for reading configuration files. Et cetera.

However, what constitutes a side effect isn't clear cut. I'm looking for any thoughts or suggestions on the subject that I might want to consider in my language. Here are my current thoughts.

Some side effects are blatant. Whether it's printing to the user's console or launching your missiles, any action that reads or writes to a user-owned file or interacts with external hardware is a side effect.

Others are more subtle and these are the ones I'm really interested in. These would be things like getting a random number, getting the system time, sleeping a thread, implementing software transactional memory, or even something very fundamental such as allocating memory.

Unlike other languages built to control side effects (looking at you, Haskell), I want to design my language to be pragmatic and practical. The restrictions on side effects should serve two purposes:

  • To aid in the separation of concerns. (No one module can do everything).
  • To sandbox each module in the application. (Any module could be used as a plugin)

With that in mind, how should I handle "pseudo"-side effects, like random numbers and sleeping, as I mention above? What else might I have missed? In what ways might I manage memory usage and time as resources?

A: 

Give a serious look to Clojure, and its use of software transactional memory, agents, and atoms to keep side effects under control.

Charlie Martin
I will admit I scoured the Clojure site the day they first announced it on Reddit. I think the language is really darn cool. I wanted to get in on their development, but there's something about lispy languages that is thoroughly unappealing to me. I will be sure to check out what progress they've made since then for ideas!
Tac-Tics
I'm sort of curious what you feel isn't pragmatic and practical about Haskell. It's *different*, sure, but there is a good bit of real software being written in it. You might also look at Erlang, which is certainly seeing practical use.
Charlie Martin
+1  A: 

A side effect is having any effect on anything in the world other than returning a value, i.e. mutating something that could be visible in some way outside the function.

A pure function neither depends on nor affects any mutable state outside the scope of that invocation of the function, which means that the function's output depends only on constants and its inputs. This implies that if you call a function twice with the same arguments, you are guaranteed to get the same result both times, regardless of how the function is written.

If you have a function that modifies a variable that it has been passed, that modification is a side effect because it's visible output from the function other than the return value. A void function that is not a no-op must have side effects, because it has no other way of affecting the world.

The function could have a private variable only visible to that function that it reads and modifies, and calling it would still have the side effect of changing the way the function behaves in the future. Being pure means having exactly one channel for output of any kind: the return value.
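The private-variable case can be sketched in Haskell as follows. This is only an illustration: `mkNextId` is a made-up name, and the counter it closes over plays the role of the private variable. Note that the hidden state forces the result into `IO`, so the type itself admits the side effect.

```haskell
import Data.IORef (newIORef, atomicModifyIORef')

-- Builds an action with a private counter that nothing else can reach.
-- (mkNextId is a hypothetical name used only for this sketch.)
mkNextId :: IO (IO Int)
mkNextId = do
  counter <- newIORef (0 :: Int)
  -- Each run bumps the hidden counter and returns the new value.
  pure (atomicModifyIORef' counter (\n -> (n + 1, n + 1)))

main :: IO ()
main = do
  nextId <- mkNextId
  a <- nextId
  b <- nextId
  print (a, b)  -- prints (1,2): the same call gives different results
```

Nothing outside `nextId` can read or write the counter, yet calling it still changes how it behaves next time, which is exactly what makes it impure.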

It is possible to generate random numbers purely, but you have to pass around the random seed manually. Most random functions keep a private seed value that is updated each time it's called so that you get a different random number each time. Here's a Haskell snippet using System.Random:

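import System.Random (StdGen, random, randomR)

-- Color is assumed to have a Random instance.
randomColor              :: StdGen -> (Color, Int, StdGen)
randomColor gen1         = (color, intensity, gen3)  -- return the newest generator
 where (color, gen2)     = random gen1
       (intensity, gen3) = randomR (1, 100) gen2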

The random functions each return the randomized value and a new generator with a new seed (based on the previous one). To get a new value each time, the chain of new generators (gen1, gen2, gen3) has to be passed along. Implicit generators just use an internal variable to store the gen1.. values in the background.

Doing this manually is a pain, and in Haskell you can use a state monad to make it a lot easier. You'll want to implement something less pure or use a facility like monads, arrows or uniqueness types to abstract it away.
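As a rough sketch of what the state monad buys you, here is a self-contained version that needs nothing beyond the Prelude. Everything in it is an assumption made for illustration: `State` is a hand-rolled stand-in for the one in Control.Monad.State, and `Gen`, `next` and `randomIn` are a toy linear congruential generator, not System.Random's StdGen.

```haskell
-- A tiny hand-rolled state monad, standing in for Control.Monad.State.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad (State s) where
  State g >>= f = State $ \s ->
    let (a, s') = g s in runState (f a) s'

-- Hypothetical toy generator: one linear congruential step, not StdGen.
newtype Gen = Gen Int

next :: Gen -> (Int, Gen)
next (Gen s) = (s', Gen s')
  where s' = (s * 1103515245 + 12345) `mod` 2147483648

-- An Int in [lo, hi]; the generator is threaded by the monad.
randomIn :: (Int, Int) -> State Gen Int
randomIn (lo, hi) = State $ \g ->
  let (n, g') = next g in (lo + n `mod` (hi - lo + 1), g')

-- Two draws with no gen1/gen2/gen3 plumbing in sight.
twoRolls :: State Gen (Int, Int)
twoRolls = do
  a <- randomIn (1, 6)
  b <- randomIn (1, 6)
  pure (a, b)

main :: IO ()
main = do
  let (rolls, _) = runState twoRolls (Gen 42)
  print rolls
  -- Same seed, same result: the computation is still pure.
  print (fst (runState twoRolls (Gen 42)) == rolls)
```

The generator is still passed from call to call, but the monad does the passing, so the code reads like the implicit-seed version while staying pure.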

Getting the system time is impure because the time could be different each time you ask.

Sleeping is fuzzier because sleep doesn't affect the result of the function, and you could always delay execution with a busy loop, and that wouldn't affect purity. The thing is that sleeping is done for the sake of something else, which IS a side effect.

Memory allocation in pure languages has to happen implicitly, because explicitly allocating and freeing memory are side effects if you can do any kind of pointer comparisons. Otherwise, creating two new objects with the same parameters would still produce different values because they would have different identities (e.g. not be equal by Java's == operator).
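A small sketch of the identity problem, using Haskell's IORef as the explicitly allocated object (the choice of IORef is mine, just to have something with observable identity). Equality on IORef compares the references themselves, not their contents.

```haskell
import Data.IORef (newIORef)

main :: IO ()
main = do
  -- Two allocations with identical arguments...
  a <- newIORef (0 :: Int)
  b <- newIORef (0 :: Int)
  print (a == b)  -- False: same contents, different identities
  print (a == a)  -- True: a reference is equal only to itself
```

Because the two "identical" allocations are distinguishable, allocation cannot be a pure function of its arguments, which is why newIORef lives in IO.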

I know I've rambled on a bit, but hopefully that explains what side effects are.

Chris Smith
I'm well aware of what a side effect is. My point was that side effects aren't totally well defined. Note that I'm speaking a little more loosely than the Haskellesque definition you gave above. My language will NOT be pure. This means that calling the same function twice is never guaranteed to be idempotent. What is important to me in my design is simply containment of side effects. There is a world of difference between Haskell's every-value-is-immutable philosophy and simply preventing any old Joe from launching his missiles.
Tac-Tics
+4  A: 

The problem of how to describe and control effects is currently occupying some of the best scientific minds in programming languages, including people like Greg Morrisett of Harvard University. To my knowledge, the most ambitious pioneering work in this area was done by David Gifford and Pierre Jouvelot in the FX programming language started in 1987. The language definition is online, but you may get more insight into the ideas by reading their 1991 POPL paper.

Norman Ramsey
+2  A: 

This is a really interesting question, and it represents one of the stages I've gone through and, frankly, moved beyond.

I remember seminars in which Carl Hewitt, in talking about his Actors formalism, discussed this. He defined it in terms of a method giving a response that was solely a function of its arguments, or that could give different answers at different times.

I say I moved beyond this because it makes the language itself (or the computational model) the main subject, as opposed to the problem(s) it is supposed to solve. It is based on the idea that the language should have a formal underlying model so that its properties are easy to verify. That is fine, but it still remains a distant goal, because there is still no language (to my knowledge) in which the correctness of something as simple as bubble sort is easy to prove, let alone more complex systems.

The above is a fine goal, but the direction I went was to look at information systems in terms of information theory. Specifically, assuming a system starts with a corpus of requirements (on paper or in somebody's head), those requirements can be transmitted to a program-writing machine (whether automatic or human) to generate source code for a working implementation. THEN, as changes occur to the requirements, the changes are processed through as delta changes to the implementation source code.

Then the question is: What properties of the source code (and the language it is encoded in) facilitate this process? Clearly it depends on the type of problem being solved, what kinds of information go in and out (and when), how long the information has to be retained, and what kind of processing needs to be done on it. From this one can determine the formal level of the language needed for that problem.

I realized the process of cranking through delta changes of requirements to source code is made easier as the format of the code comes more to resemble the requirements, and there is a nice quantitative way to measure this resemblance, not in terms of superficial resemblance, but in terms of editing actions. The well-known technology that best expresses this is domain specific languages (DSL). So I came to realize that what I look for most in a general-purpose language is the ability to create special-purpose languages.

Depending on the application, such special-purpose languages may or may not need specific formal features like functional notation, side-effect control, parallelism, etc. In fact, there are many ways to make a special-purpose language, from parsing, interpreting, compiling, down to just macros in an existing language, down to simply defining classes, variables, and methods in an existing language. As soon as you declare a variable or subroutine you've created new vocabulary and thus, a new language in which to solve your problem. In fact, in this broad sense, I don't think you can solve any programming problem without being, at some level, a language designer.

So best of luck, and I hope it opens up new vistas for you.

Mike Dunlavey
Thanks for the reply. I'll see if I can't dig up anything on Carl Hewitt's seminars. I wouldn't say that the language itself is the central focus. I have really no idea what kind of language I'm going to be writing yet. I may end up focusing on writing a VM instead of a language and then come up with a simple demo language to show off its merit. DSLs and language productivity concerns aren't terribly important at this stage (eventually, but not quite yet!), but my threading and memory models are extremely important.
Tac-Tics
Good luck. Here's a start: http://en.wikipedia.org/wiki/Carl_Hewitt He's one smart dude.
Mike Dunlavey