To take an example, consider a set of discounts available to a supermarket shopper.

We could define these rules as data in some standard fashion (lists of qualifying items, applicable dates, coupon codes) and write generic code to handle these. Or, we could write each as a chunk of code, which checks for the appropriate things given the customer's shopping list and returns any applicable discounts.

You could reasonably store the rules as objects, serialised into Blobs or stored in code files, so that each rule could choose its own division between data and code, to allow for future rules that wouldn't fit the type of generic processor considered above.
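As a rough sketch of that idea in Python: every name below (`PercentOff`, `CustomRule`, `total_discount`) is hypothetical, and the basket is simplified to a mapping from item name to price. Each rule exposes the same method but decides for itself how much of it is data and how much is code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Basket = Dict[str, float]  # assumption: item name -> price, one of each item


@dataclass
class PercentOff:
    """Mostly data: which items qualify and by how much."""
    items: FrozenSet[str]
    percent: float

    def discount(self, basket: Basket) -> float:
        qualifying = sum(p for item, p in basket.items() if item in self.items)
        return qualifying * self.percent / 100


@dataclass
class CustomRule:
    """Mostly code: an arbitrary function of the basket."""
    fn: Callable[[Basket], float]

    def discount(self, basket: Basket) -> float:
        return self.fn(basket)


def total_discount(rules, basket: Basket) -> float:
    # The generic processor only knows that each rule has a discount() method.
    return sum(rule.discount(basket) for rule in rules)
```

A list such as `[PercentOff(frozenset({"milk"}), 10.0), CustomRule(lambda b: 1.0 if len(b) >= 3 else 0.0)]` can then be passed to `total_discount` without the caller knowing which kind of rule is which.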

It's often easy to criticise code that mixes data in, via if statements that check for 6 different things that should be in a file or a database, but is there a rule that helps in the edge cases?

Or is this the point of Object Oriented design, to stop us worrying about the line between data and code?

To clarify, the underlying question is this: How would you code the above example? Is there a rule of thumb that made you decide what is data and what is code?

(Note: I know, code can be compiled, but in a world of dynamic languages and JIT compilation, even that is a blurry concept.)

+2  A: 

It all depends on the requirement. If the data is like lookup data and changes frequently, you don't really want to do it in code, but things like Day of the Week should not change for the next 200 years or so, so code that.

You might consider changing your topic, as the first thing I thought of when I saw it was the age-old LISP discussion of code vs data. Luckily, in Scheme code and data look the same, but that's about it; you can never accidentally mix code with data, as is very possible in LISP with unhygienic macros.

leppie
A: 

Code is any data which can be executed. Now since all data is used as input to some program at some point of time, it can be said that this data is executed by a program! Thus your program acts as a virtual machine for your data. Hence in theory there is no difference between data and code!

In the end what matters is software engineering/development considerations like performance, efficiency etc. For example data driven programs may not be as efficient as programs which have hard coded (and hence fragile) conditional statements. Hence I choose to define code as any data which can be efficiently executed and all else being plain data.

It's a tradeoff between flexibility and efficiency. Executable data (like XML rules) offers more flexibility (sometimes) while the same data/rules when coded as part of the application will run more efficiently but changing it frequently becomes cumbersome. In other words executable data is easy to deploy but is inefficient and vice-versa. So ultimately the decision rests with you - the software designer.
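To make the tradeoff concrete, here is a minimal Python sketch of one rule written both ways. The rule itself (5% off for three or more items on a Tuesday) and all names are invented for illustration. The data form is interpreted by a tiny evaluator, which plays the role of the "virtual machine"; the code form is an ordinary hard-coded conditional.

```python
import operator

# Operators the evaluator understands (an assumption; extend as needed).
OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}

# The rule as data: it could equally live in a file or a database.
RULE_AS_DATA = {
    "when": [("item_count", ">=", 3), ("day", "==", "Tue")],
    "percent": 5,
}


def evaluate(rule, facts):
    """Interpret a data rule against a dict of facts."""
    if all(OPS[op](facts[field], value) for field, op, value in rule["when"]):
        return rule["percent"]
    return 0


def rule_as_code(facts):
    """The same rule, hard-coded."""
    if facts["item_count"] >= 3 and facts["day"] == "Tue":
        return 5
    return 0
```

Both forms give the same answer for the same facts. The data form can be changed without touching the program, at the cost of a dictionary lookup and an indirect call per condition.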

Please correct me if I'm wrong.

SDX2000
well it all boils down to 1's and 0's anyways
Robert Gould
Surely if your data is 'executed' then the actual data forms a programming language of some kind and becomes stored code rather than data?
Lazarus
Yes one more way of looking at it. Compare this with the equivalence between energy and mass.
SDX2000
"Hence I choose to define code as any data which can be efficiently executed and all else being plain data" - What determines whether it can be "efficiently executed"?
Rik
Ans: What form it (the data) has, and often where it's executed. E.g. XML config data is executed by a program (a VM), while C++ code describing the same thing runs directly on the microprocessor.
SDX2000
+1  A: 

In Lisp, your code is data, and your data is code.

In Prolog, clauses are terms, and terms are clauses.

Anonymous
A: 

The line between data and code (program) is blurry. It's ultimately just a question of terminology - for example, you could say that data is everything that is not code. But, as you wrote, they can be happily mixed together (although usually it's better to keep them separate).

Joonas Pulakka
+2  A: 

Data is information that is processed by instructions called Code. I'm not sure I feel there's a blurring in OOD; there are still properties (Data) and methods (Code). OO theory encapsulates both into a gestalt entity called a Class, but they are still discrete within the Class.

How flexible you want to make your code is a matter of choice. Including constant values (what you are doing by using if statements as described above) is inflexible without re-processing your source, whereas using dynamically sourced data is more flexible. Is either approach wrong? I would say it really depends on the circumstances. As Leppie said, there are certain 'data' points that are invariant, like the days of the week, that can be hard coded, but even there it may be advantageous to do it dynamically in certain circumstances.

Lazarus
But in using an object, you can't tell what is stored as data and what is generated by code. Hence properties in .Net: obj.Length could be generated (code) or just stored (data). The blurring lies in the fact that you do not care when you use it.
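The same point can be sketched in Python rather than .Net (both class names are hypothetical): a caller reading `.length` cannot tell which object stores the value and which computes it.

```python
class StoredLength:
    """length is plain data, fixed when the object is built."""

    def __init__(self, items):
        self.length = len(items)


class ComputedLength:
    """length is code, recomputed on every access."""

    def __init__(self, items):
        self._items = list(items)

    @property
    def length(self):
        return len(self._items)


# The calling code is identical for both.
lengths = [obj.length for obj in (StoredLength("abc"), ComputedLength("abc"))]
```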
Phil H
+1  A: 

This is a rather philosophical question (which I like) so I'll answer it in a philosophical way: with nothing much to back it up. ;)

Data is the part of a system that can change. Code defines behavior; the way in which data can change into new data.

To put it more accurately: Data can be described by two components: a description of what the datum is supposed to represent (for instance, a variable with a name and a type) and a value. The value of the variable can change according to rules defined in code. The description does not change, of course, because if it does, we have a whole new piece of information. The code itself does not change, unless requirements (what we expect of the system) change.

To a compiler (or a VM), code is actually the data on which it performs its operations. However, the to-be-compiled code does not specify behavior for the compiler; the compiler's own code does that.

Rik
+1 for data is information and code is behavior (actually had answered the same at pretty much the same time) - don't agree that much on calling data static though / real time data is highly "dynamic", although usually handled as measurements it makes it harder to describe the difference
eglasius
What do you mean by code is "behaviour"? Whose behaviour? The computer's? If yes, then it's data for the microprocessor. Think about this.
SDX2000
Of course, data can change. A better way to put it might be that data _values_ can be considered "immutable". So the values are static, and the data can change (or, more accurately, change its value) according to the rules defined by the code.
Rik
I don't think that distinction holds. Consider a declarative programming language: Is it code or data?
troelskn
@troelskn check my answer, I actually didn't go much into the definition but into the "it's not about" arguments ... I don't think it is that hard to reason what is information vs. behavior, but getting into where they go is what adds confusion
eglasius
Part of my question is that you can change even the code by acquiring code dynamically - you could write code that accepted a new class and added an instance of it to an array of rule objects. Then your code has changed, but it's still code, isn't it?
Phil H
@Phil Agreed, the fact that you loaded it dynamically doesn't make it something else. Notice that you could perfectly well be loading behavior+data when getting that instance of a rule into the system, or you could have one that is just behavior and will receive all its data from the system at runtime.
eglasius
Notice that we are talking about defining what is code vs. data, not where/how you are supposed to have it/integrate it/etc.
eglasius
+1  A: 

The important point is that you want to separate the part of your code that will execute the same way every time (e.g. applying a discount) from the part which could change (e.g. the products to be discounted, or the % of the discount).

This is simply for safety. If a discount changes, you won't have to re-write your discount code, you'll only need to go into your discounts repository (DB, or app file, or xml file, or however you choose to implement it) and make a small change to a number.
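A minimal Python sketch of that separation, assuming the repository is a small JSON document mapping item names to percentages (the format, the item names and the function names are all made up for illustration):

```python
import json

# The part that changes: in practice this would live in a file or a database.
DISCOUNTS_JSON = '{"milk": 10, "bread": 25}'


def load_discounts(text):
    """Read the item -> percent table from JSON text."""
    return {item: float(percent) for item, percent in json.loads(text).items()}


def total_price(basket, discounts):
    """The part that stays the same: apply whatever discounts are configured."""
    return sum(
        price * (1 - discounts.get(item, 0.0) / 100)
        for item, price in basket.items()
    )
```

Changing a rate then means editing one number in the JSON; `total_price` is not touched.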

Also, if the discount code is separated into an XML file, then you can give the entire application to a manager, and with sufficient instructions, they won't need to pester you whenever they want to change the discount rates.

When you mix data and code, you greatly increase the odds of something breaking whenever anything changes. So, as leppie said, you need to extract the constantly changing parts and put them in a separate place.

A: 

Data is information. It's not about where you decide to put it, be it a db, config file, config through code or inside the classes.

The same happens for behaviors / code. It's not about where you decide to put it or how you choose to represent it.

eglasius
Then how would you define data's code subclass?
Phil H
not sure I got your question right, but you can have hierarchical data just like you can have hierarchical code, that's organizing them, doesn't change which they are
eglasius
A: 

I would say that the distinction between data, code and configuration is something to be made within the context of a particular component. Sometimes it's obvious, sometimes less so.

For example, to a compiler, the source code it consumes and the object code it creates are both data - and should be separated from the compiler's own code.
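Python's standard `ast` module and the built-in `compile` make this easy to see. In the sketch below the rule text `price * 0.9` is a made-up example: it is first inspected as plain data, then compiled and run as code. Evaluating text from an untrusted source this way would be unsafe; the sketch assumes the rule author is trusted.

```python
import ast

source = "price * 0.9"  # hypothetical rule text; to this script it is just a string

# As data: parse the text into a tree and inspect which names it refers to.
tree = ast.parse(source, mode="eval")
names_used = sorted(
    node.id for node in ast.walk(tree) if isinstance(node, ast.Name)
)

# As code: compile the same tree and evaluate it with a value for `price`.
code = compile(tree, "<rule>", "eval")
result = eval(code, {"__builtins__": {}}, {"price": 10.0})
```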

In your case you seem to be describing the option of a particularly powerful configuration file, which can contain code, much as, for example, the GIMP lets you 'configure' plugins using Scheme. As the developer of the component that reads this configuration, you would think of it as data. When working at a different level -- writing the configuration -- you would think of it as code.

This is a very powerful way of designing.

Applying this to the underlying question ("How would you code the above example?"), one option might be to adopt or design a high level Domain Specific Language (DSL) for specifying rules. At startup, or when first required, the server reads the rule and executes it.

Provide an admin interface allowing the administrator to

  • test a new rule file
  • replace the current configuration with that from a new rule file

... all of which would happen at runtime.

A DSL might be something as simple as a table parser or an XML parser, or it could be something as sophisticated as a scripting language. From C, it's easy to embed Python or Lua. From Java it's easy to embed Groovy or Clojure.
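At the simple end of that range, a table-style DSL plus the two admin operations could look roughly like this in Python. The `<item> <percent>` line format and the names `parse_rules` and `RuleStore` are assumptions made for the sketch, not part of any real system.

```python
def parse_rules(text):
    """Parse lines of the form '<item> <percent>'; '#' starts a comment."""
    rules = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected '<item> <percent>'")
        percent = float(parts[1])  # raises ValueError if not a number
        if not 0 <= percent <= 100:
            raise ValueError(f"line {number}: percent out of range")
        rules[parts[0]] = percent
    return rules


class RuleStore:
    def __init__(self, text):
        self._rules = parse_rules(text)

    def test(self, text):
        """Return None if the rule text is valid, else an error message."""
        try:
            parse_rules(text)
        except ValueError as exc:
            return str(exc)
        return None

    def replace(self, text):
        """Swap in new rules at runtime; the old ones stay if parsing fails."""
        self._rules = parse_rules(text)

    def percent_for(self, item):
        return self._rules.get(item, 0.0)
```

Because `replace` parses before assigning, a bad rule file raises an error and leaves the running configuration untouched.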

You could switch in compiled code at runtime, with clever linking or classloader tricks. This seems more difficult and less valuable than the embedded DSL option, in my opinion.

slim
But aren't most configuration files read on start-up? Surely a dynamic system that acquired these discount rules on-the-fly would be necessary for a 24/7 supermarket chain. In which case, do you pick the flexibility of adding a new object (code), or the limits of data?
Phil H
It's up to the designer when a configuration file gets read - at startup; once lazily; always; when triggered by an admin command. I was thinking in terms of an embedded scripting language, but you could also (re)link object code at runtime.
slim
Clarified the answer.
slim
+4  A: 

Fundamentally, there is of course no difference between data and code, but for real software infrastructures, there can be a big difference. Apart from obvious things like, as you mentioned, compilation, the biggest issue is this:

Most sufficiently large projects are designed to produce "releases" that are one big bundle, produced in 3-month (or longer) cycles, tested extensively, and not changeable afterwards except in tightly controlled ways. "Code" most definitely cannot be changed, so anything that does need to be changed has to be factored out and made "configuration data" so that changing it becomes palatable to those whose job it is to ensure that a release works.

Of course, in most cases bad configuration data can break a release just as thoroughly as bad code, so the whole thing is largely an illusion - in reality it doesn't matter whether it's code or "configuration data" that changes, what matters is that the interface between the main system and the parts that change is narrow and well-defined enough to give you a good chance that the person who does the change understands all consequences of what he's doing.

This is already harder than most people think when it's really just a few strings and numbers that are configured (I've personally witnessed a production mainframe system crash because it had one boolean value set differently than another system it was talking to). When your "configuration data" contains complex logic, it's almost impossible to achieve. But the situation isn't going to be any better just because you use a badly-designed ad hoc "rules configuration" language instead of "real" code.

Michael Borgwardt
A: 

The relationship between code and data is as follows:

Code, once compiled into a program, processes data during execution.

A program can extract data, transform data, load data, generate data...

A program can also extract code, transform code, load code, generate code...

Hence code without a compiler or interpreter is useless, whereas data always has value; but once compiled, code can do all of the above.

For example:

A source control system processes source code.

Here the thing being processed is itself code.

Backup scripts process files.

Here the files are data, and so on.

lakshmanaraj