ansaurus

Question

Why do we still program with flat files?

Answer 1

+7 A:

A colleague of mine wrote a blog post on this subject:

http://www.atalasoft.com/cs/blogs/rickm/archive/2008/06/06/why-are-our-programs-still-represented-by-flat-files.aspx

Lou Franco 2008-10-02 02:33:39

The text layout of this blog gave me a headache for some reason.

Martin Cote 2008-10-02 02:45:49

I don't know if it's because you planted the idea in my head, but me too.

Kevin 2008-10-02 20:28:13

It's the background color; it makes the font look really weird.

radioactive21 2008-10-02 20:30:26

He fixed the font -- looks normal now.

Lou Franco 2008-10-07 00:24:21

Answer 2

+124 A:

you can diff them
you can merge them
anyone can edit them
they are simple and easy to deal with
they are universally accessible to thousands of tools

davetron5000 2008-10-02 02:36:26

No one owns the format either.

Jeff Yates 2008-10-02 04:03:19

"they are universally accessible to thousands of tools" - bingo. Chicken and egg problem, really.

Bernard 2008-10-02 05:35:13

It's all about the source control, baby!

Richard Morgan 2008-10-02 08:06:02

You can just as easily diff tree nodes in memory as you can lines in a text file. Though the biggest advantage would be you could more easily ignore formatting and comments in the diffs. But yeah, chicken and egg problem still.

Daemin 2008-10-02 12:02:43
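A minimal sketch of the comparison Daemin describes, using Python's standard `ast` and `difflib` modules as a stand-in for "tree nodes in memory". The two sample snippets and the helper name `same_structure` are made up for illustration; this is not a full structural diff tool.

```python
import ast
import difflib

# Two versions of the same function: only layout and a comment differ.
OLD = "def area(w, h):\n    return w * h\n"
NEW = "def area(w,   h):\n    # width times height\n    return (w *\n            h)\n"


def same_structure(a: str, b: str) -> bool:
    """Compare two sources by syntax tree, ignoring layout and comments."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# A line-based diff reports every reformatted line as a change.
text_changes = [
    line
    for line in difflib.unified_diff(OLD.splitlines(), NEW.splitlines(), lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print(len(text_changes), "changed lines in the text diff")
print("same syntax tree:", same_structure(OLD, NEW))
```

The tree comparison only answers "same or different"; a real tool would still have to present the differences in some readable form, which is where the chicken-and-egg problem comes back in.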

This doesn't answer the question at all. A tree data-structure could just as easily be serialized just before diffing, merging, or interacting with other tools.

Jonathan Tran 2008-10-04 03:40:25

And they are robust -- how many times have binary file formats been broken due to upgrades or other glitches in programs that parse them? Number of times vi has damaged my plain text LaTeX document: 0. Number of times Word has corrupted the binary document structure: $\infty$.

Hudson 2009-01-16 16:47:51

Answer 3

+12 A:

Why are essays written in text? Why are legal documents written in text? Why are fantasy novels written in text? Because text is the single best form - for people - of persisting their thoughts.

Text is how people think about, represent, understand, and persist concepts - and their complexities, hierarchies, and interrelationships.

Justice 2008-10-02 02:36:49

that's not true at all. Some things are best described using diagrams. Think of a flow chart: take a flow chart and you can create a textual representation of it, but it'd be much easier to understand as a diagram.

nickf 2008-10-02 02:55:08

I think the question is about flat files and not just about the usage of text.

jop 2008-10-02 03:47:21

nickf, I disagree; pseudo-code is generally a clearer way to show an algorithm than a flowchart. Things like for loops can't be directly represented in a flowchart so when you read it you have to work out which bits are really a loop etc.

Mark Baker 2008-10-02 09:35:06

Most research papers are published in PS or PDF, not ASCII. You're confusing representation with presentation. The diagrams in rich formats are very helpful - not to mention the nice presentation of equations.

Matt Cruikshank 2008-10-02 16:40:09

Answer 4

+21 A:

In my opinion, any possible benefits are outweighed by being tied to a particular tool.

With plain-text source (that seems to be what you're discussing, rather than flat files per se) I can paste chunks into an email, use simple version control systems (very important!), write code into comments on Stack Overflow, use one of a thousand text editors on any number of platforms, etc.

With some binary representation of code, I need to use a specialized editor to view or edit it. Even if a text-based representation can be produced, you can't trivially roll back changes into the canonical version.

Rich 2008-10-02 02:40:49

Answer 5

+12 A:

Smalltalk is an image-based environment. You are no longer working with code in a file on disk. You are working with and modifying the real objects in runtime. It still is text but classes are not stored in human readable files. Instead the whole object memory (the image) is stored on a file in binary format.

But the biggest complaint from those trying out Smalltalk is that it doesn't use files. Most of the file-based tools that we have (vim, emacs, eclipse, vs.net, unix tools) have to be abandoned in favor of Smalltalk's own tools. Not that the tools provided in Smalltalk are inferior; they are just different.

jop 2008-10-02 02:42:12

Is Smalltalk interpreted realtime?

Jon Limjap 2008-10-02 02:44:46

Squeak smalltalk is interpreted. Other smalltalks are compiled to bytecode

jop 2008-10-02 02:48:06

Are you sure Squeak is interpreted? Running `(Number>>#asInteger) inspect` opens an inspector on a `CompiledMethod`, where you can see the bytecode.

Sébastien RoccaSerra 2008-10-02 08:26:59

"http://www.squeak.org/Features/" lists "interpreted" as one of the features. But now I'm curious as to what that actually means. Compiled into bytecode but interpreted by the VM? I know VisualWorks also compiled to bytecode but it has a JITter. Not sure about Squeak though.

jop 2008-10-02 13:28:50

Some "interpreted" languages (CPython for one) are actually compiled to a very high-level bytecode: http://docs.python.org/library/dis.html#python-bytecode-instructions

skymt 2008-10-03 22:49:16
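For the CPython case skymt mentions, the bytecode can be inspected with the standard `dis` module. A small sketch, where the function `add` is just a placeholder; the exact instruction names differ between CPython versions, so none are assumed here.

```python
import dis


def add(a, b):
    return a + b


# The compiled bytecode lives on the function's code object.
opnames = [ins.opname for ins in dis.get_instructions(add)]

for ins in dis.get_instructions(add):
    print(ins.opname, ins.argrepr)
```

So "interpreted" here means that source text is first compiled to bytecode, and the virtual machine then interprets that bytecode.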

Answer 6

+5 A:

Ironically there ARE programming constructs that use precisely what you describe.

For example, SQL Server Integration Services, which involve coding logic flow by dragging components into a visual design surface, are saved as XML files describing precisely that back end.

On the other hand SSIS is pretty difficult to source-control. It is also fairly difficult to design any sort of complex logic into it: if you need a little bit more "control", you'll need to code VB.NET code into the component, which brings us back to where we started.

I guess that, as a coder, you should consider the fact that for every solution to a problem there are consequences that follow. Not everything could (and some argue, should) be represented in UML. Not everything could be visually represented. Not everything could be simplified enough to have a consistent binary file representation.

That being said, I would posit that the disadvantages of relegating code to binary formats (most of which will also tend to be proprietary) far outweigh the advantages of having them in plain text.

Jon Limjap 2008-10-02 02:42:33

Fine, so let me drop <img src="UMLDiagram.png"> into my comments, and make an IDE smart enough to show me that image. It's still just text. There's no magic. It's just that the IDE is too stupid to do it right now.

Matt Cruikshank 2008-10-02 16:42:58

Or better yet, let me embed things like Google Chart API objects, all with clever text comments, and have the IDE resolve it and display it. For cripes' sake, the IDE should at least understand Doxygen comments, and be as smart and navigable as the Doxygen output!!!

Matt Cruikshank 2008-10-02 16:45:00

Roll your own Doxygen parser, perhaps? What IDEs are we talking about?

Jon Limjap 2008-10-03 00:14:37

Visual Studio for me.

Matt Cruikshank 2008-10-03 18:12:43

Wait for VS10 then write it yourself :)

Simon Buchan 2009-01-30 07:14:46

Answer 7

+3 A:

IMHO, XML and binary formats would be a total mess and wouldn't give any significant benefit.

OTOH, a related idea would be to write into a database, maybe one function per record, or maybe a hierarchical structure. An IDE created around this concept could make navigating source more natural, and make it easier to hide anything not relevant to the code you're reading at a given moment.

Javier 2008-10-02 02:45:16
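A rough sketch of the "one function per record" idea, using Python's built-in `sqlite3` and `ast` modules. The table name `functions`, its two columns, and the helper names are assumptions made for this example, not an existing tool or schema.

```python
import ast
import sqlite3

SOURCE = '''
def greet(name):
    return "hello " + name

def shout(name):
    return greet(name).upper()
'''


def store_functions(conn, source):
    """Split a module into top-level functions and store one per row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS functions (name TEXT PRIMARY KEY, body TEXT)"
    )
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            body = ast.get_source_segment(source, node)
            conn.execute(
                "INSERT OR REPLACE INTO functions VALUES (?, ?)", (node.name, body)
            )
    conn.commit()


def load_function(conn, name):
    """Fetch the stored text of a single function, or None if absent."""
    row = conn.execute(
        "SELECT body FROM functions WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
store_functions(conn, SOURCE)
print(load_function(conn, "shout"))
```

Note that each record still holds plain text, so editors and diff tools could keep working on the bodies; only the navigation layer changes.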

Answer 8

A:

The code of your program defines the structure that would be created with XML or the binary format. Your programming language is a more direct representation of your program's structure than an XML or binary representation would be. Have you ever noticed how Word misbehaves on you as you give structure to your document? WordPerfect at least would 'reveal codes' to allow you to see what lay beneath your document. Flat files do the same thing for your program.

minty 2008-10-02 02:50:02

You're missing the point entirely. Picture your source code. Now imagine that you can embed a picture in your comments. That's the only difference I'm talking about. Even as an <img src="UMLDiagram.png"> in the comments. I'm NOT talking about converting { into <scope> or something like that.

Matt Cruikshank 2008-10-02 16:48:53

Answer 9

+4 A:

It's a good question. FWIW, I'd love to see a Wiki-style code management tool. Each functional unit would have its own wiki page. The build tools pull together the source code out of the wiki. There would be a "discuss" page linked to that page, where people can argue about algorithms, APIs and such like.

Heck, it wouldn't be that hard to hack one up from a pre-existing Wiki implementation. Any takers...?

dysfunctor 2008-10-02 02:50:42

See, the neat thing is, because code is stored in plain text, it won't be hard to write a wiki system which does this. If the code was XML in some predefined structure, this would be at best much harder, and more likely, not possible at all.

Orion Edwards 2008-10-02 02:58:08

slashmais 2008-10-02 11:08:49

Answer 10

+1 A:

You mention that we should use "some form of XML"? What do you think XHTML and XAML are?

Also XML is still just a flat file.

Chris Pietschmann 2008-10-02 02:54:11

Answer 11

A:

Neat ideas. I have wondered the same thing myself on a smaller scale ... much smaller: why can't IDE X generate this or that?

I don't know if I am capable enough as a programmer yet to develop something as cool and complex as what you're talking about or what I am thinking about, but I would be interested in trying.

Maybe start out with some plugins for .NET, Eclipse, Netbeans, and so on? Show off what can be done, and start a new trend in coding.

J.J. 2008-10-02 03:01:49

Answer 12

+9 A:

Lisp programs are not flat files. They are serializations of data structures. This code-as-data is an old idea, and actually one of the greatest ideas in computer science.

sanxiyn 2008-10-02 03:03:26

I guess these ideas haven't caught on yet. :)

jop 2008-10-02 03:16:43

I'm a day-to-day Lisper, and I have to disagree with you to an extent. Lisp source code contains a lot of information that's not preserved when parsed -- reader macros and comments, for example. You could also say that C source code is a serialization of a C compiler's parse tree.

Rich 2008-10-02 06:10:15
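Rich's point about information lost in parsing is not specific to Lisp. As an analogous sketch in Python (assuming Python 3.9 or later for `ast.unparse`), a round trip through the parse tree drops comments and the original spacing; the sample source is invented for the example.

```python
import ast

SOURCE = "x = 1  # the answer, minus 41\ny = x   +   41\n"

# Parse to a tree, then turn the tree back into source text.
round_tripped = ast.unparse(ast.parse(SOURCE))

print(round_tripped)
```

The program is unchanged, but the comment is gone, which is one reason the text file rather than the tree tends to be treated as the canonical form.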

I have to agree with Rich -- all computer languages provide some sort of depth in a parse tree. Lisp's greatness is the clarity of this concept and the ability of the programs to work on their own parse tree.

Svante 2009-01-16 15:59:23

Answer 13

+5 A:

Here's why:

Human readable. That makes it a lot easier to spot a mistake, in both the file and the parsing method. It can also be read out loud. That's something you just cannot get with XML, and it might make a difference, especially in customer support.
Insurance against obsolescence. As long as regexes exist, it is possible to write a pretty good parser in just a few lines of code.
Leverage. Almost everything there is, from revision control systems to editors to filters, can inspect, merge and operate on flat files. Merging XML can be a mess.
Ability to integrate them rather easily with UNIX tools, such as grep, cut or sed.

Edu Felipe 2008-10-02 03:16:02
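The "parser in just a few lines" claim can be sketched for a simple flat format. The `key = value` layout with `#` comments below is a hypothetical format chosen for the example, and `parse_flat` is an illustrative name; this says nothing about grammars that regular expressions cannot handle.

```python
import re

# One "key = value" pair per line; surrounding whitespace is ignored.
LINE = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=\s*(.*?)\s*$")


def parse_flat(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    result = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = LINE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


SAMPLE = "# settings\nname = demo\nretries=3\n\npath = /tmp/out\n"
print(parse_flat(SAMPLE))
```

All values come back as strings; converting `retries` to a number would be the caller's job in this sketch.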

There are languages which are not parsable this way (C++).

phjr 2008-10-02 07:02:05

Answer 14

+4 A:

People have tried for a long time to create an editing environment that goes beyond the flat file and everyone has failed to some extent. The closest I've seen was a prototype for Charles Simonyi's Intentional Programming but then that got downgraded to a visual DSL creation tool.

No matter how the code is stored or represented in memory, in the end it has to be presentable and modifiable as text (without the formatting changing on you) since that's the easiest way we know to express most of the abstract concepts that are needed for solving problems by programming.

With flat files you get this for free and any plain old text editor (with the correct character encoding support) will work.

Mark Cidade 2008-10-02 03:19:14

Answer 15

A:

I think another aspect of this is that the code is what is important. It is what is going to be executed. For example, in your UML example, I would think that having UML (presumably created in some editor, not directly related to the "code") included in your "source blob" would be almost useless. Much better would be to have the UML generated directly from your code, so it describes the exact state the code is in as a tool for understanding the code, rather than as a reminder of what the code should have been.

We've been doing this for years with automated doc tools. While the actual programmer-written comments in the code might get out of sync with the code, tools like JavaDoc and the like faithfully represent the methods on an object, return types, arguments, etc. They represent them as they actually exist, not as some artifact that came out of endless design meetings.

It seems to me that if you could arbitrarily add random artifacts to some "source blob", these would likely be out of date and less than useful right away. If you can generate such artifacts directly from the code, then the small effort to get your build process to do so is vastly better than the previously mentioned pitfalls of moving away from plain text source files.

Related to this, an explanation of why you'd want to use a plain-text UML tool (UMLGraph) seems to apply nearly equally as well to why you want plain-text source files.

ramanman 2008-10-02 03:37:54

Answer 16

+1 A:

Old habits die hard, I guess.

Until recently, there weren't many good-quality, high-performing, widely-available libraries for general storage of structured data. And I would emphatically not put XML in that category even today--too verbose, too intensive to process, too finicky.

Nowadays, my favorite thing to use for data that doesn't need to be human-readable is SQLite: just make a database. It's so incredibly easy to embed a full-featured SQL database into any app... there are bindings for C, Perl, Python, PHP, etc... and it's open-source and really fast and reliable and lightweight.

I <3 SQLite.

Dan 2008-10-02 04:01:51

I don't really think the medium programming tools work in is a matter of 'habit'. :)

Bernard 2008-10-02 05:37:04

Answer 17

+6 A:

<?xml version="1.0" encoding="UTF-8"?><code>Flat files are easier to read.</code></xml>

Mark Stock 2008-10-02 04:52:57

Answer 18

+3 A:

Steve McConnell has it right, as always - you write programs for other programmers (including yourself), not for computers.

That said, Microsoft Visual Studio must internally manage the code you write in a very structured format, or you wouldn't be able to do such things as "Find All References" or rename or re-factor variables and methods so readily. I'd be interested if anyone had links to how this works.

David Grigg 2008-10-02 05:29:19
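How Visual Studio does this internally is not described in this thread, so the following is only an analogous sketch: a toy "find all references" built on Python's `ast` module. The sample source and the function name `find_references` are invented for the example, and it handles only plain names, not attributes or scoping.

```python
import ast

SOURCE = """
def price(item):
    return item * 2

total = price(3) + price(4)
label = "price"   # a string, not a reference
"""


def find_references(source, name):
    """Return (line, column) for each definition or use of `name`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            hits.append((node.lineno, node.col_offset))
        elif isinstance(node, ast.Name) and node.id == name:
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


print(find_references(SOURCE, "price"))
```

Unlike a plain text search, the string literal `"price"` is not reported, because the lookup runs on the parsed structure while the file on disk stays plain text.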

Answer 19

+1 A:

The trend toward DSLs is the first thing that comes to mind when reading your question. The problem has been that there is no 1-to-1 relationship between models (like UML) and an implementation. Microsoft, among others, is working on getting there, so that you can create your app as something UML-like, and then code can be generated. And the important thing: as you opt to change your code, the model will reflect this again.

Windows Workflow Foundation is a pretty good example. Of course there are flat files and/or XML in the background, but you usually end up defining your business logic in the orchestration tool. And that is pretty cool!

We need more of the "software factories" thinking, and will see a richer IDE experience in the future, but as long as computers run on zeroes and ones, flat text files can and (probably) will always be an intermediate stage. As stated by several people already, simple text files are very flexible.

Torbjørn 2008-10-02 05:41:50

Answer 20

+3 A:

Actually, roughly 10 years ago, Charles Simonyi's early prototype for intentional programming attempted to move beyond the flat file into a tree representation of code that can be visualized in different ways. Theoretically, a domain expert, a PM, and a software engineer could all see (and piece together) application code in ways that were useful to them, and products could be built on a hierarchy of declarative "intentions", digging down to low-level code only as needed.

ETA (per request in the questions) There's a copy of one of his early papers on this at the Microsoft research web site. Unfortunately, since Simonyi left MS to start a separate company several years ago, I don't think the prototype is still available for download. I saw some demos back when I was at Microsoft, but I'm not sure how widely his early prototype was distributed.

His company, IntentSoft, is still a little quiet about what they're planning to deliver to the market, if anything, but some of the early stuff that came out of MSR was pretty interesting.

The storage model was some binary format, but I'm not sure how much of those details were disclosed during the MSR project, and I'm sure some things have changed since the early implementations.

JasonTrue 2008-10-02 06:52:30

Can you point out site(s) with more info on what you describe in your first paragraph? I would very much appreciate it. Thanks.

slashmais 2008-10-02 11:12:28

Answer 21

A:

It's pretty obvious why plain text is king. But it is equally obvious why a structured format would be even better.

Just one example: If you rename a method, your diff/merge/source control tool would be able to tell that only one thing had changed. The tools we use today would show a long list of changes, one for every place and file that the method was called or declared.

(By the way, this post doesn't answer the question as you might have noticed)

Arne Evertsson 2008-10-02 07:19:55
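The rename example can be sketched in Python with `ast` and `difflib`: the text diff touches several lines, while a structural check can confirm that a single rename accounts for all of them. The sample code and the names `Rename` and `is_single_rename` are made up for illustration, and only plain function names are handled, not methods or attributes.

```python
import ast
import difflib

OLD = """
def total(xs):
    return sum(xs)

def mean(xs):
    return total(xs) / len(xs)

def report(xs):
    print(total(xs), mean(xs))
"""
NEW = OLD.replace("total", "grand_total")


class Rename(ast.NodeTransformer):
    """Rename a function at its definition and at every plain-name use."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_FunctionDef(self, node):
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


def is_single_rename(old_src, new_src, old, new):
    """True if applying one rename to the old tree yields the new tree."""
    renamed = Rename(old, new).visit(ast.parse(old_src))
    return ast.dump(renamed) == ast.dump(ast.parse(new_src))


changed = [
    line
    for line in difflib.ndiff(OLD.splitlines(), NEW.splitlines())
    if line.startswith("- ")
]
print(len(changed), "lines differ in the text diff")
print("explained by one rename:", is_single_rename(OLD, NEW, "total", "grand_total"))
```

Here the rename is given to the checker; discovering it automatically from two versions is the harder part that a structure-aware source control tool would need to solve.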

Answer 22

A:

This might not exactly answer your question, but here is an editor that gives you a higher-level view of code: http://webpages.charter.net/edreamleo/front.html

2008-10-02 07:46:18

Answer 23

A:

I've wistfully wondered the same thing, as described in the answer to: What tool/application/whatever do you wish existed?

While it's easy to imagine a great number of benefits I think the biggest hurdle that would have to be addressed is that no-one has produced a viable alternative.

When people think of alternatives to storing source as text they seem to often immediately think in terms of graphical representations (I'm referring here to the commercial products that have been available - eg. HP-vee). And if we look at the experience of people like the FPGA designers, we see that programming (exclusively) graphically just doesn't work - hence languages like Verilog and VHDL.

But I don't see that the storage of source necessarily needs to be bound to the method of writing it in the first place. Entry of source can be largely done as text, which means that copying/pasting can still work. But I also see that by allowing merges and rollbacks to be done on the basis of tokenised meta-source we could achieve more accurate and more powerful manipulation tools.

Andrew Edgecombe 2008-10-02 08:01:52

Answer 24

+1 A:

For an example of a language that does away with traditional text-programming, see the Lava Language.

Another nifty thing I just recently discovered is subtext2 (video demo).

David Schmitt 2008-10-02 09:14:38

Answer 25

A:

I think the reason text files are used in development is that they are universal across various development tools. You can look inside or even fix some errors using a simple text editor (you can't do that in a binary file because you never know how a fix might destroy other data). It doesn't mean, however, that text files are best for all those purposes.

Of course, you can diff and merge them. But it doesn't mean that the diff/merge tool understands the distinct structure of the data encoded by this text file. You can do the diff/merge, but (as is especially visible in XML files) the diff tool won't show you the differences correctly; that is, it will show you where the files differ and which parts of the data the tool "thinks" are the same. But it will not show you the differences in the structure of the XML file; it will just match lines that look the same.

Regardless of whether we're using binary files or text files, it's always better for the diff/merge tools to work on the data structure the file represents rather than on lines and characters. For C++ or Java files, for example, the tool could report that some identifier changed its name or that some section was surrounded with an additional if(){}, but ignore changes in indentation or EOL characters. The best approach would be that a file is read into internal structures and dumped using specific format rules. This way the diffing is done on the internal structures and the merge result is generated from the merged internal structure.

2008-10-02 10:12:03
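The "read into internal structures and dump using specific format rules" approach can be approximated in Python (3.9+ for `ast.unparse`): parse both versions, re-emit them in one fixed layout, then diff the result. The helper names and sample snippets are assumptions for this sketch, and the output is still a line diff of the normalized text rather than a true report such as "wrapped in an if".

```python
import ast
import difflib


def canonical(source):
    """Parse, then re-emit with one fixed layout (comments are dropped too)."""
    return ast.unparse(ast.parse(source)).splitlines()


def structural_diff(old, new):
    """Diff the normalized forms, so layout-only edits produce no output."""
    return [
        line
        for line in difflib.unified_diff(
            canonical(old), canonical(new), lineterm="", n=0
        )
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


BASE = "def save(data):\n    write(data)\n"
REINDENTED = "def save( data ):\r\n        write( data )\r\n"
WRAPPED = "def save(data):\n    if data:\n        write(data)\n"

print(structural_diff(BASE, REINDENTED))  # indentation and EOL changes only
print(structural_diff(BASE, WRAPPED))     # a real change: body wrapped in an if
```

The trade-off is that anything the parser discards, such as comments, also disappears from the diff, so a practical tool would need to carry that information along separately.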

Answer 26

+1 A:

Why do text files rule? Because of McIlroy's test. It is vital to have the output of one program be acceptable as the source code for another, and text files are the simplest thing that works.

Michael Dorfman 2008-10-02 11:45:59

You've just successfully argued databases out of existence. Congratulations.

Matt Cruikshank 2008-12-03 16:06:19

Thanks, but credit goes to McIlroy. Naturally, "the simplest thing that works" gets more complicated if one needs atomicity of transactions, two-phase commit, etc., on source code. Do you know of any programming languages that use databases as the primary way to represent source code?

Michael Dorfman 2008-12-04 13:01:24

Answer 27

A:

Flat files rock.

Prog 2008-10-02 11:47:35

Answer 28

+1 A:

Anyone ever tried Mathematica?

The pic above is from an old version but it was the best google could give me.

Anyway...compare the first equation there to Math.Integrate(1/(Math.Pow("x",3)-1), "x") like you would have to write if you were coding with plain text in most common languages. Imo the mathematical representation is much easier to read, and that is still a pretty small equation.

And yes, you can both input and copy-paste the code as plain text if you want.

See it as the next generation of syntax highlighting. I bet there is a lot of other stuff besides math that could benefit from this kind of representation.

2008-10-02 14:52:53

Answer 29

A:

Modern programs are composed of flat pieces, but are they flat? There are usings, and includes, and libraries of objects, etc. An ordinary function call is a peek into a different place. The logic isn't flat, due to having multiple threads, etc.

SeaDrive 2008-10-02 20:23:18

Answer 30

+3 A:

LabVIEW and Simulink are two graphical programming environments. They are both popular in their fields (interfacing to hardware from a PC, and modeling control systems, respectively), but not used much outside of those fields. I've worked with people who were big fans of both, but never got into them myself.

KeyserSoze 2008-10-02 23:45:54

Answer 31

A:

I have the same vision! I really wish this existed.

You might want to take a look at Fortress, a research language by Sun. It has special support for formulas in source code. The quote below is from Wikipedia:

Fortress is being designed from the outset to have multiple syntactic stylesheets. Source code can be rendered as ASCII text, in Unicode, or as a prettied image. This will allow for support of mathematical symbols and other symbols in the rendered output for easier reading.

The major reason for the persistence of text as source is the lack of power tools, e.g. version control, for non-text data. This is based on my experience working with Smalltalk, where plain byte-code is kept in a core dump all the time. In a non-text system, with today's tools, team development is a nightmare.

Adrian 2008-10-03 00:08:42

Answer 32

+1 A:

Visual FoxPro uses dbf table structures to store code and metadata for forms, reports, class libs, etc. These are binary files. It also stores code in prg files that are actual text files...

The only advantage I see is being able to use the built-in VFP data language to do code searches on those files... other than that it is a liability imo. At least once every few months, one of these files will become corrupted for no apparent reason. Integration with source control and diffs is very painful as well. There are workarounds for this, but they involve converting the file to text temporarily!

Brian Vander Plaats 2008-10-03 22:33:05

Answer 33

A:

Who works with flat files?

Eclipse gives you views into your source so that you can see inner classes, methods and data, all sorted and grouped. If I want to edit an inner class I click on it. While technically there is a flat file underneath, I almost never navigate it like that.

DJClayworth 2008-10-07 15:36:36

Answer 34

A:

One thing not touched on is that some languages have the concept of a source file built in, with respect to things like variable scoping. Changing to something else (like storing functions in a database) would require you to alter the language itself.

2009-01-16 15:58:55

Answer 35

A:

While having a drink tonight with my friends (programmers too), one of them told me that they use UML to generate code. But he said that they still need to manually edit the generated code, because there are some problem domains that can't be easily described with UML.

With all the LINQ goodness, lambdas and all, some problem domains cannot be represented by UML; we still need to work our way around the generated code for the computer to do our bidding.

How could we represent in UML, let alone XML, the following problem? http://stackoverflow.com/questions/448203/linq-to-sql-using-group-by-and-countdistinct

The number of answers to that simple problem is very telling: UML, SQL (the most important assembly language, whatever those ORM guys tell you), and XML are not an XOR proposition. We will still use combinations of these technologies, rather than just one of them to the exclusion of the others.

Michael Buen 2009-01-16 16:35:36

Answer 36

A:

It's still flat files because maybe that's how they can sell software tools :D

Source code should itself be object-oriented, that is, encapsulated as members. There is only one product I know of that does so; it has existed for a very long time (since Windows 3.0) and was designed by Paul Allen himself. It was originally inspired by HyperCard on the Mac, but as Bill Gates put it: http://community.seattletimes.nwsource.com/archive/?date=19900522&slug=1073140

``It's generations beyond HyperCard,'' says Gates.

Unfortunately they didn't target the right people:

``In pursuing (interests of) software developers,'' says Alsop, ``Asymetrix may have made ToolBook too complex for the little guy.''

They should have targeted professional programmers instead of hobbyists.

Even today, at the concept level, it is still beyond other languages, except Rebol of course ;)

Rebol Tutorial 2009-10-08 03:19:57
