I'm going to start developing my own document format (like PDF, XPS, DOC, RTF...), and I want to know where I can read some tutorials and how-tos. I don't want code: this is a project I want to learn how to build myself, rather than rely on someone else's experience.

PS: I want to make it look like an XML file:

[Command Argument="Define it" Argument2="Something"]

It's like PDF, but this syntax will be interpreted by a program that I will build in C#, just like HTML and your browser ;)

Remember that my question is about the program that will interpret this code, but it would be good to start with a tutorial on interpreting XML code ;)

+2  A: 

I'm confused as to what you're asking, but if you need your own format like an XML file, why not just use XML to describe the format?

Edit: Okay, I think I understand now. If you're doing this for fun and for learning (which is great), then there are lots of approaches to take. In fact, it may even be better to not do any research, try to come up with a solution on your own and see if it works, what you need to do to make it better, etc.

Jon Seigel
I'm going to build something like a PDF, but its syntax will be like XML, and I will build a program that reads that syntax and interprets it, just like HTML and your browser.
Nathan Campos
You don't need to. If your format is valid XML, you can use existing tools to parse it for you. Then all you have to do is interpret the information.
Jon Seigel
I want to build a syntax like that, but what I'm really after is the program that will interpret it.
Nathan Campos
+3  A: 

Hi Nathan, I assume you're doing this for the sake of learning how to do it. If that's the case, it is a worthwhile venture and I understand.

You'll want to start out by learning about LL parsers and grammars. That will help you turn the document read from a file into a document object model (DOM). From there you can create routines to manipulate or render that document tree.
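A minimal C# sketch of that pipeline for the bracket syntax from the question: a lexer turns characters into tokens, and a recursive-descent parser (one method per grammar rule, one token of lookahead, i.e. LL(1)) builds a tree of Node objects. The grammar is an assumption: the question only shows a single [Command ...] tag, so the closing form [/Name] (for nesting) and the self-closing /] are hypothetical extensions added so the result is actually a tree. Text between tags is not handled, and all class names are illustrative.

```csharp
using System;
using System.Collections.Generic;

// One node of the document tree (DOM): a command name, its arguments and nested commands.
public sealed class Node
{
    public string Name = "";
    public Dictionary<string, string> Attributes = new Dictionary<string, string>();
    public List<Node> Children = new List<Node>();
}

public enum TokenKind { Open, OpenSlash, Close, SlashClose, Name, Assign, Quoted, End }

public sealed record Token(TokenKind Kind, string Text);

public static class Lexer
{
    // Turns raw characters into tokens; whitespace between tokens is skipped.
    public static List<Token> Tokenize(string src)
    {
        var tokens = new List<Token>();
        int i = 0;
        char Next() => i + 1 < src.Length ? src[i + 1] : '\0';
        while (i < src.Length)
        {
            char c = src[i];
            if (char.IsWhiteSpace(c)) { i++; }
            else if (c == '[' && Next() == '/') { tokens.Add(new Token(TokenKind.OpenSlash, "[/")); i += 2; }
            else if (c == '[') { tokens.Add(new Token(TokenKind.Open, "[")); i++; }
            else if (c == '/' && Next() == ']') { tokens.Add(new Token(TokenKind.SlashClose, "/]")); i += 2; }
            else if (c == ']') { tokens.Add(new Token(TokenKind.Close, "]")); i++; }
            else if (c == '=') { tokens.Add(new Token(TokenKind.Assign, "=")); i++; }
            else if (c == '"')
            {
                int end = src.IndexOf('"', i + 1);
                if (end < 0) throw new FormatException($"Unterminated string at position {i}");
                tokens.Add(new Token(TokenKind.Quoted, src.Substring(i + 1, end - i - 1)));
                i = end + 1;
            }
            else if (char.IsLetterOrDigit(c))
            {
                int start = i;
                while (i < src.Length && char.IsLetterOrDigit(src[i])) i++;
                tokens.Add(new Token(TokenKind.Name, src.Substring(start, i - start)));
            }
            else throw new FormatException($"Unexpected '{c}' at position {i}");
        }
        tokens.Add(new Token(TokenKind.End, ""));
        return tokens;
    }
}

public sealed class Parser
{
    private readonly List<Token> _tokens;
    private int _pos;
    public Parser(List<Token> tokens) { _tokens = tokens; }

    // One token of lookahead is enough to choose every rule: that is what makes this LL(1).
    private Token Peek => _tokens[_pos];

    private Token Expect(TokenKind kind) =>
        Peek.Kind == kind ? _tokens[_pos++] : throw new FormatException($"Expected {kind} but found {Peek.Kind} '{Peek.Text}'");

    // document := element* End
    public List<Node> ParseDocument()
    {
        var roots = new List<Node>();
        while (Peek.Kind == TokenKind.Open) roots.Add(ParseElement());
        Expect(TokenKind.End);
        return roots;
    }

    // element := '[' Name (Name '=' Quoted)* ( '/]' | ']' element* '[/' Name ']' )
    private Node ParseElement()
    {
        Expect(TokenKind.Open);
        var node = new Node { Name = Expect(TokenKind.Name).Text };
        while (Peek.Kind == TokenKind.Name)
        {
            string key = Expect(TokenKind.Name).Text;
            Expect(TokenKind.Assign);
            node.Attributes[key] = Expect(TokenKind.Quoted).Text;
        }
        if (Peek.Kind == TokenKind.SlashClose) { _pos++; return node; }
        Expect(TokenKind.Close);
        while (Peek.Kind == TokenKind.Open) node.Children.Add(ParseElement());
        Expect(TokenKind.OpenSlash);
        string closing = Expect(TokenKind.Name).Text;
        if (closing != node.Name) throw new FormatException($"[/{closing}] does not match [{node.Name}]");
        Expect(TokenKind.Close);
        return node;
    }
}
```

For example, parsing [Page Size="A4"][Text Font="Arial"/][/Page] should yield one Page node with a Size argument and a single Text child. Once you have that tree, rendering is a separate walk over the nodes.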

Good luck!

uosɐſ
+1 Very good answer!
Nathan Campos
Good answer, with one suggestion: if Nathan is more interested in learning about document trees than stream parsing, he can shortcut the stream parsing with regular expressions.
Kennet Belenky
Maybe I'm doing it wrong, but regex seems to be 1) magical, 2) inefficient, and 3) difficult to prove correct. Seriously, though, I could just have been doing it wrong; maybe regex is a very clean solution. But I think an LL parser is a neat thing to learn.
uosɐſ
In my opinion, regex is confusing.
Nathan Campos
If Regexes are confusing, that's all the more reason to learn how to use them :)
Kennet Belenky
The Regexes were just a suggestion. From your original post, it was unclear if you wanted to learn stream parsing or document data structures, or both.
Kennet Belenky
OK, I'm going to take a look at regex ;)
Nathan Campos
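For comparison, the regular-expression shortcut suggested in the comments might look like this in C# for the flat, single-tag syntax from the question. It is a sketch under some assumptions: names are word characters and values are double-quoted with no escaped quotes inside. It relies on the fact that a .NET Group keeps every capture of a repeated group in its Captures collection, so keys and values line up by index.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexCommandReader
{
    // '[' Name, then zero or more  Key="Value"  pairs, then ']'.
    private static readonly Regex CommandPattern =
        new Regex(@"\[(?<name>\w+)(?:\s+(?<key>\w+)=""(?<value>[^""]*)"")*\s*\]");

    public static List<(string Name, Dictionary<string, string> Args)> Read(string text)
    {
        var result = new List<(string Name, Dictionary<string, string> Args)>();
        foreach (Match m in CommandPattern.Matches(text))
        {
            var args = new Dictionary<string, string>();
            // Each repetition of the group captures one key and one value, so indexes match.
            CaptureCollection keys = m.Groups["key"].Captures;
            CaptureCollection values = m.Groups["value"].Captures;
            for (int i = 0; i < keys.Count; i++) args[keys[i].Value] = values[i].Value;
            result.Add((m.Groups["name"].Value, args));
        }
        return result;
    }
}
```

This is quick for flat commands, but regular expressions are a poor fit once tags can nest inside each other, which is where the LL-parser approach suggested above pays off.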
+1  A: 

Far be it from me to forbid you from reinventing the wheel for the sake of learning something new. Good for you for trying this out. However, if you are going to ask questions about how to do it, you will need to focus your questions a little more.
Are you looking for help on:

  • Designing your framework / format
  • Planning your time / Estimating deadlines
  • Working with XML
  • Working with C#
  • Building a web-based C# application
  • Building a PC-based C# application
  • Other aspects of development entirely

There are many people here who want to help -- but the best answers are given to focused questions (not necessarily specific, but always focused.)

Sean Vieira
Building an interpreter for my own document type (specified in the question) in C#.
Nathan Campos
+1  A: 

There are a couple of ways to approach this. One way would be to define the format of the file first, then use a parser generator to create C# code that can read that format. A Google search for "c# parser generator" will turn up links to a number of different libraries you can use.

Alternatively, you could code your own parser, from scratch. This will be more work than using a parser generation tool, but might be more educational in the end.

The define-a-grammar approach may be total overkill for a simple format. Another way to approach the problem is to design the object tree that you'll use in-app first, then write serialization and de-serialization routines to save and load the contents from a file. The serialization interface in C# is pretty flexible, and you can serialize to binary or XML files easily.
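A minimal sketch of the "object tree first" approach using XmlSerializer from System.Xml.Serialization. The Document and Paragraph classes and their properties are hypothetical; the point is that attributes such as XmlAttribute and XmlText on the properties decide how the tree is laid out in the file, and the same serializer reads it back.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical in-app object tree; XmlSerializer needs public types with parameterless constructors.
public class Document
{
    [XmlAttribute] public string Title { get; set; } = "";
    public List<Paragraph> Paragraphs { get; set; } = new List<Paragraph>();
}

public class Paragraph
{
    [XmlAttribute] public string Font { get; set; } = "";
    [XmlText] public string Text { get; set; } = "";   // stored as the element's text content
}

public static class DocumentStore
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Document));

    // Object tree -> XML text.
    public static string Save(Document doc)
    {
        using var writer = new StringWriter();
        Serializer.Serialize(writer, doc);
        return writer.ToString();
    }

    // XML text -> object tree.
    public static Document Load(string xml)
    {
        using var reader = new StringReader(xml);
        return (Document)Serializer.Deserialize(reader);
    }
}
```

With this design the file syntax falls out of the class design, so you spend your effort on the object model rather than on parsing.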

I think it should be relatively straightforward to create your own serializer to create a file formatted however you like, but MSDN is not being my friend today, so I can't find the relevant documentation.

Mark Bessey
+1  A: 

Sounds like a good learning project and you've got some good pointers here already. I would just add that you should remember that there is a difference between a document file language and a document format.

Consider OOXML, it is a document format that is built on top of XML (what I'd describe as the file language). If your purpose is to learn about building your own document format then I'd highly recommend starting with XML so that you don't have to reinvent a language parser. This will let you focus on the concerns around building the format.
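To illustrate the split between the file language and the format: if the command from the question is written as an XML element, XDocument.Parse from System.Xml.Linq handles all of the syntax, and your own code only decides what each element means. The root element Doc, the Text element and its Font attribute, and the output strings are hypothetical examples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlDocumentInterpreter
{
    // XDocument.Parse does the parsing; this method only interprets the resulting tree.
    public static List<string> Interpret(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement root = doc.Root ?? throw new FormatException("Document has no root element");
        var output = new List<string>();
        foreach (XElement element in root.Elements())
        {
            switch (element.Name.LocalName)
            {
                case "Text":
                    string font = element.Attribute("Font")?.Value ?? "Default";
                    output.Add($"draw text '{element.Value}' in {font}");
                    break;
                case "Command":
                    string args = string.Join(", ", element.Attributes().Select(a => $"{a.Name}={a.Value}"));
                    output.Add($"run Command({args})");
                    break;
                default:
                    output.Add($"skip unknown element <{element.Name}>");
                    break;
            }
        }
        return output;
    }
}
```

For instance, <Doc><Command Argument="Define it" Argument2="Something" /></Doc> should produce the single line "run Command(Argument=Define it, Argument2=Something)".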

That said, good on you if you want to play around with creating your own language; just wanted to make sure you realized that they are different beasts.

Here are some links that will help you get started using XML in C#:

akmad
+1 Thanks ;) Very good answer!
Nathan Campos
But could you add some links to tutorials on interpreting XML code, please?
Nathan Campos