I am looking for parsers (in C#) for a bunch of formats: PHP, ASP, some XML-based formats, HTML... pretty much anything I can get my hands on.
So far we have:
HTML:
* Majestic-12
* Html Agility Pack
I am having a hard time believing that these are the only free parsers for C# in existence, so I am adding a bounty to the question.
For my own needs (see below for details), it looks like I will have to roll my own, but I would still like to get a list of free parsers, if there are any.
Note that by parser, I mean parser, not parser generator: something ready to use, where you can just call .loadFile(FileName) and .next(item) without having to study the format RFC, define the grammar, the terminal and non-terminal tokens, and whatnot.
Original question: The purpose is to separate the text from the code and do some edits without messing up the code.
I had a look at ANTLR, but while it seems like the "right tool", there is just too much prior knowledge assumed. I have an easier time writing a parser from scratch than understanding how to "easily" generate parsers with ANTLR. (I wrote a small parser for a specific type of RTF file within a couple of days, so the task is probably within my reach, but as I have no formal knowledge of parsing/lexing, I am at a loss with ANTLR.)
Then it occurred to me that there must be existing parsers for many formats, so before I start writing yet another brand new and potentially buggy version of the wheel, I figured I would check what parsers already exist and can be reused in a commercial product.
I could use parsers for just about every format in existence, so this question would be a good place to make a list of all existing free parsers written in C#, if there are any.
Thanks in advance for your suggestions
=====
Edit: To clarify, I just need to identify strings that could potentially require translation and protect the rest. I do not need a full parser (although full parsers can be used in this context).
It is impossible to automatically identify the strings that need translation, but looking at the problem backward, it is possible to identify the parts of a file that should never be translated. The idea here is to do as much preparation as possible automatically, and allow the user to run regexes on the result. Ideally, this brings the file to the point where the user can fix it manually with little effort. I am not going for an absolute solution, but for a practical one.
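To make the "protect what should never be translated" idea concrete, here is a rough sketch of what I have in mind, using only System.Text.RegularExpressions. The class name ProtectSketch, the method Split, and the pattern itself are made up for illustration; the pattern only covers PHP/ASP blocks, HTML comments, and tags, and it will be fooled by things like a "?>" inside a PHP string, so treat it as a starting point and not a real parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProtectSketch
{
    // Hypothetical "never translate" pattern: PHP blocks, ASP blocks,
    // HTML comments, then plain tags. Order matters: the more specific
    // alternatives must come before the generic tag alternative.
    private static readonly Regex Protected = new Regex(
        @"<\?.*?\?>|<%.*?%>|<!--.*?-->|<[^<>]+>",
        RegexOptions.Singleline);

    // Splits the input into segments in document order.
    // IsCode == true means "leave untouched"; false means "candidate text".
    public static List<(bool IsCode, string Text)> Split(string input)
    {
        var segments = new List<(bool IsCode, string Text)>();
        int pos = 0;
        foreach (Match m in Protected.Matches(input))
        {
            // Anything between two protected matches is candidate text.
            if (m.Index > pos)
                segments.Add((false, input.Substring(pos, m.Index - pos)));
            segments.Add((true, m.Value));
            pos = m.Index + m.Length;
        }
        // Trailing text after the last protected match, if any.
        if (pos < input.Length)
            segments.Add((false, input.Substring(pos)));
        return segments;
    }
}
```

Calling ProtectSketch.Split("<p>Hello <?php echo $name; ?>, welcome!</p>") yields five segments: the <p> tag, the text "Hello ", the PHP block, the text ", welcome!", and the closing </p> tag. Concatenating the segments gives back the original string, so edits can be limited to the non-code segments and the file reassembled afterwards.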
For a better understanding of what I am doing and how, have a look at the video tutorials on www.preptags.com.