views:

267

answers:

6

Looking for parsers (in C#) for a bunch of formats (PHP, ASP, some XML-based formats, HTML, ... pretty much anything I can get my hands on).

So far we have:

HTML:

* Majestic-12
* Html Agility Pack
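
Of the two, Html Agility Pack is closest to the "ready to use" style asked for here. A minimal sketch of extracting the visible text from an HTML file with it (the class and method names are HAP's actual public API; the file name is just an example):

```csharp
// Pull the visible text out of an HTML file with Html Agility Pack,
// leaving the markup untouched. HAP is tolerant of malformed HTML.
using HtmlAgilityPack;

class TextExtractor
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.Load("page.html");

        // Every text node that is not inside <script> or <style>.
        var nodes = doc.DocumentNode.SelectNodes(
            "//text()[not(ancestor::script or ancestor::style)]");
        if (nodes == null) return;   // SelectNodes returns null on no match

        foreach (HtmlNode node in nodes)
        {
            string text = node.InnerText.Trim();
            if (text.Length > 0)
                System.Console.WriteLine(text);
        }
    }
}
```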

I am having a hard time believing that these are the only free parsers for C# in existence, so I am adding a bounty to the question.

For my own needs (see below for details), it looks like I will have to roll my own, but I would still like to get a list of free parsers, if there are any.

Note that by parser, I mean parser. Not parser generator. Something ready to use, where you can just call .loadFile(FileName) and .next(item) without having to study the format's RFC, define the grammar, the terminal and non-terminal tokens, and whatnot.
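
As a sketch of the kind of interface meant here (the names are hypothetical; no real library is implied):

```csharp
// Hypothetical "ready to use" parser interface: nothing to configure,
// no grammar to define. Load a file, then iterate over its items.
public enum ItemKind { Code, Text, Comment }

public interface IReadyParser
{
    void LoadFile(string fileName);

    // Advances to the next item; returns false at end of file.
    bool Next(out ItemKind kind, out string content);
}
```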


Original question: The purpose is to separate the text from the code and do some edits without messing up the code.

I had a look at ANTLR, but while it seems like the "right tool", there is just too much prior knowledge assumed. I have an easier time writing a parser from scratch than understanding how to "easily" generate parsers from ANTLR. (I wrote a small parser for a specific type of RTF files within a couple days, so the task is probably within my reach, but as I have no formal knowledge of parsing/lexing, I am at loss with ANTLR)

Then it occurred to me that there must be existing parsers for many formats, so before I start writing yet another brand-new and potentially buggy version of the wheel, I figured I would check what parsers already exist and can be reused in a commercial product.

I could use parsers for just about every format in existence, so this question would be a good place to make a list of all existing free parsers written in C#, if there are any.

Thanks in advance for your suggestions.

=====

Edit: To clarify, I just need to identify strings that could potentially require translation and protect the rest. Not a full parser (although full parsers can be used in this context).

It is impossible to identify strings to be translated automatically, but looking at the problem backward, it is possible to identify the parts of a file which should never be translated. The idea here is to do as much preparation as possible automatically, and allow the user to run regexes on the result. Ideally, bring it to the point that the user can fix it manually with little effort. I am not going for an absolute solution, but for a practical one.
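
As a minimal sketch of this "backward" approach (the pattern and the protection markers are illustrative only, not PrepTags' actual rules): wrap every PHP code block in markers so that later passes, and the user's regexes, only ever touch the text outside them.

```csharp
using System.Text.RegularExpressions;

class Protector
{
    // Wrap each PHP block in <protect>...</protect> markers.
    // The markers and the pattern are illustrative, not PrepTags' format.
    static string ProtectPhp(string source)
    {
        return Regex.Replace(
            source,
            @"<\?(?:php)?[\s\S]*?(?:\?>|$)",   // a PHP block, possibly unterminated
            m => "<protect>" + m.Value + "</protect>");
    }

    static void Main()
    {
        string input = "<p>Hello</p><?php echo $x; ?><p>World</p>";
        System.Console.WriteLine(ProtectPhp(input));
        // <p>Hello</p><protect><?php echo $x; ?></protect><p>World</p>
    }
}
```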

For a better understanding of what I am doing and how, have a look at the video tutorials on www.preptags.com.

+2  A: 

HTML:

Andrew Lewis
+1  A: 

The Gardens Point Parser Generator generates a C# parser given a YACC-like language syntax.

Dour High Arch
+1  A: 

If what you want to do is to harness a large set of pre-existing language definitions to identify the text strings in those languages (and to have a foundation for building efficient text-string extractors for arbitrary other documents), you might want to look at SD Source Code Search Engine (SCSE).

The SCSE uses compiler technology (essentially big, mean versions of FLEX) to break source code files apart into their constituent tokens (keywords, operators, numbers, comments ... and text strings). The individual tokens are then indexed by file name, line, and column. The resulting index is used to enable lightning-fast searches over very large sets of source files, accommodating multiple languages. The SCSE has extractors for PHP, VB6, C#, COBOL and some 20 other computer languages including HTML and XML. Being built on top of the DMS Software Reengineering Toolkit, it is possible to add other document types easily using DMS's lexer generators.

[The DMS machinery is like ANTLR in basic capability, but goes far beyond ANTLR if your goal is to actually analyze and transform source code, but that's not relevant to your specifically proposed solution. The language definitions used for SCSE are used for many other purposes with DMS, and so they are tested by fire and extremely robust. And DMS is exactly one of those parser frameworks supporting a family of parsers I said would be difficult to find.]

The relevance to your task is that the extracted tokens fed to the SCSE indexer identify the token type (especially "string literal") and the precise location (start line/column, end line/column). This information appears to be precisely what you want.

The SCSE's output isn't documented and wasn't intended for this purpose, but that's a curable problem. Nor is it BSD-licensed, but you said you were interested in a commercial solution. It does run under Windows, and a C# based tool could easily read the results.

You can arguably do this with ANTLR's technology, too, but the existing ANTLR parsers don't produce the tokens directly ready for consumption for your purpose in the way the SCSE does. I'm unsure if ANTLR handles Unicode; the SCSE absolutely does. Similarly, you can do this with FLEX or any very strong regular expression compiler, but you won't get the large stable of robust language processors as a starting point.

I'm the architect behind DMS and the SCSE. If you have further interest contact me directly; see my SO bio.

Ira Baxter
@Ira: Thanks for your suggestions and interest. It would probably work, but I can't afford it for PrepTags. I sell my software licenses for €39 apiece, and the market is fairly small. I asked about BSD-licensed parsers because I can't use GNU or GPLed code (PrepTags is not open-source) and can't afford commercial products, as you can probably understand. The purpose of this question is to get a list of ready-to-use parsers for C# under a BSD or MIT license. If I can't find any, I will just write some.
Sylverdrag
A: 

Two previous entries on Stack Overflow that might be interesting:

http://stackoverflow.com/questions/1257268/good-parser-generator-think-lex-yacc-or-antlr-for-net-build-time-only

and here is a very simple one, check the second answer: http://stackoverflow.com/questions/673113/poor-mans-lexer-for-c

jgauffin
@jgauffin: this doesn't answer the question, but I give +1 because playing with Irony (from the first entry) made me realize that trying to use a formally built parser, following all the rules closely, will not work out well for me. Using the built-in SQL grammar from Irony on a "simple" SQL dump turned up a whole bunch of syntax errors. The parser was probably made for a different brand of SQL, but it made me realize that my purposes are somewhat the opposite of a compiler's. A compiler validates everything and fails loudly if there is the slightest problem. The user must fix his code.
Sylverdrag
On the other hand, for my purposes, the file is always right, even if it violates every official rule of the format. The file cannot and must not be changed; the parser must ignore any irregularity, do its best to prepare the file anyway, and keep quiet if it can't. While the general idea is the same, the purposes and requirements are virtually opposite. Looks like I will have to write my own parsers.
Sylverdrag
+1  A: 
jgauffin
Thanks for your suggestions. The question has not changed, actually. It has been reworded, but the bottom line was to get ready-to-use parsers from the beginning.
Sylverdrag
ok. I kept the old answer as a reference. Others might be interested in those links.
jgauffin
+1  A: 

There's also:

Coco/R:
http://en.wikipedia.org/wiki/Coco/R

GOLD:
http://en.wikipedia.org/wiki/GOLD_(parser)

code4life
@code4life: If I am not mistaken, these are parser generators, right? I am looking for parsers. Not parser generators.
Sylverdrag