ansaurus

Question

C# Tokenizer - keeping the separators

Answer 1

A:

Not easily, no. You may have to parse the string manually or look for a third party tokenizer library.
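For what it's worth, parsing by hand does not have to be much code. Here is a minimal sketch, assuming every separator is a single character; `ManualTokenizer` and `Tokenize` are made-up names for illustration, not anything from the framework:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ManualTokenizer
{
    // Hypothetical helper: splits input on any character found in
    // separators and emits each separator as its own token.
    public static List<string> Tokenize(string input, string separators)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        foreach (char c in input)
        {
            if (separators.IndexOf(c) >= 0)
            {
                // Flush the text collected so far, then keep the separator.
                if (current.Length > 0)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                }
                tokens.Add(c.ToString());
            }
            else
            {
                current.Append(c);
            }
        }

        // Flush whatever follows the last separator.
        if (current.Length > 0)
            tokens.Add(current.ToString());

        return tokens;
    }

    static void Main()
    {
        // Prints 24, +, 3 on separate lines.
        foreach (string token in Tokenize("24+3", "+-*/()"))
            Console.WriteLine(token);
    }
}
```

Multi-character separators or whitespace handling would need more logic than this.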

Edit: I found an interesting article on tokenizing using Regex.Split. Perhaps that will help?

Odd that the link isn't working. It appears to be the underscores. Here's the full URL:

http://en.csharp-online.net/CSharp_Regular_Expression_Recipes—A_Better_Tokenizer

Edit2: Got the link working; it was the long-dash in the title. Had to manually encode it. W00t

Randolpho 2009-07-15 21:56:57

Answer 2

A:

If you want a very flexible, powerful, reliable, and expandable solution, you can use the C# port of ANTLR. There is some initial setup overhead (the link covers setup for VS2008), which likely makes it overkill for such a tiny project. Here's a calculator example with support for variables.

Probably overkill for your class, but if you're interested in learning about "real" solutions to this type of real-world problem, have a look-see. I even have a Visual Studio package for working with the grammars, or you can use ANTLRWorks separately.

280Z28 2009-07-15 22:02:34

Answer 3

+1 A:

You can use Regex.Split with zero-width assertions. For example, the following will split on +-*/:

Regex.Split(str, @"(?=[-+*/])|(?<=[-+*/])");

Effectively, this says: "split at this point if it is followed by, or preceded by, any of -+*/". The match itself is zero-length, so you won't lose any part of the input string.
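To try the lookaround split as a complete program, something like the following should work. This is only a sketch; the `LookaroundSplit` class and its `Tokenize` method are names made up for the example:

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundSplit
{
    // Hypothetical wrapper around the pattern above: split wherever an
    // operator follows (lookahead) or precedes (lookbehind) the position.
    public static string[] Tokenize(string input)
    {
        return Regex.Split(input, @"(?=[-+*/])|(?<=[-+*/])");
    }

    static void Main()
    {
        // Prints 24, +, 3 on separate lines.
        foreach (string token in Tokenize("24+3"))
            Console.WriteLine(token);
    }
}
```

As far as I know, an input that begins or ends with an operator also produces an empty string at that end of the array, so you may want to filter empty entries.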

Pavel Minaev 2009-07-15 22:04:38

Answer 4

+1 A:

This produces your output:

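string s = "24+3";
// Each separator sits in a capturing group, so Regex.Split returns the
// matched separators as tokens instead of discarding them.
string seps = @"(\t)|(\n)|(\+)|(-)|(\*)|(/)|(\()|(\))";
string[] tokens = System.Text.RegularExpressions.Regex.Split(s, seps);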

foreach (string token in tokens)
    Console.WriteLine(token);
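One thing to watch with the capturing-group approach: adjacent separators, or a separator at the very start, leave empty strings in the result. A rough sketch of filtering them out follows. It assumes a single character class covering only the operators and parentheses (tab and newline are left out), and `CaptureSplit` / `Tokenize` are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CaptureSplit
{
    // Hypothetical helper: one capturing group holding a character class,
    // so each matched separator is returned as a token.
    public static string[] Tokenize(string input)
    {
        return Regex.Split(input, @"([-+*/()])")
                    .Where(t => t.Length > 0) // drop empties between adjacent separators
                    .ToArray();
    }

    static void Main()
    {
        // Prints ( 2 + 3 ) * 4, one token per line.
        foreach (string token in Tokenize("(2+3)*4"))
            Console.WriteLine(token);
    }
}
```

Without the `Where` filter, the same input would include an empty string before "(" and another between ")" and "*".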

Shane Cusson 2009-07-15 22:08:51
