+1  Q: 

How to parse this?

I need to parse a string that has the following structure:

x:{a,b,c},y:{d,e,f} etc.

where all entries are numbers so it would look something like this

411:{1,2,3},241:{4,1,2} etc.

Forgot to mention: the number of comma-delimited entries between the braces has no upper limit, but there must be at least one entry.

  1. I need to get the unique list of the numbers before each colon; in the example above, 411 and 241.

Can this be done with regex, and if so, how?

+1  A: 

I think this might work (pseudo-code):

foreach match in Regex.Matches(yourInputString, "[0-9]{3}:\{[0-9,]\},")
    firstNumber = match.Value.Substring(0, 3)
    numbers() = match.Value.Substring(4, match.Value.Length - 5).Split(",")
next

Bobby

You need a quantifier on the character class between the braces. Also, this would allow a comma-only entry.
brianary
+1  A: 

Why do you want to do this with regex? I mean, you're querying the string for IDs and, given an ID, want to retrieve its values. I'd just break the string up and create a map structure that has the ID as the key and a collection of numbers as the value.
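
To make the map idea concrete, here is a minimal sketch rather than a tested solution. `MapParser` and `ParseToMap` are names made up for illustration, and the code assumes the input is well formed, with nothing but commas between entries:

```csharp
using System;
using System.Collections.Generic;

public static class MapParser
{
    // Hypothetical helper: turns "411:{1,2,3},241:{4,1,2}" into id -> values.
    // Assumes well-formed input; int.Parse throws FormatException otherwise.
    public static Dictionary<int, List<int>> ParseToMap(string input)
    {
        var map = new Dictionary<int, List<int>>();

        // Every entry ends with '}', so splitting on it yields
        // "411:{1,2,3" and ",241:{4,1,2".
        string[] chunks = input.Split(new[] { '}' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string chunk in chunks)
        {
            // Drop the comma (and any space) that separated this entry from the previous one.
            string entry = chunk.TrimStart(',', ' ');
            if (entry.Length == 0)
                continue;

            // parts[0] is the id, parts[1] is the comma-separated value list.
            string[] parts = entry.Split(new[] { ":{" }, StringSplitOptions.None);
            int id = int.Parse(parts[0]);

            // If an id occurs more than once, its values are appended to the same list.
            if (!map.TryGetValue(id, out List<int> values))
            {
                values = new List<int>();
                map[id] = values;
            }

            foreach (string number in parts[1].Split(','))
                values.Add(int.Parse(number));
        }

        return map;
    }
}
```

With the sample string from the question, the dictionary ends up with the keys 411 and 241, so `map.Keys` doubles as the unique list of IDs.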

Bart Kiers
Many people would use Regex to break up the string, which appears to be his question.
John Fisher
Well, I thought that Regex is faster. Is it?
epitka
@John Fisher, perhaps I misunderstood the question. I thought epitka was querying the string possibly multiple times, in which case, using regex would not be logical.
Bart Kiers
+8  A: 

Regex:

(?<1>[\d]+):{(?<2>\d+),(?<3>\d+),(?<4>\d+)}

For data:

411:{1,2,3},241:{4,1,2},314:{5,6,7}

will produce the following match/groups collections:

Match 0
Group 0: 411:{1,2,3}
Group 1: 411
Group 2: 1
Group 3: 2
Group 4: 3

Match 1
Group 0: 241:{4,1,2}
Group 1: 241
Group 2: 4
Group 3: 1
Group 4: 2

Match 2
Group 0: 314:{5,6,7}
Group 1: 314
Group 2: 5
Group 3: 6
Group 4: 7

You can use the following code:

// Verbatim string (@"...") so that \d reaches the regex engine unchanged.
string expression = @"(?<1>[\d]+):{(?<2>\d+),(?<3>\d+),(?<4>\d+)}";
string input = "411:{1,2,3},241:{4,1,2},314:{5,6,7}";

Regex re = new Regex(expression, RegexOptions.IgnoreCase);

MatchCollection matches = re.Matches(input);

for (int i = 0; i < matches.Count; i++)
{
    Match m = matches[i];
    // for i == 0:
    // m.Groups[0].Value == "411:{1,2,3}"
    // m.Groups[1].Value == "411"
    // m.Groups[2].Value == "1"
    // m.Groups[3].Value == "2"
    // m.Groups[4].Value == "3"
}

Update: I'm having trouble getting it to work with pure regex and a variable number of items in the list; maybe someone else can chime in here. A simple solution would be to capture the whole list as one group and split it:

// Verbatim string (@"...") so that \d reaches the regex engine unchanged.
string expression = @"(?<1>[\d]+):{(?<2>[\d,]+)}";
string input = "411:{1,2,3,4,5},241:{4,1,234}";

Regex re = new Regex(expression, RegexOptions.IgnoreCase);

MatchCollection matches = re.Matches(input);

for (int i = 0; i < matches.Count; i++)
{
    Match m = matches[i];
    // for i == 0:
    // m.Groups[0].Value == "411:{1,2,3,4,5}"
    // m.Groups[1].Value == "411"
    // m.Groups[2].Value == "1,2,3,4,5"
    // Group 2 holds the list; split it and convert each piece to an int.
    int[] list = Array.ConvertAll(m.Groups[2].Value.Split(','), int.Parse);
    // now list is an array of what was between the curly braces for this match
}

Match list for above:

Match 0
Group 0: 411:{1,2,3,4,5}
Group 1: 411
Group 2: 1,2,3,4,5

Match 1
Group 0: 241:{4,1,234}
Group 1: 241
Group 2: 4,1,234
David Lively
Can you adjust it to work with an unlimited number of elements in the inner list ({}) and an unlimited number of outer groups?
epitka
See update above. Not pure regex, but effective via string.split().
David Lively
Thanks for help, I voted you up, but went with the other solution.
epitka
A: 

The first one is achievable with the following regex:

\d+(?=:)
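
As a rough sketch of how a lookahead pattern like this could be turned into the unique list the question asks for: `IdExtractor` and `UniqueIds` are hypothetical names, and the pattern here uses `\d+` rather than `\d*` so that it cannot produce an empty match sitting right in front of a colon.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdExtractor
{
    // Hypothetical helper: returns each distinct run of digits that is
    // immediately followed by a colon.
    public static List<string> UniqueIds(string input)
    {
        return Regex.Matches(input, @"\d+(?=:)")
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)   // the digits only; the lookahead does not consume ':'
            .Distinct()             // drop repeated ids
            .ToList();
    }
}
```

The ids are kept as strings here; converting them with `int.Parse` is a one-line change if numeric values are needed.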
Pete OHanlon
That doesn't really fully answer the question, though, does it?
Platinum Azure
That doesn't answer the question.
David Lively
Well, thanks a bunch guys - the question that's there now isn't the question that was there when I answered. That question was how to get the parts before the colons in the first question, and how to get the inner portions as the second question. There must be some way to see the pre-edit question so you can see what I answered against.
Pete OHanlon
Good point. I tried bumping the points back up but can't because the vote is too old.
David Lively
Thanks David. I appreciate the attempt anyway.
Pete OHanlon
A: 

If we consider x:{a,b,c} an element, the following would give you a list of matches with two named groups: outer and inner, where outer is x and inner is a,b,c.

(?<outer>\d+):\{(?<inner>\d+(,\d+)*)\}

Update

Here is a code sample:

        String input = "411:{1,2,3},241:{4,1,2},45:{1},34:{1,34,234}";
        String expr = @"(?<outer>\d+):\{(?<inner>\d+(,\d+)*)\}";

        MatchCollection matches = Regex.Matches(input, expr);

        foreach (Match match in matches)
        {
            Console.WriteLine("Outer: {0} Inner: {1}", match.Groups["outer"].Value, match.Groups["inner"]);
        }
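
If the inner values are needed as numbers rather than one comma-separated string, something along these lines might do. It is only a sketch: `GroupParser` and `Parse` are made-up names, and it assumes every number fits in an `int`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupParser
{
    // Same pattern as above: one outer number, then one or more inner numbers in braces.
    private static readonly Regex Element =
        new Regex(@"(?<outer>\d+):\{(?<inner>\d+(,\d+)*)\}");

    // Hypothetical helper: maps each outer number to its inner numbers.
    public static Dictionary<int, int[]> Parse(string input)
    {
        var result = new Dictionary<int, int[]>();
        foreach (Match match in Element.Matches(input))
        {
            int outer = int.Parse(match.Groups["outer"].Value);

            // The inner group is "1,2,3"; split it and convert each piece.
            int[] inner = Array.ConvertAll(
                match.Groups["inner"].Value.Split(','), int.Parse);

            // A repeated outer number overwrites the earlier entry.
            result[outer] = inner;
        }
        return result;
    }
}
```

Since dictionary keys are unique, `result.Keys` is also the unique list of outer numbers.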
HackedByChinese
No, they don't come in pairs, number of elements is variable, and yes inner list must have at least one number but has no upper limit.
epitka
ok, then the first one should work exactly as you described.
HackedByChinese
This regex is not working for me at all. It creates the following groups: {0:{0,1,2}}, {,2}, {0}, {0,1,2}, {}, {}
epitka
hmm works well for me. I updated my answer with a code sample.
HackedByChinese
I don't know if you've changed anything, but it works perfectly now. Thanks.
epitka
i had a typo in the group names before i updated it. sorry about that!
HackedByChinese
+1  A: 

This string is in JSON format, so you can use Json.NET to parse it for you.

w35l3y
It kinda looks like Json, but I don't think it is.
mgroves
+1  A: 

Are you working with JSON? If so, you might want to check out the JavaScriptSerializer class on MSDN:

http://msdn.microsoft.com/en-us/library/system.web.script.serialization.javascriptserializer.aspx

Anjisan
+1  A: 

Here's an alternative without RegEx that will run faster.

This returns a Dictionary<Double, List<Double>>....

        public Dictionary<double, List<double>> Example()
        {
            String[] aSeparators = {"{", "},", ",", "}"};
            String data = "411:{1,2,3},843:{6,5,4,3,2,1},241:{4,1,2}";
            String[] bases = data.Split(aSeparators, StringSplitOptions.RemoveEmptyEntries);
            Dictionary<double, List<double>> aDict = null;

            double aHeadValue = 0;
            List<Double> aList = null;
            foreach (var value in bases)
            {
                if (value.EndsWith(":"))
                {
                    if (aDict == null)
                        aDict = new Dictionary<double, List<double>>();
                    else
                        aDict.Add(aHeadValue, aList);
                    aHeadValue = Double.Parse(value.TrimEnd(':'));
                    aList = new List<Double>();
                }
                else
                {
                    aList.Add(Double.Parse(value));
                }
            }
            aDict.Add(aHeadValue, aList);
            return aDict;
        }
Tony Lambert