Assume I have the following string:

<script language="javascript">
     var league = new Array(
      "Soccer","Germany - 2. Bundesliga","38542195","102","24 May 2009 14:00","24 May 2009 14:00","1X2","1","0"
     );
     var matches = new Array(
      "125","1.FC Nurnberg - TSV 1860 Munich","24 May 2009 14:00","Sun, 24.05.09 14:00","1|1.40|4.10|6.40|-","||||","1|1.90|3.50|2.20|0:1","1|1.05|2.20|1.18|-","1|2.00||1.60|2.5","1|3.40|3.20|1.60|2","1|1.70|2.50|5.50|-","||||-","1",
      "126","FC Ingolstadt 04 - TuS Koblenz","24 May 2009 14:00","Sun, 24.05.09 14:00","1|3.60|2.80|2.00|-","||||","||||:","1|1.68|1.25|1.26|-","1|1.90||1.70|2.5","1|3.10|3.10|1.70|2","1|3.60|2.10|2.45|-","||||-","1",
      "127","FC St.Pauli 1910 - FSV Frankfurt","24 May 2009 14:00","Sun, 24.05.09 14:00","1|2.50|2.95|2.60|-","||||","||||:","1|1.41|1.44|1.28|-","1|2.00||1.60|2.5","1|3.40|3.20|1.60|2","1|2.95|2.00|3.05|-","||||-","1",
      "128","MSV Duisburg - VfL Osnabruck","24 May 2009 14:00","Sun, 24.05.09 14:00","1|2.30|3.60|2.40|-","||||","||||:","1|1.35|1.51|1.27|-","1|2.10||1.55|2.5","1|3.60|3.20|1.55|2","||||-","||||-","1",
      "129","FSV Mainz 05 - SC Rot-Weiss Oberhausen","24 May 2009 14:00","Sun, 24.05.09 14:00","1|1.40|3.80|7.00|-","||||","1|1.95|3.50|2.50|0:1","1|1.05|2.50|1.18|-","1|2.00||1.60|2.5","1|3.40|3.20|1.60|2","1|1.70|2.30|5.50|-","||||-","1",
      "130","Rot-Weiss Ahlen - SpVgg Greuther Furth","24 May 2009 14:00","Sun, 24.05.09 14:00","1|2.55|3.20|2.55|-","||||","||||:","1|1.42|1.42|1.28|-","1|2.10||1.55|2.5","1|3.60|3.20|1.55|2","1|3.00|2.00|3.00|-","||||-","1",
      "131","SC Freiburg - 1.FC Kaiserslautern","24 May 2009 14:00","Sun, 24.05.09 14:00","1|1.75|3.25|4.20|-","||||","||||:","1|1.17|1.91|1.24|-","1|2.10||1.55|2.5","1|3.60|3.20|1.55|2","1|2.30|2.10|3.80|-","||||-","1",
      "132","SV Wehen Wiesbaden - FC Hansa Rostock","24 May 2009 14:00","Sun, 24.05.09 14:00","1|5.00|3.70|1.55|-","||||","||||:","1|2.23|1.09|1.23|-","1|1.90||1.70|2.5","1|3.10|3.10|1.70|2","1|4.50|2.25|2.00|-","||||-","1",
      "133","TSV Alemannia Aachen - FC Augsburg","24 May 2009 14:00","Sun, 24.05.09 14:00","1|1.60|3.45|5.10|-","||||","||||:","1|1.11|2.13|1.23|-","1|2.10||1.55|2.5","1|3.60|3.20|1.55|2","1|2.10|2.20|4.30|-","||||-","1"
     );
     var events = showLeague(league, matches);
     hasEvents = hasEvents + events;
</script>

What I'm trying to do is to parse the part where it reads "var matches" and extract anything contained between two quotation marks. Therefore the desired result should be an array containing:

(0): 125
(1): 1.FC Nurnberg - TSV 1860 Munich
(2): 24 May 2009 14:00 
etc.

NB: I saw a similar question being answered, but after some time spent I wasn't able to make it work. Thanks!

A: 

I think you need to remove the " at the beginning and at the end, and then split on ","

string[] test = Regex.Split(s.Substring(1, s.Length - 2), "\",\"");
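
A minimal sketch of the same idea using plain String.Split, in case it helps. The class and method names (SplitSketch, SplitQuoted) are made up for illustration. It assumes s holds only one row of quoted values with no whitespace or line breaks between them, and that it starts and ends with a quotation mark:

```csharp
using System;

static class SplitSketch
{
    // Hypothetical helper: expects a single row such as "a","b","c".
    public static string[] SplitQuoted(string s)
    {
        // Trim surrounding whitespace, then drop the first and last quotation mark.
        string trimmed = s.Trim();
        string inner = trimmed.Substring(1, trimmed.Length - 2);

        // Split on the three-character sequence "," that separates the values.
        return inner.Split(new[] { "\",\"" }, StringSplitOptions.None);
    }

    static void Main()
    {
        string s = "\"125\",\"1.FC Nurnberg - TSV 1860 Munich\",\"Sun, 24.05.09 14:00\"";
        foreach (string value in SplitQuoted(s))
            Console.WriteLine(value);
    }
}
```

Because the separator is the whole "," sequence, the comma inside "Sun, 24.05.09 14:00" does not cause an extra split. Rows separated by line breaks would need the whitespace handled first.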
weismat
A: 

Try the following:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static MatchCollection getMatches(String input, String pattern) {
   Regex re = new Regex(pattern);
   return re.Matches(input);
}

public static void Example() {
   String pattern1 = "var matches = new Array\\(([^\\)]+)\\)";

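   // RandomTest is assumed to hold the HTML/JavaScript text from the question
   MatchCollection results = getMatches(RandomTest, pattern1);
   if (results.Count == 0) return; // no "var matches" array found
   String marray = results[0].Groups[1].Value;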

   String pattern2 = "\"([^\"]+)\"";
   List<String> values = new List<String>();
   foreach (Match value in getMatches(marray,pattern2)) {
      //Your values are in the Groups property
      values.Add(value.Groups[1].Value);
      Console.WriteLine(value.Groups[1].Value);
   }
}

The first pattern extracts the contents of the matches array, and the second grabs every quoted value inside it.

RobV
+1  A: 

Please don't use regular expressions for this, CSV should be handled by a parser. Doing this with regex is the slowest and most error-prone method of all.

Here is a ready-to-use parser: codeproject.com: A Fast CSV Reader. Other examples can easily be found, as implementing a CSV parser is a popular exercise.

You can also use the OLEDB built-in parser: C# Tutorial - Using The Built In OLEDB CSV Parser.

With your example, I would use IndexOf() to cut out the string between "var matches = new Array(" and ");", and treat the result as a CSV string.
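
A rough sketch of that IndexOf() approach, under a few assumptions: the helper names (CutArrayBody, ReadQuotedValues) are hypothetical, the text ");" never occurs inside a value, and no value contains an escaped quotation mark. The small quote scanner only stands in for a real CSV parser so that the example is self-contained:

```csharp
using System;
using System.Collections.Generic;

static class IndexOfSketch
{
    // Hypothetical helper: returns the text between the start marker and the next ");".
    public static string CutArrayBody(string input)
    {
        const string startMarker = "var matches = new Array(";
        const string endMarker = ");";

        int start = input.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += startMarker.Length;

        int end = input.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return input.Substring(start, end - start);
    }

    // Hypothetical stand-in for a CSV parser: collects the text inside each
    // pair of double quotes. Assumes values contain no escaped quotes.
    public static List<string> ReadQuotedValues(string body)
    {
        var values = new List<string>();
        int pos = 0;
        while (true)
        {
            int open = body.IndexOf('"', pos);
            if (open < 0) break;
            int close = body.IndexOf('"', open + 1);
            if (close < 0) break;
            values.Add(body.Substring(open + 1, close - open - 1));
            pos = close + 1;
        }
        return values;
    }

    static void Main()
    {
        string page = "var matches = new Array(\n \"125\",\"A - B\",\"Sun, 24.05.09 14:00\"\n);";
        string body = CutArrayBody(page);
        if (body == null) return;
        foreach (string value in ReadQuotedValues(body))
            Console.WriteLine(value);
    }
}
```

For real-world input, swapping ReadQuotedValues for one of the CSV parsers mentioned above would handle quoting rules that this scanner ignores.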

Tomalak
To be fair, most parser generators I've come across use regular expressions to define the grammar.
DrJokepu
@DrJokepu: It’s not impossible to parse CSV data with regular expressions. But it’s not the most efficient way. Time consumption of regular expressions can be exponential, O(c^n), in the worst case (due to backtracking), while a parser has just O(n). (I hope the estimates are correct.)
Gumbo
+1  A: 

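I would use the following Regex pattern (with RegexOptions.Singleline, so that . also matches the line breaks inside the array) to match the entire Array contents:

"var matches = new Array\(\s+(.*?)\s+\)"

... and then split on the commas that sit between a closing and an opening quotation mark. A String.Split on a bare comma is not enough, because values such as "Sun, 24.05.09 14:00" contain commas themselves.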
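
Something along these lines might work as a starting point. The helper names (ExtractArrayBody, SplitValues) are invented for the sketch. Two choices here are assumptions rather than part of the answer: RegexOptions.Singleline is passed so that . can cross the line breaks inside the array, and the split is done on commas that sit between two quotation marks, since some values (for example "Sun, 24.05.09 14:00") contain a comma:

```csharp
using System;
using System.Text.RegularExpressions;

static class SinglelineSketch
{
    // Hypothetical helper: captures the text between "new Array(" and the closing ")".
    public static string ExtractArrayBody(string input)
    {
        Match m = Regex.Match(
            input,
            @"var matches = new Array\(\s+(.*?)\s+\)",
            RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Hypothetical helper: splits on a comma that has a quotation mark on each
    // side (optionally with whitespace around the comma), then strips the
    // quotes left over at the very start and end.
    public static string[] SplitValues(string body)
    {
        string[] parts = Regex.Split(body, @"""\s*,\s*""");
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Trim('"');
        return parts;
    }

    static void Main()
    {
        string page = "var matches = new Array(\n      \"125\",\"Sun, 24.05.09 14:00\",\n      \"126\"\n     );";
        string body = ExtractArrayBody(page);
        if (body == null) return;
        foreach (string value in SplitValues(body))
            Console.WriteLine(value);
    }
}
```

Without RegexOptions.Singleline the pattern finds nothing on the sample, because the array spans several lines and . stops at each line break by default.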

Cerebrus
A: 

If you really want to use regular expressions, try something like this:

var matches = new Array\(\s*("(?:[^\\"]|\\.)*"\s*(?:,\s*"(?:[^\\"]|\\.)*")*)\s*\);

That should get you the array value list. Then another regular expression can get you the single values:

"(?:[^\\"]|\\.)*"

But again: Using regular expressions in this case is not that efficient. A simple CSV parser will be way better.
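
For what it's worth, here is how the single-value pattern could look in C#. This is only a sketch: the class and method names are hypothetical, a capture group has been added to pick up the text without the surrounding quotes, and the variant used consumes one ordinary character per step ([^\\"] rather than [^\\"]*) to avoid a quantifier nested inside another quantifier. Verbatim strings (@"...") need each quotation mark doubled:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EscapeAwareSketch
{
    // One quoted value: ordinary characters or backslash escapes between quotes.
    // Group 1 holds the content without the surrounding quotation marks.
    static readonly Regex QuotedValue = new Regex(@"""((?:[^\\""]|\\.)*)""");

    // Hypothetical helper: returns every quoted value found in the array body.
    public static List<string> Values(string arrayBody)
    {
        var result = new List<string>();
        foreach (Match m in QuotedValue.Matches(arrayBody))
            result.Add(m.Groups[1].Value);
        return result;
    }

    static void Main()
    {
        // Contains an escaped quote and an empty value.
        string body = "\"a\\\"b\",\"\",\"c\"";
        foreach (string value in Values(body))
            Console.WriteLine("[" + value + "]");
    }
}
```

Unlike a pattern such as "([^"]+)", this one keeps empty values and does not stop at a backslash-escaped quotation mark. The escape sequences are returned as written, so unescaping would be a separate step.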

Gumbo
A: 

If you need a single list for all the lines together use this:

/// <returns>Returns all values inside matches array in a single list</returns>
public static List<string> GetMatchesArray(String inputString)
{
    // Matches var matches = new Array( ... );
    Regex r = new Regex("(var matches = new Array\\([^\\)]*\\);)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);

    string arrayString = r.Match(inputString).Groups[0].Value;

    List<string> quotedList = new List<string>();

    // Matches all the data between the quotes inside var matches
    r = new Regex("\"([^\"]+)\"", RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
    for (Match m = r.Match(arrayString); m.Success; m = m.NextMatch())
    {
        quotedList.Add(m.Groups[1].Value);
    }

    return quotedList;
}

If you want a separate list per line, build a list of lines in which each entry is itself a list of the quoted values found on that line. The code below will do that:

/// This will help you store the data in a list in a more meaningful way, 
/// so that you are able to organize the data per line
/// Returns all the quoted text per line in a list of lines
public static List<List<string>> GetMatchesArrayPerLine(String inputString)
{
    // Matches var matches = new Array( ... )
    Regex r = new Regex("(var matches = new Array\\([^\\)]*\\);)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);

    string arrayString = r.Match(inputString).Groups[0].Value;

    List<string> lineList = new List<string>();

    // Matches all the lines and stores them in lineList one line per item. For e.g.
    // "125","1.FC Nurnberg - TSV 1860 Munich", ...
    // "126","FC Ingolstadt 04 - TuS Koblenz", ...
    r = new Regex("\n(.*)", RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
    for (Match m = r.Match(arrayString); m.Success; m = m.NextMatch())
    {
        lineList.Add(m.Groups[1].Value);
    }

    List<List<string>> quotedListPerLine = new List<List<string>>();

    // Matches the quoted text per line. 
    // This will help you store data in an organised way rather than just a list of values
    // Similar to a 2D array
    // quotedListPerLine[0] = List<string> containing { "125", "1.FC Nurnberg - TSV 1860 Munich", ... }
    // quotedListPerLine[1] = List<string> containing { "126","FC Ingolstadt 04 - TuS Koblenz", ... }
    r = new Regex("\"([^\"]+)\"", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    foreach (string line in lineList)
    {
        List<string> quotedList = new List<string>();
        for (Match m = r.Match(line); m.Success; m = m.NextMatch())
        {
            quotedList.Add(m.Groups[1].Value);
        }
        // Skip lines without quoted values, such as the closing ");" line
        if (quotedList.Count > 0)
            quotedListPerLine.Add(quotedList);
    }

    return quotedListPerLine;
}

Calling code for your convenience:

List<List<string>> quotedListLines = MyRegEx.GetMatchesArrayPerLine(a);
foreach (List<string> line in quotedListLines)
{
    Console.WriteLine("----LINE---");
    foreach (string quotedText in line)
        Console.WriteLine(quotedText);
}
Rashmi Pandit