My file contains data like this:

/Pages 2 0 R/Type /Catalog/AcroForm

/Count 1 /Kids [3 0 R]/Type /Pages

/Filter /FlateDecode/Length 84

What regular expression will produce this output?

Pages Type Catalog AcroForm Count Kids Type Pages Filter FlateDecode Length

I want to fetch each string that comes after a '/' and before the next '/' or whitespace.

Thanks in advance.

+3  A: 
\/[^\/\s]+

\/ -- A slash (escaped)
[^ ] -- A character class not (^) containing...
\/ -- ... slashes ...
\s -- ... or whitespace
+ -- One or more of these
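
A minimal C# sketch of how this pattern might be applied, assuming the input is already in a string. The class name SlashNames and the helper Extract are hypothetical, chosen only for illustration. The pattern is written as a verbatim string (@"..."): in a regular C# string literal the compiler rejects \/ with "Unrecognized escape sequence", so the backslashes have to reach the regex engine untouched. Because the pattern has no capture group, the leading '/' is removed from each match by hand.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class SlashNames
{
    // Hypothetical helper: returns the text after each '/' joined by single spaces.
    public static string Extract(string input)
    {
        // Verbatim string: the backslashes are passed to the regex engine as-is.
        var regex = new Regex(@"\/[^\/\s]+");
        var names = regex.Matches(input)
                         .Cast<Match>()
                         .Select(m => m.Value.Substring(1)); // drop the leading '/'
        return string.Join(" ", names);
    }

    static void Main()
    {
        string sample = "/Pages 2 0 R/Type /Catalog/AcroForm\n"
                      + "/Count 1 /Kids [3 0 R]/Type /Pages\n"
                      + "/Filter /FlateDecode/Length 84";
        Console.WriteLine(Extract(sample));
    }
}
```

In .NET the forward slash has no special meaning in a pattern, so @"/[^/\s]+" should behave the same way; the escapes are harmless but optional.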

Svante
It doesn't work; it gives the error "Unrecognized escape sequence".
Royson
+4  A: 
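using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = @"/Pages 2 0 R/Type /Catalog/AcroForm
/Count 1 /Kids [3 0 R]/Type /Pages
/Filter /FlateDecode/Length 84";

        // Capture the run of characters after each '/' up to the next '/' or whitespace.
        // No trailing whitespace is required, so names followed directly by '/' also match.
        var regex = new Regex(@"/([^\s/]+)");
        foreach (Match item in regex.Matches(s))
        {
            Console.WriteLine(item.Groups[1].Value);
        }
    }
}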

Remark: Don't use regular expressions to parse PDF files.

Darin Dimitrov
Out of curiosity, why not?
cwap
Because the PDF specification is 8.6MB (http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf) and it is unlikely you will get it right with a regular expression. There are tools for this.
Darin Dimitrov
+1  A: 

Here it is for C#:

@"/([^\s/]+)"

You can test it here by entering just the part between the quotes: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

Benjamin Ortuzar
A: 

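I wouldn't use a regex for this; I find string operations more readable:

// Split on '/'; the text before the first '/' is not a name, so start at index 1.
string[] parts = input.Split('/');
for (int i = 1; i < parts.Length; i++)
{
    string part = parts[i];
    int end = part.IndexOfAny(new[] { ' ', '\r', '\n' });
    if (end >= 0)
    {
         // Get everything before the first space or line break
         Console.WriteLine(part.Substring(0, end));
    }
    else
    {
         // Get whole string
         Console.WriteLine(part);
    }
}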
Oded