ansaurus

Question

How to modify RegularExpression to Parse vCard/vCalendar to allow a particular field type?

Answer 1

A:

If all you want is to capture TEL;WORK;VOICE then this will do it:

^(.*?:)

This captures everything from the beginning of the line up to and including the first colon. To exclude the colon, move it outside the capturing parentheses.

Here's the full regex (without the named groups FIELDNAME and CONTENT):

^(.*?):(.*)$

So ^(.*?): captures everything up to the first colon, and (.*)$ matches everything after the first colon until the end of the line. You can add the FIELDNAME and CONTENT group names to the two capturing groups.
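As a minimal sketch of where the group names go, assuming .NET named-group syntax and the FIELDNAME and CONTENT names from the question (the variable names here are only illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// The lazy (.*?) stops at the first colon; (.*) takes the rest of the line.
// The ?<NAME> prefix inside each group gives it a name.
var pattern = new Regex(@"^(?<FIELDNAME>.*?):(?<CONTENT>.*)$");
string line = "TEL;WORK;VOICE:0200 0000000";

Match m = pattern.Match(line);
string name = m.Groups["FIELDNAME"].Value;   // text before the first colon
string content = m.Groups["CONTENT"].Value;  // text after the first colon

Console.WriteLine("name:    {0}", name);
Console.WriteLine("content: {0}", content);
```

Note that FIELDNAME here still contains the parameters (TEL;WORK;VOICE); this pattern does not separate them from the field name.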

ennuikiller 2009-09-06 14:19:40

That sounds like what I need. However, where do I put the modification in my regex so that FIELDNAME is captured correctly, like all the other fields? I read FIELDNAME and CONTENT in my code to populate a field list.

RoguePlanetoid 2009-09-06 14:47:30

Answer 2

A:

I believe this does what you want. It's in C# because I'm not set up to test VB, but you shouldn't have any trouble converting it.

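using System;
using System.Text.RegularExpressions;

// FIELD is the field name, PARAM is each ;-separated parameter,
// CONTENT is the value plus any folded continuation lines.
Regex r = new Regex(
    @"^(?<FIELD>[^\s:;]+)(;(?<PARAM>[^;:]+))*:(?<CONTENT>.*(?>\r\n[ \t].*)*)$",
    RegexOptions.ExplicitCapture | RegexOptions.Multiline);
string target = @"TEL;WORK;VOICE:0200 0000000";
Match m = r.Match(target);
if (m.Success)
{
  Console.WriteLine("field name: {0}", m.Groups["FIELD"].Value);
  // A repeated group keeps every match in Captures, not just the last one.
  foreach (Capture c in m.Groups["PARAM"].Captures)
  {
    Console.WriteLine("  type:  {0}", c.Value);
  }
  Console.WriteLine("content: {0}", m.Groups["CONTENT"].Value);
}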

EDIT: Now that I know where you got the regex from, I can see the author is trying to do too much work in the regex. "Encoding" and "charset" are just two of many possible parameter names; I don't see any reason to match those two by name and not any others. Just iterate through the "PARAM" captures like I did and handle each one as appropriate.
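A rough sketch of that idea, assuming a simplified single-line version of the regex (no folding support) and a made-up sample line. Splitting each parameter on "=" into named and bare values is my own illustration, not something the original regex does:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Simplified single-line variant of the pattern (assumption: input is already unfolded).
var r = new Regex(
    @"^(?<FIELD>[^\s:;]+)(;(?<PARAM>[^;:]+))*:(?<CONTENT>.*)$",
    RegexOptions.ExplicitCapture);

// Made-up sample line mixing NAME=value parameters with a bare type.
string line = "NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8;WORK:Hello";
Match m = r.Match(line);

var named = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
var bare = new List<string>();

foreach (Capture c in m.Groups["PARAM"].Captures)
{
    // Split on the first '=' only, so values containing '=' stay intact.
    string[] parts = c.Value.Split(new[] { '=' }, 2);
    if (parts.Length == 2)
        named[parts[0]] = parts[1];   // e.g. ENCODING, CHARSET, or any other name
    else
        bare.Add(parts[0]);           // e.g. WORK, VOICE
}

foreach (var pair in named)
    Console.WriteLine("param {0} = {1}", pair.Key, pair.Value);
foreach (string type in bare)
    Console.WriteLine("type  {0}", type);
```

This way the regex does not need to know any parameter names in advance; the code decides what to do with each one.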

The author also allows for line folding, which probably does belong in the regex. The rules governing line folding seem pretty simple: if a line starts with a space or a tab, it's a continuation of the previous line. That also means the "FIELD" subexpression needs to be revised to disallow whitespace as well as colons and semicolons.
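If you would rather handle folding outside the main regex, one option is to unfold the text first. This is only a sketch: it assumes the break plus one leading space or tab is dropped entirely (RFC 2425-style unfolding); other vCard versions may treat the leading whitespace differently. The sample data is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Two logical lines; the first is folded, so its continuation starts with a space.
string folded = "NOTE:This is a long no\r\n te that was folded\r\nTEL:0200 0000000\r\n";

// Remove every line break that is immediately followed by one space or tab.
string unfolded = Regex.Replace(folded, @"\r?\n[ \t]", "");

// Each remaining physical line is now one complete field.
string[] lines = unfolded.Split(
    new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

foreach (string l in lines)
    Console.WriteLine(l);
```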

I've revised my regex and added the Multiline modifier, which should have been there all along. :-/

I feel I should mention that, if you're writing a complete vCard processing app, you probably shouldn't be building it on top of regexes. A non-regex solution will be easier to write (though not as much fun) and easier to maintain.
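To give an idea of what a non-regex version might look like, here is a minimal sketch. TryParseLine is a hypothetical helper, not part of any library, and it assumes the line is already unfolded and that parameter values contain no quoted colons or semicolons.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits one unfolded line into name, parameters, and content.
bool TryParseLine(string line, out string name, out List<string> parameters, out string content)
{
    name = "";
    content = "";
    parameters = new List<string>();

    int colon = line.IndexOf(':');
    if (colon < 0) return false;          // not a "name:content" line

    // Everything before the first colon is the name followed by ;-separated parameters.
    string[] head = line.Substring(0, colon).Split(';');
    name = head[0];
    for (int i = 1; i < head.Length; i++) parameters.Add(head[i]);

    content = line.Substring(colon + 1);
    return true;
}

bool ok = TryParseLine("TEL;WORK;VOICE:0200 0000000",
                       out string field, out List<string> types, out string value);

Console.WriteLine("parsed:     {0}", ok);
Console.WriteLine("field name: {0}", field);
foreach (string t in types)
    Console.WriteLine("  type:  {0}", t);
Console.WriteLine("content:    {0}", value);
```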

Alan Moore 2009-09-07 04:02:53

I got the regular expression from here: http://blog.smithfamily.dk/CategoryView,category,vcard.aspx. It was the only example that was not tied to the field names themselves.

RoguePlanetoid 2009-09-07 08:55:07

Actually this inspired me to find a solution that works, so will mark this as answer!

RoguePlanetoid 2009-09-07 10:27:09

Okay, thanks. I was editing the answer when you did that, and SO didn't notify me. I still think my way is better, but if you're happy...

Alan Moore 2009-09-07 11:10:09

Answer 3

A:

The Regular Expression which works is:

^(?<FIELDNAME>[\w-]{1,})(?:(?:;?)(?:ENCODING=(?<ENC>[^:;]*)|CHARSET=(?<CHARSET>[^:;]*)|(?<PARAM>[^:;]+))){0,2}:(?:(?<CONTENT>(?:[^\r\n]*=\r\n){1,}[^\r\n]*)|(?<CONTENT>[^\r\n]*))

Hopefully someone else finds this useful; it solved the problem of getting the parameters from the vCard data.

RoguePlanetoid 2009-09-07 10:27:59

Answer 4

A:

This is a pretty good, detailed blog post that describes parsing vCard fields and gives the regular expressions it uses. It could be of help to you.

http://borick.blogspot.com/

Rick 2010-04-15 17:33:01
