My string is as follows:

smtp:[email protected];SMTP:[email protected];X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;

I need back:

smtp:[email protected]
SMTP:[email protected]
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;

The problem is that semicolons both separate the addresses and appear inside the X400 address. Can anyone suggest how best to split this?

PS: I should have mentioned that the order differs, so it could be:

X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:[email protected];SMTP:[email protected]

There can be more than 3 addresses (4, 5, ... 10, etc.), including an X500 address. However, they all start with smtp:, SMTP:, X400 or X500.

+8  A: 

EDIT: With the updated information, this answer certainly won't do the trick - but it's still potentially useful, so I'll leave it here.

Will you always have three parts, and you just want to split on the first two semi-colons?

If so, just use the overload of Split which lets you specify the number of substrings to return:

string[] bits = text.Split(new char[]{';'}, 3);
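
As a rough sketch of what that overload does with the first sample: the third element receives everything after the second semicolon, inner semicolons included. The addresses below are made-up placeholders and `SplitFirstThree` is a hypothetical helper name, not something from the question.

```csharp
using System;

static class SplitCountDemo
{
    // Hypothetical helper: split on ';' but stop after three pieces,
    // so the last piece keeps the rest of the string untouched.
    public static string[] SplitFirstThree(string text)
    {
        return text.Split(new char[] { ';' }, 3);
    }

    static void Main()
    {
        // Placeholder addresses, assumed for illustration only.
        string text = "smtp:jack@test.com;SMTP:jblack@test.com;"
                    + "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";

        foreach (string bit in SplitFirstThree(text))
        {
            Console.WriteLine(bit);
        }
    }
}
```

This only helps while the X400 address comes last and the number of addresses is known up front.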
Jon Skeet
+1 never even knew about this overload. Looks like Java has it, too!
Outlaw Programmer
There can be more than 3 addresses (4, 5, ... 10, etc.), including an X500 address!
They all start with smtp:, SMTP:, X400 or X500.
So split on the SMTP: then
Geoffrey Chetwood
Can you update the question with this information? Will the SMTP entries always be before the other stuff? Do you know how many entries there are before you parse this string?
Outlaw Programmer
I've updated the question. I won't know how many addresses there are, and SMTP may or may not be first. Thanks
This solution isn't going to work based on the updated question. I'm retracting my +1 but I'm still grateful for the information!
Outlaw Programmer
+1  A: 

See http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx: that overload of Split lets you specify the maximum number of substrings to return. So in your case you would do

string[] bits = text.Split(new char[] { ';' }, 3);
The.Anti.9
As specified, he doesn't know the number of splits that need to be made.
Orion Adrian
A: 

Do the semicolon (;) split and then loop over the result, re-combining each element where there is no colon (:) with the previous element.

string input = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G="
  +"Black;;smtp:[email protected];SMTP:[email protected]";

string[] rawSplit = input.Split(';');

List<string> result = new List<string>();
  //now the fun begins
string buffer = string.Empty;
foreach (string s in rawSplit)
{
  if (buffer == string.Empty)
  {
    buffer = s;
  }
  else if (s.Contains(':'))
  {   
    result.Add(buffer);
    buffer = s;
  }
  else
  {
    buffer += ";" + s;
  }
}
result.Add(buffer);

foreach (string s in result)
  Console.WriteLine(s);
David B
It looks like someone went around downvoting correct answers. I'm upvoting any answer I think is correct (especially answers that are downvoted with no explanation) and downvoting the question (because he keeps changing requirements).
David B
'Changing the requirements' is part of what this site is all about. Sometimes the questioner doesn't exactly know what or how the question should be asked. Seeing various answers come in helps them to refine the question.
Aaron Palmer
Sure, but it's one thing to change the question... another thing to ask, get answers, change, and then downvote the correct answers to the original.
David B
+1  A: 

This caught my curiosity .... So this code actually does the job, but again, wants tidying :)

My final attempt - stop changing what you need ;=)

static void Main(string[] args)
{
    string fneh = "X400:C=US400;A= ;P=Test;O=Exchange;S=Jack;G=Black;x400:C=US400l;A= l;P=Testl;O=Exchangel;S=Jackl;G=Blackl;smtp:[email protected];X500:C=US500;A= ;P=Test;O=Exchange;S=Jack;G=Black;SMTP:[email protected];";

    string[] parts = fneh.Split(new char[] { ';' });

    List<string> addresses = new List<string>();
    StringBuilder address = new StringBuilder();
    foreach (string part in parts)
    {
        if (part.Contains(":"))
        {
            if (address.Length > 0)
            {
                addresses.Add(semiColonCorrection(address.ToString()));
            }
            address = new StringBuilder();
            address.Append(part);
        }
        else
        {
            address.AppendFormat(";{0}", part);
        }
    }
    addresses.Add(semiColonCorrection(address.ToString()));

    foreach (string emailAddress in addresses)
    {
        Console.WriteLine(emailAddress);
    }
    Console.ReadKey();
}
private static string semiColonCorrection(string address)
{
    if ((address.StartsWith("x", StringComparison.InvariantCultureIgnoreCase)) && (!address.EndsWith(";")))
    {
        return string.Format("{0};", address);
    }
    else
    {
        return address;
    }
}
Rob
Hi Rob, this almost works for me, except that the semicolon is removed from the x400 address, which breaks it (I need to either leave the semicolons in or append one). Plus I've just discovered that one address has two x400 addresses! X400 (uppercase) indicates a primary address, x400 (lowercase) a secondary one! Thanks
That should be close enough... You could always add a conditional to "semiColonCorrection" that only adds the ";" back to the end for xNNN addresses, not smtp ones..
Rob
Worked a charm Rob, you are the MAN! Many thanks!
A: 

Try these regexes. You can extract what you're looking for using named groups.

X400:(?<X400>.*?)(?:smtp|SMTP|$)
smtp:(?<smtp>.*?)(?:;+|$)
SMTP:(?<SMTP>.*?)(?:;+|$)

Make sure you specify case-insensitive matching when constructing them. They seem to work with the samples you gave.

Conrad
It's picking up the words smtp after the X400 address.
Orion Adrian
Are you checking the matches or the match groups?
Conrad
+1  A: 

Not the fastest if you are doing this a lot but it will work for all cases I believe.

        string input1 = "smtp:[email protected];SMTP:[email protected];X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
        string input2 = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:[email protected];SMTP:[email protected]";
        Regex splitEmailRegex = new Regex(@"(?<key>\w+?):(?<value>.*?)(\w+:|$)");

        List<string> sets = new List<string>();

        while (input2.Length > 0)
        {
            Match m1 = splitEmailRegex.Matches(input2)[0];
            string s1 = m1.Groups["key"].Value + ":" + m1.Groups["value"].Value;
            sets.Add(s1);
            input2 = input2.Substring(s1.Length);
        }

        foreach (var set in sets)
        {
            Console.WriteLine(set);
        }

        Console.ReadLine();

Of course many will claim Regex: Now you have two problems. There may even be a better regex answer than this.

Greg
+1  A: 

You could always split on the colon and have a little logic to grab the key and value.

string[] bits = text.Split(':');
List<string> values = new List<string>();
for (int i = 1; i < bits.Length; i++)
{
 string value = bits[i].Contains(';') ? bits[i].Substring(0, bits[i].LastIndexOf(';') + 1) : bits[i];
 string key = bits[i - 1].Contains(';') ? bits[i - 1].Substring(bits[i - 1].LastIndexOf(';') + 1) : bits[i - 1];
 values.Add(String.Concat(key, ":", value));
}

Tested it with both of your samples and it works fine.

Samuel
+4  A: 

May I suggest building a regular expression

(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?

or protocol-less

.*?:((?![^:;]*:).)*;?

in other words find anything that starts with one of your protocols. Match the colon. Then continue matching characters as long as you're not matching one of your protocols. Finish with a semicolon (optionally).

You can then parse through the list of matches splitting on ':' and you'll have your protocols. Additionally if you want to add protocols, just add them to the list.

Likely however you're going to want to specify the whole thing as case-insensitive and only list the protocols in their uppercase or lowercase versions.

The protocol-less version doesn't care what the names of the protocols are. It just finds them all the same, by matching everything up to, but excluding a string followed by a colon or a semi-colon.
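
A minimal sketch of the protocol-list pattern in C#, assuming the case-insensitive variant with each protocol listed once. The sample addresses are placeholders and `FindAddresses` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ProtocolRegexDemo
{
    // Each protocol is listed once; IgnoreCase covers smtp/SMTP and x400/X400.
    private static readonly Regex AddressPattern = new Regex(
        @"(smtp|x400|x500):((?!smtp:|x400:|x500:).)*;?",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns each match verbatim,
    // including any separator semicolons the match swallowed.
    public static List<string> FindAddresses(string text)
    {
        var found = new List<string>();
        foreach (Match m in AddressPattern.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }

    static void Main()
    {
        // Placeholder addresses, assumed for illustration only.
        string text = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;"
                    + "smtp:jack@test.com;SMTP:jblack@test.com";

        foreach (string address in FindAddresses(text))
        {
            Console.WriteLine(address);
        }
    }
}
```

Each match keeps the semicolons that followed it (the X400 entry above ends in two of them), so some trimming is probably still needed depending on which trailing semicolons matter to you.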

Orion Adrian
This nearly works but there is the assumption that addresses themselves do not contain the text smtp, X400 or X500.
AnthonyWJones
Simple string formats really, really don't need regexes to split them IMHO, but this is a decent case for a regex.
Robert P
I agree that you shouldn't use Regex where string.Split will work (and that's why the colon processing is done with string.Split), but as soon as you end up writing a chunk of code for string processing I think you should start to look at regexes for doing the same thing (they're better at it).
Orion Adrian
Then we're in violent agreement. :)
Robert P
+3  A: 

Split by the following regex pattern

string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=\w+:)");

EDIT: a better one that accepts more special characters in the protocol name.

string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=[^;:]+:)");
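
A short sketch of how the lookahead split might be wrapped and used, assuming `Regex.Split` with a verbatim string for the pattern. The addresses are placeholders and `SplitAddresses` is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadSplitDemo
{
    // Hypothetical helper: split only at a ';' that is directly followed by
    // "name:", where the name contains no ';' or ':'. The lookahead consumes
    // nothing, so each protocol prefix stays attached to its address.
    public static string[] SplitAddresses(string text)
    {
        return Regex.Split(text, @";(?=[^;:]+:)");
    }

    static void Main()
    {
        // Placeholder addresses, assumed for illustration only.
        string text = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;"
                    + "smtp:jack@test.com;SMTP:jblack@test.com";

        foreach (string item in SplitAddresses(text))
        {
            Console.WriteLine(item);
        }
    }
}
```

The semicolons inside the X400 address are never followed by a `name:` token, so they survive the split. Only the separator in front of the next protocol is consumed.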
Dennis Cheung
Doesn't work for the expressions involved if there is a space after the semi-colon.
Orion Adrian
I've tested it on http://www.regextester.com/ and it works(can identify the correct ";" in the string). Could be the flag or preg compatibility issue?
Dennis Cheung
Try adding a space after the semicolon, but before the next protocol and it will fail to match that semicolon.
Orion Adrian
I see. I'll try to improve it. But how could the name of a protocol start with a space?
Dennis Cheung
I am just wondering how this answer got downvoted while #483931 and #483931 got relatively more upvotes?
Dennis Cheung
Answer corrected and downvote removed. To answer your first question, people are lazy. If they see a question downvoted at all perhaps they don't bother reading it to upvote it.
Orion Adrian
I was not wondering why the answer didn't get upvotes. I was wondering how a downvote happens to an answer that addresses both the original and the revised question with less code, less hard-coded logic, fewer assumptions and fewer problems.
Dennis Cheung
Because it didn't actually walk the person through the problem. You said to split on those, but without also mentioning for example that you need to do a substring operation or such, they're unlikely to know what to do next.
Orion Adrian
I got it, thanks
Dennis Cheung
A: 

here is another possible solution.

string[] bits = text.Replace(";smtp", "|smtp").Replace(";SMTP", "|SMTP").Replace(";X400", "|X400").Split(new char[] { '|' });

bits[0], bits[1], and bits[2] will then contain the three parts in the order from your original string.

mangokun
Obviously, the example as it is will not match 'smtp' at the beginning. I think all the replaces should be without the semicolon, e.g. text.Replace("smtp","|smtp"). A final tidy-up of the results can use TrimEnd(';'), and the split can be done with RemoveEmptyEntries to get rid of the first blank.
Carl
my solution tries to cater for the other input string as well, which is `X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:[email protected];SMTP:[email protected]`
mangokun
So will my altered one, it depends on how important the semicolons are, as yours is removing them.
Carl
A: 

Lots of attempts. Here is mine ;)

string src = "smtp:[email protected];SMTP:[email protected];X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";

Regex r = new Regex(@"
   (?:^|;)smtp:(?<smtp>([^;]*(?=;|$)))|
   (?:^|;)x400:(?<X400>.*?)(?=;x400|;x500|;smtp|$)|
   (?:^|;)x500:(?<X500>.*?)(?=;x400|;x500|;smtp|$)",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

foreach (Match m in r.Matches(src))
{
 if (m.Groups["smtp"].Captures.Count != 0)
  Console.WriteLine("smtp: {0}", m.Groups["smtp"]);
 else if (m.Groups["X400"].Captures.Count != 0)
  Console.WriteLine("X400: {0}", m.Groups["X400"]);
 else if (m.Groups["X500"].Captures.Count != 0)
  Console.WriteLine("X500: {0}", m.Groups["X500"]); 
}

This finds all smtp, x400 or x500 addresses in the string in any order of appearance. It also identifies the type of address ready for further processing. The appearance of the text smtp, x400 or x500 in the addresses themselves will not upset the pattern.

AnthonyWJones
A: 

You can use the following regular expression, compiled IgnoreCase and RightToLeft to grab each individual part:

((?<protocol>x[45]00|smtp):(?<payload>([^:;]+);?)+)

The RightToLeft is required to keep a trailing 'smtp' or 'x400' from getting added onto a match. It works when tested in Ultrapico Expresso 3.0 against all of your test inputs.

sixlettervariables
A: 

This works!

    string input =
        "smtp:[email protected];SMTP:[email protected];X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
    string[] parts = input.Split(';');
    List<string> output = new List<string>();
    foreach(string part in parts)
    {
        if (part.Contains(":"))
        {
            output.Add(part + ";");
        }
        else if (part.Length > 0)
        {
            output[output.Count - 1] += part + ";";
        }
    }
    foreach(string s in output)
    {
        Console.WriteLine(s);
    }
Leon Tayson