Update

If you were forced to use a single char on a split method, which char would be the most reliable?

Definition of reliable: a split character that is not part of the individual sub strings being split.

+5  A: 

There are overloads of String.Split that take string separators...
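As a minimal sketch of that overload: the separator "<|>" and the class name below are made up for illustration, and any multi-character string unlikely to appear in the data would do.

```csharp
using System;

public static class StringSeparatorDemo
{
    // Hypothetical multi-character separator, chosen only for this example.
    public const string Separator = "<|>";

    public static string[] SplitParts(string joined)
    {
        // The string[] overload treats each array element as one whole separator,
        // so a lone '<', '|' or '>' inside the data does not cause a split.
        return joined.Split(new[] { Separator }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string joined = string.Join(Separator, new[] { "a|b", "c<d", "e" });
        foreach (string part in SplitParts(joined))
        {
            Console.WriteLine(part);
        }
    }
}
```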

nitzmahone
Correct me if I'm wrong, but this doesn't answer the updated question.
Alastair Pitts
+7  A: 

I usually prefer the '|' symbol as the split character. If you are not sure what the user will enter, you can forbid a few special characters in the input and pick the split character from among them.

rahul
+1  A: 

It depends very much on the context in which it's used. If you're talking about a very general delimiting character then I don't think there is a one-size-fits-all answer.

I find that the ASCII null character '\0' is often a good candidate. Or you can go with nitzmahone's idea and use more than one character; then the separator can be as crazy as you want.

Alternatively, you can parse the input and escape any instances of your delimiting character.

Quick Joe Smith
+4  A: 

\0 is a good split character. It's pretty hard (impossible?) to enter from the keyboard and it makes logical sense.

\n is another good candidate in some contexts.

And of course, .NET strings are Unicode, so there is no need to limit yourself to the first 255 characters. You can always use a rare Mongolian letter or some reserved or unused Unicode symbol.

yu_sha
Can it end up in ex.Message texts?
JL
Depends on who throws an exception. \n can actually occur. But you can use some rare Unicode character!
yu_sha
+15  A: 

We currently use

public const char Separator = ((char)007);

I think this is the beep sound, if I am not mistaken.
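A rough sketch of how such a constant might be used for joining and splitting. The class and helper names are invented for this example; only the Separator constant comes from the answer.

```csharp
using System;

public static class BellSeparatorDemo
{
    // Same value as in the answer: ASCII 7 (BEL). C# can also write it as '\a'.
    public const char Separator = (char)007;

    // Hypothetical helper: joins the parts with BEL between them.
    public static string Join(params string[] parts)
    {
        return string.Join(Separator.ToString(), parts);
    }

    // Hypothetical helper: splits on BEL only, so pipes and commas survive.
    public static string[] Split(string joined)
    {
        return joined.Split(Separator);
    }

    public static void Main()
    {
        string joined = Join("first", "second|with pipe", "third, with comma");
        foreach (string part in Split(joined))
        {
            Console.WriteLine(part);
        }
    }
}
```

This only stays reliable as long as nothing upstream puts a BEL character into the data.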

astander
so I guess hopefully it should never occur in exception messages from try and catch blocks?
JL
Not sure if this is the best answer, but it was the most original from what I was given.
JL
The name is Beep. Console Beep. ;)
RCIX
Should be fun when writing your lines to the console :)
Carra
I thought it is "bell" rather than "beep"? Anyhow, it's gonna sound like beep anyway :P
o.k.w
LOL... bell is a good one to use.. who's going to type that!
CVertex
I think it's good to have a bit of Bond in our software...
Skilldrick
The ASCII name is BEL.
Guffa
+2  A: 

I'd personally say that it depends entirely on the situation; if you're writing a simple TCP/IP chat system, you obviously shouldn't use '\n' as the split character. But '\0' is a good character to use because users normally can't type it!

Siyfion
+3  A: 

It depends what you're splitting.

In most cases it's best to use split chars that are fairly commonly used, for instance

value, value, value

value|value|value

key=value;key=value;

key:value;key:value;

You can use quoted identifiers nicely with commas:

"value", "value", "value with , inside", "value"

I tend to use , first, then |, and if I can't use either of them I use the section sign §.

Note that on Windows you can type such characters with ALT+number (on the numeric keypad only), so § is ALT+21.
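The quoted-values case can be sketched roughly as follows. This is a simplified illustration, not a full CSV parser: it assumes quotes are never escaped inside a value, and SplitQuoted is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedSplitDemo
{
    // Splits on commas that are outside double quotes.
    // Assumption: no escaped quotes ("") inside values.
    public static List<string> SplitQuoted(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;            // toggle state, drop the quote itself
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Trim() also removes leading/trailing spaces inside the quotes.
        fields.Add(current.ToString().Trim());
        return fields;
    }

    public static void Main()
    {
        string line = "\"value\", \"value\", \"value with , inside\", \"value\"";
        foreach (string field in SplitQuoted(line))
        {
            Console.WriteLine(field);
        }
    }
}
```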

Keith
+9  A: 

Aside from 0x0, which may not be available (because of null-terminated strings, for example), the ASCII control characters between 0x1 and 0x1f are good candidates. The ASCII characters 0x1c-0x1f are even designed for such a thing and have the names File Separator, Group Separator, Record Separator, Unit Separator. However, they are forbidden in transport formats such as XML.

In that case, characters from the Unicode private use code points may be used.

One last option would be to use an escaping strategy, so that the separation character can be entered somehow anyway. However, this complicates the task quite a lot and you cannot use String.Split anymore.
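As an illustration of the separator characters mentioned above, here is a sketch that uses Unit Separator (0x1f) between fields and Record Separator (0x1e) between records. The sample data and the names are invented for this example.

```csharp
using System;

public static class AsciiSeparatorDemo
{
    public const char UnitSeparator = '\u001F';   // ASCII US: between fields
    public const char RecordSeparator = '\u001E'; // ASCII RS: between records

    // Splits into records first, then each record into its fields.
    public static string[][] Parse(string data)
    {
        string[] records = data.Split(RecordSeparator);
        string[][] result = new string[records.Length][];
        for (int i = 0; i < records.Length; i++)
        {
            result[i] = records[i].Split(UnitSeparator);
        }
        return result;
    }

    public static void Main()
    {
        string data = "Ann" + UnitSeparator + "ann@example.com" + RecordSeparator
                    + "Bob" + UnitSeparator + "bob@example.com";

        foreach (string[] fields in Parse(data))
        {
            Console.WriteLine(string.Join(" / ", fields));
        }
    }
}
```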

nd
+2  A: 

First of all, in C# (or .NET), you can use more than one split character in one split operation.

From the String.Split Method (Char[]) reference, the separator parameter is:
An array of Unicode characters that delimit the substrings in this instance, an empty array that contains no delimiters, or null reference (Nothing in Visual Basic).
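For instance, a small sketch of that overload; the three delimiters and the helper name are chosen arbitrarily for the example:

```csharp
using System;

public static class MultiCharSplitDemo
{
    public static string[] SplitOnAny(string input)
    {
        // Each char in the array acts as an independent delimiter.
        return input.Split(new[] { ',', '|', ';' });
    }

    public static void Main()
    {
        foreach (string part in SplitOnAny("a,b|c;d"))
        {
            Console.WriteLine(part);
        }
    }
}
```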

In my opinion, there's no MOST reliable split character, however some are more suitable than others.

Popular split characters like tab, comma and pipe are good for viewing the unsplit string/line.

If it's only for storing/processing, the safer characters are probably those that are seldom used or those not easily entered from the keyboard.

It also depends on the usage context. E.g. if you are expecting the data to contain email addresses, "@" is a no-no.

Say we were to pick one from the ASCII set. There are quite a number to choose from, e.g. " ` ", " ^ " and some of the non-printable characters. Do beware, though: not all characters are suitable. E.g. 0x00 might have adverse effects on some systems.

o.k.w
A: 

The pipe sign "|" is mostly used when you are passing several arguments to a method that accepts just a single string parameter. This is also widely used in SQL Server stored procedures, where you need to pass an array as a parameter. Mostly, though, it depends on the situation where you need it.

Sumeet
+2  A: 

You can safely use whatever character you like as delimiter, if you escape the string so that you know that it doesn't contain that character.

Let's for example choose the character 'a' as delimiter. (I intentionally picked a usual character to show that any character can be used.)

Use the character 'b' as escape code. We replace any occurrence of 'a' with 'b1' and any occurrence of 'b' with 'b2':

private static string Escape(string s) {
   return s.Replace("b", "b2").Replace("a", "b1");
}

Now, the string doesn't contain any 'a' characters, so you can put several of those strings together:

string msg = Escape("banana") + "a" + Escape("aardvark") + "a" + Escape("bark");

The string now looks like this:

b2b1nb1nb1ab1b1rdvb1rkab2b1rk

Now you can split the string on 'a' and get the individual parts:

b2b1nb1nb1
b1b1rdvb1rk
b2b1rk

To decode the parts you do the replacement backwards:

private static string Unescape(string s) {
   return s.Replace("b1", "a").Replace("b2", "b");
}

So splitting the string and unencoding the parts is done like this:

string[] parts = msg.Split('a');
for (int i = 0; i < parts.Length; i++) {
  parts[i] = Unescape(parts[i]);
}

Or using LINQ:

string[] parts = msg.Split('a').Select<string,string>(Unescape).ToArray();

If you choose a less common character as delimiter, there are of course fewer occurrences that need to be escaped. The point is that the method makes sure that the character is safe to use as delimiter without making any assumptions about what characters exist in the data that you want to put in the string.
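To check the round trip end to end, the pieces can be put together in one small program. This is only a sketch: Pack and Unpack are hypothetical wrapper names, while Escape and Unescape repeat the replacements shown above.

```csharp
using System;
using System.Linq;

public static class EscapeRoundTripDemo
{
    private const char Delimiter = 'a';

    public static string Escape(string s)
    {
        // 'b' must be escaped first, otherwise the 'b' in "b1" would be escaped again.
        return s.Replace("b", "b2").Replace("a", "b1");
    }

    public static string Unescape(string s)
    {
        // Reverse order: restore 'a' first, then 'b'.
        return s.Replace("b1", "a").Replace("b2", "b");
    }

    // Hypothetical wrapper: escapes every part and joins them with the delimiter.
    public static string Pack(params string[] parts)
    {
        return string.Join(Delimiter.ToString(), parts.Select(p => Escape(p)));
    }

    // Hypothetical wrapper: splits on the delimiter and unescapes every part.
    public static string[] Unpack(string msg)
    {
        return msg.Split(Delimiter).Select(p => Unescape(p)).ToArray();
    }

    public static void Main()
    {
        string msg = Pack("banana", "aardvark", "point b1: an apple");
        foreach (string part in Unpack(msg))
        {
            Console.WriteLine(part);
        }
    }
}
```

The third input deliberately contains the text "b1", which is also one of the escape sequences, to show that such data still comes back unchanged.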

Guffa
The problem I see with this method: say your original string contained b1, for example "point b1: an apple". After escaping you would have "point b1: b1n b1pple", and when you unescape you would get "point a: an apple", so this completely breaks your method. You might as well have started off with an obscure character in the first place, don't you think?
JL
@JL: You are mistaken. After escaping, the string is "point b21: b1n b1pple". Unescaping it gives the original string. The reason that I chose a common character is to prove that the method is completely safe. Usually you would choose a less used character to minimise the number of characters that need to be escaped.
Guffa
A: 

I personally use one of those Chinese characters, because:

  • I don't have any Chinese customers.
  • Characters with a code under 256 can be written without changing the keyboard layout, and all Chinese chars have a code above 255.
  • They are hard to insert (some of them need more than 20 keystrokes).
  • I cannot find any harder language.
  • Chinese use 99% of their CPU for speaking(it is a hard language) ==> no one can bruteforce his/her keyboard to find the character.
Behrooz