I need an easy way to split some strings into their component parts. For example, I have this string:

":JStoker!stoker@jcs.me.uk PRIVMSG #channel :test message"

and I need to split it into:

string nickname = "JStoker"
string ident = "stoker"
string host = "jcs.me.uk"
string channel = "#channel"
string message = "test message"

It also needs to work when I get a string like

":irc.testnet.com PRIVMSG #channel :test message"

in which case I would need something like

string nickname = "irc.testnet.com"
string ident = ""
string host = ""
string channel = "#channel"
string message = "test message"

using the same code, without throwing an error. The string I am parsing changes all the time; if you're familiar with it, this is raw IRC data. I just need to know how to parse the data efficiently.

It could possibly be done with Regex, but I'm not sure. Please help, with code examples if possible.

+2  A: 

Yes, a regular expression like this should do it:

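^:([^\s!@]+)(?:!([^\s@]+)@(\S+))? PRIVMSG (#\S+) :(.*)$

Example:

// Requires: using System.Text.RegularExpressions;
Match m = Regex.Match(input, @"^:([^\s!@]+)(?:!([^\s@]+)@(\S+))? PRIVMSG (#\S+) :(.*)$");
// Groups that took no part in the match (ident and host for a server prefix) return "".
string nickname = m.Groups[1].Value;
string ident = m.Groups[2].Value;
string host = m.Groups[3].Value;
string channel = m.Groups[4].Value;
string message = m.Groups[5].Value;

Note: the first group matches anything except whitespace, ! and @, so it accepts both nicknames and dotted server names such as irc.testnet.com. \w alone (A-Z a-z 0-9 _) would reject those, as well as nicknames containing characters like - or [. Tighten the character sets if you need stricter validation of the individual identifiers.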
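A small wrapper can make the no-match case explicit instead of silently handing back empty strings. This is only a sketch: the class name IrcRegexDemo and the method TryParsePrivmsg are hypothetical, and the pattern is an assumption that uses loose character classes (anything except whitespace, ! and @) so that a dotted server name is accepted as the prefix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IrcRegexDemo
{
    // Assumed pattern: prefix, optional !ident@host, the PRIVMSG command,
    // a target, and the trailing message text.
    private static readonly Regex Privmsg = new Regex(
        @"^:([^\s!@]+)(?:!([^\s@]+)@(\S+))? PRIVMSG (\S+) :(.*)$");

    // Returns true when the line is a PRIVMSG in the expected shape.
    // On failure every out parameter is set to "".
    public static bool TryParsePrivmsg(string line,
        out string nickname, out string ident, out string host,
        out string channel, out string message)
    {
        Match m = Privmsg.Match(line);
        // Groups that did not participate in the match yield "",
        // so ident and host stay empty for a server prefix.
        nickname = m.Groups[1].Value;
        ident = m.Groups[2].Value;
        host = m.Groups[3].Value;
        channel = m.Groups[4].Value;
        message = m.Groups[5].Value;
        return m.Success;
    }
}
```

Calling IrcRegexDemo.TryParsePrivmsg on each incoming line and checking the return value lets other commands (NICK, JOIN and so on) fall through to different handling rather than being mistaken for an empty PRIVMSG.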

Guffa
How can I use that to get the different strings?
Tommy
@sniperX: I added an example above.
Guffa
+1  A: 

/\"\:([^ \!]+)(?:\!([^ \@]+)\@([^ ]+))? PRIVMSG ([^ ]+) \:(.+)\"/

$nick = $1
$ident = $2
$host = $3
$chan = $4
$message = $5

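I escaped every punctuation character because which ones are special depends on the regex engine. Unescape the ones that aren't special in the engine you use. The \" at each end matches the quotes shown in your example; drop them if the raw line has no quotes around it.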

john
+1  A: 

What I do for IRC message splitting (in simple terms, as I don't remember the exact C# code) is:

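  • Remove the leading :
  • Split once on the first " :" (a space followed by a colon). This gives you two elements: the last "message" parameter, and everything else. Limit the split to two parts, because the message itself may contain colons.
  • Split the "everything else" on spaces, which gives you the prefix, the command, and all the other parameters.
  • Then you can use a simple method to parse the nick string into its different parts (two more splits should do it: one on ! and one on @)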
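The steps above can be sketched roughly as follows. This is not the answerer's actual code: the IrcSplitDemo class, the IrcMessage container and its field names are all hypothetical, and the sketch assumes the message separator is the first " :" (space then colon) so that colons inside the message text are left alone.

```csharp
using System;

public static class IrcSplitDemo
{
    // Hypothetical container for the parsed pieces.
    public sealed class IrcMessage
    {
        public string Nickname = "", Ident = "", Host = "", Command = "", Trailing = "";
        public string[] Params = new string[0];
    }

    public static IrcMessage Parse(string line)
    {
        var msg = new IrcMessage();

        // Step 1: remove the leading ':' if the line has a prefix.
        bool hasPrefix = line.StartsWith(":");
        string rest = hasPrefix ? line.Substring(1) : line;

        // Step 2: split once on the first " :" into "everything else" and the message.
        string head = rest;
        int idx = rest.IndexOf(" :", StringComparison.Ordinal);
        if (idx >= 0)
        {
            head = rest.Substring(0, idx);
            msg.Trailing = rest.Substring(idx + 2);
        }

        // Step 3: split "everything else" on spaces.
        string[] parts = head.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int i = 0;
        if (hasPrefix && parts.Length > 0) ParsePrefix(parts[i++], msg);
        if (i < parts.Length) msg.Command = parts[i++];
        msg.Params = new string[parts.Length - i];
        Array.Copy(parts, i, msg.Params, 0, msg.Params.Length);
        return msg;
    }

    // Step 4: two more splits turn nick!ident@host into its parts.
    // A bare server name has no '!' and ends up entirely in Nickname.
    private static void ParsePrefix(string prefix, IrcMessage msg)
    {
        string[] nickAndRest = prefix.Split(new[] { '!' }, 2);
        msg.Nickname = nickAndRest[0];
        if (nickAndRest.Length == 2)
        {
            string[] identAndHost = nickAndRest[1].Split(new[] { '@' }, 2);
            msg.Ident = identAndHost[0];
            if (identAndHost.Length == 2) msg.Host = identAndHost[1];
        }
    }
}
```

For a PRIVMSG line the channel is Params[0] and the text is Trailing. A command without a ':' parameter simply leaves Trailing empty.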

To me, this method is more apt than writing a regex for it, though I am unsure about the performance difference. I'd be willing to bet it doesn't really matter either way if you're just writing a client.

Alternatively you could do this:

  • Split the string on space
  • Walk through the resulting array, skipping the first element (the :-prefixed source), and check whether each element starts with :. If it does, join it and all the following elements with spaces to get the full message.
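A minimal sketch of this second approach might look like the following. The IrcWalkDemo class and Tokenize method are hypothetical names, and the sketch assumes the first word may be the ':'-prefixed source, so only later words are checked for a leading ':'.

```csharp
using System;
using System.Collections.Generic;

public static class IrcWalkDemo
{
    // Returns the space-separated words of the line, with the ':'-prefixed
    // final parameter joined back into a single token (without its ':').
    public static List<string> Tokenize(string line)
    {
        string[] words = line.Split(' ');
        var tokens = new List<string>();
        for (int i = 0; i < words.Length; i++)
        {
            // Index 0 may be the ':'-prefixed source, which is not the message.
            if (i > 0 && words[i].StartsWith(":"))
            {
                // Join this word and everything after it with single spaces.
                string joined = string.Join(" ", words, i, words.Length - i);
                tokens.Add(joined.Substring(1));
                break;
            }
            tokens.Add(words[i]);
        }
        return tokens;
    }
}
```

The first token still carries its leading ':' and the nick!ident@host form, so it needs the same prefix handling as in the first approach.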

I'm not sure which is "faster", but I believe the second is less elegant.

These should work no matter which command you receive (and so can be used for generic parsing), but pay attention to the fact that not all commands have an element that starts with :. For instance, the NICK command takes only a single word, which usually does not come prefixed with :. Other commands have multiple single words before the : (the USER command has three).

Daniel Bruce