I'm new to regular expressions, and I need to write a set of regular expressions that match different data packet formats.

My problem is that I usually only need to look at the start and end of a packet to tell the formats apart; the data in between is irrelevant.

What's the most efficient way to ignore the data between the start and end?

Here's a simple example. The packet I'm looking for starts with $CH; and ends with #.

Currently my regex is \$CH;.*?#

It's the .*? I'm worried about. Is there a better (or more efficient) way to accept any character between the packet header and ending character?

Also, some of the packets have \n chars in the data, so using . won't work at all if it means [^\n].

I've also considered [^\x00]*? to detect any characters since null is never used in the data.

Any suggestions?

+1  A: 

Try this:

\$CH;[\s\S]*?#
Rubens Farias
+4  A: 

\$CH;.*?# is fine and should be quite efficient. You can make it more explicit that there should be no backtracking by writing it as \$CH;[^#]*#, if you like.
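As a rough illustration of the two forms above, here is a minimal Python sketch (the question targets .NET, but these two patterns use the same syntax in Python's `re` module). The sample string is made up for the example. It only shows that both patterns find the same packets when the payload contains no newline; it does not measure speed.

```python
import re

# Hypothetical sample: two packets embedded in unrelated characters.
data = "xx$CH;12,34#yy$CH;56#zz"

# Lazy dot: extends the match one character at a time until '#' can match.
lazy = re.compile(r"\$CH;.*?#")
# Negated class: consumes everything that is not '#', then requires '#'.
negated = re.compile(r"\$CH;[^#]*#")

lazy_matches = lazy.findall(data)
negated_matches = negated.findall(data)

print(lazy_matches)
print(negated_matches)
```

One difference worth keeping in mind: `[^#]` also matches `\n`, while `.` does not unless single-line mode is enabled.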

You can use (.|\n) or [\w\W] to match truly any character. Even better, use the RegexOptions.Singleline option to change the behavior of the dot:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
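A small sketch of the effect, written in Python as an assumption for illustration: Python's `re.DOTALL` flag plays the same role as .NET's `RegexOptions.Singleline`. The sample packet is invented.

```python
import re

# Hypothetical packet whose payload contains a newline.
packet = "noise$CH;line1\nline2#noise"

# Default dot: cannot cross the newline, so no match is found.
default_dot = re.search(r"\$CH;.*?#", packet)

# DOTALL (the Python counterpart of Singleline): dot matches '\n' too.
dotall_dot = re.search(r"\$CH;.*?#", packet, re.DOTALL)

# Character-class alternative that needs no flag.
any_char_class = re.search(r"\$CH;[\s\S]*?#", packet)

print(default_dot)
print(repr(dotall_dot.group()))
print(repr(any_char_class.group()))
```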

John Kugelman
I like \$CH;[^#]*# best so far. Question: if I used \$CH;[^#]*?# instead, would the question mark be purely redundant?
CodeFusionMobile
Yep, it would be redundant.
John Kugelman
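A quick way to convince yourself of the redundancy is to compare both variants on a few inputs. This is a Python sketch with made-up samples, not a proof: because `[^#]` can never consume the `#`, greedy and lazy repetition must stop at the same place.

```python
import re

# Hypothetical inputs: empty payload, multiple packets (one with a newline),
# a missing terminator, and stray '#' characters around a packet.
samples = ["$CH;#", "a$CH;1#b$CH;2\n3#", "$CH;no terminator", "##$CH;x##"]

greedy = re.compile(r"\$CH;[^#]*#")
lazy = re.compile(r"\$CH;[^#]*?#")

# Pair up the matches from both patterns for each sample.
results = [(greedy.findall(s), lazy.findall(s)) for s in samples]

for greedy_found, lazy_found in results:
    print(greedy_found == lazy_found, greedy_found)
```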
A: 

I would recommend checking the initial and terminal sequences separately using anchored regular expressions.

Sinan Ünür
Can't. The packet I'm trying to match is in the middle of an essentially random data set, so there is nothing to anchor to.
CodeFusionMobile
@CSharperWithJava OK, I did not realize that. In that case, if packets cannot be empty, use `\$CH;[^#]+#`.
Sinan Ünür
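To make the difference between `*` and `+` concrete, here is a short Python sketch on an invented input that contains one empty packet and one non-empty packet:

```python
import re

# Hypothetical input: an empty packet followed by a packet with a payload.
data = "$CH;#--$CH;payload#"

# '*' allows zero payload characters, so the empty packet is matched.
star = re.findall(r"\$CH;[^#]*#", data)
# '+' requires at least one payload character, so the empty packet is skipped.
plus = re.findall(r"\$CH;[^#]+#", data)

print(star)
print(plus)
```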
+1  A: 

To detect the start of the line/data, use the ^ anchor; to detect the end, use the $ anchor:

^start.*?end$

Be aware that .*? may fail to match newlines. One option is to replace it with [\s\S]*?
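As a hedged Python sketch of what the anchors do here (the sample strings are invented, and the packet markers from the question stand in for `start` and `end`): the anchored pattern only matches when the packet is the entire input, so it finds nothing when the packet sits inside other data.

```python
import re

anchored = re.compile(r"^\$CH;[\s\S]*?#$")
unanchored = re.compile(r"\$CH;[\s\S]*?#")

# Hypothetical inputs: a packet on its own, and the same packet inside noise.
alone = "$CH;1\n2#"
embedded = "junk$CH;1\n2#junk"

alone_match = anchored.search(alone)                 # packet is the whole input
embedded_anchored = anchored.search(embedded)        # '^' cannot match mid-string
embedded_unanchored = unanchored.search(embedded)    # finds the packet anywhere

print(repr(alone_match.group()))
print(embedded_anchored)
print(repr(embedded_unanchored.group()))
```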

Ast Derek
Can't. The packet I'm trying to match is in the middle of an essentially random data set, so there is nothing to anchor to.
CodeFusionMobile