ansaurus

Question

How to parse a string between [STX] and [ETX] using C#: split/append output using Regex or string functions

Answer 1

A:

Try this:

Regex regex = new Regex(@"\[STX\](.*?)\[ETX\]", RegexOptions.IgnoreCase);

Then pick out group 1 to get the string between the tags.
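As a minimal sketch of picking out the group, assuming the same pattern as above (the class and method names `StxEtxDemo` and `Extract` are made up for illustration):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StxEtxDemo
{
    // Same pattern as above; group 1 captures the text between the tags.
    private static readonly Regex TagRegex =
        new Regex(@"\[STX\](.*?)\[ETX\]", RegexOptions.IgnoreCase);

    // Returns the captured text of every [STX]...[ETX] pair found in the input.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in TagRegex.Matches(input))
        {
            // Groups[0] is the whole match including the tags; Groups[1] is the inner text.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Calling `StxEtxDemo.Extract("sajksajsk [STX]some string 2 [ETX] saksla")` returns a single entry, `"some string 2 "`, with the trailing space kept.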

Oskar Kjellin 2010-09-27 19:40:57

This will fail in strings like `[STX] not this [STX] THIS! [ETX]`.

Tim Pietzcker 2010-09-28 11:59:43

@Tim It depends on what you want: the text between the outer tags or between the inner ones.

Oskar Kjellin 2010-09-28 12:40:11

He said he wants the inner ones. However, he wrote this in an answer instead of editing his question...

Tim Pietzcker 2010-09-28 12:56:23

@Tim Didn't see that (it was actually posted after my answer)

Oskar Kjellin 2010-09-28 13:09:41

Answer 2

A:

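EDIT: to fit your updated requirements, use the pattern below. The greedy .* runs to the end of the line and then backtracks to the last [STX] that still has an [ETX] after it, which skips any earlier [STX] on that line. (The optional lookbehind at the start, (?<=\[STX])?, can always match zero times, so it has no effect.) One consequence: when a line holds several complete pairs, as in the 4.1/4.2 line of the example below, only the last pair is returned.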

string pattern = @"(?<=\[STX])?.*\[STX]\s*(.+?)\s*\[ETX].*?";

Here's a complete example:

string input = @"[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd
[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]
[STX] djkdsj [STX]dskd1[ETX] ddd
kdsj [STX]dskd1[ETX] dsnds[ETX] 
[STX] djk[STX]dsj [STX]dskd2[ETX] ddd";

string pattern = @"(?<=\[STX])?.*\[STX]\s*(.+?)\s*\[ETX].*?";

foreach(Match m in Regex.Matches(input, pattern))
{
    // result will be in first group
    Console.WriteLine(m.Groups[1].Value);
}

I also added \s* on both sides of the capturing group to eliminate extra whitespace. By doing so you no longer need to use Trim() as I suggested in my earlier response below.
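If every pair on a line is needed, including both 4.1 and 4.2, one possible variant swaps the greedy .* for a per-character negative lookahead, so the capture can never run across a newer [STX]. This is only a sketch under that assumption, not the pattern from this answer; the names `AllPairsDemo` and `Extract` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AllPairsDemo
{
    // (?:(?!\[STX]).)+? consumes one character at a time, but only while
    // no new [STX] starts at the current position.
    private const string Pattern = @"\[STX]\s*((?:(?!\[STX]).)+?)\s*\[ETX]";

    // Returns the innermost text of every [STX]...[ETX] pair, whitespace-trimmed
    // by the \s* on either side of the capturing group.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

For the line `dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd` this yields both `some string 4.1` and `some string 4.2`, and for `[STX] djk[STX]dsj [STX]dskd2[ETX] ddd` it yields only `dskd2`.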

PREVIOUS RESPONSE

This pattern should fit: "\[STX](.+?)\[ETX]"

Notice that the opening bracket, [, must be escaped to prevent it from being interpreted as the start of a character class in regex. The closing bracket, ], need not be escaped. The (.+?) is a capturing group (due to the parentheses) that matches at least one character in a non-greedy fashion (via the ?). Being non-greedy stops the regex engine from matching across multiple occurrences all the way to the last "[ETX]". Remove the ? and you'll see what I mean in your str4 example. Since your last example has multiple occurrences, you can use the Matches method.
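To see the greedy versus non-greedy difference in isolation, here is a small sketch. The helper name `FirstCapture` and the class `GreedyVsLazyDemo` are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Returns group 1 of the first match, or an empty string when nothing matches.
    public static string FirstCapture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

With the input `dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd`, the non-greedy pattern `\[STX](.+?)\[ETX]` captures `some string 4.1`, while the greedy `\[STX](.+)\[ETX]` captures `some string 4.1[ETX]ds ds [STX] some string 4.2`, running on to the last [ETX].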

string[] inputs =
{
    "[STX]some string 1[ETX]",
    "sajksajsk [STX]some string 2 [ETX] saksla",
    "[ETX] dksldkls [STX]some string 3 [ETX]ds ds",
    "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd"
};

string pattern = @"\[STX](.+?)\[ETX]";

foreach (string input in inputs)
{
    Console.WriteLine("Input: " + input);
    foreach(Match m in Regex.Matches(input, pattern))
    {
        // result will be in first group
        Console.WriteLine(m.Groups[1].Value);
    }

    Console.WriteLine();
}

You might consider using a Trim() to trim any excess spaces (m.Groups[1].Value.Trim()). It's possible to achieve in the pattern but complicates it unnecessarily. Use the overload that accepts RegexOptions.IgnoreCase if you need to ignore the case of the "STX" and "ETX" text (if they aren't always in uppercase form).
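Both suggestions combined might look like the following sketch, which uses the `Regex.Matches(string, string, RegexOptions)` overload; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrimIgnoreCaseDemo
{
    // Extracts the text between the tags, ignoring the case of "STX"/"ETX"
    // and trimming surrounding whitespace from each result.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\[STX](.+?)\[ETX]", RegexOptions.IgnoreCase))
        {
            results.Add(m.Groups[1].Value.Trim());
        }
        return results;
    }
}
```

For example, `TrimIgnoreCaseDemo.Extract("ds [stx] some string 4.2 [Etx] x")` returns a single entry, `"some string 4.2"`.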

Ahmad Mageed 2010-09-27 20:03:41

Why the downvote?

Ahmad Mageed 2010-09-28 18:33:13

Answer 3

A:

Language = C#

string str = @"
[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldk[STX]ls [STX]some st[ETX]ring 4.1[ETX]ds ds [STX]some string 4.2[ETX] jdskjd";

How can I get the same output if the string array is one single string?

/* output */
some string 1 
some string 2
some string 3
some string 4.1 
some string 4.2


/*case 1*/ 
the above string can be "[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]" 
the output should be just "dskd1"

/*case 2*/ 
the above string can be "[STX] djkdsj [STX]dskd1[ETX] ddd" 
the output should be just "dskd1"

/*case 3*/ 
the above string can be " kdsj [STX]dskd1[ETX] dsnds[ETX]" 
the output should be just "dskd1"

/*case 4*/ 
the above string can be "[STX] djk[STX]dsj [STX]dskd2[ETX] ddd" 
the output should be just "dskd2"

The real problem comes when [STX] is followed by another [STX]: I want to consider the newer [STX] and start string processing from that occurrence, e.g. case 2 above.

Sanket S. 2010-09-28 07:22:50

@Sanket please edit your original question at the top and include this information. Then delete this post since it is not an answer but is part of the question. Thanks. It looks like you might have signed in from 2 different places since your logo and reputation don't match. Try to sign in to your original OpenID provider that you used to post the question.

Ahmad Mageed 2010-09-28 11:48:35

Answer 4

+1 A:

(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[ETX\])

matches any text (except newlines) between [STX] and [ETX]:

(?<=\[STX\])  # Are we right after [STX]? If so,...
(?:           # match 0 or more of the following:
 (?!\[STX\])  # (as long as it's not possible to match [STX] here)
 .            # exactly one character
 )*?          # repeat as needed until...
(?=\[ETX\])   # there is a [ETX] ahead.

This will always match somestring in each of the following:

blah blah [STX]somestring[ETX] blah blah
[STX]somestring[ETX] blah [STX]somestring[ETX] (hey, two matches here!)
[STX] not this! [STX]somestring[ETX] not this either! [ETX]
blah [ETX] [STX]somestring[ETX] [STX] bla bla
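In C#, applying this regex could look like the sketch below. Because the tags sit inside lookarounds, the match value itself is the inner text and no capturing group is needed. The names `LookaroundDemo` and `Extract` are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // The regex from this answer, written as a C# verbatim string.
    private const string Pattern = @"(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[ETX\])";

    // Returns every innermost text found between [STX] and [ETX].
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // The lookbehind and lookahead consume nothing, so m.Value excludes the tags.
            results.Add(m.Value);
        }
        return results;
    }
}
```

For instance, `LookaroundDemo.Extract("[STX] not this! [STX]somestring[ETX] not this either! [ETX]")` returns one entry, `"somestring"`.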

A full reference on positive/negative lookbehind and lookahead assertions (three of which are used in this regex) can be found in Jan Goyvaerts' excellent regular expression tutorial at http://www.regular-expressions.info/lookaround.html.

Tim Pietzcker 2010-09-28 11:55:50

I am just starting with regex and find it very difficult to understand. Can you explain more simply how this works, or point me to a web link that makes regex easier to understand?

Sanket S. 2010-09-29 06:17:38

What does <= in (?<=\[STX\]) mean in regex terms?

Sanket S. 2010-09-29 06:26:25

`(?<=XXX)` is a positive [lookbehind assertion](http://www.regular-expressions.info/lookaround.html "Lookaround"). It means "look backwards from the current position and see if there is a `XXX` there".

Tim Pietzcker 2010-09-29 06:42:24
