Language = C#.NET

Anything between [STX] and [ETX] must be accepted; everything else must be rejected.

string startparam = "[STX]";
string endparam = "[ETX]";

String str1 = "[STX]some string 1[ETX]"; //Option 1
String str2 = "sajksajsk [STX]some string 2 [ETX] saksla"; //Option 2
String str3 = "[ETX] dksldkls [STX]some string 3 [ETX]ds ds"; //Option 3
String str4 = "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd"; //Option 4

/* the various strings can be appended and converted to a single 
   string using string builder or treat them as different strings*/

void ProcessString(string str, string startparam, string endparam)
{
   // What to write here, using Regex or string functions in C#?

}

/* The output after passing these to a ProcessString () */     
/* Append Output To a TextBox or Append it to a String using For Loop.*/

/* Output Required */

some string 1 
some string 2
some string 3
some string 4.1 
some string 4.2

=============================================================================

EDIT 2

Language = C#

string str = @"[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldk[STX]ls [STX]some st[ETX]ring 4.1[ETX]ds ds [STX]some string 4.2[ETX] jdskjd";

How can I get the same output if the string array is one single string?

/* output */
some string 1 
some string 2
some string 3
some string 4.1 
some string 4.2


/*case 1*/ 
the above string can be "[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]" 
the output should be just "dskd1"

/*case 2*/ 
the above string can be "[STX] djkdsj [STX]dskd1[ETX] ddd" 
the output should be just "dskd1"

/*case 3*/ 
the above string can be " kdsj [STX]dskd1[ETX] dsnds[ETX]" 
the output should be just "dskd1"

/*case 4*/ 
the above string can be "[STX] djk[STX]dsj [STX]dskd2[ETX] ddd" 
the output should be just "dskd2"

The real problem comes when [STX] is followed by another [STX]: I want to consider the newer [STX] and start processing the string from that newer [STX] occurrence, e.g. Case 2 above.

=============================================================================

EDIT 3 : New Request

Language = C#

If I want the data between [STX] and [STX] as well, can that also be done?

I need a new regex that will extract the data between 1. [STX] some Data [STX] and 2. [STX] some Data [ETX].

E.g.

/* the above string can be */
"[STX] djk[STX]dsj [STX]dskd2[ETX] ddd" 
/* the output should be just */
djk
dsj
dskd2

As [STX] means a transmission has started, I want to extract the data between consecutive [STX] markers as well.

A: 

Try this:

Regex regex = new Regex(@"\[STX\](.*?)\[ETX\]", RegexOptions.IgnoreCase);

And then just pick out the group to get the string between the tags
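
As a rough sketch of "picking out the group", the snippet below runs the regex above over one of the question's sample strings and reads capture group 1 from each match. The class name GroupCaptureDemo, the helper Extract and the call to Trim() are illustrative assumptions, not part of the original suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GroupCaptureDemo
{
    // Hypothetical helper: collects the text captured by group 1 of every match.
    public static List<string> Extract(string input)
    {
        Regex regex = new Regex(@"\[STX\](.*?)\[ETX\]", RegexOptions.IgnoreCase);
        var results = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            // Groups[0] is the whole match including the tags;
            // Groups[1] is only the text between them.
            results.Add(m.Groups[1].Value.Trim());
        }
        return results;
    }

    static void Main()
    {
        string str4 = "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd";
        foreach (string s in Extract(str4))
        {
            Console.WriteLine(s);
        }
        // Prints:
        // some string 4.1
        // some string 4.2
    }
}
```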

Oskar Kjellin
This will fail in strings like `[STX] not this [STX] THIS! [ETX]`.
Tim Pietzcker
@Tim it depends on what you want... If you want the things between the outer or the inner.
Oskar Kjellin
He said he wants the inner ones. However, he wrote this in an answer instead of editing his question...
Tim Pietzcker
@Tim Didn't see that (it was actually posted after my answer)
Oskar Kjellin
A: 

EDIT: to fit your updated requirements you should use this pattern that takes advantage of look-arounds to skip all STX groups except the last one that has an ETX after it:

string pattern = @"(?<=\[STX])?.*\[STX]\s*(.+?)\s*\[ETX].*?";

Here's a complete example:

string input = @"[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd
[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]
[STX] djkdsj [STX]dskd1[ETX] ddd
kdsj [STX]dskd1[ETX] dsnds[ETX] 
[STX] djk[STX]dsj [STX]dskd2[ETX] ddd";

string pattern = @"(?<=\[STX])?.*\[STX]\s*(.+?)\s*\[ETX].*?";

foreach(Match m in Regex.Matches(input, pattern))
{
    // result will be in first group
    Console.WriteLine(m.Groups[1].Value);
}

I also added \s* on either side of the capturing group to eliminate extra whitespace. By doing so you no longer need to use Trim() as I suggested in my earlier response below.


PREVIOUS RESPONSE

This pattern should fit: "\[STX](.+?)\[ETX]"

Notice that the opening bracket, [, must be escaped to prevent it from being interpreted as a character class in regex. The closing bracket, ] need not be escaped. The (.+?) is a capturing group (due to the parentheses) and matches at least one character in a non-greedy fashion (via the ?). By being non-greedy it prevents the regex engine from greedily matching multiple occurrences and content till the last "[ETX]" occurrence. Remove the ? and you'll see what I mean in your str4 example. Since your last example has multiple occurrences you can use the Matches method.

string[] inputs =
{
    "[STX]some string 1[ETX]",
    "sajksajsk [STX]some string 2 [ETX] saksla",
    "[ETX] dksldkls [STX]some string 3 [ETX]ds ds",
    "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd"
};

string pattern = @"\[STX](.+?)\[ETX]";

foreach (string input in inputs)
{
    Console.WriteLine("Input: " + input);
    foreach(Match m in Regex.Matches(input, pattern))
    {
        // result will be in first group
        Console.WriteLine(m.Groups[1].Value);
    }

    Console.WriteLine();
}

You might consider using a Trim() to trim any excess spaces (m.Groups[1].Value.Trim()). It's possible to achieve in the pattern but complicates it unnecessarily. Use the overload that accepts RegexOptions.IgnoreCase if you need to ignore the case of the "STX" and "ETX" text (if they aren't always in uppercase form).
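
A minimal sketch of the two suggestions above, assuming the same pattern: Trim() is applied to the captured group, and the Regex.Matches overload that takes RegexOptions is used with RegexOptions.IgnoreCase. The class name TrimIgnoreCaseDemo, the helper Extract and the mixed-case sample input are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TrimIgnoreCaseDemo
{
    // Hypothetical helper: matches the tags case-insensitively and trims each capture.
    public static List<string> Extract(string input)
    {
        string pattern = @"\[STX](.+?)\[ETX]";
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
        {
            results.Add(m.Groups[1].Value.Trim());
        }
        return results;
    }

    static void Main()
    {
        // Illustrative input: lower-case and mixed-case tags, padded content.
        string input = "abc [stx]  some string 2  [Etx] xyz";
        foreach (string s in Extract(input))
        {
            Console.WriteLine("'" + s + "'");
        }
        // Prints: 'some string 2'
    }
}
```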

Ahmad Mageed
Why the downvote?
Ahmad Mageed
A: 

Language = C#

string str = @"[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldk[STX]ls [STX]some st[ETX]ring 4.1[ETX]ds ds [STX]some string 4.2[ETX] jdskjd";

How can I get the same output if the string array is one single string?

/* output */
some string 1 
some string 2
some string 3
some string 4.1 
some string 4.2


/*case 1*/ 
the above string can be "[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]" 
the output should be just "dskd1"

/*case 2*/ 
the above string can be "[STX] djkdsj [STX]dskd1[ETX] ddd" 
the output should be just "dskd1"

/*case 3*/ 
the above string can be " kdsj [STX]dskd1[ETX] dsnds[ETX]" 
the output should be just "dskd1"

/*case 4*/ 
the above string can be "[STX] djk[STX]dsj [STX]dskd2[ETX] ddd" 
the output should be just "dskd2"

The real problem comes when [STX] is followed by another [STX]: I want to consider the newer [STX] and start processing the string from that newer [STX] occurrence, e.g. Case 2 above.

Sanket S.
@Sanket please edit your original question at the top and include this information. Then delete this post since it is not an answer but is part of the question. Thanks. It looks like you might have signed in from 2 different places since your logo and reputation don't match. Try to sign in to your original OpenID provider that you used to post the question.
Ahmad Mageed
A: 
(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[ETX\])

matches any text (except newlines) between an [STX] and the next [ETX], provided no other [STX] lies in between:

(?<=\[STX\])  # Are we right after [STX]? If so,...
(?:           # match 0 or more of the following:
 (?!\[STX\])  # (as long as it's not possible to match [STX] here)
 .            # exactly one character
 )*?          # repeat as needed until...
(?=\[ETX\])   # there is a [ETX] ahead.

This will always match somestring in each of the following:

blah blah [STX]somestring[ETX] blah blah
[STX]somestring[ETX] blah [STX]somestring[ETX] (hey, two matches here!)
[STX] not this! [STX]somestring[ETX] not this either! [ETX]
blah [ETX] [STX]somestring[ETX] [STX] bla bla
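
In C#, the four sample lines above could be checked with something like the following sketch. Because the tags are consumed only by lookarounds, the whole match (m.Value) is already the text between them, so no capture group is needed. The class name LookaroundDemo and the helper Extract are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    const string Pattern = @"(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[ETX\])";

    // Hypothetical helper: returns every match of the lookaround pattern.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            results.Add(m.Value); // the tags themselves are not part of the match
        }
        return results;
    }

    static void Main()
    {
        string[] inputs =
        {
            "blah blah [STX]somestring[ETX] blah blah",
            "[STX]somestring[ETX] blah [STX]somestring[ETX]",
            "[STX] not this! [STX]somestring[ETX] not this either! [ETX]",
            "blah [ETX] [STX]somestring[ETX] [STX] bla bla"
        };

        foreach (string input in inputs)
        {
            // Prints "somestring" once per match: two for the second line, one for each other line.
            foreach (string s in Extract(input))
            {
                Console.WriteLine(s);
            }
        }
    }
}
```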

A full reference on positive/negative lookbehind and lookahead assertions (three of which are used in this regex) can be found in Jan Goyvaerts' excellent regular expression tutorial at http://www.regular-expressions.info/lookaround.html.
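
For the EDIT 3 request in the question (data between [STX] and [STX] as well as between [STX] and [ETX]), one possible variation, which is an assumption and not part of the regex above as posted, is to let the closing lookahead accept either tag. The sketch below also trims each match and skips empty ones; the class name StxOrEtxDemo and the helper Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class StxOrEtxDemo
{
    // Same structure as the pattern above, but the segment may end at [STX] or [ETX].
    const string Pattern = @"(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[STX\]|\[ETX\])";

    // Hypothetical helper: returns each non-empty, trimmed segment.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            string value = m.Value.Trim();
            if (value.Length > 0)
            {
                results.Add(value);
            }
        }
        return results;
    }

    static void Main()
    {
        foreach (string s in Extract("[STX] djk[STX]dsj [STX]dskd2[ETX] ddd"))
        {
            Console.WriteLine(s);
        }
        // Prints:
        // djk
        // dsj
        // dskd2
    }
}
```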

Tim Pietzcker
I am just starting with regex and find it very difficult to understand. Can you explain more simply how this works, or point me to a web link that would help in understanding regex easily?
Sanket S.
What does <= in (?<=\[STX\]) mean in regex terms?
Sanket S.
`(?<=XXX)` is a positive [lookbehind assertion](http://www.regular-expressions.info/lookaround.html "Lookaround"). It means "look backwards from the current position and see if there is a `XXX` there".
Tim Pietzcker