Sorry, this is probably really easy. If every line contains a delimiter character and you want all of the text before the delimiter on each line, what regular expression would do that? I don't know whether the delimiter matters, but mine is the % character.

+1  A: 

Your text will be in group 1.

/^(.*?)%/

Note: This will capture everything up to the first percent sign on the line. If you want to limit what you capture, replace the . with a character class or escape sequence of your choice.
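For illustration, here is a minimal Python sketch of the same pattern applied to multi-line input. The sample text is made up, and the `re.MULTILINE` flag is an assumption about how you would feed it several lines at once; other regex flavors spell that option differently.

```python
import re

# Hypothetical sample input: one '%' delimiter per line.
text = "alpha%one\nbeta%two\ngamma%three"

# re.MULTILINE makes ^ match at the start of every line, not just the string.
pattern = re.compile(r"^(.*?)%", re.MULTILINE)

# With a single capture group, findall returns the group 1 text of each match.
before = pattern.findall(text)
print(before)
```

Lines that contain no % produce no match at all with this pattern, because . does not cross a newline by default.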

Jesse
As I mentioned, regex is unnecessary for this problem, but at least that's the right one :-).
Tom
I agree, just answering the question :-) Most languages will have something much simpler for such straightforward tasks.
Jesse
...as your answer clearly shows.
Jesse
The following is a little more explicit, but still essentially the same: /^([^%]*)%/
dburke
A: 

You don't have to use regex if you don't want to. Depending on the language you are using, there will be some sort of string function such as split().

$str = "sometext%some_other_text";
$s = explode("%",$str,2);
print $s[0];

This is in PHP: it splits on % and then takes the first element of the returned array. Other languages have similar splitting methods.
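As a rough Python counterpart to the PHP snippet, the sketch below uses `str.split` with a split limit. The function name `before_delimiter_split` is hypothetical, chosen only for this example.

```python
def before_delimiter_split(s, delim="%"):
    # maxsplit=1 stops after the first delimiter, like the limit of 2 in explode().
    return s.split(delim, 1)[0]


result = before_delimiter_split("sometext%some_other_text")
print(result)
```

If the delimiter is missing, the split returns a one-element list, so the whole string comes back unchanged.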

ghostdog74
Although this probably would not be a huge issue, I will mention it again so it is on this post. "Exploding" is a bit unnecessary because it has to look at the whole string. It also builds a list (and extra string objects). Substringing is conceptually simpler and faster because finding the first occurrence of a delimiter does not require looking at the whole string.
Tom
What if the delimiter you are looking for is near the end? That is looking at the whole string too. "Substringing" involves two steps: first finding the index, then doing the substring. That means two function calls. Is that faster than reading an item already in memory (i.e. an array)? Do you have a way to benchmark these two methods to convince me that what you say is true? :)
ghostdog74
A: 

In Python, you can use:

def GetStuffBeforeDelimiter(s, delim):
  # Caveat: find() returns -1 when delim is absent, so this would drop the last character.
  return s[:s.find(delim)]

In Java:

public String getStuffBeforeDelimiter(String str, String delim) {
  // Caveat: indexOf returns -1 when delim is absent, which makes substring throw.
  return str.substring(0, str.indexOf(delim));
}

In C++ (untested):

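#include <string>
using namespace std;
string GetStuffBeforeDelimiter(const string& str, const string& delim) {
  // If delim is absent, find returns string::npos and substr returns the whole string.
  return str.substr(0, str.find(delim));
}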

In all the above examples you will want to handle corner cases, such as your string not containing the delimiter.
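One way to handle the missing-delimiter case in Python is sketched below. Returning the whole string when the delimiter is absent is an assumed policy, not the only reasonable one (raising an error or returning an empty string may fit your data better), and the name `stuff_before_delimiter` is hypothetical.

```python
def stuff_before_delimiter(s, delim):
    """Return the text before the first delim, or all of s if delim is absent."""
    index = s.find(delim)  # -1 when delim does not occur
    if index == -1:
        return s  # assumed policy: no delimiter means keep the whole string
    return s[:index]


print(stuff_before_delimiter("sometext%some_other_text", "%"))
print(stuff_before_delimiter("no delimiter here", "%"))
```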

Basically, I would use substringing for something this simple because you can avoid scanning the entire string. Regex is overkill, and "exploding" or splitting on the delimiter is also unnecessary because it looks at the whole string.
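If you want to check the performance claim rather than take it on faith, a small `timeit` harness along these lines could be a starting point. This is only a sketch: the test string, the helper names, and the repetition count are all made up, and the numbers will vary with string length, delimiter position, and interpreter.

```python
import timeit

# Hypothetical test string: delimiter near the front, long tail after it.
s = "key%" + "x" * 100_000


def by_find(s, delim="%"):
    i = s.find(delim)  # stops scanning at the first delimiter
    return s if i == -1 else s[:i]


def by_split(s, delim="%"):
    return s.split(delim)[0]  # an unlimited split scans the whole string


def by_limited_split(s, delim="%"):
    return s.split(delim, 1)[0]  # stops searching after the first delimiter


for fn in (by_find, by_split, by_limited_split):
    seconds = timeit.timeit(lambda: fn(s), number=1000)
    print(f"{fn.__name__}: {seconds:.4f}s")
```

Moving the delimiter to the end of the string is a useful second run, since that is the case where every approach has to read everything.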

Tom
Of course, I leave error handling as an exercise for the reader, but that's the idea :-).
Tom
Substringing looks at the whole string too, right? You have to use find() (in Python) or indexOf() (in Java) to find the index.
ghostdog74
@ghostdog74: substring doesn't need to look at the whole string. It can just look at the portion you are substringing. Also, indexOf or find only needs to scan until the first occurrence of the delimiter and then stop... it doesn't scan to the end unless the delimiter is at the end or not in the string.
Tom
@ghostdog74: exploding the string has to look at the whole thing because after finding the first occurrence of the delimiter, it has to keep looking because there may be more. It also has to unnecessarily construct a list of some sort to store the exploded string.
Tom
can downvoters explain what's wrong?
Tom
A: 

You don't say what flavor of regex, so I'll use Perl notation.

/^[^%]*/m

The first ^ is a start anchor: normally it matches only the beginning of the whole string, but this regex is in multiline mode thanks to the 'm' modifier at the end, so it also matches at the start of every line. [^%] is a negated character class: it matches any one character except a '%'. The * is a quantifier that means to match the previous thing ([^%] in this case) zero or more times.
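The same pattern can be tried in Python, where `re.MULTILINE` plays the role of the 'm' modifier. The sample strings below are invented. The second call is an assumption about what you may want for lines that lack a '%': because [^%] also matches a newline, excluding '\n' keeps each match on its own line.

```python
import re

# Hypothetical sample input with a '%' on every line.
text = "alpha%one\nbeta%two\ngamma%three"

# ^ anchors at each line start under re.MULTILINE; [^%]* takes everything up to '%'.
per_line = re.findall(r"^[^%]*", text, re.MULTILINE)
print(per_line)

# [^%] also matches a newline, so a line without '%' would run into the next one.
# Excluding '\n' as well keeps each match on its own line.
mixed = "alpha%one\nno delimiter here\ngamma%three"
bounded = re.findall(r"^[^%\n]*", mixed, re.MULTILINE)
print(bounded)
```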

Alan Moore