ansaurus

Question

Answer 1

+2 A:

Sure, you can identify the start and the end of your desired substring with string methods such as IndexOf, then get the desired Substring! In your example, you want to locate (with IndexOf) the "content=" and then the first following ", right? Once you have those indices into the string, Substring will work fine. (Not posting C# code because I'm not entirely sure of what exactly it IS that you want, beyond IndexOf and Substring...!-)

If so, then:

const string marker = "content=\"";
int first = str.IndexOf(marker);
if (first == -1) return null;                 // marker not found
int start = first + marker.Length;            // skip past the marker itself
int last = str.IndexOf("\"", start);          // the closing quote
return last == -1 ? null : str.Substring(start, last - start);

should do what you want. Deriving the offset from marker.Length rather than hardcoding it avoids off-by-one errors (content=" is 9 characters, not 10), and the -1 checks stop Substring from slicing at a bogus index when the marker isn't in the string at all. That's the general concept: locate the start with single-argument IndexOf, locate the end with two-argument IndexOf, and slice off the desired piece with Substring...!

Alex Martelli 2009-10-14 04:57:23

That's right, what I'm after is the text between the two quotes inside the content attribute, like this: content="i need this text"

baeltazor 2009-10-14 05:05:30

Thanks for the code, Alex, but it's nowhere near close: it always extracts roughly the first 15 characters from the beginning of the file. Weird?

baeltazor 2009-10-14 06:51:19

What do you see when you add output statements to show the value of first and last?

Alex Martelli 2009-10-14 14:59:09
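A plausible cause of the symptom described in the comments: IndexOf returns -1 when the marker isn't found (for instance, searching for contents=" when the markup says content="). With a hardcoded offset such as 10, that -1 becomes a start index of 9, so Substring slices from near the very beginning of the file up to the next quote. Below is a minimal sketch that checks for -1 and derives the offset from the marker; the class and method names (QuotedValueExtractor, ExtractQuotedValue) are hypothetical.

```csharp
using System;

static class QuotedValueExtractor
{
    // Hypothetical helper: returns the text between name=" and the next ",
    // or null when either the marker or the closing quote is missing.
    public static string ExtractQuotedValue(string text, string name)
    {
        string marker = name + "=\"";
        int first = text.IndexOf(marker, StringComparison.Ordinal);
        if (first == -1) return null;            // marker absent: never slice at -1 + offset

        int start = first + marker.Length;       // first character after the opening quote
        int last = text.IndexOf('"', start);     // closing quote, searched from start onward
        if (last == -1) return null;             // unterminated value

        return text.Substring(start, last - start);
    }
}
```

For example, on `<meta name="description" content="THIS IS THE TEXT I WANT TO EXTRACT" />`, asking for "content" yields the quoted text, while asking for "contents" yields null instead of a slice from the start of the input.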

Answer 2

A:

Sure, you can do it without Regex. Say you want to get the text between < and >:

string GetTextBetween(string content)
{
  int start = content.IndexOf("<");
  if (start == -1) return null;             // opening delimiter not found
  start += 1;                               // skip past "<" itself
  int end = content.IndexOf(">", start);    // look for ">" only after "<"
  if (end == -1) return null;               // closing delimiter not found
  return content.Substring(start, end - start);
}

RichAmberale 2009-10-14 05:00:15
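The same pattern generalizes to any pair of delimiters. A sketch assuming a hypothetical three-argument GetTextBetween(content, open, close): it skips past the opening delimiter and searches for the closing one only after it, so a closing delimiter that appears earlier in the text is ignored rather than producing a negative length.

```csharp
using System;

static class Delimited
{
    // Hypothetical generalization: text between the first `open` and the
    // next `close` that follows it, or null if either delimiter is missing.
    public static string GetTextBetween(string content, string open, string close)
    {
        int start = content.IndexOf(open, StringComparison.Ordinal);
        if (start == -1) return null;                  // opening delimiter not found
        start += open.Length;                          // skip the delimiter itself

        int end = content.IndexOf(close, start, StringComparison.Ordinal);
        if (end == -1) return null;                    // closing delimiter not found

        return content.Substring(start, end - start);
    }
}
```

With open = "content=\"" and close = "\"", this covers the attribute case from the question as well.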

Answer 3

+1 A:

If the input is: text1/text2/text3

The regex below captures the third segment, text3, in group 2:

^([^/]*/){2}([^/]*)$

If you always need the last segment, use this instead:

^.*/([^/]*)$

solairaja 2009-10-14 05:01:02

I think the OP is looking for a non-regex solution.

Goose Bumper 2009-11-10 14:37:05
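For the slash-separated input in this answer, here is a sketch of both approaches in C#: the regex (written without a trailing /, so it matches text1/text2/text3 as given) and non-regex equivalents using LastIndexOf and Split, which are closer to what the OP asked for. The class and method names are illustrative only.

```csharp
using System.Text.RegularExpressions;

static class PathSegments
{
    // Regex version: group 2 captures the segment after the second '/'.
    // Matches only when there are exactly three segments.
    public static string ThirdSegmentRegex(string input)
    {
        Match m = Regex.Match(input, @"^([^/]*/){2}([^/]*)$");
        return m.Success ? m.Groups[2].Value : null;
    }

    // Non-regex equivalent of the regex above: exactly three segments.
    public static string ThirdSegment(string input)
    {
        string[] parts = input.Split('/');
        return parts.Length == 3 ? parts[2] : null;
    }

    // Non-regex "always the last segment". Unlike ^.*/([^/]*)$, an input
    // with no '/' at all returns the whole string (LastIndexOf gives -1).
    public static string LastSegment(string input)
    {
        int slash = input.LastIndexOf('/');
        return input.Substring(slash + 1);
    }
}
```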

Answer 4

+4 A:

Since you give an XML example, just use an XML parser:

using System.Xml.Linq;
string s = (string) XElement.Parse(xml).Attribute("content"); // null if the attribute is missing

XML is not a simple text format, and Regex isn't a good fit for it; using an appropriate tool will protect you from a range of evils. For example, the following is identical as far as XML is concerned:

<meta
    name="description"
    content=
        'THIS IS THE TEXT I WANT TO EXTRACT'
/>
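To make that concrete, a small sketch that reads the content attribute with XElement (LINQ to XML, System.Xml.Linq). It assumes the OP's input is a single-line element such as `<meta name="description" content="THIS IS THE TEXT I WANT TO EXTRACT" />`; the multi-line, single-quoted form above yields exactly the same value, which a quote-and-IndexOf search would miss.

```csharp
using System.Xml.Linq;

static class MetaContent
{
    // Reads the content attribute of the root element; the explicit
    // (string) conversion returns null if the attribute is absent.
    public static string Read(string xml)
    {
        return (string) XElement.Parse(xml).Attribute("content");
    }
}
```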

It also means that when the requirement changes, you have a simple tweak to make to the code, rather than trying to unpick a regex and put it back together again (which can be tricky if you are accessing a non-trivial node). Equally, XPath might be an option; for your data, the XPath expression:

/meta/@content

is all you need.
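A sketch of the XPath route using XmlDocument.SelectSingleNode from System.Xml, which is also available before .NET 3.5. It assumes, as in the example, that `<meta>` is the root element; if it were nested inside a larger document, an expression such as `//meta[@name='description']/@content` would be the analogous choice.

```csharp
using System.Xml;

static class MetaXPath
{
    // Evaluates /meta/@content and returns the attribute's value,
    // or null when the XPath selects nothing.
    public static string Read(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("/meta/@content");
        return node == null ? null : node.Value;
    }
}
```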

If you haven't got .NET 3.5 (where XElement was introduced):

using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
string s = doc.DocumentElement.GetAttribute("content"); // "" (not null) if the attribute is missing

Marc Gravell 2009-10-14 05:05:39

This is really nice. Thanks for that one! =)

Carl Bergquist 2009-10-14 05:32:58
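One behavioural difference between the two snippets in the answer is worth knowing: when the attribute is missing, the (string) cast on XElement.Attribute gives null, whereas XmlElement.GetAttribute returns an empty string. A sketch that makes both report null, assuming null is the preferred "not found" value; it uses XmlElement.HasAttribute to tell a missing attribute apart from an empty one. The class and method names are illustrative.

```csharp
using System.Xml;
using System.Xml.Linq;

static class ContentAttribute
{
    // LINQ to XML (.NET 3.5+): null when the attribute is absent.
    public static string ViaXElement(string xml)
    {
        return (string) XElement.Parse(xml).Attribute("content");
    }

    // XmlDocument (.NET 2.0): GetAttribute alone would return "" when absent,
    // so check HasAttribute first and report null instead.
    public static string ViaXmlDocument(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;
        return root.HasAttribute("content") ? root.GetAttribute("content") : null;
    }
}
```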


Using String methods instead of Regex
