Hello,
As I am not very familiar with regex: is it possible (whether it's hard to do or not) to extract certain text in between symbols? For example:
<meta name="description" content="THIS IS THE TEXT I WANT TO EXTRACT" />
thank you :)
Sure, you can identify the start and the end of your desired substring with string methods such as IndexOf, then get the desired Substring! In your example, you want to locate (with IndexOf) the content=" marker and then the first following ", right? And once you have those indices into the string, Substring will work fine. (Not posting full C# code because I'm not entirely sure of what exactly it IS that you want, beyond IndexOf and Substring...!-)
If so, then:
string marker = "content=\"";
int first = str.IndexOf(marker) + marker.Length; // index just past the opening quote
int last = str.IndexOf("\"", first); // index of the closing quote
return str.Substring(first, last - first);
should more or less do what you want (using marker.Length instead of a hardcoded offset avoids off-by-one mistakes; note that IndexOf returns -1 when nothing is found, so check for that on real input), but this is the general concept. Locate the start with single-argument IndexOf, locate the end with two-argument IndexOf, slice off the desired piece with Substring...!
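A minimal sketch of the same IndexOf/Substring idea with the "not found" cases handled. The class name TextExtractor and the method name ExtractBetween are made up for illustration, and returning null on a miss is just one possible convention:

```csharp
using System;

public static class TextExtractor
{
    // Returns the text between startMarker and the next endMarker,
    // or null when either marker is missing. (Hypothetical helper.)
    public static string ExtractBetween(string source, string startMarker, string endMarker)
    {
        int markerIndex = source.IndexOf(startMarker, StringComparison.Ordinal);
        if (markerIndex == -1) return null; // start marker not found

        // Skip past the marker itself so it is not part of the result.
        int start = markerIndex + startMarker.Length;

        // Two-argument style search: only look for the end after the start.
        int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end == -1) return null; // end marker not found

        return source.Substring(start, end - start);
    }
}
```

For the meta tag in the question, this would be called as TextExtractor.ExtractBetween(html, "content=\"", "\""). It still assumes double quotes and no whitespace around the equals sign, so it is only as robust as the input is regular.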
Sure, you can do it without Regex. Say you want to get the text between < and >...
string GetTextBetween(string content)
{
    int start = content.IndexOf("<");
    if (start == -1) return null; // opening "<" not found
    int end = content.IndexOf(">", start + 1); // search only after the "<"
    if (end == -1) return null; // closing ">" not found
    return content.Substring(start + 1, end - start - 1); // exclude both delimiters
}
If the input is text1/text2/text3/ (note the trailing slash, which both patterns below require),
the regex below captures the third segment, i.e. text3, in group 2:
^([^/]*/){2}([^/]*)/$
If you always need the last segment, use the pattern below instead (the text is then in group 1):
^.*/([^/]*)/$
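As a rough illustration of how these two patterns could be applied from C#, here is a sketch using Regex.Match. The class and method names (SegmentExtractor, ThirdSegment, LastSegment) are hypothetical, and the input is assumed to end with a slash, as both patterns require:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SegmentExtractor
{
    // Third slash-terminated segment (group 2), or null if the input does not match.
    public static string ThirdSegment(string input)
    {
        Match m = Regex.Match(input, @"^([^/]*/){2}([^/]*)/$");
        return m.Success ? m.Groups[2].Value : null;
    }

    // Last slash-terminated segment (group 1), or null if the input does not match.
    public static string LastSegment(string input)
    {
        Match m = Regex.Match(input, @"^.*/([^/]*)/$");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Checking m.Success before reading the group avoids silently returning an empty string when the input has a different shape, for example when the trailing slash is missing.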
Since you give an xml example, just use an xml parser:
string s = (string) XElement.Parse(xml).Attribute("content");
XML is not a simple text format, and Regex isn't really a very good fit; using an appropriate tool will protect you from a range of evils... for example, the following is equivalent XML:
<meta
name="description"
content=
'THIS IS THE TEXT I WANT TO EXTRACT'
/>
It also means that when the requirement changes, you have a simple tweak to make to the code, rather than trying to unpick a regex and put it back together again (which can be tricky if you are accessing a non-trivial node). Equally, XPath might be an option; so for your data the XPath:
/meta/@content
is all you need.
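One way to evaluate that XPath is XmlDocument.SelectSingleNode; the sketch below assumes the input is a well-formed XML fragment with meta as its root element. The MetaReader class and ReadContent method are names invented for this example:

```csharp
using System;
using System.Xml;

public static class MetaReader
{
    // Returns the value of the content attribute on the root meta element,
    // or null when the XPath selects nothing. (Hypothetical helper.)
    public static string ReadContent(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        XmlNode attribute = doc.SelectSingleNode("/meta/@content");
        return attribute == null ? null : attribute.Value;
    }
}
```

Because the parser normalizes quoting and whitespace, the single-line tag from the question and the multi-line, single-quoted variant should both yield the same string.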
If you haven't got .NET 3.5:
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
string s = doc.DocumentElement.GetAttribute("content");