I want to extract https://www.sth.com/yment/Paymentform.aspx
from the string below:
<form id='paymentUTLfrm' action='https://www.sth.com/yment/Paymentform.aspx' method='post'>
How can I do this with a regex
or something similar?
Use Html Agility Pack. It will save you a lot of trouble in the long run.
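using HtmlAgilityPack;
// Parse the HTML fragment; it does not need to be well-formed.
var doc = new HtmlDocument();
doc.LoadHtml("<form id='paymentUTLfrm' action='https://www.sth.com/yment/Paymentform.aspx' method='post'>");
// Select the form by its id attribute using the XPath id() function.
var form = doc.DocumentNode.SelectSingleNode("id('paymentUTLfrm')");
// Read the value of the form's action attribute.
string action = form.Attributes["action"].Value;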
It supports loading pages directly from the web, as well as XPath (used above). The HTML does not have to be valid.
EDIT: If you want to select the form by its name attribute instead:
doc.DocumentNode.SelectSingleNode("//*[@name='paymentUTLfrm']");
While I agree that general HTML parsing is best done with Html Agility Pack (or similar) rather than with regex, this is a pretty simple requirement, and a regex is appropriate here. I am no regex expert, but this one works:
action=["'](.*?)["']
The (.*?) captures the URL. It is lazy so that it stops at the first closing quote; a greedy (.*) would run on to the last quote on the line and capture the method attribute as well.
Maybe an expert can add a comment to refine this.
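As one possible refinement, here is a minimal C# sketch comparing a greedy and a lazy capture for a pattern of this shape, run against the sample string from the question. The variable names (html, greedy, lazy) are only illustrative, and it uses nothing beyond System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string html = "<form id='paymentUTLfrm' action='https://www.sth.com/yment/Paymentform.aspx' method='post'>";

// Greedy: (.*) extends to the LAST quote on the line, so it also swallows the method attribute.
string greedy = Regex.Match(html, "action=[\"'](.*)[\"']").Groups[1].Value;

// Lazy: (.*?) stops at the FIRST quote after the URL.
string lazy = Regex.Match(html, "action=[\"'](.*?)[\"']").Groups[1].Value;

Console.WriteLine(greedy);
Console.WriteLine(lazy);
```

With this input, the greedy version returns the URL followed by ' method='post, while the lazy version returns only the URL. Neither variant checks that the opening and closing quotes are the same character.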
While I don't encourage using regex to parse HTML, this is simple enough that a regex will suffice. For more complex operations, do use a proper (X)HTML parser like HtmlAgilityPack.
This regex should work:
<\s*form[^>]*\s+action=(["'])(.*?)\1
Updated regex so it will work with apostrophes in URLs. Note that the URL is now in the 2nd capture group.
See it on rubular
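For completeness, a small C# sketch of how this pattern might be applied with System.Text.RegularExpressions. The variable names are illustrative, and the empty-string fallback is an assumption about how you want to handle a missing match. The URL is read from the second capture group, since the first group holds the quote character.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string html = "<form id='paymentUTLfrm' action='https://www.sth.com/yment/Paymentform.aspx' method='post'>";

// Group 1 captures the opening quote; \1 requires the same quote to close the value.
// Group 2 lazily captures everything between the two quotes.
var pattern = new Regex(@"<\s*form[^>]*\s+action=([""'])(.*?)\1");

Match match = pattern.Match(html);

// Fall back to an empty string when no form with an action attribute is found.
string action = match.Success ? match.Groups[2].Value : "";

Console.WriteLine(action);
```

Because the closing quote must match the opening one, a double-quoted URL that contains an apostrophe is still captured whole.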