I have a string (partly HTML) in which I want to replace the string :-) with the BBCode :wink:. The replacement should not happen inside <pre>, but it should happen inside any other tag and in text that is not inside a tag at all.
For example, I want to replace
:-)<pre>:-)</pre><blockquote>:-)</blockquote>
with:
:wink:<pre>:-)</pre><blockquote>:wink:</blockquote>
I alrea...
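A minimal sketch of one way to do this, assuming <pre> blocks are never nested and are always closed. The helper name replace_outside_pre is hypothetical and not part of the question. It splits the string on <pre>...</pre> blocks and applies the replacement only to the pieces in between:

```python
import re

# The capturing group makes re.split keep the <pre>...</pre> blocks in its result.
PRE_BLOCK = re.compile(r'(<pre\b[^>]*>.*?</pre>)', re.IGNORECASE | re.DOTALL)

def replace_outside_pre(html, old=':-)', new=':wink:'):
    """Replace `old` with `new` everywhere except inside <pre> blocks."""
    parts = PRE_BLOCK.split(html)
    # With a single capturing group, odd indexes hold the matched <pre> blocks.
    return ''.join(
        part if index % 2 else part.replace(old, new)
        for index, part in enumerate(parts)
    )

print(replace_outside_pre(':-)<pre>:-)</pre><blockquote>:-)</blockquote>'))
# prints :wink:<pre>:-)</pre><blockquote>:wink:</blockquote>
```

For malformed or nested markup, a real HTML parser is likely a safer choice than a regular expression.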
I have a string containing a large chunk of HTML and am trying to extract the link from the href="..." portion of the string. The href could be in one of the following forms:
<a href="..." />
<a class="..." href="..." />
I don't really have a problem with regex but for some reason when I use the following code:
String innerHTM...
Possible Duplicate:
How can I remove external links from HTML using Perl?
Alright, I'm working on a job for a client right now who just switched his language choice to Perl. I'm not the best at Perl, but I've done stuff like this with it before, albeit a while ago.
There are lots of links like this:
<a href="/en/subtitles/35...
Ever since I asked how to parse HTML with regex and got bashed a bit (rightfully so), I've been studying the HTML::TreeBuilder, HTML::Parser, HTML::TokeParser, and HTML::Element Perl modules.
I have HTML like this:
<div id="listSubtitlesFilm">
<dt id="a1">
<a href="/45/subtitles-67624.aspx">
.45 (2006)
</a>
</dt>
</div>
...
I'm trying to retrieve specific tags with their content out of an XHTML document, but my pattern is matching the wrong ending tags.
In the following content:
<cache_namespace name="content">
<content_block id="15">
some content here
<cache_namespace name="user">
<content_block id="welcome">
Welcome Apiko...
What's the easiest way in Java to retrieve all elements with a certain type in a malformed HTML page? So I want to do something like this:
public static void main(String[] args) {
// Read in an HTML file from disk
// Retrieve all INPUT elements regardless of whether the HTML is well-formed
// Loop through all elements and r...
Hi. I am a newbie when it comes to using the Nokogiri reader to parse an XML file. Here is the XML file I want to parse, along with sample code:
<?xml version='1.0' encoding='UTF-8'?>
<inventory>
<tire name="super slick racing tire" />
<tire name="all weather tire" />
</inventory>
-------------------------------------------------------------...
I have been trying to get BeautifulSoup (3.1.0.1) to parse an HTML page that has a lot of JavaScript that generates HTML inside tags.
One example fragment looks like this :
<html><head><body><div>
<script type='text/javascript'>
if(ii > 0) {
html += '<span id="hoverMenuPosSepId" class="hoverMenuPosSep">|</span>'
}
html +=
'<div class=...
I'm using this code to find all interesting links in a page:
soup.findAll('a', href=re.compile(r'^notizia\.php\?idn=\d+'))
It does its job pretty well. Unfortunately, inside that a tag there are a lot of nested tags, like font, b and others. I'd like to get just the text content, without any other HTML tags.
Example of l...
How can one extract data from a rendered web page
in which JavaScript updates the data over time?
Is it possible to write a user script that can access variables from the web page's JavaScript?
Please suggest a possible way to achieve this.
...
I have to automate a file download from a website (similar to, let's say, yahoomail.com). To reach the page that has the file download link, I have to log in, jump from page to page to provide some parameters such as dates, and finally click on the download link.
I am thinking of three approaches:
Using WatIN and develop a windows s...
I'm trying to parse an HTML file for strings in this format:
<a href="/userinfo/userinfo.aspx?ID=305157" target="main">MyUsername</a> O22</td>
I want to retrieve "305157", "MyUsername" and the first letter of "O22" (which can be either T, K or O).
I'm using this regex: <a href="/userinfo/userinfo\.aspx\?ID=\d*"...
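Since the regex in the question is cut off, the following is only a sketch of a pattern with three capturing groups that matches the sample line above. It is not the asker's original expression, and the names USER_LINK and extract_user are hypothetical:

```python
import re

USER_LINK = re.compile(
    r'<a href="/userinfo/userinfo\.aspx\?ID=(\d+)"[^>]*>'  # group 1: numeric ID
    r'([^<]*)</a>'                                         # group 2: link text
    r'\s*([TKO])'                                          # group 3: first letter after the link
)

def extract_user(line):
    """Return (id, username, letter) for the first match, or None."""
    match = USER_LINK.search(line)
    return match.groups() if match else None

sample = '<a href="/userinfo/userinfo.aspx?ID=305157" target="main">MyUsername</a> O22</td>'
print(extract_user(sample))
# prints ('305157', 'MyUsername', 'O')
```

The pattern assumes href is the first attribute of the a tag, as in the sample. If the attribute order varies, an HTML parser would be more robust.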
I have a question about parsing HTML pages, specifically forums.
I want to parse a forum or thread for posts that match certain criteria. I haven't defined the
algorithm yet, since I have only parsed structured text formats before.
A use case might be to copy and paste each thread into the program by hand, or to insert a URL like
http://www.forums....
Parsing is something I come across a lot in development, but as a junior it's one of those things I assume I will get the hang of at some point, when it's needed. In my current project I've been told to find and use an HTML parser for a certain function. I have found a couple on the web, but what does an HTML parser actually do? And what do...
I have a DotNetNuke skin that has a single CSS file over 3,500 lines long. It contains styles for YUI, Telerik, Cluetip as well as the actual customisation of the site. The old developers just kept adding styles and never cleaned up the old unused ones.
I want to clean up the file and get it down to a more manageable size. I first thought abou...
I'm using the COBRA HTMLParser but haven't had luck parsing one particular tag. Here's the source:
<li id="eta" class="hentry">
<span class="body">
<span class="actions">
</span>
<span class="content">
</span>
<span class="meta entry">Content here
</span>
<span class="meta entry stub">Content here
<span...
Hi,
I was wondering if there is a library in .NET to clean up an HTML document and remove its unclosed tags?
...
I've got a comma-separated list in a table cell in an HTML document, but some of the items in the list are linked:
<table>
<tr>
<td>Names</td>
<td>Fred, John, Barry, <a href="http://www.example.com/">Roger</a>, James</td>
</tr>
</table>
I've been using Beautiful Soup to parse the HTML, and I can get to the table, but ...
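The question is cut off here, so the sketch below only assumes the goal is the plain list of names with the link markup dropped. It uses the standard-library html.parser instead of Beautiful Soup so that it runs without extra packages. The class name CellText and the function cell_texts are hypothetical:

```python
from html.parser import HTMLParser

class CellText(HTMLParser):
    """Collect the text of every <td>, ignoring nested tags such as <a>."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        # Text inside <a> still arrives here, so linked names are kept.
        if self._in_cell:
            self.cells[-1] += data

def cell_texts(html):
    parser = CellText()
    parser.feed(html)
    parser.close()
    return parser.cells

html = '''<table>
<tr>
<td>Names</td>
<td>Fred, John, Barry, <a href="http://www.example.com/">Roger</a>, James</td>
</tr>
</table>'''

names = [name.strip() for name in cell_texts(html)[1].split(',')]
print(names)
# prints ['Fred', 'John', 'Barry', 'Roger', 'James']
```

This simple version does not handle tables nested inside a cell.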
Hi there,
Because I'm not a native English speaker, I use a dictionary a lot.
Now I'm learning C#, and I was wondering whether I'm allowed to build an application that runs on my machine but uses the Google or Babel Fish translation service, or any other online translation or dictionary tool. It takes time to go on the browser each time ...
I use regexps to transform text as I want, but I want to preserve the HTML tags.
e.g. if I want to replace "stack overflow" with "stack underflow", this should work as
expected: if the input is stack <sometag>overflow</sometag>, I must obtain stack <sometag>underflow</sometag> (i.e. the string substitution is done, but the
tags are sti...
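As a rough sketch of the example above, and only under the assumption that the old and new phrases have the same number of words, one option is a pattern that allows whitespace or tags between the words and puts those gaps back unchanged. The names GAP and replace_phrase are hypothetical:

```python
import re

# Whitespace and/or tags that may sit between two words of the phrase.
GAP = r'((?:\s|<[^>]+>)+)'

def replace_phrase(html, old_words, new_words):
    """Replace a phrase word by word, keeping whatever separates the words."""
    if len(old_words) != len(new_words):
        raise ValueError('this sketch needs phrases of equal word count')
    pattern = GAP.join(re.escape(word) for word in old_words)

    def rebuild(match):
        pieces = [new_words[0]]
        # Each captured gap is re-inserted before the next replacement word.
        for gap, word in zip(match.groups(), new_words[1:]):
            pieces.extend([gap, word])
        return ''.join(pieces)

    return re.sub(pattern, rebuild, html)

print(replace_phrase('stack <sometag>overflow</sometag>',
                     ['stack', 'overflow'], ['stack', 'underflow']))
# prints stack <sometag>underflow</sometag>
```

Note that this does not distinguish text from attribute values, so a phrase inside an attribute would also be rewritten.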