I am parsing an HTML page. Let's say this page lists all players in a football team, and those who are seniors are bolded. I can't parse the file line by line and look for the strong tag, because in my real example the pattern is much more complex and spans multiple lines.
Something like this:
<strong>Senior:</strong> John Smith
Junio...
What code stores the whole page's content between the <body> and </body> tags in a string? The page:
can be any HTML/XHTML page
can be in any encoding (ISOx, UTF-8, Asian-something)
can have attributes in the <body> tag (which may trick the parser)
I've heard about DOMDocument but I'm a big rookie, some code sample would help!
...
Hi, PHP newbie here. I need some PHP ideas/examples on how to import data from a delimited text file and map it into HTML tables. The data should be placed under its proper header. There are also cases where a record doesn't have all the values; if there is no data, we can leave it null (see sample records). I w...
I have some broken HTML that I would like to fix with regex.
The html might be something like this:
<p>text1</p>
<p>text2</p>
text3
<p>text4</p>
<p>text5</p>
But there can be many more paragraphs and other HTML elements too.
I want to turn it into:
<p>text1</p>
<p>text2</p>
<p>text3</p>
<p>text4</p>
<p>text5</p>
Is this poss...
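For the sample shown above, a minimal sketch in Python (not regex, and not from the original post) could work line by line. It assumes that every line starting with "<" is already marked up and that each bare line is one paragraph; the helper name wrap_bare_lines is hypothetical. Real-world broken HTML usually needs a tolerant parser rather than a line-based fix like this.

```python
def wrap_bare_lines(html: str) -> str:
    """Wrap each non-empty line that does not start with a tag in <p>...</p>."""
    fixed = []
    for line in html.splitlines():
        stripped = line.strip()
        # Assumption: a line that starts with "<" is already marked up.
        if stripped and not stripped.startswith("<"):
            fixed.append(f"<p>{stripped}</p>")
        else:
            fixed.append(line)
    return "\n".join(fixed)


if __name__ == "__main__":
    sample = "<p>text1</p>\n<p>text2</p>\ntext3\n<p>text4</p>\n<p>text5</p>"
    print(wrap_bare_lines(sample))
```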
Hello!
I want to get all matches from this expression:
import re
def my_handler(matches):
    return str(matches.groups())
text = "<a href='#' title='Title here'>"
print(re.sub(r"<[a-zA-Z]+( [a-zA-Z]+=[\#a-zA-Z0-9_.'\" ]+)*>", my_handler, text))
Actual result:
(" title='Title here'",)
Expected result:
("a", " href='#'", " title=...
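In Python's re module, a capturing group that repeats keeps only its last repetition, which is why only the title attribute shows up. A possible workaround, sketched here under the assumption that attribute values are always quoted, is to match the whole tag once and then run a second pattern over the attribute part. The names TAG_RE, ATTR_RE and tag_parts are hypothetical, and the patterns are stricter than the one in the question.

```python
import re

# Group 1: tag name. Group 2: the whole run of attributes as one string.
TAG_RE = re.compile(r"<([a-zA-Z]+)((?:\s+[a-zA-Z]+=(?:'[^']*'|\"[^\"]*\"))*)\s*>")
# One attribute, including its leading whitespace (no capturing groups,
# so findall returns the full matched text).
ATTR_RE = re.compile(r"\s+[a-zA-Z]+=(?:'[^']*'|\"[^\"]*\")")


def tag_parts(text):
    """Return (tag name, attribute 1, attribute 2, ...) for the first tag found."""
    m = TAG_RE.search(text)
    if m is None:
        return ()
    name, attrs = m.group(1), m.group(2)
    return (name, *ATTR_RE.findall(attrs))


if __name__ == "__main__":
    print(tag_parts("<a href='#' title='Title here'>"))
```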
I apologise if the terminology is not quite correct, e.g. 'page anchor', but I shall endeavour to explain what I am attempting.
I have an iframe, with links (to same domain) that I would like to have shown in the parent.
<a href="foo.html" target="_parent">bar link</a> works as expected. However, I am attempting to use a URL of the form;...
I'm downloading HTML from a website. The file can be quite large, so while it is downloading I want to start parsing the chunks of HTML that are already available, so that the process appears faster to the end user of my program. I don't have control over how the chunks are generated, so a chunk can begin in the middle of a word, e.g. like so:
chunk...
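As an illustration of incremental parsing (the question does not say which language is in use, so Python is an assumption here), the standard library's html.parser.HTMLParser accepts data in arbitrary pieces through feed() and buffers an incomplete tag until the rest arrives. The class LinkCollector and the function collect_links are hypothetical names for this sketch.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags as chunks are fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(chunks):
    """Feed each chunk as it arrives; chunks may split tags or words anywhere."""
    parser = LinkCollector()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.links


if __name__ == "__main__":
    pieces = ['<html><body><a hr', 'ef="one.html">fi', 'rst</a></body></html>']
    print(collect_links(pieces))
```

In a real download loop, parser.feed() would be called from wherever each network chunk becomes available, and results could be shown to the user as soon as a handler fires.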
I want to write an application in C# that takes a URL as a parameter/input, gets the source code of the page, and extracts some URLs and some text based on given criteria ...
...
I'm using libxml2 to parse HTML:
static htmlSAXHandler simpleSAXHandlerStruct = {
NULL, /* internalSubset */
NULL, /* isStandalone */
NULL, /* hasInternalSubset */
NULL, /* hasExternalSubset */
NULL, /* res...
Hi, new to the community. I've been up all night trying to flesh out the underlying HTML reading system that's at the core of my app's functionality. I could really use a fresh pair of eyes on this one.
Problem: While trying to return a string to be displayed on my app's home activity, I've run into an issue where I'm almost certain that th...
I'm using BeautifulSoup to parse a website:
request = urllib2.Request(url)
response = urllib2.urlopen(request)
soup = BeautifulSoup.BeautifulSoup(response)
I am using it to traverse a table. The problem I am running into is that BS is adding an extra end tag for the table into the html which doesn't exist, which I verified with...
Hi guys, I've just run into a little bit of trouble with some PHP on my latest project. Basically I have a block of text ($text) and I would like to search through that text and return all of the MP3 links. I know it has something to do with regular expressions but I just cannot get it working.
Here's my current code:
if(preg_match...
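The same idea can be sketched in Python (the question is about PHP, so this is only an illustration of the pattern, not the poster's code). It assumes the links are absolute http or https URLs ending in .mp3; MP3_URL_RE and find_mp3_links are hypothetical names.

```python
import re

# Assumption: links are absolute http(s) URLs whose path ends in ".mp3".
# The character class stops at whitespace, quotes and angle brackets, so the
# pattern works both inside href="..." attributes and in plain text.
MP3_URL_RE = re.compile(r"https?://[^\s\"'<>]+?\.mp3\b", re.IGNORECASE)


def find_mp3_links(text):
    """Return every MP3 URL found in text, in order of appearance."""
    return MP3_URL_RE.findall(text)


if __name__ == "__main__":
    sample = 'Listen: <a href="http://example.com/a/song.mp3">song</a>'
    print(find_mp3_links(sample))
```

The key difference from a single match call is that a find-all style function returns every occurrence rather than only the first one.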
I'm researching the different (and sometimes obsolete) ratings/classification standards used on the web, e.g. PICS, POWDER, ICRA.
Which standard is the most popular (number of sites using it)?
Is there a C# library which will handle any (or all) of these?
...
I have a huge database of scraped forum posts that I am inserting into a website. However, a lot of people try to use HTML in their forum posts and often do it wrong. Because of this, there are always stray <strike> <b> </strike> </div> </b> tags in the posts, which end up messing up the page formatting when I add, say, 15 forum po...
I am in the process of converting my application to use XHTML strict mode (it didn't have a DOCTYPE before). However, I noticed a significant performance degradation when getting offsetHeight/offsetWidth. This is very noticeable on pages with a large number of DOM elements. Let's say a table with 1 column by 800 rows, where the cells only have a piece of te...
Parsing HTML / JS code to get info using PHP.
www.asos.com/Asos/Little-Asos-Union-Jack-T-Shirt/Prod/pgeproduct.aspx?iid=1273626
Take a look at this page, it's a clothes shop for kids. This is one of their items and I want to point out the size section. What we need to do here is to get all the sizes for this item and check whether the...
I'm parsing HTML with libxml2, using XPath to find elements. Once I've found the element I'm looking for, how can I get the HTML of that element as a string (keeping in mind that the element will have many child elements)? Given a document:
<html>
<header>
<title>Some document</title>
</header>
<body>
<p id="...
I'm having a difficult time locating an HTML parser that works with JRuby.
I've become fond of using Nokogiri for HTML parsing, but Nokogiri requires libxml2.dll, which I don't have available on my machine, and I'm not sure I can ensure that it is available on all users' machines.
I attempted to use another favorite, Scruby...
Hey guys, I was wondering if you guys could help me work through accessing the html behind a login page using C and libcurl.
Specific Example:
The website I'm trying to access is https://onlineservices.ubs.com/olsauth/ex/pbl/ubso/dl
Is it possible to do something like this?
The problem is that we have a lot of clients each of which h...
In Java, I am trying to parse an HTML file that contains complex text such as Greek symbols.
I encounter a known problem when the text contains a left-facing quotation mark. Text such as
mutations to particular “hotspot” regions
becomes
mutations to particular “hotspot�? regions
I have isolated the problem by writing a simple text...
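One common source of this kind of damage is reading UTF-8 bytes with a single-byte charset such as windows-1252; whether that is the cause here is an assumption, since the excerpt is cut off. The Python sketch below (not the poster's Java code; the function name is hypothetical) reproduces the mismatch. The closing curly quote U+201D encodes to the bytes E2 80 9D, and 0x9D has no mapping in Python's cp1252 codec, so it becomes the replacement character.

```python
def misdecode_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    # Opening quote: all three bytes map to printable cp1252 characters.
    print(misdecode_utf8_as_cp1252("\u201c"))
    # Closing quote: the last byte (0x9D) is undefined and is replaced.
    print(misdecode_utf8_as_cp1252("\u201d"))
```

If this is what is happening, the usual remedy is to state the charset explicitly at every read and write instead of relying on the platform default.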