I need to supply a keyword like "blue metal kettle" (with/without quotes) and get only the number of results found for this search. If I search without quotes right now, I get:
Results 1 - 10 of about 1,040,000 for blue metal kettle. (0.19 seconds)
Here '1,040,000' is the number I want. Is there an API function to do this, or must I...
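Whether a search API exposes this count is not settled here. As a minimal sketch, assuming the statistics line is already in hand as a string in the format quoted above, the number can be pulled out with a regular expression. The function name `parse_result_count` is made up for illustration.

```python
import re

def parse_result_count(stats_line):
    """Return the total from a line like 'Results 1 - 10 of about 1,040,000 for ...'.

    Returns None when the line does not contain an 'of [about] N' phrase.
    """
    match = re.search(r"of (?:about )?([\d,]+)", stats_line)
    if match is None:
        return None
    # Drop the thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))

line = "Results 1 - 10 of about 1,040,000 for blue metal kettle. (0.19 seconds)"
print(parse_result_count(line))
```

This only handles the wording shown in the question; a different layout or locale would need a different pattern.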
I'm looking for a PDF library which will allow me to extract the text from a PDF document. I've looked at PyPDF, and this can extract the text from a PDF document very nicely. The problem with this is that if there are tables in the document, the text in the tables is extracted in-line with the rest of the document text. This can be prob...
There's a lot of scholarly work on HTML content extraction, e.g., Gupta & Kaiser (2005) Extracting Content from Accessible Web Pages, and some signs of interest here, e.g., one, two, and three, but I'm not really clear about how well the practice of the latter reflects the ideas of the former. What is the best practice?
Pointers to goo...
Hi,
From my Windows application, I want to detect the selected text in Internet Explorer, Firefox, and any other browser.
Do you know what code I should use to achieve this?
Thanks,
The idea is not to search for text in IE, but to capture the selected text in IE. By the way, not only IE, but any Windows applica...
Using sed or similar, how would you extract specific lines from a file? If I wanted lines 1, 5, 1010, 20503 from a file, how would I get these 4 lines?
What if I have a fairly large number of lines I need to extract?
If I had a file with 100 lines, each representing a line number that I wanted to extract from another file, how would I do that?
...
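The question asks about sed, but the same idea can be sketched in Python: keep the wanted line numbers in a set and stream through the file once. This is only an illustration; `extract_lines` and the sample data are hypothetical names, and an in-memory `io.StringIO` stands in for a real file.

```python
import io

def extract_lines(lines, wanted):
    """Yield (line_number, text) for each 1-based line number in `wanted`."""
    wanted = set(wanted)
    if not wanted:
        return
    last = max(wanted)
    for number, text in enumerate(lines, start=1):
        if number in wanted:
            yield number, text.rstrip("\n")
        if number >= last:
            break  # nothing left to find, so stop reading early

# Stand-in for a real file: 20 lines reading "row 1" ... "row 20".
sample = io.StringIO("".join(f"row {i}\n" for i in range(1, 21)))
picked = list(extract_lines(sample, [1, 5, 12]))
print(picked)
```

For the case where the line numbers live in a second file, the set could be built by converting each non-empty line of that file with `int()` and passing the result as `wanted`.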
My question is sort of like this question but I have more constraints:
I know the documents are reasonably sane
they are very regular (they all came from the same source)
I want about 99% of the visible text
about 99% of what is viable at all is text (they are more or less RTF converted to HTML)
I don't care about formatting or even pa...
If you can help with this you're a genius.
Basically, I will have some text like this:
<parent wealthy>
<parent>
<children female>
<child>
jessica
<hobbies>
basketball, soccer, video games
</hobbies>
</child>
<child>
jane
<hobbies>
...
I'm trying to figure out how to extract dates from unstructured text using Ruby.
For example, I'd like to parse the date out of this string "Applications started after 12:00 A.M. Midnight (EST) February 1, 2010 will not be considered."
Any suggestions?
...
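The question is about Ruby, so take this as a language-neutral sketch written in Python: match one known date shape ("Month D, YYYY") with a regular expression and validate it by building a real date. It assumes English month names spelled out in full and will miss every other format. `find_dates` is a hypothetical helper name.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Matches e.g. "February 1, 2010": full month name, day, comma, 4-digit year.
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b")

def find_dates(text):
    """Return a list of datetime.date objects for every valid match in text."""
    found = []
    for name, day, year in DATE_RE.findall(text):
        try:
            found.append(date(int(year), MONTHS.index(name) + 1, int(day)))
        except ValueError:
            pass  # looks like a date but is impossible, e.g. February 30
    return found

text = ("Applications started after 12:00 A.M. Midnight (EST) "
        "February 1, 2010 will not be considered.")
print(find_dates(text))
```

The time and time zone in the sample string are ignored here; handling them would need additional patterns.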
I'd like to know if any wrapper induction libraries for Java exist (experimental or not).
Given a website of choice I would like to be able to point my code to product-pages of a particular website. The Wrapper Induction library should be able to:
- infer the 'wrapper' or schema of the product pages from a couple of examples.
- have ...
I have a series of text items: raw HTML from a MySQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching).
My example is any review on Yelp.com, that shows 3 snippets from hundreds of reviews of a given restaurant, in the format:
"Try ...
I need to extract window content if this is based on text, or at least the file path associated with that window. To date, I have considered:
1. win32api
2. 3rd party libraries
3. wrapper classes
However, I am not satisfied with the solutions. So any ideas how this can be done in a clean way?
...
Basically, I want to extract the strings "AAA", "BBB", "CCC", "DDD" from a text file.
...... (other text goes here).....
<TD align="left" class=texttd><font class='textfont'>AAA</font></TD>
..... (useless text here).....
<TD align="left" class=texttd><font class='textfont'>BBB</font></TD>
....(more text).....
<TD align="left" class=tex...
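One possible approach, sketched in Python with the standard library's `html.parser`: collect the text inside every `font` element whose class is `textfont`. The class name `FontTextCollector` is made up for this example, and the sample input reuses only the two complete rows shown in the question.

```python
from html.parser import HTMLParser

class FontTextCollector(HTMLParser):
    """Collects the text found inside <font class='textfont'> elements."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "font" and dict(attrs).get("class") == "textfont":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "font":
            self.inside = False

    def handle_data(self, data):
        if self.inside and data.strip():
            self.values.append(data.strip())

sample = """...... (other text goes here).....
<TD align="left" class=texttd><font class='textfont'>AAA</font></TD>
..... (useless text here).....
<TD align="left" class=texttd><font class='textfont'>BBB</font></TD>
"""

collector = FontTextCollector()
collector.feed(sample)
collector.close()
print(collector.values)
```

A regular expression would also work on markup this regular, but a parser tends to cope better with variations in quoting and attribute order.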
Posterous allows you to post a myriad of objects via email. We would like to allow users to reply to an email we send them, and we extract out the content to use somewhere.
What is the most effective way of doing that?
...
I’ve been working on an extension for automating tests in Chrome, and I came across an obscure issue with JavaScript dialogs. The message shown in the dialog can’t be readily retrieved/copied. I’ve used the GetWindowText and InternalGetWindowText functions, but they only return the title of the dialog and the text from the buttons, not ...
Hi all
I've simply used the program at the URL below:
http://jericho.htmlparser.net/samples/console/src/ExtractText.java
My goal is to be able to extract the main body text, to be able to summarize it and present the summarized text as output to the user.
My problem is that I'm not sure how I'd modify the above program to on...
I want to find every %tagname% in a file and copy only the tag names (without the percent signs) into a dictionary in Python.
...
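A minimal sketch of one reading of this: treat anything of the form `%word%` as a tag and use the bare names as dictionary keys. The question does not say what the values should be, so `None` is used as a placeholder here; that choice, the helper name `collect_tags`, and the sample text are all assumptions.

```python
import re

# A tag is a run of letters, digits, or underscores between two percent signs.
TAG_RE = re.compile(r"%(\w+)%")

def collect_tags(text):
    """Return a dict whose keys are the tag names found in text (values are None)."""
    return dict.fromkeys(TAG_RE.findall(text))

sample = "Dear %name%, your order %order_id% ships on %date%. Bye %name%."
tags = collect_tags(sample)
print(tags)
```

To read from a file, the text could come from `open(path).read()`; duplicate tags collapse into a single key.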
I'm trying to get my way through Poppler and its (lack of) documentation.
What I want to do is a very simple thing: open a PDF file and read the text in it. I'm then going to process the text, but that doesn't really matter here.
So... I saw the poppler_page_get_text function, and it kind of works, but I have to specify a selection rec...
Hello,
I have already asked a similar question, but I have noticed that I have a big constraint: I am working on small text sets, such as user tweets, to generate tags (keywords).
And it seems like the accepted suggestion (the point-wise mutual information algorithm) is meant to work on bigger documents.
With this constraint (working on ...
I need some directions for the following problem:
I have a lot of InDesign files and I have to set up a process that will track whether a certain paragraph or text block has changed between different versions of the file. If the text block has changed, I want to extract that text block in a "portable" format (HTML, PDF, TXT).
Is there an Adob...
I want to extract some keywords out of a query string for a search application in ASP.NET.
I decoded the URL string first, so it's plain text
I have this to start with, but I want to add a keyword group
([\?\&])q=[^\&]+[\&]?
I get this ?q=harbour landing dental&
I'd like to trim off the stuff for pure words, but not sure if that's...
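One way to trim the match down to the words is to put a capturing group around the value only, so the leading `?q=` and trailing `&` stay outside the group. The sketch below is in Python rather than ASP.NET, and the same pattern should carry over with little change, though that is untested here. `query_keywords` and the sample URL path are hypothetical.

```python
import re

# Group 1 captures only the value of q, without "?q=" or the trailing "&".
Q_RE = re.compile(r"[?&]q=([^&]+)")

def query_keywords(url):
    """Return the words in the q parameter of an already-decoded URL."""
    match = Q_RE.search(url)
    if match is None:
        return []
    return match.group(1).split()

print(query_keywords("/search?q=harbour landing dental&page=2"))
```

Splitting on whitespace assumes the URL has already been decoded, as the question states; on an encoded URL the separators would be `+` or `%20` instead.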