text

PHP: combine text after explode

Using explode, I broke the text up into pieces, then used a foreach to look for a few things in the text. $pieces = explode(' ', $text); foreach ($pieces as $piece) { Some Modification of the piece } My question is: how can I put those pieces back together so I can wordwrap the text? Something like this: piece 1 + piece 2 + etc ...
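A minimal sketch of the split, modify, rejoin, wrap round trip, written in Python rather than PHP (in PHP the counterpart of `join` is `implode`). The function name `rebuild_and_wrap` and the `strip()` step are hypothetical stand-ins for whatever modification is applied to each piece.

```python
import textwrap


def rebuild_and_wrap(text, width=40):
    # Split on single spaces, mirroring explode(' ', $text).
    pieces = text.split(" ")
    # Hypothetical per-piece modification: trim stray whitespace.
    modified = [piece.strip() for piece in pieces]
    # Put the pieces back together with a single space between them.
    combined = " ".join(modified)
    # Wrap the rebuilt text to the requested width.
    return textwrap.fill(combined, width=width)


wrapped = rebuild_and_wrap("the quick brown fox jumps over the lazy dog", width=20)
print(wrapped)
```

Joining first and wrapping afterwards keeps the wrap logic independent of how the pieces were modified.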

PHP Get text between multiple <h1>...</h1> tags

Hello! I have a page with multiple H1 headings, each followed by text. Example: <h1>Title 1</h1>Some text here <h1>Title 2</h1>Some more text here <h1>Title 3</h1>Even more text here etc. What I want to do is create an array of elements, that is, explode the HTML using <h1>ANY TEXT</h1> as the separator. The HTML above is what I have in an $output ...
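One way to split on a heading pattern is a regular-expression split with a capture group, sketched here in Python. The function name `split_by_h1` is hypothetical, and the pattern assumes plain `<h1>` tags without attributes; for arbitrary HTML a real parser would likely be more robust.

```python
import re


def split_by_h1(html):
    # The capture group makes re.split keep each heading's text in the result.
    parts = re.split(r"<h1>(.*?)</h1>", html, flags=re.IGNORECASE | re.DOTALL)
    # parts[0] is whatever precedes the first heading;
    # after that, heading texts and body texts alternate.
    titles = parts[1::2]
    bodies = parts[2::2]
    return [(title.strip(), body.strip()) for title, body in zip(titles, bodies)]


sections = split_by_h1(
    "<h1>Title 1</h1>Some text here <h1>Title 2</h1>Some more text here"
)
print(sections)
```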

How to Extract docx (word 2007 above) using apache POI

Hi, I'm using Apache POI 3.6. I've already created some code: XWPFDocument doc = new XWPFDocument(new FileInputStream(file)); wordxExtractor = new XWPFWordExtractor(doc); text = wordxExtractor.getText(); System.out.println("adding docx " + file); d.add(new Field("content", text, Field.Store.NO, Field...

UIImage at the end of UILabel text

Hi there, how can I find the coordinates of the last character in a UILabel if it contains more than one line of text? I would like to add an image at the end of the text. ...

How to select part of a text in MySQL?

I have a column saved as LONGTEXT in MySQL. This column stores rich text. I'm currently reading all the text, then trimming it with JavaScript to get the first 100 characters without splitting a word in the middle. Yet this doesn't seem like the best way to do it. I want to select a summary directly using the query, yet I also want to be ...
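The "first 100 characters without splitting a word" rule can be sketched as follows. This is Python rather than SQL or JavaScript, the function name `summarize` is hypothetical, and it assumes any rich-text markup has already been stripped from the input.

```python
def summarize(text, limit=100):
    # Short texts need no trimming.
    if len(text) <= limit:
        return text
    # Take one extra character so we can tell whether the cut lands mid-word.
    cut = text[: limit + 1]
    if " " not in cut:
        # A single word longer than the limit: fall back to a hard cut.
        return text[:limit]
    # Drop everything after the last space, i.e. the trailing partial word.
    return cut.rsplit(" ", 1)[0].rstrip()


print(summarize("alpha beta gamma delta", limit=12))
```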

c#: Strict algorithm or library for text search

The Problem: I need a good free library or algorithm to determine whether a text is related to a search pattern or not. The search pattern can be an ordered or unordered list of words. For some searches the order is relevant, for some it is not. Additionally I need the ability to define aliases for searched words (e.g. "(C#|C sharp) cod...

In Python: convert list of files to file-like object

Er, so I'm juggling parsers and such, and I'm going from one thing which processes files to another. The output from the first part of my code is a list of strings; I'm thinking of each string as a line from a text file. The second part of the code needs a file type as an input. So my question is, is there a proper, pythonic, way to co...
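A common way to present a list of strings as a file-like object is the standard library's `io.StringIO`. The sketch below assumes each string is one line, possibly without its trailing newline; the helper name `lines_to_file` is hypothetical.

```python
import io


def lines_to_file(lines):
    # Add a newline to any line that lacks one so iteration yields one line at a time.
    text = "".join(line if line.endswith("\n") else line + "\n" for line in lines)
    # StringIO supports read(), readline(), readlines() and iteration, like a text file.
    return io.StringIO(text)


fake_file = lines_to_file(["first line", "second line\n", "third line"])
read_back = fake_file.readlines()
print(read_back)
```

Code that only iterates over its input may also accept the list directly, since iterating a text file yields lines in the same way.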

Extracting Japanese characters from a PDF file using iText

I am trying to extract text from a PDF file that has Japanese characters. I am using iText for this purpose. However, I am getting this exception: Exception in thread "main" ExceptionConverter: com.itextpdf.text.DocumentException: Font 'KozMinPro-Regular' with 'UniJIS-UCS2-H' is not recognized. Can anybody help me with resolving this iss...

Want the most frequent words in English

Hi, I want the most frequent words in English. Basically, I am processing Wikipedia text and am stuck with a lot of words even after removing stop words. I tried googling for frequent words, but only found the link below. http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists#English I would have to manually scrape the data from this link. Is there a...
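As an alternative to a downloaded frequency list, word frequencies can be counted from the corpus being processed. This is a rough sketch, not a replacement for a published list; the function name `top_words`, the tokenizing pattern, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def top_words(text, n=10, stop_words=frozenset()):
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the stop list.
    counts = Counter(word for word in words if word not in stop_words)
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(n)


sample = "the cat and the hat and the bat"
print(top_words(sample, n=2))
```

Words that stay near the top after stop-word removal are candidates for an extended stop list.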

Is there a way to check the spelling of words in a character vector?

The text to be checked is in Greek, but I would like to know if it can be done for English words too. My initial idea is described here, and I have already found a way to do it using VBA. But I wonder if there's a way to do it using R. If there isn't a way in R, can you think of something better than Excel VBA? ...

Extract text from a PDF and save it to a database - preserving spacing

I have a PDF document containing only text that needs to be saved into a varchar column in MSSQL. The first catch is that the spacing of the text in the PDF needs to be preserved as well, which can't be done simply by copy-pasting from the PDF into SSMS. Okay, so I need an application to read the PDF as text, while preserving spacing. B...

C# label AutoSize adds padding

I have a Label on a Windows.Form. I set the AutoSize property on the label to True and I noticed that when I do that, it pads the right hand side with ~5px of white background. I have the Padding property set to [0, 0, 0, 0]. Is there a way to get rid of this? I would like to get the bounds of the label as close as possible to the text...

How to automate a Google text-messaging app?

I have a group that needs to send out announcements and current events via phone text message. I want to have a central phone number that when it receives a text message, it rebroadcasts that message to a growing list of subscribers. I'm hoping to use a Google Voice number to avoid buying an actual phone number. Any ideas? I've thought ...

Perl Arrays and grep

I think it's more about characters. Anyway, I have a text file consisting of something like this: COMPANY NAME               City                Address,       Address number               Email               phone number and so on... (it repeats itself, but with different data...). Let's assume this text is now in the $strting variable. I want to have an array (@row), for exa...

HTML: Options other than 'input type=text' and 'textarea'?

I've seen many uses of rich-text or at least natural-looking text editing being made available with seemingly any of a page's text-containing elements. What are my options to have this for myself? ...

HTML + Javascript: Detecting where in a line of text a click occurred

What's a good way to do this without wrapping each letter with <span> tags and binding onclick functions to each, or something silly like that? ...

Converting a PDF to text/html in Python so I can parse it

Dear Python Experts, I have the following sample code where I download a PDF from the European Parliament website on a given legislative proposal: EDIT: I ended up just getting the link and feeding it to Adobe's online conversion tool (see the code below): import mechanize import urllib2 import re from BeautifulSoup import * adobe = "...

Proper way to change text and elements on a page with JavaScript

Hi, I've been using innerHTML and innerText for a while to change elements and text on web pages and I've just discovered that they are not W3C standard. I've now found that innerHTML can be replaced with createElement, setAttribute and a few others but what is the best method for changing text inside an element? I found a textContent...

Text-Indent vs Position for SEO

What's the best way to hide a text element and replace it with an image while still maintaining good SEO? I've seen negative text-indent, but I prefer absolute positioning with negative top. So what I'd like to know is which is better for SEO. Do most search engines consider text elements with negative top with absolute positioning; lik...

Word boundary detection from text

Hi, I am having a problem with word boundary identification. I removed all the markup from the Wikipedia document, and now I want to get a list of entities (meaningful terms). I am planning to take bi-grams and tri-grams of the document and check whether they exist in a dictionary (WordNet). Is there a better way to achieve this? Below is the sample ...
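The bi-gram/tri-gram lookup described above can be sketched like this. A small in-memory set stands in for the dictionary; the names `ngrams` and `find_terms` and the sample lexicon are hypothetical, and a real WordNet lookup would replace the set membership test.

```python
def ngrams(tokens, n):
    # Slide a window of n tokens across the list and join each window with spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def find_terms(text, lexicon, max_n=3):
    # Naive whitespace tokenization; real text would need punctuation handling.
    tokens = text.lower().split()
    found = []
    # Check longer n-grams first so multi-word terms are listed before their parts.
    for n in range(max_n, 0, -1):
        for gram in ngrams(tokens, n):
            if gram in lexicon and gram not in found:
                found.append(gram)
    return found


lexicon = {"new york", "new york city", "city"}  # stand-in for a dictionary
terms = find_terms("I moved to New York City", lexicon)
print(terms)
```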