text

Removing password for multiple PDF files

So I have a huge collection of PDF files that I need to extract text from. The files are encrypted, but I know the password for them. I'm looking for a way to automate the process of extracting the text. I can manually open the file in Acrobat Professional, remove security by typing in the password, and then save it as a .txt file. But there'...

OS X file duplication converts text encoding by default

All the PHP files in my workspace are encoded in Unicode (UTF-8, no BOM). I often duplicate an existing source file to use as a base for a new script. Invariably (with Path Finder or the original Finder), OS X will convert the encoding of the duplicate file to Western (Mac OS Roman). Is there any way to make OS X behave and not convert ...

How to detect duplicate text with some fuzzyness

Some time ago, I wrote a small script using Text::DeDupe to remove duplicate blog posts before I have to lay my eyes on them. After reading the Syntactic Clustering of the Web paper, on which the implementation is based, I would love to have the ability to find overlapping documents (e.g. snippets of blogs as opposed to full text, maybe also quot...
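
The overlap idea mentioned in the question can be sketched with word shingles. This is only an illustration in Python, not Text::DeDupe's actual code: the function names and the shingle width `w` are assumptions made here. Resemblance is the Jaccard overlap of the two shingle sets, and containment measures how much of one document's shingle set appears in the other, which is the measure that suits snippets.

```python
def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word tuples) in text."""
    tokens = text.lower().split()
    if len(tokens) < w:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard overlap of the shingle sets: |A & B| / |A | B|."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def containment(a, b, w=4):
    """Fraction of a's shingles that also occur in b: |A & B| / |A|."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa) if sa else 0.0
```

A snippet quoted from a longer post would show a containment close to 1 even when the resemblance of the two documents is low.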

Retrieve Selected Text from a Web Browser Control in a

Here's what I am trying to do: Select text from a webpage I pulled up using my web browser control. After clicking a button while this text is still selected, I would like a message box to pop up displaying the text that was highlighted by the user. How do I get this functionality to work in my WPF application? I think I'm on the right tra...

C# code to convert XHTML doc to plain text

I'm writing a utility to export Evernote notes into Outlook on a schedule. The Outlook APIs need plain text, and Evernote outputs an XHTML doc version of the plain text note. What I need is to strip out all the tags and unescape the source XHTML doc embedded in the Evernote export file. Basically I need to turn: <note> <title>Test Sy...
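
The two steps described above (unescape, then strip tags) might look like the following sketch. It is written in Python rather than C#, and it assumes the embedded XHTML is well-formed and escaped exactly once. The function name and the sample input are hypothetical and are not taken from a real Evernote export.

```python
import html
import xml.etree.ElementTree as ET


def xhtml_to_text(escaped_xhtml):
    """Unescape an embedded XHTML fragment and return its text content."""
    # Step 1: turn "&lt;div&gt;" back into "<div>" (one level of escaping).
    xhtml = html.unescape(escaped_xhtml)
    # Step 2: parse the markup and keep only the text nodes.
    root = ET.fromstring(xhtml)
    text = "".join(root.itertext())
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(text.split())
```

For example, `xhtml_to_text("&lt;div&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/div&gt;")` yields `Hello world`. Entities that XML does not define (such as `&nbsp;`) would make the parser fail and need separate handling.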

How much more inefficient are text (blobs) than varchar/nvarchar's?

We're doing a lot of large, but straightforward forms for a fairly big project (about 600 users using it throughout the day - that's big for me at least ;-) ). The forms have a lot of question/answer type sections, so it's natural for some people to type a sentence, while others type a novel. How beneficial would it be to put a charact...

Stop text from wrapping with NSLayoutManager

Given any arbitrary, one-line string, my goal is to render it into a bitmap representation. However, I have no means of finding out its dimensions beforehand, so I am reduced to getting the glyph range's bounding rect and resizing my canvas if it's not large enough. Unfortunately, if the canvas is not wide enough for the string, but ta...

Tool to find duplicate sections in a text (XML) file?

Hiya, I have an XML file, and I want to find nodes that have duplicate CDATA. Are there any tools that exist that can help me do this? I'd be fine with a tool that does this generally for text documents. ...
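
One possible approach, sketched in Python with the standard library: group elements by their text content and report any text that occurs more than once. Note the assumption that CDATA and ordinary character data can be treated alike, because `xml.etree.ElementTree` exposes both as plain element text. The function name is made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_text_nodes(xml_source):
    """Map each repeated text value to the tags of the elements holding it."""
    root = ET.fromstring(xml_source)
    seen = defaultdict(list)
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:  # ignore empty and whitespace-only elements
            seen[text].append(elem.tag)
    return {text: tags for text, tags in seen.items() if len(tags) > 1}
```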

JS Regex For Human Names

I'm looking for a good JavaScript RegEx to convert names to proper case. For example: John SMITH = John Smith; Mary O'SMITH = Mary O'Smith; E.t MCHYPHEN-SMITH = E.T McHyphen-Smith; John Middlename SMITH = John Middlename Smith. Well, you get the idea. Has anyone come up with a comprehensive solution? ...
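
A rough sketch of the idea, in Python rather than JavaScript: capitalize every run of letters, then special-case the "Mc" prefix. It covers the examples in the question but is far from comprehensive (it would not handle "MacDonald", "van der", or non-ASCII letters). The function name is an assumption made for this illustration.

```python
import re


def proper_case(name):
    """Capitalize each run of ASCII letters, with a special case for 'Mc'."""
    def fix(match):
        word = match.group(0).lower()
        word = word[0].upper() + word[1:]
        # "Mchyphen" -> "McHyphen"
        if word.startswith("Mc") and len(word) > 2:
            word = "Mc" + word[2].upper() + word[3:]
        return word

    # Apostrophes, periods, and hyphens split the letter runs, so the parts
    # of "O'SMITH", "E.t", and "MCHYPHEN-SMITH" are capitalized separately.
    return re.sub(r"[A-Za-z]+", fix, name)
```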

Reading text values into matlab variables from ASCII files

Consider the following file: var1 var2 variable3 1 2 3 11 22 33 I would like to load the numbers into a matrix, and the column titles into a variable that would be equivalent to: variable_names = char('var1', 'var2', 'variable3'); I don't mind splitting the names and the numbers into two files, however preparing MATLAB code...
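
The parsing logic, independent of MATLAB, can be sketched in Python. This assumes the first non-empty line holds the whitespace-separated column names and every later line holds one row of numbers. The function name is hypothetical.

```python
def load_table(text):
    """Split a whitespace-delimited table into (column names, numeric rows)."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()
    matrix = [[float(value) for value in line.split()] for line in lines[1:]]
    return names, matrix
```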

.NET C# - Random access in text files - no easy way?

Hi all, I've got a text file that contains several 'records' inside of it. Each record contains a name and a collection of numbers as data. I'm trying to build a class that will read through the file, present only the names of all the records, and then allow the user to select which record data he/she wants. The first time I go thro...
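
A common way to get random access into a text file is to scan it once, remember the byte offset of each record, and seek to that offset later. The sketch below is in Python and assumes a hypothetical one-record-per-line layout of the form `name: n1 n2 n3`. The real record format is not shown in the question, so the parsing would need adapting.

```python
import io


def build_index(stream):
    """Scan a seekable binary stream once; map record name -> byte offset."""
    index = {}
    stream.seek(0)
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        name, sep, _ = line.partition(b":")
        if sep:  # skip lines that are not records
            index[name.decode("utf-8").strip()] = offset
    return index


def read_record(stream, index, name):
    """Jump straight to one record and parse its numbers."""
    stream.seek(index[name])
    line = stream.readline().decode("utf-8")
    _, _, data = line.partition(":")
    return [float(value) for value in data.split()]


# Small in-memory stand-in for a file opened with open(path, "rb").
sample = io.BytesIO(b"alpha: 1 2 3\nbeta: 4.5 6\n")
sample_index = build_index(sample)
```

Working in binary mode keeps the offsets exact. In text mode, newline translation and multi-byte encodings make character positions differ from byte positions.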

Extract words out of a text file

Let's say you have a text file like this one: http://www.gutenberg.org/files/17921/17921-8.txt Does anyone have a good algorithm, or open-source code, to extract words from a text file? How to get all the words, while avoiding special characters, and keeping things like "it's", etc... I'm working in Java. Thanks ...
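
One simple approach is a regular expression that matches runs of letters and allows an apostrophe inside a word. The sketch below is in Python rather than Java (the pattern itself should carry over to `java.util.regex`) and handles only ASCII letters, which is an assumption. The function name is made up here.

```python
import re

# A word is one or more letters, optionally followed by groups of
# "apostrophe + letters", so "it's" and "isn't" stay whole.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def extract_words(text):
    """Return every word in text, dropping punctuation and digits."""
    return WORD.findall(text)
```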

Truncate a string nicely to fit within a given pixel width.

Sometimes you have strings that must fit within a certain pixel width. This function attempts to do so efficiently. Please post your suggestions or refactorings below :) function fitStringToSize(str,len) { var shortStr = str; var f = document.createElement("span"); f.style.visibility = 'hidden'; f.style.padding = '0px'; ...
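
One possible refactoring is a binary search over the prefix length, so the string is measured O(log n) times instead of once per character. The Python sketch below is not the poster's function. The `measure` callback is an assumption standing in for the hidden-span measurement in the browser, and it is expected to grow as the string gets longer.

```python
def fit_string(text, max_width, measure, ellipsis="..."):
    """Return text, or its longest prefix plus ellipsis that fits max_width."""
    if measure(text) <= max_width:
        return text
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if measure(text[:mid] + ellipsis) <= max_width:
            lo = mid       # this prefix fits, try a longer one
        else:
            hi = mid - 1   # too wide, try a shorter one
    # If not even the ellipsis fits, the bare ellipsis is returned.
    return text[:lo] + ellipsis
```

With a fake monospace measurement of 7 pixels per character, `fit_string("hello world", 56, lambda s: 7 * len(s))` returns `hello...`.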

Are there any good editors for LISP programming, other than emacs?

I'm looking for an alternative, since I find emacs difficult to use. I'd rather use an editor that supports all the usual shortcuts I'm used to, such as arrow keys to move the cursor around, CTRL+SHIFT+RightArrow to select the next word, etc. Basically, I don't want to have to relearn all my familiar shortcuts just so I can use emacs. ...

Transforming selected text with a hotkey

I have this code: myVariable which I want to change into trace("myVariable: " + myVariable); using a direct hotkey like "alt-f12" to do it, i.e. not using "ctrl-space" and arrow buttons. Is it possible in Eclipse? ...

windows cmd pipe not unicode even with /U switch

Hello, I have a little C# console program that outputs some text using Console.WriteLine. I then pipe this output into a text file like: c:myprogram > textfile.txt However, the file is always an ANSI text file, even when I start cmd with the /u switch. cmd /? says about the /u switch: /U Causes the output of internal commands...

Simple text menu in C++

I am writing a silly little app in C++ to test one of my libraries. I would like the app to display a list of commands to the user, allow the user to type a command, and then execute the action associated with that command. Sounds simple enough. In C# I would end up writing a list/map of commands like so: class MenuItem { ...
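
The list/map-of-commands idea can be sketched as a dictionary from command name to callable. The example is in Python rather than C++, and every name in it is hypothetical. In C++ the same shape would be roughly a `std::map<std::string, std::function<void()>>`.

```python
def run_menu(commands, read=input, write=print):
    """Show the available commands, then dispatch typed commands until 'quit'."""
    write("Commands: " + ", ".join(sorted(commands)) + ", quit")
    while True:
        choice = read("> ").strip()
        if choice == "quit":
            break
        action = commands.get(choice)
        if action is None:
            write(f"Unknown command: {choice}")
        else:
            action()
```

Passing `read` and `write` as parameters keeps the loop testable without a real console. A call such as `run_menu({"hello": lambda: print("hi")})` uses the defaults.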

Text Wrapping in SSRS

Hi, how do I accomplish text wrapping of table fields in an SSRS report, and proper landscaping when rendering the report to PDF format? Thanks in advance, Anna ...

What is the best way to display HTML in Flex?

I have HTML that includes symbols such as the Trademark "TM" as superscript (™). In normal HTML, I would use "&trade;" or &#153; to display the Trademark TM. However, I can find no way to import HTML like this into Flex and have it displayed correctly. I am having similar issues with the <li> tag. My HTML: <p>This information is intelle...

How do I convert Unicode to an integer value and not raise an exception if the text is not really an integer

I have some HTML I am trying to parse. There are cases where the HTML attributes alone are not going to help me identify the row type (header versus data). Fortunately, if my row is a data row then it should have some values that can be converted to integers. I have figured out how to convert the unicode to an integer for those cases ...
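
A minimal sketch of a conversion that never raises: wrap `int()` and return a fallback when the text is not an integer. The helper name and the `default` parameter are assumptions made for this illustration.

```python
def to_int(text, default=None):
    """Return int(text), or default when text is not a valid integer."""
    try:
        return int(text)
    except (TypeError, ValueError):
        # ValueError: text such as "n/a" or "3.5"; TypeError: None and similar.
        return default
```

A row could then be treated as a data row when `to_int(cell) is not None` for the cells that are expected to hold numbers.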