extraction

Creating and Extracting a tgz archive in Rails

Does anyone know how to archive a folder and its contained files as a tgz archive using Rails? What I would like to do is archive the contents of the folder and then have another script that extracts the same folder that was archived. All of the archiving techniques that I've come across are pretty complicated; I was wondering if there ...

Regular Expression (Python) to extract strings of text from inside of < and > - e.g. <stringone><string-two> etc...

I'm currently playing with the Stack Overflow data dumps and am trying to construct (what I imagine is) a simple regular expression to extract tag names from inside of < and > characters. So, for each question, I have a list of one or more tags like <tagone><tag-two>...<tag-n> and am trying to extract just a list of tag names. Here are...
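For the tag-list format described in this question, one possible approach is sketched below in Python. It assumes tag names never contain `<` or `>` themselves; the function name `extract_tags` is hypothetical and only for illustration.

```python
import re

# Capture the text between each '<' and the next '>'.
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def extract_tags(tag_string):
    """Return the tag names found in a string like '<tagone><tag-two>'."""
    return TAG_PATTERN.findall(tag_string)


print(extract_tags("<tagone><tag-two>"))  # ['tagone', 'tag-two']
```

Excluding both angle brackets inside the capture group keeps each match confined to a single tag, so greedy matching cannot run across neighboring tags.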

Where to get/extract relational data on administrative regions of the world?

Hi all, I'm working on a web-application that operates on services. Every service is normally provided over some (possibly several distinct) geographical domain(s) like city, county, region, state, country or worldwide. Typical tasks: users will submit their location as lng,lat and I will need to find what are the services availabl...

Tarring only the files of a directory

If I have a folder with a bunch of images, how can I tar ONLY the images, and not the folder structure leading to them, without having to cd into the image directory? tar czf images.tgz /path/to/images/* Now when images.tgz is extracted, the extracted contents are /path/to/images/... How can I have only the images inc...
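A minimal sketch of the same idea using Python's standard `tarfile` module, where the `arcname` argument controls the name stored in the archive. The function name `tar_files_flat` is hypothetical, and the sketch assumes only the regular files directly inside the directory should be archived.

```python
import os
import tarfile


def tar_files_flat(src_dir, archive_path):
    """Write the regular files in src_dir to a gzipped tar without their leading path."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in sorted(os.listdir(src_dir)):
            full_path = os.path.join(src_dir, name)
            if os.path.isfile(full_path):
                # arcname is the name stored in the archive; using the bare
                # file name drops the directory prefix.
                archive.add(full_path, arcname=name)
```

With the tar command itself, the `-C` option changes directory before adding files, so something like `tar czf images.tgz -C /path/to/images .` may achieve a similar result without a separate cd step.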

HTML data extraction

I'm accessing a website and I need to extract some data. To be more specific - from this part: <input type="hidden" value="1" name="d520783895194bd08750e47c744d553d"> I need to extract the "name" part. I heard that regular expressions are not the best solution, so I'd like to ask what is the best way to access this piece of data I n...
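One parser-based alternative to regular expressions, sketched with Python's standard `html.parser` module. The class name `HiddenInputNames` and the helper `hidden_input_names` are hypothetical, and the sketch assumes the value of interest is the `name` attribute of any hidden input on the page.

```python
from html.parser import HTMLParser


class HiddenInputNames(HTMLParser):
    """Collect the name attribute of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and "name" in attributes:
            self.names.append(attributes["name"])


def hidden_input_names(html):
    """Return the names of all hidden inputs found in the HTML text."""
    parser = HiddenInputNames()
    parser.feed(html)
    parser.close()
    return parser.names
```

Because the parser reports attributes as name/value pairs, the order of attributes inside the tag does not matter, which is one place where a hand-written regular expression tends to break.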

web information extraction

I want to create a shopping search engine that shows products from many websites and I wonder how I can retrieve information about products from those sites. I am not interested in the search engine part but in extracting product information from web pages in an automated manner using auto-generated templates. Does anybody know some good a...

What are some ways to extract a location from a sentence/query?

I want to recognize and extract a location that's built into a sentence. For example I might have a sentence: "I love the pizza in Boston, Ma." but this same sentence could also be written as "Pizza in Boston, I love it." OR "I love the pizza in Boston." So I have to be able to find it anywhere in the sentence and also if the state i...

How to extract Office embedded OLE files under Linux, natively (Python, C, Java)?

I am trying to extract Excel documents that are embedded inside a Word document as OLE objects, but it's failing hard. I need to put this in a server-side script, so a console tool or script is necessary. And automating OpenOffice is very resource hungry. Are there any tools or libraries to do this? Please help. ...

Extracting Zip to SD card is very slow. How can I optimize performance?

Hi there, my app downloads a zip with about 350 files, a mix of JPG and HTML files. The function I wrote to do it works just fine, but the unzipping takes forever. At first I thought the reason might be that writing to the SD card is slow, but when I unzip the same zip with another app on my phone it works much faster. Is there anything...

PDF Text Extraction at hyperlink locations

Does anybody know of a (FREE) SDK of some sort that can start text extraction at the point in a PDF document that a hyperlink takes you to (within the same PDF document)? The links end up taking us to specific points on specific pages. More specifically, we need a program that can parse a PDF document that holds questions and answers to a...

Extract Geometry from Font

I'd like to be able to extract the geometry for each letter in a TrueType font file. Each letter would have a set of coordinates, assuming each letter is in its own grid. As a picture tells a thousand words, I'd like to get the vertices for letters similar to the image below (courtesy of http://polymaps.org/). Update: Thanks to the ...

Monitor ZIP File Extraction Python

I need to unzip a .ZIP archive. I already know how to unzip it, but it is a huge file and takes some time to extract. How would I print the percentage complete for the extraction? I would like something like this: Extracting File 1% Complete 2% Complete etc, etc ...
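A minimal sketch of one way to report progress with Python's standard `zipfile` module, extracting one member at a time and tracking uncompressed bytes. The function name `extract_with_progress` and its `report` parameter are hypothetical. Progress only advances between members, so an archive holding a single huge file would jump straight to 100%.

```python
import zipfile


def extract_with_progress(zip_path, dest_dir, report=print):
    """Extract zip_path into dest_dir, reporting percent complete by uncompressed bytes."""
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.infolist()
        # Fall back to 1 so an archive of empty files does not divide by zero.
        total = sum(member.file_size for member in members) or 1
        done = 0
        last_percent = -1
        for member in members:
            archive.extract(member, dest_dir)
            done += member.file_size
            percent = done * 100 // total
            # Report only when the whole-number percentage changes.
            if percent != last_percent:
                report(f"{percent}% Complete")
                last_percent = percent
```

Passing a different `report` callable (for example, one that updates a progress bar) would keep the extraction loop unchanged.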

How To Extract Function Name From Main() Function Of C Source

Hi! I just want to ask for your ideas regarding this matter. For a certain important reason, I must extract the names of all functions that are called inside the "main()" function of a C source file (e.g. main.c). Example source code: int main() { int a = functionA(); // functionA must be extracted int b = functionB(...

How to extract words from text as per the context

Hello, I want to extract relevant words from a text statement provided by the user, e.g. for the question "How many sides are there in a rectangle?" the words should be 'rectangle', 'sides', 'many', 'how'. I've discovered that what I'm aiming to build is an NLP question-answering system. But right now I want to only extract the requi...