I have over a million text files compressed into 40 zip files. I also have a list of about 500 model names of phones. I want to find out the number of times a particular model was mentioned in the text files.
Is there any Python module which can do a regex match on the files without unzipping them? Is there a simple way to solve this pro...
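One possible approach, sketched with only the standard library: `zipfile` can stream each archive member without extracting it to disk, and a single compiled alternation can match all model names in one pass. The function name `count_mentions`, the UTF-8 encoding, and the case-insensitive matching are assumptions made for illustration, not details from the question.

```python
import io
import re
import zipfile
from collections import Counter


def count_mentions(zip_paths, model_names):
    """Count case-insensitive mentions of each model name in zipped text files."""
    # Longest names first, so a longer name wins over a shorter prefix of it.
    ordered = sorted(model_names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered), re.IGNORECASE)
    canonical = {name.lower(): name for name in model_names}
    counts = Counter()
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                # The member is decompressed as a stream; nothing is written to disk.
                with archive.open(info) as raw:
                    # Assumption: the text files are UTF-8; bad bytes are replaced.
                    text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                    for line in text:
                        for match in pattern.finditer(line):
                            key = match.group(0).lower()
                            counts[canonical.get(key, key)] += 1
    return counts
```

Matching line by line keeps memory use low, at the cost of missing a model name that is split across a line break.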
The Gang of Four's Design Patterns uses a word processor as an example for at least a few of their patterns, particularly Composite and Flyweight.
Other than by using C or C++, could you really use those patterns and the object-oriented overhead they entail to write a high-performing fully featured word processor?
I know that Eclipse ...
I have a ~23000 line sql dump containing several databases worth of data. I need to extract a certain section of this file (i.e. the data for a single database) and place it in a new file. I know both the start and end line numbers of the data that I want.
Does anyone know a unix command (or series of commands) to extract all lines from...
I am constantly learning new tools, even old-fashioned ones, because I like to use the right solution for the problem.
Nevertheless, I wonder if there is still any reason to learn some of them. AWK, for example, is interesting to me, but for simple text processing I can use grep / cut / sed / whatever, while for complex ones, I'll go fo...
I'm trying to come up with a way to estimate the number of English words a translation from Japanese will turn into. Japanese has three main scripts -- Kanji, Hiragana, and Katakana -- and each has a different average character-to-word ratio (Kanji being the lowest, Katakana the highest).
Examples:
computer: コンピュータ (Katakana - 6
chara...
I want to update a large number of C++ source files with an extra include directive before any existing #includes. For this sort of task I normally use a small bash script with sed to re-write the file.
How do I get sed to replace just the first occurrence of a string in a file rather than replacing every occurrence?
If I use
se...
I can write a trivial script to do this, but in my ongoing quest to get more familiar with Unix I'd like to learn efficient methods using built-in commands instead.
I need to deal with very large files that have a variable number of header lines. The last header line consists of the text 'LastHeaderLine'. I wish to output everything aft...
Consider the following file
var1 var2 variable3
1 2 3
11 22 33
I would like to load the numbers into a matrix, and the column titles into a variable that would be equivalent to:
variable_names = char('var1', 'var2', 'variable3');
I don't mind splitting the names and the numbers into two files, however preparing MATLAB code...
Hi Guys,
I'm still working with this huge list of URLs; all the help I have received has been great.
At the moment I have the list looking like this (17000 URLs though):
http://www.domain.com/page?CONTENT_ITEM_ID=1
http://www.domain.com/page?CONTENT_ITEM_ID=3
http://www.domain.com/page?CONTENT_ITEM_ID=2
http://www.domain.com/page?CONT...
[EDIT] In short: How would you write an automatic spell checker? The idea is that the checker builds a list of words from a known good source (a dictionary) and automatically adds new words when they are used often enough. Words which haven't been used for a while should be phased out. So if I delete part of a scene which contains "Mungrohype...
I want to pipe the output of a "template" file into mysql, the file having variables like ${dbName} interspersed. What is the command-line utility to replace these instances and dump the output to stdout?
...
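Not the command-line utility asked for, but as a rough sketch of the same substitution: Python's `string.Template` understands the `${name}` syntax directly. The helper name `render` is hypothetical, and leaving unknown placeholders untouched (rather than raising an error) is a design assumption.

```python
from string import Template


def render(template_text, variables):
    """Replace ${name} placeholders; unknown placeholders are left untouched."""
    # safe_substitute never raises on a missing key, unlike substitute().
    return Template(template_text).safe_substitute(variables)
```

A script could then call `sys.stdout.write(render(sys.stdin.read(), os.environ))` and have its output piped into mysql. Note that `Template` also substitutes bare `$name` and turns `$$` into a literal `$`, which matters if the SQL itself contains dollar signs.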
Is there an existing Java library that could tell me whether a String contains English language text or not (e.g. I need to be able to distinguish French or Italian text -- the function needs to return false for French and Italian, and true for English)?
...
I need to read some large files (from 50k to 100k lines), structured in groups separated by empty lines. Each group starts with the same pattern "No.999999999 dd/mm/yyyy ZZZ". Here's some sample data.
No.813829461 16/09/1987 270
Tit.SUZANO PAPEL E CELULOSE S.A. (BR/BA)
C.N.P.J./C.I.C./N INPI : 16404287000155
Procurador: MARCEL...
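A minimal sketch of reading such a file group by group, assuming only what is stated above (groups separated by one or more empty lines). The function name `read_groups` is hypothetical, and `itertools.groupby` is just one way to do the split.

```python
from itertools import groupby


def read_groups(lines):
    """Yield each blank-line-separated group as a list of stripped lines."""
    stripped = (line.strip() for line in lines)
    # Consecutive blank lines collapse into one "blank" run, which is skipped.
    for is_blank, group in groupby(stripped, key=lambda line: not line):
        if not is_blank:
            yield list(group)
```

Because it is a generator over an iterable of lines, it can be fed an open file object and never holds more than one group in memory.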
I have a huge (but finite) set of natural language strings.
I need a way to convert each string to a numeric value. For any given string the value must be the same every time.
The more "different" two given strings are, the more different two corresponding values should be. The more "similar" they are, the less different values should ...
Does anybody know how to delete all characters after a specific character?
Like this:
http://google.com/translate_t
into
http://google.com
...
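The question does not name a language, so the following is only an illustrative sketch in Python. For the general case `str.partition` cuts at the first occurrence of a separator; for the URL in the example the first "/" sits inside "http://", so parsing the URL is the safer route. Both helper names are hypothetical.

```python
from urllib.parse import urlsplit


def strip_after(text, separator):
    """Return text up to (not including) the first occurrence of separator."""
    head, _, _ = text.partition(separator)
    return head


def site_root(url):
    """Keep only the scheme and host of a URL, dropping path, query and fragment."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"
```

With these, `site_root("http://google.com/translate_t")` gives `"http://google.com"`, and `strip_after` returns the input unchanged when the separator is absent.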
I have a text file laid out like this:
1 a, b, c
2 c, b, c
2.5 a, c
I would like to reverse the keys (the number) and values (CSV) (they are separated by a tab character) to produce this:
a 1, 2.5
b 1, 2
c 1, 2, 2.5
(Notice how 2 isn't duplicated for c.)
I do not need this exact output. The numbers in the input are ord...
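A possible sketch of the inversion, assuming the layout described above (a key, a tab, then comma-separated values). The names `invert` and `format_inverted` are hypothetical; keys are kept in first-seen order and duplicates within a line are dropped.

```python
def invert(lines):
    """Map each value to the list of keys it appeared under, without duplicates."""
    inverted = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        key, _, values = line.partition("\t")
        for value in values.split(","):
            value = value.strip()
            if not value:
                continue
            keys = inverted.setdefault(value, [])
            # Linear scan; fine for short lists, use a set alongside for long ones.
            if key not in keys:
                keys.append(key)
    return inverted


def format_inverted(inverted):
    """Render the inverted mapping as 'value<TAB>key, key, ...' lines."""
    return [f"{value}\t{', '.join(keys)}" for value, keys in inverted.items()]
```

Run on the three sample lines, this produces the three output lines shown in the question, with 2 listed only once for c.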
This is a beginner best-practice question in Perl. I'm new to this language. The question is:
If I want to process the output lines from a program, how can I format THE FIRST LINE in a special way?
I think of two possibilities:
1) A flag variable that is set once the loop has executed for the first time. But it will be evaluated for each cycle. BA...
script(1) is a tool for keeping a record of an interactive terminal session; by default it writes to the file transcript. My problem is that I use ksh93, which has readline features, and so the transcript is mucked up with all sorts of terminal escape sequences and it can be very difficult to reconstruct the command that was actually e...
I have a problem with string manipulation in C++.
The rule: if the same 'word' is repeated within a sentence or paragraph, I want it to be replaced by an integer.
Please help me!
example
input : we prefer questions that can be answered, not just we discussed that.
output:
1 prefer questions 2 can be answered, not just 1 discussed 2.
1 we
2 th...
I have an HTML file and would like to extract the text between <li> and </li> tags. There are of course a million ways to do this, but I figured it would be useful to get more into the habit of doing this in simple shell commands:
awk '/<li[^>]+><a[^>]+>([^>]+)<\/a>/m' cities.html
The problem is, this prints everything whereas I simpl...