So now I want to make Estonian wordlist ~about 20m unique words in lowercase. To get input for wordlist, corpus of Estonian can be used. Corpus files are in Text Encoding Initiative (TEI) format. I tried using regex to find the words.
This is what I made: it's inefficient, mcv is all messed up, it brakes if hashset of words can't fit i...
Database : SQL Server 2005
Problem : Copy values from one column to another column in the same table with a billion+
rows.
test_table (int id, bigint bigid)
Things tried 1: update query
update test_table set bigid = id
fills up the transaction log and rolls back due to lack of transaction log space.
Tried 2 - a procedure on fol...
Using SQlite I have a large database split into years:
DB_2006_thru_2007.sq3
DB_2008_thru_2009.sq3
DB_current.sq3
They all have a single table call hist_tbl with two columns (key, data).
The requirements are:
1. to be able to access all the data at once.
2. inserts only go to the current version.
3. the data will continue to be...
I need to add a autonumber column to an existing table which has about 15 million records in SQL 2005.
Do you think how much time it'll take? What's the better way to do it?
...
My question centers on some Parallel.ForEach code that used to work without fail, and now that our database has grown to 5 times as large, it breaks almost regularly.
Parallel.ForEach<Stock_ListAllResult>( lbStockList.SelectedItems.Cast<Stock_ListAllResult>(), SelectedStock =>
{
ComputeTipDown( SelectedStock.Symbol );
} );
The Com...
I'm writing a CXF WS to upload some large files - up to 1GB. In most cases they won't be >10-15MB, but the problem is that it is ineffective to load the file and send it as regular byte[] using the standard binding. For that reason a custom interceptor might be needed but I'm not sure it is the only option as well as how to write it.
10x...
My next project involves the creation of a data API within an enterprise framework. The data will be consumed by several applications running on different software platforms. While my colleagues generally favour SOAP, I would like to use a RESTful architecture.
Most of the applications will only need a few objects at every call. Othe...
I was thinking of this the other day, apps like Twitter deal with millions of users. I was thinking how the functionality of 'following' would work, where the maximum amount of users in the database can follow the maximum amount of users in the database less one, (himself).
If this was a ManyToMany bidirectional mapping, it would cre...
I need to go over a huge amount of text (> 2 Tb, a Wikipedia full dump) and keep two counters for each seen token (each counter is incremented depending on the current event). The only operation that I will need for these counters is increase. On a second phase, I should calculate two floats based on these counters and store them.
It sh...
Hi,
we are facing the following problem and we are trying to come up with the best possible solution.
We are using SQL Server 2008. We have a table that has more than 600 millions records, and has about 25 columns. One of the columns is an ID and is indexed. We need to get a subset of records from this table. There are mainly 2 cases:
...