indexing

Increment Numpy array with repeated indices

I have a Numpy array and a list of indices whose values I would like to increment by one. This list may contain repeated indices, and I would like the increment to scale with the number of repeats of each index. Without repeats, the command is simple:

    a=np.zeros(6).astype('int')
    b=[3,2,5]
    a[b]+=1

With repeats, I've come up with the fo...
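
A minimal sketch of one way to get the scaling increment, not necessarily the approach the asker arrived at. a[b] += 1 is buffered, so a repeated index is bumped only once. NumPy's unbuffered np.add.at applies the increment once per occurrence. np.bincount with minlength reaches the same result by counting occurrences. The index list with repeats is an assumed example.

```python
import numpy as np

a = np.zeros(6, dtype=int)
b = [3, 2, 5, 3, 3]  # assumed example index list; 3 appears three times

# Fancy-index assignment is buffered: each distinct index is bumped only once.
buffered = a.copy()
buffered[b] += 1

# np.add.at is unbuffered: index 3 is incremented once per occurrence.
unbuffered = a.copy()
np.add.at(unbuffered, b, 1)

# Equivalent counting approach: bincount tallies how often each index occurs.
counted = a + np.bincount(b, minlength=a.size)

print(buffered)    # [0 0 1 1 0 1]
print(unbuffered)  # [0 0 1 3 0 1]
print(counted)     # [0 0 1 3 0 1]
```

np.bincount is typically faster for long index lists. It only applies to non-negative integer indices into a 1-D array, however, whereas np.add.at also handles multidimensional indexing.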

SQL Server - how to determine if indexes aren't being used?

I have a high-demand transactional database that I think is over-indexed. Originally, it didn't have any indexes at all, so adding some for common processes made a huge difference. However, over time, we've created indexes to speed up individual queries, and some of the most popular tables have 10-15 different indexes on them, and in som...

How to index Y/N column in Oracle

I have a large table (6m records) containing data licensed from a vendor. The table contains an NVARCHAR2(1) column with Y/N values. I have created a view to filter out records with a value of 'N', and this view will be queried extensively. What is the best way to index the NVARCHAR2(1) column? ...

How to index a table with a Type 2 slowly changing dimension for optimal performance

Suppose you have a table with a Type 2 slowly-changing dimension. Let's express this table with the following columns:
* [Key]
* [Value1]
* ...
* [ValueN]
* [StartDate]
* [ExpiryDate]
In this example, let's suppose that [StartDate] is effectively the date on which the values for a given [Key] become known to the system. ...
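
Here is a sketch of one common indexing choice for the as-of lookup this table implies. It uses Python's built-in sqlite3 module as a stand-in for whatever database engine is actually involved, which is an assumption. The following are illustrative and not taken from the question:
* the table name dim
* the sample rows
* the half-open [StartDate, ExpiryDate) convention
* the composite index on ([Key], [StartDate])

```python
import sqlite3

# In-memory SQLite database standing in for the real engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim ([Key] INTEGER, [Value1] TEXT,"
    " [StartDate] TEXT, [ExpiryDate] TEXT)"
)
conn.executemany(
    "INSERT INTO dim VALUES (?, ?, ?, ?)",
    [
        (1, "a", "2020-01-01", "2020-06-01"),
        (1, "b", "2020-06-01", "9999-12-31"),  # open-ended current version
        (2, "x", "2020-03-01", "9999-12-31"),
    ],
)
# Composite index: equality column first, range column second.
conn.execute("CREATE INDEX ix_dim_key_start ON dim ([Key], [StartDate])")

# ISO date strings compare correctly as text; intervals are [StartDate, ExpiryDate).
AS_OF_SQL = (
    "SELECT [Value1] FROM dim WHERE [Key] = ? AND [StartDate] <= ?"
    " AND ? < [ExpiryDate]"
)

def as_of(key, date):
    """Return Value1 for key as known on date, or None if no version applies."""
    row = conn.execute(AS_OF_SQL, (key, date, date)).fetchone()
    return row[0] if row else None

# Show the plan SQLite picks; the exact wording varies by SQLite version.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + AS_OF_SQL, (1, "2020-03-15", "2020-03-15")):
    print(plan_row[-1])

print(as_of(1, "2020-03-15"))  # a
print(as_of(1, "2020-06-01"))  # b
print(as_of(2, "2020-01-01"))  # None
```

Putting the equality column first lets the engine seek to one key and then range-scan that key's history. Whether this beats other layouts depends on the real query mix and engine.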

Implementing full text search on iPhone?

I'm looking for suggestions on the best way to implement full-text search on some static data on the iPhone. Basically, I have an app that contains the offline version of a web site, about 50MB of text, and I'd like users to be able to search for terms. I figure that I should somehow build a table of ("word", reference_to_file_con...
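
The ("word", reference) table the question describes is an inverted index. The sketch below is a minimal in-memory Python version of it under the following assumptions:
* documents are identified by hypothetical file names
* tokens are lowercase words
* queries use AND semantics

Persisting the table on the device is left out.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word tokens; no stemming or stop-word removal."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical documents standing in for the offline site's pages.
docs = {
    "page1.html": "Offline copy of the site home page",
    "page2.html": "Search tips for the offline site",
}
index = build_index(docs)
print(sorted(search(index, "offline site")))  # ['page1.html', 'page2.html']
print(sorted(search(index, "search tips")))   # ['page2.html']
```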

Index of item in list when only part of the item is known

This is a follow-up to a previous question of mine regarding searching in lists of lists. I have a list with pairs of values as lists in it: [['a',5], ['b',3], ['c',2]]. I know the first element of each pair but not the second (it's the result of a calculation and is stored in the list with the first element). I sorted the li...
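
Here is a short Python sketch of looking up a pair by its known first element. The helper name index_of_key and the -1 "not found" convention are hypothetical choices. The dictionary variant assumes first elements are unique, and it must be rebuilt whenever the list is re-sorted.

```python
pairs = [['a', 5], ['b', 3], ['c', 2]]

def index_of_key(pairs, key):
    """Return the position of the first pair whose first element equals key, or -1."""
    return next((i for i, (k, _) in enumerate(pairs) if k == key), -1)

print(index_of_key(pairs, 'b'))  # 1
print(index_of_key(pairs, 'z'))  # -1

# For many lookups, build a key -> position map once (assumes unique keys;
# rebuild it after re-sorting, since positions change).
positions = {k: i for i, (k, _) in enumerate(pairs)}
print(positions['c'])  # 2
```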

Compound index required to speed up join-ed query?

A colleague asked me to explain how indexes (indices?) boost performance; I tried to do so, but got confused myself. I used the model below for the explanation (an error/diagnostics logging database). It consists of three tables:
* List of business systems: table "System", containing their names
* List of different types of traces: table "Tr...

Slow Update vs Slow Select

This is a question about tradeoffs. Imagine a social network. Each user has a status message that he can change at any time. Whenever he changes it, all his friends are notified through a wall (like on Facebook). To make this work, we have 3 tables: Users(id, name), FriendLists(userId, friendUserId), Notifications(?). Now let's assume...
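
The tradeoff can be illustrated with plain Python dictionaries standing in for the three tables. This is an assumption; a real system would use the database. All names below are hypothetical.
* Fan-out on write pushes each status change into every friend's wall. Updates cost one insert per friend, and reads are cheap.
* Fan-out on read stores only the latest status and assembles the wall at read time.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for FriendLists and Notifications.
friends = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": {"alice"}}
latest_status = {}         # user -> current status message
walls = defaultdict(list)  # user -> list of (friend, message) notifications

def update_status_push(user, msg):
    """Fan-out on write: one notification per friend, so updates get slower."""
    latest_status[user] = msg
    for friend in friends[user]:
        walls[friend].append((user, msg))

def read_wall_push(user):
    """Reading is a single lookup of precomputed notifications."""
    return list(walls[user])

def update_status_pull(user, msg):
    """Fan-out on read: an update touches only the user's own row."""
    latest_status[user] = msg

def read_wall_pull(user):
    """Reading combines the friend list with current statuses on every call."""
    return [(f, latest_status[f]) for f in sorted(friends[user]) if f in latest_status]

update_status_push("bob", "hi")
update_status_push("carol", "yo")
update_status_pull("bob", "bye")  # a pull-style update never reaches the pushed walls
print(read_wall_push("alice"))  # [('bob', 'hi'), ('carol', 'yo')]
print(read_wall_pull("alice"))  # [('bob', 'bye'), ('carol', 'yo')]
```

The two reads also differ in meaning. The pushed wall keeps a history of notifications, while the pulled view shows only each friend's current status.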

Negative integer indexes: are they evil?

I have this database that I'm designing. It needs to contain a couple dozen tables with records that we provide (a bunch of defaults) as well as records that the user can add. In order to keep the user from shooting himself in the foot, it's necessary to keep him from modifying the default records. There are lots of ways to facilitate ...

Using an index to recursively get all files in a directory really fast

Attempt #2: People don't seem to understand what I'm trying to do. Let me see if I can state it more clearly:
1) Reading a list of files is much faster than walking a directory.
2) So let's have a function that walks a directory and writes the resulting list to a file.
Now, in the future, if we want to get all the files in t...
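
Here is a minimal Python sketch of the two-step idea described above. It walks the tree once with os.walk and caches the paths in a text file. Later calls read that file instead of walking the tree again. The function names and the one-path-per-line format are assumptions. The cache goes stale whenever files change, and paths containing newlines would break this format.

```python
import os
import tempfile

def write_file_list(root, cache_path):
    """Walk root once and write every file path, one per line, to cache_path."""
    with open(cache_path, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                out.write(os.path.join(dirpath, name) + "\n")

def read_file_list(cache_path):
    """Return the cached paths without touching the directory tree."""
    with open(cache_path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

# Usage on a throwaway tree; the cache lives outside the walked directory.
with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as cache_dir:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(root, rel), "w").close()
    cache_path = os.path.join(cache_dir, "filelist.txt")
    write_file_list(root, cache_path)    # slow step, done once
    cached = read_file_list(cache_path)  # fast step, reused later
print(len(cached))  # 2
```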

Any suggestions for identifying what indexes need to be created?

I'm in a situation where I have to improve the performance of about 75 stored procedures (created by someone else) used for reporting. The first part of my solution was creating about 6 denormalized tables that will be used for the bulk of the reporting. Now that I've created the tables I have the somewhat daunting task of determining ...

Multidimensional indexing for an image retrieval system

Hi, could anyone please help me solve my problem? I am currently working on a trademark image retrieval system. I have prepared my database space (i.e. I computed several color features such as color histogram, mean color, moment set ...), and I used distance measures to retrieve the images that are similar to the query image an...
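
For context, here is a brute-force baseline of the retrieval step described. It computes a color-histogram feature vector per image and ranks the database by Euclidean distance to the query's vector. A multidimensional index such as an R-tree or k-d tree exists to avoid exactly this linear scan. The following are all assumptions, not details from the question:
* the bin count
* the normalization
* the distance choice
* the random stand-in images

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenate per-channel histograms of an HxWx3 uint8 image, normalized to sum to 1."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def rank_by_distance(query_vec, db_vecs):
    """Return database indices sorted from most to least similar, plus the distances."""
    dists = np.linalg.norm(db_vecs - query_vec, axis=1)
    return np.argsort(dists), dists

rng = np.random.default_rng(0)
# Random stand-ins for trademark images (assumption; real images would be loaded from disk).
images = [rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8) for _ in range(5)]
db = np.stack([color_histogram(img) for img in images])  # one 24-dim vector per image

order, dists = rank_by_distance(color_histogram(images[2]), db)
print(order[0])  # 2: the query image is its own nearest neighbour
```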

Details of impact for indexes, primary keys, unique keys

I like to think I know enough theory, but I have little experience optimizing databases in the real world. I would like to hear points of view, thoughts, or experiences. Let's imagine a scenario like this:
Table A
  Key: c1, c2, c3, c4
  Index: c7, c3, c2
Table B
  Key: c1, c2, c3, c4
  Index: c1, c5
All are non-clustered. The tables have 40+ fields. They a...

User activity vs. System activity on the Index Usage Statistics report

I recently decided to crawl over the indexes on one of our most heavily used databases to see which were suboptimal. I generated the built-in Index Usage Statistics report from SSMS, and it's showing me a great deal of information that I'm unsure how to interpret. I found an article at Carpe Datum about the report, but it doesn't tell ...

SQL Server Indexes - Initial slow performance after creation

Using SQL Server 2005. This is something I've noticed while doing some performance analysis. I have a large table with about 100 million rows. I'm comparing the performance of different indexes on the table to see which is optimal for my test scenario, which performs about 10,000 inserts on that table, among other things on othe...

Does MySQL record how often it uses indices?

I've got a table (InnoDB) with a fair number of indices. It would be great to know if one (or more) of these was never actually used. I don't care as much about the disk space, but insertion speed is sometimes an issue. Does MySQL record any statistics on how often it has used each index when running queries? ...

Storing large amounts of data: DB or File System?

Hello, let's say my application creates, stores and retrieves a very large number of entries (tens of millions). Each entry has a variable amount of data (for example, some entries have only a few bytes, such as ID/title, while some may have megabytes of supplementary data). The basic structure of each entry is the same and is in XML for...

Advantages to Vertical Partitioning of Table

(Note that this situation isn't exactly how it is, but I made this as an example) I have an entity in a table with data that is updated every 5 seconds (Kinematic Data: Speed, Heading, Lat, Long, and PositionTime), and other data that is updated hardly at all, if ever (Color, Make, OriginTime). Now my boss wants me to partition this ...

Approximate/fuzzy string lookup using Tokyo Cabinet

I recently learned about Tokyo Cabinet and more precisely Tokyo Dystopia, a full-text search engine built on top of TC. I'm looking for an approximate/fuzzy text index but it doesn't seem to be supported out-of-the-box by Dystopia. However, it seems like the engine is using a q-gram inverted index so this should be a relatively simple h...
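
The sketch below shows in plain Python why a q-gram inverted index makes approximate lookup feasible. It does not use Tokyo Cabinet or Dystopia, and nothing here reflects their APIs. It works in three steps:
1) Each word is split into padded 2-grams.
2) The index maps each gram to the words containing it.
3) Candidates that share grams with the query are ranked by Jaccard similarity.

The value q = 2, the "#" padding character, and the 0.4 threshold are arbitrary illustrative choices.

```python
from collections import Counter, defaultdict

def qgrams(s, q=2):
    """Return the set of q-grams of s, padded so prefixes and suffixes count too."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def build_qgram_index(words, q=2):
    """Inverted index: q-gram -> set of words containing it."""
    index = defaultdict(set)
    for w in words:
        for g in qgrams(w, q):
            index[g].add(w)
    return index

def fuzzy_lookup(index, term, q=2, min_similarity=0.4):
    """Rank indexed words by Jaccard similarity between their q-gram sets and term's."""
    term_grams = qgrams(term, q)
    shared = Counter()
    for g in term_grams:  # only words sharing at least one gram become candidates
        for w in index.get(g, ()):
            shared[w] += 1
    results = []
    for w, n in shared.items():
        jaccard = n / len(term_grams | qgrams(w, q))
        if jaccard >= min_similarity:
            results.append((w, round(jaccard, 2)))
    return sorted(results, key=lambda x: (-x[1], x[0]))

index = build_qgram_index(["cabinet", "cabin", "cable", "dystopia"])
print(fuzzy_lookup(index, "cabnet"))  # [('cabinet', 0.67)]
```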

R-tree implementation in MATLAB

Hi, could anyone please tell me how to implement an R-tree structure in MATLAB to speed up my image retrieval system? My database space is a multidimensional color histogram feature vector, and I also have a distance vector for the similarity measure. Thanks ...