views:

126

answers:

5

I'm using C# to display a list of movie titles that I am calling from an SQLite database. Currently, I'm using a custom ListBox class that has a function to sort the text stripping the word 'The' from the beginning of every item. However, it doesn't exactly seem to be the simplest way to do it, since it calls from the SQLite database and then sorts. I'd prefer to cut it down to just one step, hopefully sorting straight from the database in my "SELECT" query.

I've done some searching on this, and have found some suggestions, including creating an extra sort-by column in the database. While this is certainly a possibility, I'm wondering if there's any simpler options that don't require inserting almost identical duplicate information (especially if the database becomes larger). I'm pretty new to SQLite, but I've read something about creating a collate function that can be used to create custom ordering. However, I'm not sure if this is appropriate use for it and can't seem to find any help with implementing it in C#.

Was hoping someone might be able to share some guidance. If an extra sorting column is the best way to go, then that is what I shall do.

+4  A: 

To avoid inserting duplicate data, what about having two columns: TITLE_PREFIX (usually empty, but sometimes contains "The ", or "A "; no index on this column) and TITLE (contains the title without "The " or "A "; this is the column you create the index on). To display the data, you have to combine TITLE_PREFIX and TITLE. But you just search on TITLE.

Thomas Mueller
+1  A: 

You could store each title in 2 parts: title and prefix.

With SQLite you can combine 2 string values via the || operator also known as the concatenate operator.

Here's an example:

SELECT prefix || ' ' || title FROM movies ORDER BY title

You can also use ltrim in case prefix is empty, so you don't have a space at the front:

SELECT ltrim(prefix || ' ' || title) FROM movies ORDER BY title

Another alternative is to store the prefix at the end of the title. For example at a lot of movie stores you will see something like:

Three Musketeers, The

Brian R. Bondy
A: 

Within C# Code

If you wanted to do this within C#, use LINQ to do the ordering for you. I've posted a full sample on PasteBin. This will allow you to:

  • avoid duplicating data in your database
  • take advantage of DB indexes as you normally would, no matter which RDBMS
  • put in noise words in a config file, thereby reducing downtime/rebuild/redeploy when modifying the list
  • ensure a solution is more readable in your client code
DropDownList1.DataSource = myBooks.OrderBy(n => ReplaceNoise(n.Title))

public string ReplaceNoise(string input)
{
     string[] noise = new string[] { "the", "an", "a" };

     //surely this could be LINQ'd 
     foreach (string n in noise)
     {
         if (input.ToLower().StartsWith(n))
         {
             return input.Substring(n.Length).Trim();
         }
     }
     return input;
}

Within your SQLite statement

How about simply replacing the noise words with blanks in the order by? It's an ugly first step, but strongly consider a new column to store this value for sorting purposes.

ORDER BY REPLACE(REPLACE([title],'the',''), 'a', '')

Admittedly, this gets ugly when you end up with this:

REPLACE(REPLACE(REPLACE(REPLACE([title],'The ',''),'a',''),'of',''),'by','')
p.campbell
That won't allow the database to use an index on `title` if one exists. And it looks like the OP is expecting a lot of data.
Daniel Vassallo
@Daniel: indeed, you're right on the index. For high perf, OP might want to find another DB apart from SQLite. The C# solution is much more attractive. The sorting really belongs in the client, no matter the RDBMS.
p.campbell
@downvoter: care to explain why sorting using LINQ doesn't help? How about the suggestion of a new column to store the 'sorting' value? How do any of these suggestions garner a downvote?
p.campbell
A: 

You could try building a table that supports full-text searching (using the FTS module) on the title. Then you'll be able to do fast searches on any words in the title without requiring lots of extra work on your part. For example, a user query of good bad ugly might produce “The Good, the Bad and the Ugly” as one of its first results. The extra cost of all this is about a quarter of the length of the text itself in general, but might be more for your dataset, as titles aren't full english text. You also need to spend the time building those extra indices – you don't want to build them on your main dataset on a live system (obviously) – but that shouldn't be too big a problem.

Donal Fellows
A: 

Create a virtual column (result of a function that can be implemented in C#) and sort on this virtual column. The function could move "The" to the end as in "Three Musketeers, The" or discard "The", whatever you want it to do.

Tim
There's documentation on how to do this somewhere on the System.Data.SQLite site. The sorts would be done in your SELECT statement in SQLite, though you would give up access to the index on the sort. I assume you don't have thousands and thousands of titles in your resultset if you're putting them in a Listbox now, so that should be no problem.
Tim
Here's an example of what I mean:
Tim
http://www.ivankristianto.com/software-development/visual-studio-net/howto-make-user-defined-function-in-sqlite-ado-net-with-csharp/957/
Tim