views:

53

answers:

3

Hi,

I am working on a web app that allows users to create dynamic PDF files based on what they enter into a form (it is not very structured data).

The idea is that User 1 enters several words (arbitrary # of words, practically capped of course), for example:

A B C D E

There is no such string in the database, so I was thinking:

  1. Store this string as a primary key in a MySQL database (it could be maybe around 50-100k of text, but usually probably less than 200 words)

  2. Generate the PDF file, and create a link to it in the database

  3. When the next user requests A B C D E, then I can just serve the file instead of recreating it each time. (simple cache)

The PDF is cpu intensive to generate, so I am trying to cache as much as I can...

My questions are:

  1. Does anyone have any alternative ideas to my approach

  2. What will the database performance be like?

  3. Is there a better way to design the schema than using the input string as the primary key?

+2  A: 

How about using an auto-incremented integer key, and, next to it, you can store the file name?

If you like, you can even implement that all the file names must be unique.

I cannot answer the performance questions, but this is much more common-place.

rlb.usa
And it is more commonplace for good reason. It does tend to perform much better and it is much easier to maintain. PKs should not be changeable things like long strings. They should be integers or GUIDs where possible. That way when the name of the item changes, it doesn't have to filter through and change a million child records. ANd the unique index is not only a good idea, it should be a requirement or you will have crappy duplicated data.
HLGEM
+1  A: 

use an auto increment INT primary key and create a unique index on your text string. A primary key should be as small as possible.

If you make the primary key this huge string performance will be bad for a number of reasons. you will fill cache memory with the index and performance will suffer. Just stepping through the index will be slow for the database. Imagine looking through a phone book where it is sorted by a bio of each person and not their name, it would take some effort to just find the start of the next entry.

KM
A: 

If you are using c#, you can use string.GetHashCode() to get an integer which will (almost certainly) be unique and will provide an integer based primary key.

You may want to remove spaces etc as these will cause a different hash code to be produced.

James Westgate
but you'd need to generate the PDF to get the hash code, so duplicate file detection couldn't occur until after you generate the duplicate PDF, which the OP says is CPU intensive.
KM
Disagree. Create the hashcode from the strings the user has entered.
James Westgate