ansaurus

Question

Answer 1

+6 A:

You don't really say what your background is and how much you know about programming and database design. It sounds like you should do some reading. Conceptually though your design is fairly simple. Your description identifies a mere two entities:

Financial instrument; and
Quote.

So you need to then identify the attributes.

Financial instrument:

Security code;
Market;
etc.

Quote:

Timestamp;
Financial instrument;
Bid price; and
Ask price.

The reference to the financial instrument is what's called a foreign key. Each table also needs a primary key, probably just an auto-increment field.

Conceptually fairly simple.

CREATE TABLE instrument (
  id BIGINT NOT NULL AUTO_INCREMENT,
  code CHAR(4),
  company_name VARCHAR(100),
  PRIMARY KEY (id)
);

CREATE TABLE quote (
  id BIGINT NOT NULL AUTO_INCREMENT,
  intrument_id BIGINT NOT NULL,
  dt DATETIME NOT NULL,
  bid NUMERIC(8,3),
  ask NUMERIC(8,3),
  PRIMARY KEY (id)
)

CREATE INDEX instrument_idx1 ON instrument (code);

CREATE INDEX quote_idx1 ON quote (instrument_id, dt);

SELECT (bid + ask) / 2
FROM instrument i
JOIN quote q ON i.id = q.instrument_id
WHERE i.code = 'GOOG'
AND q.dt >= '01-06-2008' AND q.dt < '02-06-2008'

If your dataset is sufficiently large you might want to include (bid + ask) / 2 in the table so you don't have to calculate on the fly.

Ok, so that's the normalized view. After this you may need to start making performance optimizations. Consider this question about storing billions of rows in MySQL. Partitioning is a feature of MySQL 5.1+ (fairly new).

But another question to ask yourself is this: do you need to store all this data? The reason I ask this is that I used to be working in online broking and we only stored all the trades for a very limited window and trades would be a smaller set of data than quotes, which you seem to want.

Storing billions of rows of data is a serious problem and one you really need serious help to solve.

cletus 2008-12-29 10:04:54

Wouldn't quote table be gigantic? I tried inserting a few millions and the database started to creak

Dani 2008-12-29 11:05:18

Storing the quotes would indeed be irrelevant to a brokerage company, but to develop alogrithmic trading it is essential.Partitioning the tables seem to work (a tad slowly but perfectly acceptable) - thanks cletus and Jonathan!

Dani 2008-12-30 10:53:12

Answer 2

+1 A:

What you need to do is to read up on database normalization. If you find that article too much, you should simply skim through a 3rd normal form tutorial.

csl 2008-12-29 10:10:47

Answer 3

+2 A:

When storing data at tick level, many financial databases partition the data at least by instrument as it is rare to want run a query across instruments. So a table per instrument is normal. Some go further and partition also by date, giving a table per instrument/date combination. This can make querying a lot more difficult if queries across dates are the norm.

So two options:

A tick-table per instrument, with a clustered index on timestamp
A tick-table per instrument/date, with a clustered index on timestamp

It's a basic trade-off between speed of access and ease of querying.

Anonymous Coward 2008-12-29 10:45:31

Answer 4

+1 A:

Or perhaps consider a star schema, dimensions and facts. Ralph Kimball has some nice stuff to tell you how to go about it.

duffymo 2008-12-29 14:41:44

Answer 5

A:

Just some general observations:

Don't use a TIMESTAMP column, as it's automatically set based on the INSERT time. Since you're importing data, that's not what you want.
If you use the MySQL DATETIME column type, you can use the MySQL Date and Time functions on it.
MyISAM doesn't support FOREIGN KEY constraints and silently ignores them.
Indexes, indexes, indexes. Make sure you have them on columns you'll use for lookups. However, if you have columns with a lot of text, you may want to use FULLTEXT searches on them instead.
If you plan on turning this into a live database with INSERTs as well as SELECT queries, considering using InnoDB with transactions and row-level locking (SELECT ... FOR UPDATE)

R. Bemrose 2008-12-29 15:19:17

Answer 6

+1 A:

Dani, I've been working with Tick by Tick data for years and would be happy to collaborate on this. Email me IanTebbutt at Hotmail. (BTW I've checked and there's no way to do private email on StackOverflow and Jeff seems way against it rejected,. )

Briefly I've found partitioning by date and instrument to work pretty well. You could chose to put a months worth of data for instrument X into a set of tables using a pattern like InstrumentX_YYDD. Then when accessing the data you need at the very least a table name generator, but more likely a sql generator that can decide which single table to use, or potentially use Union to look at multiple tables.

Whichever way you look at this those kind of data volumes are not easy to deal with. This verges into DataWarehouse territory and there's a huge number of ways of skinning that cat. Like I said, happy to collaborate - I've probably got half your issues fixed already.

MrTelly 2009-01-18 02:42:47

Yup - this is a good wheel not to reinvent.

le dorfier 2009-01-18 03:07:16

ansaurus

tags:

views:

answers:

Database design question

related questions