views:

230

answers:

6

In my previous question a lot of users wanted me to give some more data to toy with. So I got working on exporting all my data and processing it with Python, but then I realized: where do I leave all this data?

Well I decided the best thing would be to stick them in a database, so at least I don't have to parse the raw files every time. But since I know nothing about databases this is turning out to be quite confusing. I tried some tutorials to create a sqlite database, add a table and field and try to insert my numpy.arrays, but it can't get it to work.

Typically my results per dog look like this: alt text

So I have 35 different dogs and each dog has 24 measurement. Each measurement itself has an unknown amount of contacts. Each measurement consists out of a 3D array (248 frames of the whole plate [255x63]) and a 2D array (the maximal values for each sensor of the plate [255x63]). Storing one value in a database wasn't a problem, but getting my 2D arrays in there didn't seem to work.

So my question is how should I order this in a database and insert my arrays into it?

+3  A: 

You'll probably want to start out with a dogs table containing all the flat (non array) data for each dog, things which each dog has one of, like a name, a sex, and an age:

CREATE TABLE `dogs` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `name` VARCHAR(64),
  `age` INT UNSIGNED,
  `sex` ENUM('Male','Female')
);

From there, each dog "has many" measurements, so you need a dog_mesaurements table to store the 24 measurements:

CREATE TABLE `dog_measurements` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `dog_id` INT UNSIGNED NOT NULL,
  `paw` ENUM ('Front Left','Front Right','Rear Left','Rear Right'),
  `taken_at` DATETIME NOT NULL
);

Then whenever you take a measurement, you INSERT INTO dog_measurements (dog_id,taken_at) VALUES (*?*, NOW()); where * ? * is the dog's ID from the dogs table.

You'll then want tables to store the actual frames for each measurement, something like:

CREATE TABLE `dog_measurement_data` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `dog_measurement_id` INT UNSIGNED NOT NULL,
  `frame` INT UNSIGNED,
  `sensor_row` INT UNSIGNED,
  `sensor_col` INT UNSIGNED,
  `value` NUMBER
);

That way, for each of the 250 frames, you loop through each of the 63 sensors, and store the value for that sensor with the frame number into the database:

INSERT INTO `dog_measurement_data` (`dog_measurement_id`,`frame`,`sensor_row`,`sensor_col`,`value`) VALUES
(*measurement_id?*, *frame_number?*, *sensor_row?*, *sensor_col?*, *value?*)

Obviously replace *measurement_id?*, *frame_number?*, *sensor_number?*, value? with real values :-)

So basically, each dog_measurement_data is a single sensor value for a given frame. That way, to get all the sensor values for all a given frame, you would:

SELECT `sensor_row`,sensor_col`,`value` FROM `dog_measurement_data`
WHERE `dog_measurement_id`=*some measurement id* AND `frame`=*some frame number*
ORDER BY `sensor_row`,`sensor_col`

And this will give you all the rows and cols for that frame.

Josh
+2  A: 

I think you are not able to figure out how to put 2D data in database.

If you think of relation between 2 columns, you can think of it as 2D data with 1st column as X axis data and 2nd column as Y axis data. Similarly for 3D data.

Finally your db should look like this:

Table: Dogs
    Columns: DogId, DogName -- contains data for each dog

Table: Measurements
    Columns: DogId, MeasurementId, 3D_DataId, 2D_DataId -- contains measurements of each dog

Table: 3D_data
    Columns: 3D_DataId, 3D_X, 3D_Y, 3D_Z -- contains all 3D data of a measurement

Table: 2D_data
    Columns: 2D_DataId, 2D_X, 2D_Y -- contains all 2D data of a measurement

Also you may want to store your 3D data and 2D data in an order. In that case, you will have to add a column to store that order in table of 3D data and 2D data

Niraj Nawanit
A: 

The only thing I would add to Josh's answer is that if you don't need to query individual frames or sensors, just store the arrays as BLOBs in the dog_measurement_data table. I have done this before with large binary set of sensor data and it worked out well. You basically query the 2d and 3d arrays with each measurement and manipulate them in code instead of the database.

James
This should have been posted as a comment to my answer, but, I guess you can't post comments yet :-) In response, I'd say that the downside to this is, you need a format you can reliably store/retrieve from BLOB, like JSON or some other serialized python value. This might not be as easy as it sounds!
Josh
+2  A: 

Django has a library for encapsulating all the database work into Python classes, so you don't have to mess with raw SQL until you have to do something really clever. Even though Django is a framework for web applications, you can use the database ORM by itself.

Josh's models would look like this in Python using Django:

from django.db import models

class Dog(models.Model):
    # Might want to look at storing birthday instead of age.
    # If you track age, you probably need another field telling
    # you when in the year age goes up by 1... and at that point,
    # you're really storing a birthday.
    name = models.CharField(max_length=64)
    age = models.IntegerField()
    genders = [
        ('M', 'Male'),
        ('F', 'Female'),
    ]
    gender = models.CharField(max_length=1, choices=genders)

class Measurement(models.Model):
    dog = models.ForeignKey(Dog, related_name="measurements")
    paws = [
        ('FL', 'Front Left'),
        ('FR', 'Front Right'),
        ('RL', 'Rear Left'),
        ('RR', 'Rear Right'),
    ]
    paw = models.CharField(max_length=2, choices=paws)
    taken_at = models.DateTimeField(default=date, auto_now_add=True)

class Measurement_Point(models.Model):
    measurement = models.ForeignKey(Measurement, related_name="data_points")
    frame = models.IntegerField()
    sensor_row = models.PositiveIntegerField()
    sensor_col = models.PositiveIntegerField()
    value = models.FloatField()

    class Meta:
        ordering = ['frame', 'sensor_row', 'sensor_col']

The id fields are created automatically.

Then you can do things like:

dog = Dog()
dog.name = "Pochi"
dog.age = 3
dog.gender = 'M'
# dog.gender will return 'M', and dog.get_gender_display() will return 'Male'
dog.save()

# Or, written another way:
dog = Dog.objects.create(name="Fido", age=3, sex='M')

To take a measurement:

measurement = dog.measurements.create(paw='FL')
for frame in range(248):
    for row in range(255):
        for col in range(63):
            measurement.data_points.create(frame=frame, sensor_row=row, 
                sensor_col=col, value=myData[frame][row][col])

Finally, to get a frame:

# For the sake of argument, assuming the dogs have unique names.
# If not, you'll need some more fields in the Dog model to disambiguate.
dog = Dog.objects.get(name="Pochi", sex='M')
# For example, grab the latest measurement...
measurement = dog.measurements.all().order_by('-taken_at')[0]
# `theFrameNumber` has to be set somewhere...
theFrame = measurement.filter(frame=theFrameNumber).values_list('value')

Note: this will return a list of tuples (e.g. [(1.5,), (1.8,), ... ]), since values_list() can retrieve multiple fields at once. I'm not familiar with NumPy, but I'd imagine it's got a function similar to Matlab's reshape function for remapping vectors to matrices.

Mike DeSimone
+1 for using an excellent model! Kidding... this is a great answer and will probably be very helpful to Ivo!
Josh
A: 

I've benefited a lot from the sqlalchemy package; it is an Object Relational Mapper. What this means is that you can create a very clear and distinct separation between your objects and your data:

SQL databases behave less like object collections the more size and performance start to matter; object collections behave less like tables and rows the more abstraction starts to matter. SQLAlchemy aims to accommodate both of these principles.

You can create a objects representing your different nouns (Dog, Measurement, Plate, etc). Then you create a table via sqlalchemy constructs which will contain all the data that you want to associate with, say, a Dog object. Finally you create a mapper between the Dog object and the dog_table.

This is difficult to understand without an example, and I will not reproduce one here. Instead, please start by reading this case study and then study this tutorial.

Once you can think of your Dogs and Measurements as you do in the real world (that is, the objects themselves) you can start factoring out the data that makes them up.

Finally, try not to marry your data with an specific format (as you do currently by using numpy arrays). Instead, you can think of the simple numbers and then transform them on demand to the specific format your current application demands (along the lines of a Model-View-Controller paradigm).

Good luck!

Arrieta
A: 

From your description, I would highly recommend looking into PyTables. It's not a relational database in the traditional sense, it has most of the features you're likely to be using (e.g. querying) while allowing easy storage of large, multidimensional datasets and their attributes. As an added bonus, it's tightly integrated with numpy.

Joe Kington