I am currently working on a Wikipedia API, which means that we have a database for each language we want to use. The databases are structurally identical and differ only in the language of their content. The only place where the language is recorded is the name of the database.

When we started with one language, the straightforward approach of mapping the tables to the classes we needed (e.g. Page) worked fine. We defined an engine and the corresponding metadata. When we added a second database with its own engine and metadata, we ran into the following error:

ArgumentError:
Class '<class 'wp.orm.types.pages.Page'>' already has a primary mapper defined.
Use non_primary=True to create a non primary Mapper. clear_mappers() will remove
*all* current mappers from all classes.

I found an email saying that there must be at least one primary mapper, so using non_primary=True for all databases doesn't seem feasible.

The next idea is to use sharding. For that we need a way to distinguish between the databases from the perspective of an instance, as noted in the docs:

"You need a function which can return a single shard id, given an instance to be saved; this is called 'shard_chooser'."

I am stuck here. Is there a way to get the name of the database an object was loaded from? Or is it possible to add a static attribute based on the engine? The alternative would be to add a language column to every table, which is just ugly. Am I overlooking other possibilities? Any ideas on how to define multiple mappers for the same class that map against tables in different databases?

A:

I asked this question on a mailing list and got this answer from Michael Bayer:

if you'd like distinct classes to indicate that they "belong" in a
different database, and you have very clear lines as to how this is
performed, use the "entity_name" concept described at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName . this sounds very much like your use case.

> The next idea is to use sharding. For that we need a way to distinguish between the databases from the perspective of an instance, as noted in the docs: "You need a function which can return a single shard id, given an instance to be saved; this is called 'shard_chooser'."

horizontal sharding is a method of storing many homogeneous instances across multiple databases, with the implication that you're creating one big "virtual" database among partitions - the main concept is that an individual instance gets placed in different partitions based on some ruleset. This is a little like your use case as well but since you have a very simple delineation i think the "entity name" approach is easier.
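To make the "ruleset" idea concrete, here is a minimal sketch of what such a chooser function could look like. It is plain Python and does not import SQLAlchemy. The (mapper, instance, clause) parameter list is my reading of the sharding docs, and the shard ids, the Page stand-in and its non-persisted `language` attribute are all assumptions made up for the illustration.

```python
# Hypothetical shard ids: one per language database.
SHARD_IDS = ("en", "de", "fr")


class Page:
    """Plain stand-in for a mapped class; `language` is an assumed,
    non-persisted attribute set when the object is created or loaded."""

    def __init__(self, title, language):
        self.title = title
        self.language = language


def shard_chooser(mapper, instance, clause=None):
    """Return the shard id that `instance` belongs to.

    Only `instance` is used in this sketch; the other parameters are kept
    so the function has the shape a sharded session is assumed to expect.
    """
    language = getattr(instance, "language", None)
    if language not in SHARD_IDS:
        raise ValueError(f"cannot choose a shard for language {language!r}")
    return language


chosen = shard_chooser(None, Page("Hauptseite", "de"))
```

The rule here is trivial (the language is the shard id), which is part of why the simpler entity name approach was recommended for this case.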

So the basic idea is to generate anonymous subclasses for each desired mapping, which are distinguished by their *entity_name*. The details can be found at Michael's link.
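As a rough illustration of the subclass generation only, not the recipe itself: the sketch below creates one Page subclass per language with type() and tags it with an entity_name, so every instance knows which database it came from. To keep it self-contained it uses the standard library's sqlite3 with in-memory databases in place of the real per-language databases and does no SQLAlchemy mapping at all. The helper names page_class_for, open_database and load_pages are made up for this example.

```python
import sqlite3


class Page:
    """Base class shared by every language; only its subclasses are tied to a database."""

    entity_name = None  # set on the generated subclasses

    def __init__(self, page_id, title):
        self.page_id = page_id
        self.title = title


_entities = {}  # cache so each language gets exactly one subclass


def page_class_for(language):
    """Return the Page subclass for `language`, creating it on first use."""
    if language not in _entities:
        _entities[language] = type(
            f"Page_{language}", (Page,), {"entity_name": language}
        )
    return _entities[language]


def open_database(titles):
    """Hypothetical stand-in for one per-language database with the shared schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO page (title) VALUES (?)", [(t,) for t in titles])
    return conn


def load_pages(conn, language):
    """Load all rows as instances of the subclass that belongs to `language`."""
    cls = page_class_for(language)
    rows = conn.execute("SELECT page_id, title FROM page ORDER BY page_id")
    return [cls(page_id, title) for page_id, title in rows]


databases = {
    "en": open_database(["Main Page", "Python"]),
    "de": open_database(["Hauptseite"]),
}
en_pages = load_pages(databases["en"], "en")
de_pages = load_pages(databases["de"], "de")
```

With real mappers, the place where load_pages picks the subclass is roughly where each generated class would be mapped against the tables of its own database, so no class ever needs a second primary mapper.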

Ponk