For the current app I am writing, I have elected to place all database functionality into a single class, as that keeps the database code away from the business logic and lets me easily replace the database code if we ever need to switch to another DBMS. However, my database class has recently become rather large (EDIT for info: approximately 53k), and I'm worried about the speed of parsing this file given its volume, since it generally must be parsed for each request.

Typically only one, or maybe two, different "types" of database calls are made (e.g., user system calls, asset system calls, map system calls, session system calls, etc.) at any given time, so one option I was considering was breaking the class into a series of database object "slices" and then dynamically loading those at run time based on the functions requested.
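
A minimal sketch of what such a slice loader could look like (the Database/slice()/AlarmQueries names and file layout are purely hypothetical, not existing code):

// Hypothetical sketch: a parent Database object that lazily creates
// per-system "slices" (UserQueries, AlarmQueries, ...) on first use.
class Database {
    private $slices = array();

    public function slice($name) {
        if (!isset($this->slices[$name])) {
            $class = ucfirst($name) . 'Queries';       // e.g. 'alarm' -> AlarmQueries
            require_once 'db/' . $class . '.php';      // assumed file layout
            $this->slices[$name] = new $class($this);  // slice keeps a handle to the parent
        }
        return $this->slices[$name];
    }

    // Shared querying and logging stay in one place.
    public function query($sql) {
        return mysql_query($sql);
    }
}

Call sites would then read something like $db->slice('alarm')->alarm_getActiveAlarmsByAsset($tier, $id), which is exactly the rewrite cost weighed below.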

On the other hand, I'm worried that doing this would either (a) lead to a large amount of duplication in memory (i.e., each slice now has a query method, an independent query log, etc.) and force me to modify all the existing code to point to the new, smaller objects, or (b) cause relative performance losses as I back-hack this functionality into the code that's already written (e.g., having each slice point back to the parent's querying functions, plus the performance hit incurred by suddenly using __call all over the place instead of direct method access).
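
For illustration, option (b) might look roughly like the sketch below; it keeps existing call sites intact by forwarding unknown methods to the matching slice, at the cost of a __call dispatch on every call (the prefix map is hypothetical, and slice() refers to the loader sketched above):

class Database {
    // ... slice() and query() as sketched above ...

    // Hypothetical prefix map: method names keep their existing system prefixes.
    private static $sliceMap = array(
        'alarm_' => 'alarm',   // e.g. alarm_getActiveAlarmsByAsset() -> alarm slice
        'user_'  => 'user',
    );

    // Forward methods this class no longer defines to the matching slice.
    public function __call($method, $args) {
        foreach (self::$sliceMap as $prefix => $sliceName) {
            if (strpos($method, $prefix) === 0) {
                return call_user_func_array(array($this->slice($sliceName), $method), $args);
            }
        }
        trigger_error("Unknown database method: {$method}", E_USER_ERROR);
    }
}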

What is the more correct course of action in this scenario?

EDIT For More Info: The file is approximately 53kb with about 2,350 lines at present (and it is not done), although this may be considered skewed as I use an expanded SQL model for readability, e.g.

SELECT
    foo,
    bar,
    baz
FROM
    someTable st
    LEFT JOIN someOtherTable sot
        ON st.id = sot.stId
WHERE
    cond > otherCond

There are 70 querying functions, each one performing some unique task, with very little overlap (if I need two startlingly similar result sets, I can simply ignore what I don't need each time and reuse the same query).

EDIT: Example function:

public function alarm_getActiveAlarmsByAsset($tier, $id) {
    if (    !Redacted::checkNumber($tier, $id) 
        ||  $id < 0 
        ||  $tier > Redacted::ASSET_LOWEST_TIER 
        ||  $tier < Redacted::ASSET_TIER_CUSTOMER
    ) {
        return false;
    }

    $sql = "
        SELECT
            alarmId,
            alarmTime,
            server,
            source,
            reason,
            train,
            server_sites.siteId AS `siteId`
        FROM
            alarm_alarms
    ";

    $join = '';

    // Cases intentionally fall through (no break) so that each tier
    // accumulates all the joins it needs.
    switch ($tier) {
        case Redacted::ASSET_TIER_CUSTOMER:
            $join = '
                LEFT JOIN red_campus
                    ON red_campus.campId = red_site.campId
            ';
        case Redacted::ASSET_TIER_CAMPUS:
            $join = '
                LEFT JOIN red_site
                    ON red_site.siteId = server_sites.siteId
            ' . $join;
        case Redacted::ASSET_TIER_SITE:
            $join = '
                LEFT JOIN server_sites
                    ON server_sites.servId = alarm_alarms.server
            ' . $join;
    }
    $table = isset(self::$dbTierMap[$tier + 1]) ? self::$dbTierMap[$tier + 1]['table'] : 'server_sites';
    $field = isset(self::$dbTierMap[$tier + 1]) ? self::$dbTierMap[$tier + 1]['parent'] : 'site';
    $sql .= $join . "
        WHERE
                ackId IS NULL
            AND {$table}.{$field}Id = {$id}
    ";

    $r = $this->query($sql);

    if (!$r) {
        return false;
    }

    $alarms = array();
    while ($alarm = mysql_fetch_assoc($r)) {
        $alarms[] = $alarm;
    }
    return $alarms;
}
A: 

It's difficult to tell without knowing the DB schema, but it's always easier to maintain several classes than one big one (e.g., one class per logical entity/DB table).

You shouldn't worry about parse time, as the request to the database (and any network request) will always take longer.

kgb
I'm almost certain I'm worrying over nothing, but this is my first step up from a medium-sized app to a high-availability, large-scale app.
Dereleased
If it's your first step, then remember: premature optimization is the root of all evil.
Mewp
A: 

If you are worried about parsing time, and not code quality, you could use some compiler cache, such as APC.
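
For what it's worth, the opcode-caching part needs no code changes at all, just the extension enabled (apc.enabled=1 in php.ini); APC's userland API can additionally keep computed data across requests. A rough, hypothetical example (build_schema_map() is a made-up placeholder for any expensive setup step):

// Assumes the APC extension is installed; the opcode cache works transparently,
// while apc_store()/apc_fetch() cache arbitrary data between requests.
if (extension_loaded('apc')) {
    $schema = apc_fetch('db_schema_map');
    if ($schema === false) {
        $schema = build_schema_map();               // hypothetical expensive step
        apc_store('db_schema_map', $schema, 300);   // cache for 5 minutes
    }
}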

However, if you have a big class, it's likely that you should refactor it into smaller ones just for readability. If you can't easily find what you need in the class (to modify it, for example), then it's probably too big to maintain.
If you decide to refactor, remember that maintainability and code quality are far more important than execution speed (database I/O is slower than executing code anyway).

Mewp
53k bytes not lines, but I understand what you're getting at. At this point everything is still very easy to find and deal with; my IDE gives me quick links to all the function entry points, the names are prefixed based on what system they touch, etc. I am curious about your "and not code quality" comments regarding APC, as we are likely going to employ a bytecode cache; I assume you mean using APC as a bandaid without attempting to refactor or fix anything else?
Dereleased
Yes, your understanding is correct. I meant that if you're satisfied with your code and just want speed, APC might be the solution.
Mewp
+1  A: 

Normally it is easier to maintain many small classes. For MySQL (mostly used with PHP) there are several class generators available:

sourceforge: PHP Class Generator

sourceforge: php Class Generator - PCG

Perhaps you can find new ideas there.

FloE
In my humble opinion, **please** don't use code generators. It's really better to use a good library than to generate code. Machine-generated code, even if readable, quickly becomes unmaintainable. Also, it severely breaks [DRY](http://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
Mewp
+2  A: 

As far as I understand, your database class basically holds all possible queries, hardcoded, that are made in the entire app?

A better way to achieve database layer abstraction is to abstract the queries into an internally consistent format and use models and database adapters to turn these into actual SQL. E.g.:

$customer = $Customer->find(array(
    'fields'    => array('name', 'id'),
    'condition' => array('ssn' => $ssn)
));

The Customer class maps to a specific table, and if necessary could map the schema to different columns:

class Customer extends BaseModel {
    public $table = 'xe_a3_cstmr';
    public $schema = array(
        'name' => 'ze_42_nm',
        …
    );
}

The BaseModel turns these abstracted queries into real SQL, depending on which database it needs to talk to:

SELECT `xe_a3_cstmr`.`ze_42_nm`, `xe_a3_cstmr`.`…` FROM `xe_a3_cstmr`
WHERE `xe_a3_cstmr`.`ssn` = 123235;
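
A minimal sketch of how such a find() might assemble that SQL (illustrative only; a real ORM such as Doctrine also handles escaping, joins, and SQL dialect differences):

// Builds SELECT ... WHERE ... from the abstract query array; the column
// mapping falls back to the logical name when no schema entry exists.
abstract class BaseModel {
    public $table;
    public $schema = array();   // logical name => physical column

    public function find(array $query) {
        $cols = array();
        foreach ($query['fields'] as $field) {
            $col    = isset($this->schema[$field]) ? $this->schema[$field] : $field;
            $cols[] = "`{$this->table}`.`{$col}`";
        }

        $where = array();
        foreach ($query['condition'] as $field => $value) {
            $col     = isset($this->schema[$field]) ? $this->schema[$field] : $field;
            $where[] = "`{$this->table}`.`{$col}` = " . (int) $value;  // real code must quote/escape
        }

        $sql = 'SELECT ' . implode(', ', $cols)
             . ' FROM `' . $this->table . '`'
             . ' WHERE ' . implode(' AND ', $where);

        return $this->query($sql);  // delegated to a driver-specific adapter
    }

    abstract protected function query($sql);
}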

This way your database layer will not keep growing with every new query you need to make in your app, while still maintaining the flexibility to map queries to different databases. This is known as ORM. Not to mention that this actually makes the abstraction much easier: instead of needing to rewrite every single hardcoded query, you just need to write a different database adapter.

As Mewp suggested up in the comments, a good place to start would be Doctrine.

deceze
More accurately, it contains each type of query that may need to be built; sometimes that is a simple, nigh-static query, and other times queries are constructed based on relevant information. E.g., in the hierarchical sense of assets, which can exist at tiers from 0 to 3, the same function will be used to generate queries for assets at any level based on semantic knowledge of the table at hand. I'll edit an example into my post.
Dereleased
I think I will go with a bytecode caching mechanism for now; I am a bit wary of handing my queries off to a query generator, as I have spent a lot of time tuning the queries, even the ones I semi-generate, for efficiency. I will investigate this sort of system and may look to include it at a future date, but since the fear eating at me right now is raw performance, I do not feel that adding a new layer of abstraction is the correct way to solve this.
Dereleased
@Dereleased Fair enough, as always, there's a tradeoff between performance and flexibility (and maintainability). ORM gives you incredible flexibility, but may of course have worse performance. If that's your main concern, building your own queries may indeed be the better choice. You may want to look into it anyway, "everyday queries" probably won't suffer much of a performance hit, and you can still keep specialized, hand crafted queries for the hard cases. The query generation process itself can be sped up with proper caching, if that's a concern.
deceze