In a quick-and-dirty Perl script, I have a data structure like this:

$tax_revenue{YEAR}{STATE}{GOVLEV}{TAX} = integer

The hash keys assume values like this:

YEAR: 1900 .. 2000
STATE: AK, AL, ... WY
GOVLEV: state, local
TAX: type of tax (income, sales, etc.)

In addition, the hash keys are unique across parameters. For example, no value for the TAX parameter collides with a value for any other parameter.

I am starting a medium-sized project working with this data and I would like to implement the data structure in a more flexible way. I don't know all of the data-retrieval functionality I will need yet, but here are some examples:

# Specify the parameters in any order.
Tax_rev( qw(1902 WY state property) );
Tax_rev( qw(state property 1902 WY) );

# Use named parameters.
Tax_rev(year => 1902, state => 'WY', govlev => 'state', tax => 'property');

# Use wildcards to obtain a list of values.
# For example, state property tax revenue in 1902 for all states.
Tax_rev( qw(1902 * state property) );

My initial inclination was to keep storing the data as a hash-of-hashes and to build one or more utility functions (probably as part of a class) to retrieve the values. But then I wondered whether there is a better strategy -- some way of storing the underlying data other than a hash-of-hashes. Any advice about how to approach this problem would be appreciated.

+1  A: 

I would advise you to look into an object system such as Moose. The learning curve isn't too steep (or steep at all) and the benefits will be enormous. You'd start with something like:

package MyApp;

use Moose; # also turns on strict and warnings

has 'year'   => ( is => 'ro', isa => 'Int', required => 1 );
has 'state'  => ( is => 'ro', isa => 'Str', required => 1 );
has 'govlev' => ( is => 'ro', isa => 'Str', required => 1 );
has 'tax'    => ( is => 'ro', isa => 'Str', required => 1 );

1; # a module file must end with a true value

Then in your main program:

use MyApp;

my $obj = MyApp->new(
    year   => 2000,
    state  => 'AK',
    govlev => 'local',
    tax    => 'revenue'
);

# ...

With the flexibility of MooseX::Types you can go on to declare your own types, such as enums.

Once you go Moose, you never look back :)

pfig
Moose is not a one-size-fits-all solution. The asker doesn't have objects or any behavior. He has data.
Michael Carman
+3  A: 

If you want a pure Perl implementation, you could build an array of hashes:

my @taxdata = (
    { year => 1902, state => 'WY', level => 'state', type => 'property', amount => 500 },
    # ...
);

my @matches = grep {
    $_->{year}  == 1902    &&
    $_->{level} eq 'state' &&
    $_->{type}  eq 'property'
} @taxdata;

That's flexible if you want to run arbitrary queries against it, but slow if you want to get to a specific record, because every lookup scans the whole array.
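
The grep above hard-codes its criteria. Here is a minimal sketch of a more general filter, using core Perl only. The helper name `tax_query` and the sample amounts are assumptions made for illustration; they are not part of the original answer. A field that is omitted, or passed as `*`, is treated as a wildcard.

```perl
use strict;
use warnings;

# Sample rows; the amounts are made up for illustration.
my @taxdata = (
    { year => 1902, state => 'WY', level => 'state', type => 'property', amount => 500 },
    { year => 1902, state => 'AK', level => 'state', type => 'property', amount => 300 },
    { year => 1903, state => 'WY', level => 'local', type => 'sales',    amount => 200 },
);

# tax_query is a hypothetical helper. It returns every row whose fields
# match the given criteria. A field that is omitted, or given as '*',
# matches any value.
sub tax_query {
    my ($rows, %want) = @_;

    # Only the fields with a concrete value need to be checked.
    my @fields = grep { defined $want{$_} && $want{$_} ne '*' } keys %want;

    return grep {
        my $row = $_;
        # Keep the row if no checked field is missing or different.
        !grep { !defined $row->{$_} || $row->{$_} ne $want{$_} } @fields;
    } @$rows;
}

# State property tax revenue in 1902 for all states.
my @matches = tax_query( \@taxdata,
    year => 1902, state => '*', level => 'state', type => 'property' );

printf "%s %d\n", $_->{state}, $_->{amount} for @matches;
```

Values are compared as strings, which is good enough for four-digit years and short codes. It is still a linear scan, so the speed caveat above applies.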

A better solution might be a database with a single table where each row contains the fields you listed. Then you could write an SQL query to extract data according to arbitrary criteria. You can use the DBI module to handle the connection.

Michael Carman
You could augment this solution by building the hash table you already have from the @taxdata array. This would give you both flexible queries in a very Perl-like way and quick lookups for particular records.
Dale Hagglund
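
Building that lookup hash from the array could look roughly like this. It is a sketch that assumes the field names from the array-of-hashes example (year, state, level, type, amount); the sample amounts are made up.

```perl
use strict;
use warnings;

# Flat records, as in the array-of-hashes answer. Amounts are invented.
my @taxdata = (
    { year => 1902, state => 'WY', level => 'state', type => 'property', amount => 500 },
    { year => 1902, state => 'AK', level => 'local', type => 'sales',    amount => 120 },
);

# Build the nested lookup hash from the flat records.
# Autovivification creates the intermediate levels as needed.
my %tax_revenue;
for my $rec (@taxdata) {
    $tax_revenue{ $rec->{year} }{ $rec->{state} }{ $rec->{level} }{ $rec->{type} }
        = $rec->{amount};
}

# Direct lookup of one record, with no scan of @taxdata.
print $tax_revenue{1902}{WY}{state}{property}, "\n";
```

The array stays the source of truth for grep-style queries, and the hash acts as an index that has to be rebuilt or updated whenever the array changes.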
A: 
Brad Gilbert
+5  A: 

Please consider putting the data in an SQLite database. You then have the flexibility to run whatever query you want (via DBI or the SQLite command-line interface) and get back data structures suited to reports such as taxes by state, states by tax, or taxes in a given year for all states whose names begin with 'W'. I presume the data are already in some character-separated format (tab, comma, pipe, etc.) and can therefore be bulk imported into an SQLite database easily, which saves some work and code on that end.
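
As a rough illustration, the sketch below assumes that the DBI and DBD::SQLite modules are installed. The table name `tax_revenue`, its column names, and the sample amounts are invented for this example. It uses an in-memory database so that it leaves no file behind.

```perl
use strict;
use warnings;
use DBI;

# In-memory database for illustration; a real script would name a file.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

# One table, one row per (year, state, govlev, tax) combination.
$dbh->do(<<'SQL');
CREATE TABLE tax_revenue (
    year   INTEGER NOT NULL,
    state  TEXT    NOT NULL,
    govlev TEXT    NOT NULL,
    tax    TEXT    NOT NULL,
    amount INTEGER NOT NULL,
    PRIMARY KEY (year, state, govlev, tax)
)
SQL

# Made-up sample rows.
my $insert = $dbh->prepare('INSERT INTO tax_revenue VALUES (?, ?, ?, ?, ?)');
$insert->execute(@$_) for (
    [ 1902, 'WY', 'state', 'property', 500 ],
    [ 1902, 'WA', 'state', 'property', 700 ],
    [ 1902, 'AK', 'state', 'property', 300 ],
);

# State property tax revenue in 1902 for states beginning with 'W'.
my $rows = $dbh->selectall_arrayref(
    q{SELECT state, amount FROM tax_revenue
      WHERE year = ? AND govlev = ? AND tax = ? AND state LIKE 'W%'
      ORDER BY state},
    { Slice => {} },    # return each row as a hash reference
    1902, 'state', 'property',
);

print "$_->{state}: $_->{amount}\n" for @$rows;

$dbh->disconnect;
```

Leaving a column out of the WHERE clause plays the role of the wildcard in the question, and the primary key keeps single-record lookups fast.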

Sinan Ünür
+1  A: 

Check out Data::Diver: "Simple, ad-hoc access to elements of deeply nested structures". It seems to do exactly what you want from Tax_rev:

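use Data::Diver qw( Dive );

my %tax_revenue;
# ...
$tax_revenue{1900}{NC}{STATE}{SALES} = 1000;
# ...

# Dive returns the value if every key along the path exists.
my $found   = Dive( \%tax_revenue, qw( 1900 NC STATE SALES ) );  # 1000
# In scalar context it returns undef when any key is missing.
my $missing = Dive( \%tax_revenue, qw( 1901 NC STATE SALES ) );  # undef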
Peter Kovacs
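
Dive expects the keys in nesting order. Because the question states that key values never collide across parameters, arguments given in any order can first be sorted into that order. Below is a minimal core-Perl sketch of the idea. The helper `classify`, its matching rules, and the sample amount are assumptions for illustration, and wildcards are not handled.

```perl
use strict;
use warnings;

my %tax_revenue;
$tax_revenue{1902}{WY}{state}{property} = 500;   # made-up amount

my %is_govlev = map { $_ => 1 } qw(state local);

# classify is a hypothetical helper: it decides which dimension a key
# belongs to. This works only because key values never collide.
sub classify {
    my ($key) = @_;
    return 'year'   if $key =~ /^\d{4}$/;
    return 'state'  if $key =~ /^[A-Z]{2}$/;
    return 'govlev' if $is_govlev{$key};
    return 'tax';    # anything else is assumed to be a tax type
}

# Tax_rev accepts the four keys in any order and looks up the value.
# It returns undef if no such record exists.
sub Tax_rev {
    my %arg = map { ( classify($_) => $_ ) } @_;
    my @dims = qw(year state govlev tax);

    for my $dim (@dims) {
        die "missing $dim\n" unless defined $arg{$dim};
    }

    # Walk the nested hash in its fixed nesting order.
    my $node = \%tax_revenue;
    for my $dim (@dims) {
        return undef
            unless ref($node) eq 'HASH' && exists $node->{ $arg{$dim} };
        $node = $node->{ $arg{$dim} };
    }
    return $node;
}

print Tax_rev(qw(1902 WY state property)), "\n";
print Tax_rev(qw(state property 1902 WY)), "\n";
```

The walk uses `exists` at each level, so a failed lookup does not autovivify empty intermediate hashes. Once the keys are in order, they could equally be handed to Dive.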