views:

329

answers:

5

I'm trying to learn object-oriented programming, but am having a hard time overcoming my structured programming background (mainly C, but many others over time). I thought I'd write a simple check register program as an exercise. I put something together pretty quickly (Python is a great language), with my data in some global variables and with a bunch of functions. I can't figure out if this design can be improved by creating a number of classes to encapsulate some of the data and functions and, if so, how to change the design.

My data is basically a list of accounts ['checking', 'saving', 'Amex'], a list of categories ['food', 'shelter', 'transportation'] and lists of dicts that represent transactions [{'date': xyz, 'cat': xyz, 'amount': xyz, 'description': xyz}]. Each account has an associated list of dicts.

I then have functions at the account level (create-acct(), display-all-accts(), etc.) and the transaction level (display-entries-in-account(), enter-a-transaction(), edit-a-transaction(), display-entries-between-dates(), etc.)

The user sees a list of accounts, then can choose an account and see the underlying transactions, with ability to add, delete, edit, etc. the accounts and transactions.

I currently implement everything in one large class, so that I can use self.variable throughout, rather than explicit globals.

In short, I'm trying to figure out if re-organizing this into some classes would be useful and, if so, how to design those classes. I've read some OOP books (most recently The Object-Oriented Thought Process). I like to think my existing design is readable and does not repeat itself.

Any suggestions would be appreciated.

+5  A: 

Not a direct answer to your question but O'Reilly's Head First Object-Oriented Analysis and Design is an excellent place to start.

Followed by Head First Design Patterns

Mitch Wheat
I realize they are very popular, but I couldn't get into these books
rdp
+4  A: 

"My data is basically a list of accounts"

Account is a class.

"dicts that represent transactions"

Transaction appears to be a class. You happen to have elected to represent this as a dict.

That's your first pass at OO design. Focus on the Responsibilities and Collaborators.

You have at least two classes of objects.
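As a rough sketch of that first pass (the class names follow the answer; the method names `add`, `balance` and `between`, the attribute names and the sample data are all made up for illustration, not taken from the question's code):

```python
from datetime import date


class Transaction:
    """One register entry; stands in for the {'date', 'cat', 'amount', 'description'} dict."""

    def __init__(self, when, category, amount, description=""):
        self.when = when
        self.category = category
        self.amount = amount
        self.description = description


class Account:
    """A named account that owns its own list of transactions."""

    def __init__(self, name):
        self.name = name
        self.transactions = []

    def add(self, transaction):
        # The account is responsible for its own entries.
        self.transactions.append(transaction)

    def balance(self):
        return sum(t.amount for t in self.transactions)

    def between(self, start, end):
        """Transactions dated from start to end, inclusive."""
        return [t for t in self.transactions if start <= t.when <= end]


# Hypothetical sample data
checking = Account("checking")
checking.add(Transaction(date(2009, 6, 1), "food", -25, "groceries"))
checking.add(Transaction(date(2009, 6, 15), "shelter", -800, "rent"))
print(checking.balance())
```

Here the Account is responsible for holding and summarising entries, and its collaborator is Transaction. The display functions from the question would call these methods instead of reaching into lists of dicts.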

S.Lott
But don't let implementation details (such as represented by a dict) dictate how you model the problem.
Mitch Wheat
Accounts would have account related methods and transactions the transaction methods, with each having methods to pass relevant data to the other?
rdp
@rdp: Correct. That's the collaboration issue.
S.Lott
How to program collaboration? At the moment, main program creates an Application class which includes all methods and calls them based on user input. Do I set up the Application class to create Accounts and Transactions instances as appropriate and mediates communications between them?
rdp
@rdp: Correct. A dict is a class. You're replacing dumb dicts with classes that have methods focused on the members of the dict.
S.Lott
@S.Lott - I'm having a mental block about implementation and am likely missing something obvious. If I'm in a GUI framework, I have a main application class that binds user actions (such as clicks or keypresses) to callback functions (or responds to signals by calling slots). In other words, each user action calls some method/function. In my example, the user starts with a list of accounts (which is displayed using some GUI element), then clicks on the account and gets a list of transactions. How do I use an account class or transaction class in this context?
rdp
@rdp: First replace dumb dictionary objects with smarter objects based on class definitions you wrote. That has nothing to do with GUI callbacks. A GUI application has several parts: an underlying Model, plus the GUI presentation (or view), plus Control that knits things together, responds to callbacks. All these elements involve objects. Your model should not be just dumb dictionaries.
S.Lott
@S.Lott, just to clarify, how about: create a class mydata, which holds and can change the data set, then have the rest of the program interact with mydata rather than directly dealing with the data?
rdp
@rdp: You're missing part of the point. Your `mydata` class would include two other classes within it -- accounts and transactions -- which are separate classes. You can certainly have a `mydata` collection that stands in for a proper database. But accounts and transactions *are* separate classes and must be defined that way.
S.Lott
+2  A: 

Hi Rdp,

There are many 'mindsets' that you could adopt to help in the design process (some of which point towards OO and some that don't). I think it is often better to start with questions rather than answers (i.e. rather than say, 'how can I apply inheritance to this' you should ask how this system might expect to change over time).

Here's a few questions to answer that might point you towards design principles:

  • Are others going to use this API? Are they likely to break it? (info hiding)
  • Do I need to deploy this across many machines? (state management, lifecycle management)
  • Do I need to interoperate with other systems, runtimes, languages? (abstraction and standards)
  • What are my performance constraints? (state management, lifecycle management)
  • What kind of security environment does this component live in? (abstraction, info hiding, interoperability)
  • How would I construct my objects, assuming I used some? (configuration, inversion of control, object decoupling, hiding implementation details)

These aren't direct answers to your question, but they might put you in the right frame of mind to answer it yourself. :)

Andrew Matthews
+1 for pointing out that it's backwards to embrace solutions and then start looking for problems to solve with them.
Tom Leys
Helpful mindsets, especially as my first question was whether classes would be useful, rather than how to oop.
rdp
Rdp, sorry if I gave offence, I wasn't implying that you were treating OOP as a solution looking for a problem, but that can be a problem to those who come from other domains. I always found the OO domain entirely intuitive, and I did precisely that when I started to embrace the functional domain with the advent of LINQ etc in C#. One thing I notice about Python (say) is that the functional syntax is much less awkward than the OO syntax. Perhaps a functional implementation is likely to be cleaner than an OO one in Python?
Andrew Matthews
@Andrew, sorry if I seemed offended. Part of my question was whether OO was useful and you gave some very helpful suggestions for considering whether OO would be useful.
rdp
+6  A: 

You don't have to throw out structured programming to do object-oriented programming. The code is still structured, it just belongs to the objects rather than being separate from them.

In classical programming, code is the driving force that operates on data, leading to a dichotomy (and the possibility that code can operate on the wrong data).

In OO, data and code are inextricably entwined - an object contains both data and the code to operate on that data (although technically the code (and sometimes some data) belongs to the class rather than an individual object). Any client code that wants to use those objects should do so only by using the code within that object. This prevents the code/data mismatch problem.

For a bookkeeping system, I'd approach it as follows:

  1. Low-level objects are accounts and categories (actually, in accounting, there's no difference between these; this is a false separation only exacerbated by Quicken et al to separate balance sheet items from P&L, so I'll refer to them as accounts only). An account object consists of (for example) an account code, name and starting balance, although in the accounting systems I've worked on, starting balance is always zero: I've always used a "startup" transaction to set the balances initially.
  2. Transactions are a balanced object which consist of a group of accounts/categories with associated movements (changes in dollar value). By balanced, I mean they must sum to zero (this is the crux of double entry accounting). This means it's a date, description and an array or vector of elements, each containing an account code and value.
  3. The overall accounting "object" (the ledger) is then simply the list of all accounts and transactions.
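A minimal sketch of those three pieces, assuming amounts are kept as integer cents and that a transaction is rejected outright if it does not sum to zero. The method names (`add_account`, `post`, `balance`) and the sample account codes are invented for illustration, and which sign means "money in" is a convention this sketch does not fix:

```python
from datetime import date


class Account:
    """An account (or 'category'): just a code and a name, starting at zero."""

    def __init__(self, code, name):
        self.code = code
        self.name = name


class Transaction:
    """A date, a description and a list of (account_code, value) movements."""

    def __init__(self, when, description, entries):
        entries = list(entries)
        # Double entry: the movements must sum to zero.
        if sum(value for _, value in entries) != 0:
            raise ValueError("transaction does not balance")
        self.when = when
        self.description = description
        self.entries = entries


class Ledger:
    """The list of all accounts and all transactions."""

    def __init__(self):
        self.accounts = {}
        self.transactions = []

    def add_account(self, account):
        self.accounts[account.code] = account

    def post(self, transaction):
        # Refuse movements against accounts the ledger does not know about.
        for code, _ in transaction.entries:
            if code not in self.accounts:
                raise KeyError(code)
        self.transactions.append(transaction)

    def balance(self, code):
        return sum(
            value
            for t in self.transactions
            for c, value in t.entries
            if c == code
        )


# Hypothetical data: a transfer between two bank accounts, in cents.
ledger = Ledger()
ledger.add_account(Account("checking", "Checking account"))
ledger.add_account(Account("saving", "Savings account"))
ledger.post(Transaction(date(2009, 7, 1), "transfer",
                        [("checking", -5000), ("saving", 5000)]))
print(ledger.balance("checking"), ledger.balance("saving"))
```

Because every transaction balances, the balances of all accounts in the ledger always sum to zero, which makes a cheap consistency check.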

Keep in mind that this is the "back-end" of the system (the data model). You will hopefully have separate classes for viewing the data (the view) which will allow you to easily change it, depending on user preferences. For example, you may want the whole ledger, just the balance sheet or just the P&L. Or you may want different date ranges.

One thing I'd stress to make a good accounting system: you do need to think like a bookkeeper. By that I mean lose the artificial difference between "accounts" and "categories", since it will make your system a lot cleaner (you need to be able to have transactions between two asset-class accounts, such as a bank transfer, and this won't work if every transaction needs a "category"). The data model should reflect the data, not the view.

The only difficulty there is remembering that asset-class accounts have the opposite sign from which you expect (negative values for your cash-at-bank mean you have money in the bank and your very high positive value loan for that company sports car is a debt, for example). This will make the double-entry aspect work perfectly but you have to remember to reverse the signs of asset-class accounts (assets, liabilities and equity) when showing or printing the balance sheet.
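To illustrate the sign reversal, here is a small sketch that assumes balances are stored as integer cents using the convention just described (money in the bank is negative, a debt is positive). The `display_value` helper and the sample figures are hypothetical:

```python
# Stored balances in integer cents, using the convention described above:
# money in the bank is negative, a debt is positive.
stored = {"cash at bank": -150000, "car loan": 2500000}


def display_value(stored_cents):
    """Flip the sign for balance-sheet display and convert cents to dollars."""
    return -stored_cents / 100


for name, cents in stored.items():
    # Cash at bank is shown as a positive amount, the loan as a negative one.
    print(f"{name:<15}{display_value(cents):>12,.2f}")
```

The flip lives only in the display code; the stored values stay untouched, so the double-entry sums keep working.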

paxdiablo
w.r.t. "the possibility that code can operate on the wrong data": you do know what a type system is for, right?
MaD70
So, @MaD70, in your C code, *all* your types would be fully typedef'ed and you wouldn't use integers or floats anywhere, yes? Because, otherwise, there *is* a possibility that you will pass the wrong data to a function.
paxdiablo
I don't program in C, but of course I try to use a programming language (PL) type system to its full extent, be it an object-oriented, procedural, functional, logical (or based on whatever paradigm) PL. That type of safety is granted by a type system, not by the fact that a PL is OO or not.
MaD70
Not being fluent in C, I didn't realize that even your rhetorical question rests on a wrong fact: *typedef* introduces a mere type synonym, equivalent to its base type for the type checker, so it is useless as a safety measure. In other PLs with a decent nominal type system, this is not the case: a new type identifier introduces a new type which is incompatible with a type having the same base type/structure.
MaD70
Anyway, the type checker prevents you from crashing a program by invoking a non-existent method or passing a parameter of the wrong type to a method. Without a type checker there is no message at compile time (for statically typed PLs) or run time (for dynamically typed PLs): the program simply crashes or begins to behave in a strange way.
MaD70
It is not correct that "In OO, data is king". Structurally, OO doesn't distinguish between state and behavior, treating atoms of both types uniformly as "features". OO design is the process of minimizing the overall dependency between all state and behavioral features by clumping these features together into Classes.
Doug Knesek
Good point, @Doug, I suppose I should have said "object" instead of "data". Anyhow, I've fixed it as per your suggestion.
paxdiablo
A: 

Rather than using dicts to represent your transactions, a better container would be a namedtuple from the collections module. A namedtuple is a subclass of tuple which allows you to reference its items by name as well as by index number.

Since you may possibly have thousands of transactions in your journal lists, it pays to keep these items as small and light-weight as possible so that processing, sorting, searching, etc. is as fast and responsive as possible. A dict is a fairly heavy-weight object compared to a namedtuple, which takes up no more memory than an ordinary tuple. A namedtuple also has the added advantage of keeping its items in order, unlike a dict (as of the Python version used here; dicts preserve insertion order from Python 3.7 on).

>>> import sys
>>> from collections import namedtuple
>>> sys.getsizeof((1,2,3,4,5,6,7,8))
60
>>> ntc = namedtuple('ntc', 'one two three four five six seven eight')
>>> xnt = ntc(1,2,3,4,5,6,7,8)
>>> sys.getsizeof(xnt)
60
>>> xdic = dict(one=1, two=2, three=3, four=4, five=5, six=6, seven=7, eight=8)
>>> sys.getsizeof(xdic)
524

So you see that's almost a ninefold saving in memory for an eight-item transaction. I'm using Python 3.1, so your mileage may vary.
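Applied to the transactions in the question, a sketch might look like the following. The field names come from the question's dict keys; the type name `Transaction` and the sample data are made up for illustration:

```python
from collections import namedtuple
from datetime import date
from operator import attrgetter

# Field names taken from the dict keys in the question.
Transaction = namedtuple("Transaction", "date cat amount description")

# Hypothetical journal entries
journal = [
    Transaction(date(2009, 7, 3), "food", -25, "groceries"),
    Transaction(date(2009, 7, 1), "shelter", -800, "rent"),
]

first = journal[0]
print(first.amount, first[2])  # the same field, by name and by index

# Sorting by a named field
by_date = sorted(journal, key=attrgetter("date"))

# namedtuples are immutable; _replace returns a modified copy
corrected = first._replace(amount=-30)
```

One trade-off to keep in mind: because a namedtuple cannot be changed in place, editing a transaction means swapping the old tuple in the list for the copy returned by `_replace`.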

Don O'Donnell
Named tuples started in 2.6. In any event, this is more of a low level implementation issue than a design issue. Useful to remember.
rdp