The way I've been approaching it is to have a display layer that knows nothing about the game world itself. Its only job is to receive an ordered list of objects to draw onto the screen, all of which fit a uniform format for a graphic object. So for instance, if it's a 2D game, your display layer will receive a list of images along with their scaling factor, opacity, rotation, flip, and source texture, and whatever other attributes a display object could have. The view may also be responsible for receiving high-level mouse interactions with these displayed objects and dispatching them somewhere appropriate. But it's important that the view layer not know anything semantically about what it is displaying, only that it's some kind of rectangle with a surface area and some attributes.
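As a rough illustration of that uniform format, here is a minimal Python sketch. The names `DisplayObject` and `RecordingDisplayLayer`, and the exact set of fields, are my own assumptions for the example; the "renderer" just records what it was asked to draw instead of talking to real graphics hardware.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DisplayObject:
    """Hypothetical uniform format: everything the view needs, nothing about game meaning."""
    id: int
    texture: str
    x: float
    y: float
    scale: float = 1.0
    opacity: float = 1.0
    rotation: float = 0.0
    flip_x: bool = False


class RecordingDisplayLayer:
    """Stand-in display layer: records draw calls in the order it receives them."""

    def __init__(self) -> None:
        self.drawn: List[str] = []

    def render(self, objects: List[DisplayObject]) -> List[str]:
        # The layer never asks "is this a hero or a tree?"; it only reads attributes.
        self.drawn = [f"{obj.texture}@({obj.x},{obj.y})" for obj in objects]
        return self.drawn


frame = [
    DisplayObject(id=1, texture="grass.png", x=0, y=0),
    DisplayObject(id=2, texture="hero.png", x=10, y=5, flip_x=True),
]
layer = RecordingDisplayLayer()
calls = layer.render(frame)
```

A real display layer would swap the string formatting for actual draw calls, but the input it accepts would stay the same.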
Then the next layer down is a program whose job is simply to generate a list of these objects in order. It's helpful if each object in the list has some kind of unique ID, as it makes certain optimisation strategies possible in the view layer. Generating a list of display objects is a much less daunting task than trying to figure out, for each sort of character, how it's going to physically render itself.
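One optimisation that unique IDs could enable (this particular one is my assumption, not the only option) is diffing consecutive frames so the view only touches what was added, removed, or changed. A minimal sketch, with display objects modelled as plain dicts and a hypothetical `diff_frames` helper:

```python
def diff_frames(previous, current):
    """Compare two display lists by ID and report what the view would need to update."""
    prev = {obj["id"]: obj for obj in previous}
    curr = {obj["id"]: obj for obj in current}
    added = [i for i in curr if i not in prev]
    removed = [i for i in prev if i not in curr]
    changed = [i for i in curr if i in prev and curr[i] != prev[i]]
    return added, removed, changed


frame_1 = [
    {"id": 1, "texture": "tree.png", "x": 0, "y": 0},
    {"id": 2, "texture": "hero.png", "x": 5, "y": 5},
    {"id": 4, "texture": "bird.png", "x": 7, "y": 1},
]
frame_2 = [
    {"id": 1, "texture": "tree.png", "x": 0, "y": 0},  # unchanged
    {"id": 2, "texture": "hero.png", "x": 6, "y": 5},  # moved
    {"id": 3, "texture": "coin.png", "x": 9, "y": 9},  # new
]
added, removed, changed = diff_frames(frame_1, frame_2)
```

Without stable IDs the view would have no way to tell "the hero moved" apart from "one image vanished and another appeared".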
Z sorting is simple enough. Your display-object-generating code just needs to emit the list in the order you want it drawn, and you can use whatever means you need to get there.
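For example, one possible ordering rule (an assumption on my part; use whatever suits your game) is to sort by an explicit layer first and then by vertical position, so things lower on the screen are drawn later and appear in front. `Sprite` and `build_display_list` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    id: int
    texture: str
    y: float
    layer: int = 0


def build_display_list(sprites):
    """Return sprites back-to-front: lower layers first, then smaller y first."""
    return sorted(sprites, key=lambda s: (s.layer, s.y))


sprites = [
    Sprite(id=1, texture="hero.png", y=50, layer=1),
    Sprite(id=2, texture="tree.png", y=20, layer=1),
    Sprite(id=3, texture="ground.png", y=0, layer=0),
]
ordered = build_display_list(sprites)
order_ids = [s.id for s in ordered]
```

The display layer then just draws the list front to back as given; it never needs to know why the order is what it is.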
In our display object list program, each character, prop and NPC has two parts: a resource database assistant and a character instance. The database assistant presents a simple interface from which each character can pull up any image, statistic, animation, arrangement, etc. that it will need. You'll probably want to come up with a fairly uniform interface for fetching the data, but it's going to vary a little from object to object. A tree or a rock doesn't need as much stuff as a fully animated NPC, for example.
Then you need some way of generating an instance for each type of object. You might implement this dichotomy using your language's built-in class/instance system, or, depending on your needs, you may need to work a little beyond that. For example, each resource database could be an instance of a resource database class, and each character instance an instance of a "character" class. This saves you from writing a chunk of code for every single little object in the system. You only need to write code for broad categories of objects, and only change little things like which row of a database to fetch images from.
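Here's a small sketch of that split, assuming the "database" is just an in-memory dict of rows. The class names, the row layout, and the `image`/`stat` methods are all made up for illustration:

```python
class ResourceDatabase:
    """Hypothetical assistant: one instance per kind of object, backed by one row of data."""

    def __init__(self, row):
        self._row = row

    def image(self, animation, frame):
        frames = self._row["animations"][animation]
        return frames[frame % len(frames)]  # loop the animation

    def stat(self, name, default=None):
        return self._row.get("stats", {}).get(name, default)


class Character:
    """One instance per object in the world; many can share one ResourceDatabase."""

    def __init__(self, resources, x, y):
        self.resources = resources
        self.x, self.y = x, y
        self.animation = "idle"
        self.frame = 0

    def current_image(self):
        return self.resources.image(self.animation, self.frame)


ROWS = {
    "rock": {"animations": {"idle": ["rock.png"]}},
    "guard": {
        "animations": {"idle": ["guard_idle_0.png", "guard_idle_1.png"]},
        "stats": {"speed": 2},
    },
}
rock_db = ResourceDatabase(ROWS["rock"])
guard_db = ResourceDatabase(ROWS["guard"])

rock = Character(rock_db, x=3, y=4)
guard_a = Character(guard_db, x=0, y=0)
guard_b = Character(guard_db, x=5, y=0)
guard_b.frame = 1
```

The rock and the guards run the same `Character` code; the only thing that differs is which row their assistant was built from.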
Then, don't forget to have an internal object representing your camera. It's the camera's job to query each character about where it is in relation to the camera. It basically goes around to each character instance and asks for its display object: "What do you look like, and where are you?"
Each character instance in turn has its own little resourcey databasey assistant thing to query. So each character instance has available to it all the information it needs to tell the camera what it needs to know.
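Roughly, the camera pass might look like the sketch below. Everything here is hypothetical: the `Camera.capture` and `Character.display_object` names, the dict standing in for the resource assistant, and the choice to skip anything outside the camera's rectangle.

```python
from dataclasses import dataclass


@dataclass
class DisplayObject:
    id: int
    texture: str
    x: float  # screen-space position, relative to the camera
    y: float


class Character:
    def __init__(self, id, resources, x, y):
        self.id = id
        self.resources = resources  # stand-in for the resource assistant
        self.x, self.y = x, y       # world-space position

    def display_object(self, cam_x, cam_y):
        """Answer the camera: what do I look like, and where am I relative to you?"""
        return DisplayObject(self.id, self.resources["idle"], self.x - cam_x, self.y - cam_y)


class Camera:
    def __init__(self, x, y, width, height):
        self.x, self.y = x, y
        self.width, self.height = width, height

    def capture(self, characters):
        """Ask every character for its display object; keep the ones in view."""
        visible = []
        for character in characters:
            obj = character.display_object(self.x, self.y)
            if 0 <= obj.x < self.width and 0 <= obj.y < self.height:
                visible.append(obj)
        return visible


hero = Character(1, {"idle": "hero.png"}, x=120, y=80)
far_tree = Character(2, {"idle": "tree.png"}, x=900, y=80)
camera = Camera(x=100, y=50, width=320, height=240)
visible = camera.capture([hero, far_tree])
```

Notice that the characters only ever store world coordinates; the translation to screen coordinates happens when the camera asks.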
This leaves you with a set of character instances in a world that's more or less oblivious to the nitty-gritty of how they are to be displayed on a physical screen, and more or less oblivious to the nitty-gritty of how to fetch image data from the hard drive. This is good: it leaves you with as clean a slate as possible for a sort of platonically "pure" world of characters in which you can implement your game logic without worrying about things like falling off the edge of the screen. Think of what sort of interface you would like if you were to put a scripting language into your game engine. As simple as possible, right? As grounded in a simulated world as possible, without worrying about little technical implementation details, right? That's what this strategy lets you do.
Additionally, the separation of concerns lets you swap out the display layer for whatever technology you like (OpenGL, DirectX, software rendering, Adobe Flash, Nintendo DS, whatever) without having to fuss around too much with the other layers.
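To make the swapping concrete, here's a toy sketch in Python with two interchangeable back ends behind one interface. Neither is a real graphics API; `DisplayLayer`, `LogRenderer`, `GridRenderer` and `run_frame` are names I made up, and display objects are reduced to `(texture, x, y)` tuples to keep it short.

```python
from typing import List, Protocol, Tuple

DisplayItem = Tuple[str, int, int]  # (texture, x, y), a simplified display object


class DisplayLayer(Protocol):
    def render(self, items: List[DisplayItem]) -> None: ...


class LogRenderer:
    """Back end 1: writes one line of text per draw call."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def render(self, items: List[DisplayItem]) -> None:
        self.lines = [f"draw {t} at {x},{y}" for t, x, y in items]


class GridRenderer:
    """Back end 2: plots the first letter of each texture on a character grid."""

    def __init__(self, width: int, height: int) -> None:
        self.width, self.height = width, height
        self.rows: List[str] = []

    def render(self, items: List[DisplayItem]) -> None:
        grid = [["."] * self.width for _ in range(self.height)]
        for texture, x, y in items:
            if 0 <= x < self.width and 0 <= y < self.height:
                grid[y][x] = texture[0]
        self.rows = ["".join(row) for row in grid]


def run_frame(display: DisplayLayer, items: List[DisplayItem]) -> None:
    # The rest of the engine only ever calls render(); it never knows which back end it has.
    display.render(items)


items = [("tree", 0, 0), ("hero", 2, 1)]
log = LogRenderer()
grid = GridRenderer(width=4, height=2)
run_frame(log, items)
run_frame(grid, items)
```

The same `items` list drives both renderers unchanged, which is the whole point of keeping the display layer ignorant of the game.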
In addition, you can actually swap out the database layer to do things like reskin all the characters, or, depending on how you built it, swap in a completely new game with new content that reuses the bulk of the character interaction, collision detection, and pathfinding code that you wrote in the middle layer.