It sounds like you might do well to read up on the Model-View-Controller pattern. You don't have to adhere slavishly to it (for example, in some contexts it makes sense to allow some overlap between Model and View), but having a good understanding of it will help you to build any program that has lots of graphical objects and logic controlling them, and the need to broadcast state or persist it to disc (game save), etc.
You also have to realize that cocos2d provides a good system for structuring the graphical scene graph and rendering it efficiently, but it doesn't provide a complete infrastructure for programming games. In that sense it's more of a graphics engine than a game engine. If you try to fit your game's architecture into the structure of cocos2d, you might not end up with the most maintainable result. Instead, you should treat cocos2d as what it is: a great tool to take care of your display and animation needs.
You should definitely have an object other than the scenes that maintain the game state, because otherwise where will all the state go when you switch between scenes? And within scenes/levels, you should simply try to use good Object Oriented design to have state distributed over objects of various classes. Each character object remembers its own state etc. Here you can see where MVC becomes useful: when you save the game to disc, you want to remember each character's health level, but probably not which exact frame index the sprite animation was showing. So you need to distinguish between the sprite and the character (model) itself. That said, as I mentioned before, for game objects that don't have a lot of logic attached to them, or which don't need to be saved, it might be ok to just fuse the Model and View together into one class (basically by subclassing CCSprite).
To pull off MVC the way it's supposed to be, you should also learn the basics of Key-Value Observing. (And you'd do well to use this replacement for Apple's interface.) In more intensely real-time games, techniques like this might be too slow, but since you're making a RPG (good choice for starting out) you could probably sacrifice performance for a more maintainable architecture.
The game scene (which is just another cocos2d sprite) plays the role of Controller, in terms of the MVC pattern. It doesn't draw anything itself, but tells everything else to draw itself based on inputs and state. It's tempting to put all kinds of logic and functionality into the game scene, but when you notice that it swells, you should ask yourself how you could separate that functionality into other classes. Analyze which type of functionality you're implementing. Is it to do with data and state (Model)? Or is it about animation and rendering (View)? Or is it about connecting logic with rendering (in which case you should try to make the View observe the Model directly)?
The game scene/Controller is basically a dispatch center, which takes input events (from the user or from sprites reporting that they've hit something, for example) and decides what to do with them: it might tell one or several of the Model objects to update themselves in some way, or it might just trigger an animation in some other sprites, for example.
In a real-time game, you'd have a "tick" or "step" method in the scene which tells all objects to update themselves. This method (the game loop) is the heart of the program and is run every time a new frame is drawn. (In modern game engines there's a lot of multi-threading but let's not think about that.) But in your case, you might want to create a module that can "play the game" completely separate from the game scene. Imagine creating a program that can play chess through the terminal, using only text input. If you create the whole game system in that manner, and then connect it to the graphics engine through a small and clean interface, you'll have a really maintainable app with lots of reusable code for future projects!
Some good rules of thumb: the model (data) shouldn't know anything about sprites or display states; the view (sprites) shouldn't contain any of the game's actual logic (the game rules) but only know how to do simple things like moving and bouncing and reporting to the scene if something complicated happens. Whenever possible, the view should react to changes in the model directly, without the controller having to interfere.