There are basically two types of server to server (s2s) connections. The first is either called a gateway or a transport, but they're the same thing. This is probably the kind you're looking for. I couldn't find specific documentation for the non-XMPP side, but how XMPP thinks about doing translations to legacy servers is at http://xmpp.org/extensions/xep-0100.html. The second kind really isn't explained in any additional XEPs -- it's regular XMPP s2s connections. Look for "Server-to-Server Communication" in RFC 3920 or RFC 3920bis for the latest draft update.
Since you have your own users and presence on your server, and it's not XMPP, the concepts aren't going to map completely to the XMPP model. This is where the work of the transport comes in. You have to do the translation from your model to the XMPP model. While this is some work, you do get to make all the decisions.
Which brings us right to one of the key design choices -- you need to really decide which things you are going to map to XMPP from your service and what you aren't. These feature and use case descriptions will drive the overall structure. For example, is this like a transport to talk to AOL or MSN chat services? Then you'll need a way to map their equivalent of rosters, presence, and keep session information along with logins and passwords from your local users to the remote server. This is because your transport will need to pretend to be those users and will need to login for them.
Or, maybe you're just an s2s bridge to someone else's XMPP based chess game, so you don't need a login on the remote server, and can just act similarly to an email server and pass the information back and forth. (With normal s2s connections the only session that would be stored would be SASL authentication used with the remote server, but at the user level s2s just maintains the connection, and not the login session.)
Other factors are scalability and modularity on your end. You nailed some of the scalability concerns. Take a look at putting in multiple transports to balance the load. For modularity, see where you want to make decisions about what to do with each packet or action. For example, how do you handle and keep track of subscription data? You can put it on your transport, but then that makes using multiple transports harder. Or if you make that decision closer to your core server you can have simpler transports and use some common code if you need to talk to services other than XMPP. The trade off is a more complex core server with more vulnerability potential.