Background
I am developing a social web app for poets and writers, allowing them to share their poetry, gather feedback, and communicate with other poets. I have very little formal training in database design, but I have been reading books, SO, and online DB design resources in an attempt to ensure performance and scalability without over-engineering.
The database is MySQL, and the application is written in PHP. I'm not sure yet whether we will use an ORM library or write SQL queries by hand in the app. Besides the web application, a Solr search server and possibly a messaging client will also interact with the database.
Current Needs
The schema I have thrown together below represents the primary components of the first version of the website. Initially, users can register for the site and do any of the following:
- Create and modify profile details and account settings
- Post, tag and categorize their writing
- Read, comment on and "favorite" other users' posts
- "Follow" other users to get notifications of their activity
- Search and browse content and get suggested posts/users (though we will be using the Solr search server to index DB data and run these types of queries)
Scalability
Obviously the whole team expects to explode onto the scene, overtaking Facebook within a few weeks of launching, but I don't want to go crazy with scalability until that time comes (a few weeks, remember). However, we have a second iteration of features that we will be adding shortly after the initial launch, so the initial DB design needs to accommodate these additions without major rework. The most important of these features are:
- Video, image and audio posts/albums/sharing
- Private messaging with Facebook-style reply threads
- User-created groups with in-group discussion boards
- Administration controls for various user roles (admins, content editors, community managers)
Schema
Here is what I came up with in MySQL Workbench for the initial site. I'm still a little fuzzy on some relational databasey things, so go easy.
Questions
- In general, is there anything I'm doing wrong or can improve upon?
- Is there any reason why I shouldn't combine the ExternalAccounts table into the UserProfiles table?
- Is there any reason why I shouldn't combine the PostStats table into the Posts table?
- Should I expand the design to include the features we are planning for the second version, just to ensure that the initial schema can support them?
- Is there anything I can do to optimize the DB design for Solr indexing/performance/whatever?
- Should I be using more natural primary keys, like Username instead of UserID, or zip/area code instead of a surrogate LocationID in the Locations table?
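
To make the last question concrete, here is a minimal sketch of the trade-off I have in mind. It runs against an in-memory SQLite database from Python purely so it is easy to try; the real database is MySQL, and every table and column here other than UserID and Username is a hypothetical stand-in, not the actual schema. With a surrogate UserID, renaming a user touches one row; with Username as the key, the rename has to cascade into every referencing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
-- Surrogate-key variant: posts reference the numeric UserID.
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE
);
CREATE TABLE Posts (
    PostID INTEGER PRIMARY KEY,
    UserID INTEGER NOT NULL REFERENCES Users(UserID),
    Title  TEXT NOT NULL
);

-- Natural-key variant (hypothetical tables): posts reference the Username itself.
CREATE TABLE UsersNatural (
    Username TEXT PRIMARY KEY
);
CREATE TABLE PostsNatural (
    PostID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL
             REFERENCES UsersNatural(Username) ON UPDATE CASCADE,
    Title    TEXT NOT NULL
);
""")

# One user with three posts, stored in both variants.
conn.execute("INSERT INTO Users (UserID, Username) VALUES (1, 'byron')")
conn.execute("INSERT INTO UsersNatural (Username) VALUES ('byron')")
for title in ("Ode", "Sonnet", "Haiku"):
    conn.execute("INSERT INTO Posts (UserID, Title) VALUES (1, ?)", (title,))
    conn.execute(
        "INSERT INTO PostsNatural (Username, Title) VALUES ('byron', ?)", (title,)
    )

# Rename the user in both variants.
conn.execute("UPDATE Users SET Username = 'lord_byron' WHERE UserID = 1")
conn.execute(
    "UPDATE UsersNatural SET Username = 'lord_byron' WHERE Username = 'byron'"
)

# Surrogate variant: Posts rows are untouched and still reference UserID 1.
surrogate_refs = [row[0] for row in conn.execute("SELECT DISTINCT UserID FROM Posts")]
# Natural variant: the cascade rewrote the key in every PostsNatural row.
natural_refs = [
    row[0] for row in conn.execute("SELECT DISTINCT Username FROM PostsNatural")
]
print(surrogate_refs, natural_refs)
```

In MySQL the natural-key variant would rely on InnoDB foreign keys declared with ON UPDATE CASCADE, so the same rename would rewrite the key in every child table that references it.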
Thanks for the help!