For anyone who wants to help: I've been looking at a bunch of different ways to achieve high-availability Postgres. What approach would you all suggest for a solo startup on a limited budget?
I had a hell of a time getting PgPool II w/ PG streaming replication set up right. The trivial cases seemed fine, but once I started triggering failovers back and forth, I ran into a lot of cases where PgPool II would go stale because of a data file left around in /tmp.
I eventually got it working, but there were way too many informally created scripts that PgPool and PG had to know about to trigger failovers, initiate resyncs from WALs, etc. I didn't like it at all, and around then AWS started offering PG on RDS, so I just moved to that.
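For context on why this gets script-heavy: pgpool's failover behavior is driven by hooks in pgpool.conf that point at scripts you write yourself. A rough sketch of the wiring (the script path is made up; the % placeholders are pgpool's own substitution variables):

```ini
# pgpool.conf -- sketch only, hypothetical script path
# %d = failed node id, %h = failed node hostname, %H = new master hostname
failover_command = '/etc/pgpool/failover.sh %d %h %H'
```

The script named here typically ssh-es to the surviving standby and touches the trigger file named in its recovery.conf, which promotes it to master. Every one of those moving parts (ssh keys, paths, trigger files) is something pgpool and PG have to agree on out of band, which is exactly where it got brittle for me.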
So my advice would be: unless you've got someone on the team for whom that isn't much work, you get a lot of benefit from going w/ hosted. RDS Postgres has been pretty great - not exceptional, but okay for my use cases. Hoping they add cross-region read replicas for PG sometime soon, as that would make a lot of expansion opportunities really easy.
I'm worried that this is going to sound flippant (not my intention), but use Heroku. You're under significant time constraints with being a solo founder and it would most likely be a win to just not have to worry about this one technical aspect.
Nothing wrong with saying that at all. It is easier to get started with. We should try to create some AMI images to make it a little easier to get up and running on AWS with Postgres-XL.
I am also running a solo startup. Don't run your own database. It is an unwanted distraction, and data loss due to poor backup procedures (i.e. do you actually test backup/restore?) can often kill your startup before it has a chance to take off.
There are plenty of great hosted database solutions, e.g. AWS RDS (PostgreSQL, MySQL, Oracle), MongoLab, MongoHQ, Cloudant, Firebase, Azure, etc.
http://www.packtpub.com/postgresql-replication/book is a great book about how PostgreSQL replication works. The thing I love about the book is that it doesn't just describe how to set up replication; it nicely describes how PostgreSQL works (disk layout, the WAL/transaction logs, fsync consistency, checkpoints, etc.).
I'd recommend doing a master-slave setup. Doing a master-master is a lot more complicated. If the master goes down, you can have monitoring software promote a slave to be the new master.
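The key design question in that monitoring software is when to pull the trigger, since a single missed health check shouldn't cause a failover. A minimal sketch of that decision logic (all names here are illustrative, not from any specific tool):

```python
# Hypothetical watchdog logic: promote the slave only after N
# consecutive failed health checks against the master, so a brief
# network blip doesn't trigger an unnecessary (and costly) failover.

FAIL_THRESHOLD = 3  # consecutive failures required before promoting


def should_promote(health_history, threshold=FAIL_THRESHOLD):
    """health_history is a list of booleans, oldest first, where True
    means the master answered its health check. Returns True only when
    the most recent `threshold` checks all failed."""
    if len(health_history) < threshold:
        return False
    return all(not ok for ok in health_history[-threshold:])
```

In a real setup, the promotion step itself would then run `pg_ctl promote` on the slave (or touch the trigger file named in its recovery.conf), plus fence the old master so it can't come back and accept writes.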
For a solo startup on a limited budget I would recommend a single-server Postgres setup. Don't solve scaling problems until they become a real problem.
Though for HA, default Postgres has everything you need to get started.
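To make that concrete: built-in streaming replication is just a handful of settings. A rough sketch using the 9.x-era parameter names (hostnames, user, and paths are made up):

```ini
# postgresql.conf on the master (sketch only)
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 64        # keep enough WAL around for the slave to catch up

# recovery.conf on the slave (sketch only)
standby_mode = 'on'
primary_conninfo = 'host=master.example port=5432 user=replicator'
trigger_file = '/tmp/promote_me'   # touching this file promotes the slave
```

That gets you a warm/hot standby; what it doesn't give you out of the box is the automated failover piece.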
The one thing that's missing is automated failover, but personally I wouldn't automate that for a database, because in my experience, of all the issues that would cause an automated failover, only a very small fraction would actually be solved by such an action, whereas most of them would probably be made a lot worse.
Especially considering that undoing a failover is very costly (in terms of time, not necessarily money) until we get 9.4, where failing back to the original master won't need a full resync.
As much as I dislike lots about Oracle, the Oracle DB RAC cluster sets the bar pretty high, imho. Multi-master with automated failover that just works.
I've been working on an automated failover framework based on Zookeeper and ZFS for some time now. While we use it in production, it's been hampered by bugs we keep hitting in the PG replication layer. Coupled with the fact that the PG WAL includes checkpoint data in addition to transaction logs, this makes it quite difficult to get right and to make the system completely hands-off.
Sure, I didn't mean to imply that I thought this was a big advancement (I haven't looked at the options deeply enough to tell, this one included), just that the two concerns are different: YAGNI may apply to scale but not to HA (though it can certainly apply to HA as well).