
For anyone who wants to help: I've been looking at a bunch of different ways to achieve high-availability Postgres. What approach would you all suggest for a solo startup on a limited budget?


I had a hell of a time getting PgPool II w/ PG Streaming Replication set up right. The trivial cases seemed to be fine, but when I started triggering failovers back and forth, I ran into a lot of cases where PgPool II would go stale due to a data file left around in /tmp.

I eventually got it working, but there were way too many informally created scripts that PgPool and PG had to know about to trigger failovers, initiate resyncs from WALs, etc. I didn't like it at all, and around then AWS started offering PG on RDS, so I just moved to that.
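To give a flavor of the glue scripts involved: PgPool II hands failover off to an external command you write yourself. A minimal sketch (the script path, ssh target, and trigger-file location are illustrative, not from the original setup):

```
# pgpool.conf - delegate failover to an external script
# (%d = failed node id, %H = new master host; these are PgPool's own placeholders)
failover_command = '/etc/pgpool/failover.sh %d %H /var/lib/pgsql/trigger'

# /etc/pgpool/failover.sh (sketch): touch the trigger file on the
# standby so Postgres promotes it
#   ssh postgres@$2 "touch $3"
```

Keeping these scripts correct across failover/failback cycles is exactly the kind of informal work being described above.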

So, my advice would be: unless you've got someone on the team for whom that isn't much work, you get a lot of benefit from going with hosted. RDS Postgres has been pretty great - not exceptional, but fine for my use cases. Hoping they add cross-region read replicas for PG sometime soon, as that would make a lot of expansion opportunities really easy.


I'm worried that this is going to sound flippant (not my intention), but use Heroku. You're under significant time constraints with being a solo founder and it would most likely be a win to just not have to worry about this one technical aspect.


Why recommend Heroku for that as opposed to AWS RDS? The pricing difference seems significant (Heroku "Ika" is $750/mo; a db.m3.large multi-AZ is $280/mo).


Well, the main reason was that I didn't realize that AWS RDS was offering Postgres now (looks like it is still in beta, but awesome news nonetheless).


Nothing wrong with staying with that at all. It is easier to get started with. We should try to create some AMI images to make it a little easier to get up and running on AWS with Postgres-XL.


I am also running a solo startup. Don't run your own database. It is an unwanted distraction, and data loss due to poor backup procedures (i.e., do you actually test backup and restore?) can often kill your startup before it has a chance to take off.

There are plenty of great hosted database solutions, e.g. AWS RDS (PostgreSQL, MySQL, Oracle), MongoLab, MongoHQ, Cloudant, Firebase, Azure, etc.


+1 for the reality that an untested backup is potentially no backup at all. It's easy to have a false sense of security about these things.
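A restore test doesn't have to be elaborate; the point is to actually load the dump somewhere and query it. A minimal sketch (database names, paths, and the spot-checked table are all made up for illustration):

```
# take a logical backup in custom format...
pg_dump -Fc -f /backups/app_$(date +%F).dump app_production

# ...then actually restore it into a scratch database and sanity-check it
createdb restore_test
pg_restore -d restore_test /backups/app_$(date +%F).dump
psql -d restore_test -c "SELECT count(*) FROM users;"   # spot-check a key table
dropdb restore_test
```

If that `count(*)` ever comes back empty or the restore errors out, you've learned it on a scratch database instead of during an outage.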


http://www.packtpub.com/postgresql-replication/book is a great book about how PostgreSQL replication works. The thing I love about the book is that it doesn't just describe how to set up replication; it nicely describes how PostgreSQL works (disk layout, the WAL/transaction logs, fsync consistency, checkpoints, etc.).

I'd recommend a master-slave setup; master-master is a lot more complicated. If the master goes down, you can have monitoring software promote a slave to be the new master.
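The basic master-slave wiring is only a handful of settings. A sketch for the 9.2/9.3-era streaming replication setup discussed in this thread (hosts, users, and paths are illustrative):

```
# primary's postgresql.conf
wal_level = hot_standby
max_wal_senders = 3

# standby's recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 user=replicator'
trigger_file = '/var/lib/pgsql/promote'    # touch this file to promote

# or have the monitoring software promote explicitly:
#   pg_ctl promote -D /var/lib/pgsql/data
```

The monitoring software's only job on failure is to create the trigger file (or run `pg_ctl promote`) on exactly one standby, and then repoint the application at it.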


For a solo startup on a limited budget I would recommend a single-server Postgres setup. Don't solve scaling problems until they become a real problem.


Scaling is one thing; high availability is another. Depending on the product, the latter might well be important early.


Though for HA, default Postgres has everything you need to get started.

The one thing that's missing is automated failover, but personally I wouldn't do that with a database: in my experience, of all the issues that could trigger an automated failover, only a very small portion would actually be solved by failing over, whereas most would probably be made a lot worse.

Especially considering that undoing a failover is very costly (in terms of time, not necessarily money) until we get 9.4, where failing back to the original master won't need a full resync.
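Concretely, the costly part today is that the old master can't simply rejoin: its timeline has diverged, so it has to be rebuilt from a fresh base backup of the new master. A sketch of that resync (hosts and paths are illustrative):

```
# on the old master, after the standby has been promoted:
rm -rf /var/lib/pgsql/data/*
pg_basebackup -h new-master.internal -U replicator \
    -D /var/lib/pgsql/data -X stream -P
# then add a recovery.conf pointing at the new master and start Postgres
```

For a database of any real size, that full copy is exactly the hours-long step that makes casual automated failover so unattractive.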


As much as I dislike lots about Oracle, the Oracle DB RAC cluster sets the bar pretty high, imho. Multi-master with automated failover that just works.


I've been working on an automated failover framework based on Zookeeper and ZFS for some time now. Whilst we use it in production, it's been hampered by bugs we keep hitting in the PG replication layer. Coupled with the problem that the PG WAL includes checkpoint data in addition to transaction logs, it makes it quite difficult to get right and make the system completely hands off.


Sure - I didn't mean to imply that I thought this was a big advancement there (I haven't looked at the options deeply enough to tell, this one included), just that the two concerns are different: YAGNI may apply to scaling but not to HA (though it can certainly apply to HA as well).


Yes, but ensure that you have backups, and possibly a warm standby.

Postgres-XL is hopefully at least reassuring that once those scaling problems do crop up, there is something available that can help.



Here's a quick walkthrough of how to set up master/slave streaming replication over SSL:

http://www.rassoc.com/gregr/weblog/2013/02/16/zero-to-postgr...

This will work over the internet between data centers if you're so inclined... depending on your traffic, it's a simple way to get started.
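The SSL part of such a setup comes down to two lines. A sketch (the replication user, subnet, and hostname are illustrative; 203.0.113.0/24 is just the documentation address range):

```
# primary's pg_hba.conf - accept the replication user only over SSL
hostssl  replication  replicator  203.0.113.0/24  md5

# standby's recovery.conf - require SSL on the replication connection
primary_conninfo = 'host=primary.example.com user=replicator sslmode=require'
```

With `hostssl` on the server side and `sslmode=require` on the client side, plaintext replication traffic between data centers is refused in both directions.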



