You're certainly not alone in hoping that MySQL loses market share to PostgreSQL under Oracle's stewardship. I've always assumed that MySQL has been more prevalent due to some combination of historically earlier user friendliness, an incompatible SQL implementation, and feedback effects related to the previous two factors.
That said, is there something that would be lost if everyone just switched from MySQL to PostgreSQL tomorrow? What benefits does MySQL have over PostgreSQL these days?
I run a big website and would love to switch; the only thing stopping me is that I'd have to manually fix hundreds of carefully hand-crafted queries to support pg syntax.
If someone wrote a "MySQL emulator" layer for postgres, I'd switch tomorrow. (And I'd work quickly to progressively replace emulated calls with real pg SQL -- it's a lot easier to justify the effort after you make the switch than before!)
It really depends how 'carefully crafted' your queries are. If there are things like index hints, you can just strip them out. If you are using non-standard mysql specific syntax, that will be a bit of a pain, but it won't take more than a day or two to fix a few hundred queries.
Where you are going to experience the most pain is that MySQL lets you write idiotic queries like this:
SELECT id, last_name
FROM some_table
GROUP BY last_name;
> If you are using non-standard mysql specific syntax, that will be a bit of a pain, but it won't take more than a day or two to fix a few hundred queries.
One issue is with non-standard MySQL functions (such as inet_aton/inet_ntoa) and aggregate functions (such as group_concat). While most have great and superior pg implementations, it's a lot of work to cross-reference each one, grep the source code, and hope the pg implementation is sufficiently identical.
Another issue is that MySQL considers the 'as' keyword to be optional. I'm sure pg has good reason for requiring it, but there's probably about a thousand 'as'es to be added.
And that's about where I gave up last time, if I recall.
> I'm actually curious: what id would a MySQL user expect to be displayed
> for this query?
MySQL discourages using this feature if columns not included in GROUP BY are not constant in the group:
> Do not use this feature if the columns you omit from the GROUP BY part
> are not constant in the group. The server is free to return any value
> from the group, so the results are indeterminate unless all values are
> the same.
Their example in documentation:
SELECT order.custid, customer.name, MAX(payments)
FROM order,customer
WHERE order.custid = customer.custid
GROUP BY order.custid;
"MySQL discourages using this feature if columns not included in GROUP BY are not constant in the group"
That's what errors are for, not documentation. It is pretty easy to forget something in the group by, and documentation won't help with that.
The dangerous thing is that the result returned from such a nonsense query looks valid in many cases, while being wrong in subtle ways.
PostgreSQL detects when the query is valid, and executes it if so. So, if you do a GROUP BY customer_id (a key column), you can also see customer_name without adding it to the GROUP BY list. But if you group by customer_zipcode (not a key), and try to select the customer_name, it will throw an error.
1. InnoDB has certain optimizations that PG lacks which can make a big performance difference at the high end: index-only queries, insert buffer (or change buffer in MySQL 5.5+), clustered index
2. Lightweight connection creation: MySQL can handle many more concurrent connections and also can create new connections much faster due to threading vs. process model
3. More flexible replication: PG is catching up, but MySQL still has the edge here imo.
1. while true, PG has other features (e.g. partial indexes) which can make a big difference as well. Completely depends on the scenario. Sometime mysql will 'win', sometimes PG. No real reason to pick one over the other, unless you know that in your situation a particular feature is a real must-have.
2. use a connection pooler (e.g. pgbouncer), problem solved, good practice anyway, even with mysql
3. what (practical) edge do you see given the replication features in 9.1?
As for lightweight connections, I see this as completely moot. While you might make tens of thousands of cheap connections to a mysql server, postgresql is much better at executing concurrent queries. Connection poolers like pgbouncer let you make as many cheap connections as you want if most of them are going to be idle anyway.
Both allow you optimize the memory usage of some particularly problematic cases, specifically very large, simple tables. Postgres cannot return data directly from indexes (covering indexes). It always has to go back to the table itself to fetch the actual data. If the table is large, that can be inefficient for some types of queries.
Non-transactional tables use a lot less memory as well. For instance, if you have a large table that represents a N:M relationship (id1 int, id2 int), the two ints use 8 bytes of memory. A postgres table adds about 24 bytes per record, three times the actual data, plus some overhead per page.
Don't take this to mean that MySQL is faster than Postgres. That's not generally the case. The Postgres query optimizer is vastly better than MySQL's. So for complex queries and data models, Postgres is way superior. The big differences are always related to very specific data model and query combinations, so general benchmarks are utterly useless.
That's very good news! I have to look at the physical data structures and what it means for memory usage though. "Unlogged" as such only means they don't use WAL.
They are not available in Postgres at the moment, but I believe that covering indexes / index only scans will be coming some time next year, as theres a patch that should hopefully go into the next release.
That said, is there something that would be lost if everyone just switched from MySQL to PostgreSQL tomorrow? What benefits does MySQL have over PostgreSQL these days?