You're certainly not alone in hoping that MySQL loses market share to PostgreSQL...

sjwright · on Sept 24, 2011

Momentum and familiarity.

I run a big website and would love to switch; the only thing stopping me is that I'd have to manually fix hundreds of carefully hand-crafted queries to support pg syntax.

If someone wrote a "MySQL emulator" layer for postgres, I'd switch tomorrow. (And I'd work quickly to progressively replace emulated calls with real pg SQL -- it's a lot easier to justify the effort after you make the switch than before!)

ltbarcly3 · on Sept 24, 2011

It really depends how 'carefully crafted' your queries are. If there are things like index hints, you can just strip them out. If you are using non-standard mysql specific syntax, that will be a bit of a pain, but it won't take more than a day or two to fix a few hundred queries.

Where you are going to experience the most pain is that MySQL lets you write idiotic queries like this:

SELECT id, last_name FROM some_table GROUP BY last_name;

That is not valid SQL, for good reason.

Other things that will bite you: http://andreas.scherbaum.la/blog/archives/657-PostgreSQL-9.0...

God, looking at that link makes me so sorry for anyone using MySQL. Life is too short for that.

sjwright · on Sept 24, 2011

> If you are using non-standard mysql specific syntax, that will be a bit of a pain, but it won't take more than a day or two to fix a few hundred queries.

One issue is with non-standard MySQL functions (such as inet_aton/inet_ntoa) and aggregate functions (such as group_concat). While most have great and superior pg implementations, it's a lot of work to cross-reference each one, grep the source code, and hope the pg implementation is sufficiently identical.

Another issue is that MySQL considers the 'as' keyword to be optional. I'm sure pg has good reason for requiring it, but there's probably about a thousand 'as'es to be added.

And that's about where I gave up last time, if I recall.

rimantas · on Sept 24, 2011

And what stops you from using SQL standard syntax? You can disable MySQL extensions for that if you wish so.

rudiger · on Sept 24, 2011

Why isn't that valid SQL?

ltbarcly3 · on Sept 24, 2011

Well, lets say your rows are:

id|score|last_name 1|11|smith 2|22|jones 3|33|smith 4|44|jones

Query we are pretending is valid is: SELECT id, last_name FROM table GROUP BY last_name;

What rows are returned? I expect to see something like:

?|?|smith ?|?|jones

However, what ? is isn't clear. Could we get a row like 1|33|smith ?

In SQL, all selected columns must be part of the group by or inside an aggregate.

ericflo · on Sept 24, 2011

'id' doesn't appear in the group by list and it isn't being used in an aggregate function.

I'm actually curious: what id would a MySQL user expect to be displayed for this query?

rimantas · on Sept 24, 2011

> I'm actually curious: what id would a MySQL user expect to be displayed > for this query?

MySQL discourages using this feature if columns not included in GROUP BY are not constant in the group:

  > Do not use this feature if the columns you omit from the GROUP BY part
  > are not constant in the group. The server is free to return any value
  > from the group, so the results are indeterminate unless all values are
  > the same.

Their example in documentation:

  SELECT order.custid, customer.name, MAX(payments)
    FROM order,customer
    WHERE order.custid = customer.custid
    GROUP BY order.custid;

jeffdavis · on Sept 24, 2011

"MySQL discourages using this feature if columns not included in GROUP BY are not constant in the group"

That's what errors are for, not documentation. It is pretty easy to forget something in the group by, and documentation won't help with that.

The dangerous thing is that the result returned from such a nonsense query looks valid in many cases, while being wrong in subtle ways.

PostgreSQL detects when the query is valid, and executes it if so. So, if you do a GROUP BY customer_id (a key column), you can also see customer_name without adding it to the GROUP BY list. But if you group by customer_zipcode (not a key), and try to select the customer_name, it will throw an error.

mey · on Sept 24, 2011

Ponder: Which id is returned when you have more then one last name that is the same?

HarrisonFisk · on Sept 24, 2011

There are a number of benefits:

1. InnoDB has certain optimizations that PG lacks which can make a big performance difference at the high end: index-only queries, insert buffer (or change buffer in MySQL 5.5+), clustered index

2. Lightweight connection creation: MySQL can handle many more concurrent connections and also can create new connections much faster due to threading vs. process model

3. More flexible replication: PG is catching up, but MySQL still has the edge here imo.

jvdongen · on Sept 24, 2011

1. while true, PG has other features (e.g. partial indexes) which can make a big difference as well. Completely depends on the scenario. Sometime mysql will 'win', sometimes PG. No real reason to pick one over the other, unless you know that in your situation a particular feature is a real must-have.

2. use a connection pooler (e.g. pgbouncer), problem solved, good practice anyway, even with mysql

3. what (practical) edge do you see given the replication features in 9.1?

ltbarcly3 · on Sept 24, 2011

I feel the lack of index-only queries.

As for lightweight connections, I see this as completely moot. While you might make tens of thousands of cheap connections to a mysql server, postgresql is much better at executing concurrent queries. Connection poolers like pgbouncer let you make as many cheap connections as you want if most of them are going to be idle anyway.

fauigerzigerk · on Sept 24, 2011

a) Clustered and covering indexes.

b) Non-transactional tables

Both allow you optimize the memory usage of some particularly problematic cases, specifically very large, simple tables. Postgres cannot return data directly from indexes (covering indexes). It always has to go back to the table itself to fetch the actual data. If the table is large, that can be inefficient for some types of queries.

Non-transactional tables use a lot less memory as well. For instance, if you have a large table that represents a N:M relationship (id1 int, id2 int), the two ints use 8 bytes of memory. A postgres table adds about 24 bytes per record, three times the actual data, plus some overhead per page.

Don't take this to mean that MySQL is faster than Postgres. That's not generally the case. The Postgres query optimizer is vastly better than MySQL's. So for complex queries and data models, Postgres is way superior. The big differences are always related to very specific data model and query combinations, so general benchmarks are utterly useless.

caf · on Sept 24, 2011

Unlogged tables are now in Postgres as of release 9.1: http://www.postgresql.org/docs/9.1/static/release-9-1

fauigerzigerk · on Sept 24, 2011

That's very good news! I have to look at the physical data structures and what it means for memory usage though. "Unlogged" as such only means they don't use WAL.

sheff · on Sept 24, 2011

They are not available in Postgres at the moment, but I believe that covering indexes / index only scans will be coming some time next year, as theres a patch that should hopefully go into the next release.

http://rhaas.blogspot.com/2011/08/index-only-scans-now-there...

http://archives.postgresql.org/pgsql-hackers/2011-08/msg0073...

masklinn · on Sept 24, 2011

> b) Non-transactional tables

9.1 added unlogged tables. They are completely unsafe, and quite fast.