ORDER BY random() is a poorly optimized way to get a random sample in PostgreSQL. If you're on another SQL database (one without something like TABLESAMPLE), the best workaround is to use UUIDv4 as the primary key and `ORDER BY pk LIMIT size`.
Note that `FROM table TABLESAMPLE SYSTEM (n/N)` is a safe substitution for `ORDER BY random() LIMIT n` only when you don't need a sample that's guaranteed to be drawn from a uniform distribution over rows. That's because TABLESAMPLE SYSTEM samples blocks, not individual rows:
> The SYSTEM method does block-level sampling with each block having the specified chance of being selected; all rows in each selected block are returned.
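To illustrate the difference, here is a sketch against a hypothetical `events` table (the table name and the 1% figure are just examples): SYSTEM is fast but block-correlated, while BERNOULLI is uniform over rows at the cost of scanning every block.

```sql
-- SYSTEM: picks whole blocks, then returns every row in them.
-- Fast (unselected blocks are skipped) but the sample is
-- correlated with the table's physical layout.
SELECT * FROM events TABLESAMPLE SYSTEM (1);

-- BERNOULLI: reads all blocks but keeps each row independently
-- with probability 1%, so the sample is uniform over rows.
SELECT * FROM events TABLESAMPLE BERNOULLI (1);
```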
Ordering by a stored UUIDv4 will get you the same rows the next time you run it. If you sample using `WHERE random() < 0.001` (or something like that) you'll get different rows each time.
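A sketch of the two behaviors, assuming a hypothetical `events` table with a UUIDv4 primary key `id`:

```sql
-- Deterministic: the stored UUIDs don't change between runs,
-- so the same "smallest 100 UUIDs" come back every time.
SELECT * FROM events ORDER BY id LIMIT 100;

-- Non-deterministic: random() is re-evaluated per row on each
-- run, so a different ~0.1% of rows qualifies every time.
SELECT * FROM events WHERE random() < 0.001;
```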
Good point, but in OP's case, with no other condition, that approach will still do a full table scan.
Also, any case of `WHERE filter ORDER BY random() LIMIT n` means "I want a tiny sample from a broad filter". So it might not be a full table scan, but it will still evaluate random() for a lot of rows.
This is the plan I get:
One should use `FROM table TABLESAMPLE SYSTEM (percentage)`; this is the plan: You can also use a sampling method other than SYSTEM, and you can make it reproducible with a seed. https://www.postgresql.org/docs/current/sql-select.html#SQL-...
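For the reproducibility point, a sketch using the REPEATABLE clause on a hypothetical `events` table (seed value is arbitrary); the sample stays the same across runs as long as the table's contents don't change:

```sql
-- Same block-level sample on every run with this seed.
SELECT * FROM events TABLESAMPLE SYSTEM (1) REPEATABLE (42);

-- BERNOULLI with a seed: uniform over rows and reproducible.
SELECT * FROM events TABLESAMPLE BERNOULLI (1) REPEATABLE (42);
```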