ORDER BY random() is a poorly optimized way to get a random sample in PostgreSQL. If you're on another SQL database (one without something like TABLESAMPLE), the best workaround is to use UUIDv4 as the primary key and `ORDER BY pk LIMIT size`.
Note that `FROM table TABLESAMPLE SYSTEM (n/N)` is a safe substitution for `ORDER BY random() LIMIT n` only when you don't need a sample that's guaranteed to be drawn from a uniform distribution over rows. That's because TABLESAMPLE SYSTEM samples blocks, not individual rows:
> The SYSTEM method does block-level sampling with each block having the specified chance of being selected; all rows in each selected block are returned.
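To illustrate the difference, here is a sketch against a hypothetical `events` table (the table name and the 1% figure are just examples): SYSTEM is fast but block-correlated, while BERNOULLI is uniform over rows at the cost of scanning every block.

```sql
-- SYSTEM: picks whole blocks, then returns every row in them.
-- Fast (unselected blocks are skipped) but the sample is
-- correlated with the table's physical layout.
SELECT * FROM events TABLESAMPLE SYSTEM (1);

-- BERNOULLI: reads all blocks but keeps each row independently
-- with probability 1%, so the sample is uniform over rows.
SELECT * FROM events TABLESAMPLE BERNOULLI (1);
```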
Ordering by a stored UUIDv4 will get you the same rows the next time you run it. If you sample using `WHERE random() < 0.001` (or something like that) you'll get different rows each time.
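A sketch of the two behaviors, assuming a hypothetical `events` table with a UUIDv4 primary key `id`:

```sql
-- Deterministic: the stored UUIDs don't change between runs,
-- so the same "smallest 100 UUIDs" come back every time.
SELECT * FROM events ORDER BY id LIMIT 100;

-- Non-deterministic: random() is re-evaluated per row on each
-- run, so a different ~0.1% of rows qualifies every time.
SELECT * FROM events WHERE random() < 0.001;
```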
Good point, but in OP's case, with no other condition, that approach will still do a full table scan.
Also, any case of `WHERE filter ORDER BY random() LIMIT n` means "I want a tiny sample from a broad filter". So it might not be a full table scan, but it will still evaluate random() for a lot of rows.
This is the plan I get:
One should use `FROM table TABLESAMPLE SYSTEM (percentage)`; this is the plan: You can also use a sampling method other than SYSTEM, and you can make it reproducible with a seed. https://www.postgresql.org/docs/current/sql-select.html#SQL-...
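For the reproducibility point, a sketch using the REPEATABLE clause on a hypothetical `events` table (seed value is arbitrary); the sample stays the same across runs as long as the table's contents don't change:

```sql
-- Same block-level sample on every run with this seed.
SELECT * FROM events TABLESAMPLE SYSTEM (1) REPEATABLE (42);

-- BERNOULLI with a seed: uniform over rows and reproducible.
SELECT * FROM events TABLESAMPLE BERNOULLI (1) REPEATABLE (42);
```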