
But ORDER BY random() will scan the whole table and assign values.

This is the plan I get:

    Limit  (cost=56.39..56.41 rows=10 width=40)
      ->  Sort  (cost=56.39..59.79 rows=1360 width=40)
            Sort Key: (random())
            ->  Seq Scan on test  (cost=0.00..27.00 rows=1360 width=40)
One should use "FROM table TABLESAMPLE SYSTEM (percentage)" instead (note the argument is a percentage of the table, not a row count); this is the plan:

    Sample Scan on test  (cost=0.00..5.36 rows=136 width=32)
      Sampling: system ('10'::real)
You can also use a sampling method other than "SYSTEM" (e.g. "BERNOULLI"), and you can make the sample reproducible with a "REPEATABLE (seed)" clause.

https://www.postgresql.org/docs/current/sql-select.html#SQL-...
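For example, a seeded sample against the same "test" table as in the plans above (the 10 is a percentage of the table; REPEATABLE pins the seed so you get the same sample on every run):

    -- SYSTEM samples whole blocks, which is why it's cheap.
    -- REPEATABLE (42) makes the sample deterministic across executions.
    SELECT * FROM test TABLESAMPLE SYSTEM (10) REPEATABLE (42);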

ORDER BY random() is a badly optimized way to get a random sample in PostgreSQL. If you use another SQL database, the best thing to do is to use a UUIDv4 primary key and "ORDER BY pk LIMIT size".



Note that `FROM table TABLESAMPLE SYSTEM (n/N)` is a safe substitution for `ORDER BY random() LIMIT n` only when you don't need a sample that's guaranteed to be drawn uniformly over rows. That's because TABLESAMPLE SYSTEM samples blocks, not individual rows:

> The SYSTEM method does block-level sampling with each block having the specified chance of being selected; all rows in each selected block are returned.
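If per-row uniformity matters, the built-in BERNOULLI method samples rows independently instead of whole blocks, at the cost of reading every block:

    -- Each row is kept with ~10% probability, regardless of how rows
    -- happen to be packed into blocks on disk.
    SELECT * FROM test TABLESAMPLE BERNOULLI (10);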


> If you use another SQL database, the best thing to do is to use UUIDv4 as primary key, and use "ORDER BY pk LIMIT size".

What if I want a different random order tomorrow?


> > If you use another SQL database, the best thing to do is to use UUIDv4 as primary key, and use "ORDER BY pk LIMIT size".

> What if I want a different random order tomorrow?

If your data is large and you want different samples, you can always run "SELECT * FROM table WHERE pk > gen_random_uuid() ORDER BY pk LIMIT 10".

There's a very small chance of drawing a threshold near ffffffff-... and getting fewer than 10 (or zero) rows.

This is not perfect, I admit, but ORDER BY random() is one of the most wasteful things you can do.

If getting random samples is very important, the best you can do is migrate to Postgres and use TABLESAMPLE.
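A sketch of the UUID-threshold trick (assuming a table `t` with a UUIDv4 primary key `pk`; if too few rows come back, the caller re-rolls with a fresh threshold):

    -- Index scan starting at a random point in the key space; no sort.
    -- Edge case: a threshold near the top of the range returns < 10 rows,
    -- so retry with a new gen_random_uuid() when that happens.
    SELECT * FROM t
    WHERE pk > gen_random_uuid()
    ORDER BY pk
    LIMIT 10;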


Then you sort descending of course /s


order by stored uuidv4 will get you the same rows the next time you run it. If you sample using where random() < 0.001 (or something like that) you'll get different rows each time.
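That filter, sketched against a hypothetical table `t` (the constant is the expected sampling rate; the returned row count varies from run to run):

    -- Keeps each row with probability 0.001; still a full scan, but no sort,
    -- and a fresh random() per execution gives different rows each time.
    SELECT * FROM t WHERE random() < 0.001;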


> But ORDER BY random() will scan the whole table and assign values.

Er, no, it will assign values to the result set, which may not involve a full table scan.


Good point, but in OP's case, without a WHERE condition, it will do a full table scan.

Also, any case of "WHERE filter ORDER BY random() LIMIT n" means "I want a tiny sample from a broad filter". So it might not be a full table scan, but it will still assign values to a lot of rows.


> Good point, but in the case of OP, without condition, it will do a full table scan.

And that has nothing to do with the ORDER BY random().


Yes it does, because of LIMIT N. An unordered result set will stop after the first N matches.
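For instance, without an ORDER BY the executor can return as soon as it has produced N rows:

    -- Stops after the first 10 rows the scan produces; no sort,
    -- and usually no full table scan.
    SELECT * FROM test LIMIT 10;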


Now that is correct.


> If you use another SQL database

Just a side note: TABLESAMPLE is part of the SQL standard, and e.g. Oracle supports this as well.



