
I was under the impression that the reason they went with a home-baked ID generator was to be able to encode the logical shard ID into the ID. I think UUID v1 encodes info about the hardware (MAC address), but their data changes physical hardware as they split it up across new shards.


Where the data ends up is not important to its identifier: the reason to have a "logical shard ID" is to provide for sequence and timestamp uniqueness.

The way to think about the problem is that, at the granularity of your timestamp, you lose the ability to generate unique identifiers across multiple nodes (on a single node, this is handled with the sequence number), so you need some kind of identifier for the instance of the generator itself. UUIDs spend a relatively immense number of bits (48) storing the computer's MAC, but if you know you only have a smaller-scale deployment and are able and willing to centrally plan identifiers for the deployed nodes, you can get by with something akin to this "shard ID".
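To make that concrete, here is a minimal sketch of how a timestamp, a generator ID, and a per-node sequence number can be packed into one integer. The bit widths and the name `make_id` are assumptions picked for illustration; nothing in this thread specifies the actual layout.

```python
# Hypothetical bit widths, chosen only for illustration.
TIMESTAMP_BITS = 41   # milliseconds since some custom epoch
SHARD_BITS = 13       # identifies the generator instance ("shard ID")
SEQUENCE_BITS = 10    # disambiguates IDs made on one node in the same millisecond


def make_id(timestamp_ms: int, shard_id: int, sequence: int) -> int:
    """Pack the three fields into a single integer, timestamp in the high bits."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (SHARD_BITS + SEQUENCE_BITS))
        | (shard_id << SEQUENCE_BITS)
        | sequence
    )


# Two nodes generating in the same millisecond with the same sequence number
# still produce distinct IDs, because their shard IDs differ.
id_on_node_1 = make_id(timestamp_ms=5, shard_id=1, sequence=0)
id_on_node_2 = make_id(timestamp_ms=5, shard_id=2, sequence=0)
```

Putting the timestamp in the high bits keeps IDs roughly time-ordered, which is a design choice of this sketch rather than something the argument above requires.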

(BTW, note that a v1 UUID need not be generated with a true MAC: the specification notes both that you can just buy node addresses (from the IEEE) for a fairly small cost and that you can use fake addresses that are marked as such by setting the multicast bit. The result is that if you want to use UUID v1 off-the-shelf with "shard IDs" you are more than welcome to do so, not just in the "analogously-equivalent" sense but in the "to the letter of the spec" sense. You can find more information in Section 4.5 of the UUID specification.)
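As a rough illustration of that last point, Python's standard `uuid.uuid1` accepts an explicit `node` argument, so a centrally assigned shard ID can stand in for the MAC. The helper name `uuid1_for_shard` and the choice to store the shard ID in the low bits of the node field are assumptions of this sketch, not something the spec prescribes.

```python
import uuid

# The multicast bit is the least significant bit of the first octet of the
# 48-bit node field, i.e. bit 40 when the node is treated as an integer.
MULTICAST_BIT = 1 << 40


def uuid1_for_shard(shard_id: int) -> uuid.UUID:
    """Build a v1 UUID whose node field is a fake address carrying shard_id."""
    if not 0 <= shard_id < (1 << 40):
        raise ValueError("shard_id must fit below the multicast bit")
    node = MULTICAST_BIT | shard_id  # marked as "not a real MAC"
    return uuid.uuid1(node=node)


u = uuid1_for_shard(42)
shard_from_uuid = u.node & (MULTICAST_BIT - 1)  # recover the shard ID
```

The timestamp and clock sequence are still filled in by the library, so the shard ID only has to be unique per generator.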


Actually, they also noted that it makes mapping easier. By including the logical shard ID in the ID, they don't need to keep a giant index of IDs-to-shards to figure out which machine an ID lives on. Just a tiny mapping of logical to physical shards, which every app server instance can cache in memory.
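A small sketch of that lookup, assuming the shard ID sits in a fixed bit range of the ID. The bit widths, host names, and function names here are all made up for illustration.

```python
# Hypothetical layout: [timestamp | 13-bit shard ID | 10-bit sequence].
SHARD_BITS = 13
SEQUENCE_BITS = 10

# The "tiny mapping" each app server could cache: logical shard -> physical host.
# Host names are placeholders.
LOGICAL_TO_PHYSICAL = {
    0: "db-a.example",
    1: "db-a.example",
    2: "db-b.example",
    3: "db-b.example",
}


def shard_of(item_id: int) -> int:
    """Extract the logical shard ID embedded in an ID."""
    return (item_id >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)


def host_for(item_id: int) -> str:
    """Find the machine holding an ID without any per-ID index."""
    return LOGICAL_TO_PHYSICAL[shard_of(item_id)]


example_id = (123456 << (SHARD_BITS + SEQUENCE_BITS)) | (2 << SEQUENCE_BITS) | 7
example_host = host_for(example_id)

# Moving logical shard 2 to new hardware only touches the mapping;
# existing IDs keep working unchanged.
LOGICAL_TO_PHYSICAL[2] = "db-c.example"
moved_host = host_for(example_id)
```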

One less moving part at the expense of wedding yourself to a prepartitioned logical shard scheme, I guess. (I wonder how painful it would be to rebucket data into a different logical shard should the need arise...)



