Always use a UUID or GUID (same thing) in today's web applications. When terabytes of data become the norm for apps, int-based keys are going to seem so ancient.
Sure, it takes more room, but space is cheap, and it makes syncing and distributed databases so much easier (you're not tied to one machine or cluster).
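To make the "not tied to one machine" point concrete, here is a minimal sketch using Ruby's standard `securerandom` library. The two "node" variables are just illustrative names; the point is that each key is generated with no shared counter and no round trip to a central database.

```ruby
require "securerandom"

# Two independent "nodes" each mint a key without talking to each other.
# (node_a_key / node_b_key are illustrative names, not from any real system.)
node_a_key = SecureRandom.uuid
node_b_key = SecureRandom.uuid

puts node_a_key
puts node_b_key

# The space cost: 36 characters as text (128 bits if stored in binary form),
# versus 4 bytes for a 32-bit integer key.
puts node_a_key.bytesize
```

Collisions between random UUIDs are possible in principle but vanishingly unlikely, which is what lets each machine generate keys on its own.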
I suspect for a lot of people coding against Twitter the notion of integers being signed or unsigned is about as foreign to their experience as pointer arithmetic. I do not consider that a bad thing.
irb > "2,147,483,647".gsub(",","").to_i + 1
2147483648
(Forgive the extra code necessary to strip out the commas. The Windows irb console leaves a little to be desired in terms of midline editing.)
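Ruby promotes to bigger integers automatically, which is why nothing interesting happens above. For contrast, here is a rough sketch of what a signed 32-bit column or variable would make of the same value. `to_int32` is a helper I made up for illustration; it just reduces the number to two's-complement 32-bit range by hand.

```ruby
# Hypothetical helper: interpret an integer as a signed 32-bit
# two's-complement value, the way a 4-byte signed int would store it.
def to_int32(n)
  ((n + 2**31) % 2**32) - 2**31
end

max_id = 2_147_483_647          # largest signed 32-bit value
puts to_int32(max_id)           # still fits
puts to_int32(max_id + 1)       # wraps around to the most negative value
```

Whether a given service wraps, truncates, or throws an error depends on its language and database, so treat this as one possible failure mode rather than what every client would do.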
The barrier to entry for coding against Twitter is so low, and the learning curve so nearly nonexistent, that I expect quite a lot of services to have made that mistake. Not the big ones, though.
"Values updated every 5 minutes". Looking at the source (I don't actually know JavaScript, but it seems pretty simple) I'm assuming that means they get the current TwitID AND the current rate of increase.
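If that reading is right, the countdown is presumably a simple linear extrapolation. Here is a guess at the arithmetic in Ruby; I have not checked it against the page's actual JavaScript, and the function name and the sample numbers are made up.

```ruby
INT32_MAX = 2**31 - 1

# Hypothetical estimate: seconds left before IDs pass the signed 32-bit
# maximum, assuming the rate of increase stays constant.
def seconds_until_overflow(current_id, ids_per_second)
  raise ArgumentError, "rate must be positive" unless ids_per_second > 0
  remaining = INT32_MAX - current_id
  return 0.0 if remaining <= 0
  remaining / ids_per_second.to_f
end

# Made-up inputs purely to show the call; not real measurements.
puts seconds_until_overflow(2_000_000_000, 100)
```

Refreshing both inputs every 5 minutes would keep the estimate from drifting too far when the tweet rate changes.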
To be fair, they say "For some of your favorite third-party Twitter services not designed to handle such a case..." rather than accusing Twitter themselves of primary-key ineptitude.
I'd also expect most of those numbers not to be used. Their db writes are probably not contending on one monotonic counter. They're probably using some numbering scheme which guarantees uniqueness across multiple writable databases, but not global serialisation based on id.
I'm not sure how they'd be doing that and still giving each tweet a number so close to all the others. I have no idea where you see a "public timeline" as people mentioned above, so I just did a search for "i"... the following tweets were about a second apart.
2088840357 2088840362 (difference of 5, and that's without a real public timeline where you could easily prove each tweet takes the next number in a sequence)
So it isn't as if they're using UUIDs or something unique... they very much seem to be relying on a single counter (currently).
For a first guess, if there were 5 writing dbs, each could write every 5th id, starting at a different index (0,1,2,3,4).
You'd end up with globally unique ids, and as long as your writes were (approx.) evenly distributed, all 5 sequences would be (approx.) close to each other.
Who knows, they might have 100 writable dbs, with only 70% up at any point in time, meaning only 70% of all integers are actually used. Anyway, I don't know; I can just imagine them doing something like this if they wanted certain things from the system.
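A minimal sketch of that first guess, in Ruby. This only illustrates the interleaving idea; I have no idea whether Twitter does anything like it, and `InterleavedIdAllocator` is a name invented for the example.

```ruby
# Hypothetical allocator: writer number `offset` (out of `writers` total)
# hands out offset, offset + writers, offset + 2 * writers, ...
class InterleavedIdAllocator
  def initialize(offset, writers)
    unless (0...writers).cover?(offset)
      raise ArgumentError, "offset must be in 0...writers"
    end
    @next_id = offset
    @step = writers
  end

  def next_id
    id = @next_id
    @next_id += @step
    id
  end
end

# Five writers, each starting at a different index (0, 1, 2, 3, 4).
allocators = Array.new(5) { |offset| InterleavedIdAllocator.new(offset, 5) }

# Each writer issues three ids; no coordination between writers is needed.
ids = allocators.flat_map { |a| Array.new(3) { a.next_id } }
puts ids.sort.inspect
```

Uniqueness holds because every id a writer issues has the same remainder modulo the number of writers. If a writer is down or idle, its slots simply go unused, which is where the gaps would come from.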
Assuming that Rails is still interacting with the DB, and the ID in question is the primary key (i.e., the 'id' on the 'twits' table), it isn't quite that easy.
I had to do this a couple of years back when I realized I had a table that'd grow quickly:
I'm actually interested in this answer as well. I know of a company using a proprietary db and now that their growth has literally exploded, they're stuck with it.