Hacker News
Twitter's UID associated to each tweet will very soon exceed 2,147,483,647 (twitpocalypse.com)
66 points by timothychung on June 9, 2009 | 43 comments


Always use a UUID or GUID (same thing) in today's web applications. When terabytes of data become the norm for apps, int-based keys are going to seem so ancient.

Sure, it takes more room, but space is cheap, and it makes syncing and distributed databases so much easier (you're not tied to one machine or cluster).


I figure many Twitter apps are written with SQLite as their database (in AIR, etc.), so the INTEGER PRIMARY KEY can go up to 9223372036854775807.

http://www.sqlite.org/faq.html
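
For reference, here is a small sketch of where these ceilings come from. The constant names are made up for illustration; the values are just powers of two minus one.

```python
# Largest values representable by common fixed-width integer types.
INT32_MAX = 2**31 - 1    # signed 32-bit: the limit the submission is about
INT64_MAX = 2**63 - 1    # signed 64-bit: the SQLite figure quoted above
UINT64_MAX = 2**64 - 1   # unsigned 64-bit

for name, value in [("int32", INT32_MAX), ("int64", INT64_MAX), ("uint64", UINT64_MAX)]:
    print(f"{name:>6} max = {value:,}")
```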

What is the point of creating a website like this?


Maybe I'm just paranoid, but I index the tweet IDs in Tipjoy with a string: tweetId = models.CharField(max_length=50, db_index=True)


Anyone notice the footnote on the estimated time of crossing?

"3 the average tweets per second and the Twitpocalypse date are a rough approximation, please do not schedule any vacation around that date"


Meh, anyone who's coding for this platform must have put those IDs in an unsigned integer, wouldn't they?


I suspect for a lot of people coding against Twitter the notion of integers being signed or unsigned is about as foreign to their experience as pointer arithmetic. I do not consider that a bad thing.

irb > "2,147,483,647".gsub(",","").to_i + 1

2147483648

(Forgive the extra code necessary to strip out the commas. The Windows irb console leaves a little to be desired in terms of midline editing.)
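
Python integers grow on demand in the same way, so the problem only shows up once the ID is pushed into a fixed-width slot. A minimal sketch, assuming ctypes.c_int32 and ctypes.c_uint32 as stand-ins for a 32-bit column or C variable (the helper names are hypothetical):

```python
import ctypes

INT32_MAX = 2_147_483_647

def as_int32(value: int) -> int:
    """Reinterpret value as a signed 32-bit integer; ctypes does no overflow check."""
    return ctypes.c_int32(value).value

def as_uint32(value: int) -> int:
    """Reinterpret value as an unsigned 32-bit integer."""
    return ctypes.c_uint32(value).value

next_id = INT32_MAX + 1
print(next_id)             # 2147483648: the language-level integer is fine
print(as_int32(next_id))   # -2147483648: wraps around in a signed 32-bit slot
print(as_uint32(next_id))  # 2147483648: still fits in an unsigned 32-bit slot
```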


You know Ruby supports _ as a digit separator in numeric literals, right?

irb(main):002:0> 2_147_483_647 + 1

=> 2147483648


It's such a low barrier to entry and an almost nonexistent learning curve to code for Twitter, so I expect quite a lot of services to have made that mistake. Not the big ones, though.



It's actually sooner. They assume 83 tweets per second (reference 2) but the number is 200 tweets per second: http://friendfeed.com/scobleizer/50e673d8/some-stats-from-tw...


From the link:

"Twitter is seeing about 200 tweets per second, during peak loads. - Robert Scoble"

Note "during peak loads", so 83 is probably not too far off.


Fair enough. Do we have any solid references of the actual average rate of tweets?


I haven't seen one, but it's easy enough to get it. Just check the UID of the latest tweet on the public timeline, wait a week and check again.
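
A rough sketch of that calculation, assuming IDs are handed out one per tweet from a single counter. The two sample IDs below are hypothetical, not real readings:

```python
INT32_MAX = 2_147_483_647
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

def average_rate(id_then: int, id_now: int, elapsed_seconds: float) -> float:
    """Average number of IDs consumed per second between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (id_now - id_then) / elapsed_seconds

def seconds_until_overflow(current_id: int, rate: float, limit: int = INT32_MAX) -> float:
    """Seconds until current_id reaches limit at a constant rate."""
    remaining = limit - current_id
    if remaining <= 0:
        return 0.0
    if rate <= 0:
        raise ValueError("rate must be positive")
    return remaining / rate

# Hypothetical samples taken one week apart.
id_then, id_now = 2_000_000_000, 2_060_480_000
rate = average_rate(id_then, id_now, SECONDS_PER_WEEK)
eta_days = seconds_until_overflow(id_now, rate) / 86_400
print(f"{rate:.1f} IDs per second, about {eta_days:.1f} days to go")
```

Deleted or protected tweets do not change the result, since they still consume an ID.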


They must read HN. It now says:

2 at a rate of 151 tweets per seconds


"Values updated every 5 minutes". Looking at the source (I don't actually know JavaScript, but it seems pretty simple) I'm assuming that means they get the current TwitID AND the current rate of increase.


In the age of multicore 64-bit CPUs, they're not using signed 32-bit integers as primary keys, are they?


To be fair, they say "For some of your favorite third-party Twitter services not designed to handle such a case..." rather than accusing Twitter themselves of primary-key ineptitude.


I think the problem is not in twitter itself but in the services using their APIs.


Twitter is using an unsigned 64-bit bigint according to this:

http://twitter.com/twitterapi/status/2048659057


I'm going to put money down that this will not affect anything.


That is quite an amazingly high number. Surely 100m people have not posted 2,000 "tweets" each.

I wonder what proportion of the total is human-typed, and how much is machine generated?


"Surely 100m people have not posted 2,000 "tweets" each."

Two billion tweets would be 1M people posting 2k tweets each, or 100M people posting 20 tweets each.


Oh. Oops. Of course you are right. That was pretty dumb of me : /


Amazingly high? Considering how long Twitter has been going, it's extremely small IMHO.

http://news.ycombinator.com/item?id=648725

"Just 10% of Twitter users generate more than 90% of the content, a Harvard study of 300,000 users found."

I'd expect most of it is machine generated.


I'd also expect most of those numbers not to be used. Their db writes are probably not contending on one monotonic counter. They're probably using some numbering scheme which guarantees uniqueness across multiple writable databases, but not global serialisation based on id.


I'm not sure how they'd be doing that and still giving each tweet a number so close to all the others. I have no idea where you see a "public timeline" as people mentioned above, so I just did a search for "i"... the following tweets were about a second apart.

2088840357 2088840362 (difference of 5, and that's without a real public timeline where you could easily prove each tweet takes the next number in a sequence)

So it isn't as if they're using UUIDs or something unique... they very much seem to be relying on a single counter (currently).


For a first guess, if there were 5 writing dbs, each could write every 5th id, starting at a different index (0,1,2,3,4).

You'd end up with globally unique ids, and as long as your writes were (approx.) evenly distributed, all 5 sequences would be (approx.) close to each other.

Who knows, they might have 100 writable dbs, with only 70% up at any point in time, meaning only 70% of all integers are actually used. Anyway, I don't know; I can just imagine them doing something like this if they wanted certain things from the system.
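
The five-writer guess can be sketched as below. This only illustrates the offset-and-step idea; nothing in this thread confirms Twitter works this way, and id_stream is a made-up helper. MySQL offers a comparable mechanism through its auto_increment_increment and auto_increment_offset settings.

```python
from itertools import count, islice

def id_stream(writer_index: int, writer_count: int):
    """Yield writer_index, writer_index + writer_count, ... without end.

    Each writer owns one residue class modulo writer_count, so no two
    writers can ever hand out the same ID.
    """
    if not 0 <= writer_index < writer_count:
        raise ValueError("writer_index must be in range(writer_count)")
    return count(start=writer_index, step=writer_count)

# Five writers, each taking every 5th ID from a different starting point.
writers = [id_stream(i, 5) for i in range(5)]
first_ids = [list(islice(w, 3)) for w in writers]
print(first_ids)  # [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8, 13], [4, 9, 14]]
```

IDs from different writers are unique, but a higher ID no longer implies a later write, which matters to anything that relies on ID order.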


No. Because their API exposes an explicit ordering of the IDs - there is a sinceID: parameter.


You can get the public timeline from here: http://twitter.com/public_timeline

Note that one can delete a message, which could explain some gaps.


I think the gaps are mostly people posting protected tweets, which thus don't get into the public timeline.


They do something like that with user IDs. They used to be incrementing and now they have gaps.


ALTER TABLE twits MODIFY COLUMN twitid BIGINT UNSIGNED NOT NULL

There, fixed, send check for $20 to donations@redcross.org

Call me back when they reach 18446744073709551615


Assuming that Rails is still interacting with the DB, and the ID in question is the primary key (i.e., the 'id' on the 'twits' table), it isn't quite that easy.

I had to do this a couple of years back when I realized I had a table that'd grow quickly:

http://snippets.dzone.com/posts/show/4422

Feel free to send the donation to the same place.


Is there no easier way than this? Can't believe that wasn't integrated into Rails...


"Can't believe that wasn't integrated into rails..."

You just made my day. I haven't laughed so hard in ages ;)


Hmm, didn't fix my iPhone client. Try again!


That command will take several hours to complete and will write-lock the table.


Unless you use PostgreSQL.

But then you would have had 64 bit ID by default anyway.


Why can't you just redirect the writes to a new temporary table and join the original and temporary on reads? (I'm not a database admin)


I'm actually interested in this answer as well. I know of a company using a proprietary db and now that their growth has literally exploded, they're stuck with it.


I wonder what literally exploding growth looks like...



Nothing the fail whale can't fix ;-)



