"YouPorn is a beast, streaming three full DVDs of video every second (900TB/day, like Netflix), handling 300K queries every second, and generating up to 15GBs of log data per hour."
Serious misconception: it's just a couple of boxes and two dudes, nothing more. It runs itself. CDN FTW!
And the only thing a CDN will help you with in this case is offloading CSS, images and JS. You can't put that much streaming content up unless you host it yourself or want to spend every penny you have.
This is utter nonsense.
It is the nature of YouPorn's UX that the vast majority of requests are for the first couple pages of data. You don't have to put all the content on the CDN, only the part that represents 80-90% of your traffic. If you have a pull-based CDN you don't even need to plan it; the CDN automatically populates itself with what it considers a reasonable working set.
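To make the "populates itself" point concrete, here is a minimal sketch of a pull-through cache with least-recently-used eviction. It is only an illustration of the idea, not how any particular CDN is implemented; the class name `PullThroughCache`, the `origin` function, and the object keys are all made up for the example.

```python
from collections import OrderedDict


class PullThroughCache:
    """Toy edge node: fetches from the origin on a miss, evicts the LRU object."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity                  # max number of cached objects
        self.fetch_from_origin = fetch_from_origin
        self.store = OrderedDict()                # ordered oldest -> most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)           # mark as recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.fetch_from_origin(key)       # the "pull" from the origin
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)        # drop the least recently used object
        return value


origin_requests = []


def origin(key):
    # Hypothetical origin server: records each request it has to serve.
    origin_requests.append(key)
    return f"bytes-of-{key}"


edge = PullThroughCache(capacity=2, fetch_from_origin=origin)
requests = ["front-page-1", "front-page-2", "front-page-1",
            "front-page-1", "deep-archive-9", "front-page-1"]
for key in requests:
    edge.get(key)

print(f"hits={edge.hits} misses={edge.misses} origin_requests={len(origin_requests)}")
```

Nobody told the cache which objects were popular. The frequently requested one stays resident simply because it keeps getting asked for, and the origin only sees the first request for each object.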
Updated: I should add, I designed Kink.com's modern porn-serving architecture back in 2007. Before that, it ran off of 20 Apache httpd boxes at 365 Main. Now it runs off of a handful of appservers, a couple of MySQL boxes, and a lot of CDN capacity... on vastly more traffic. Believe me when I say there's no reason that the bulk of YouPorn's traffic couldn't be served off of one or more CDNs.
This is really in reply to powertower's statement (above or below since I couldn't reply directly to the comment) that they aren't using a CDN for their content. Here's the source domain for the content on one of their videos:
cdn1.public.youporn.phncdn.com
This domain resolves out to:
cdn1.public.youporn.phncdn.com.swiftcdn1.com
Which is hosted by a CDN company called SwiftWill.
Besides, the article you referenced says that they are using nginx to act as an external engine for static content such as css, js, etc.
According to the info, that's all YouPorn uses CDNs for (page assets minus video). That may or may not have changed recently.
I'd imagine that paying extra to shave a few tens of milliseconds off latency might not really be much of a benefit in this type of business; they are not doing VoIP phone calls. I'd imagine having fat pipes on a decent tier is #1 here.
The point of using a CDN (in this case) is not to reduce latency. The problem is that it becomes exponentially more expensive to serve high data rates out of a single data center. Basic infrastructure like switches and load balancers starts to get crazy expensive, as do the support contracts. Also, it requires a lot of fairly rare (and highly paid) expertise to set it up.
Distributed CDNs are like the RAID of content serving. Each node can be simpler, cheaper.
Another bonus of using CDNs is that you're in a great negotiating position. If you're serving 80% of traffic through one and 20% through another, you can flip it around the moment one offers to shave a percent or two off the price. I've had people in the sales department of the formerly-80% side notice the traffic drop and suddenly call up with counteroffers. In contrast, getting someone to draw fiber cables across the datacenter usually requires a lot of one-time expense and long-term contracts.
I'd be really curious what kind of CDN deal they're getting.
At regular CDN rates you're looking at ballpark $150k/month for that kind of traffic (rather optimistic extrapolation from my own rates...).
Also the figures remain mind-boggling regardless of how you slice them. 900TB/day breaks down to a healthy ~83 Gbit/s average. That's more than most mid-sized datacenter uplinks (and conveniently ignores any daily peaks they may have).
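For anyone who wants to check that conversion, a quick sketch, assuming decimal units (1 TB = 10^12 bytes) and traffic spread evenly over 24 hours, which real traffic certainly is not:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_gbit_per_s(tb_per_day):
    """Average line rate in Gbit/s for a given daily volume in decimal terabytes."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


rate = average_gbit_per_s(900)
print(f"{rate:.1f} Gbit/s average")
```

Peak-hour throughput would sit well above this average, so the uplink capacity actually needed is higher.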
Yup that seems more realistic (my estimate was too optimistic then). Works out to around 2.2ct/GB. Personally I haven't seen a CDN quote below 10ct/GB, but we also measure our traffic in TB/month, not TB/day.
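The per-GB figures are easy to sanity-check against the 900TB/day number. A rough sketch, assuming a 30-day month and decimal units (1 TB = 1000 GB); the function names are just for this example:

```python
TB_PER_DAY = 900
DAYS_PER_MONTH = 30   # assumption: a flat 30-day month
GB_PER_TB = 1000      # assumption: decimal units

monthly_gb = TB_PER_DAY * GB_PER_TB * DAYS_PER_MONTH


def monthly_bill_usd(cents_per_gb):
    """Monthly bill in USD at a flat per-GB rate given in cents."""
    return monthly_gb * cents_per_gb / 100


def implied_cents_per_gb(monthly_usd):
    """Per-GB rate in cents implied by a flat monthly bill in USD."""
    return monthly_usd * 100 / monthly_gb


print(f"volume: {monthly_gb:,} GB/month")
print(f"at 2.2 ct/GB: ${monthly_bill_usd(2.2):,.0f}/month")
print(f"at 10 ct/GB:  ${monthly_bill_usd(10):,.0f}/month")
print(f"$150k/month implies {implied_cents_per_gb(150_000):.2f} ct/GB")
```

Under those assumptions, 2.2ct/GB comes to roughly $594k/month, and a $150k/month bill would mean paying well under 1ct/GB.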
http://highscalability.com/blog/2012/4/2/youporn-targeting-2...
Slides: http://tinyurl.com/7bckqm8
(from https://joind.in/6123)