Is anyone at all surprised their tech stack is PHP? Is it because of legacy, or is it what any sensible person moving petabytes of data would use? Or does it even matter?
As someone who used to work over there (Pornhub, Youporn, ...), it is not a question of legacy.
If you use PHP the way it's meant to be used, you are not going to get any surprises, and it'll run faster than the alternatives (or close to it), with lower development time and an easier time finding developers.
Also, the article is a bit off on some points. A website like Pornhub (100 million+ pageviews/day) is on the most standard stack you could imagine: PHP, Apache, MySQL, Memcached/Redis. Varnish gets mentioned a lot, but when I was working there (not so long ago), it was not in use, and as far as I know Youporn might be the only one relying on it right now.
If you know what you are doing with PHP, you will have no surprises, no performance issues, and maintenance will be trivial. But sadly I have to admit few PHP developers actually use PHP the way it should be used.
PHP actually seems like a good balance in terms of server support vs. (what I can guess of) the application requirements. It's brain-dead simple to run, and if you weren't already precomputing everything a page needs for response-time reasons, the language would push you towards doing so anyway.
I wouldn't be surprised if they're using PHP for what it was originally meant to do (add a thin layer of dynamic-ness to straight HTML) and precomputing all of the data it uses in something else.
PHP is not particularly known for being fast, and no, you wouldn't expect PHP to move petabytes of data. The naive approach for moving data with PHP is to push the whole file through PHP, either with readfile() or by reading it into a byte string in memory before echoing it to the user; doing something chunked, incremental, and seekable is probably just as difficult in PHP as in Perl. It also isn't legacy -- as they say, they switched from Perl to PHP. (Although, it might be. They may have switched to PHP just for the MySQL library functions, or perhaps they wanted to switch to Nginx and couldn't get it to run exactly how they wanted with Perl at the time -- either way, it could be the case that now they're staying with it for legacy reasons.)
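To make "chunked, incremental, and seekable" concrete, here is a rough sketch of the idea, written in Python purely for illustration (nothing here is what the site actually runs). The function name `iter_file_range` and its parameters are made up for this example; it yields an inclusive byte range of a file in fixed-size chunks, roughly what a handler for an HTTP Range request would need, so memory use stays at one chunk regardless of file size.

```python
import os
from typing import Iterator, Optional


def iter_file_range(path: str, start: int = 0, end: Optional[int] = None,
                    chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield bytes start..end (inclusive, like an HTTP Range) in chunks.

    Hypothetical helper for illustration; not taken from any framework.
    """
    size = os.path.getsize(path)
    # Clamp an open-ended or oversized range to the last byte of the file.
    if end is None or end >= size:
        end = size - 1
    if start < 0 or start > end:
        return  # empty file or unsatisfiable range: yield nothing

    remaining = end - start + 1
    with open(path, "rb") as f:
        f.seek(start)  # seekable: jump straight to the requested offset
        while remaining > 0:
            # incremental: never hold more than one chunk in memory
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file shrank underneath us; stop early
            remaining -= len(chunk)
            yield chunk
```

A web handler would write each yielded chunk to the response as it arrives. The same loop maps onto fopen()/fseek()/fread() in PHP, though in practice large static files are usually handed off to the web server or a CDN instead.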
I can think of some special cases where PHP would be better, especially in a porn site's case -- the most common clicks are front-page links, and there are probably a bunch of common keywords and clicks on links off the first page of those searches, which means that caching whole pages is probably economical. As far as I know, Perl and PHP are equally suited to talking with upstream caching proxies, but PHP might have felt more natural for day-to-day feature development.
Most of the HTML content will be pushed out by Varnish. PHP just generates the most popular pages once before Varnish takes over. As for pushing out videos, I doubt they're using PHP readfile(). They're probably serving them out of a CDN.
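The "generate once, then let the cache take over" pattern is easy to sketch. The snippet below is an in-process stand-in, in Python for illustration only; a real deployment would put Varnish or Memcached in front rather than a dictionary, and the `PageCache` class, its `ttl_seconds` parameter, and the `render` callback are all hypothetical names invented for this example.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Whole-page cache with a time-to-live (illustrative sketch only)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        hit = self._store.get(url)
        # Serve the stored page while it is still fresh.
        if hit is not None and hit[0] > now:
            return hit[1]
        # Miss or expired: run the (expensive) page generation once
        # and remember the result until now + ttl.
        html = render(url)
        self._store[url] = (now + self.ttl, html)
        return html
```

With a front page that gets most of the clicks, `render` (the PHP layer in this thread's terms) runs once per TTL window no matter how many requests arrive, which is why whole-page caching is so economical for this kind of traffic.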
It was recently (2011) rewritten on top of Symfony2 -- http://symfony.com -- which is, more importantly, a modern, documented, and stable framework. It was likely done because finding quality PHP programmers is easier than finding quality Perl programmers.
I'm assuming you are talking about YouPorn. I'm surprised they went from Perl to PHP. But they obviously know what they are doing. I wonder if they use HipHop at all.
As stated before, most adult website owners are average joes, and since PHP is easy to learn and has a low barrier to entry, it was the logical choice for getting a site up and running fast. It also helped that many of the tools written for the adult industry were done in PHP. Just like ICQ, it won't ever be replaced as the standard for the industry.
From what I recall, it's essentially how people do business. Everything from technical support with adult-oriented hosting companies to making deals to sell sites/traffic/etc. or talking to sponsors about promotions/etc. Plus a great deal of just general bs-ing. You have to remember that outside of maybe a dozen large companies, most adult sites are just run by 1 or 2 guys, so it's their form of talking at the water cooler.
Basically this... So many old-school peddlers have stuck with ICQ, and it's just easier for them to keep using it than to adopt a more modern form of instant messaging.