GPUs for Google Cloud Platform (googleblog.com)
317 points by hurrycane on Nov 15, 2016 | hide | past | favorite | 94 comments


Kudos to Google for making moves here. Having spent the last year+ tackling GPUs in the datacenter, I'm super curious how custom sizing works. It's a huge technical feat to get eight GPUs running (let alone in a virtual environment), but the real challenge is making sure the blocks/puzzle pieces all fit together so there's no idle hardware sitting around. There's a reason why Amazon's G/P instances require that you double the RAM/CPU if you double the GPU. Another example would be Digital Ocean's linear scale-up of instance types. In any case, we'll have to see what pricing comes out to.

Shameless plug: if you want raw access to a GPU in the cloud today, shoot me an email at daniel at paperspace.com. We have people doing everything from image analysis to genomics to a whole lot of ML/AI.


Paperspace looks pretty awesome. Given the exorbitant expense of the new MacBook Pros, I'm thinking of getting an iPad Pro instead, except that you still can't really do dev work conveniently on an iPad.

This looks like it might be a cool solution to that problem. I'd use iOS most of the time (I'm a writer) and the Paperspace VM on the iPad for coding (I'm also a wannabe developer).

The only fly in the ointment is that it's not available in Europe, and I'm worried about latency over a 4G connection.


For $25/month, just rent a dedicated server in Europe (or even just put one in your basement if your network connection is sufficiently sized). RDP with Hyper-V has everything you need for that sort of usage (RemoteFX).

For coding, you don't even need a GPU. Just use RDP. Big companies use this all the time (thin clients + central server farm, also called VDI).


I tend to prefer Linux on the server, and I've tried the iPad + tmux + vim route on a remote server and the experience hasn't been great. It's usually latency problems that trip me up. I travel a lot and rely on tethering to my phone's 4G connection, which mostly works but can be quite frustrating.


Living permanently on 4G, I couldn't get by without mosh. I have managed to do weeks of work with mosh+tmux+vim at times when ssh was too painful to use.


On the subject of mosh: totally agree, it's a great experience on a slow connection. There's a recently released iOS client with mosh support built in [1]. Happy user.

[1] http://www.blink.sh/


But Paperspace is Windows only.

If you want Linux remoting, I recommend either x11vnc or xrdp.


Paperspace looks awesome, but I found the pricing on the homepage misleading.

It says in big font "Starting at $5/month". Then I clicked "Pricing" and the cheapest I see is "$15/month". What's the $5 plan?


Maybe the $5 refers to the hourly plan (which has a $5 fixed monthly storage cost) if you never actually use it? Strange indeed, but it looks very interesting!


Does paperspace suffer from network latency?


Yes and no. There was minor lag ... enough to be a slight nuisance when I was connected over Wi-Fi...

There was zero lag when connected directly over Ethernet. It was seamless.

Though I should add that the Paperspace server I was connecting to was located less than 50 miles from my laptop.


>Your password is too weak

This better be worth it! :P


Would you also have anything available in Europe?


> Google Cloud GPUs give you the flexibility to mix and match infrastructure. You’ll be able to attach up to 8 GPU dies to any non-shared-core machine...

Wow, that's impressive. One thing I've loved about GCE has been the custom sizing. This takes it even further, so we don't have to buy what we don't need.

Looking forward to seeing the pricing on this. Looks like they're going to heavily compete with AWS on this stuff.


Yes, I foresee a lot of 8 vCPUs VMs with 8 giant GPU dies attached to them. And that's just fine by us!

Disclosure: I work on Google Cloud (and want to sell you some GPUs!)


One of the HUGE advantages of GCE/AWS is that they will gobble up 100% of the unused resources for their own computation. Nothing is wasted, and the machines basically pay for themselves.

Compare this with something like Oracle, which simply can't consume the unused resources in order to discount the hardware effectively. They can't beat GCE/AWS at the cloud game until this changes.


I'm not sure if Amazon has enough batch-processing type work to make this effective. You can't run amazon.com with these resources because their needs are inflexible.

Actually, not sure about Google either: their preemptible instances are sold at an 80% discount, so that puts an upper bound on the usefulness of these machines for them.


Amazon's largest "batch" job is their holiday shopping season. They must purchase the horsepower to get them through that season, and then the rest of the year they can resell it. I believe this was the original motivation for AWS.


A coauthor of the original proposal for AWS denies that:

http://www.networkworld.com/article/2891297/cloud-computing/...


If it was, they'd have to shut down AWS for the holidays.


It's really not about the platform being a "buyer of last resort" for the unused capacity. Think about the narrow band of values you need for that to work. Google must value these compute jobs highly enough to have them offset the unused capacity, but they can't value them so highly that they're worth buying hardware for to make sure they run. And this has to remain true even as the hardware gets much cheaper?

Here's how I think AWS and GCE work...

Imagine you buy some hardware. You have n=1 users, and so the variance on your compute needs is very high. You buy for peak use, so your median utilisation is low.

Now imagine you have n=10,000,000 users. Again, you stock for peak use...But the variance is now much much lower. Your median utilisation is now much better.
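
A toy sketch of that pooling effect (hypothetical demand distribution, assuming independent users; only meant to illustrate why the aggregate peak shrinks relative to the median):

    import numpy as np

    rng = np.random.default_rng(0)

    def median_utilisation(n_users, hours=10_000):
        # Total hourly demand from n independent users; a sum of n Gamma(2,1)
        # draws is Gamma(2n,1), which keeps the array small even for large n.
        demand = rng.gamma(shape=2.0 * n_users, scale=1.0, size=hours)
        provisioned = demand.max()      # buy hardware for the observed peak
        return np.median(demand) / provisioned

    for n in (1, 100, 10_000):
        print(f"{n:>6} users: median utilisation ~ {median_utilisation(n):.0%}")
    # One user: the peak dwarfs the median, so most capacity idles.
    # Ten thousand users: the peak barely exceeds the median, so very little idles.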


I built a system at google called Exacycle which used idle cycles for computation. You can think of my system as the user of last resort. We ran MD simulations, drug docking, and ray tracing simulations of the LSST telescope (along with fundamental mathematical problems), then published the results from this for free. The economics of idle cycles in a large production environment aren't trivial to analyze, and anybody outside of Google speculating is likely to make a mistake.


Isn't this what folding@home and similar services have been doing for decades now?


That's correct. In fact, we were the largest (in terms of total simulation seconds produced) provider of resources to Folding@Home and worked with Vijay Pande closely on this effort.


I think there's a better way to think about economies of scale (looking at it through the lens of Google Cloud).

Let's quantify your workload as "resource-seconds", what many of Google's services (and some AWS services, like Lambda) allow you to do is maximize resources and minimize seconds (assuming efficient parallelization of workload, which is afforded by Google's super-awesome Networking). Cost is the same to you, but you get your results faster.
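
A trivial sketch of that resource-seconds framing (made-up price and job size, and it assumes the workload parallelises perfectly):

    WORK = 3_600            # core-seconds of compute the job needs (made up)
    PRICE = 0.00002         # dollars per core-second (made up)

    for cores in (1, 10, 100):
        seconds = WORK / cores      # more resources, fewer seconds
        cost = WORK * PRICE         # billing depends only on resource-seconds used
        print(f"{cores:>3} cores: finishes in {seconds:>6.0f}s, costs ${cost:.3f}")
    # The bill is identical each time; you just get the result 10x or 100x sooner.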

This is afforded by the scale and by the multi-tenant nature of underlying services. This model can only work when there's lots of folks using such a service.

(work on Google Cloud Big Data)


Did you mean to say GCE/AWS could gobble up "unpaid for" resources for their own computation?

Curious, because if I pay for using a P100 for 2 hours, it is 100% available to me and not virtualized in any way, right?

However, if I use it only for 2 hours that day, and no one else were ready and waiting to buy the time on those, then Google uses it internally until...


From our post:

> GPUs are offered in passthrough mode to provide bare metal performance. Up to 8 GPU dies can be attached per VM instance including custom machine types.

When you're using it, it's 100% yours.

Disclosure: I work on Google Cloud (and pitched in on this!)


Can you explain this in more detail about GCE/AWS vs Oracle?

What "unused resources" are used for "their own compitation"?

How does this "pay for itself"?


He's saying they have their own apps (like Google Search) that can use VMs that aren't currently rented out to do their own computations (like PageRank, etc.).


I'm curious to know what workloads you think Google is running on the same hardware that cloud customers are using.


What an odd question. Seems like training a new voice recognition model, or image classifier, or any of a zillion other research projects Google is working on would be perfect for any unused capacity.

I've never worked at Google. Do they just let each team go with their own crazy hardware setup? Or maybe two separate systems, one internal and one external? That seems a little wasteful (they have the money, so whatever).

But wouldn't programming against a big unified API be a bit more, well, sane? I can see really sensitive stuff locked away on its own hardware, but the search front end? Why not serve some JS or provide Chrome downloads from a giant pool of common hardware?

Things grow organically, and they might be technically locked into a specific layout right now; that's totally understandable. But unused capacity is just capital depreciating steadily away with no gain. Soaking up every spare cycle doing something useful should probably be somewhere in the company goals.


It's not odd from a security perspective. There is a reason the CIA pays massive amounts of money to Amazon for a completely separate set of hardware from the public. Hypervisors aren't perfect, and you might not want customers to be one KVM exploit away from perusing someone's Gmail being processed on the same machine.


Like I said, I'm just curious. Google has so much capacity that I'm not sure it would be important for their bottom line to move it between Google and customer workloads in quanta smaller than one physical machine. Your theory also requires Google to have a substantial standby workload that is currently not scheduled, doesn't it?


In my very limited experience, trying to answer research questions can soak up all the computation you can throw at it. Consider something like the traveling salesman problem: exact solutions take exponential time, and while there are heuristics that get good answers quickly, you never know if those answers are optimal.

You can get an answer on your laptop in an hour. You'll probably get a better answer throwing 1000 hours at your algorithm though, and a better answer throwing 10000 hours.

You can do a lot of neat things if P=NP. Since no one knows if that's true, we're stuck with bad big O. More computer time is the only way out right now.
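
To make the "heuristics give good answers but no proof of optimality" point concrete, here's a minimal nearest-neighbour sketch on random points (purely illustrative): restarting from every city soaks up more compute and usually shortens the tour, but it never certifies a minimum.

    import math, random

    random.seed(1)
    cities = [(random.random(), random.random()) for _ in range(200)]

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def tour_length(order):
        return sum(dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
                   for i in range(len(order)))

    def nearest_neighbour(start):
        # Greedy heuristic: always hop to the closest unvisited city.
        unvisited = set(range(len(cities))) - {start}
        order = [start]
        while unvisited:
            nxt = min(unvisited, key=lambda j: dist(cities[order[-1]], cities[j]))
            order.append(nxt)
            unvisited.remove(nxt)
        return order

    # More compute = more restarts = usually a shorter tour, never a proof it's optimal.
    best = min(tour_length(nearest_neighbour(s)) for s in range(len(cities)))
    print("best nearest-neighbour tour:", round(best, 3))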

The point is, I'd be disappointed in Google researchers if they didn't have a huge amount of unscheduled workload.


There are some questions though where the saved energy from not running the batch jobs is worth more than the answer (e.g. searching for the next prime).


We (Google) ran numerous theory problems on Google's computing platform via the Exacycle program. Peter Norvig convinced me this was a bad idea, mainly on the grounds that time was better spent on disproving the conjecture using pure math, not search. http://norvig.com/beal.html

"But Witold Jarnicki and David Konerding already did that: they wrote a C++ program that built a table of Cz(modp)Cz(modp) up to 5000500050005000 , and, in parallel across thousands of machines, searched for A,BA,B up to 200,000 and x,yx,y up to 5,000, but found no counterexamples. On a smaller scale, Edwin P. Berlin Jr. searched all CzCz up to 10171017 and also found nothing. So I don't think it is worthwhile to continue on that path."


All kinds of CPU intensive work. Isn't that most work?


No, a lot of work is I/O bound. That's why M:N threading (e.g. Go's goroutines, Python's gevent) is so popular.
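
A minimal asyncio sketch of why that matters (not gevent, but the same idea: lots of I/O-bound tasks multiplexed onto one OS thread; assumes Python 3.7+ and fakes the I/O with sleep so it's self-contained):

    import asyncio, time

    async def fetch(i):
        # Stand-in for a network call: ~1s of waiting on I/O, essentially no CPU.
        await asyncio.sleep(1)
        return i

    async def main():
        start = time.time()
        results = await asyncio.gather(*(fetch(i) for i in range(1000)))
        # 1000 one-second "requests" finish in roughly 1s of wall time, because
        # while any one task waits, the single OS thread runs all the others.
        print(len(results), "tasks in %.1fs" % (time.time() - start))

    asyncio.run(main())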


All of them.


source?


Kudos to Google, and happy to see that at least in principle AMD is still an option.

I wonder what kind of device driver GCE uses with AMD. The new ROCm?

What about POWER8 + NVLink hardware? Does anybody know if the current NVIDIA GPUs, in particular the P100s, are all on x86?


I'll just answer your driver question: you bring whatever you want. Our VMs just boot off of bytes you have stored in your image (which is in turn just stored on GCS). If those bytes happen to be Debian 8 with some AMD driver that supports the 9300, awesome ;).

We will be working with both vendors to make sure we highlight which drivers work most reliably on our stack though (virtualization + GPUs is way too rare!). We want to work closely with both vendors to do the qualification, so you at least know what is known good.

Disclosure: I work on Google Cloud.


Thanks for the explanation, that's what I guessed -- which has both good & bad aspects. The good is the flexibility, the bad is that:

> If those bytes happen to be Debian 8 with some AMD driver that supports the 9300, awesome ;).

happening to support it might alone not be enough, given how many difficulties AMD has been having. I just hope that there will be enough customer demand for their SP-FLOP cruncher GPUs for interested parties, including Google, to pitch in and contribute to their ROCm stack.


When is Kubernetes support going to be added? Right now it has it only in a very basic form for NVIDIA GPUs.


There is a PR[1] to extend the current alpha GPU implementation. Hopefully it will get merged soon.

[1] - https://github.com/kubernetes/kubernetes/pull/28216


That's still NVIDIA-only, plus you can't even tell a P100 from a K80 for scheduling purposes.


This is the reason node-feature-discovery exists. Kubernetes needs an extensible way to detect and label hardware features. See: https://github.com/kubernetes-incubator/node-feature-discove...


I'd just use a separate NodePool and match your jobs using a selector as appropriate.
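
A hedged sketch of that approach with the Kubernetes Python client; the node pool name, image, and the alpha GPU resource name are placeholders and may differ across cluster versions:

    from kubernetes import client, config

    config.load_kube_config()  # assumes a local kubeconfig pointing at the GKE cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="train-job"),
        spec=client.V1PodSpec(
            # GKE labels every node with its pool; pin the job to the GPU pool.
            node_selector={"cloud.google.com/gke-nodepool": "p100-pool"},
            containers=[client.V1Container(
                name="train",
                image="gcr.io/my-project/train:latest",    # placeholder image
                resources=client.V1ResourceRequirements(
                    # alpha-era GPU resource name; placeholder, version-dependent
                    limits={"alpha.kubernetes.io/nvidia-gpu": "1"},
                ),
            )],
            restart_policy="Never",
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)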


So, is GCI going to come with the right drivers?


Very, very happy to finally see AMD GPUs in the cloud.


I love to root for an underdog, but I've been pulling for AMD for years and they just can't get their Linux support to the same level as NVIDIA's. I speculate that they have undiagnosed design errors in their driver.


You should check out ROCm

http://hothardware.com/news/amd-rocm-13-adds-support-for-pol...

They are revamping their entire Linux stack for HPC and heterogeneous compute.


Fool me several times, shame on me.... ;) Really -- ever since the ATI acquisition I thought AMD could make it happen. And it's true that they have come a long way since then.

"New Linux Driver and Runtime Stack optimized for HPC & Ultra-scale class computing" sounds great, let's hope this time they can do it.


:) I've only been deeply involved with compute for the past two years, so I am still an optimist. I think AMD pretty much has no choice but to get it right this time: they need server and HPC dough to stay afloat.


Still no OpenCL, still only promised as a "preview" for mid-December.

I'm still very worried that AMD will fail, not because of the hardware, but due to their lacking software stack.


This is amazing! The first cloud provider to have the P100! Amazing opportunities ahead with compute power like that.


What about this[1]?

"GPU-Accelerated Microsoft Cognitive Toolkit Now Available in the Cloud on Microsoft Azure and On-Premises with NVIDIA DGX-1"

DGX-1 is powered by 8 x P100

[1] http://nvidianews.nvidia.com/news/nvidia-and-microsoft-accel...


AFAIK the DGX-1 is for running on premises, not in the cloud.



Didn't know they added those recently. I knew they had Titan Xs and K80s. This is awesome too.

Nice to have a full cloud provider with a lengthy feature set AND P100s though.


First with the P100; let's see if they can be the first with full IPv6 support at the instance and/or container level! (I'm not holding my breath)


What's the assurance like regarding security against other concurrent users on the same hardware? Historically, multitenancy with GPUs has been quite iffy and there's not much security research around it, even if IOMMUs theoretically exist.


We care a lot about security; that's part of the motivation for passthrough instead of any of the "split up a GPU" approaches. So, as I quoted above from our post:

> GPUs are offered in passthrough mode to provide bare metal performance. Up to 8 GPU dies can be attached per VM instance including custom machine types.

So when your VM boots up, the GPUs you attach will not be shared with others. I agree that the state of GPU virtualization is not yet where it needs to be to make me personally comfortable enough.

Disclosure: I work on Google Cloud (and worked on this a little).


Now it would be great if Kubernetes on GKE worked nicely with GPUs. It's still in the works: https://github.com/kubernetes/kubernetes/blob/master/docs/pr... .


Awesome news! The Tesla P100 is a monster, this will push ML development to new heights.


Is there any public access to the TPUs?


Not direct access, yet (they're not sufficiently programmable and isolatable). But as mikecb alludes to, we have a number of services like Translate, Vision, Speech, and a few others in Alpha that may be trained using TPUs (see https://cloud.google.com/ml for more).

For models you write yourself and have the Cloud ML service train, they will use our regular VMs today and as this blog post says, GCE will soon have GPUs (and therefore, Cloud ML will expose them, too).

Disclosure: I work on Google Cloud (and sorry for the confusion!)


If you use the Cloud ML service.


Really? I was under the impression they just use standard compute instances to power Cloud ML?


This happened some months ago ... how does it compare? Anyone in the know can pitch in on a short comparison?

https://aws.amazon.com/about-aws/whats-new/2016/09/introduci...


Amazon: up to 16 K80s and ~700 GB of RAM

Google: up to 8 K80s and ~200 GB of RAM but much more flexible


There are a lot of excited posts here about this announcement! For someone who doesn't use GPUs in everyday life, can someone explain why this is great and maybe touch on the current landscape around GPU usage and the cost landscape?


The excitement comes from the fact that some workloads benefit greatly from GPUs -- mostly those where the GPU arch offers some niche-advantage that can give GPUs >=10x perf[/W]. A good example is deep learning where Intel is struggling to compete [1], and even in linear algebra there is a 4-5x power efficiency difference [2, slide 6].

At the same time, in many other complex applications the performance/W/$ advantage has not been all that clear, especially since GPUs have been quite behind in process technology. A good example is a simulation code I work on where the 2-4x speedup from GPUs, which (due to the inherent overhead of using accelerators) gradually vanishes in strong scaling, means that when price is considered, only cheap consumer cards can improve the performance/buck metric significantly (that's time-to-solution/buck in our case); professional cards don't offer significant improvements other than performance density [3, figures 6,7].

There are plenty more examples that NVIDIA collects [4], but always take the claims with a grain of salt, especially the ones with "incredible" >10x speedup claims :)

Last, I'd note that with the recent 14-16nm jump, the gap between the traditional CPUs and the simpler accelerator architectures has increased (I've just seen MAGMA BLAS on Tesla P100 results which show >10x GFlops/W, more than double that of the previous arch) and I expect it to keep increasing partly due to the manufacturing/process technology gap shrinking between Intel and the rest and partly due to architecture and programmability improvements of accelerators.

[1] https://www.nvidia.com/object/gpu-accelerated-applications-t...
[2] http://on-demand.gputechconf.com/gtc/2015/presentation/S5476...
[3] https://www.academia.edu/13753737/Best_bang_for_your_buck_GP...
[4] https://www.nvidia.com/object/gpu-applications.html


GPUs for certain workloads are much more powerful than CPUs, meaning you can get stuff done much quicker.

Machine Learning is one such field, batch-rendering for CG studios etc is another.

You could throw 100 CPUs at something that perhaps 1 GPU might do in the same time (made-up numbers, but it gives you an idea of why people want it).


Is it possible to have a non-shared machine? Is it virtualized anyway?


Even if we had a "You get this box all for you", we'd still run our hypervisor on it. We wouldn't give someone a raw, unfettered machine in our datacenter (and I hope you wouldn't either!).

What is your concern with virtualization in this context?

Disclosure: I work on Google Cloud (and for a long time specifically Compute Engine).


> We wouldn't give someone a raw, unfettered machine in our datacenter (and I hope you wouldn't either!).

I think you're alluding to security issues with renting bare metal hardware to potentially untrusted customers, right? I've often wondered about the security implications of that - think backdooring BIOS/UEFI or other component firmware. But there are many large providers that rent dedicated hardware without a hypervisor. Are they just ignoring these concerns, hoping that this kind of tampering would be rare anyway?

> What is your concern with virtualization in this context?

I'm not the parent poster, but performance was probably the concern. There's still some (marginal, these days) overhead with virtualization in general, and VT-d specifically.


From our perspective: yes, we don't feel comfortable letting untrusted users run side by side without a sandbox.

In this context of GPU compute, though, you are just talking sporadically to a PCIe-attached device directly. There isn't any virtualization or sharing of the device. And yes, I assumed the complaint would be performance, but especially with just a passed-through GPU, it's honestly going to be funny if someone finds a real "oh wow, this is 20% slower specifically because of virtualization" case.


> From our perspective: yes, we don't feel comfortable letting untrusted users run side by side without a sandbox.

That's what I figured. I still wonder what other large providers (SoftLayer, OVH, etc.) do about this. Unfortunately, I suspect the answer is "nothing, we just hope/assume nothing will happen."

> In this context though of GPU compute, you are just talking sporadically to a PCIe attached device directly.

I know; I'm familiar with passthrough. It's not completely "free," but the overhead is very minor. I don't think anyone will be able to complain with a straight face about its performance. :P


> But there are many large providers that rent dedicated hardware without a hypervisor. Are they just ignoring these concerns, hoping that this kind of tampering would be rare anyway?

Yes. You can try to mitigate the risk a bit, but it's still there. Just think of all the other firmware you can update, not just UEFI.

Also, for large contracts, rented servers are usually bought new and not re-used, for multiple reasons (this is why providers have used-server markets!).

Virtualization overhead is so low nowadays that it's not worth the risk. Provisioning is much harder, too.


> Virtualization overhead is so low nowadays that it's not worth the risk

The entire Finance industry would like to disagree with you.

On the other hand, you don't run latency-sensitive code in the cloud; you run it on your own hardware. Since GCE is a cloud, there's no reason not to have the virtualization.


Does anyone know what the cost will be for these? AWS is quite high for the K80.


Nvidia got a massive bump in share price. I was quite sad because I sold all my shares after the post-election drop. I think this announcement might have caused the huge peak. Could have made 10% in one day.


They announced very good results for the quarter; I think that was the reason for this spike. Their datacenter/compute revenue grew by >150%.


The large recent bump in share price happened while the market was closed between last Thursday and Friday, when they announced earnings: http://nvidianews.nvidia.com/news/nvidia-announces-financial...

The Google announcement didn't move their stock much at all.


Great news!



Mine ALL the bitcoins


I wonder what kinds of cores will be available and whether that will be visible. Optimizing your code for a particular GPU architecture can make massive performance differences, much more so than for CPUs.


S9300 x2s, K80s and P100s.

From the post:

> Google Cloud will offer AMD FirePro S9300 x2 that supports powerful, GPU-based remote workstations. We'll also offer NVIDIA® Tesla® P100 and K80 GPUs for deep learning, AI and HPC applications that require powerful computation and analysis. GPUs are offered in passthrough mode to provide bare metal performance. Up to 8 GPU dies can be attached per VM instance including custom machine types.

Disclosure: I work on Google Cloud and pitched in on this.


They have pictures in the post if you don't want to read the rest of the article...


Am I the only one annoyed that their "announcement" talks about something that will happen in the future?

What kind of asshole move is this? Why not just say "here, you can use it now, good luck".


Presumably you can contact them and ask for early access.


Yes, that's the goal of the survey:

> Tell us about your GPU computing requirements and sign up to be notified about GPU-related announcements using this survey. Additional information is available on our webpage.

Survey: https://goo.gl/mgEI9X Landing page: https://cloud.google.com/gpu/ (which also links to the survey and indicates, as you rightly presumed, that if you fill it in, we'll add you to our waitlist)

Disclosure: I work on Google Cloud.



