GPUs for Google Cloud Platform (googleblog.com)
317 points by hurrycane on Nov 15, 2016 | hide | past | favorite | 94 comments


Kudos to Google for making moves here. Having spent the last year+ tackling GPUs in the datacenter, I'm super curious how custom sizing works. It's a huge technical feat to get eight GPUs running (let alone in a virtual environment), but the real challenge is making sure the blocks/puzzle pieces all fit together so there's no idle hardware sitting around. There's a reason why Amazon's G/P instances require that you double the RAM/CPU if you double the GPU. Another example would be Digital Ocean's linear scale-up of instance types. In any case, we'll have to see what pricing comes out to.

Shameless plug: if you want raw access to a GPU in the cloud today, shoot me an email at daniel at paperspace.com. We have people doing everything from image analysis to genomics to a whole lot of ML/AI.


Paperspace looks pretty awesome. Given the exorbitant expense of the new MacBook Pros, I'm thinking of getting an iPad Pro instead, except that you still can't really do dev work conveniently on an iPad.

This looks like it might be a cool solution to that problem. I'd use iOS most of the time (I'm a writer) and the Paperspace VM on the iPad for coding (I'm also a wannabe developer).

The only fly in the ointment is that it's not available in Europe, and I'm worried about latency over a 4G connection.


For $25/month, just rent a dedicated server in Europe (or even just put one in your basement if your network connection is sufficiently sized). RDP with Hyper-V has everything you need for that sort of usage (RemoteFX).

For coding, you don't even need a GPU. Just use RDP. Big companies use this all the time (thin clients + central server farm, also called VDI).


I tend to prefer Linux on the server, and I've tried the iPad + tmux + vim route on a remote server and the experience hasn't been great. It's usually latency problems that trip me up. I travel a lot and rely on tethering to my phone's 4G connection, which mostly works but can be quite frustrating.


Living permanently on 4G, I couldn't get by without mosh. I have managed to do weeks of work with mosh+tmux+vim at times when ssh was too painful to use.


On the subject of mosh: totally agree, it's a great experience on a slow connection. There's a recently released iOS client with mosh support built in [1]. Happy user.

[1] http://www.blink.sh/


But Paperspace is Windows only.

If you want Linux remoting, I recommend either x11vnc or xrdp.


Paperspace looks awesome, but I found the pricing on the homepage misleading.

It says in big font "Starting at $5/month". Then I clicked "Pricing" and the cheapest I see is "$15/month". What's the $5 plan?


Maybe the $5 refers to the hourly plan (which has a $5 fixed monthly storage cost) if you never actually use it? Strange indeed, but it looks very interesting!


Does paperspace suffer from network latency?


Yes and no. There was minor lag ... enough to be a slight nuisance when I was connected over Wi-Fi...

There was zero lag when connected directly over Ethernet. It was seamless.

Though I should add that the Paperspace server I was connecting to was located less than 50 miles from my laptop.


>Your password is too weak

This better be worth it! :P


Would you also have anything available in Europe?


> Google Cloud GPUs give you the flexibility to mix and match infrastructure. You’ll be able to attach up to 8 GPU dies to any non-shared-core machine...

Wow, that's impressive. One thing I've loved about GCE has been the custom sizing. This takes it even further, so we don't have to buy what we don't need.

Looking forward to seeing the pricing on this. Looks like they're going to heavily compete with AWS on this stuff.


Yes, I foresee a lot of 8 vCPUs VMs with 8 giant GPU dies attached to them. And that's just fine by us!

Disclosure: I work on Google Cloud (and want to sell you some GPUs!)


One of the HUGE advantages of GCE/AWS is that they will gobble up 100% of the unused resources for their own computation. Nothing is wasted, and the machines basically pay for themselves.

Compare this with something like Oracle, which simply can't consume the unused resources in order to discount the hardware effectively. They can't beat GCE/AWS at the cloud game until this changes.


I'm not sure if Amazon has enough batch-processing type work to make this effective. You can't run amazon.com with these resources because their needs are inflexible.

Actually, not sure about Google either: their preemptible instances are sold at an 80% discount, so that puts an upper bound on the usefulness of these machines for them.


Amazon's largest "batch" job is their holiday shopping season. They must purchase the horsepower to get them through that season, and then the rest of the year they can resell it. I believe this was the original motivation for AWS.


A coauthor of the original proposal for AWS denies that:

http://www.networkworld.com/article/2891297/cloud-computing/...


If it was, they'd have to shut down AWS for the holidays.


It's really not about the platform being a "buyer of last resort" for the unused capacity. Think about the narrow band of values you need for that to work. Google must value these compute jobs highly enough to have them offset the unused capacity, but they can't value them so highly that they're worth buying hardware for to make sure they run. And this has to remain true even as the hardware gets much cheaper?

Here's how I think AWS and GCE work...

Imagine you buy some hardware. You have n=1 users, and so the variance on your compute needs is very high. You buy for peak use, so your median utilisation is low.

Now imagine you have n=10,000,000 users. Again, you stock for peak use...But the variance is now much much lower. Your median utilisation is now much better.
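
A toy sketch of that pooling effect (hypothetical demand distribution, assuming independent users; only meant to illustrate why the aggregate peak shrinks relative to the median):

    import numpy as np

    rng = np.random.default_rng(0)

    def median_utilisation(n_users, hours=10_000):
        # Total hourly demand from n independent users; a sum of n Gamma(2,1)
        # draws is Gamma(2n,1), which keeps the array small even for large n.
        demand = rng.gamma(shape=2.0 * n_users, scale=1.0, size=hours)
        provisioned = demand.max()      # buy hardware for the observed peak
        return np.median(demand) / provisioned

    for n in (1, 100, 10_000):
        print(f"{n:>6} users: median utilisation ~ {median_utilisation(n):.0%}")
    # One user: the peak dwarfs the median, so most capacity idles.
    # Ten thousand users: the peak barely exceeds the median, so very little idles.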


I built a system at google called Exacycle which used idle cycles for computation. You can think of my system as the user of last resort. We ran MD simulations, drug docking, and ray tracing simulations of the LSST telescope (along with fundamental mathematical problems), then published the results from this for free. The economics of idle cycles in a large production environment aren't trivial to analyze, and anybody outside of Google speculating is likely to make a mistake.


Isn't this what folding@home and similar services have been doing for decades now?


That's correct. In fact, we were the largest (in terms of total simulation seconds produced) provider of resources to Folding@Home and worked with Vijay Pande closely on this effort.


I think there's a better way to think about economies of scale (looking at it through the lens of Google Cloud).

Let's quantify your workload as "resource-seconds", what many of Google's services (and some AWS services, like Lambda) allow you to do is maximize resources and minimize seconds (assuming efficient parallelization of workload, which is afforded by Google's super-awesome Networking). Cost is the same to you, but you get your results faster.
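
A trivial sketch of that resource-seconds framing (made-up price and job size, and it assumes the workload parallelises perfectly):

    WORK = 3_600            # core-seconds of compute the job needs (made up)
    PRICE = 0.00002         # dollars per core-second (made up)

    for cores in (1, 10, 100):
        seconds = WORK / cores      # more resources, fewer seconds
        cost = WORK * PRICE         # billing depends only on resource-seconds used
        print(f"{cores:>3} cores: finishes in {seconds:>6.0f}s, costs ${cost:.3f}")
    # The bill is identical each time; you just get the result 10x or 100x sooner.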

This is afforded by the scale and by the multi-tenant nature of underlying services. This model can only work when there's lots of folks using such a service.

(work on Google Cloud Big Data)


Did you mean to say GCE/AWS could gobble up "unpaid for" resources for their own computation?

Curious, because if I pay for using a P100 for 2 hours, it is 100% available to me and not virtualized in any way, right?

However, if I use it only for 2 hours that day, and no one else were ready and waiting to buy the time on those, then Google uses it internally until...


From our post:

> GPUs are offered in passthrough mode to provide bare metal performance. Up to 8 GPU dies can be attached per VM instance including custom machine types.

When you're using it, it's 100% yours.

Disclosure: I work on Google Cloud (and pitched in on this!)


Can you explain this in more detail about GCE/AWS vs Oracle?

What "unused resources" are used for "their own compitation"?

How does this "pay for itself"?


He's saying they have their own apps (like Google Search) that can use VMs that aren't currently rented out to do their own computations (like PageRank, etc.).


I'm curious to know what workloads you think Google is running on the same hardware that cloud customers are using.


What an odd question. Seems like training a new voice recognition model, or image classifier, or any of a zillion other research projects Google is working on would be perfect for any unused capacity.

I've never worked at Google. Do they just let each team go with their own crazy hardware setup? Or maybe two separate systems, one internal and one external? That seems a little wasteful (they have the money, so whatever).

But wouldn't programming against a big unified API be a bit more, well, sane? I can see really sensitive stuff locked away on its own hardware, but the search front end? Why not serve some JS or provide Chrome downloads from a giant pool of common hardware?

Things grow organically, and they might be technically locked into a specific layout right now; that's totally understandable. But unused capacity is just capital depreciating steadily away with no gain. Soaking up every spare cycle doing something useful should probably be somewhere in the company goals.


It's not odd from a security perspective. There is a reason the CIA pays massive amounts of money to Amazon for a completely separate set of hardware from the public. Hypervisors aren't perfect, and you might not want customers to be one KVM exploit away from perusing someone's Gmail being processed on the same machine.


Like I said, I'm just curious. Google has so much capacity that I'm not sure it would be important for their bottom line to move it between Google and customer workloads in quanta smaller than one physical machine. Your theory also requires Google to have a substantial standby workload that is currently not scheduled, doesn't it?


In my very limited experience, trying to answer research questions can soak up all the computation you can throw at it. Consider something like the traveling salesman problem: exact solutions take exponential time, and while there are heuristics that get good answers quickly, you never know if those answers are optimal.

You can get an answer on your laptop in an hour. You'll probably get a better answer throwing 1000 hours at your algorithm though, and a better answer throwing 10000 hours.

You can do a lot of neat things if P=NP. Since no one knows if that's true, we're stuck with bad big O. More computer time is the only way out right now.
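
To make the "heuristics give good answers but no proof of optimality" point concrete, here's a minimal nearest-neighbour sketch on random points (purely illustrative): restarting from every city soaks up more compute and usually shortens the tour, but it never certifies a minimum.

    import math, random

    random.seed(1)
    cities = [(random.random(), random.random()) for _ in range(200)]

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def tour_length(order):
        return sum(dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
                   for i in range(len(order)))

    def nearest_neighbour(start):
        # Greedy heuristic: always hop to the closest unvisited city.
        unvisited = set(range(len(cities))) - {start}
        order = [start]
        while unvisited:
            nxt = min(unvisited, key=lambda j: dist(cities[order[-1]], cities[j]))
            order.append(nxt)
            unvisited.remove(nxt)
        return order

    # More compute = more restarts = usually a shorter tour, never a proof it's optimal.
    best = min(tour_length(nearest_neighbour(s)) for s in range(len(cities)))
    print("best nearest-neighbour tour:", round(best, 3))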

The point is, I'd be disappointed in Google researchers if they didn't have a huge amount of unscheduled workload.


There are some questions though where the saved energy from not running the batch jobs is worth more than the answer (e.g. searching for the next prime).


We (Google) ran numerous theory problems on Google's computing platform via the Exacycle program. Peter Norvig convinced me this was a bad idea, mainly on the grounds that time was better spent on disproving the conjecture using pure math, not search. http://norvig.com/beal.html

"But Witold Jarnicki and David Konerding already did that: they wrote a C++ program that built a table of Cz(modp)Cz(modp) up to 5000500050005000 , and, in parallel across thousands of machines, searched for A,BA,B up to 200,000 and x,yx,y up to 5,000, but found no counterexamples. On a smaller scale, Edwin P. Berlin Jr. searched all CzCz up to 10171017 and also found nothing. So I don't think it is worthwhile to continue on that path."


All kinds of CPU intensive work. Isn't that most work?


No, a lot of work is I/O bound. That's why M:N threading (e.g. Go's goroutines, Python's gevent) is so popular.
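
A minimal asyncio sketch of why that matters (not gevent, but the same idea: lots of I/O-bound tasks multiplexed onto one OS thread; assumes Python 3.7+ and fakes the I/O with sleep so it's self-contained):

    import asyncio, time

    async def fetch(i):
        # Stand-in for a network call: ~1s of waiting on I/O, essentially no CPU.
        await asyncio.sleep(1)
        return i

    async def main():
        start = time.time()
        results = await asyncio.gather(*(fetch(i) for i in range(1000)))
        # 1000 one-second "requests" finish in roughly 1s of wall time, because
        # while any one task waits, the single OS thread runs all the others.
        print(len(results), "tasks in %.1fs" % (time.time() - start))

    asyncio.run(main())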


All of them.


source?


Kudos to Google, and happy to see that at least in principle AMD is still an option.

I wonder what kind of device driver GCE uses with AMD. The new ROCm?

What about POWER8 + NVLink hardware? Does anybody know if the current NVIDIA GPUs, in particular the P100s, are all on x86?


I'll just answer your driver question: you bring whatever you want. Our VMs just boot off of bytes you have stored in your image (which is in turn just stored on GCS). If those bytes happen to be Debian 8 with some AMD driver that supports the 9300, awesome ;).

We will be working with both vendors to make sure we highlight which drivers work most reliably on our stack though (virtualization + GPUs is way too rare!). We want to work closely with both vendors to do the qualification, so you at least know what is known good.

Disclosure: I work on Google Cloud.


Thanks for the explanation, that's what I guessed -- which has both good & bad aspects. The good is the flexibility, the bad is that:

> If those bytes happen to be Debian 8 with some AMD driver that supports the 9300, awesome ;).

happening to support it might alone not be enough, given how many difficulties AMD has been having. I just hope that there will be enough customer demand for their SP-FLOP cruncher GPUs for interested parties, including Google, to pitch in and contribute to their ROCm stack.


When is Kubernetes support going to be added? Right now it has it only in a very basic form for NVIDIA GPUs.


There is a PR[1] to extend the current alpha GPU implementation. Hopefully it will get merged soon.

[1] - https://github.com/kubernetes/kubernetes/pull/28216


That's still NVIDIA-only, plus you can't even tell a P100 from a K80 for scheduling purposes.


This is the reason node-feature-discovery exists. Kubernetes needs an extensible way to detect and label hardware features. See: https://github.com/kubernetes-incubator/node-feature-discove...


I'd just use a separate NodePool and match your jobs using a selector as appropriate.
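
A hedged sketch of that approach with the Kubernetes Python client; the node pool name, image, and the alpha GPU resource name are placeholders and may differ across cluster versions:

    from kubernetes import client, config

    config.load_kube_config()  # assumes a local kubeconfig pointing at the GKE cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="train-job"),
        spec=client.V1PodSpec(
            # GKE labels every node with its pool; pin the job to the GPU pool.
            node_selector={"cloud.google.com/gke-nodepool": "p100-pool"},
            containers=[client.V1Container(
                name="train",
                image="gcr.io/my-project/train:latest",    # placeholder image
                resources=client.V1ResourceRequirements(
                    # alpha-era GPU resource name; placeholder, version-dependent
                    limits={"alpha.kubernetes.io/nvidia-gpu": "1"},
                ),
            )],
            restart_policy="Never",
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)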


So, is GCI going to come with the right drivers?


Very, very happy to finally see AMD GPUs in the cloud.


I love to root for an underdog, but I've been pulling for AMD for years and they just can't get their Linux support to the same level as NVIDIA's. I speculate that they have undiagnosed design errors in their driver.


You should check out ROCm

http://hothardware.com/news/amd-rocm-13-adds-support-for-pol...

They are revamping their entire Linux stack for HPC and heterogeneous compute.


Fool me several times, shame on me.... ;) Really -- ever since the ATI acquisition I thought AMD could make it happen. And it's true that they have come a long way since then.

"New Linux Driver and Runtime Stack optimized for HPC & Ultra-scale class computing" sounds great, let's hope this time they can do it.


:) I've only been deeply involved with compute for the past two years, so I am still an optimist. I think AMD pretty much has no choice but to get it right this time: they need server and HPC dough to stay afloat.


Still no OpenCL, still only promised as a "preview" for mid-December.

I'm still very worried that AMD will fail, not because of the hardware, but due to their lacking software stack.


This is amazing! The first cloud provider to have the P100! Amazing opportunities ahead with compute power like that.


What about this[1]?

"GPU-Accelerated Microsoft Cognitive Toolkit Now Available in the Cloud on Microsoft Azure and On-Premises with NVIDIA DGX-1"

DGX-1 is powered by 8 x P100

[1] http://nvidianews.nvidia.com/news/nvidia-and-microsoft-accel...


AFAIK the DGX-1 is for running on premises, not in the cloud.



Didn't know they added those recently. I knew they had Titan Xs and K80s. This is awesome too.

Nice to have a full cloud provider with a lengthy feature set AND P100s though.


First with the P100; let's see if they can be the first with full IPv6 support at the instance and/or container level! (I'm not holding my breath)


What's the assurance like regarding security against other concurrent users on the same hardware? Historically, multitenancy with GPUs has been quite iffy and there's not much security research around it, even if IOMMUs theoretically exist.


We care a lot about security; that's part of the motivation for passthrough instead of any of the "split up a GPU" approaches. So, as I quoted above from our post:

> GPUs are offered in passthrough mode to provide bare metal performance. Up to 8 GPU dies can be attached per VM instance including custom machine types.

So when your VM boots up, the GPUs you attach will not be shared with others. I agree that the state of GPU virtualization is not yet where it needs to be to make me personally comfortable enough.

Disclosure: I work on Google Cloud (and worked on this a little).


Now it would be great if Kubernetes on GKE worked nicely with GPUs. It's still in the works: https://github.com/kubernetes/kubernetes/blob/master/docs/pr... .


Awesome news! The Tesla P100 is a monster, this will push ML development to new heights.


Is there any public access to the TPUs?


Not direct access, yet (they're not sufficiently programmable and isolatable). But as mikecb alludes to, we have a number of services like Translate, Vision, Speech, and a few others in Alpha that may be trained using TPUs (see https://cloud.google.com/ml for more).

For models you write yourself and have the Cloud ML service train, they will use our regular VMs today and as this blog post says, GCE will soon have GPUs (and therefore, Cloud ML will expose them, too).

Disclosure: I work on Google Cloud (and sorry for the confusion!)


If you use the Cloud ML service.


Really? I was under the impression they just use standard compute instances to power Cloud ML?


This happened some months ago ... how does it compare? Anyone in the know can pitch in on a short comparison?

https://aws.amazon.com/about-aws/whats-new/2016/09/introduci...


Amazon: up to 16 K80s and ~700 GB of RAM

Google: up to 8 K80s and ~200 GB of RAM but much more flexible


There are a lot of excited posts here about this announcement! For someone who doesn't use GPUs in everyday life, can someone explain why this is great and maybe touch on the current landscape around GPU usage and the cost landscape?


The excitement comes from the fact that some workloads benefit greatly from GPUs -- mostly those where the GPU arch offers some niche-advantage that can give GPUs >=10x perf[/W]. A good example is deep learning where Intel is struggling to compete [1], and even in linear algebra there is a 4-5x power efficiency difference [2, slide 6].

At the same time, in many other complex applications the performance/W/$ advantage has not been all that clear, especially since GPUs have been quite behind in process technology. A good example is a simulation code I work on where the 2-4x speedup from GPUs, which (due to the inherent overhead of using accelerators) gradually vanishes in strong scaling, means that when price is considered, only cheap consumer cards can improve the performance/buck metric significantly (that's time-to-solution/buck in our case); professional cards don't offer significant improvements other than performance density [3, figures 6,7].

There are plenty more examples that NVIDIA collects [4], but always take the claims with a grain of salt, especially the ones with "incredible" >10x speedup claims :)

Last, I'd note that with the recent 14-16nm jump, the gap between the traditional CPUs and the simpler accelerator architectures has increased (I've just seen MAGMA BLAS on Tesla P100 results which show >10x GFlops/W, more than double that of the previous arch) and I expect it to keep increasing partly due to the manufacturing/process technology gap shrinking between Intel and the rest and partly due to architecture and programmability improvements of accelerators.

[1] https://www.nvidia.com/object/gpu-accelerated-applications-t...
[2] http://on-demand.gputechconf.com/gtc/2015/presentation/S5476...
[3] https://www.academia.edu/13753737/Best_bang_for_your_buck_GP...
[4] https://www.nvidia.com/object/gpu-applications.html


GPUs for certain workloads are much more powerful than CPUs, meaning you can get stuff done much quicker.

Machine Learning is one such field, batch-rendering for CG studios etc is another.

You could throw 100 CPUs at something that perhaps 1 GPU might do in the same time (made-up numbers, but it gives you an idea of why people want it).


Is it possible to have a non-shared machine? Is it virtualized anyway?


Even if we had a "You get this box all for you", we'd still run our hypervisor on it. We wouldn't give someone a raw, unfettered machine in our datacenter (and I hope you wouldn't either!).

What is your concern with virtualization in this context?

Disclosure: I work on Google Cloud (and for a long time specifically Compute Engine).


> We wouldn't give someone a raw, unfettered machine in our datacenter (and I hope you wouldn't either!).

I think you're alluding to security issues with renting bare metal hardware to potentially untrusted customers, right? I've often wondered about the security implications of that - think backdooring BIOS/UEFI or other component firmware. But there are many large providers that rent dedicated hardware without a hypervisor. Are they just ignoring these concerns, hoping that this kind of tampering would be rare anyway?

> What is your concern with virtualization in this context?

I'm not the parent poster, but performance was probably the concern. There's still some (marginal, these days) overhead with virtualization in general, and VT-d specifically.


From our perspective: yes, we don't feel comfortable letting untrusted users run side by side without a sandbox.

In this context of GPU compute, though, you are just talking sporadically to a PCIe-attached device directly. There isn't any virtualization or sharing of the device. And yes, I assumed the complaint would be performance, but especially with just a passed-through GPU, it's honestly going to be funny if someone finds a real "oh wow, this is 20% slower specifically because of virtualization" case.


> From our perspective: yes, we don't feel comfortable letting untrusted users run side by side without a sandbox.

That's what I figured. I still wonder what other large providers (SoftLayer, OVH, etc.) do about this. Unfortunately, I suspect the answer is "nothing, we just hope/assume nothing will happen."

> In this context though of GPU compute, you are just talking sporadically to a PCIe attached device directly.

I know; I'm familiar with passthrough. It's not completely "free," but the overhead is very minor. I don't think anyone will be able to complain with a straight face about its performance. :P


> But there are many large providers that rent dedicated hardware without a hypervisor. Are they just ignoring these concerns, hoping that this kind of tampering would be rare anyway?

Yes. You can try to mitigate the risk a bit, but it's still there. Just think of all the other firmware you can update, not just UEFI.

Also, for large contracts, rented servers are usually bought new and not re-used, for multiple reasons (this is why providers have used-server markets!).

Virtualization overhead is so low nowadays that it's not worth the risk. Provisioning is much harder, too.


> Virtualization overhead is so low nowadays that it's not worth the risk

The entire Finance industry would like to disagree with you.

On the other hand, you don't run latency-sensitive code in the cloud; you run it on your own hardware. Since GCE is a cloud, there's no reason not to have the virtualization.


Does anyone know what the cost will be for these? AWS is quite high for the K80.


Nvidia got a massive bump in share price. I was quite sad because I sold all my shares after the post-election drop. I think this announcement might have caused the huge peak. Could have made 10% in one day.


They announced very good results for the quarter; I think that was the reason for this spike. Their datacenter/compute revenue grew by >150%.


The large recent bump in share price happened while the market was closed between last Thursday and Friday, when they announced earnings: http://nvidianews.nvidia.com/news/nvidia-announces-financial...

The Google announcement didn't move their stock much at all.


Great news!



Mine ALL the bitcoins


I wonder what kinds of cores will be available and whether that will be visible. Optimizing your code for a particular GPU architecture can make massive performance differences, much more so than for CPUs.


S9300 x2s, K80s and P100s.

From the post:

> Google Cloud will offer AMD FirePro S9300 x2 that supports powerful, GPU-based remote workstations. We'll also offer NVIDIA® Tesla® P100 and K80 GPUs for deep learning, AI and HPC applications that require powerful computation and analysis. GPUs are offered in passthrough mode to provide bare metal performance. Up to 8 GPU dies can be attached per VM instance including custom machine types.

Disclosure: I work on Google Cloud and pitched in on this.


They have pictures in the post if you don't want to read the rest of the article...


Am I the only one annoyed that their "announcement" talks about something that will happen in the future?

What kind of asshole move is this? Why not just say "here, you can use it now, good luck".


Presumably you can contact them and ask for early access.


Yes, that's the goal of the survey:

> Tell us about your GPU computing requirements and sign up to be notified about GPU-related announcements using this survey. Additional information is available on our webpage.

Survey: https://goo.gl/mgEI9X Landing page: https://cloud.google.com/gpu/ (which also links to the survey and indicates, as you rightly presumed, that if you fill it in, we'll add you to our waitlist)

Disclosure: I work on Google Cloud.



