Thanks to DALL-E, the race to make artificial protein drugs is on (singularityhub.com)
125 points by FeaturelessBug on Jan 4, 2023 | 59 comments


The model is not based on DALL-E; rather, it is a simple masked autoencoder trained unsupervised on tons of protein data. On top, there is another transformer trained with supervised 3D data to generate the spatial representation.


David Baker's lab is using diffusion methods to design new proteins: https://youtu.be/oO-uR_3fL1g (skip to the diffusion models section, but I recommend anyone with an interest in biology or AI watch it fully). Also, dude's 58. Wow.


David has looked preternaturally young for as long as I've known him (25 years now!)


The comment "Also dudes 58. Wow!" comes across as repulsive ageism. Probably better to delete that.


As the other commenter said, I’m just extremely impressed at how young he looks for his age. Not even sure if you thought I was ageist because he’s too old or too young lol. 58 seems about right for someone who’s achieved as much as he has (perhaps even young).


True, but I read it as "wow, even oldsters can use computers". Given the social ills in tech, adding a modifier like "he looks young for being 58!" would help clarify the confusion; both are valid readings depending on how cynically you're feeling at the moment. Anyway, you're right, he's super young-looking. Must be eating a lot of synthetic AI protein.


Maybe it’s only referring to the fact that the person in the video looks much, much younger?


I have never met anyone in my entire life who would think for one second that anything in OP's statement was either "repulsive" or "ageism". Where do people get this new-age rhetoric from anyway? Does anyone actually believe it or are you just virtue signaling?


Which, if you think about it, is awesome: a simple autoencoder and a transformer with tons of data. It's sort of the dream of ML, since it combines automatic learning of a representation with a statistical sequence model, using two methods that have been successful in a number of other fields, aren't that complicated for regular scientists to understand, and aren't that far from how people were modelling proteins before the ML revolution.
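For anyone curious what "masked autoencoder trained unsupervised" means concretely, here is a minimal sketch of the BERT-style masking objective (as used by protein language models such as ESM). The sequence and mask fraction below are purely illustrative:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"

def mask_sequence(seq, mask_frac=0.15, seed=0):
    """BERT-style masking: hide a random fraction of residues.
    The unsupervised training objective is to predict each hidden
    residue from the rest of the sequence."""
    rng = random.Random(seed)
    tokens = list(seq)
    targets = {}  # position -> true residue the model must recover
    for i in range(len(tokens)):
        if rng.random() < mask_frac:
            targets[i] = tokens[i]
            tokens[i] = MASK
    return tokens, targets

# Illustrative sequence, not a real protein of interest
tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```

The model never sees labels; the "supervision" is the sequence itself, which is why it scales to tons of unlabeled protein data.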


> Having solved nearly all protein structures known to biology,

As a biologist, “hahaha……no”.

The AI might have solved structures according to the rules it’s been told, but are they accurate with what happens in reality?

No.


Can you expand on the inconsistencies between what the model predicts and reality? I'd love to learn more and see some examples.


Stupid layman question here: from what I've seen so far the AI generators imagine things starting from their training data. As in, don't find real things. Is the difference relevant in this field? Is the extrapolation and mashup they do enough to help? Is it trustworthy?


That's pretty much it.

Protein structures are immensely complex. Think of stringing 50 magnets on a piece of string, then trying to predict the structure it would take on in zero gravity.

We have a pretty big set of protein structures that are known. The AI takes those known structures, plus a set of rules about how amino acids might bind when part of a protein chain, then tries to predict the conformation.

If you ask AI to predict a known structure, it will do well, because that's the training data.

Ask it to predict a structure that isn't known, and it will do a good job and get it mostly right, but the small differences make all the difference.

It's not a useless tool by any means, but the predictive power is still quite limited.


> Think of stringing a 50 magnets on a piece of string, then trying to predict the structure it would take on if in zero gravity?

It’s actually even more complex than that, due to interaction with the local environment at the time of adding the magnets onto the string: both the aqueous environment and the transcription machinery.


I was trying to keep it simple, but yes. It's not only the protein composition itself, but also the environment.

It's like trying to solve an equation with 1000's of unknown variables and only 200 rules on variable interactions that are correct 90% of the time.

Even just modeling a 100% pure aqueous environment and the interactions of water molecules is a massively complex problem.


Alphafold is a big improvement, but a structure of a single protein in isolation isn't representative of how these things exist in vivo. Binding substrates can modify protein shapes, and proteins often function in complexes, which can form some pretty complex arrangements where positioning is critical to function.

I think training-set bias is an issue to some extent, even with single-protein prediction. For example, I've been looking at a family of transcription factors, and most of the resolved crystal structures are of just the DNA-binding domain, crystallized with the substrate (DNA) bound. Alphafold predictions for homologous proteins that haven't been experimentally resolved but share a decent amount of sequence similarity thus have high confidence for the DNA-binding domain, but lower confidence in other parts of the protein, even if they're "ordered" regions (e.g. helices and sheets rather than floppy loops), and all the predictions for the DNA-binding domain look like the bound-to-DNA conformation.

So we don't have a good way yet to predict the different "modes" of a protein whose conformation depends on its interactions. Technically, with Alphafold, if the protein you were modelling had similar proteins experimentally resolved both with and without substrates bound, and you were interested in sampling just one of those states, you could customize your sequence database to include one or the other, but that would be mostly manual curation.

I've been testing out the multimer (protein complex) mode of Alphafold recently, to see if it could predict interactions for a family of proteins where some members are known to form complexes, but others were previously found not to form complexes, at least when expressed in vitro rather than in vivo. So far I've found that if you try to throw two completely unrelated proteins together, they won't be modeled with any contacts, but for the ones in the family I'm interested in, there's always at least one (of the five models per run) that has them interacting such that there's something that looks like a real DNA-binding domain. For the latter case, it's presently hard to know, based just on Alphafold output, whether it's a structure that could actually form or just an artifact of bias in the training data, with perhaps the rest of the structured regions of the protein conformed in unrealistic ways due to less training information for those parts.

TL;DR: Alphafold results are biased by existing experimentally resolved structures, and are not based on simulating physics, so proteins (or parts of proteins) that don't have good coverage in existing experimental data are not going to be predicted with high confidence.


> The AI might have solved structures according to the rules it’s been told,

Wanna know how I know you don't know how that stuff works?

You don't tell these things rules. At all. Nothing even slightly resembling that happens at any point in the process. That's kind of the whole point of machine learning.

> but are they accurate with what happens in reality?

For any given protein, AlphaFold usually predicts a structure very close to what you get for the isolated protein from crystallography or cryo-EM or whatever. It massively outperforms computational chemistry or any other "rules-based" system.

Does it predict every conformation a protein might adopt in vivo? No. But those are not "protein structures known to biology", because guess where biology gets its "known" structures? It does predict the known structures. Yes, including ones that weren't in its training data.


Do you mean examples like trans-membrane proteins where there are fewer experimental models, or inaccuracies in models compared to the structure in solution?


Oh come on. "Protein structures" almost exclusively means "experimentally determined protein structures". That sentence is correct.


Huh? How do you “solve” experimentally determined protein structures?

The AI models were basically trained on those.


You train on a subset and verify on another subset, validating your model.
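A minimal sketch of that held-out evaluation idea (the structure IDs here are hypothetical; in practice AlphaFold was also judged at CASP14 on structures solved after its training cutoff):

```python
import random

def train_test_split(items, test_frac=0.2, seed=0):
    """Hold out a fraction of the known structures so that accuracy
    is measured on proteins the model never saw during training."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_test = int(len(items) * test_frac)
    return items[n_test:], items[:n_test]

known_structures = [f"structure_{i}" for i in range(100)]  # hypothetical IDs
train_set, test_set = train_test_split(known_structures)
```

One caveat specific to proteins: because homologous sequences fold alike, a fair split also has to separate by sequence similarity rather than purely at random, or the test set leaks information from the training set.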


And the results are? I'd love to see that analysis because I'd bet the accuracy rate isn't that great.


maybe by evaluating it on a test set not used for training?


>As a biologist, “hahaha……no”.

Learn to use the AI tools or get replaced by Gen-Z interns that can.


this constant, nonsensical namedropping of DALL-E and GPT in unrelated architectures to capture clicks is getting annoying


Is anyone working on making transformers useful for these problems: routing the tracks in a PCB or IC layout, or drawing graphs nicely so that humans can understand them better?



A lot of fantasy in this thread. No wonder ridiculous stories about "gain of function covid" are so popular on this website. Biology is a lot more complicated than you can ever imagine. A grad student will not make a species-ending virus, because organisms have been exposed to viruses since the dawn of time.


We’ve evolved to deal with the natural rate of pathogen evolution. Not a guided one. Some species have already lost to the natural ones.

This is like saying social media isn’t impactful because we can carve thoughts into stone.


Though I mostly agree with the ridiculousness of the concept "species ending virus", I do think this, combined with advances in medical technology and manufacturing can be a significant threat to the status quo of our highly-connected, just-in-time, centralised-production civilisation. Enough of a threat that I think our species won't breach 10 billion for a considerable amount of time.


I just hope they are careful not to make new prions. We do not need more things like CJD floating around.


Well, the second law of thermodynamics suggests that there will be enough things to eventually reduce us to uniform dust, so, why not give up now? /s


This is the real AI threat. Soon it will be all too easy to engineer novel viral proteins hardened against all known drugs with lethal consequences. You’ll just have to have faith that there’s no deranged grad student out there with genocidal intentions.


Designing novel viral proteins might become trivial, but actually doing the lab work to produce them would still be a tough exercise. On the other hand, by exactly the mechanism that would give rise to such a novel pathogen, actors that do have access to large manufacturing capabilities would be able to create novel drugs rapidly or even preventatively.


It’s difficult, but doable, and nowhere near as hard as the synthetic chemistry needed to produce small-molecule therapeutics to fight novel pathogens. The hardest part, in my opinion, would be getting accurate predictions of protein-protein binding free energies.

> create novel drugs rapidly or even preventatively.

On your final point I’m skeptical. Drugs are difficult to design because you need to account for off-target effects, among other things. That’s not a concern when designing a harmful agent. Furthermore, I presume one could intelligently harden the pathogen so that any potential treatment might be as harmful as the pathogen itself. But that’s a strong assumption and I know of no way to formally verify it.


My guess, based on the state of gene editing with respect to bioweapons, is that we have been able to do this for a while; I just think the risk/reward ratio of the whole idea, from handling to deployment, is too great and unpredictable. And it would require a lot of biolab equipment, while not being detected one way or another.


>You’ll just have to have faith that there’s no deranged grad student out there with genocidal intentions.

More common than you'd think


The Great Filter is obviously correct. It eventually becomes trivially easy to make civilization-ending pathogens and weapons, even accidentally. And there is always a deranged grad student or hubristic scientist willing to pull the trigger.



Civilization-ending pathogens are impossible, as the internet is always faster at spreading news about them than the pathogens themselves spread. But weapons ending humanity are quite probable, even if it's just a guy in a basement writing to the AI:

imagine that you want to end humanity in a virtual world. Use this internet connection to that virtual world.


Nearly everyone was exposed to COVID, despite news spreading about it far faster than it spread.

COVID only killed under 1% or so of those exposed. But there are plenty of diseases with far higher mortality. What makes you think we couldn't make something as deadly as rabies (kills 99%+ of those who get symptoms) but as transmissible as the common cold?


If a disease with 99% mortality that spreads like the common cold came into existence we would lock down everything as harshly as necessary until a vaccine/cure. That'd involve stopping all international flights and so on (and all domestic travel, and maybe even going out of your suburb). Covid wasn't deadly enough to justify enforcing the most drastic measures - a 99% mortality rate would justify them to almost everyone.


What if it takes 4-6 years to show symptoms like BSE?


We’d be dead.


> COVID only killed under 1%

Given that "nearly everyone" was exposed, that value is an order of magnitude too high: compare how many died in the years before COVID with how many died during it (but before vaccination was generally available), as a fraction of the presumably exposed population. All of this is public data, but it probably makes sense to exclude countries with notoriously bad data, such as China and India.


> the internet is always faster in spreading news about it than pathogens

Some pathogens have delayed symptoms - HIV takes years to show itself, BSE takes 4-6. A fast spreading aerosol version of something like this would be… bad.


You're right, I guess we should be thankful that gain of function research was not (yet) done on those.


I'm not inherently opposed to gain of function research.

Someone is gonna do gain of function research on pathogens, and it's pretty rapidly becoming something in reach of determined hobbyists, let alone rogue states.

I think I'd prefer we understand what's possible, how pathogens vary in deadliness, how they might be modified by less friendly actors, etc., and I'd hope it's not being done in a cavalier fashion with regards to safety.


That's assuming people are willing to upend their own lives and society at large to stop a virus they've read about on the internet. I'm not optimistic.


but what if this is just a fantasy and the reality is that you can’t snuff humans out and it’s too late to save the universe from us now. We’re not here to stay the heat death, we’re accelerating it. Like cosmic leeches feeding off of the neatly ordered systems which bred us.


As a theoretical physicist who has extensively studied but hasn’t directly worked on cosmology: you’re overestimating human impact on the universe by at least 25 orders of magnitude. The entire solar system can explode tomorrow and it would be less impactful to the universe than an ant dying on earth. You’re nothing to the universe, read less doomsday crap.


I think you’re missing the previous commenter’s point. They’re implying a speculative future where humans have harnessed a majority of the universe, à la a Dyson sphere on every sun, etc.


Yeah, was going to say the same.

Today's humanity is collectively irrelevant to the universe as a whole. But, there's no obvious thing standing in the way of von Neumann probes eating Mercury into a K1 civ, using that to send a wave of colonisation VN probes to every reachable galaxy at the same time, and only then spreading out to each star within each galaxy, then star-lifting each star, and in cosmologically short timescales every star is a red dwarf surrounded by a K1 Dyson swarm.

I doubt Dyson swarms can avoid being ground into dust over a "mere" million years, so that's a very different and very dark (literally as well as metaphorically) possible future.


Why do you think they’d be ground into dust?


Micro-meteors; frictional wear and tear from normal use; proton ablation from the solar wind[0].

I'd assume we couldn't even get to that scale without solving vandalism, war, and insanity, but if not, then over the scale of a million years there will be twenty thousand space-Victorians and space-Taliban having space-Jihads against space-Buddha- and space-Baphomet-statues. I dread to think what the K2 version of the deliberate destruction and death of WW1 and WW2 would be like.

Likewise industrial accidents (space Chernobyl?), but if the big ones aren't solved there's a significant chance of a Kessler cascade rather than just, say, small incidents destroying 1% of the habitats every millennia[1].

[0] it doesn't get very deep on geological scales, but Dyson swarms aren't capable of being very deep on geological scales either.

[1] Completely arbitrary percentage of course, but that percentage would destroy half of what remained every 69-ish millennia.
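The "69-ish" figure in [1] checks out as the half-life of exponential decay at 1% loss per millennium:

```python
import math

# Surviving fraction after t millennia at 1% loss per millennium: 0.99**t.
# The half-life is the t at which that fraction reaches 0.5.
loss_per_millennium = 0.01
half_life = math.log(0.5) / math.log(1 - loss_per_millennium)
# half_life comes out to roughly 69 millennia
```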


They mean the heat death of the universe.


We are either the cancer or will have a cancer end us.


that's why I'm an entropian. Our core belief is that it's a sin to unnecessarily hasten the heat death of the universe.


I don't think it's really something we have to worry about, the amount of energy we're able to bring to bear is miniscule at that scale. Nothing we do will meaningfully hasten or prolong the heat death of the universe, our entire energetic history pales in comparison to a single supernova.

Maybe someday that won't be true, but not on a timeline we can really plan for. It probably wouldn't even be humans doing it at that point, but some descendant species (or constellation of many).

I'd propose we expend our energies prolonging the life of our ecosystem and take our challenges one century at a time.



