Intellectually, I don't like this approach.

Predicting the end result directly from the protein sequence is prone to missing any new phenomenon; it would just regurgitate/interpolate the training datasets.

I would much prefer an approach based on first principles.

In theory folding is easy: it's just running a simulation of your protein surrounded by some water molecules for the same number of nanoseconds nature does.

The problem is that this usually takes a long time, because evolving the system requires computing its energy as a function of the atomic positions, which is a complex problem involving quantum mechanics. The cost is mostly due to the behavior of the electrons; being much lighter, they operate on a faster timescale. You typically don't care about them directly, only about the effect they have on your atoms.
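
To make that concrete, here is a toy sketch of what "running the simulation" means: a velocity Verlet integrator stepping a few atoms under a Lennard-Jones potential. The potential and parameters are placeholders, not what a real MD engine uses.

    import numpy as np

    def lj_forces(pos, eps=1.0, sigma=1.0):
        # Pairwise Lennard-Jones forces; naive O(n^2), fine for a toy system.
        f = np.zeros_like(pos)
        n = len(pos)
        for i in range(n):
            for j in range(i + 1, n):
                r = pos[i] - pos[j]
                d2 = r @ r
                s6 = (sigma * sigma / d2) ** 3
                fij = 24 * eps * (2 * s6 * s6 - s6) / d2 * r
                f[i] += fij
                f[j] -= fij
        return f

    def velocity_verlet(pos, vel, mass, dt, n_steps):
        # mass is a scalar here for simplicity. dt must resolve the
        # fastest motions (~1 fs with explicit hydrogens), which is why
        # folding on millisecond timescales needs so many steps.
        f = lj_forces(pos)
        for _ in range(n_steps):
            vel += 0.5 * dt * f / mass
            pos += dt * vel
            f = lj_forces(pos)
            vel += 0.5 * dt * f / mass
        return pos, vel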

In the past, you would use various Lennard-Jones potentials for pairs of atoms when they are unbonded, and other potentials when they are bonded, and it would get very complex very quickly. But now there are deep-learning based approaches that compute the energy of the system with a neural network (see the introduction to neural network potentials at https://rowansci.com/publications/introduction-to-nnps ; GROMACS has support for them too). You train these networks to learn the local interactions between atoms from trajectories generated with ab initio methods. This gives you a faster simulator that approximates the more complex physics. In a sense, it just tabulates, inside a neural network, the effect the electrons would have on a specific atomic arrangement according to the theory you have chosen.
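
A minimal sketch of the idea, assuming a fixed-size descriptor per atom; real NNPs such as ANI or MACE build these descriptors with symmetry functions or equivariant layers, which I'm hand-waving here:

    import torch
    import torch.nn as nn

    # Each atom is described by a feature vector encoding its local
    # chemical environment (64 is an arbitrary placeholder width).
    per_atom_net = nn.Sequential(
        nn.Linear(64, 128), nn.SiLU(),
        nn.Linear(128, 128), nn.SiLU(),
        nn.Linear(128, 1),
    )
    opt = torch.optim.Adam(per_atom_net.parameters(), lr=1e-3)

    def train_step(descriptors, e_ref):
        # descriptors: (n_atoms, 64); e_ref: ab initio energy of the frame.
        # Total energy is a sum of learned per-atom contributions, so the
        # model only ever sees local environments and transfers to systems
        # of any size. Real training also fits forces via autograd.
        e_pred = per_atom_net(descriptors).sum()
        loss = (e_pred - e_ref) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()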

At any time, if you have some doubt, you can run the slower ab initio simulator on a small local neighborhood to check that the neural network potential approximation still holds.
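
That check could look something like this (hypothetical helper; nnp_forces and ab_initio_forces stand in for whatever engines you actually use):

    import numpy as np

    def approximation_holds(positions, center, nnp_forces, ab_initio_forces,
                            radius=6.0, tol=0.05):
        # Carve out the neighborhood you distrust and re-evaluate it
        # with the expensive ab initio method.
        cluster = positions[np.linalg.norm(positions - center, axis=1) < radius]
        err = np.abs(nnp_forces(cluster) - ab_initio_forces(cluster)).max()
        # If this fails, the usual move is active learning: label the
        # frame ab initio, add it to the training set, retrain the NNP.
        return err < tol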

Only then, once you have a simulator that is able to fold, can you generate a dataset of ("protein sequence", "end of trajectory") pairs to learn the shortcut, like AlphaFold or SimpleFold do. And when in doubt, you can fall back to the slower, more precise method.
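
In code the bootstrapping step is trivial; it's the compute that hurts (fold_with_simulator is the hypothetical NNP-driven MD pipeline above, potentially days of compute per call):

    def build_shortcut_dataset(sequences, fold_with_simulator):
        # Supervision for a fast sequence -> structure model,
        # in the AlphaFold/SimpleFold style.
        dataset = []
        for seq in sequences:
            final_coords = fold_with_simulator(seq)  # the slow, trusted path
            dataset.append((seq, final_coords))
        return dataset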

If you had enough data and could perfectly train a model with sufficient representational power, you could in theory infer the correct physics just from the correspondence between initial and final arrangements. But if you don't have enough data, the model will just learn some shortcut, and you accept that it will sometimes be wrong.





> it's just running a simulation of your protein surrounded by some water molecules for the same number of nanoseconds nature does.

No, the environment is important. Also, some proteins fold co-translationally, while they are still being synthesized.

Folding can also take minutes in some cases, which is the real problem.

> which is a complex problem involving quantum mechanics

Most MD simulations use classical approximations, and I don't see why folding is any different.


Being able to quantify the importance of the environment is one advantage of a simulator-based approach. You know what's happening, and you can simulate other environments by adding the relevant molecules around the protein.

Speeding up the folding is not the real problem; knowing what happens is. One way to speed up the process is just to minimize the free energy of the configuration (or some other quantity you derive from the neural network potential). (That's what the game Foldit was about: minimizing the Rosetta energy function.) Another way would be to use a generative method like a diffusion model to generate a plausible full trajectory (but you need some training dataset to bootstrap the process). Or work with key configuration frames: the simulation can take a long time, but it passes through specific arrangements (the transitions between energy plateaus), and you learn these key points.
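
A sketch of that minimization route, assuming the learned energy is differentiable (energy_fn is a placeholder for the NNP or a Rosetta-like score returning a scalar tensor):

    import torch

    def relax(coords, energy_fn, steps=500, lr=1e-2):
        # Gradient descent on a differentiable energy. This only finds
        # a nearby local minimum, which is exactly why naive minimization
        # can miss the folded state without good moves or restarts.
        x = coords.clone().requires_grad_(True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            e = energy_fn(x)
            e.backward()
            opt.step()
        return x.detach()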

The simulator can also be much faster because it doesn't have to consider every pair of atoms: with a cutoff, the naive O(n^2) behavior drops to O(n), with n the number of atoms (and the bigger constant of running the neural network hidden inside the O notation).
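
The trick is a cutoff radius plus a cell list, something like this (toy sketch, no periodic boundaries):

    import numpy as np
    from collections import defaultdict

    def neighbor_pairs(pos, cutoff):
        # Bin atoms into boxes of side `cutoff`; only atoms in the same
        # or adjacent boxes can be within range, so the expected cost is
        # O(n) at liquid-like densities instead of the naive O(n^2).
        cells = defaultdict(list)
        for i, p in enumerate(pos):
            cells[tuple((p // cutoff).astype(int))].append(i)
        pairs = []
        for (cx, cy, cz), atoms in cells.items():
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for dz in (-1, 0, 1):
                        for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                            for i in atoms:
                                if i < j and np.linalg.norm(pos[i] - pos[j]) < cutoff:
                                    pairs.append((i, j))
        return pairs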

The simulations are classical, but fundamentally they rely on the shape of the electron clouds. The electron density can deform (that's what bonding is), providing additional degrees of freedom and allowing the atomic configuration to slide more easily past itself instead of getting stuck in a local optimum. Fortunately, all this mess is nicely encapsulated inside the neural network potential, and we can work without worrying about the electrons: their shape is implicitly defined by the current positions of the atoms (the implicit function theorem makes abstracting their behavior away sound, because of the faster timescales).
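
Schematically, the surface the classical atoms move on is the Born-Oppenheimer potential energy surface,

    E(R) = min over psi of <psi | H_el(R) | psi>

i.e. for every nuclear arrangement R the electrons are taken to have already relaxed to their ground state, and the neural network potential is a learned fit of this E(R).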


No, this is all basically wrong.

Potential != free energy. Entropy is a driving force behind folding.
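
Concretely: at constant temperature and pressure the folded state minimizes the Gibbs free energy G = H - T*S, not the potential energy. The hydrophobic effect that drives folding is largely the -T*S term from water ordering, which a pure potential-energy minimization never sees.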

> The simulations are classical, but fundamentally they rely on the shape of the electron clouds.

This is not what is meant by "classical".



