Intelligence as an instrument of discovery

Most of the problems we care about in science have an awkward shape. The governing principle is known (Anfinsen's hypothesis says a protein's sequence determines its fold, quantum mechanics determines a material's energy, the Navier–Stokes equations govern tomorrow's weather), yet evaluating it at the scales that matter is prohibitively expensive. Direct simulation pays the full price of the physics at every point in space and time, and for the questions worth asking that price is often unpayable. So the field that owns the problem learns to wait, and progress is rationed by the cost of computing the answer one configuration at a time.

RMH Deeplink's AI-for-science program is built on the observation that there is almost always a second copy of the same law lying around, recorded implicitly in data. Evolution has already run billions of folding experiments; quantum chemistry databases hold millions of solved configurations; decades of reanalysis encode the atmosphere's dynamics. A model trained on that data can become a fast, differentiable stand-in for the expensive map — and once you have such a stand-in, the science changes character.

The same pattern, again and again

10⁶⁰: plausible drug-like molecules, a search space no laboratory can scan but a learned surrogate can rank

10×: expansion of the catalog of predicted stable inorganic crystals beyond the experimentally known set

Seconds: what once took supercomputer-hours, on problems from folding to forecasting, now runs in seconds on an accelerator

Why a learned model helps

A surrogate is not just a faster simulator. Once the forward map from cause to effect lives inside a differentiable network, three things become possible that brute-force computation never offered.

Acceleration: Computations that consumed supercomputer-hours collapse to accelerator-seconds. The change is not only in cost but in kind: you can ask questions whose answers require a million evaluations, because a million evaluations are now cheap.
Inversion: Because the model is differentiable, gradients with respect to its inputs let you search backward through the map: from a desired property to a candidate molecule, from a target structure to a sequence that folds to it, from a goal to a design. Prediction becomes generation.
Discovery: The representation a model learns, when interrogated, can reveal structure no one put in by hand: regularities and couplings that suggest hypotheses a scientist can then go and test.
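
To make the inversion idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a description of any production system: the "surrogate" is a hypothetical two-input toy function with fixed weights, and its gradient is written by hand where a real pipeline would use an automatic-differentiation framework. The names `surrogate`, `surrogate_grad` and `invert` are invented for this example.

```python
import math

# Hypothetical stand-in for a trained surrogate: fixed weights, two inputs, one output.
W = [0.8, -0.5]
B = 0.1

def surrogate(x):
    """Toy forward map from a 2-D design vector to a scalar property."""
    return math.tanh(W[0] * x[0] + W[1] * x[1] + B)

def surrogate_grad(x):
    """Analytic gradient of the surrogate output with respect to its input."""
    y = surrogate(x)
    return [(1.0 - y * y) * w for w in W]

def invert(target, x0, lr=0.5, steps=500):
    """Gradient descent on the input to minimize (surrogate(x) - target)**2."""
    x = list(x0)
    for _ in range(steps):
        err = surrogate(x) - target
        grad = surrogate_grad(x)
        # Chain rule: d/dx (f(x) - t)^2 = 2 * (f(x) - t) * df/dx
        x = [xi - lr * 2.0 * err * gi for xi, gi in zip(x, grad)]
    return x

design = invert(target=0.6, x0=[0.0, 0.0])
print("design:", design, "predicted property:", surrogate(design))
```

The forward model is never retrained here; only the input moves. That is the sense in which a predictor can double as a generator, with the caveat that the recovered design is only as trustworthy as the surrogate is in that region of input space.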

The leverage in every case comes from exploiting structure that brute-force simulation ignores: symmetry, locality, sparsity, compositionality. When the architecture encodes the symmetries the physics already obeys and learns only the residual, a problem that was intractable can become tractable. An equivariant network that knows energy is invariant to rotating a crystal, or that a protein's geometry is relative rather than absolute, does not waste data and capacity rediscovering those facts. That single principle recurs from molecular biology to plasma control.
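
A small sketch of the invariance point, assuming a toy pairwise energy of Lennard-Jones form in reduced units (an illustrative choice, not a model named in this article). Because the energy is computed only from interatomic distances, rotating or translating the whole configuration cannot change it, so nothing about that symmetry has to be learned from data.

```python
import math
from itertools import combinations

def energy(coords):
    """Toy pairwise energy that depends only on interatomic distances."""
    total = 0.0
    for a, b in combinations(coords, 2):
        r = math.dist(a, b)
        total += 4.0 * (r ** -12 - r ** -6)
    return total

def rotate_z(coords, theta):
    """Rotate every point about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def translate(coords, shift):
    """Shift every point by the same vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]

# Hypothetical four-atom configuration, chosen only for illustration.
atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.3), (0.9, 1.0, 1.1)]

e_original = energy(atoms)
e_moved = energy(translate(rotate_z(atoms, 0.7), (3.0, -2.0, 5.0)))
print(e_original, e_moved)  # equal up to floating-point rounding
```

Equivariant architectures generalize this idea from hand-built invariant features to learned layers, but the payoff is the same: the symmetry is guaranteed by construction instead of approximated from examples.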

Choosing targets where the answer is checkable

Pointing this instrument well is a discipline of its own. The problems where surrogates succeed share a profile: a vast configuration space, a forward map that is expensive but well-defined, data that samples that map richly even if imperfectly, and exploitable structure an architecture can encode. Just as decisive is the verification layer. A prediction is a hypothesis, and an instrument earns trust only when its output can be checked against ground truth — a synthesized crystal, a structure resolved by cryo-EM, a proof a kernel accepts, a forecast scored against tomorrow's weather, a controller run on a real tokamak. We design every program around a tight loop between in-silico proposal and real-world validation, and report results in the units the domain respects: hit-rates, confirmed structures, theorems proved.

Figure: abstract visualization of a learned model mapping a vast configuration space

An instrument that cannot be calibrated against reality is not an instrument. It is a guess.

Calibration, uncertainty, and the integrity of the science

As learned models take on more of the scientific loop (proposing the experiment, reading the result, suggesting the next move), the integrity of the science comes to rest on honest uncertainty. A surrogate that is confidently wrong outside its training distribution does not merely waste laboratory time; it actively misleads. This is why a calibrated confidence output is not cosmetic. It is what lets a structure prediction function as evidence rather than an assertion: a biologist can trust the confidently predicted core of a fold, treat the low-confidence loops as possibly flexible or disordered rather than resolved, and plan experiments accordingly.
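
One common way to check whether a confidence output is calibrated is the expected calibration error: bin predictions by stated confidence and compare each bin's average confidence with its observed accuracy. The sketch below is a minimal version with equal-width bins; the function name and the toy numbers are assumptions for illustration, and real evaluations would use held-out experimental outcomes.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece

# Toy data: four predictions made at 0.95 confidence (three correct),
# four made at 0.55 confidence (two correct).
confidences = [0.95] * 4 + [0.55] * 4
correct = [1, 1, 1, 0, 1, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
print(ece)  # 0.5 * |0.95 - 0.75| + 0.5 * |0.55 - 0.50| = 0.125, up to rounding
```

A value near zero means the stated confidence can be read as a frequency of being right; a large value means the number is decoration.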

So we invest heavily in uncertainty quantification — ensembles, calibrated confidence, out-of-distribution detection — so that a model knows, and says, when it is extrapolating beyond what it has learned. Active learning then closes the loop in the other direction: where the model is most uncertain becomes the signal directing the next real experiment, so that costly empirical effort is spent precisely where it most reduces ignorance. Done well, the model's humility stops being a limitation and becomes the engine of discovery.
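
As a sketch of how ensemble disagreement can steer the next experiment, the example below fits a bootstrap ensemble of straight lines to noisy toy data and picks the candidate input where the members disagree most. Everything here is an illustrative assumption: the linear model, the synthetic data, the ensemble size and the candidate list stand in for a real surrogate and a real experimental queue.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def bootstrap_ensemble(xs, ys, n_members, rng):
    """Fit each member on a resample (with replacement) of the training data."""
    members = []
    n = len(xs)
    while len(members) < n_members:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: a line cannot be fit
        members.append(fit_line(bx, [ys[i] for i in idx]))
    return members

def predictive_std(members, x):
    """Spread of the ensemble's predictions at x, used as an uncertainty proxy."""
    return statistics.pstdev([a + b * x for a, b in members])

rng = random.Random(0)
train_x = [i / 19 for i in range(20)]  # training inputs cover [0, 1] only
train_y = [1.0 + 2.0 * x + rng.gauss(0.0, 0.1) for x in train_x]

members = bootstrap_ensemble(train_x, train_y, 25, rng)
candidates = [0.5, 2.0, 5.0]  # one in-distribution point, two extrapolations
stds = {x: predictive_std(members, x) for x in candidates}
next_experiment = max(candidates, key=stds.get)
print(stds, "-> run the experiment at x =", next_experiment)
```

The members agree where the training data is dense and diverge as the query moves away from it, so the same quantity serves both as an out-of-distribution warning and as the acquisition signal for active learning.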

Figure: a closed loop between in-silico proposal and real-world experimental validation. The instrument is only as good as the loop around it: every program is built so that in-silico proposals are validated in the wet lab, the kernel, or the real apparatus, and the result flows back to update the model.

The portfolio

The same engine, aimed at different grand challenges, produces a research program that spans much of modern science. The applications sit at very different stages of maturity — some already routine infrastructure, others early and unproven — but each follows the same arc: match human or classical capability, then exceed it on speed and scale, then unlock the inverse-design and discovery modes that have no classical analog, and finally become a tool the field reorganizes around.

Biology & medicine: Structure prediction for proteins, complexes, and their dynamics; de novo design as the inverse problem; variant-effect models and a path toward the virtual cell.
Materials & chemistry: Equivariant interatomic potentials at near-quantum accuracy, generative crystal proposal, retrosynthesis, and the closed-loop autonomous laboratory.
Mathematics: Machine-checkable proving grounds where contamination is controllable and the gap between plausible and correct is unforgiving.
Fusion: Learned plasma models and controllers that hold a confinement state on real tokamaks, where every prediction is validated against the apparatus itself.
Climate & the Earth system: Fast learned forecasters that turn ensemble weather prediction from a supercomputing task into a routine one.

What unites them is not a shared codebase but a shared shape of problem, and a conviction that the frontier itself is moving. The proteins we could not fold are folding; the materials we could not find are being found; the weather forecasts that once took hours of supercomputing now run in seconds.

Open questions

How far can the inverse-design mode be pushed before the surrogate's errors compound — and how do we detect that boundary before it costs a laboratory campaign?
What is the right way to certify a model's uncertainty so that out-of-distribution failure is reported rather than silently passed downstream?
As autonomous research loops design and run their own experiments, how do we keep verification scaling faster than generation, so trust never lags capability?
Which scientific domains have the structure — vast space, defined map, rich data, exploitable symmetry — that makes them the next good targets, and which only look that way?
When a learned instrument suggests a hypothesis no human proposed, what standard of evidence should govern acting on it?