33 Zeroes: Verseon Advances AI Accuracy for Novel Small Molecule Designs
CEO and founder Adityo Prakash explains the limits of current AI-based drug discovery and how new tech can unlock seemingly infinite chemical space
By Jonathan D. Grinstein, PhD September 12, 2024

In the early 1990s, Adityo Prakash was pursuing a PhD in mathematical physics with the goal of becoming an academic, but his plans were dashed when the Cold War ended. Funding for esoteric fields like mathematical physics suddenly dried up, and many people preparing to be the next generation of academic physicists and mathematicians left the field in search of new opportunities.

“The people I knew that should have stayed and become the replacements for the very best researchers out there were all leaving going to Wall Street doing quantitative modeling and building financial weapons of mass destruction,” Prakash told Inside Precision Medicine. “I would get calls from them all the time saying, ‘What the heck are you doing in graduate school? We have the perfect job for you.’ However, I have never had an interest in ‘quant’ modeling. That always seemed like you were trying to figure out fancy ways of taking money from grandmother’s purses.”

While moving apartments, Prakash was struck by the question of why television and the internet had yet to merge, letting people access content on demand in their living rooms. So he left academia and headed to Silicon Valley, where he founded his first company, Pulsent Corporation, and developed next-generation video compression and processing technology.

After selling Pulsent in 2002, Prakash became intrigued by the emerging convergence of technology and drug discovery. The move from video compression to biotech may look like a leap, but the two are linked by physics and math, which lead directly to the modeling of protein-drug interactions. The result was Verseon, whose path to success has been much slower and longer.

Verseon’s AI technology, VersAI, recently took a significant step, achieving benchmark test results critical to Prakash’s vision for small molecule discovery. The advance is vital to exploring a theoretical chemical space many orders of magnitude larger than today’s catalogs in search of novel small molecules with unique therapeutic properties.

“Ultimately, what matters is, are you coming up with completely novel drugs with uniquely desirable profiles that will actually move the needle in how we treat human disease?” said Prakash. “If you can deliver novel, interesting new drug candidates that work differently from what others have been able to find and are not just a tweak on something else, that matters.”

An ocean of possibilities

While the number of drug-like compounds in small molecule catalogs worldwide is around a quarter of a billion, Prakash said that most of these can be clustered into fewer than 10 million chemotypes, because most of the differences are just minor tweaks to the same set of chemical backbones.
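To make the clustering idea concrete, here is a minimal Python sketch, not Verseon’s pipeline: it assumes each compound has been reduced to a set of substructure “bits” (a toy stand-in for a real chemical fingerprint) and groups compounds with a greedy leader clustering on Tanimoto similarity. The fingerprints, the 0.6 threshold, and the function names are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fps: list[frozenset], threshold: float = 0.6) -> list[list[int]]:
    """Greedy leader clustering: each compound joins the first cluster whose
    leader is at least `threshold` similar; otherwise it founds a new cluster."""
    leaders: list[int] = []
    clusters: list[list[int]] = []
    for i, fp in enumerate(fps):
        for c, lead in enumerate(leaders):
            if tanimoto(fp, fps[lead]) >= threshold:
                clusters[c].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


# Toy catalog: three hypothetical "backbones", each decorated with one small tweak.
backbones = [frozenset(range(0, 10)), frozenset(range(20, 30)), frozenset(range(40, 50))]
catalog = [b | {100 + k} for b in backbones for k in range(5)]

clusters = leader_cluster(catalog, threshold=0.6)
print(len(catalog), "compounds ->", len(clusters), "chemotypes")
```

Because each toy compound is a shared backbone plus one small decoration, the 15 catalog entries collapse to three chemotypes. A real pipeline would compute proper fingerprints with a cheminformatics toolkit and tune the threshold.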

Prakash says the limit for possible compounds under the currently known rules of organic synthesis is a decillion, which is ten to the thirty-third power (1e33). Other estimates put so-called “drug-like” chemical space, often defined by rules of thumb such as Lipinski’s rule of five for oral bioavailability, at ten to the sixtieth power (1e60) molecules, and at somewhere between ten to the twentieth and ten to the twenty-fourth power (1e20 to 1e24) for all molecules of up to 30 atoms.

Compared with these estimates, a quarter of a billion small molecules is, as Prakash put it, “fishing in a tiny little droplet, not even in a tide pool by the side of an ocean.”
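The size of the gap Prakash describes is easy to quantify. The short Python sketch below simply divides the article’s catalog figure by each estimate of chemical space; the helper name `coverage` is ours, chosen for illustration.

```python
import math

CATALOG_SIZE = 2.5e8  # ~a quarter of a billion drug-like catalog compounds


def coverage(space_size: float, catalog_size: float = CATALOG_SIZE) -> tuple[float, float]:
    """Return (fraction of the space covered, orders of magnitude still missing)."""
    return catalog_size / space_size, math.log10(space_size / catalog_size)


for label, space in [("synthesizable (1e33)", 1e33), ("drug-like (1e60)", 1e60)]:
    fraction, gap = coverage(space)
    print(f"{label}: covers {fraction:.1e} of the space, {gap:.1f} orders of magnitude short")
```

Against the 1e33 synthesizable space, today’s catalogs cover roughly 2.5e-25 of it, about 24.6 orders of magnitude short; against the 1e60 drug-like estimate, the shortfall is about 51.6 orders of magnitude.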

Prakash thinks we’re stuck in a droplet-sized catalog of potential drug-like molecules partly because AI algorithms are often misunderstood and misused. For example, generative AI approaches, frequently seen as a panacea for creating never-before-seen molecules, must be trained on large numbers of examples similar to the problems they will attempt to solve. When the problem at hand is dissimilar to the training data, generative AI struggles. According to Prakash, it’s like asking a large language model such as ChatGPT, trained only on English, to start speaking French, let alone a language with an entirely different writing system, such as Chinese.

Prakash said, “AI requires a lot of data for training, and when you give it something similar, it knows what to predict. It’s good at interpolation but terrible at extrapolation. Ask AI trained on the only available experimental data to do something, and it’ll help you tweak existing molecules. Every one of these primarily AI-driven drug-discovery companies… is producing these little tweaks to known molecules because that’s how AI works. It’s no different from what medicinal chemists have done for the last five decades. Medicinal chemists at least put a new grill on the front of the car before calling it a new model. These [AI companies] put a new paint job on and call it a new [car].”
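Prakash’s point about interpolation versus extrapolation can be illustrated with a deliberately simple model. The Python sketch below is a toy k-nearest-neighbors regressor, not anything a drug-discovery company is known to use; it is trained on a hypothetical property y = x² sampled only for x between 0 and 5, then asked about one point inside that range and one far outside it.

```python
def true_property(x: float) -> float:
    """Hypothetical ground-truth property of a 'molecule' described by x."""
    return x * x


# Training data covers only x in [0, 5], like a model trained on known chemistry.
train_x = [i * 0.5 for i in range(11)]  # 0.0, 0.5, ..., 5.0
train_y = [true_property(x) for x in train_x]


def knn_predict(x: float, k: int = 2) -> float:
    """Predict by averaging the labels of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


inside_err = abs(knn_predict(2.25) - true_property(2.25))   # interpolation
outside_err = abs(knn_predict(10.0) - true_property(10.0))  # extrapolation
print(f"inside error {inside_err:.4f}, outside error {outside_err:.3f}")
```

Inside the training range the prediction is off by about 0.06; at x = 10 it is off by more than 77, because the model can only echo the nearest examples it has already seen.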

Chemical crabbing

At the most microscopic level, drug binding is a physics problem: atoms in a protein and a drug push against one another, causing the overall structure to flex and twist. Prakash believes that the starting point for drug discovery must be a 3D model of the target protein that is not static; all the different ways the protein can twist and flex need to be understood.

The dynamic model must then be tested for binding interactions with other molecules that cause the protein to adopt a desired conformation. Prakash said that getting accurate information about where binding occurs and how strong it is requires accounting for many factors. These include the fact that binding happens in water (rather than in an empty digital space) and that the bound molecules are not still; they vibrate. Each of these variables is complex on its own, let alone in combination.

“People throw their hands in the air and say, ‘It’s too complicated! It can’t be done,’” said Prakash. “But water is not just a continuous medium; it’s made of discrete H2O molecules that bind the nooks and crannies of proteins and determine where a drug can go. When drugs and proteins bind, they form hydrogen bonds, often bifurcated ones. These are highly complicated quantum mechanical phenomena that people have no idea how to model correctly.”
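The physics Prakash describes is what classical molecular-mechanics force fields approximate with pairwise terms. As a minimal sketch, and not Verseon’s model, the Python below scores a ligand-protein pose with the textbook 12-6 Lennard-Jones potential plus Coulomb electrostatics, using Lorentz-Berthelot combining rules; the atom tuple layout and any parameter values are illustrative assumptions. The `dielectric` argument (values near 80 are sometimes used for water) is exactly the kind of continuous-medium shortcut Prakash argues is inadequate, and nothing here captures quantum effects such as bifurcated hydrogen bonds.

```python
import math

# Coulomb constant in common molecular-mechanics units: kcal·Å/(mol·e²).
COULOMB_K = 332.06


def pair_energy(r, epsilon, sigma, q1, q2, dielectric=1.0):
    """Energy (kcal/mol) of two atoms r ångströms apart: a 12-6 Lennard-Jones
    term (steep repulsion at short range, weak attraction further out) plus
    Coulomb electrostatics screened by a uniform dielectric."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_K * q1 * q2 / (dielectric * r)
    return lennard_jones + coulomb


def interaction_energy(ligand, protein, dielectric=1.0):
    """Sum pair energies over every ligand-protein atom pair.
    Each atom is a hypothetical tuple (x, y, z, epsilon, sigma, charge);
    mixed parameters use Lorentz-Berthelot combining rules."""
    total = 0.0
    for x1, y1, z1, eps1, sig1, q1 in ligand:
        for x2, y2, z2, eps2, sig2, q2 in protein:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            eps = math.sqrt(eps1 * eps2)   # geometric mean (Berthelot)
            sig = 0.5 * (sig1 + sig2)      # arithmetic mean (Lorentz)
            total += pair_energy(r, eps, sig, q1, q2, dielectric)
    return total
```

The Lennard-Jones term reaches its minimum of minus epsilon at r = 2^(1/6)·sigma, the distance at which two atoms stop pushing apart and start weakly attracting.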

Once the model’s parameters have been established, the screening of a decillion compounds can begin. But how? Going through the catalog one compound after another is out of the question. The best strategy for mining candidate molecules resembles catching king crabs in Arctic waters: set out a bunch of traps, see which ones catch the most, move all the traps to those hotspots, and repeat.
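Exhaustive enumeration really is out of reach: even at a hypothetical rate of a billion compounds scored per second, 1e33 compounds would take 1e24 seconds, or roughly 3e16 years. The crab-trap strategy instead resembles population-based searches such as evolution strategies. The Python sketch below is a toy version under stated assumptions, not Verseon’s algorithm: a two-dimensional coordinate stands in for a region of chemical space, `catch` is a hypothetical scoring function with a single hotspot, and each round keeps the best-catching traps and redeploys all traps around them with a shrinking spread.

```python
import random


def catch(trap: tuple[float, float]) -> float:
    """Hypothetical 'haul' at a trap location (higher is better).
    A single hotspot at (3, -2) stands in for a region of strong binders."""
    x, y = trap
    return -((x - 3.0) ** 2 + (y + 2.0) ** 2)


def crab_search(n_traps: int = 50, keep: int = 10, rounds: int = 20,
                spread: float = 2.0, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)
    # Round 0: scatter traps widely across the space.
    traps = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(n_traps)]
    best = max(traps, key=catch)
    for _ in range(rounds):
        # Keep the traps that caught the most...
        hotspots = sorted(traps, key=catch, reverse=True)[:keep]
        best = max([best, hotspots[0]], key=catch)
        # ...then move every trap near a hotspot and narrow the search.
        traps = [(hx + rng.gauss(0, spread), hy + rng.gauss(0, spread))
                 for hx, hy in (rng.choice(hotspots) for _ in range(n_traps))]
        spread *= 0.7
    return max([best] + traps, key=catch)


best = crab_search()
print(best, catch(best))
```

Keeping the single best trap across rounds means the answer never gets worse, and shrinking `spread` trades broad exploration early for precision late.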

This process cannot be completed entirely computationally. Candidates must eventually be tested in cells, where many other variables can affect the ideal candidate, including side effects in the form of unintended phenotypes caused by nonspecific binding. These variables are determined on the lab bench rather than by adding more and more to the modeling. So the strategy is to take several hundred compounds, test them in the lab, and select the best binders, which then serve as starting points for the next modeling round.
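The lab-in-the-loop cycle can be sketched in a few lines of Python. Everything here is hypothetical: compounds are single numbers, `model_score` is a noisy in-silico estimate that cannot see off-target effects, and `lab_assay` subtracts a penalty above an assumed `OFF_TARGET_START` to mimic nonspecific binding that only shows up in cells. Each round ranks a large virtual pool, “tests” the top few hundred, and carries the best measured binders into the next round.

```python
import random

rng = random.Random(7)
OFF_TARGET_START = 5.0  # hypothetical: above this value, compounds hit an off-target


def on_target(c: float) -> float:
    """Hypothetical true binding strength of compound c (peaks at c = 5)."""
    return -(c - 5.0) ** 2


def model_score(c: float) -> float:
    """Hypothetical in-silico estimate: noisy and blind to off-target effects."""
    return on_target(c) + rng.gauss(0, 1.0)


def lab_assay(c: float) -> float:
    """Hypothetical cell assay: on-target binding minus an off-target
    penalty, plus small measurement noise."""
    penalty = 3.0 if c > OFF_TARGET_START else 0.0
    return on_target(c) - penalty + rng.gauss(0, 0.1)


def design_round(seeds, pool_size=2000, batch=300, keep=10):
    # 1. Modeling: propose variants around the seeds and rank them in silico.
    pool = [rng.choice(seeds) + rng.gauss(0, 1.0) for _ in range(pool_size)]
    ranked = sorted(((model_score(c), c) for c in pool), reverse=True)
    to_test = [c for _, c in ranked[:batch]]
    # 2. Lab: measure the few hundred top-ranked compounds.
    measured = sorted(((lab_assay(c), c) for c in to_test), reverse=True)
    # 3. The best measured binders seed the next modeling round.
    return [c for _, c in measured[:keep]]


seeds = [0.0]
for _ in range(6):
    seeds = design_round(seeds)
print([round(s, 2) for s in seeds])
```

Left to itself, the model favors compounds on both sides of its optimum at 5.0; the assay’s hidden penalty is what steers the surviving seeds to just below that value, the kind of correction the bench provides.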

The realm of small data

It’s one thing to be able to chart out a theoretical path to making huge numbers of unique and useful small molecules; it’s another matter entirely to put together the computational framework to handle everything that goes into being able to model how countless compounds may stack together to change a dynamic protein’s state into a desirable conformation.

Most AI systems rely on massive amounts of dense data to make accurate predictions. In the life sciences, however, particularly in drug development and clinical trials, data are sparse relative to the number of variables, or features, that an AI model must track. Ironically, in these situations traditional “big data” AI systems struggle to produce accurate results, if they can build a predictive model from such limited data at all. New AI algorithms need to be designed to solve this problem of “small data.”
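One reason “big data” methods stumble here is that a model with as many adjustable parameters as data points can fit anything, including noise. The Python sketch below illustrates that failure mode and is not Edammo’s or Verseon’s method: it fits an ordinary linear model to 20 hypothetical compounds described by 20 random features with purely random labels, solving the system exactly by Gaussian elimination.

```python
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a·w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back-substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][k] * w[k] for k in range(r + 1, n))) / m[r][r]
    return w


def predict(x: list[float], w: list[float]) -> float:
    return sum(xi * wi for xi, wi in zip(x, w))


rng = random.Random(1)
n = 20  # hypothetical: 20 measured compounds, 20 descriptors each
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]  # labels are pure noise

w = solve(X, y)
train_error = max(abs(predict(x, w) - t) for x, t in zip(X, y))

# Fresh compounds with equally random labels: the "perfect" fit has nothing to offer.
X_new = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(200)]
y_new = [rng.gauss(0, 1) for _ in range(200)]
test_error = sum(abs(predict(x, w) - t) for x, t in zip(X_new, y_new)) / len(y_new)

print(f"train max error {train_error:.1e}, test mean error {test_error:.2f}")
```

The training error is essentially zero even though the labels carry no signal, while errors on fresh compounds stay large: on small data, a perfect fit says little about predictive power.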

Verseon acquired Edammo in late 2022 to develop specialized AI tools for the “small data” problem in-house. Edammo’s technology performs exceptionally well with small data, bringing the efficiency needed to make the entire drug discovery framework work. The result is a new AI technology, VersAI, which lowers AI prediction error rates by up to 35% compared with cutting-edge deep learning frameworks such as Google AutoML.
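“Up to 35%” describes a relative reduction in error, not a drop of 35 percentage points. The arithmetic, with a hypothetical baseline chosen only for illustration and not taken from Verseon’s benchmarks, looks like this in Python:

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fractional drop in error relative to the baseline (0.35 means 35% lower)."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers for illustration only, not Verseon's benchmark data.
baseline = 0.20                    # assumed baseline error rate of 20%
improved = baseline * (1 - 0.35)   # a 35% relative reduction
print(f"{improved:.2f}")           # 0.13, i.e. 7 percentage points lower
print(f"{relative_reduction(baseline, improved):.2f}")
```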

With all of that, Prakash’s vision only grows larger, fueling the notion that this approach could drive personalized small molecule development. Without a doubt, the applications could be revolutionary, but they are meaningless unless there is proof that Verseon’s platform actually works.

Next-gen anticoagulants and more

Verseon is using its platform to develop first-in-class novel drugs for several indications, including diabetic retinopathy, hereditary angioedema, fatty liver disease, and some cancers.

Verseon’s platform has shown particular promise in anticoagulation, where the company has used it to develop a new class of thrombin inhibitors that could provide safer treatment options for many of the world’s 400 million cardiovascular disease patients. Unlike existing anticoagulants, Verseon’s Precision Oral Anticoagulants (PROACs) have a distinctive mechanism of action that mitigates the risk of hemorrhage while effectively inhibiting the formation of dangerous blood clots associated with heart attacks and strokes.

Maybe Prakash is really onto something revolutionary in drug development, just as he was in video processing, where his technology is now core to applications like Netflix and Zoom calls and makes Intel billions of dollars. But it is too early for Prakash to suggest that Verseon is superior to its competitors: no AI-generated drug has yet made it past Phase II, let alone won FDA approval, and Verseon hasn’t gotten there either, with its furthest programs in Phase I.

Suppose Prakash is right and can figure out how to properly exploit a catalog of “one followed by 33 zeroes” synthesizable organic compounds. In that case, and if its programs succeed, Verseon could generate trillions of dollars per year, as many of them target markets on the order of hundreds of billions of dollars. If he’s wrong, he will at least have spent most of his career trying to make his vision for the future of drug development a reality instead of selling out and disappearing into the world of financial ‘quant’ modeling to make money out of thin air.