High Throughput Quantum Chemistry for Drug Discovery – Towards Reaction Screening
In drug discovery, there can be a world of difference between a computer-generated hit compound that is predicted to bind well to a drug target and a compound that can be reliably synthesized at scale, or indeed synthesized at all. This discrepancy has been a lingering point of discord between the Discovery and R&D efforts in the chemical industry. Computer-aided drug design (CADD) has become an increasingly valuable tool by providing essential screening data and unique insight into drug action and mechanism, but it does not model the more complex world of chemical reactivity and synthetic chemistry.
Synthetic chemistry and computational chemistry traditionally overlap at the interface of physical organic chemistry (kinetics and mechanistic rationale) and reaction design (often catalysis), where quantum mechanics is leveraged to provide quantitative predictions. There is a growing need for more quantum computations of reactions to control for this ‘obtainability’ factor, and this need extends to the optimization process as well: not just for R&D-scale reactions but all the way through to production. Synthetic and process chemistry efforts carried out by medicinal chemists and chemical engineers carry increasing risk and cost, yet they lack a computational, CADD-like counterpart that would deliver the benefits that modelling is known to provide at the beginning of the process.
In principle, theorists can now inform synthetic chemists how to synthesize that ‘impossible’ molecule, but only at a considerable cost in time and computing resources. Thus, in practice, the vast majority of computational work is performed on molecular structure and docking using molecular mechanics running on personal computers. This leaves a technological gap between what could be a great drug and which lead compounds can actually be made: the analysis of the reaction transformations necessary to reach these lead structures is left largely to literature searching or to higher-throughput data-driven methods such as cheminformatics and, more recently, machine learning.
State of Computation in Industry
Following increasing trends in digitization and computation, the chemical industry is preparing for an increasingly large user base of chemical researchers and practitioners engaging with computations. There are over 1 million potential individual users working across thousands of innovation-driven companies. Roughly USD 19 billion p.a. will be spent on computation by 2020 within chemical R&D compared to about USD 9.5 billion currently (http://www.cefic.org/Facts-and-Figures/).
As computation becomes more ubiquitous, it will become an expected part of the chemical workflow. This technology is becoming more central to the process of drug design because it provides metrics at the point in the research and development cycle when ideas are transformed into real-world lead compounds.
Computation is still a niche market. The software sales market for traditional software packages in chemistry is at least 150 M p.a. (https://zenodo.org/record/260408/files/Scientific%20software%20industry.pdf). This market is dominated by a number of high-end suites of programs. Accelrys and Schroedinger Inc. hold a large market share for general computational chemistry environments. In 2013, Accelrys had a revenue of ca. 170 M USD, a growth of 3.7%, and employed over 750 people with over 2000 clients. In 2008, Accelrys was valued at 100 M USD. The company was later sold to Dassault Systèmes under the new label Biovia for 750 M USD in 2012. The market is clearly growing (5% per year). The purchase price of these software packages for industrial users ranges from USD 2k to 50k. In the quantum area, Schroedinger is the most innovative and has recently been moving into the cloud, with early elements of automation as well.
However, there remains a large usage gap, where quantum chemistry continues to be regarded as a primarily academic tool. Indeed, the vast majority of the market is dominated by molecular mechanics force-field-based approaches that focus on molecular structure. This is unfortunate, as the dichotomy we are discussing here limits early access to chemical information, with a knock-on effect on the quality of the chemical IP generated and on a company’s ability to protect that IP through patents. With quantum chemistry, the what and the how can be addressed earlier and quantitatively, thus ensuring greater intellectual property protection and more efficient product development. The barriers to adopting quantum methods are their poor ease of use and difficult interpretation, as well as the steep infrastructure and software hurdles involved in executing computations on large-scale distributed computing clusters.
Reactive Fragment-based Lead Discovery
As the technological gap between structure and reactivity has not been adequately addressed to date, discovery researchers have found other practical solutions. A relatively recent approach to the problem of ‘obtainability’ is called reactive fragment-based lead discovery (RFBLD). Building on a typical fragment-based approach, particular attention is directed towards fragments that can form known synthons for reactive assembly into larger molecular structures. The active functional groups undergo highly selective and/or high-yielding reactions with complementary fragments, generating a library of accessible drug targets with the synthetic route encoded into their reactivity. Ideally, the fragments themselves are commercially available, or available through robust and well-characterized short synthetic pathways. Lead molecules assembled from reactive fragments have a much higher likelihood of surviving the scrutiny of the R&D and process scale-up efforts, and are thus more likely to be preparable in practice. This will lead to earlier toxicity testing and eventually to a higher rate of clinical trial entry. See the offering of Bioblocks Inc. for an example of a CRO working in this space, or Enamine, which is operating in the robotics area on such reactive fragment-based libraries.
While this RFBLD approach reduces the risk of designing molecules that are impossible to synthesize, legacy modelling software is unable to accurately assess reactivity. The subtleties in reactivity that are essential for prioritizing the hit lists are therefore not available, and the technological gap remains. This is because molecular mechanics force field approaches are non-physical and are not conceptually adequate to treat the complex electronics involved in reactive intermediates and, especially, transition states, where the concept of a ‘bond’ is no longer easily applied. Thus, the generated hit libraries, which have been made with their reactivity in mind, cannot be further assessed for reactivity from a computational perspective, leaving only cheminformatics approaches to provide qualitative suggestions rather than the highly prized quantitative answers.
Efficacy of a Chemical Informational Approach
Cheminformatics has been applied vigorously to chemical reaction forecasting, a domain pioneered by E.J. Corey, for decades. Reaction forecasting has primarily employed a rules-based approach to infer how molecules may react based on their molecular connectivity, relying exclusively on prior experience (rules) of known reactions or transforms, often supplemented with experimental knowledge of known reactions. This data is commonly accessed through large databases of chemical literature such as Scifinder and Reaxys. With such experimental data, big data has been used to great effect by employing structural matching (e.g. SMARTS patterns) to match specific queries on new molecules or reactions to known examples and so forecast likely outcomes. The program LHASA (https://www.lhasalimited.org/) and the SAVI (https://cactus.nci.nih.gov/download/savi_download/) project employing CACTVS are thus currently able to provide key predictions for synthetic route planning, the first focusing on in vivo transformations and toxicity and the latter on the generation of new virtual molecules using robust transforms from commercially available molecules. Other players include ChemAxon, Infochem and OpenEye Software. As these solutions use reported chemical data and do not require computation, they are able to deliver results very cheaply. However, their inaccuracy is a severe limitation.
If you are a chemist and have tried a reaction, you already know that structural similarity based on literature searching is merely prescriptive: it suggests what to try, not whether it will work. The quantitative scoring of whether these ‘possible’ reactions will actually happen has proven to be a very difficult challenge, even when leveraging big data and machine learning techniques. Such techniques are being developed by Wiley, which offers its own services to this end (Chemanager (http://www.chemanager-online.com/en)), and by MilliporeSigma, which is developing Chematica (http://chematica.net/#/). Approaches that use experimental information as the base data type require increasingly complex rules to discern the differing reactivity between structurally similar molecules. Small changes in molecular structure have dramatic, and often unforeseen, consequences on reactivity, and as such reaction forecasting is limited to non-quantitative suggestions. More recent advances in rules-based machine learning treat more fundamental molecular features (including atomic charge, resonance etc.) to predict reactions (https://arxiv.org/abs/1608.06296). These methods do about as well as a first-year organic chemistry student. Reaction forecasting is a 50-year-old challenge yet to be solved by a cheminformatics and/or rules-based approach.
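The rules-based idea described above can be illustrated with a deliberately tiny sketch. This is not how LHASA, SAVI or any commercial package is implemented; the rule table, the functional-group labels and the function name `forecast` are all hypothetical and chosen only for illustration. The point is that a lookup of known transforms can say which reactions are conceivable, but it gives no quantitative score and returns nothing for chemistry outside its rule set.

```python
# Toy rules-based reaction forecasting (illustrative assumption, not a real tool).
# Each rule maps an unordered pair of functional-group labels to a named,
# well-known transform. The labels are hypothetical plain strings.
RULES = {
    frozenset({"amine", "carboxylic acid"}): "amide coupling",
    frozenset({"aryl halide", "boronic acid"}): "Suzuki coupling",
    frozenset({"azide", "alkyne"}): "azide-alkyne cycloaddition",
}


def forecast(groups_a, groups_b):
    """Return the sorted names of known transforms between two fragments.

    groups_a and groups_b are iterables of functional-group labels carried
    by each fragment. Only pairs present in RULES produce a suggestion.
    """
    hits = set()
    for group_a in groups_a:
        for group_b in groups_b:
            name = RULES.get(frozenset({group_a, group_b}))
            if name is not None:
                hits.add(name)
    return sorted(hits)


# Usage: a fragment bearing an amine and an alkyne, paired with an acid.
suggestions = forecast(["amine", "alkyne"], ["carboxylic acid"])
# A pair with no matching rule yields no suggestion at all.
unknown = forecast(["ketone"], ["thiol"])
```

Note that the output is a list of names only: nothing in the lookup distinguishes a fast, clean coupling from one that fails for a closely related substrate.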
Finally, consider the absolute size of molecular chemical space, which has been estimated at about 10^60 possible organic compounds (equivalent to the number of atoms in the Milky Way). This estimate is bounded by an upper limit in molecular weight of 500 amu and covers all possible combinations of small to medium-sized organic molecules. Any purely experimental, data-driven approach will be severely limited and myopically focused on the ca. 100 million molecules that have been reported (10^8 compared to 10^60 possible), which is essentially nothing.
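To make the scale of this gap concrete, the arithmetic can be written out as a short sketch. The two figures are the estimates quoted above; the helper name `coverage` is a hypothetical name introduced only for this example.

```python
from fractions import Fraction

# Estimates quoted in the text: about 10^8 reported compounds versus
# about 10^60 possible organic compounds up to 500 amu.
REPORTED = 10**8
POSSIBLE = 10**60


def coverage(reported, possible):
    """Exact fraction of chemical space covered by reported compounds."""
    return Fraction(reported, possible)


fraction_covered = coverage(REPORTED, POSSIBLE)

# Number of orders of magnitude separating reported from possible compounds,
# read off the power of ten in the reduced denominator.
orders_of_magnitude_gap = len(str(fraction_covered.denominator)) - 1
```

Under these estimates the reported compounds cover one part in 10^52 of the space, which is why an approach limited to known data cannot sample it meaningfully.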
Automatic Ultrafast Quantum Computation
There is a need for a high-throughput, quantum-mechanics-based reaction screening tool that can quickly and systematically measure ‘reactivity’, as defined by the thermodynamics and kinetics of an elementary step or series of steps, such as that occurring between reactive fragments in the RFBLD approach. Our developing solution moves beyond traditional rules-based and literature-searching methods for planning synthetic chemistry strategies, progressing from the use of cheminformatics reaction data (reaction forecasting) to the use of computation and big data to Now-Cast chemical reactivity. Now-casting uses quantum mechanics calculations to address the exact research question the chemist is asking at the moment they are asking it. Reaction forecasting relies on identifying molecules or reactions ‘similar’ to the specific query and on an often erroneous equating of their reactivities. Our Now‑Castor does not make such an assumption. Since our approach treats molecules from a structural/energetic perspective, no a priori knowledge is required, and all of chemistry, including completely novel reactions and molecules, can be considered for efficacy or suitability without the need for chemical rules or prior knowledge. The needed chemical structures are generated on the fly by fast quantum methods. The speed and generality of this approach allow the systematic exploration of complex systems of chemical reactions directly from standard 2D chemical syntax with reliable energies.
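As a rough illustration of what measuring ‘reactivity’ through the thermodynamics and kinetics of an elementary step can mean in practice, the sketch below converts free energies into a rate constant (via the standard Eyring equation of transition state theory) and an equilibrium constant, then ranks candidate reactions. It is a minimal sketch under stated assumptions, not ChemAlive's implementation: the function names, the candidate names and the energy values are hypothetical placeholders, and in a real workflow the free energies would come from quantum chemical calculations.

```python
import math

# Physical constants in SI units.
K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618       # molar gas constant, J/(mol*K)
KCAL_TO_J = 4184.0    # joules per kilocalorie


def eyring_rate_constant(dg_act_kcal, temperature=298.15):
    """Rate constant (1/s) of an elementary step from its activation
    free energy in kcal/mol, using k = (kB*T/h) * exp(-dG_act / (R*T))."""
    exponent = -dg_act_kcal * KCAL_TO_J / (R * temperature)
    return (K_B * temperature / H) * math.exp(exponent)


def equilibrium_constant(dg_rxn_kcal, temperature=298.15):
    """Equilibrium constant from the reaction free energy in kcal/mol,
    using K = exp(-dG_rxn / (R*T))."""
    return math.exp(-dg_rxn_kcal * KCAL_TO_J / (R * temperature))


def rank_reactions(candidates, temperature=298.15):
    """Return candidate names ordered from fastest to slowest step.

    candidates maps a name to a (dG_act, dG_rxn) tuple in kcal/mol.
    """
    return sorted(
        candidates,
        key=lambda name: eyring_rate_constant(candidates[name][0], temperature),
        reverse=True,
    )


# Hypothetical free energies (kcal/mol) for three fragment couplings.
candidates = {
    "coupling_A": (18.0, -12.0),
    "coupling_B": (24.5, -20.0),
    "coupling_C": (21.0, -3.0),
}
ranking = rank_reactions(candidates)
```

Because the rate depends exponentially on the barrier, a difference of about 1.4 kcal/mol at room temperature corresponds to a tenfold change in rate, which is why small structural changes can alter reactivity so strongly and why quantitative energies matter for prioritizing a hit list.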
ChemAlive deployed the first fully cloud-based quantum chemistry interface in 2017 and is now working on the development of its reaction spawning and reaction screening technology. Many of the technological hurdles to achieve the reaction screening service have already been addressed with our cloud-based approach to conformational analysis available for demo here: