DoE and other enabling technologies

How do Design of Experiments (DoE), High Throughput Experimentation (HTE), Bayesian optimisation and self-optimising reactions compare, and is there a best option for process development?

The role of the synthetic chemist has changed dramatically over the last couple of decades with the incorporation of many advanced techniques, particularly for those working in industry. However, when new technologies become available, it can be hard to know which will add value to the chemist’s toolkit now and which are better left to develop further and become easier to implement. With ever more complicated, highly functionalised target compounds to synthesise at minimal cost, waste and time, chemists are constantly looking to improve their efficiency while gaining more understanding of their process. In our roles as consultants and trainers in process development and experimental design, we often encounter questions about high throughput experimentation, Bayesian optimisation and self-optimising reactions. This article aims to provide a very brief introduction to each technique, alongside experimental design, so that their benefits and weaknesses can be compared quickly.

In High Throughput Experimentation (HTE), multi-well plates are employed to carry out experiments in parallel. HTE can be used with different goals in mind, but here we consider HTE for reaction optimisation. An initial hit is taken, and different conditions, substrates and solvents are screened and evaluated in parallel arrays. The most common analytical method used with HTE is HPLC, but other automated chromatographic methods can also be useful. Using in-line GC or HPLC can make the experimentation fast, but if the reaction requires work-up and isolation it can be laborious. Because the number of possible reagent combinations is effectively endless, some rationale must be applied in choosing reagents. This can be done using physical organic parameters (e.g. pKa, dielectric constant), but there is no way to relate the factors to each other or to see interactions between the factors and the reaction output. In short, a large area of chemical space can be covered, but the instrumentation can be prone to inaccuracy and the information gained mainly directs you to an area of interest where much further study is required.
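
To give a feel for how a screening array maps onto a plate, here is a minimal sketch in Python. It assumes a standard 96-well plate (rows A to H, columns 1 to 12), and the catalyst, base and solvent names are placeholders invented for illustration, not recommendations.

```python
from itertools import product
from string import ascii_uppercase

# Placeholder reagent names (hypothetical, for illustration only).
catalysts = ["cat_1", "cat_2", "cat_3", "cat_4"]
bases = ["base_1", "base_2", "base_3", "base_4"]
solvents = ["solv_1", "solv_2", "solv_3", "solv_4", "solv_5", "solv_6"]

ROWS, COLS = 8, 12  # assumed 96-well plate


def plate_layout(catalysts, bases, solvents):
    """Assign every catalyst/base/solvent combination to one well."""
    combos = list(product(catalysts, bases, solvents))
    if len(combos) > ROWS * COLS:
        raise ValueError("More combinations than wells on one plate")
    layout = {}
    for index, combo in enumerate(combos):
        row = ascii_uppercase[index // COLS]  # A, B, C, ...
        col = index % COLS + 1                # 1 to 12
        layout[f"{row}{col}"] = combo
    return layout


layout = plate_layout(catalysts, bases, solvents)
print(len(layout), "wells filled")
print("A1 :", layout["A1"])
print("H12:", layout["H12"])
```

Every combination is run once, which is why the plate fills so quickly: 4 x 4 x 6 choices already use all 96 wells.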

Experimental design (Design of Experiments, DoE) is a systematic approach to experimentation which should cover a wide range of chemical space. It uses statistical methods to extract relationships between factors and responses and provides information about the way the total system works. However, it only investigates what you choose: a missing factor is missing information, so thorough factor selection should be carried out before choosing a design. The experiments are carried out as one package of work, and all the results are then analysed together. This is what enables the relationships to be extracted, but the downside is that if the design is run poorly your analysis will be less reliable. Using sequential experimental design (a maximum of 3 designs) can take you from the start of a project to an optimised process in far shorter timescales. When DoE is combined with Principal Component Analysis (PCA) to study the effect of solvents and ligands, highly powerful designs can be created. These explore vast areas of chemical space without relying on the intuition of the chemist, and can reveal novel and unexpected chemistry by accessing different chemical pathways and mechanisms.
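
As an illustration of how a design lets you extract relationships between factors and responses, the sketch below builds a two-level full factorial design for three factors and estimates the main effects and one interaction. The factor names and the `simulated_yield` function are assumptions made up for this example; in practice the yields would come from running the eight experiments.

```python
from itertools import product

factors = ["temperature", "concentration", "time"]  # hypothetical factors


def simulated_yield(temperature, concentration, time):
    """Made-up response in coded units (-1 = low, +1 = high).

    Stands in for real experimental results; 'time' deliberately has
    no effect, and temperature and concentration interact.
    """
    return 60 + 5 * temperature + 3 * concentration + 4 * temperature * concentration


# Full factorial: every combination of low/high for the three factors.
design = list(product([-1, 1], repeat=len(factors)))
yields = [simulated_yield(*run) for run in design]


def effect(column):
    """Mean response at the +1 level minus mean response at the -1 level."""
    half = len(column) / 2
    return sum(level * y for level, y in zip(column, yields)) / half


main_effects = {
    name: effect([run[i] for run in design]) for i, name in enumerate(factors)
}
# Interaction column = product of the two factor columns.
interaction_tc = effect([run[0] * run[1] for run in design])

for name, value in main_effects.items():
    print(f"{name:>14}: {value:+.1f}")
print(f"{'temp x conc':>14}: {interaction_tc:+.1f}")
```

Because all eight runs are analysed together, each effect is estimated from the whole data set, and the temperature/concentration interaction is visible even though no single pair of runs would reveal it.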

Bayesian Optimisation (BO) starts with results from an initial reaction set, e.g. a DoE or a database, and creates a probabilistic surrogate model with the aim of finding the highest reaction output in the remaining unknown space. New experiments are proposed based on a balance between exploring new space and exploiting the most successful previous experiments; these are carried out and the results are used to update the model. The process continues iteratively until a maximum is achieved, resources are depleted or the space has been explored to the point where finding improved conditions is improbable. BO has been shown to be more accurate at finding optima than human interpretation, and you can get to a result faster than without it, but it is still an iterative process and can take time. Additionally, the models currently work best for continuous variables (as is so often the case), though there are work-arounds for discrete inputs.
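
The loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production implementation: there is one continuous variable scaled to 0 to 1, the surrogate is a small Gaussian process with fixed, hand-picked hyperparameters, new experiments are chosen with an upper confidence bound rule, and `run_experiment` is a made-up function standing in for a real reaction.

```python
import math

AMPLITUDE = 0.25     # assumed prior variance of the yield (as a fraction)
LENGTH_SCALE = 0.15  # assumed smoothness of the response
NOISE = 1e-4         # assumed measurement noise variance


def run_experiment(x):
    """Hypothetical reaction: yield fraction as a function of scaled x."""
    return 0.9 * math.exp(-((x - 0.7) / 0.15) ** 2)


def kernel(a, b):
    """Squared-exponential covariance between two conditions."""
    return AMPLITUDE * math.exp(-0.5 * ((a - b) / LENGTH_SCALE) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def posterior(xs, ys, x_new):
    """Surrogate model: predicted mean and standard deviation at x_new."""
    mean_y = sum(ys) / len(ys)
    k = [[kernel(a, b) + (NOISE if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [kernel(a, x_new) for a in xs]
    weights = solve(k, k_star)
    mu = mean_y + sum(w * (y - mean_y) for w, y in zip(weights, ys))
    var = kernel(x_new, x_new) - sum(w * ks for w, ks in zip(weights, k_star))
    return mu, math.sqrt(max(var, 0.0))


candidates = [i / 100 for i in range(101)]
xs = [candidates[10], candidates[50], candidates[90]]  # initial reaction set
ys = [run_experiment(x) for x in xs]

for _ in range(10):  # experimental budget
    def upper_confidence_bound(x):
        mu, sigma = posterior(xs, ys, x)
        return mu + 2.0 * sigma  # exploitation + exploration

    untested = [x for x in candidates if x not in xs]
    x_next = max(untested, key=upper_confidence_bound)
    xs.append(x_next)
    ys.append(run_experiment(x_next))  # update the data, hence the model

best_y, best_x = max(zip(ys, xs))
print(f"best yield {best_y:.2f} at x = {best_x:.2f} after {len(xs)} experiments")
```

The factor of 2.0 in the upper confidence bound sets the balance between exploring uncertain regions and revisiting promising ones; it is an arbitrary choice here and would normally be tuned.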

Self-optimising reactions are generally automated continuous flow processes in which a set of experiments is programmed to run autonomously on the equipment. The results are then analysed and processed, enabling subsequent experiments to be designed and performed based on the data the system acquires. The work started with the use of a basic simplex algorithm for optimisation but has since been developed to include Bayesian algorithms, and several research teams are delivering updated algorithms to accommodate more complex reactions. It is our opinion that this technology is still mainly beneficial in academic settings rather than for process development in industry. However, it seems to be only a matter of time before industrial chemists are encouraged to incorporate programming, whether through simpler plug-and-play equipment or through self-learning.
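
For readers curious about what a basic simplex algorithm does, here is a minimal sketch of a Nelder-Mead style simplex search in Python. It is an assumption-laden illustration: `run_experiment` is a made-up yield surface standing in for the flow reactor and its in-line analysis, and the two variables (residence time in minutes, temperature in degrees C), starting point and step sizes are invented for the example.

```python
def run_experiment(conditions):
    """Hypothetical stand-in for the reactor plus in-line analysis."""
    residence_time, temperature = conditions
    return 90 - 2 * (residence_time - 5) ** 2 - 0.02 * (temperature - 80) ** 2


def simplex_optimise(start, step, iterations=80):
    """Maximise run_experiment by minimising the negative yield."""
    def cost(point):
        return -run_experiment(point)

    n = len(start)
    simplex = [list(start)]
    for j in range(n):  # one extra vertex per variable
        vertex = list(start)
        vertex[j] += step[j]
        simplex.append(vertex)
    costs = [cost(p) for p in simplex]

    for _ in range(iterations):
        order = sorted(range(n + 1), key=lambda i: costs[i])
        simplex = [simplex[i] for i in order]
        costs = [costs[i] for i in order]
        worst = simplex[-1]
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(scale):
            """Point on the line from the worst vertex through the centroid."""
            return [c + scale * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        c_ref = cost(reflected)
        if c_ref < costs[0]:  # new best: try going further
            expanded = along(2.0)
            c_exp = cost(expanded)
            if c_exp < c_ref:
                simplex[-1], costs[-1] = expanded, c_exp
            else:
                simplex[-1], costs[-1] = reflected, c_ref
        elif c_ref < costs[-2]:  # better than second worst: accept
            simplex[-1], costs[-1] = reflected, c_ref
        else:  # overshoot: contract towards the centroid
            contracted = along(-0.5)
            c_con = cost(contracted)
            if c_con < costs[-1]:
                simplex[-1], costs[-1] = contracted, c_con
            else:  # shrink every vertex towards the best one
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
                costs = [costs[0]] + [cost(p) for p in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: costs[i])
    return simplex[best_index], -costs[best_index]


best_conditions, best_yield = simplex_optimise(start=[2.0, 50.0], step=[1.0, 10.0])
print(f"residence time {best_conditions[0]:.2f} min, "
      f"temperature {best_conditions[1]:.1f} C, yield {best_yield:.1f}")
```

Each call to `run_experiment` corresponds to one automated run, so the number of iterations translates directly into reactor time and material used.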

With all these techniques, we want to stress how important it is to keep the chemistry at the forefront of your mind, stick to the technique you know best and keep the applications as simple as possible. In process development chemistry you don’t know how many steps or iterations will be required to meet your target, but what you are aiming for should be clearly stated.

In our opinion, the most applicable and established tool for process development in chemistry is combined DoE and PCA. See our case studies for an example of this in action and how we have saved a client over $400,000 in a project lasting 3 weeks and consisting of 2 designs. Please get in touch if you want to know more or have a problem you need help with.