# Machine Learning

Neural Networks and Gaussian Process Regression in Chemistry

The most time-consuming step in all calculations of theoretical chemistry is the calculation of the electronic structure and the corresponding energy. The computational cost of electronic structure methods scales steeply with the number of atoms in the system.

Instead of calculating the electronic structure for every molecular geometry whose energy we need, we can try to learn the underlying energy function. From only a few training points, we let the computer learn the potential energy surface (PES) for the molecular configurations of interest.

In our group we study two basic approaches to achieve this goal: neural networks (NN) and Gaussian process regression (GPR). Furthermore, we develop descriptors to represent the molecule in a way that our machine learning methods can learn the molecular properties most efficiently.

### Neural networks

Neural networks (NNs) are probably the best-known machine learning methods.

Basically, they consist of so-called *neurons* that are inspired by the biological counterpart. As depicted on the right, they can have multiple inputs that are summed up. Applying the so-called *activation function* (f in the diagram) on this sum yields the output of the neuron.
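The neuron described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in our NN code: the weights, the bias, and the choice of `tanh` as activation function are assumptions made for the example.

```python
import math


def neuron(inputs, weights, bias, activation=math.tanh):
    """Sum the weighted inputs, add a bias, and apply the activation f."""
    # Weights and bias are illustrative assumptions; a trained network
    # would obtain them from the training procedure.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


# Three inputs feeding one neuron.
output = neuron([0.5, -1.0, 2.0], weights=[1.0, 0.5, 0.25], bias=0.25)
```

With these numbers the weighted sum is 0.75, so the neuron returns `tanh(0.75)`.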

Graph-like arrangements of multiple neurons can be used to learn all kinds of patterns. The simplest approach is the feedforward neural network, as depicted on the left. In our case we use them to interpolate potential energy surfaces.
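As a rough sketch of how a feedforward network maps coordinates to an energy, the following example chains two layers of neurons. The layer sizes, the weights, and the names `dense`, `feedforward`, and `linear` are hypothetical and chosen only for illustration; in practice the weights are fitted to reference data.

```python
import math


def linear(s):
    """Identity activation, used here for the output neuron."""
    return s


def dense(inputs, weights, biases, activation):
    """One layer: every neuron sums its weighted inputs and applies f."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feedforward(x, layers):
    """Pass the input through all layers in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical network: 2 inputs -> 3 hidden neurons (tanh) -> 1 output.
layers = [
    ([[0.5, -0.3], [0.8, 0.1], [-0.6, 0.9]], [0.0, 0.1, -0.1], math.tanh),
    ([[1.0, -1.0, 0.5]], [0.2], linear),
]

# Two made-up coordinates in, one interpolated "energy" out.
energy = feedforward([1.2, 0.7], layers)[0]
```

The output layer uses a linear activation so that the predicted energy is not restricted to the range of `tanh`.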

We use our in-house NN code to represent the PES. We can feed energy, gradient, and Hessian information into the NN. The resulting NN-PES allows cheap, but still sufficiently accurate, evaluations of the energy and the Hessians. This allows us to do precise rate calculations with instanton theory at a fraction of the computational cost that was required before [1].

In our recent approaches we study atomistic NNs that use a single NN for every atom type. These can be scaled up to large system sizes due to their local character. To that end, we participate in the aenet project.
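The idea of one network per atom type can be illustrated as follows. In the usual construction of atomistic NNs, the total energy is a sum of atomic contributions, each computed from the local environment of the atom. The sketch below assumes this construction; the one-number descriptor, the tiny stand-in networks, and all parameter values are hypothetical and are not taken from aenet or from our code.

```python
import math


def local_descriptor(i, positions, cutoff=6.0):
    """Hypothetical descriptor: smooth count of neighbours inside the cutoff."""
    value = 0.0
    for j, pos in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pos)
        if r < cutoff:
            # Cosine cutoff function: 1 at r = 0, smoothly 0 at the cutoff.
            value += 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)
    return value


def make_atomic_net(w1, b1, w2, b2):
    """Stand-in for a trained per-element network (one hidden neuron)."""
    return lambda g: w2 * math.tanh(w1 * g + b1) + b2


# One network per atom type; the parameters are made up.
atomic_nets = {
    "O": make_atomic_net(0.7, -0.2, -1.5, 0.1),
    "H": make_atomic_net(1.1, 0.3, -0.4, 0.0),
}


def total_energy(elements, positions):
    """Sum the atomic energies predicted by the network of each atom's type."""
    return sum(
        atomic_nets[el](local_descriptor(i, positions))
        for i, el in enumerate(elements)
    )


water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
energy = total_energy(["O", "H", "H"], water)

# Exchanging the two hydrogen atoms leaves the energy unchanged.
swapped = [water[0], water[2], water[1]]
energy_swapped = total_energy(["O", "H", "H"], swapped)
```

Because each atom only sees neighbours within the cutoff and atoms of the same type share one network, the cost grows with the number of atoms and the energy does not depend on the order of equivalent atoms.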

### Gaussian process regression

Gaussian process regression (GPR) is a so-called *Bayesian method*: it yields a probability distribution over possible energies rather than a single value, so one can, for example, make uncertainty estimates. The most probable energy is used as the prediction of the potential energy surface. As with NNs, we can feed energy, gradient, and Hessian information to the algorithm. The picture on the left shows a simple one-dimensional example for GPR: order 0 uses only energy information, order 1 uses additional gradient information. The uncertainties are depicted by the grey areas between the orange lines.
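A one-dimensional, energy-only (order 0) GPR prediction can be sketched with the standard posterior formulas. The example below is illustrative only and is not our GPR code: it assumes a squared-exponential kernel, a zero prior mean, fixed hyperparameters, and made-up training data, and it uses a small hand-written Cholesky solver to stay within the Python standard library.

```python
import math


def kernel(x1, x2, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance (an assumed kernel choice)."""
    return sigma_f**2 * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def cholesky(a):
    """Lower-triangular factor L with L L^T = a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_predict(x_train, e_train, x_new, noise=1e-8):
    """Posterior mean (the prediction) and variance (the uncertainty)."""
    K = [
        [kernel(a, b) + (noise if i == j else 0.0)
         for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    L = cholesky(K)
    k_star = [kernel(x_new, a) for a in x_train]
    alpha = solve_cholesky(L, e_train)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_cholesky(L, k_star)
    variance = kernel(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(variance, 0.0)


# Made-up training data: three geometries with their energies.
x_train = [-1.0, 0.0, 1.5]
e_train = [0.8, 0.1, 0.6]

mean_mid, var_mid = gpr_predict(x_train, e_train, 0.75)
```

At a training point the variance drops to roughly the noise level, while far away from all training points it returns to the prior variance. This is the behaviour shown by the grey uncertainty bands.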

In contrast to NNs, GPR results are obtained within seconds (NNs need a much longer training phase), and GPR needs far fewer training points to yield locally meaningful predictions. This enables us to use it for on-the-fly algorithms like geometry optimization [2, 3].

On the other hand, GPR cannot handle datasets as large as those NNs can.

With only a few training points (the black dots on the right) we can already obtain a good representation of the real PES (grey) with our learned GPR-PES (blue) in the vicinity of the training points.

We make use of the GPR-PES to find minima and saddle points on the real PES. This is a very important task in theoretical chemistry which can be sped up using machine learning.

Our GPR code and the GPR-based optimizers will be made available in DL-FIND.

### Development of Descriptors

The efficiency of all machine learning methods can depend strongly on the quality of the representation in which we feed them the data. For example, in chemical systems we have to consider various invariances that should be incorporated in the machine learning method: we can rotate or translate the investigated system without changing its energy. We can also exchange two atoms of the same type in a molecule; the resulting molecule has exactly the same properties as the original one. We call these rotational, translational, and permutational invariances.

The incorporation of these invariances is most easily done by choosing a coordinate system that is already intrinsically invariant under these operations. There are well-known coordinates like the Z-matrix, but we also use a selection of bond distances and their inverses.
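The following sketch illustrates why inverse interatomic distances are a convenient representation: they do not change when the molecule is rotated or translated. The helper names and the water-like geometry are made up for the example, and using all atom pairs is an assumption; which distances are actually selected depends on the system.

```python
import math
from itertools import combinations


def inverse_distances(positions):
    """Inverse distance for every pair of atoms."""
    return [1.0 / math.dist(a, b) for a, b in combinations(positions, 2)]


def rotate_z(positions, angle):
    """Rotate all atoms about the z axis by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in positions]


def translate(positions, shift):
    """Shift all atoms by the same vector."""
    return [tuple(p + d for p, d in zip(pos, shift)) for pos in positions]


# Made-up water-like geometry (O, H, H).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = translate(rotate_z(water, 0.7), (1.0, -2.0, 3.5))

original_descriptor = inverse_distances(water)
moved_descriptor = inverse_distances(moved)
```

The Cartesian coordinates of `water` and `moved` differ, but both descriptors agree to within floating-point rounding. Permutational invariance is not covered by this sketch and needs additional treatment.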

The usage of specialized coordinate systems improves both the training speed and the accuracy of machine learning methods.