Physics infused machine learning force fields for 2D materials monolayers

Large-scale atomistic simulations of two-dimensional (2D) materials rely on highly accurate and efficient force fields. Here, we present a physics-infused machine learning framework that enables the efficient development and interpretability of interatomic interaction models for 2D materials. By considering the characteristics of chemical bonds and structural topology, we have devised a set of efficient descriptors. This enables accurate force field training using a small dataset. The machine learning force fields show great success in describing the phase transformation and domain switching behaviors of monolayer Group IV monochalcogenides, e.g., GeSe and PbTe. Notably, this type of force field can be readily extended to other non-transition 2D systems, such as hexagonal boron nitride ( h BN), leveraging their structural similarity. Our work provides a straightforward but accurate extension of simulation time and length scales for 2D materials.


INTRODUCTION
Two-dimensional (2D) materials have attracted tremendous research interest in the past decades because of their various physical properties [1][2][3][4][5][6][7] .As an important member, 2D ferroic materials show great potential applications in emerging 2D sensors and actuators, and their functionalities are attributed to reversible domain switching and phase transformation triggered by external fields [8][9][10][11] .However, the kinetics of ferroic-order change in 2D ferroic materials remain largely elusive, making them hard to control in devices.To address this issue, one needs not only preparation of high-quality 2D material samples but also characterization of high temporal and spatial resolution, both of which are challenging for experimental studies [8,12] .Atomic simulations are alternative effective methods.However, there is a trade-off between the accuracy and efficiency of calculations.On one hand, ab initio molecular dynamics (AIMD) simulations are ideal tools for accurately studying the dynamic behaviors of 2D ferroic materials in response to external stimuli [13][14][15][16] ; however, the limited size of supercells and simulation time makes AIMD simulations unable to capture the rare events that are critical for ferroic phase transformation processes, e.g., the nucleation of new domain/phase.On the other hand, molecular dynamics (MD) provides a more efficient approach to studying the kinetics of 2D ferroic materials.Based on a semiempirical force field, MD simulations can reproduce the microstructural evolution within 2D materials at the time scale of several nanoseconds.Nevertheless, there remains a lack of high-fidelity force fields for ferroic 2D systems.For example, there is still not a satisfactory force field that can capture the strongly coupled ferroelastic-ferroelectric orders in monolayer Group IV monochalcogenides [16] .
Recently, machine learning (ML) force fields emerged as a new type of force field that can capture the interatomic interactions with quantum mechanical precision [17][18][19][20] , opening new ways for MD simulations of 2D ferroic materials.Starting from high throughput Density functional theory (DFT) calculations, ML algorithms can help map the local atomistic structures into the corresponding energy and force.Consequently, the force fields generated via ML methods share the accuracy of DFT calculations and the efficiency of semiempirical force fields simultaneously.Nevertheless, most ML force fields, so far, are developed based on the strategy of "data-driven", i.e., directly learned from massive DFT data of various atomic configurations [18] .Preparing such a training dataset is computationally demanding.More importantly, such data-driven force fields usually lack physics information and interpretability and are quite different from classical force fields that depend on the physical and chemical models chosen to describe interatomic bonding.
Here, we present a physics-infused ML framework that enables the efficient development and interpretability of interatomic interaction models for 2D materials.By integrating the physical model of interatomic bonding into the Gaussian approximation potential, we successfully construct a series of ML force fields for monolayer GeSe and PbTe.For each case, only less than ~10,000 DFT is required to achieve a highly accurate force field, which can well capture the ferroic phase transformation and domain switching processes.Strikingly, we find the force field model can be easily transferred to other non-phase transition 2D materials, e.g., hexagonal boron nitride (hBN), which have similar topologies and chemical bonding characters with GeSe and PbTe.Finally, we show that the transferred model successfully describes the fracture behavior of monolayer hBN.

MATERIALS AND METHODS
There are two basic assumptions to construct a ML force field [17,21] in order to map the structure and its corresponding energy.First, atomistic potential energy relates only to its local configuration, and second, the total energy of a system equals the sum of single atomistic energies.In general, there are four important steps, namely the generation of a database by DFT calculations, featurization, ML regression, and benchmarks, to train a ML force field [Figure 1].They will be discussed in detail in the following sections.

Generation of the training database
AIMD simulations were performed at varying temperatures with different strains for monolayer GeSe and PbTe [Table 1].The high-temperature configurations are used for sampling structures that are outside of local equilibrium states, while the medium-and low-temperature configurations are important to capture the potential energy surface near the equilibrium state.Besides, DFT references related to phase transformation and ferroelastic domain switching are also used to expand the training database.Here, we select 101 configurations during the ferroelastic domain switching for the GeSe dataset and 404 configurations of stress-induced ferroic phase transition for PbTe.
All the referenced configurations are calculated based on first-principles DFT [22,23] via the Vienna ab initio simulation package (VASP) [24,25] by using a Perdew-Burke-Ernzerhof (PBE) [26] form of exchange-correlation function with the generalized gradient approximation (GGA) [27,28] .The models of GeSe and PbTe contain 64 atoms.For a given temperature and strain state, we performed an AIMD simulation over 6,000 steps with a timestep of 2 fs.The plane wave-basis cutoff energy is 300 eV for both GeSe and PbTe.A dense k-point mesh of 3 × 3 × 1 is used to determine accurate forces and energies.

Featurization
Coding atomic configurations into feature space needs to satisfy the requirements of continuity and symmetry [29] .Besides, bonding information between different elemental pairs should also be considered independently.For a binary system, we considered three kinds of pairwise interactions and eight kinds of three-body interactions [Table 2].
Pairwise interactions are related to the nature of covalent bonds; i.e., covalent bonding is a short-range interaction, and the strength of bonding decreases sharply above a critical bond length.Pairwise interactions relate to the bond length between two atoms, i and j.It can be easily captured by the Fourier series of a step function.Therefore, after distinguishing the interacting atomic types, pairwise descriptors can be expressed as:

Materials Pairwise interaction Three-body interaction
GeSe Ge-Ge Ge-Se Se-Se Ge-Ge-Ge Ge-Se-Ge Ge-Ge-Se Ge-Se-Se Se-Ge-Ge Se-Se-Ge Se-Ge-Se Se-Se-Se  3] are used to construct the descriptor vector for the pairwise descriptor; that is, Three-body interactions relate to the bond angle and bond length among atoms i, j, and k.Descriptors can be expressed as: where r ij and r ik represent the bond length from atom j and m to atom i, respectively.We use four different η 3 [Table 3].θ jik is the angle between atoms i, j, and m centered on atom i.We choose three different forms of f(cosθ jik ) in this work [Table 3].These parameters should be not far from the cutoff distance.A set of too large or too small η and k can greatly reduce the model accuracy.As shown in Supplementary Table 1, an order of magnitude greater or smaller than our parameters will lead to a poor fitting performance.

Kernel ridge regression
The ML force field is based on the formulation by Botu and Ramprasad [18] .Kernel ridge regression (KRR) [30,31] , one of the Gaussian process-type ML approach [32,33] , maps the structural descriptors and the total energy.A linear kernel function, K(V input , V ref ), is used to measure the similarity between the input (V input ) and the referenced structural descriptors (V input ).We can express the target energy E as: where ω t denotes the weighting coefficient during fitting.t labels each referenced atomic environment and V ref t is its corresponding descriptor.All the descriptors are normalized to [0, 1] before the training is produced.The ML fitting is performed by an open-source Python code named scikit-learn [34] .In our work, we do not include atomic forces during the training process.However, the atomic forces are used in the

Benchmarks
To evaluate the quality of the ML interatomic force fields for monolayers GeSe and PbTe, we first check the errors, e.g., RMSE and mean absolute error (MAE) in potential energy [Figure 2].Predicted potential energy by ML interatomic force fields agrees well with the referenced DFT calculations.Small errors (< 2 meV/ atom in RMSE) and large R 2 (> 0.99) indicate a good performance of our force fields.
Then, we checked the quality of the predicted atomistic force.Atomistic force can be easily calculated by differentiating the expression of total energy E (Equation 3) due to the selection of linear kernel function.The RMSE and MAE of atomic force for monolayer GeSe and PbTe are shown in Figure 3.The small value of RMSE (< 0.1 eV/Å) and the R 2 indicate the good agreement in force prediction.Thus, we believe that the ML force fields can well describe microstructural evolution at finite temperatures and external straining.
Accurate prediction in energy and force results in the reliable prediction of phonons [Figure 4] and the elastic constant [Table 4]. Figure 4 shows a reasonable agreement in the phonon spectrum between ML prediction and DFT calculations, especially for the acoustic branches.All the above benchmarks indicate that the ML force fields have a good predicting ability for monolayer GeSe and PbTe.We do more critical benchmarks by comparing the defect formation energy from our ML force fields with DFT calculations [Supplementary Table 2].Actually, these two force fields can also describe the phase transformation/ domain switching behaviors, which will be shown in Section 3.

Details of MD simulation
Stress-induced ferroic domain switching in monolayer GeSe and stress-induced ferroic phase transition in monolayer PbTe are performed by the learned ML force fields.The technical details of these MD simulations are given here.
A periodic boundary condition is applied along the x and y directions.All models are relaxed at 50 K in an isothermal-isobaric ensemble for 40 pico-seconds before any perturbations are applied to it.Averaged stress along both x and y directions remains zero after the relaxation.An isothermal-isobaric ensemble is used with the help of the Nosé-Hoover thermostat [35,36] and the Parrinello-Rahman barostat [37] .Tensile deformation of GeSe and PbTe along the x direction was applied with a strain rate of 5×10 8 per second at 50 K.The corresponding tensile stress was calculated by the total force divided by the effective thickness.The effective thicknesses for GeSe and PbTe are 9.61 and 9.41 Å, respectively.All MD simulations are  performed by using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) code [38] .Atomic configurations are visualized by AtomEye [39] .

RESULTS AND DISCUSSION
We then apply the ML force fields to different cases, i.e., temperature/stress-induced ferroic phase transformation/domain switching in monolayer GeSe and PbTe.

Temperature-induced ferroic phase transformation in monolayer GeSe
Figure 5 shows the potential energy as a function of temperature upon a heating-cooling cycle in a monolayer GeSe.The discontinuity in potential energy upon heating (cooling) indicates the phase transformation, in which the long-range spontaneous ferroic orders (strain or polarization) disappear (appear).The insets in Figure 5 schematically illustrate the typical domain structure in our heating-cooling cycle.Starting from a single-domain ferroelastic phase with a spontaneous strain along the y direction, the model transforms into a para-phase, where the long-range spontaneous strain and polarization disappear when the temperature is higher than ~310 K.The para-phase GeSe will transform back into the ferroic one upon cooling, while the transition temperature is ~260 K.After cooling down, the monolayer GeSe shows a multi-domain structure of the low temperature phase.All the detailed microstructure evolution can be accessed from our previous works [40,41] .

Stress-induced ferroic domain switching in monolayer GeSe
The switching between different domain variants in monolayer GeSe can also be reproduced by MD simulations of its response to external stress or electrical fields.Figure 6 shows a tensile deformationinduced ferroic domain switching in monolayer GeSe at 50 K. Figure 6A shows the evolution of tensile stress as a function of tensile strain (ε) in monolayer GeSe at 50 K.Starting from an undeformed structure, the sample undergoes an elastic deformation and then yields at ε = 4% with a yielding strength of ~1.2 GPa.After yielding, the stress drops down to ~0.3 GPa.Then, it increases linearly again until a second drop occurs at ε = 10.0%.The stress drops down to 0.7 GPa at ε = 11.8%, which is followed by a third linear increase at ε = 14.0%.
Figure 6B-G shows the typical atomistic configurations upon tension, where we use an arrow, which starts from a Ge atom and ends in an adjacent Se atom, to represent the Ge-Se pairs.We colored the arrows based on their directions, which could help to identify different domains.Figure 6B shows the undeformed monolayer GeSe, in which all the ferroic orders are along the same direction (the spontaneous strain η along y direction and the spontaneous polarization P is along -y direction, marked as (η y , -P y ) and magenta arrows).At the yielding point (ε = 4.0%), a new domain with ferroic order (η x , -P x ) nucleates (cyan arrows in Figure 6C) and then grows up fast [Figure 6D].A stripped pattern forms in the sample and keeps stable until ε = 10.0%, when some secondary domains (η x , P x ) nucleate (red regions in Figure 6E) with the secondary drop of the tensile stress [Figure 6A].However, in this stage, the secondary domains are not stable.Instead of the secondary domains, the primary domain (η x , -P x ) grows up at the end of this stage (ε = 11.8%)[Figure 6F].The final drop of the tensile stress at a highly deformed state (ε = 14.0%) is induced by the re-appearance of secondary domains [Figure 6G].
The 2D domain switching is also realized by a motion of the domain boundary and kinks inside the boundary.Therefore, the 2D domain switch can be resolved into one-dimensional manners, e.g., the domain boundary motion, and zero-dimension manners, e.g., the kinks inside the domain boundary.Hierarchical domain structure forms with a high domain boundary density, which shows the great potential of 2D ferroics in domain boundary engineering.

Tensile deformation induced ferroic phase transformation in PbTe
Figure 7A shows the strain-stress curve of monolayer PbTe upon loading and unloading at 50 K.Based on the structure evolution upon loading and unloading, we divide the entire process into six ranges labeled as R1~R6.There are three typical phase structures named para-phase, dual-phases, and ferro-phase [Figure 7B-D] that appeared or disappeared in the loading-unloading loop.The para-phase is the equilibrium state for monolayer PbTe, where the centers of Pb atoms and Te atoms coincide [Figure 7B].Thus, there are no spontaneous ferroic orders.However, the ferroic phase appears under some specific external fields, e.g., tensile stress.Spontaneous strain and polarization appear in the ferroic phase [Figure 7D].Dual phases are an intermediated state during the phase transformation [Figure 7C].
Starting from a perfect structure, the monolayer PbTe undergoes an elastic deformation in R1 (0.0 < ε < 0.07), in which the structure remains in a para-phase [Figure 7B].Yielding occurs at ε = 0.07, with a yielding strength of ~1.5 GPa.The yielding is induced by the nucleation of the ferroic phase.The ferroic phase grows up while the para-phase shrinks via the motion of phase boundaries in R2 (0.07 < ε < 0.16).The tensile stress remains in a platform with large fluctuations, which is induced by the pinning and depinning of the phase boundary.This process is similar to the pinning-depinning of dislocation motion in metals.More details can be seen in Supplementary Figure 1 and Supplementary Movie 1.All the para-phase is transformed into the ferroic phase at the end of R2.The elastic deformation of the ferroic phase in R3 (0.16 < ε < 0.25) induces a further increase of the tensile stress [Figure 7A].Upon loading in R4 (0.25 > ε > 0.12), the elastic energy stored in the ferroic phase is first released, while the tensile stress decreased from 1.80 to < 1.0 GPa.Then, the reverse phase transformation occurs and remains in R5 (0.12 > ε > 0.05), after which the system recovers back to the para-phase.Further unloading releases the elastic energy of para-phase and induces a full recovery in R6 (0.05 > ε > 0.0).The monolayer PbTe sustains a recoverable strain as large as 0.25 with low hysteresis.

Physics-infused descriptor designing for monolayer GeSe and PbTe
Phase transformation and domain switching in 2D ferroic materials are accompanied by the repetition of bonding and debonding.Different from that in metallic materials, covalent bonding is a short-range interaction, i.e., there exists a sudden drop (increment) of bond strength or bonding energy as a function of distance (red line in Figure 8A).Here, we select an oscillated attenuation function to describe the pairwise interaction [Figure 8B] rather than a Gaussian function [17][18][19] , which is widely used in fingerprinting pairwise interaction of metals.As shown in Equation 1, η controls the decay rate, and k affects the oscillation period of V 2b i .Here, we choose eight different (η, k) pairs [Table 3] to represent the structural characters, and their combinations make the descriptors suitable for describing the covalent bonding process.As illustrated in Figure 8A, the interactions between two atoms are rather weak once the bond breaks.Often, the effective distance of covalent bonding interactions is slightly larger than the equilibrium bond length of the first neighbors (a 0 ).This suggests a much smaller cutoff for covalent systems than metals.Figure 9 shows the evolution of the RMSE and R 2 as functions of cutoffs in monolayer GeSe and PbTe.By normalizing R c by a 0 , a similar trend appears.Once the cutoff is larger than the second nearest neighbor (e.g., > 1.5a 0 ), the RMSE drops into a value smaller than 2 meV/atom, and the R 2 rises towards a value of 1.This encourages us to select cutoff values tailored to specific materials.

Structural similarity of monolayer GeSe and PbTe
Besides the similar bonding information in monolayer GeSe and PbTe, their structural similarity also holds on for the good performance of the descriptors.As shown in Figure 10, monolayer GeSe and PbTe have the same structural topology.Taking GeSe as an example, Ge and Se atoms occupy different sites with a zigzag structure but different planes [Figure 10A].We use four dihedral angles, i.e., θ 1 , θ 2 , θ 3 , and θ 4 , to describe the relationship between these atomic planes.The unit cell of monolayer GeSe can be seen in Figure 10B.All the dihedral angles are within the range of between 90° and 180°, leading to the eccentricity between Ge atoms and Se atoms, which is the origin of spontaneous polarization.Changing all the dihedral angles to 90° gives rise to the typical structure of PbTe without spontaneous polarization [Figure 10C].At the other extreme, i.e., all four dihedral angles are equal to 180°, all atoms lie within the same plane [Figure 10D].It is the situation of monolayer hBN.Therefore, we infer that the descriptors should also work for developing the force field of hBN, the validation of which will be presented in the subsequent sections.
Structural similarity can also be manifested through the distribution of bond length in GeSe, PbTe, and hBN. Figure 11 shows the distribution of normalized distance between atomic pairs in monolayer GeSe and PbTe.Considering the atomic types, we use X to represent Ge/Pb/B atoms and Y to represent Se/Te/N atoms.The distances between atomic pairs are normalized by their respective nearest bond lengths a 0 .These distributions in GeSe, PbTe, and hBN share the same shape, although the monolayer GeSe has the widest spectrum (cyan points).The more scattered distribution of bond length in GeSe originates from the slightly shifted 90° bond angle.

ML force field describing the fracture of hBN
The descriptors were then transferred into the force field model of monolayer hBN.As expected, the performance of ML force fields is good.The RMSEs in energy and force prediction are only 0.84 meV/atom and 0.1 eV/Å, respectively [Figure 12A and B  Firstly, a molecular static simulation was applied to the tension deformation via energy minimization.We use the same model for the DFT calculations, which contains 60 atoms with a dimension of 1.25 nm × 1.30 nm.The model is stretched along the armchair [Figure 13A] and zigzag directions [Figure 13B], respectively.The good agreement in the stress-strain curve for the ML prediction and the DFT calculations indicates the high reliability of our physics-infused ML models.
Figure 13C and D shows the tensile strain-stress curves along the armchair and zigzag directions at 50 K, respectively.In each direction, we studied the tensile response of different monolayer hBN supercells with periodic boundary conditions.The size of the supercells varies from 1.25 nm × 1.30 nm to 12.5 nm × 13.0 nm.We find that in the near-equilibrium state, there is almost no size effect.The elastic modulus is ~710 GPa in the armchair direction while ~798 GPa in the zigzag direction.However, the fracture strain and strength exhibit a size-dependent behavior.We find the mechanical response converges when the sample size is larger than 5.2 nm × 5.0 nm.The converged fracture strain along the armchair direction is ~0.18 with a tensile strength of 81 GPa, while ~0.21 with a tensile strength of 96 GPa when loading along the zigzag direction.All these results show good agreement with previous simulations and experiments [42,43] .

CONCLUSION
In summary, we show a physics-infused ML force field with a small training dataset to describe the microstructural evolution of monolayer 2D materials.The developed force field is suitable for describing the atomic process of phase transformation, domain switching, and fracture, which are all related to a unified bonding/debonding process.The high transferability of the force fields originates from the strategy of physics-infused feature engineering, which integrated the nature of covalent bonds and structure topology.These force fields provide an efficient path for understanding the dynamics of 2D materials up to nanoseconds.

Figure 1 .
Figure 1.Strategy to generate a ML force field.DFT: Density functional theory; MD: molecular dynamics.
Te-Te Pb-Pb-Pb Pb-Te-Pb Pb-Pb-Te Pb-Te-Te Te-Pb-Pb Te-Te-Pb Te-Pb-Te Te-Te-Te where r ij is the bond length between atoms i and j. f c (r ij ) = 0.5[1 + cos(πr ij /R c )] is a damping function for atoms within the cutoff distance R c , and it is zero elsewhere.k and η are related to the damping of an exponential function.Eight k and η values [Table

jim 2 Figure 2 .
Figure 2. Comparison of the predicted atomistic energy and the true values in monolayer (A) GeSe and (B) PbTe.The training dataset contains 11,893 and 10,148 configurations for GeSe and PbTe, respectively.DFT: Density functional theory; MAE: mean absolute error; RMSE: root mean square errors.

Figure 3 .
Figure 3.Comparison of the predicted atomistic force and the DFT counterpart of monolayer (A) GeSe and (B) PbTe.The testing dataset contains ~0.86 million and ~1.5 million points for GeSe and PbTe, respectively.We calculate the RMSE, MAE, and R 2 of the ML model for the testing data, and only 10% of the data are shown in the plots.DFT: Density functional theory; MAE: mean absolute error; ML: machine learning; RMSE: root mean square errors.

Figure 4 .
Figure 4. Comparison of the phonons between the predicted and the DFT calculations for monolayer (A) GeSe and (B) PbTe.DFT: Density functional theory; ML: machine learning.

Figure 6 .
Figure 6.Stress-induced domain switching in monolayer GeSe.(A) Tensile stress evolution as a function of tensile strain; (B)-(G) Typical atomistic configurations of GeSe upon tensile-induced domain switching.Here, we use different colors to indicate the ferroic orders, where the magenta means domains with ( η y , -P y ), cyan means domains with ( η x , -P x ), and red means domains with ( η y , P y ).

Figure 7 .
Figure 7. Stress-induced phase transformation in monolayer PbTe.(A) Tensile stress-strain curve upon loading and unloading; (B)-(D) Typical phase structures of PbTe upon loading and unloading.

Figure 8 .
Figure 8. Oscillated attenuation features to describe the covalent bonding or debonding.(A) Schematic for the differences between covalent and metallic bonds; (B) Evolution of V 2 i b with bond length r ij .

Figure 9 .
Figure 9. Fitting quality evolution as a function of R c .

Figure 10 .
Figure 10.Structure similarity of monolayer GeSe, PbTe, and hBN.(A) Atomic structure of monolayer GeSe; (B) Basic structure of monolayer GeSe with all dihedral angles in the range between 90° and 180°; (C) Basic structure of monolayer PbTe with all dihedral angles equal to 90°; (D) Basic structure of monolayer hBN with all dihedral angles equal to 180°.hBN: Hexagonal boron nitride; 2D: two-dimensional.
].The phonon spectrum in Figure12Calso shows great agreement with the DFT data.With this force field, we run MD simulations to study the fracture behavior of monolayer hBN.

Figure 13 .
Figure 13.Tensile deformation induced fracture in monolayer hBN.(A) and (B) Comparisons of the stress-strain curve upon static tensile loading along the armchair and zigzag direction, respectively; (C) and (D) Size effects on mechanical response along the armchair and zigzag direction, respectively.DFT: Density functional theory; hBN: hexagonal boron nitride; ML: machine learning.

Table 4 . Comparison between the ML force fields and DFT Calculations on elastic constants
DFT: Density functional theory; ML: machine learning.