Dataset Title: Data related to communication: "Is DFT enough? Towards Accurate High-Throughput Computational Screening of Azobenzenes for Molecular Solar Thermal Applications"

Dataset Authors: Aleotti, Flavia (University of Bologna, ORCID: 0000-0002-7176-5305); Soprani, Lorenzo (University of Bologna, ORCID: 0000-0002-7127-2770); Rodríguez-Almeida, Lucas Francisco (University of Bologna, ORCID: 0000-0002-9785-703X); Calcagno, Francesco (University of Bologna, ORCID: 0000-0002-0986-4425); Loprete, Fabio (University of Bologna, ORCID: 0009-0003-4883-0716); Rivalta, Ivan (University of Bologna, ORCID: 0000-0002-1208-602X); Orlandi, Silvia (University of Bologna, ORCID: 0000-0001-9201-9398); Canè, Elisabetta (University of Bologna, ORCID: 0000-0002-1811-7915); Garavelli, Marco (University of Bologna, ORCID: 0000-0002-0796-289X); Conti, Irene (University of Bologna, ORCID: 0000-0001-7982-4480); Muccioli, Luca (University of Bologna, ORCID: 0000-0001-9227-1059)

Dataset Contact Persons: Aleotti, Flavia (flavia.aleotti@unibo.it, University of Bologna, ORCID: 0000-0002-7176-5305); Garavelli, Marco (marco.garavelli@unibo.it, University of Bologna, ORCID: 0000-0002-0796-289X); Conti, Irene (irene.conti@unibo.it, University of Bologna, ORCID: 0000-0001-7982-4480); Muccioli, Luca (luca.muccioli@unibo.it, University of Bologna, ORCID: 0000-0001-9227-1059)

Dataset License: This work is licensed under CC BY-SA 4.0

Publication Year: 2024

Project info: GEM (Getting the MOST out of the Sun), funded by the International Foundation Big Data and Artificial Intelligence for Human Development (IFAB) under the funding program "IFAB Call for Projects 2022", https://www.ifabfoundation.org/project/gem-getting-the-most-out-of-the-sun/

Dataset contents: The dataset consists of:
- 1 archive named "Potential_energy_data.zip" containing 5 files in Excel format reporting the potential energy profiles along the CNNC torsion and CNN inversion coordinates of azobenzene (AB) and four of its derivatives:
  "AB.xlsx", "F2-AB-F2.xlsx", "NH2-AB-NH2.xlsx", "NO2-AB-NO2.xlsx", "NO2-AB-NH2.xlsx"
- 1 archive named "cartesian_structures.zip" containing Cartesian geometries in xyz format. For each molecule there is a directory in the archive ("AB", "F2-AB-F2", "NO2-AB-NO2", "NH2-AB-NH2", "NO2-AB-NH2"), organised as follows:
  <molecule>/:
    - CASPT2/:
      - "E.xyz": geometry of the E isomer optimized at CASPT2 level
      - "Z.xyz": geometry of the Z isomer optimized at CASPT2 level
      - INV_SCAN/: geometries of the inversion scan at CASPT2 level starting from the Z isomer (from CNN = 130° to CNN = 179° in steps of 5°); each geometry file is named "BXXX.Opt.xyz", where XXX = CNN value.
      - TORS_SCAN/: geometries of the CNNC torsion scan at CASPT2 level (from CNNC = 20° to CNNC = 170° with variable step); each geometry file is named "TXXX.Opt.xyz", where XXX = CNNC value.
    - DFT/:
      - "E.xyz": geometry of the E isomer optimized at DFT level
      - "Z.xyz": geometry of the Z isomer optimized at DFT level
      - INV_SCAN/: geometries of the inversion scan at DFT level starting from the Z isomer (from CNN = 130° to CNN = 179° in steps of 5°); each geometry file is named "BXXX.xyz", where XXX = CNN value.
      - TORS_SCAN/: geometries of the CNNC torsion scan at DFT level (from CNNC = 20° to CNNC = 170° with variable step); each geometry file is named "TXXX.xyz", where XXX = CNNC value.
    - BS-DFT/:
      - TORS_SCAN/: geometries of the CNNC torsion scan at BS-DFT level (from CNNC = 75° to CNNC = 105° in steps of 5°); each geometry file is named "TXXX.xyz", where XXX = CNNC value.
- 1 archive named "MOLCAS_ORBITALS.zip" containing CASSCF molecular orbitals from the OpenMolcas software in h5 format:
  - subfolder "E_isomers" contains the molecular orbitals for the optimized E isomers of the five molecules ("AB_E.rasscf.h5", "E_F2-AB-F2.rasscf.h5", "E_NO2-AB-NO2.rasscf.h5", "E_NH2-AB-NH2.rasscf.h5", "E_NO2-AB-NH2.rasscf.h5")
  - subfolder "Z_isomers" contains the molecular orbitals for the optimized Z isomers of the five molecules ("AB_Z.rasscf.h5", "Z_F2-AB-F2.rasscf.h5", "Z_NO2-AB-NO2.rasscf.h5", "Z_NH2-AB-NH2.rasscf.h5", "Z_NO2-AB-NH2.rasscf.h5")
- 1 README file in .txt format, "README_AB_MOST.txt"

Dataset documentation:

Abstract
This dataset contains data related to the publication "Is DFT enough? Towards Accurate High-Throughput Computational Screening of Azobenzenes for Molecular Solar Thermal Applications" (DOI: 10.1039/D4ME00183D). It contains Cartesian structures of azobenzene and four of its derivatives, molecular orbitals, and the raw data for potential energy profiles (at the XMS-CASPT2/SA-2-CASSCF(10,8)/ANO-R1, BP86/def2-SVP and BS-BP86/def2-SVP levels of theory) along the thermal isomerization reaction paths of five molecules: azobenzene (AB), 2,2',6,6'-tetrafluoroazobenzene (F2-AB-F2), 4,4'-dinitroazobenzene (NO2-AB-NO2), 4,4'-diaminoazobenzene (NH2-AB-NH2) and 4,4'-nitroaminoazobenzene (NO2-AB-NH2, also known as disperse orange 3).

Methodologies:

DFT: The E and Z isomers of all the considered molecules were optimized at the BP86/def2-SVP level of theory, applying Grimme’s DFT-D3 dispersion correction [J. Chem. Phys. 132, 154104 (2010)]. Subsequently, a relaxed surface scan at the same level of theory was performed along one of the two CNN angles, starting from the Z isomer, over the range CNN = [130°; 180°] in steps of 5°. Only in the case of NO2-AB-NH2 (which has two non-equivalent CNN coordinates due to the asymmetric substitution of the AB backbone) were both CNN bending angles scanned.
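As a minimal sketch (not part of the dataset or of the authors' workflow), the scanned CNN angle of a geometry in an INV_SCAN/ folder can be re-measured directly from its xyz file. The helper names `read_xyz`, `bond_angle` and `scan_value_from_name` are hypothetical; the code assumes the standard XYZ layout (atom count, comment line, then `symbol x y z` rows) and that the user identifies the indices of the C, N and N atoms, since the atom ordering is not documented in this README.

```python
import math
import re
from pathlib import Path

def read_xyz(text):
    """Parse XYZ text: atom count, comment line, then 'symbol x y z' rows."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms = []
    for row in lines[2:2 + n_atoms]:
        symbol, x, y, z = row.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms

def bond_angle(a, b, c):
    """Angle a-b-c in degrees; for CNN pass (C, N, N) with b the N bonded to that C."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    # Clamp to [-1, 1] so rounding noise near 180 degrees cannot break acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def scan_value_from_name(filename):
    """Angle encoded in names such as 'B130.xyz' or 'B130.Opt.xyz' (B = CNN, T = CNNC)."""
    match = re.match(r"[BT](\d+)\.", Path(filename).name)
    if match is None:
        raise ValueError(f"not a scan geometry file name: {filename}")
    return int(match.group(1))
```

A geometry would be loaded with something like `read_xyz(Path(...).read_text())`; for a constrained inversion-scan point, the angle returned by `bond_angle` on the scanned C-N-N triplet should be close to the value returned by `scan_value_from_name` for the same file.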
To obtain the torsional profile, two relaxed scans were performed: one from the Z isomer to CNNC = 40° and one from the E isomer to CNNC = 150°, both in steps of 10°. For more rotated geometries, constrained optimizations were performed instead, to avoid convergence to a linear transition state: the CNN and NNC values were linearly interpolated between the scan points CNNC = 40° and CNNC = 150°, and optimizations were performed keeping the CNN/NNC values fixed at their interpolated values for CNNC = 50°, 60°, 70°, 75°, 80°, 85°, 90°, 95°, 100°, 105°, 110°, 120°, 130° and 140°. All DFT calculations were performed with ORCA 4 [J. Chem. Phys. 152, 224108 (2020)].

BS-DFT: For all the investigated molecules, BS-DFT (BP86/def2-SVP, applying Grimme’s DFT-D3 dispersion correction [J. Chem. Phys. 132, 154104 (2010)]) was tested along the CNNC torsion scan between CNNC = 75° and CNNC = 105° (step 5°), converging to a broken-symmetry solution (open-shell singlet) obtained by flipping the spin on one of the two nitrogen atoms. No further constraints were applied besides the CNNC value. All BS-DFT calculations were performed with ORCA 4 [J. Chem. Phys. 152, 224108 (2020)].

CASPT2 calculations: The E and Z isomers of all the considered molecules were optimized at the XMS-CASPT2 level of theory (using an imaginary shift of 0.2 a.u. and no IPEA shift), employing the ANO-R basis set in its R1 variant [J. Chem. Theory Comput. 2020, 16, 1, 278–294]. The perturbative correction was applied to CASSCF wavefunctions obtained with an active space of 10 electrons in 8 orbitals (3 pi, 3 pi* and the two N lone pairs), including two electronic states (S0, S1) in the state-averaging procedure. Fully relaxed potential energy surface scans were performed along the CNN (inversion) and CNNC (torsion) coordinates starting from the Z isomer, with the same scan steps employed for the DFT scans (see the DFT paragraph above).
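The constrained torsion protocol described in the DFT paragraph amounts to linear interpolation in CNNC: for a target dihedral t, CNN(t) = CNN(40°) + (t - 40°)/(150° - 40°) * [CNN(150°) - CNN(40°)], and likewise for NNC. The sketch below reproduces only this arithmetic, plus a standard signed dihedral function for checking CNNC on a geometry; it is not the authors' ORCA input. The names `interpolated_constraints` and `dihedral` are hypothetical, and the four endpoint angles must be measured on the CNNC = 40° and 150° relaxed-scan geometries, since they are not listed in this README.

```python
import math

# CNNC points (degrees) at which CNN/NNC were held fixed, as listed in the DFT paragraph.
CONSTRAINED_CNNC = [50, 60, 70, 75, 80, 85, 90, 95, 100, 105, 110, 120, 130, 140]

def interpolated_constraints(targets, cnn_40, cnn_150, nnc_40, nnc_150):
    """Linearly interpolate CNN and NNC between the CNNC = 40 and 150 degree scan points."""
    out = {}
    for t in targets:
        f = (t - 40.0) / (150.0 - 40.0)  # fraction of the way from 40 to 150 degrees
        out[t] = (cnn_40 + f * (cnn_150 - cnn_40), nnc_40 + f * (nnc_150 - nnc_40))
    return out

def _sub(a, b):
    return [p - q for p, q in zip(a, b)]

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees (-180 to 180); for CNNC pass C, N, N, C."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = [c / n1 for c in b1]  # unit vector along the central N=N bond
    # Project the two outer bonds onto the plane perpendicular to N=N.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Calling `interpolated_constraints(CONSTRAINED_CNNC, cnn_40, cnn_150, nnc_40, nnc_150)` returns one (CNN, NNC) pair per constrained point; keeping CNN and NNC separate matters for the asymmetric NO2-AB-NH2, where the two angles differ. The CNNC values in the scans are reported as positive angles, so the absolute value of `dihedral` is the quantity to compare.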
CASPT2@DFT: The XMS-CASPT2/SA2-CASSCF(10,8)/ANO-R1 level of theory was also used to perform single-point energy calculations on the geometries of the DFT scans. All CASPT2 calculations were performed with OpenMolcas 23.10 [J. Chem. Theory Comput. 2020, 16, 1, 278–294], using analytical CASPT2 gradients for geometry optimizations and scans.

Content of the files
- file "AB.xlsx" contains raw data for the azobenzene (AB) molecule. In particular, it reports the potential energy profiles for Z->E isomerization along the torsional (CNNC) and inversion (CNN) pathways at the XMS-CASPT2/SA-2-CASSCF(10,8)/ANO-R1, DFT and BS-DFT (BP86/def2-SVP) levels, and with the combined CASPT2@DFT approach described in the Methodologies section of this file.
- file "F2-AB-F2.xlsx" contains raw data for the 2,2',6,6'-tetrafluoroazobenzene (F2-AB-F2) molecule. In particular, it reports the potential energy profiles for Z->E isomerization along the torsional (CNNC) and inversion (CNN) pathways at the XMS-CASPT2/SA-2-CASSCF(10,8)/ANO-R1, DFT and BS-DFT (BP86/def2-SVP) levels, and with the combined CASPT2@DFT approach described in the communication.
- file "NH2-AB-NH2.xlsx" contains raw data for the 4,4'-diaminoazobenzene (NH2-AB-NH2) molecule. In particular, it reports the potential energy profiles for Z->E isomerization along the torsional (CNNC) and inversion (CNN) pathways at the XMS-CASPT2/SA-2-CASSCF(10,8)/ANO-R1, DFT and BS-DFT (BP86/def2-SVP) levels, and with the combined CASPT2@DFT approach described in the communication.
- file "NO2-AB-NO2.xlsx" contains raw data for the 4,4'-dinitroazobenzene (NO2-AB-NO2) molecule. In particular, it reports the potential energy profiles for Z->E isomerization along the torsional (CNNC) and inversion (CNN) pathways at the XMS-CASPT2/SA-2-CASSCF(10,8)/ANO-R1, DFT and BS-DFT (BP86/def2-SVP) levels, and with the combined CASPT2@DFT approach described in the communication.
- file "NO2-AB-NH2.xlsx" contains raw data for the 4,4'-nitroaminoazobenzene (NO2-AB-NH2) molecule.
In particular, it reports the potential energy profiles for Z->E isomerization along the torsional (CNNC) and the two possible inversion (CNN-(NH2) and NNC-(NO2)) pathways at the XMS-CASPT2/SA-2-CASSCF(10,8)/ANO-R1, DFT and BS-DFT (BP86/def2-SVP) levels, and with the combined CASPT2@DFT approach described in the communication.
- archive "cartesian_structures.zip" contains Cartesian geometries of the torsional (CNNC dihedral) and inversion (CNN angle) pathways for the five molecules at three different levels of theory: XMS-CASPT2/SA-2-CASSCF(10,8)/ANO-R1 (subfolders named "CASPT2"), DFT (BP86/def2-SVP, subfolders named "DFT") and Broken-Symmetry DFT (BP86/def2-SVP, subfolders named "BS-DFT", present only for torsional scans).
- archive "MOLCAS_ORBITALS.zip" contains the subfolders "E_isomers/" and "Z_isomers/" (for the E and Z minima, respectively) with the molecular orbitals (SA2-CASSCF(10,8)/ANO-R1) obtained from OpenMolcas in h5 format. These can be used as input orbitals to reproduce our calculations with OpenMolcas, or to visualize the molecular orbitals with visualization software (e.g. Pegamoid, see https://pypi.org/project/Pegamoid/).
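To work with the extracted "cartesian_structures.zip" programmatically, the folder and file-naming conventions listed under Dataset contents can be turned into a lookup table. This is a hedged sketch: `index_scans` is a hypothetical helper, and it assumes the archive has been extracted so that `root` directly contains the per-molecule folders ("AB", "F2-AB-F2", ...); the exact top-level layout of the archive is not stated in this README.

```python
import re
from pathlib import Path

# Level-of-theory and scan subfolders as documented in this README.
LEVELS = ("CASPT2", "DFT", "BS-DFT")
SCAN_PREFIX = {"INV_SCAN": "B", "TORS_SCAN": "T"}  # B -> CNN value, T -> CNNC value

def index_scans(root):
    """Return {(molecule, level, scan, angle_deg): Path} for every scan geometry under root."""
    root = Path(root)
    index = {}
    for mol_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for level in LEVELS:
            for scan, prefix in SCAN_PREFIX.items():
                scan_dir = mol_dir / level / scan
                if not scan_dir.is_dir():
                    continue  # e.g. BS-DFT has only a TORS_SCAN folder
                # Matches both "B130.Opt.xyz" (CASPT2) and "B130.xyz" (DFT, BS-DFT).
                for path in sorted(scan_dir.glob("*.xyz")):
                    match = re.match(rf"{prefix}(\d+)\.", path.name)
                    if match:
                        index[(mol_dir.name, level, scan, int(match.group(1)))] = path
    return index
```

With such an index, the same scan point can be compared across levels of theory, for example by looking up `("AB", "CASPT2", "TORS_SCAN", angle)` and `("AB", "DFT", "TORS_SCAN", angle)` for a shared CNNC value. The E.xyz and Z.xyz minima are deliberately left out because they are not scan points.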