parameterizations [39,44-50] and modifications [47,51,52] of EEM are still under development. Its accuracy is comparable to that of the QM charge calculation approach for which it was parameterized. Moreover, EEM is very fast, as its computational complexity is Θ(N³), where N is the number of atoms in the molecule. Thus, in the present study, we focus on pKa prediction using QSPR models which employ EEM charges. Specifically, we created and evaluated QSPR models based on EEM charges computed using 18 EEM parameter sets. We also compared these QSPR models with corresponding QSPR models which employ QM charges computed by the same charge calculation schemes used for EEM parameterization.

Methods

EEM parameter sets

In our study, we used all EEM parameters published to date. Specifically, we found 18 different EEM parameter sets, published in eight different articles [39,44-50]. The parameters cover two QM theory levels (HF and B3LYP), two basis sets (STO-3G and 6-31G*) and six population analyses (MPA, NPA, Hirshfeld, MK, CHELPG, AIM). Unfortunately, only some combinations of QM theory levels, basis sets and population analyses are available. On the other hand, more than one parameter set has been published for some combinations (e.g., six parameter sets for HF/STO-3G/MPA). All the parameter sets include parameters for C, O, N and H. Some sets also include parameters for S, P, halogens and metals. Most of the sets do not include parameters for C and N bonded by a triple bond. Summary information about all these parameter sets is given in Table 1.

EEM charge calculation

[...] a model as possible, with the risk that the accuracy of such a model may not be high. The second approach is to develop more models, each of them dedicated to a specific class of compounds. Here we took the second approach, following a similar methodology as in previous studies [21-24].
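As a rough illustration of the EEM charge calculation and its cubic scaling mentioned in this section, the sketch below solves the standard EEM linear system: for every atom i, A_i + B_i q_i + κ Σ_{j≠i} q_j / R_ij equals a common molecular electronegativity, and the charges sum to the total molecular charge. It is not the EEM SOLVER implementation. The function name `eem_charges`, its argument names, and all numerical parameter values in the usage example are hypothetical placeholders, not values from any published parameter set.

```python
import numpy as np


def eem_charges(coords, A, B, kappa, total_charge=0.0):
    """Solve the EEM equations for atomic charges.

    coords       : (N, 3) atomic coordinates
    A, B         : per-atom EEM parameters (hypothetical inputs here)
    kappa        : EEM adjusting constant (hypothetical input here)
    total_charge : total charge of the molecule
    Returns (charges, equalized_electronegativity).
    """
    coords = np.asarray(coords, dtype=float)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = len(coords)

    # Pairwise interatomic distances R_ij.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)  # avoid division by zero; diagonal is overwritten below

    # (N+1) x (N+1) system: N electronegativity equations plus charge conservation.
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = kappa / dist
    M[np.arange(n), np.arange(n)] = B   # diagonal terms B_i
    M[:n, n] = -1.0                     # coefficient of the molecular electronegativity
    M[n, :n] = 1.0                      # sum of charges = total charge

    rhs = np.append(-A, total_charge)

    # The dense solve is the cubic-cost step.
    solution = np.linalg.solve(M, rhs)
    return solution[:n], solution[n]


if __name__ == "__main__":
    # Toy diatomic with made-up parameters, for illustration only.
    charges, chi = eem_charges(
        coords=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
        A=[2.0, 3.0],
        B=[1.0, 1.0],
        kappa=0.5,
    )
    print(charges, chi)
```

With real parameter sets, A and B would be looked up per atom type from the chosen set, and the coordinates would come from the 3D structure of the molecule.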
Specifically, we focus on substituted phenols, because they are among the most common test set molecules used in the evaluation of novel pKa prediction approaches [21-24,56-58]. Our data set contains the 3D structures of 74 distinct phenol molecules. This data set is of high structural diversity and it covers molecules with pKa values from 0.38 to 11.1. The molecules were obtained from the NCI Open Database Compounds [59] and their 3D structures were generated by CORINA 2.6 [60], without any further geometry optimization. Our data set is a subset of the phenol data set used in our previous work on pKa prediction from QM atomic charges [24]. The subset consists of phenols which contain only C, O, N and H, and none of the molecules contain triple bonds. This limitation is necessary because the EEM parameters of all 18 studied EEM parameter sets are available only for such molecules (see Table 1). For each phenol molecule from our data set, we also prepared the structure of the dissociated form, in which the hydrogen is missing from the phenolic OH group. This dissociated molecule was created by removing the hydrogen from the original structure without subsequent geometry optimization. The list of the molecules, including their names, NSC numbers, CAS numbers and experimental pKa values, can be found in Additional file 1: Table S1a. The SDF files with the 3D structures of the molecules and their dissociated forms are also provided, in Additional file 2: Molecules.

Data set for carboxylic acids

The EEM charges were calculated by the program EEM SOLVER [53] using each of the 18 EEM parameter sets.

QM cha.