Data Loading...

Molecular Modeling of Geometries, Charge Distributions ... Flipbook PDF

Molecular Modeling of Geometries, Charge Distributions, and Binding Energies of Small, Druglike Molecules Containing Nit


129 Views
60 Downloads
FLIP PDF 1.01MB

DOWNLOAD FLIP

REPORT DMCA

1718

J. Chem. Theory Comput. 2008, 4, 1718–1732

Molecular Modeling of Geometries, Charge Distributions, and Binding Energies of Small, Druglike Molecules Containing Nitrogen Heterocycles and Exocyclic Amino Groups in the Gas Phase and in Aqueous Solution Brian R. White,† Carston R. Wagner,†,‡ Donald G. Truhlar,‡ and Elizabeth A. Amin*,† Department of Medicinal Chemistry, College of Pharmacy, and Department of Chemistry and Minnesota Supercomputing Institute, UniVersity of Minnesota, Minneapolis, Minnesota 55455 Received March 5, 2008

Abstract: We have tested a variety of approximate methods for modeling 30 systems containing mixtures of nitrogen heterocycles and exocyclic amines, each of which is studied with up to 31 methods in one or two phases (gaseous and aqueous). Fifteen of the systems are protonated, and fifteen are not. We consider a data set consisting of geometric parameters, partial atomic charges, and water binding energies for the methotrexate fragments 2-(aminomethyl)pyrazine and 2,4-diaminopyrimidine, as well as their cationic forms 1H-2-(aminomethyl)pyrazine and 1H-2,4-diaminopyrimidine. We first evaluated the suitability of several density functionals with the 6-31+G(d,p) basis set to serve as a benchmark by comparing calculated molecular geometries to results obtained from coupled-cluster [CCSD/6-31+G(d,p)] wave function theory (WFT). We found that the M05-2X density functional can be used to obtain reliable geometries for our data set. To accurately model partial charges in our molecules, we elected to use the well-validated charge model 4 (CM4). In the process of establishing benchmark values, we consider gas-phase coupled cluster and density functional theory (DFT) calculations, followed by aqueous-phase DFT calculations, where the effect of solvent is treated by the SM6 quantum mechanical implicit solvation model. The resulting benchmarks were used to test several widely available and economical semiempirical molecular orbital (SE-MO) methods and molecular mechanical (MM) force fields for their ability to accurately predict the partial charges, binding energies to a water molecule, and molecular geometries of representative fragments of methotrexate in the gaseous and aqueous phases, where effects of water were simulated by the SM5.4 and SM5.42 quantum mechanical implicit solvation models for SE-MO and explicit solvation was used for MM. In addition, we substituted CM4 charges into the MM force fields tested to observe the effect of improved charge assignment on geometric and energetic modeling. The most accurate MM force fields (with or without the CM4 charges substituted) were validated against gas-phase and aqueous-phase geometries and charge distributions of a larger set of 16 druglike ligands, both neutral and cationic. This process showed that the Merck Molecular Force Field (MMFF94) with or without CM4 charges substituted, is, on average, the most accurate force field for geometries of molecules containing nitrogen heterocycles and exocyclic amino groups, both protonated and unprotonated. This force field was then applied to the complete methotrexate molecule, in an effort to systematically explore its accuracy for trends in geometries and charge distributions. The most accurate force fields for the binding energies of nitrogen heterocycles to a water molecule are OPLS2005 and AMBER.

1. Introduction Continuing advances in molecular modeling and computational chemistry have greatly facilitated the structure-based * To whom correspondence should be addressed. E-mail: [email protected]. † Department of Medicinal Chemistry, College of Pharmacy. ‡ Department of Chemistry and Minnesota Supercomputing Institute.

design of small-molecule inhibitors of proteins.1-15 Although molecular mechanics (MM) force fields16-20 can model protein structure, they often lack parameters that accurately represent the heteroatomic groups present in pharmaceuticals.21-23 Density functional theory24 (DFT) and wave function theory (WFT)25 do not require new parameters for each type of atom; however, current technology still limits the calculations to smaller molecules and exploratory studies

10.1021/ct8000766 CCC: $40.75 ! 2008 American Chemical Society Published on Web 09/18/2008

Druglike Molecules Containing Nitrogen Heterocycles

J. Chem. Theory Comput., Vol. 4, No. 10, 2008 1719

Figure 2. Structure of methotrexate.

Figure 1. DHFR2MTX2 chemically induced dimer.

on larger systems. Two viable approaches for simulating a protein bound to a druglike inhibitor are to obtain MM parameters for force fields that yield accurate molecular geometries and partial charges or to find a suitable level of combined QM/MM theory26-28 in which a critical or active region of the system is treated by quantum mechanics (QM) and the surrounding areas by MM. An economical QM level for such calculations would be semiempirical molecular orbital29-34 (SE-MO) theory. For small enough QM regions or short simulations, one can also use more reliable QM methods such as DFT. Reliable WFT calculations are, however, affordable only for the smallest systems. A recent application of molecular modeling is the prediction of mutation effects on protein-protein interactions.35-42 Protein multimer stability can be modified through the introduction of interfacial residue mutations, and it would be valuable to be able to predict the relative change in stability of a mutated protein multimer compared to the wildtype species. Such calculations would aid in understanding the functional evolution of proteins, as well as the development and control of stable, self-assembled protein structures, with applications ranging from nanoscale multiprotein constructs to drug delivery. With the advent of chemically induced dimerization (CID), our laboratory has demonstrated the ability to create self-assembled E. coli dihydrofolate reductase (DHFR) dimers from naturally existing DHFR monomers using a bivalent methotrexate dimerizer (MTX2, complex-DHFR2MTX2) (Figure 1).43 While the nature and effects of linker length have been examined previously, we are still in the early stages of in vitro and in silico observation of interfacial mutation effects on the dimer. We have found that the introduction of complementary interfacial mutations putatively leads to a stabilized DHFR heterodimer, which allows for a level of control over the assembly of such constructs. One complicating issue present in our system, as well as other biological systems, is the protonation state of the ligand in solution and in complex with the protein. While the DHFR inhibitor MTX is unprotonated in solution, it is protonated on N1 (Figure 2) when bound to DHFR.44,45 This raises the question of whether it is appropriate to use a single set of MM parameters to describe MTX both in solution and bound to the enzyme. To assist the in silico prediction of mutation effects on dimer stability, we have undertaken a study to develop a set of MM parameters that can accurately model the DHFR2MTX2 complex. To accomplish this goal, we will first try to establish an accurate method to model the single

substrate, MTX, and then try to extend this method to the DHFR2MTX2 complex. During this process, we have tested a large variety of methods on a set of druglike molecules containing nitrogen heterocycles and exocyclic amino groups, and the results of these tests are presented in the present article because they should be of general interest for a variety of potential applications. A critical issue in simulating systems with nitrogen heterocycles is modeling the charge distributions. Because partial charge distributions are not experimental observables, we will rely on theory to establish reasonable values. For this purpose, we first require accurate geometries, and we begin by establishing benchmark values using high-level WFT and DFT calculations on the MTX fragments 2-(aminomethyl)pyrazine (2-AMP), 1H-2-(aminomethyl)pyrazine (1H-2-AMP), 2,4-diaminopyrimidine (2,4-DAP), and 1H2,4-diaminopyrimidine (1H-2,4-DAP). These fragments were chosen because of the role of the pteridine moiety in MTX binding to the DHFR active site. We then use DFT with class IV charges46 to establish benchmark partial atomic charges. Coupled cluster theory47,48 with single and double excitations (CCSD) and the M05-2X49 density functional with the 6-31+G(d,p)50 basis set are used for geometries, and charge model 451 (CM4) is used for partial atomic charges. The performance of widely available SE-MO and MM parametrizations is then surveyed for these four fragments to find the parametrized model that most accurately predicts the geometries, binding energies to water, and charge distribution of the unprotonated and protonated states of 2-AMP and 2,4-DAP. In addition, we consider the selected MM methods when CM4 charges are substituted for the force field’s default charges in an effort to observe if increased accuracy in partial charge distribution leads to increased performance in geometric and energetic modeling. The most accurate methods are then used to calculate partial charges and geometries for a series of pharmacophorically similar molecules containing nitrogen heterocycles and exocyclic amines, and these results are compared to DFT and CM4 benchmarks to explore the validity of our chosen MM parameters more broadly.

2. Methods and Software 2.1. Computational Methods. For geometries, we consider three categories of QM theory plus MM. The QM categories are WFT, DFT, and SE-MO. For partial charges, we consider Mulliken population analysis,52 MM53-57 and the CM1,46 CM2,58 CM3,59 and CM451 charge models. DFT calculations in the aqueous phase use the implicit solvation model SM6,51 while solvation in SE-MO methods30-32 was included by using the implicit SM5.460 or SM5.4261 solvation

1720 J. Chem. Theory Comput., Vol. 4, No. 10, 2008

models. For aqueous-phase MM calculations, we employed explicit solvation using the respective programs’ soak algorithms in conjunction with periodic boundary conditions (minimum cell size of 15 × 15 × 15 Å) to eliminate solvent-vacuum interfaces. WFT calculations were carried out by CCSD with the 6-31+G(d,p)50 basis set. These calculations were carried out with the Gaussian03 computer program (Gaussian, Inc.).62 The CCSD method was chosen over the popular Møller-Plesset second-order perturbation theory63 as CCSD (or the closely related QCISD64) has been shown to yield more accurate geometries.65 The DFT methods examined are B3LYP,66,67 mPW1PW68,69 (which is also called mPW1PW91, mPW0, and MPW25), MPWB1K,67,69,70 MPW1KCIS,71 M06-L,72 and M05-2X49 with the 6-31+G(d,p)50 basis set. Calculations were performed using a locally modified version of the Gaussian03 program incorporating the MN-GSM 6.073 and MN-GFM 2.0.174 solvation and DFT modules. To select a density functional to generate benchmark values, gas-phase calculations were carried out, and the resulting geometries were evaluated relative to CCSD. The best functional was subsequently used to perform calculations in the aqueous phase using the SM651 solvation model for implicit solvation. We have tested charge model 4 (CM4) partial charge assignments based on gas-phase and SM6 DFT calculations. CM4 charges, the fourth generation of class IV46 charges, have a distinct advantage over the class II52,75-77 and III78-80 charges used in Gaussian03. Whereas the reliability of class III charges depends on the wave function and basis set used, class IV charges represent an extrapolation to full configuration interaction with a complete basis set.46,51 Furthermore, class III charges are unstable with respect to buried charges,81-84 while class IV charges provide a reliable method for obtaining buried charges. CM4 charges, in particular, have been parametrized against a large training set (398 molecules) and are well suited for modeling aliphatic functional groups, which makes them more suitable for modeling hydrophobic effects, a primary factor in proteinprotein interactions. SE-MO methods examined in the current study include AM1,30 PM3,31,32 and PDDG/PM3.85 Calculations were performed for AM1 and PM3 using AMSOL 7.186 (a derivative of AMPAC 2.1) with Mulliken, CM1, CM2, or CM3 charges obtained from gas-phase calculations. For aqueous-phase AM1 and PM3 calculations, AMSOL 7.1 was used to obtain Mulliken, CM1, or CM2 charges within the SM5.4 (for Mulliken and CM1) or the SM5.42 (for CM2) solvation models. GAMESSPLUS87 was used to obtain a second set of CM3 charges in the gas phase in an effort to test consistency in charge assignment across software. The notation used in this article for AM1 calculations with differing partial charge assignments is AM1, AM1-CM1, AM1-CM2, and AM1-CM3 for AM1 calculations with Mulliken, CM1, CM2, and CM3 charges, respectively. The notation for the PM3 calculations is analogous to that for AM1. It is important to note that because they are post-self consistent field (SCF) analysis tools, CMx or Mulliken charges of gas-phase wave functions do not alter an

White et al.

optimized molecule’s geometry. Slight differences may be attributed to variances in the convergences of the SCF and geometry optimizations. In solution, each SMx model uses a particular choice of charge model; this choice, along with all the other SMx parameters, does affect the molecules’ geometry. PDDG/PM3 gas-phase optimizations were performed using MOPAC 5.011mn,88 and Mulliken partial atomic charges were obtained. In addition, PM3-Mulliken charge analyses were carried out with Gaussian03 and MOPAC 5.011mn as part of a comparison between partial atomic charges assigned to optimized geometries calculated by PM3 and PDDG/PM3. The MM force fields employed are AMBER,55 AMBER*, CVFF,53 CFF91,89 MMFF94,56 OPLS2005,57 and Tripos.54 The AMBER force field employed is the ff03 version,90 in conjunction with the general atom force field91 (GAFF) commonly utilized for small organic systems. The AMBER* force field contains additional atomic parameters as implemented in MacroModel. In most cases, we elected to use nonrigid water with each force field’s default parameters for water molecules. However, the AMBER* force field was locally modified to use OPLS2005 nonrigid water (in OPLS2005, the nonrigid water has the same Lennard-Jones parameters as the rigid TIP3P water model), and the AMBER force field, via the SOLVATEOCT command, utilized the rigid TIP3P water model. Stretch, bend, Coulombic, and Lennard-Jones parameters for the water models used can be found in each of the force field’s descriptions (see references above), except for AMBER*, for which the modified water parameters are described by the OPLS2005 reference. In addition to the standard force fields, we also employ local modifications of the force fields that substitute CM4 charges for their default partial charges. This combination of a force field and CM4 charges is denoted as X-CM4, where X is the name of the original force field. In contrast to gas-phase SE-MO calculations, when CMx charges are used with MM, they can and do alter the optimized geometry. Note that when we use CM4 charges with MM calculations, we use gas-phase M05-2X/6-31+G(d,p)/CM4 charges calculated at gas-phase M05-2X/6-31+G(d,p) geometries for gas-phase MM calculations, and we use SM6/M05-2X/ 6-31+G(d,p)/CM4 aqueous-phase charges calculated at aqueous-phase SM6/M05-2X/6-31+G(d,p) geometries for aqueous-phase MM calculations. General MM optimization conditions consisted of at least 1000 steps of conjugate gradient minimization with an energy gradient cutoff of 0.01 kcal mol-1 Å-1. In cases where water was included, the entire system was minimized, and although the stringent conditions for termination we have established were not typically reached, the system energy fluctuated only slightly around a stable potential energy. To test the effect of water position around the small molecules on geometries, six random orientations of 2-AMP and 2,4-DAP within the water box were used in minimizations with OPLS2005-CM4 and MMFF94-CM4. Because this process yielded only a slight standard deviation in geometries (on the order of less than 0.006 Å in bond length and 0.7 degrees in bond angle), we used only the default water orientation for all other aqueous-phase MM optimizations. We note, however, the

J. Chem. Theory Comput., Vol. 4, No. 10, 2008 1721

Druglike Molecules Containing Nitrogen Heterocycles

2-AMP and 2,4-DAP bond lengths and angles used for OPLS2005-CM4 and MMFF94-CM4 assessment represent the average of these six results. In binding energy studies, water molecules were placed at positions on 2-AMP, 1H2-AMP, 2,4-DAP, and 1H-2,4-DAP, where the DHFR-MTX crystal structure denoted that hydrogen bonding takes place between the ligand and the enzyme.45 The small molecule and water were optimized together, and the binding energy of the complex computed by subtracting the energies of the individual optimized molecules. While testing the MM force fields, it was found that the CVFF and CFF91 force fields do not properly assign partial atomic charges to the 1H-2-AMP or 1H-2,4-DAP cations because the total charge assigned to the molecule is not +1.0. We therefore calculated the gas-phase Hartree-Fock molecular electrostatic potential with the 6-31+G(d,p) basis set for the neutral and cationic species and obtained ChelpG80 electrostatic-potential-filling charges to substitute into the force fields. Geometry optimizations with the ChelpG partial charges were then carried out in both the gaseous and aqueous phases, and the resulting geometries, denoted CVFFHF and CFF91-HF, were used in our evaluation. 2.2. Platforms, Software, and Molecules. Quantum mechanical calculations (WFT, DFT, and SE-MO) were performed on an IBM Power4 (p690 and p655) computer system running under the AIX operating system and an SGI Altix cluster running under the Linux operating system. Molecular mechanics calculations were performed on a Silicon Graphics O2 workstation running under the IRIX 6.5 operating system. Molecules were constructed for quantal calculations using the GaussView 3.0 (Gaussian, Inc.) visualization program, and the generated Z-matrices were converted to Cartesian coordinates where appropriate. Molecules for MM calculations were constructed using InsightII 2005 (Accelrys, Inc.) for the CFF91 and CVFF force fields, SYBYL 7.3 (Tripos, Inc.) for the MMFF94, AMBER and Tripos force fields, and Maestro 7.5 (Schrodinger, Inc.) for the MMFF94, AMBER*, and OPLS2005 force fields (utilizing the MacroModel and Impact applications, respectively). These programs were also used to set up and run the MM minimizations except for AMBER minimizations, for which SYBYL was used only to generate molecular coordinates in Protein Data Bank or Mol2 formats, and the AMBER 992 suite was used for the minimizations. The small molecules included in the present study are illustrated in Figure 3. Each molecule was modeled in both its neutral and protonated form. When an exocyclic amine is present, the proton was added there. Otherwise, the proton was added on a heterocyclic amine.

3. Results and Discussion 3.1. Error Analysis. To rank the methods we chose to test, we calculated the unsigned residual between a calculated value with method m and the corresponding benchmark value Rix,y,m ) |xicalcd,m(y) - xibenchmark(y)|

(1)

where y is the phase (gas phase or aqueous phase) and xi(y) signifies case i of molecular property x in phase y, for example, when x is bond length (r), x1(y) is the first bond

Figure 3. Structures of small molecules used in current study.

length. Alternatively, x could stand for partial charge (q), bond angle (θ), or binding energy (Eb). The overall error in a particular molecular property x for a particular method m and phase y is quantified by the mean unsigned error (MUE) nx,y

MUEx,y,m )

" Rix,y,m i)1

nx,y

(2)

where nx,y is the number of combinations of xi and y for which Rix,y is evaluated. We also calculate the average mean unsigned error (AMUE) for each property across N methods N

AMUE )

" MUEx,y,m

m)1

N

(3)

Division of this value by a method’s MUE yields the reduced (unitless) mean unsigned error (RMUE) for the method for partial charge, bond length, or bond angle. RMUEx,y,m )

MUEx,y,m AMUEx,y

(4)

The RMUE is a measure of each method’s performance relative to the mean of the others for calculating a particular molecular property. A value of 1.0 indicates that the method is average. Lower values indicate better methods, while higher values indicate worse methods. The reason for introduction of these unitless reduced quantities is so that we can combine errors for r and θ (which have different

1722 J. Chem. Theory Comput., Vol. 4, No. 10, 2008

White et al.

Table 1. Mean Unsigned Error (Å and deg) for Gas-Phase Bond Lengths and Angles Between Those Calculated with CCSD and Each Functional 2-AMP method B3LYP MPW1PW mPWB1K mPW1KCIS M06-L M05-2X AMUEa a

1H-2-AMP

2,4-DAP

1H-2,4-DAP

bond bond bond bond bond bond bond bond length angle length angle length angle length angle 0.003 0.005 0.011 0.005 0.006 0.004 0.006

0.73 0.44 0.52 0.78 0.52 0.28 0.54

0.003 0.005 0.010 0.005 0.005 0.004 0.005

0.37 0.49 0.42 0.50 0.33 0.26 0.40

0.003 0.005 0.012 0.004 0.004 0.005 0.006

0.82 0.87 0.99 0.90 0.54 0.94 0.84

0.003 0.004 0.010 0.004 0.004 0.003 0.005

0.27 0.26 0.26 0.28 0.22 0.22 0.25

Mean of entire column.

units) to make an overall assessment for combined geometric performance. We define a reduced deviance (Dy,m) of a SE-MO or MM method from the average performance in either the gas or aqueous phase by averaging the RMUE for r and θ in each method for both 2-AMP and 1H-2-AMP Dy,m )

1 2-AMP 2-AMP 1H-2-AMP + RMUEθ,y,m + RMUEr,y,m + (RMUEr,y,m 4 1H-2-AMP RMUEθ,y,m )(5)

The reduced deviance for 2,4-DAP and 1H-2,4-DAP is averaged with the reduced deviance for 2-AMP and 1H-2AMP to yield an overall performance. Reduced deviance in partial charge (q) assignment is calculated similarly. Reduced deviance for the validation of the most accurate methods is also calculated similarly, with all molecules taken into account. 3.2. Establishing Geometry and Partial Charge Benchmark Sets. The MUE of the gas-phase geometries calculated by DFT with respect to the CCSD-calculated geometries is given in Table 1. All DFT methods perform well, with mean unsigned errors within ∼0.01 Å for bond length and less than one degree for bond angles. On the basis of its high degree of accuracy for both bond length and angle, we chose to proceed with the M05-2X level of theory to generate our geometric benchmark set. We selected the well-validated51,59,93-96 CM4 charge model to obtain benchmark partial atomic charges. To examine whether the CM4 charge model would assign partial charge similarly for each functional used, we compared the gas-phase CCSD partial charges generated by Mulliken population analysis (a class II46 charge method) to CM4 partial charges assigned by each density functional tested (note that the CM4 charges are probably more accurate). The results are summarized in Figure 4. In this figure and in this whole article, charges are given in atomic units, in which the charge on a bare proton is 1.0. Overall, the mean unsigned deviations between the CCSD Mulliken analysis and the CM4-assigned partial charges vary by e0.01 charge units regardless of the functional or phase tested. We therefore applied the geometrically accurate M05-2X functional in conjunction with M05-2X/CM4 partial charges to obtain our benchmark set. 3.3. Exploration of CVFF and CFF91 Atom Typing and Charge Distribution. As mentioned in section 2.2, a deficiency in the CVFF and CFF91 force fields is that they

Figure 4. Mean unsigned deviation of DFT/CM4 charges relative to CCSD/Mulliken charges for (a) 2-AMP and it is cation (white, 2-AMP; black, 1H-2-AMP) and (b) DAP and it is cation (white, DAP; black, 1H-DAP).

do not assign a total charge of +1.0 to the pyrazinium or pyridinium cations. Upon further exploration using 2-AMP, the default atom type assigned to N1 in 2-AMP by both force fields is “np” (an sp2 nitrogen in a 5- or 6-membered ring), and the partial charge on this unprotonated nitrogen is -0.22 in CVFF and -0.48 in CFF91. Upon protonation and automated reassignment of atom types by InsightII 2005, the CVFF nitrogen atom type remains unchanged, and the CFF91 atom type changes to “nh+” (a protonated nitrogen in a 6-membered ring). The protons added have partial charges of +0.28 (CVFF) and +0.33 (CFF91), and the partial charges on the nitrogens change to -0.50 and -0.81 charge units, respectively. The partial charge on all other atoms in the molecule are unaffected by the addition of the proton. This charge balancing yields a total charge on the molecule of 0, which is incorrect for a cation. Some of the partial atomic charges in the CVFF and CFF91 force fields are derived from fits to the Hartree-Fock molecular electrostatic potential,97-99 and we took this as a cue for how to correct the problem in a way consistent with these force fields. In particular, we carried out single-point, gas-phase Hartree-Fock (6-31G(d,p) basis set) calculations on the optimized 2-AMP and 1H-2-AMP geometries and obtained partial atomic charges by electrostatic potential fitting with the ChelpG80 algorithm. The resulting partial atomic charges were used in the CVFF and CFF91 force fields for geometry optimizations in both the gas and aqueous phases. We used the gas-phase partial charges in both phases because the original CVFF and CFF91 partial charges are based on gas-phase calculations (we note that all MM force fields considered in this article use partial charges that do

Druglike Molecules Containing Nitrogen Heterocycles

J. Chem. Theory Comput., Vol. 4, No. 10, 2008 1723

Figure 5. MUE in (a) partial charge, (b) bond length, and (c) bond angle for selected SE-MO and MM methods in the gas phase relative to M05-2X/CM4 (white, 2-AMP; black, 1H-2AMP). AMS denotes AMSOL, and GMSP denotes GAMESSPLUS.

Figure 6. MUE in (a) partial charge, (b) bond length, and (c) bond angle for selected SE-MO and MM methods in the gas phase relative to M05-2X/CM4 (white, 2,4-DAP; black, 1H2,4-DAP).

not depend on the phase). To denote the new Hartree-Fock partial charges in the force fields, we name the force fields that use these newly assigned charges as CVFF-HF and CFF91-HF. 3.4. Evaluation of SE-MO and MM Calculations. We tested available SE-MO and MM parameter sets in an effort to select the most accurate method for modeling our small molecules both in the gas phase and in solvent. In addition, we substituted CM4 charges into the MM force fields tested to observe effects on geometric accuracy. A summary of the results of these tests is given in Figures 5 and 6 (gas phase) and 7 and 8 (aqueous phase). In the gas phase, the AM1 and PM3 SE-MO methods utilizing the CM1, CM2, and CM3 charge models, as well as the CVFF-HF, CFF91-HF, AMBER*, and MMFF94 force fields all predict the partial charges of the atoms in both sets of molecules to within 0.15

(Figures 5a and 6a). In most of the methods tested, the partial charge assignment becomes less accurate in the cationic species: the most notable examples, for which the average error increases by a factor of 2, are the OPLS2005 force field for 1H-2-AMP, the PM3-CM1, -CM2, and -CM3 methods for 1H-2-AMP, and the SE-MO methods utilizing the CM1, CM2, and CM3 charge models for 1H-2,4-DAP (with the exception of PM3-CM3). In contrast, the MMFF94 force field becomes more accurate for 1H-2-AMP, and the AMBER force field becomes more almost 2-fold more accurate for 1H-2,4-DAP, although AMBER’s overall charge assignment is somewhat inaccurate (the mean MUE for the neutral and cationic species together is 0.20). The force fields with CM4 charges substituted are not included in the partial charge analysis since their mean unsigned error in partial charge is always zero.

1724 J. Chem. Theory Comput., Vol. 4, No. 10, 2008

Figure 7. MUE in (a) partial charge, (b) bond length, and (c) bond angle for selected SE-MO and MM methods in solution relative to M05-2X/CM4 (white, 2-AMP; black, 1H-2-AMP).

The gas-phase partial charges assigned by the PDDG/PM3 method have a mean unsigned error of 0.22 and 0.28 for the neutral species and 0.26 and 0.32 for the cationic species of 2-AMP and 2,4-DAP, respectively. Table 2 summarizes our comparison of the PDDG/PM3 charges to Mulliken population analysis of the PM3-optimized structure of 2-AMP as calculated by both Gaussian03 and MOPAC 5.011mn (the numbering system is given in Figure S1 of the Supporting Information); this comparison verifies the consistency of the two programs (as a check). Figures 5 and 6 show that PDDG/ PM3 is less accurate than PM3 for gas-phase partial charges, bond lengths, and bond angles for both 2-AMP and 1H-2AMP and for gas-phase bond angles for both 2,4-DAP and 1H-2,4-DAP. Because the PDDG reparameterization of PM3 deteriorated the performance for these molecules, PDDG/ PM3 was not considered in the aqueous-phase calculations that follow.

White et al.

Figure 8. MUE in (a) partial charge, (b) bond length, and (c) bond angle for selected SE-MO and MM methods in solution relative to M05-2X/CM4 (white, 2,4-DAP; black, 1H-2,4-DAP). Table 2. Gas-Phase Partial Charges of 2-AMP Calculated by PM3 (Mulliken Population Analysis) and PDDG/PM3 atom

PM3(G03)

PM3(MOPAC)

PDDG/PM3(MOPAC)

C1 C2 N3 C4 C5 N6 H7 H8 H9 C10 H11 H12 N13 H14 H15

-0.1310 -0.1067 -0.0382 -0.1095 -0.1085 -0.0239 0.1344 0.1319 0.1316 -0.0569 0.0677 0.0830 -0.0282 0.0301 0.0242

-0.1309 -0.1067 -0.0382 -0.1094 -0.1085 -0.0239 0.1344 0.1319 0.1316 -0.0569 0.0677 0.0830 -0.0282 0.0301 0.0242

-0.1316 -0.1647 -0.0274 -0.1664 -0.1581 -0.0211 0.1814 0.1798 0.1789 -0.1491 0.1151 0.1296 -0.0928 0.0665 0.0598

For 2-AMP and 1H-2-AMP, gas-phase bond lengths (Figure 5b) are predicted to within ∼0.01 Å only by the

Druglike Molecules Containing Nitrogen Heterocycles

CFF91-HF, AMBER*, MMFF94, and OPLS2005 force fields. All methods except OPLS2005 become less accurate when modeling the cationic species. Notably, the AMBER force field becomes 2-fold less accurate when modeling the cation, and it clearly benefits from revised charges because the AMBER-CM4 force field yields a 2-fold performance improvement when modeling bond lengths. On average, force fields with CM4 charges substituted tend to perform slightly better than those with default charges, except in the case of TRIPOS-CM4. Gas-phase bond angles (Figure 5c) are predicted to within ∼2.0 degrees by all the SE-MO methods, while the majority of the MM force fields are ∼0.5-1° less accurate. Exceptions include the MMFF94 and AMBER force fields, which predict them to within ∼1.5°. All methods are slightly more inaccurate dealing with the cation. As with bond lengths, when CM4 charges are substituted, bond angle predictivity is increased slightly in most methods. OPLS2005-CM4 is notable in that it has a 2-fold improvement in performance over OPLS2005 when modeling the neutral species and a smaller, yet significant (∼0.5°), increase in accuracy for the cationic species. For 2,4-DAP and 1H-2,4-DAP, bond lengths (Figure 6b) are predicted to an accuracy of ∼0.025 Å by all methods except CVFF-HF, CFF91-HF, and TRIPOS. The MMFF94 and OPLS2005 force fields both display accuracy less than 0.01 Å for 2,4-DAP, while AMBER and OPLS2005 are accurate to better than 0.015 Å for 1H-2,4-DAP. MMFF94 is 2-fold less accurate for the 1H-2,4-DAP cation. With regard to 2,4-DAP and 1H-2,4-DAP gas-phase bond angles, the best performing methods include the AMBER*, AMBER, MMFF94, and OPLS2005 force fields, predicting bond angle to within 1.9°, 3.1°, 2.9°, and 1.1° for both 2,4-DAP and 1H-2,4-DAP, respectively. With respect to force fields with CM4 charges substituted, the clearest example of a force field benefiting from revised charges is the CVFF-HF and CFF91-HF force fields which, as previously discussed, have major partial charge assignment problems. When CM4 charges are substituted into these force fields, both methods show a dramatic improvement in performance for modeling bond lengths, with CVFF-CM4 yielding a 4-fold more accurate bond length prediction overall, and CFF91-CM4 yielding the same improvement in accuracy when modeling the neutral species. The TRIPOS force field also benefits, with TRIPOS-CM4 displaying a 2-fold improvement in accuracy for both molecules. All methods except CVFF-HF, CFF91-HF, and TRIPOS predict bond angles to within ∼4.0 degrees for both species (Figure 6c). Both CVFF-HF and CFF91-HF presented a challenge when attempting to minimize structure with respect to the fact that 1H-2,4-DAP could not be assigned proper atomic partial charges. As described earlier, we substituted the default charges with ChelpG charges derived from HF calculations on 2,4-DAP and 1H-2,4-DAP. When these charges were substituted into the force field, the charges on the protons on the exocyclic amines were made largely more positive than the default charges. This effect, in conjunction with the largely negatively charged heterocyclic amines, caused the minimization to distort the sp3 structure of the exocyclic amines and pull the protons toward the pyrimidine

J. Chem. Theory Comput., Vol. 4, No. 10, 2008 1725

ring. The result is a large error in the assignment of bond angle. When the minimization is performed with the default charges, no distortion of bond angle takes place; however, these charges correspond to a nonphysical total molecular charge. As observed in tests already described, CM4 charge substitution greatly improves geometric modeling performance. The CVFF-CM4 and CFF91-CM4 force fields essentially eliminate the problems seen with these force fields, causing both to model bond angles with accuracy on par with the rest of the methods used. In addition, the AMBER-CM4, MMFF94-CM4, and TRIPOS-CM4 force fields all yield accuracies at least 2-fold greater than their counterparts. For 2-AMP and 1H-2-AMP in the aqueous phase, the AM1-CM2, PM3-CM1, and PM3-CM2 methods, as well as the CVFF-HF, CFF91-HF, AMBER*, MMFF94, and TRIPOS force fields retain a high degree of accuracy for partial charge prediction (MUE e 0.13, Figure 7a). Among these methods, the MMFF94 force field is again the only method to become more accurate when modeling the protonated species. On the other hand, the OPLS2005 force field becomes 2-fold less accurate when predicting the charge of the cation. Bond lengths (Figure 7b) are calculated with an error similar to that of the gas phase, with the CFF91-HF, AMBER*, MMFF94, and OPLS2005 force fields maintaining a mean unsigned error of ∼0.01 Å. Interestingly, while it becomes much less accurate when assigning partial charge to the ionic species, the OPLS2005 force field becomes more accurate in bond length prediction. As with 2-AMP and 1H2-AMP in the gas phase, CM4 partial charge substitution generally produces a modest increase in bond length accuracy. A notable exception is the CFF91-CM4 force field, with an MUE