improved comfa modeling by optimization of settings170499/fulltext01.pdf · improved comfa modeling...

82
ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2007 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 61 Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease SHANE PETERSON ISSN 1651-6192 ISBN 978-91-554-6932-0 urn:nbn:se:uu:diva-8140

Upload: tranliem

Post on 24-May-2018

218 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

ACTA

UNIVERSITATIS

UPSALIENSIS

UPPSALA

2007

Digital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Pharmacy 61

Improved CoMFA Modeling byOptimization of Settings

Toward the Design of Inhibitors of the HCV NS3Protease

SHANE PETERSON

ISSN 1651-6192ISBN 978-91-554-6932-0urn:nbn:se:uu:diva-8140

Page 2: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

���������� �������� �� ������ �������� � �� �������� ������� � ���� ������������� �� ������� ������� �������� �!� �""# �� "$%�& '� �(� ������ ' ���� ')(����(� *������� ' )(������+, -(� �������� .��� �� ������� � /����(,

��������

)������ , �""#, 0������ ���1 ������ �� 2�����3��� ' ������, -.��� �(������ ' 0(������ ' �(� ��4 5 � )������, 1��� ����������� ���������, ���������� � ���� ����� � � ������� ���� ������� �� �� ������� � ������� 6�, !" ��, ������, 0 �5 $#!7$�7&&876$��7",

-(� (�������� � ����� *��4+� .��( � ����� ��������� ' ���(�� �9� �� ��� �(� ��������� �������� ����, 1�� �(� ��� ������� ��4 ������� �� �(� 5 � �������� '�.(��( ������� ���� ��������� (��� ������ ������� ������, 0 �(�� .�:� �������������(�� (��� ��� �������� �� ������� � �(� ����� ' �(������ ' �(� ��4 5 ��������,���������� �������� '���� ������� *���1+ ������ �� �������� ��:�� ��� �(�

�. ��� ���������� ��� ���� � �(�� .�:, ���1 �� �������� �(� ��� .����� ������7; 1< ���(�, ���(���� '� ������� ��� ���������� ���'����� �� ���������6��" �������� ' 7��'���� ���������� (�� ��� ��������, -(�� ���(���� .�������� $ ���� ���� '� ������ ������� �� '�� � ��������� ������ ����� ' �(�������������� ��������, 4������� .�� ���'���� ���� =�� ��

������ ������ ��������

�����3���,�������� ��:�� .�� ���� � ������ 1<� � �. ������ ' �(������ ' �(� ��4 5 �

�������, 0 �(� '���� ������� ���������� ����������� �������� �(�� ���������� ' )������

.��( �(��������� .��� ������ �����, ��:�� ��������� �(�� �(���������7������(������ ��� ����������� � �. �������� ��������� ��� �(�� �(� ������� ��� '��������(��������� ���� ��� ������ � .��� ����� '��� �������� �(� ��� � �����, 0 �(����� ������� >7��� ����� .��� ������� �� ?7��� ���� �����������, 1��(��( >7������� ���������� ��� ������ �(� ������� ���������� ' �������7��:� ������� �(�� ������(.�� �(�� >7��� ���� ���������� �������� � ������� �����, -(� )

������ .�� �����

�������� � ���������� �� �(� ����� (��(���(��� �(� �������� ' ��������� � �(����� (��,������� ��:�� .�� ���� � ������ �(� �'������ �� ������� �������� '� �

���1 ����, -(�� ���1 ����� ������� ���� ��'���� �������� (�� =� @ ",�� �� ������

@",&6, 1�������� ' �(� �����3��� ���(���� ������� � ��� ���������� ���� .��( =�

@ ",8! �� ������

@ ",6!,

� ������ ���1� ��7; 1<� ���� ��������� ��:��� 1<� (�������� � ������ ��4�5 � ������� �(�����

���� � � ���� � ���� �� � � ������� �� ������ ������ ������ ������ �� ������!" #$%� ������� ���� ������ �&'$#()* �������� �� � �

A (�� )����� �""#

0 5 �6&�76�$�0 �5 $#!7$�7&&876$��7"��%�%��%��%����7!�8" *(���%BB��,:�,��B������C��@��%�%��%��%����7!�8"+

Page 3: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

To Cecilia and Clara

Page 4: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease
Page 5: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

List of Papers

This thesis is based on the following papers:

I. Peterson, S. D.; Schaal, W.; Karlén, A. Improved CoMFA Model-ing by Optimization of Settings. J. Chem. Inf. Model. 2006, 46, 355–364.

II. Peterson, S. D.; Schaal, W.; Lundstedt, T.; Karlén, A. A Refined

Approach to Settings Optimization. Manuscript, 2007.

III. Örtqvist, P.; Peterson, S. D.; Åkerblom, E.; Gossas, T.; Sabnis, Y., A.; Fransson, R.; Lindeberg, G.; Danielson, U. H.; Karlén, A.; Sandström, A. Phenylglycine as a novel P2 scaffold in hepatitis C virus NS3 protease inhibitors. Bioorg. Med. Chem. 2007, 15, 1448–1474.

IV. Nurbo, J.; Peterson, S. D.; Dahl, G.; Danielson, U. H.; Karlén, A.;

Sandström, A.; �-Amino Acid Substitutions and CoMFA Model-ing of Hepatitis C Virus NS3 Protease Inhibitors. Manuscript, 2007.

Reprints made with permission from the publishers.

Page 6: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease
Page 7: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

Contents

1. Introduction...............................................................................................11 1.1 Molecular modeling in drug discovery ..............................................11 1.2 The hepatitis C virus (HCV) ..............................................................12

2. Computational Methods............................................................................14 2.1 Comparative Molecular Field Analysis..............................................14

2.1.1 Brief history of (Q)SAR .............................................................14 2.1.2 CoMFA methodology.................................................................15 2.1.3 Modifications to CoMFA methodology .....................................16

2.2 Experimental design...........................................................................20 2.3 Molecular docking..............................................................................21

2. Aims of the Thesis ....................................................................................23

3. Development, validation and refinement of methodology for the improvement of CoMFA models (Papers I and II) .......................................24

3.1 Evaluation of CoMFA parameters .....................................................24 3.2 Data sets .............................................................................................26 3.3 CoMFA validation..............................................................................27 3.4 Comparison of model performance....................................................29 3.5 Improvement in predictive performance ............................................30 3.6 Validation by y-randomization...........................................................33 3.7 Comparison of models .......................................................................34

3.7.1 Optimal model settings. ..............................................................34 3.7.2 Data set dependence ...................................................................36

3.8 Refinement of methodology...............................................................39 3.8.1 The MaxMin algorithm...............................................................39 3.8.2 Compilation of a representative subset .......................................40 3.8.3 Assessment of subset size ...........................................................40

4. Modeling studies in HCV NS3 protease inhibitor design (Papers III and IV).................................................................................................................45

4.1 Phenylglycine-based inhibitors ..........................................................45 4.1.1 Molecular docking methodology ................................................46 4.2.2 Full-length HCV NS3 protein active site....................................47 4.2.3 SAR of phenylglycine-based inhibitors......................................48

Page 8: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

4.2 �-Amino acid replacement .................................................................51 4.2.1 Hydrogen bonding network of HCV NS3 protease inhibitors....53 4.2.2 Biochemical test results ..............................................................54 4.2.3 SAR of �-amino acid-containing inhibitors................................54 4.2.4 CoMFA modeling.......................................................................57 4.2.5 Summary.....................................................................................60

5. Concluding remarks ..................................................................................61

6. Acknowledgements...................................................................................63

7. References.................................................................................................65

Page 9: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

Abbreviations

3D-QSAR 3-Dimensional Quantitative Structure–Activity Rela-tionship

ABR Average Best Rank ACCA 1-amino-cyclopropanecarboxylic acid ACE Angiotensin-Converting Enzyme AChE Acetylcholine Esterase ARUV All Rows Union Volume BZR Benzodiazepine Receptor CoMFA Comparative Molecular Field Analysis COX-2 Cyclooxygenase-2 CPU Central Processing Unit DHFR Dihydrofolate Reductase difluoro-Abu 2-Amino-4,4-difluorobutanoic acid DYLOMMS Dynamic Lattice-Oriented Molecular Modeling System GPB Glycogen Phosphorylase b GUI Graphical User Interface HIV Human immunodeficiency virus HCV Hepatitis C Virus HTS High-throughput Screening LOO Leave-One-Out MS Molecular Shape MLR Multiple Linear Regression NANBH Non-A, Non-B Hepatitis NMR Nuclear Magnetic Resonance NS3 Non-structural Protein 3 PCA Principle Component Analysis PLS Partial Least Squares QSAR Quantitative Structure–Activity Relationship RNA Ribonucleic Acid SAMPLS Sample-Distance Partial Least Squares SAR Structure–Activity Relationship SDEPcv Cross-validated Standard Error of Prediction SPL Sybyl Programming Language THER Thermolysin THR Thrombin vinyl-ACCA (1R,2S)-1-amino-2-vinyl-cyclopropanecarboxylic acid

Page 10: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease
Page 11: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

11

1. Introduction

1.1 Molecular modeling in drug discovery In an effort to reduce the cost of developing new medicines and their time to market, pharmaceutical companies have attempted to streamline the drug discovery process using computational methods. Today, virtually every drug company of appreciable size has adopted computational methodology in most stages of the design process.1-3 Many computational methods comple-ment one another and may be combined to help rationalize the drug discov-ery process.

In cases where it is possible to determine the 3-dimensional structure of the biomolecular target, molecular docking4,5 becomes possible and allows structure-based hit identification and/or lead optimization.6,7 In contrast to hit identification, docking used in lead optimization generally involves a more exhaustive search of the conformations available to the lead candidate within the structure of the target. This provides a more accurate estimation of inter-actions between the lead compound and its target and helps to facilitate ra-tional drug design. Despite its usefulness as a qualitative tool, molecular docking has not been widely successful in estimating biological activity.8

During the later stages of drug design, where a series of compounds has been synthesized and tested, 3-dimensional quantitative–structure activity relationship (3D-QSAR) models may be used to provide an accurate predic-tion of biological activity.9,10 Derivation of a robust 3D-QSAR model re-quires knowledge of the biologically active conformation. Thus, molecular docking and 3D-QSAR methods can be used in a complementary way to provide more information than would have been possible using the two methods separately.11

The material presented within this thesis provides an example of how 3D-QSAR and docking methods may be used cooperatively. First, methodology for improving the predictive ability of a 3D-QSAR technique is developed and validated. Molecular docking is then applied to the optimization of in-hibitors of the hepatitis C virus (HCV) NS3 protease. Finally, information gained from docking is used to allow the derivation of an enhanced 3D-QSAR model.

Page 12: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

12

1.2 The hepatitis C virus (HCV) Hepatitis is a condition characterized by inflammation of the liver. This may be a result of injury, excessive alcohol consumption or may be caused by any of the hepatitis viruses. Specific tests for the hepatitis A and B viruses emerged in the 1970’s and indicated that there was a third and unidentified virus responsible for liver inflammation. While this was initially referred to as non-A, non-B hepatitis (NANBH), it was not until 1989 that the causative agent of NANBH was identified, when it was named hepatitis C virus (HCV).12

The hepatitis A, B and C viruses result in liver inflammation but are ge-netically unrelated viruses. The hepatitis C virus is a member of the Flaviviridae family and unlike the hepatitis A and B viruses, roughly 70% of hepatitis C acute infections become chronic.13 The most recent approxima-tion of HCV prevalence is 2% of the world’s population, or roughly 123 million people14 and HCV is currently the leading cause of liver transplanta-tion.15 Despite the fact that HCV is the most prevalent blood-borne patho-gen,16 the only treatment currently available for chronic infection is a combi-nation of pegylated interferon-� and ribavirin. However, this treatment has many adverse side effects and a low virological response rate, especially for individuals infected with the most prevalent form, genotype 1.17 There is currently no vaccine for HCV infection, although some indication of hope for an HCV vaccine has emerged.18

Although some research has focused on vaccine development, most ef-forts have been devoted to attacking various stages of the viral life cycle.19,20 HCV is a positive-stranded RNA virus whose genome is comprised of ~9600 nucleotides, responsible for encoding the 3011-residue HCV polyprotein. This polyprotein is proteolytically cleaved into 10 mature proteins: C, E1, E2, p7 and the nonstructural proteins NS2, NS3, NS4A, NS4B, NS5A and NS5B.21 While the C, E1, E2, p7 and NS2 proteins are cleaved by various means, the NS3 protein is responsible for cleavage at the NS3/4A, NS4A/4B, NS4B/5A and NS5A/5B junctions.22-24 The NS3 protein is bi-functional, comprised of a serine protease domain in the N-terminus and a helicase domain in the C-terminus. While the NS3 protease was crystallized in 1996,25,26 it was not until 1999 that the full-length HCV NS3 prote-ase/helicase structure was resolved.27 In addition to its importance in poly-protein cleavage, it has been shown that that the NS3 protease is involved in interfering with the host antiviral response to HCV infection.28-31

Clearly the NS3 protein is an attractive target for fighting HCV infection and inhibitors of the HCV NS3 protease have been subjected to clinical tri-als. These include BILN-2061 (ciluprevir),32 VX-950 (telaprevir),33,34 SCH

Page 13: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

13

503034 (boceprevir)35,36 and ITMN-191,37,38 whose structures and activities are listed in Figure 1 In addition to these, Medivir has disclosed the com-mencement of clinical trials of an HCV NS3 protease inhibitor but have not yet revealed its chemical structure.39

Figure 1. Structures of HCV NS3 protease inhibitors subjected to clini-cal trials.

NO

NH

O

N

O

O

O NHO

N

S

HN

BILN 2061 (ciluprevir)

O

OH

N HN

OONHO

HN

O

N

N

O

O

HN

VX-950 (telaprevir)

N HN

OO

SCH 503034 (boceprevir)

O

O

NH2NHNH

O

NO

NH

O

O

O NHO

O

NH

N

O

F

SO O

ITMN-191

Page 14: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

14

2. Computational Methods

2.1 Comparative Molecular Field Analysis

2.1.1 Brief history of (Q)SAR In order to help rationalize the drug design process, medicinal chemists often rely on structure–activity relationships (SAR). An SAR may be developed by making structural changes to a common core structure and observing how these changes affect activity. The resulting SAR model is then useful in lead optimization, where substituents with chemical properties similar to those responsible for high potency may be further explored.

A natural extension of the traditional SAR is the quantitative structure–activity relationship (QSAR), where some measures of chemical properties are correlated with biological activity to derive a mathematical representa-tion of the underlying SAR. One of the first ever QSAR models is that of Richardson in 1869 where the narcotic effect of a series of alcohols was correlated with molecular weight.40 Nearly a century later, the independent development of the Free-Wilson41 method and Hansch analysis42 revolution-ized QSAR. A more detailed review describing the development of QSAR may be found in the works of Kubinyi.9,43

While traditional QSAR models relating biological activity with chemical properties have been useful, they do not account for how changes in three-dimensional molecular shape affect biological response. In an attempt to include shape-related descriptors in QSAR modeling, attention was turned to the development of 3D-QSAR.44-47 3D-QSAR modeling techniques began to emerge in the late 1970’s and early 1980’s with the introduction of the mo-lecular shape (MS) method by Simon et al.48,49 and Hopfinger,50 the distance geometry method of Ghose et al.51 and the prototype dynamic lattice-oriented molecular modeling system (DYLOMMS) method of Cramer.52 It was not until partial least squares (PLS)53 data analysis became available that comparative molecular field analysis (CoMFA) was technically possible.54 Since its development, CoMFA has grown to be arguably the most widely used 3D-QSAR method.

Page 15: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

15

2.1.2 CoMFA methodology In the early stages of 3D-QSAR development, researchers recognized that protein–ligand interactions were mainly non-covalent. Based on this, it was thought that the electrostatic and steric components from molecular mechan-ics force fields may provide sufficient information to allow the derivation of a predictive 3D-QSAR model. This was confirmed in 1988 with the success-ful derivation of a predictive CoMFA model for a set of steroids.54 CoMFA, in its current implementation,55 uses the steric and electrostatic energy terms from the Tripos force field.56

In order to calculate interaction energies and derive a model, CoMFA re-lies on two basic techniques developed shortly before its inception: GRID57 and PLS.53 GRID was developed with the intention of displaying energy contour surfaces encountered within the three-dimensional surface of a pro-tein. This was done by sampling interaction energies between a theoretical probe atom at pre-specified positions near the protein atoms. The method gets its name from the fact that a virtual grid box is drawn in the site of ligand binding and lattice points within that grid box define the locations for where interaction energies are to be calculated.

CoMFA is similar to GRID in that a similar grid box is used to define points in space from which interaction energies are calculated. However, rather than calculating interaction energies between a probe atom and a pro-tein receptor, CoMFA calculates interaction energies between the probe atom and a set of active ligands (often referred to as the ‘training set’). Since ligand shape rather than protein shape is considered, it is assumed that the ligands are in their biologically active conformation. Figure 2 shows a graphical representation of the grid box, its lattice intersections and how they are used to tabulate interaction energies.

In order for CoMFA to interpret how changes in molecular shape affect biological activity, compounds must be aligned prior to analysis. By aligning compounds based on, for example, a common structural core, it is possible to analyze interaction energies within a rigid grid box and determine which changes in ligand structure (i.e. changes in interaction energy) correlate with changes in observed activity.

After calculation of interaction energies at each lattice intersection for each compound, data must be analyzed to find the correlation between changes in interaction energy and potency. Depending on the size and shape of the biologically active ligands, the number of lattice intersections may number in the thousands and analysis of the data goes well beyond the limi-tations of multiple linear regression (MLR). PLS, on the other hand, is well suited to finding the correlation between a small number of dependent vari-ables and a large number of independent variables and made the derivation of CoMFA models possible. The final QSAR equation is comprised of a term at each grid point for electrostatic and steric interaction energies along

Page 16: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

16

with a coefficient for weighting each term. The activity of untested com-pounds can then be predicted by calculating steric and electrostatic interac-tion energies at each lattice intersection and substituting those values in the QSAR equation.

Figure 2. Diagram indicating the position of lattice intersections within the CoMFA grid box and how they are used to tabulate interaction energies.

2.1.3 Modifications to CoMFA methodology Since its original presentation, CoMFA has become the most widely used 3D-QSAR method, which has naturally drawn a lot of attention. Analysis of the scientific literature has uncovered a long list of references where authors have attempted to improve upon the existing CoMFA methodology. For an overview of many of these studies published between 1993 and 1998, the reader is referred to the work of Kim et al.58 and Norinder.59

Ligand pKi S1 .. SN E1 .. EN

Lig1

Lig2

.. .. .. .. .. .. .. ..

7.9 0.04 .. 0.2 -1.4 .. -2.3

5.0 0.36 .. 0.9 -2.7 .. -0.1

O

HOH2CO

OH

H

H

H

Page 17: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

17

Many of the steps related to the development of a CoMFA model have been addressed. These include training and/or test set selection,60-64 position of lattice points,65 grid size and definition,66-69 alignment rule,70-88 geometry optimization74,89-92 or partial atomic charge calculation,73,91,93-96 determination of the biologically active conformation,82-84,86,97-101 and consideration of bio-logical activity determination. Indeed, new 3D-QSAR methods have been developed in order to avoid some shortcomings of CoMFA methodology.102-

110

2.1.3.1 CoMFA parameters More relevant to the work presented in this thesis are the studies devoted to understanding the importance of one or more of the CoMFA parameters.111-

113 In its current implementation, there are a number of CoMFA settings that may be adjusted by users. These settings affect the way in which interaction energies are measured and processed prior to evaluation with PLS. While the majority of users have relied on the default CoMFA parameters, a number of studies have emerged where attempts were made to improve CoMFA results by modifying one or more settings.

A number of studies have been performed in order to understand the ef-fect of varying one or several CoMFA parameters. This approach may be of limited utility since it does not reveal the interdependence of settings and leaves out large areas of the experimental domain. In order to fully under-stand the effect of CoMFA parameters on predictive performance, many combinations of settings must be tested on multiple data sets. We have un-dertaken the task of exploring a wide range of CoMFA settings in order to understand their effect on predictive ability. The results from these investiga-tions are presented in greater detail in the forthcoming sections.

The ten CoMFA parameters under investigation in this thesis are listed and briefly described in Table 1. Due to the large number of studies per-formed in the past where aspects in CoMFA modeling such as data set alignment, orientation, and determination of biologically active conforma-tion have already been explored, we decided to focus this work on develop-ing an understanding of the interdependence of the CoMFA parameters and their effect on predictive performance. Not all of the settings listed in Table 1 have been available since CoMFA was originally introduced. Some set-tings have been included in the CoMFA methodology after the publication of later work advocating their use.

Page 18: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

18

Table 1. Description of CoMFA setting chosen for evaluation.

The Dielectric Function controls the formula for calculating electrostatic interaction energies. Using the default “Distance” setting, electrostatic ener-gies are calculated as a function of 1/r2, allowing for a steep falloff of energy as distance between probe atom and training set atoms increases. The “Con-stant” setting insures that Sybyl uses the classic Coulombic energy function where electrostatic energy varies as a function of 1/r, allowing for a more gradual falloff in energy with increasing distance. Although “Distance” is currently the CoMFA default, the Coulombic function was used in the origi-nal CoMFA publication.54

The Drop Electrostatics setting controls how electrostatic energies are treated at lattice points where steric energies are above their cutoffs. If this parameter is set to “All Rows Union Volume” electrostatic energies are ex-cluded from the forthcoming PLS analysis if the lattice point in question falls within the excluded steric volume for any molecule in the training set. Setting the Drop Electrostatics parameter to “No” ensures that all calculated electrostatic interaction energies are unchanged while setting it to “Yes” causes Sybyl to set electrostatic energies for that lattice point to the mean of all excluded values. In practice, this means that PLS will ignore the column of electrostatic energies since it is one of zero variance.

The Electrostatic and Steric Maxima are the maximum values to record for a given lattice point. If the absolute value of the calculated interaction energy exceeds this value, it is set to its maximum. These settings may take

Setting Name Setting Function Dielectric Function Formula for dielectric energy calculation. Drop Electrostatics Treatment of electrostatics at lattice points

with high steric field values. Electrostatic Maximum Maximum possible electrostatic energy

value. Field Type Which field types to be used in the analysis. H-Bond Function Use of the hydrogen bonding field. Repulsive vdW term Exponent used in the repulsive component of

the Lennard-Jones equation. Steric Maximum Maximum possible steric energy value. Switch Function Method of transitioning between sterically

allowed and disallowed regions. Transform Method for pre-processing steric and electro-

static energy terms prior to use in PLS. Volume Averaging Type Method of defining the location and size of

lattice points.

Page 19: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

19

any positive value but have been evaluated at the levels shown in Table 1 to reduce the number of models to be evaluated.

The Field Type setting controls what type of field or fields should be used in the CoMFA model. Setting this parameter to “Both” will ensure that both electrostatic and steric fields are used, while setting it to either “Steric” or “Electrostatic” ensures that only steric or electrostatic fields will be used.

Setting the H-Bond Function to “Yes” instructs Sybyl to transform the steric and electrostatic fields to hydrogen-bonding acceptor and donor fields, respectively. The field was introduced by Bohacek and McMartin114 and works by filtering lattice points depending on where they are positioned relative to potential hydrogen bonding groups in the training set. If a given lattice point is far from hydrogen bond donor or acceptor atoms, it is as-signed a value of 0. Lattice points may also be assigned a value of 0 if they are located in sterically forbidden areas very close to atoms. Lattice points close to potential hydrogen-bonding groups but not within sterically forbid-den areas are assigned an energy equal to the steric cutoff.

The Repulsive vdW term sets the exponent in the repulsive component of the Lennard-Jones expression for calculating steric interaction energies. The setting is rarely varied in published work possibly due to the fact that it is not available in the CoMFA graphical user interface (GUI). This parameter can be modified by setting the appropriate Tailor variable prior to insertion of the CoMFA column. The Lennard-Jones 6-12 potential is known to produce steep increases in steric potential in areas near the van der Waals surface.115-

117 Adjusting the exponent used in the Lennard-Jones repulsive term may help in overcoming this limitation.

Setting the Switch Function to either “Yes” or “No” corresponds to the setting “Smooth” and “Abrupt,” respectively, in the GUI. This setting allows for a smooth transition from lattice points within the cutoff region to lattice points within the allowed region. Setting the parameter to “Yes” allows for interpolation from 6 kcal/mol below the cutoff up to the cutoff. The switch function must be set to “No” if the Volume Averaging Type parameter is to be used.

The Transform parameter allows for modification of electrostatic and steric interaction energies prior to PLS. If the parameter is set to “Indica-tor,”117 steric and electrostatic energies below their respective cutoffs are assigned a value of zero, while potentials at or above their cutoffs are as-signed an energy equal to the cutoff, thus shifting the focus of the analysis toward the intermediate potentials. When the transform parameter is set to “Squared,” the magnitude of the interaction energy is squared and the sign of the original value retained. This setting is referred to as “Parabolic” in the CoMFA GUI.

The Volume Averaging Type parameter controls the way lattice points are used for interaction energy calculation. When the setting “None” is used, interaction energies are calculated at the precise location of the lattice point.

Page 20: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

20

If the “Box” setting is used, the energy at the given lattice point is set to the average of the energies calculated at the vertices of a box centered on the original lattice point. The “Box” setting has been shown to be helpful in reducing the effects of poor alignment or orientation.113

2.2 Experimental design In a purely combinatorial experimental design, all possible experiments are performed, combining all factors and levels. While this approach ensures complete sampling of the experimental domain, it may be impractical and costly depending on the difficulty of performing the experiments. A number of techniques have been developed to compile a refined list of representative experiments, where the goal is to maximize the information obtained while still performing a small number of experiments. Experimental design tech-niques should be applied where the evaluation of experiments is costly, im-practical or may have ethical implications.118

While technically any process of compiling a list of experiments to be performed may be referred to as an experimental design, the term will herein refer to the deliberate attempt to select a meaningful, short list of useful ex-periments intended to efficiently maximize the information content of ex-perimentation.119-126

Statistical experimental design methods127 were pioneered by Karl Pear-son in the 1920’s and 1930’s and represent a rational approach to experimen-tal design. The most notable characteristic of statistical experimental designs is that they provide a strict mathematical foundation for simultaneously varying all important factors based on only a few experiments (often less than 20). The experimental domain is represented by experiments at the edges, vertices and center of the domain. The assumption that the experi-mental domain has been fully represented holds true so long as it is regularly shaped with no holes or missing regions.

Discontinuous or irregularly shaped experimental domains are commonly encountered in high-throughput screening (HTS) campaigns, where thou-sands of compounds are screened for biological activity. Experimental de-sign is often used to compile a list of compounds representative of the wide array of molecules that may be practically synthesized. However, the ex-perimental domain (in this case a compound library) is irregularly shaped, with holes and slices removed due to the fact that all imaginable combina-tions of atoms cannot be connected to form molecules. In HTS, it is more practical to compile a character set (a list of all feasible experiments, or in this case molecules) and select a representative subset from this pool.

Perhaps the most straightforward type of experimental design that can be applied in cases of irregular experimental domain is a random selection of experiments. From a large pool of possible experiments the practitioner may

Page 21: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

21

randomly select a smaller number of experiments. While this may seem hap-hazard and without rational basis, random experimental designs often work fairly well. Random selection methods have been used in HTS campaigns and there is some debate in the literature whether random or rational experi-mental designs are better.128-131

Representative subsets may be compiled more rationally using cluster-based selection or maximally diverse subset selection.132 In the case of com-binatorial library design for HTS, compounds are characterized using princi-pal component analysis (PCA) or PLS based on chemical descriptors.133 This allows for clustering or distance-based selection within the Cartesian space defined by the PCA scores.

Of the many maximally diverse subset selection algorithms available to-day, MaxMin is among the most highly regarded due to its ability to provide good coverage of the experimental domain with minimal edge effects.131,134 The algorithm was introduced in 1969 by Kennard and Stone in order to provide an efficient solution to experimental design in cases of irregularly shaped experimental domain.135 Although the algorithm is straightforward to implement, it does place a high demand on computational resources due to the need to compute and compare all mutual compound–compound dis-tances. MaxMin was used in this work to provide a refined set of representa-tive CoMFA models, a task conceptually similar to designing a diverse com-pound library. Its use and implementation are described in further detail in Section 3.8.1.

2.3 Molecular docking An understanding of protein-ligand interactions is a key requirement for rational drug design. Although it is possible to experimentally elucidate the structure of protein–ligand complexes using crystallographic methods or nuclear magnetic resonance (NMR), it would be difficult to do so for all ligands in a medicinal chemistry project. Molecular docking may be able to provide this important information by predicting the structure of the protein–ligand complex.

Molecular docking works by exploring the space available to ligands within the cavity formed by the protein active site. A scoring function is then used to assess how well the ligand structures and their conformations com-plement the shape of the protein. Scoring functions are intended to give a quantitative estimate of ligand binding affinity by providing a weighted sum of positive and negative protein–ligand interactions.

While their utility has been evaluated and shown to often correctly predict bioactive conformation,8,136-142 the main drawback with docking is its diffi-culty in estimating biological activity.143 Scoring functions used in docking

Page 22: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

22

are often simplified to allow for computational efficiency and usually do not describe binding free energy.144

Although there are a large number of docking programs available to-day,145,146 the FLO+ docking suite was used for docking calculations in this thesis. More information regarding the methodology used can be found in Section 4.1.1.

Page 23: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

23

2. Aims of the Thesis

This thesis is devoted to the development of methodology for improving the predictive performance of CoMFA and modeling studies of HCV NS3 pro-tease inhibitors.

Specific objectives include:

1. To develop and validate improved CoMFA modeling by settings op-

timization. 2. To analyze and refine CoMFA improvement methodology to allow

for simplified implementation.

3. To provide an understanding of the SAR for two series of tripeptide HCV NS3 protease inhibitors using molecular docking.

4. To derive predictive 3D-QSAR models for HCV NS3 protease in-

hibitors using enhanced CoMFA models and a docking-based alignment rule.

Page 24: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

24

3. Development, validation and refinement of methodology for the improvement of CoMFA models (Papers I and II)

While many aspects of CoMFA methodology have been scrutinized and improved upon, prior to this work, there had been no published study where the interdependence of CoMFA settings was evaluated. Although there have been several published studies highlighting the utility of one or more set-tings, they have been evaluated alone, thus omitting large areas of experi-mental space. To understand how settings are interrelated and their effect on the predictive performance of CoMFA, we attempted to test virtually all possible combinations of parameters. This approach began in our lab with Schaal studying HIV-1 protease inhibitors using CoMFA.147 However, in this thesis a more comprehensive approach is taken, studying more CoMFA settings and applying the method to more data sets.

3.1 Evaluation of CoMFA parameters The settings listed in Table 2 are those chosen for evaluation in this work. In some cases these parameter names may differ from those shown in the CoMFA GUI.

Table 2. CoMFA settings chosen for evaluation.

Setting Name Setting Optionsa Dielectric Function Distance, Constant Drop Electrostatics Yes, No, All Rows Union Volume Electrostatic Maximum 5, 15, 30, 45, 60, 80b Field Type Both, Electrostatic, Steric H-Bond Function Yes, No Repulsive vdW term 8, 10, 12 Steric Maximum 5, 15, 30, 45, 60, 80b Switch Function Yes, No Transform None, Indicator, Squared Volume Averaging Type None, Box a Underlined setting considered default. b Values not limited to those shown.

Page 25: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

25

The task of evaluating CoMFA models using many combinations of settings is complicated by the fact that many combinations are illogical or impossi-ble. For example, settings pertaining to the steric interaction energy field (e.g. steric energy cutoff) need not be adjusted if only the electrostatic field is used. Based on CoMFA documentation and preliminary investigations, it was possible to build a decision tree to determine which combinations of settings are possible. This decision tree is shown in Figure 3.

Figure 3. Decision tree for determining possible combinations of CoMFA settings.

Based on this decision tree, it was possible to compile a list of 6120 combi-nations of settings. The process of deriving CoMFA models using all 6120 combinations of settings was simplified using a Sybyl Programming Lan-guage (SPL) script. Since factors such as region size and definition, orienta-

H-BondFields?

Cutoffs:•Steric

(5, 15, 30, 45, 60, 80)

Yes

Rep vdW• 8• 10• 12

NoField Type:• Both• Electrostatic• Steric

Both

Cutoffs:• Electrostatic• Steric

(5, 15, 30, 45, 60, 80)

Electrostatic

Steric

Cutoffs:• Electrostatic

(5, 15, 30, 45, 60, 80)

Cutoffs:• Steric

(5, 15, 30, 45, 60, 80)

Drp. Elec.• Yes• No• ARUV

TransformFunction• None• Indicator• Squared

Diel. Func.• Constant• Distance

Switch• Yes• No

NoVol Av. Type• None• Box

Rep vdW• 8• 10• 12

Legend:•Electrostatic and steric fields used

•Electrostatic fields used

•Steric fields used

•H-Bond fields used

Yes

Start

Build CoMFAModel

Build CoMFAModel

Build CoMFAModel

Page 26: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

26

tion and alignment have been reviewed by other authors, they will not be evaluated here. Thus, the default grid box (2 Å grid spacing, +1-charged C-sp3 probe atom) was used along with orientation and alignment derived by the original authors of the data sets.

3.2 Data sets To assess the potential improvement in predictive performance made possi-ble by this methodology, it has been applied to nine electronically available data sets. These include the steroids data set presented in the original CoMFA article54 as well as eight other data sets made available in the work of Sutherland et al.148 Data sets, data set sizes and their range in activity are shown in Table 3. Table 3. Data sets, data set sizes and their range in activity.

Data set Size Activity Range ACE 114 2.14 – 9.94 (pIC50) AChE 111 4.27 – 9.52 (pIC50) BZR 147 5.52 – 8.92 (pIC50) COX-2 282 4.03 – 9.00 (pIC50) DHFR 361 3.30 – 9.81 (pIC50) GPB 66 1.30 – 6.80 (pKi) Steroids 31 5.00 – 7.88 (pK) THER 76 0.52 – 10.17 (pKi) THR 88 4.36 – 8.48 (pKi)

The steroids data set has been frequently used as a benchmark data set for testing new 3D-QSAR methods.149 Steroid training set compounds are avail-able in the Sybyl molecular modeling suite and the test set compounds were obtained from the Johan Gasteiger research group.150 3D atomic coordinates of test set compounds were derived by the Gasteiger group using CORINA.151 To ensure uniformity between training and test set compounds, they were minimized to convergence using the Tripos force field,56 Gasteiger–Marsili152,153 charges and the conjugate gradient method. The alignment rule used was the same as that published in the original CoMFA paper and involved least squares fitting all compounds to deoxycortisol us-ing steroid atoms 3, 5, 6, 13, 14, and 17, numbered according to the steroid nomenclature.

The eight Sutherland data sets were obtained from the Supplementary In-formation available online and were used without modification. The data sets were of various size and span in activity and included: 114 angiotensin con-

Page 27: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

27

verting enzyme (ACE) inhibitors,70 111 acetylcholinesterase (AChE) inhibi-tors, 163 benzodiazepine receptor (BZR) ligands, 322 cyclooxygenase-2 (COX-2) inhibitors, 397 dihydrofolate reductase (DHFR) inhibitors, 66 gly-cogen phosphorylase b (GPB) inhibitors,154 76 thermolysin (THER) inhibi-tors,154 and 88 thrombin (THR) inhibitors.155 For the BZR, COX-2 and DHFR data sets, several compounds were removed (by Sutherland) due to indeterminate activity, reducing their sizes to 147, 282 and 361 compounds, respectively. Training and test sets were divided by Sutherland et al. based on chemical diversity and span in potency. Approximately two-thirds of the compounds were assigned to the training set with the remainder of com-pounds reserved for the test set.

Data sets used in the study were chosen due to their previous use and electronic availability. This allows for direct comparison of improvement in predictive performance with other published work.

3.3 CoMFA validation The derivation of a CoMFA model is by no means foolproof and one must continually validate models to be sure of their reliabilty.156 While model fit (r2) has been recommended in the past for validation of QSAR models,157 the traditional means of CoMFA model validation is the cross-validated r2 (q2) (Eq. 1), which is a measure of the ability of the model to make activity pre-dictions for compounds used to train the model.

Cross-validation is an iterative process of excluding one or more training set compound, re-deriving the QSAR equation and predicting the activity of the excluded compound(s). The process is repeated until each training set compound has been excluded exactly once. Thus, the activity of each train-ing set compound will have been predicted using QSAR equations derived without knowledge of those compounds, allowing the calculation of q2. A q2 of 0 represents prediction based on random guessing, a q2 of 1 represents perfectly accurate prediction and a negative q2 represents prediction worse than random guessing. Although a minimum q2 of 0.3 is generally recom-mended, the reliability of q2 has been assessed and it has been shown that q2 values lower than 0.3 may be acceptable depending on the size of the data set.158 The general formula for q2 is given in Equation 1.

� �� � �

��

��

��

2

22 1

yy

yyq pred Eq. 1

Here, y and ypred are the experimental and calculated activities, respectively, and y is the average experimental activity of the training set. q2 was evalu-

Page 28: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

28

ated using Leave-One-Out (LOO) cross-validation. Calculation time was reduced using Sample Distance Partial Least Squares (SAMPLS).159 By de-fault, Sybyl determines model dimensionality using the number of compo-nents corresponding to the highest possible q2. To avoid overfitting,160 the appropriate number of principle components was determined based on the first minimum cross-validated standard error of prediction (SDEPcv).161,162

While q2 has been shown to be a useful validation test,158 external valida-tion has been advocated to more rigorously assess model performance.163 External validation involves setting aside some compounds from the training set to form a test set. After the model has been developed, the activity of these compounds is calculated and another r2 (r2 of prediction, or r2

pred) is calculated. r2

pred is calculated using a similar equation as for q2, but with some minor changes:

� �� � �

��

�� ��

2

22 1

training

predpred yy

yyr Eq. 2

The values y and ypred are the measured and predicted activities for test set compounds, respectively, while y and y training are the measured activities of the test set and average measured activity of the training set members. The range of values for r2

pred are the same as for q2 and interpretation is analo-gous. While r2

pred is generally accepted as a more conservative estimate of predictive ability, its use has also been criticized because it requires the re-moval of information from the training set.164 Depending on the training set size, this may have a negative impact on the predictive ability of the final CoMFA model.

Yet another method for ensuring the reliability of models is response variable randomization. Here, the biological activities for training set com-pounds are randomly re-assigned and then used to derive a new model. Y-randomization tests have been used elsewhere,156 where it was shown that over-optimization in CoMFA was not indicated by q2.165Although high pre-dictive statistics will occasionally arise in randomization tests, randomized models should be clearly distinguishable from those derived using correctly assigned activities.166,167 If it is possible to derive an equally good model using randomly assigned biological activities as is possible using correctly assigned activities, then spurious correlation is likely.166 In this work, where we attempt to improve the predictive performance of CoMFA by sampling thousands of models, it was important for us to be sure that our optimization did not result in models relying on spurious correlation. A thorough evalua-tion of y-randomization tests may be found in the work of van der Voet.168

Since random assignment of activity is different each time it is performed, randomization should be evaluated many times. To understand the likelihood

Page 29: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

29

that our parameter optimization methodology may lead to spurious correla-tions, 100 iterations of activity data assignment were performed for each data set. The parameter optimization algorithm was applied to all 100 sets, resulting in the derivation of 612000 models for each data set and a total of roughly 5.5 million models for the entire study.

3.4 Comparison of model performance To optimize CoMFA parameters, 6120 models are evaluated and model per-formance may be judged by various means, including q2, r2

pred, and r2. Unfor-tunately, selection of the most predictive CoMFA model is not as simple as choosing the model with the highest values in all of these metrics. Figure 4 shows a scatter plot, comparing the q2 and r2

pred values for CoMFA models derived for the steroids data set.

Figure 4. Comparison of q2 and r2

pred for models derived using the Steroids data set. All models with non-zero q2 and r2

pred are shown. Figure 4 shows the lack of correlation between q2 and r2

pred for the steroids data set and similar behavior was observed for each of the eight Sutherland data sets. While changing settings had little effect on q2 (q2 varied roughly between 0.6 and 0.9 for most models) there was a large spread in r2

pred. Thus,

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

r2pred

q2

Page 30: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

30

while most models had acceptable q2 values, many failed to accurately pre-dict activities for the test set compounds.

Given the lack of correlation between q2 and r2pred, optimal model selec-

tion must be based on some weighting of metrics. In line with published recommendation,169 r2

pred was given preference, followed closely by q2. Model fit (r2) was used to differentiate between models with similarly high r2

pred and q2 values. The model subjectively chosen as optimal for each data set possessed a high (although often not the highest) r2

pred and q2 and often very closely resembled the model assigned the highest score based on the following formula:

222 245 rqrScore pred �� Eq. 3

Schultz et al. have used a similar scoring function composed of weighted validation metrics to compare QSAR models used in ecotoxicity.170 Weight-ings for each individual validation metric were based on the authors’ expert opinions as it is difficult to statistically validate the weightings since they are based on the subjective reliability of each metric determined by the modeling community. Additionally, Kolossov et al. have devised an unbiased metric for comparison of QSAR models derived using different regression methods, with training sets of various size, and using different descriptors.171

3.5 Improvement in predictive performance As can be seen in Table 4 substantial improvements in internal and external predictive ability were achieved by selecting more appropriate CoMFA set-tings.

Page 31: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

31

Table 4. Comparison of default and optimized q2, r2pred and r2 for the 9 data

sets studied.

Default Optimized Data set r2

pred q2 r2 r2pred q2 r2

ACE 0.48 0.68 0.80 0.71 0.73 0.96 AChE 0.59 0.52 0.78 0.73 0.70 0.96 BZR -0.01 0.32 0.46 0.45 0.45 0.78 COX-2 0.27 0.49 0.69 0.37 0.56 0.75 DHFR 0.59 0.65 0.79 0.64 0.68 0.86 GPB 0.43 0.42 0.84 0.57 0.73 0.98 Steroids 0.31 0.70 0.94 0.64 0.84 0.95 THER 0.55 0.49 0.94 0.61 0.65 0.95 THR 0.64 0.59 0.85 0.72 0.66 0.87

While it was possible to improve r2

pred, q2 and r2 values for all data sets stud-ied, the extent to which these were improved was data set dependent. Data sets whose predictive statistics were low (e.g. BZR) were more substantially improved than those with already good models (e.g. DHFR). By modifying settings, it was possible to improve the default BZR model (r2

pred = -0.01, q2 = 0.32 and r2 = 0.46) quite significantly, showing improvements in r2

pred (0.46), q2 (0.13) and r2 (0.32). DHFR, on the other hand, with its already high validation statistics (r2

pred = 0.59, q2 = 0.65 and r2 = 0.79) was affected to a lesser extent by settings optimization. Improvements in r2

pred, q2 and r2 were 0.05, 0.03 and 0.07, respectively.

Although CoMFA is the most widely used 3D-QSAR method, it is by no means perfect. Since its inception, other QSAR methods have been proposed to address CoMFA’s shortcomings. HQSAR172 and CoMSIA104 are two such methods for which literature data for the 8 Sutherland data sets exists. Fig-ures 5 and 6 show the graphical comparison of the different QSAR methods and their q2 and r2

pred values, respectively.

Page 32: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

32

Figure 5. A comparison of q2 for Sutherland datasets using default CoMFA, optimized CoMFA, CoMSIA and HQSAR. CoMSIA and HQSAR data taken from the work of Sutherland, et al.148

Figure 6. A comparison of r2pred for Sutherland datasets using default

CoMFA, optimized CoMFA, CoMSIA and HQSAR. CoMSIA and HQSAR data taken from the work of Sutherland, et al.148

While the CoMSIA and HQSAR methods were able to produce models with slightly higher q2 or r2

pred for the COX-2, DHFR, GPB and THR data sets, those models were not higher than optimized CoMFA in both q2 and r2

pred.

0.25

0.35

0.45

0.55

0.65

0.75

0.85

ACE AChE BZR COX-2 DHFR GPB THER THR

Dataset

q2

CoMFA def.

CoMFA opt.

CoMSIA

HQSAR

-0.30

-0.10

0.10

0.30

0.50

0.70

ACE AChE BZR COX-2 DHFR GPB THER THR

Dataset

r2pred

CoMFA def.

CoMFA opt.

CoMSIA

HQSAR

Page 33: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

33

However, optimization of CoMFA settings was successful in deriving mod-els with higher q2 and r2

pred for the remaining 4 data sets (ACE, AChE, BZR and THER) than was done using CoMSIA and HQSAR.

3.6 Validation by y-randomization Y-randomization was used to ensure our settings optimization process did not result in spurious correlation. To fully validate the parameter optimiza-tion procedure, it is necessary to show that by modifying settings it is not possible to develop highly predictive models for data sets with randomly assigned activity. More information regarding y-randomization tests may be found in section 3.3.

Tables 5 and 6 show a comparison of the number of models derived using correct and randomly assigned activities with both q2 and r2

pred above the specified cutoff. It is important to note that there were 6120 models derived for each data set using correct responses, whereas there were 612 000 mod-els derived for each data set using randomly assigned activities.

Table 5. Total numbera of models with both q2 and r2

pred above the specified cutoff derived using correctly assigned responses.

Dataset >0.00 >0.10 >0.20 >0.30 >0.40 >0.50 ACE 6116 6116 6116 6107 5524 4051 AChE 6102 6088 6070 5913 5003 1897 BZR 5598 4221 2347 509 43 0 COX-2 6099 5784 4583 961 4 0 DHFR 6120 6120 6119 6113 6056 5508 GPB 5979 5562 4950 3603 1579 99 Steroids 6020 5926 5603 4727 2519 507 THER 6114 6075 5990 5794 4823 1210 THR 6115 6104 6086 6049 5995 5659 a Represents total number of models from 6120 models derived using real responses

Page 34: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

34

Table 6. Total numbera of models with both q2 and r2pred above the specified

cutoff derived using randomly assigned responses.

Dataset >0.00 >0.10 >0.20 >0.30 >0.40 >0.50 ACE 6738 3 0 0 0 0 AChE 7738 294 0 0 0 0 BZR 15531 0 0 0 0 0 COX-2 6089 0 0 0 0 0 DHFR 22284 0 0 0 0 0 GPB 17235 240 43 0 0 0 Steroids 8940 2556 491 75 5 0 THER 5386 32 0 0 0 0 THR 11524 152 0 0 0 0 a Represents total number of models from 612 000 models derived using randomly assigned responses

Of the 5.5 million models included in Table 6 and derived using randomly assigned activity, very few have acceptable q2 and r2

pred values. Only 534 models have both q2 and r2

pred above 0.2, there were only 75 above 0.3, only 5 above 0.4 and none with both q2 and r2

pred above 0.5. Despite the fact that Table 5 was compiled using only 1% as much data as Table 6, 18 931 mod-els have both q2 and r2

pred above 0.5. Thus, models derived using correctly assigned responses stand in stark contrast to those derived using randomly assigned responses. Based on these y-randomization tests it can be con-cluded that use of the parameter optimization procedure is unlikely to result in spurious CoMFA models.

3.7 Comparison of models

3.7.1 Optimal model settings. Up to this point, parameter optimization has proven effective in enhancing the predictive performance of CoMFA models using 9 different data sets. Y-randomization tests have shown that use of this procedure is unlikely to re-sult in spurious CoMFA models. It may be interesting to see if one particular combination of parameters could replace the current default settings. Table 7 shows a comparison of the best models for each data set, where the best model was selected based on the scoring function described in section 3.4.

Page 35: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

35

Table 7. Comparison of optimal settings for data sets studied.

Settinga Steroids ACE AChE BZR COX-2 Model # 3739 4556 5846 1213 2354 Dielectric Function Distance Distance Distance Distance Distance Drop Electrostatics ARUV No No ARUV No Electrostatic Maximum

30 45 15 5 80

Field Type Both Both Both Both Both H-Bonding No No No No No Rep. vdW term 12 8 12 10 12 Steric Maximum 30 15 5 80 45 Switch Function No No No No No Transform Indicator Indicator Indicator Squared Squared Vol. Av. Type None Box Box Box Box Setting DHFR GPB THER THR Model # 6079 4622 4078 3040 Dielectric Function Distance Distance Constant Constant Drop Electrostatics ARUV No ARUV ARUV Electrostatic Maximum

5 45 5 5

Field Type Both Both Both Both H-Bonding No No No No Rep. vdW term 12 12 10 12 Steric Maximum 5 15 30 45 Switch Function No No No No Transform Squared Squared Indicator None Vol. Av. Type Box Box Box None a Underlined settings are those considered to be default.

Visual inspection of the settings used in the optimal models for these data sets reveals several similarities. For example, all top scoring models use either “Indicator” or “Squared” Transform Types, and many of them also use the “Box” Volume Averaging type. While this is true, further examination of the data indicated that many of the models with the lowest predictive per-formance were also derived using either “Indicator” or “Squared” Transform Types (See scatter plots included in Supplementary Information for Paper I). Perhaps less surprising is the fact that all optimal models use both electro-static and steric fields and maybe also that none are derived using H-bonding fields.

Page 36: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

36

3.7.2 Data set dependence To further analyze data set dependence, model settings were examined to see if there was any particular settings combination that performed better than default (in both q2 and r2

pred) for all 9 data sets. This analysis revealed that there were no such combinations of settings. However, 5 models were found to perform better than default in 7 of the 9 data sets studied. While data set dependence was indicated by this preliminary data inspection, a more com-prehensive multivariate data analysis was done on all 9 data sets to reinforce this point. This was done using PLS, correlating model settings with model performance, using the model score given in Equation 3.

The data matrix, including model settings and their corresponding model scores, is complicated and presents several difficulties in PLS modeling. These difficulties arose mainly due to the fact that model settings are cate-gorical variables expressed as both numbers and text strings. For example, electrostatic and steric settings are numerical and may take (as implemented in this work) the values 5, 15, 30, 45, 60 or 80. Other settings, such as the dielectric function, are categorical and may take the values “Constant” or “Distance.” One added level of complexity was presented by the inapplica-bility of some settings depending on how other settings were configured. For example, in cases where only the electrostatic field is used, settings relating to the steric field, such as the steric cutoff or the vdW term, have no influ-ence on the derivation of the model. To provide an accurate representation of the data matrix, settings were preprocessed prior to multivariate analysis.

The GIFI system of binning173 was found to be a useful data preprocess-ing scheme. Here, the variables are separated into bins and the original vari-able values are replaced with a 1/0 representation. The number of bins each variable is separated into depends on the range of original values. Simple categorical variables are easily represented using this scheme since the num-ber of GIFI bins is equal to the number of levels the categorical variable can take. An extra bin was added to several parameters to clearly represent in-stances where that parameter was unavailable. Thus, GIFI binning was able to transform string and numerical representations of model settings as well as allow for an accurate representation of the parameters actually used in the derivation of each CoMFA model.

There are several variants of the GIFI binning scheme, selection of the appropriate variant largely depends on the nature of the data matrix at hand. In this case, the GIFI-1 binning scheme presented by Berglund et al.174 was used. Figure 7 shows binning of the electrostatic cutoff parameter and how the original variable values were replaced with the appropriate 1/0 represen-tation. Table 8 shows the original 10 CoMFA parameters and their GIFI bin names. GIFI binning was performed using a Perl script.

Page 37: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

37

Orig Var. El. Cutoff E-M E-5 E-15 E-30 E-45 E-60 E-80

N/A 1 0 0 0 0 0 0

5 0 1 0 0 0 0 0 15 0 0 1 0 0 0 0 30 0 0 0 1 0 0 0 45 0 0 0 0 1 0 0 60 0 0 0 0 0 1 0 80 0 0 0 0 0 0 1

Figure 7. Depiction of the GIFI-1 binning scheme using the electrostatic cutoff as an example.

Table 8. CoMFA settings and their GIFI bin names.a Setting Name Setting Options GIFI Bin Names Dielectric Function Distance, Constant Diel_M,c Diel_dist, Diel_const Drop Electrostatics Yes, No, ARUV Drp_M,c Drp_Y, Drp_N, Drp_ARUV Electrostatic Maximumb

5, 15, 30, 45, 60, 80 E_M,c E_5, E_15, E_30, E_45, E_60, E_80

Field Type Both, Electrostatic, Steric F_H,d F_B, F_E, F_S H-Bonding Yes, No N/Ad Rep. vdW term 8, 10, 12 Rep_M,c Rep_8, Rep_10, Rep_12 Steric Maximumb 5, 15, 30, 45, 60, 80 S_M,c S_5, S_15, S_30, S_45, S_60,

S_80 Switch Function Yes, No Sw_M,c Sw_Y, Sw_N Transform None, Indicator, Squared T_M,c T_NONE, T_IND, T_SQ Vol. Av. Type None, Box V_M,c V_NONE, V_BOX a Underlined setting considered default. b Possible values not limited to those shown. c Denotes instances where the setting is unavailable d Setting re-categorized under the Field Type, where it is denoted with F_H.

Subsequent to GIFI data preprocessing, it was possible to derive an 8-component PLS model including all GIFI-transformed settings and model scores for each of the nine data sets. This model explained 50% of the X-block variance, 36% of the Y-block variance and had a q2 of 0.36. To aid in model interpretation, CoMFA settings with a variable importance below 0.7 were removed, resulting in a subsequent 8-component model with R2X = 0.76, R2Y = 0.34 and Q2 = 0.34. Settings removed included F_H, S_5, S_15, S_30, S_45, S_60, S_80, E_30, E_45, E_60, T_M, Sw_M, Rep_12 and Drp_ARUV (see Table 8 for further clarification). As expected, removal of these settings had a positive influence on R2X with little change in R2Y or Q2.

Page 38: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

38

The loadings plot (showing the first two components) for this refined model is shown in Figure 8. The relative placement of the parameters and model scores indicates which settings are positively and negatively corre-lated with high predictive performance. Settings that lie close to the model score for a particular data set are correlated with a high predictive perform-ance, while the opposite is true for settings far from the model scores. Pre-processing the data with GIFI allows for more straightforward interpretation since it shows not only which parameter is correlated with high or low pre-dictive performance, but also how specific settings for parameters correlate with model score.

Figure 8. Loadings plot of the first two components showing settings with a variable importance greater than 0.7. Multivariate analysis was performed using all 6120 CoMFA settings combinations and model scores for each data set.

One of the key features shown in the figure is the correlation between pa-rameters and model scores for the individual data sets. Interestingly, model scores and settings are placed differently within the loadings space for com-ponent, thereby indicating that no generally preferred model settings can be found. The model scores for the data sets are scattered throughout the first and fourth quadrants (as seen in Figure 8). For example, the BZR, COX-2

-0.3

-0.2

-0.1

-0.0

0.1

0.2

0.3

-0.4 -0.3 -0.2 -0.1 -0.0 0.1 0.2 0.3 0.4

w*c

[2]

w*c[1]

GIFI1.M13 (PLS)w*c[Comp. 1]/w*c[Comp. 2]

R2X[1] = 0.183445 R2X[2] = 0.112475

XY

F_B

F_E

F_S

S_M

E_M

E_5

E_15

E_80

T_NONE

T_IND

T_SQ

Sw_N

Sw_YV_M

V_NONE

V_BOX

Rep_M

Rep_8

Rep_10

Diel_M

Diel_dist

Diel_const

Drp_M

Drp_N

Drp_Y

Score-ace

Score-ache

Score-bzr

Score-cox2

Score-dhfr

Score-gpb

Score-ster

Score-ther

Score-thr

Page 39: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

39

and GPB data sets cluster in the first quadrant near the settings T_NONE, V_M, Sw_Y, E_5, E_15, Drp_N. The plot indicates that these settings tend to be used in models with high predictive performance for these data sets. However, for the ACE, AChE, THER and steroids data sets the settings Rep_10, Diel_dist, T_SQ, and V_NONE are correlated with high model score. Furthermore, the DHFR and THR data sets group further from most of the other data sets and the only setting really close to those responses is the F_B, indicating that use of both electrostatic and steric fields is effective.

To summarize, this plot indicates that settings that work well for one par-ticular data set cannot be presumed effective for another. Thus, in order to improve the predictive performance of CoMFA for a new data set, some exploration of different settings combination is likely necessary.

3.8 Refinement of methodology Selection of more favorable CoMFA settings is effective in improving pre-dictive performance, particularly so when default settings fail to produce highly predictive models. Fortunately, y-randomization tests showed that use of this parameter optimization methodology is unlikely to result in spurious CoMFA models. Although one of the original goals of the project was to find better default settings, data analysis indicates that optimal settings are highly data set dependent.

While the methodology is effective, it does have a high demand on com-putational resources. Depending on the data set studied, evaluation of all 6120 models required up to 38 hours of CPU time. Furthermore, evaluation of all 6120 models results in the derivation of many models with similar settings and predictive performance. Thus, it may be possible to find models with equivalent predictivity by only evaluating a representative subset of the original 6120 models.

In this particular application, the MaxMin diversity selection algorithm proved to be an effective method for compiling this representative subset. Based on some set of descriptors, a diversity selection algorithm seeks to design a subset by selecting experiments that are mutually distant in the ex-perimental domain. This is done to promote coverage of the experimental space by a small number of observations.

3.8.1 The MaxMin algorithm MaxMin gets its name from the way that it selects new subset members. In its original implementation, subset selection is initiated by selecting the two points that are furthest apart. To select new subset members, the distance between candidate members and their nearest subset members is examined. The candidate point with the greatest nearest neighbor distance is then cho-

Page 40: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

40

sen for inclusion in the subset. The process of adding new subset members proceeds until the subset is of the desired size. One of the advantages with using the MaxMin algorithm is that specific candidate points can be included in the subset without modifying the overall design goals. In this work, it was prudent to begin the selection procedure with the default CoMFA model.

PLS scores derived using GIFI-PLS were used as Cartesian coordinates in MaxMin. PLS scores provide a simplified numerical representation of model settings and their corresponding predictive performance but also have the advantage of being centered and scaled. While the performance of the MaxMin algorithm written in various programming languages has been as-sessed,175 it was conveniently written in Perl in this work.

3.8.2 Compilation of a representative subset The MaxMin algorithm was used to compile a subset of 612 (10%) CoMFA models representative of the larger parent set. As subset members are added they grow closer together, eventually reaching the redundancy of the original parent set. Therefore, it is important to tabulate the order in which each sub-set member was added. This information enables assessment of the appropri-ate subset size. The subset size can conveniently be adjusted by excluding those members that were added last.

While 612 models were chosen from the original parent set of 6120 mod-els, that subset may not satisfy the main goal of the project. Here, the most important objective is to obtain the smallest subset possible that still contains models with high predictive abilities for all 9 data sets.

3.8.3 Assessment of subset size To avoid evaluating too many CoMFA models, the appropriate subset size has been estimated by comparing the best models included in subsets of various sizes with those of the full set of 6120 models. Thus, it is necessary to devise an unbiased metric to compare the performance of models across different data sets. Comparison of models derived using different methodol-ogy and data sets is a difficult task largely due to the data set dependence of many commonly used metrics.171 The model score defined in, Eq. 3 and based on a weighted sum of q2, r2

pred and r2, is useful for comparing models derived using the same data set, but does not allow comparison between data sets. For example, the best model from the COX-2 data set (model 2354) has a model score of 5.57. However, when model 2354 is used on the THR data set it receives a score of 7.78 even though there are roughly 2400 models with higher scores.

Fair comparison of models derived using different data sets was achieved by ranking models based on model score, conceptually similar to ranking procedures used elsewhere.176 Models were ranked in order of descending

Page 41: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

41

score, where the model with the highest score is ranked “1.” Model rank was used to approximate the appropriate subset size by finding the highest-ranking model for each data set, then calculating the average rank of these models. This will be referred to as the average best rank (ABR) of the sub-set. Here, the goal is to find the smallest subset with an ABR as close to “1” as possible. The smaller the ABR, the more similar in performance the best models are in both the subset and full parent set.

Figure 9 shows the relationship between ABR and subset size. The figure shows that ABR improves steadily up to a subset of roughly 300 models, beyond this point there is little improvement in ABR. Also of note is that improvement in ABR occurs in steps. As models are added to the subset, a point is reached where a model with a good rank for one or more data sets is added to the subset, resulting in a jump in ABR. For the 9 data sets studied, ABR reached a plateau of 5.9 when the subset contained 301 models. After this, there was no improvement in ABR until the subset contained 513 mod-els, where the ABR reached 5.6.

Figure 9. Relationship between average best rank (ABR) and subset size.

Evaluation of a diverse set of 301 CoMFA models allowed for the identifica-tion of CoMFA models with predictive performance similar to what was previously achieved when evaluating all 6120 models. Figures 10 and 11 show a comparison in q2 and r2

pred, respectively for models derived using default settings, settings optimized using the full parent set and settings op-timized using the diverse subset.

0

10

20

30

40

50

60

70

80

90

100

50 150 250 350 450 550 650

Subset Size

Ave

rage

Bes

t R

ank

Page 42: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

42

Figure 10. Comparison of q2 values for models derived using default set-tings and optimized settings from the diverse and full parent sets.

Figure 11. Comparison of r2pred values for models derived using default set-

tings and optimized settings from the diverse and full parent sets.

As indicated by Figures 10 and 11, the predictive statistics for models from the diverse and parent sets are virtually identical. There was essentially no loss in predictive performance associated with evaluation of 301 subset models (less than 5%) as compared with evaluation of all 6120 models from the parent set.

As may be expected, optimal models from the diverse subset were also similar to those of the parent set with regard to the settings used. Table 9 shows a comparison of optimal settings used in the diverse and parents sets.

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

ACE AChE BZR COX-2 DHFR GPB Steroids THER THR

q2

Default CoMFA

Diverse Subset

Full Parent Set

-0.10

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

ACE AChE BZR COX-2 DHFR GPB Steroids THER THR

r2pred

Default CoMFA

Diverse Subset

Full Parent Set

Page 43: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

43

For brevity, several data sets have been omitted from Table 9. The data sets included were intended to give an overview of the difference in rank be-tween optimal models from the diverse and parent sets. A more complete version of the table can be found in Paper II of this thesis.

Table 9. Comparison of select models identified with diverse and full model sets.

Settinga ACEb ACEc AChEb AChEc r2

pred 0.71 0.71 0.72 0.73 q2 0.70 0.73 0.66 0.70 Score 8.24 8.39 8.11 8.34 Rank 17 1 3 1 Dielectric Function Distance Distance Distance Distance Drop Electrostatics No No ARUV No Electrostatic Maximum

45 45 15 15

Field Type Both Both Both Both H-Bonding No No No No Rep. vdW term 10 8 12 12 Steric Maximum 5 15 5 5 Switch Function No No No No Transform Indicator Indicator Indicator Indicator Vol. Av. Type Box Box None None Setting DHFRb DHFRc THRb THRc r2

pred 0.63 0.64 0.66 0.69 q2 0.67 0.68 0.71 0.71 Score 7.49 7.64 8.01 8.07 Rank 10 1 8 1 Dielectric Function Constant Distance Distance Constant Drop Electrostatics No ARUV No ARUV Electrostatic Maximum

30 5 5 5

Field Type Both Both Both Both H-Bonding No No No No Rep. vdW term 8 12 12 12 Steric Maximum 45 5 45 45 Switch Function No No No No Transform None Squared Squared None Vol. Av. Type Box Box None None a Underlined settings are those considered to be default. b Top-ranking model from subset c Top-ranking model from parent set.

Page 44: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

44

For the AChE data set, the two optimal models differed in rank by only two positions and settings were identical with the exception of the Drop Electro-statics parameter. The similarity of the models is further reflected in their predictive statistics, where they both show virtually identical internal and external predictivity. Subset optimal models for the DHFR and THR show a similar deviation from the rank of their respective parent set optimal models. They are both ranked roughly 10 positions lower than their parent set opti-mal models and therefore show similar predictive abilities. However, subset optimal models differ from the parent set optimal models more markedly in settings used than was the case for AChE. Of the 9 data sets studied, ACE differed most in rank from its parent set optimal model. Although it was ranked 17th, the subset optimal model differed in only 2 settings and still showed similar q2 and r2

pred. We were rewarded to find that the subset contained the top-ranked models

for the BZR and COX-2 data sets. It is interesting to note that for these data sets, it was possible to identify precisely the same model using the subset as was possible using the original parent set. The settings for these models can be found in Table 7.

Page 45: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

45

4. Modeling studies in HCV NS3 protease inhibitor design (Papers III and IV)

The hepatitis C virus currently represents one of the most serious threats to global health.14 This, combined with the high mutation rate and genetic di-versity177,178 of the virus has stimulated the search for new drugs. While re-search efforts have been devoted to attacking various stages of the virus life cycle, inhibition of the HCV NS3 protease is currently one of the most at-tractive. Papers III and IV of this thesis are devoted to the design of two novel classes of HCV NS3 protease inhibitors. In Paper III the more-common P2 proline residue is replaced with phenylglycine in an attempt to improve potency by allowing additional interactions in the protease active site. Paper IV focuses on reducing the peptide character of inhibitors by re-placing �-amino acids with their �-amino acid variants. The following dis-cussion will focus on the ways molecular modeling has been used to formu-late both quantitative and qualitative structure–activity relationships and aid inhibitor design.

4.1 Phenylglycine-based inhibitors The evaluation of phenylglycine as a P2 residue in HCV NS3 inhibitors was inspired by earlier preliminary work, where it was found to be better than proline in a series of tetrapeptide inhibitors. Figure 12 shows a comparison of the structures and potencies for the tetrapeptide proline- and phenylgly-cine-based compounds, showing a 23-fold more favorable potency for the P2 phenylglycine compound 2. This observation is in agreement with other published work, where phenylglycine was shown to be the best aromatic amino acid and similar in potency to 2-naphthylalanine and leucine.179

Page 46: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

46

Figure 12. Ki-values for tetrapeptide HCV NS3 protease inhibitors with proline (1) or phenylglycine (2) in the P2 position.

This considerable gain in potency indicates that inhibitors with phenylgly-cine in the P2 position are capable of interacting with the protease in ways not possible for the analogous proline-based inhibitor. Molecular docking was initially used to suggest probable ligand binding poses and to help pro-vide a qualitative assessment of protein–ligand interactions.

4.1.1 Molecular docking methodology Since biochemical testing is performed in a full-length HCV NS3 enzyme assay comprising both the helicase and protease domains, docking was also performed using the full-length NS3 protein. The NS3 protease/helicase crystal structure (PDB access code 1CU1)27 was refined and used to derive the active site used in docking. The crystal structure was subjected to con-strained minimization and all residues within 9 Å of the last 11 residues of the C-terminus were extracted for use as the docking active site. Further details regarding the derivation of the active site are available in the work of Rönn et al.180

The FLO+ docking suite181 was selected for modeling partly due to its treatment of protein flexibility, which is thought to more reasonably simulate protein–ligand interactions.182-186 A limited Monte Carlo search was per-formed to explore the accessible ligand conformational states within the active site. The FLO+ docking suite allows control over the flexibility of individual protein residues. R155 and K136 were allowed full conforma-tional flexibility without energy penalty while movement of all other resi-dues by more than 0.2 Å was penalized by 20 kJ/(mol/Å2). The 10 lowest-energy unique conformations from the Monte Carlo perturbation were re-tained and subjected to 50 steps of simulated annealing. To retain the ligand

N

ONHO

HN

OHO

OO

NH

SHOH

O

NH

OHN

ONH

OHO

O

HN

OSH

OH

O

1. Ki = 23 �M

2. Ki = 1.0 �M

Page 47: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

47

conformation obtained from the Monte Carlo step, an energy penalty (20 kJ/(mol Å2)) was applied in the simulated annealing step if the similarity distance between two sequential conformations differed by more than 0.2 Å. Due to the relatively featureless nature of the active site, several zero-order bonds were used to hold the ligand near the vicinity of the active site.

4.2.2 Full-length HCV NS3 protein active site The protease active site has been described as a relatively flat and featureless area on the surface of the protein,25,26 however, inclusion of the helicase do-main gives a more enclosed binding site at the interface of the protease and helicase domains.27 Figure 13 shows the relative positions of the protease and helicase domains as well as the position of the catalytic triad, the NS4 cofactor and C-terminus. Figure 14 shows a graphical depiction of the prote-ase and helicase residues defining the S1, S2 and S3 pockets. Near the site of catalysis, the P1 sidechain is encircled by K136, G137, S138 and S139 and the bottom of the S1 pocket is defined by the aromatic sidechain of F154. Residues from both the protease and helicase domains define the S2 pocket. In the orientation of the protein shown in Figure 14, the right and back sides of this pocket are defined by H57 and R155, respectively (protease domain), while the left side and ceiling are defined by Q526 and M485, respectively (helicase domain). Finally, the S3 pocket is located directly above the A157 residue between the protease and helicase domains.

Figure 13. Depiction of the relative positions of the helicase (green) and protease (blue) domains as well as that of the catalytic triad (magenta), NS4 cofactor (orange) and C-terminus (red). This figure was based on the crystal structure with PDB access code 1CU1.27

Page 48: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

48

Figure 14. Stereographic image showing key residues defining the S1, S2 and S3 pockets.

Serine proteases rely on three amino acids for hydrolysis of a peptide bond, which are commonly referred to as the catalytic triad. These include a his-tidine, an aspartic acid and a serine residue. The catalytic triad in the HCV NS3 protease is comprised of H57, D81 and S139. H57 and S139 can be seen in the figure but D81 is further from the S1 pocket and was omitted for clarity. The so-called oxyanion hole is another feature often referred to in HCV NS3 protease inhibitor design. The oxyanion hole is defined by the backbone NHs of G137 and S139187,188 and is an important binding site for the oxyanion in natural substrate cleavage.

4.2.3 SAR of phenylglycine-based inhibitors Docking of the tetrapeptide phenylglycine compound 2 indicated that it may participate in two interactions with the enzyme not possible in the proline-based analogue 1. The cyclic nature of the proline amino acid prevents the amide nitrogen from acting as a hydrogen bond donor. However, this is not the case for phenylglycine. As seen in Figure 15 the P2 NH is located close to the Q526 sidechain carbonyl moiety and oriented properly for hydrogen bonding to occur. The phenylglycine residue may also participate in a pi-pi stacking interaction with the imidazole sidechain of H57.

Page 49: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

49

Figure 15. Stereographic image of 1 (blue) and 2 (red) shown with the resi-dues H57 and Q526 (orange) of the HCV NS3 protein. A pi-pi stacking in-teraction is possible between the H57 imidazole sidechain and the phenyl-glycine P2 sidechain (centroid–centroid distance of 3.5 Å). Structures of both the inhibitor and amino acid residues were obtained from docking studies and the positions of the protein atoms were obtained from the docking re-sults of 2.

To explore the potential of phenylglycine as a P2 residue, a total of 26 tripep-tides were synthesized and tested. Various P3 (valine and tert-leucine) and P1 (norvaline, ACCA, vinyl-ACCA, and difluoro-Abu) sidechains and C-terminal groups (carboxylic acid and acyl sulfonamide) were tested in con-junction with a number of phenylglycine substituents, many of which have been used successfully in the past with proline-based inhibitors. Macrocycli-zation was also explored by forming 15-membered rings connecting the P1 and P3 sidechains. An overview of those structures and their measured activi-ties is presented in Table 10. For brevity, they will not all be reproduced here but may be found in Paper III of this thesis.

H57 H57

Q526Q526

2.17 Å 2.17 Å

Page 50: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

50

Table 10. Structure and potency of selected phenylglycine-based tripeptides.

NH

R1

HNO

HN

R2 R4

OR3

O

O

O

O

Cmpda R1 R2 R3 R4 Ki ± SD (�M)

3 (44a) NH

SO O

5.4 ± 0.3

4 (42)

N

Cl

NH

SO O

2.6 ± 0.1

5 (40a) N

Cl

NH

SO O

0.41 ± 0.09

6 (37) NO

NH

SO O

0.18 ± 0.04

7 (46) NO

NH

SO O

0.20 ± 0.04

8 (45)

NONH

SO O

0.28 ± 0.02

9 (36) NO

NH

SO O

1.2 ± 0.2

10 (39)

F

F

NONH

SO O

0.78 ± 0.1

11 (60)

NONH

SO O

0.076 ± 0.02

12 (61)

N

Cl

NH

SO O

0.074 ± 0.01

a Compound name shown in parentheses reflects the number assigned in paper III.

Page 51: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

51

Truncation of tetrapeptide 2 combined with use of tert-leucine, vinyl-ACCA, cyclopropyl acyl sulfonamide and methoxy as P3, P1, C-terminal and P2 sidechain extension groups, respectively, resulted in inhibitor 3 (Ki = 5.4 �M). Replacement of the P2 methoxy sidechain extension with 2-chloropyridin-6-yloxy and 3-chloroisoquinoline furnished 4 and 5, showing a 2- and 13-fold improvement in potency, respectively. A further 2-fold im-provement in potency was achieved with the 7-methoxy-2-phenylquinolin-4-yloxy P2 sidechain extension, providing 6 with a Ki of 0.18 �M and demon-strating a 30-fold improvement in potency over 3.

In an attempt to further improve potency, several P1 and P3 sidechains were explored. Although in other work P3 tert-leucine has been more suc-cessful than valine,189-191 valine was still tested in the P3 position. 6 was modified, substituting its P3 tert-leucine with valine, furnishing 7, however, no change in potency was observed. Further modifications were made, using norvaline in the P1 position as well as phenyl acyl sulfonamide in the C-terminal, furnishing 8 and demonstrating a 1.5-fold loss in potency. Other attempts at P1 optimization were unsuccessful, rendering inhibitors 9 (Ki = 1.2 �M) and 10 (Ki = 0.78 �M) with ACCA and difluoro-Abu P1 residues, respectively.

Inhibitors were cyclized by joining the P3 and P1 sidechains, forming a 15-member macrocycle, which has been effective in other HCV NS3 prote-ase inhibitors such as ciluprevir (Figure 1).192,193 This was successful in pro-ducing the most potent inhibitors in the series, 11 and 12, with Ki values of 0.076 and 0.074 �M, respectively, and representing a 2-fold improvement over 6.

In agreement with NMR studies on phenylglycine compounds,194 docking suggested that the phenylglycine sidechain of 2 probably lies perpendicular to the inhibitor backbone. In this conformation, the aromatic phenyl ring is parallel to the imidazole ring of H57 and within range for a pi-pi stacking interaction. Docking indicates that extension of the P2 sidechain with bulky groups, such as 7-methoxy-2-phenylquinolin-4-yloxy, may introduce some strain in the ether linker. Strain in the ether linker may be relieved by rota-tion of the phenylglycine sidechain to a conformation parallel to the inhibitor backbone. This strain, combined with the slightly larger size and flexibility of the P2 sidechain may account for why inhibitors in the phenylglycine se-ries did not reach the potency observed in similar P2-proline compounds.

4.2 �-Amino acid replacement Currently, almost all active NS3 protease inhibitors have been designed us-ing peptidomimetic techniques, using the natural substrate as a starting point. While all compounds known to have entered clinical trials have been de-signed using peptide truncation, non-natural amino acids, sidechain exten-

Page 52: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

52

sions and/or cyclizations (Figure 1), all have been designed using �-amino acid backbones. In an effort to further reduce the peptide character of NS3 protease inhibitors, we have designed a series of novel tripeptide inhibitors comprised of a mix of �- and �-amino acids.

�-Amino acids differ from their �-amino acid counterparts in the spacing between the amine and carboxyl termini. As seen in Figure 16, �-amino ac-ids have an extra methylene carbon between the termini and sidechain sub-stitution may occur on either of these carbons, allowing for the �2 and �3 variants. While the insertion of a methylene group into the peptide backbone introduces additional degrees of conformational freedom,195 it also provides an additional site for ligand optimization.196

Figure 16. Structure of �-, �2- and �3-amino acid backbone structures. The insertion of a �-amino acid into a peptide backbone has been successful in producing potent inhibitors resistant to proteolysis196-200 and has been used in an attempt to improve the selectivity of serine protease inhibitors.201 In the design of inhibitors of the HCV NS3 protease, Colarusso et al. have also published the structure of one compound containing a C-terminal �-amino acid.202 Several studies have also shown that exchange of native �-amino acids with �-variants results in ligands with improved resistance to cleav-age.203,204

The synthetic accessibility of �-, �-mixed peptides and their noted resis-tance to proteolysis led us to explore their potential in HCV NS3 protease inhibitor design. To explore the impact of �-amino acid insertion on inhibi-tory potency, a �-amino acid scan was performed, inserting one ��-amino acid into the P1, P2 or P3 positions of two tripeptide reference compounds. The reference inhibitors,180,205 shown in Figure 17, contain both carboxylic acid and acyl sulfonamide C-terminal groups and were previously found to be of reasonable potency.

NH O

NH

O

�3-Alanine�-Alanine

NH

O

�2-Alanine

Page 53: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

53

Figure 17. Structure and potency of �-amino acid reference inhibitors.

4.2.1 Hydrogen bonding network of HCV NS3 protease inhibitors The reference inhibitors 13 and 14 are thought to form the hydrogen-bonding network as depicted in Figure 18, which is consistent with hydrogen boding indicated in other work. Inhibitors of the HCV NS3 protease are thought to form an extended anti-parallel �-strand with the protease backbone, similar to that seen in the full-length protein crystal structure,27 allowing for hydro-gen bonding between the P3 residue and protease residue A157. Residues G137 and S139 define the so-called oxyanion hole,187,188 which is thought to be a key site for protein–ligand interaction. Hydrogen bonds have also been noted between the ligand and the R155 backbone.193,206

Figure 18. Protein–ligand hydrogen bonding interactions predicted for 13.

N

ONHO

O

O

N

O

O

HN

OH

ON

ONHO

O

O

N

O

O

HN

NH

O

SO O

13 Ki 0.32 �M 14 Ki 0.055 �M

N

ONHO

O

O

N

O

O

HN

OH

O

NH

ONH

NH2HN

NHO

HN

NH

HN

OHO

O

OHO

R155

S139

S138

A157

G137

Page 54: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

54

4.2.2 Biochemical test results Compounds 15-21 were tested in an HCV NS3 enzyme assay comprising the full-length (both protease and helicase) NS3 protein. Chemical structures and their biochemical test results (shown as Ki values) are shown in Table 11. Table 11. Structures and biochemical test results for tripeptides containing a �-amino acid.

Cmpd.a Scaffold R Ki ± SD

(�M)

15 (21) O

OH 18 ± 5

16 (24)

O

NH

SO

O

14 ± 7

17 (2)

N

ONHO

O

O

N

O

O

HN R

O

OHO 1.0 ± 0.6

18 (22) O

OH 61 ± 13

19 (25) N

ONHO

O

O

N

O

NH

O

R

O

NH

SO

O

1.8 ± 0.5

20 (23) O

OH 17 ± 3

21 (26) N

O

O

N

O

O

HN R

HN

OO

O

NH

SO

O

1.2 ± 0.4

a Compound name in parenthesis reflects the number assigned in paper IV

4.2.3 SAR of �-amino acid-containing inhibitors Biochemical testing indicated that an unfortunate result of �-amino replace-ment was an overall loss in potency as compared to the reference inhibitors 13 and 14. To help understand how changes in ligand structure led to differ-

Page 55: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

55

ences in test results, the inhibitors presented in Table 11 were subjected to flexible docking using the methodology given in Section 4.1.1.

Acyl sulfonamide inhibitors with �-amino acids in the P3 and P2 positions (21 and 19, respectively) were equipotent (Ki = 1.2 and 1.8 �M, respec-tively). However �-amino acid insertion in the P1 position in inhibitor 16 resulted in a 7-fold loss in potency (Ki = 14 �M).

Since replacement of an �-amino acid with a �-variant results in a longer, more flexible inhibitor, the use of C-terminal carboxylic acids was also ex-plored. Inhibitors comprising an acyl sulfonamide C-terminal moiety were generally more potent than their carboxylic acid counterparts, which is in agreement with previous work.180,207,208 Carboxylic acids with �-amino acids in the P3 (20, Ki = 17 �M) and P1 (15, Ki = 18 �M) positions were equipo-tent. �-amino acid insertion in the P2 position proved to be detrimental to potency, rendering 18, with a Ki = 61 �M and 4-fold less potent than 15 and 20.

Whereas incorporation of a �-amino acid in the P1 position in the acyl sul-fonamide series resulted in the greatest loss in potency, �-amino acid inser-tion in the P2 position was most detrimental to activity for the carboxylic acids. 18, with a measured activity of 61 �M, is approximately 4-fold lower in potency than 15 and 20. Docking studies indicated that insertion of the �-amino acid in the P2 position disrupted the hydrogen bond with R155, form-ing instead a hydrogen bond with the terminal NH of the K136 sidechain, which in turn disrupted the C-terminal hydrogen bonds in the oxyanion hole.

Although all �-amino acid-containing inhibitors were of lower potency than the reference inhibitors, docking indicated that the inhibitors represent-ing the greatest loss in potency (18 and 16) may have suffered from poor interactions in the oxyanion hole. To gain further understanding of the oxyanion hole interaction, 17 was designed, comprising an �-ketoacid C-terminal group. Docking indicated that the �-keto group would be positioned appropriate for interaction in the oxyanion hole, as seen in Figure 19.

Page 56: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

56

Figure 19. Depiction of the hydrogen-bonding network seen in docking studies of 17. The P3 valine residue is seen lying anti-parallel to A157 of the protease. The P1 NH is located close to the R155 carbonyl and the �-keto C-terminal group is seen hydrogen bonding with G137 and S139. Hydrogen bond distances varied between 2.8 and 3.4 Å. Due to the steric hindrance caused by the bulky cyclopropyl P1 sidechain, inhibitors such as 17 may not function as electrophilic inhibitors.207,209 Simi-lar compounds with a ketoamide C-terminal group and an �,�-disubstituted P1 sidechain have been shown to be of low potency.210,211 In contrast to the loss in potency observed in structurally similar electrophilic inhibitors, bio-chemical testing indicated an improvement in potency. 17 (Ki = 1.0 �M), with an �-keto acid C-terminal group and cyclopropyl P1 sidechain is roughly 18-fold more potent than the corresponding P1 �-amino acid-containing inhibitor 15. This improvement in potency supports the impor-tance of the oxyanion hole interaction and indicates that 17 likely binds re-versibly with the protease.

Page 57: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

57

4.2.4 CoMFA modeling CoMFA modeling was performed using all relevant tripeptide HCV NS3 protease inhibitors previously published by our research group180,208,212 as well as the �-amino acid-containing inhibitors under investigation. Molecu-lar docking provided compound alignment and 3-dimensional structures. Prior to use in CoMFA, structures were subjected to 10 steps of minimiza-tion using the Tripos force field56 and partial atomic charges were calculated using the Gasteiger-Hückel method.153,213 The default CoMFA region and a C-sp3 probe atom was used for all CoMFA models.

Although a total of 75 HCV NS3 inhibitors were available for CoMFA model derivation, preliminary investigations indicated that a local model, based only on proline-based inhibitors, showed better predictive perform-ance than the larger global model including both proline- and phenylglycine-based inhibitors. After removal of the phenylglycine-based inhibitors 54 compounds remained, comprising various P1, P3 and C-terminal groups. The structures and affinities of these inhibitors are available in the supplementary material of Paper IV. To find more suitable CoMFA parameters and thus improve predictive performance, our CoMFA parameter optimization meth-odology was used.214

Training and test set division was performed using hierarchical clustering using CoMFA fields and biological activity data (expressed as pKi ). Two-thirds of the compounds (36) were selected for use in the training set and the remaining one-third (18) of compounds were set aside as an external valida-tion set.

4.2.4.1 Comparison of default and enhanced CoMFA models Although it was possible to derive a satisfactory CoMFA model using de-fault parameters (q2 = 0.31, r2

pred = 0.56), predictive performance was en-hanced using our parameter optimization methodology (q2 = 0.48, r2

pred = 0.68). The improvement in predictive performance may be visualized in Figure 20, where the prediction of test set compounds using the default and optimized models is compared. The improved model differed from the de-fault model in the Electrostatic (“60”) and Steric (“15”) cutoffs, Transform (“Indicator”), Switch Function (“No”), Volume Averaging Type (“Box”) and the Drop Electrostatics (“No”) setting.

Page 58: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

58

Figure 20. Experimental versus predicted pKi values for test set compounds showing a comparison of the predictive quality of the default and improved CoMFA models. The placement of points relative to the line gives an indica-tion of the predictive accuracy of each model.

In addition to its merit as a predictive tool, visualization of CoMFA fields gives an indication of which ligand features are important for potency. Al-though CoMFA models are developed without knowledge of the protein structure, fields may be overlaid with the protein active site and provide valuable information for ligand optimization as well as model validation. Figure 21 shows the steric and electrostatic contour maps overlaid with the protein surface.

-3

-1

1

3

-3.0 -1.0 1.0 3.0

Predicted pK i

Exp

erim

enta

l p

Ki

CoMFA Opt.

CoMFA Def.

Page 59: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

59

Figure 21. Steric and electrostatic contours derived from our improved CoMFA model overlaid with the HCV NS3 protein surface and 17. Electro-static maps are shown in red and blue; red regions are those where the pres-ence of electronegative moieties are associated with high potency and blue surfaces indicate the opposite. Steric maps are shown in green and yellow; green surfaces indicate areas where steric bulk is favorable while yellow surfaces indicated the opposite.

Some notable features of Figure 21 include the red surface located between S139 and Q41 sidechains, the blue surface near the R155 backbone carbonyl and the green surfaces near the M485 and K136 sidechains. Docking showed that for C-terminal acyl sulfonamide compounds at least one of the sulfone oxygens is located in or near the red surface between the S139 and Q41 sidechains. The surface can be rationalized on the basis that compounds comprising C-terminal acyl sulfonamides are often of higher potency than the corresponding carboxylic acids. The blue region near the R155 backbone carbonyl and the P1 nitrogen may indicate the importance of hydrogen bond-ing between these groups. Finally, the green surfaces near M485 and K136 indicate the importance of hydrophobic interactions in these areas.

Page 60: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

60

4.2.5 Summary Although biochemical testing showed that the �-amino acid-containing in-hibitors produced in this study were of lower potency than their reference �-tripeptides, several inhibitors were of �M potency. Biochemical test results indicate the importance of oxyanion hole interactions, as demonstrated by �-ketoacid 17, where the addition of a carbonyl group in the �-position re-sulted in a 18-fold improvement over 15. (15 is comprised of a P1 �-amino acid and C-terminal carboxylic acid.) Docking indicated that, depending on the position of �-amino acid replacement, some ligand hydrogen bond do-nors or acceptors may be unsatisfied, which is known to be detrimental to potency.215 The loss in potency associated with �-amino acid insertion may also be due to their greater degree of conformational flexibility as compared to �-peptides. Docking also suggested that inhibitors were least sensitive to �-amino acid insertion in the P3 position. Information gained from docking and the subsequent CoMFA models should prove useful for future ligand optimization.

Page 61: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

61

5. Concluding remarks

This thesis has focused on the development of methodology for improving the predictive performance of CoMFA and modeling studies of HCV NS3 protease inhibitors. Development of improved CoMFA methodology:

� It has been possible to find CoMFA models with enhanced predic-

tive performance by finding suitable parameters. This was initially done by evaluating 6120 CoMFA models and found to consistently improve models for nine data sets.

� This methodology has been validated by various means and found to

be unlikely to result in spurious CoMFA models, indicating its gen-eral applicability.

� The CoMFA improvement methodology was refined by selecting a

diverse subset of the original 6120 CoMFA models using the Max-Min diversity selection algorithm. Results indicate that by evaluating only 5% (~350 models) of the original 6120 models similar im-provements in predictive performance were attainable for these 9 data sets.

Modeling of HCV NS3 protease inhibitors:

� Molecular docking investigations of the P2-phenylglycine based in-

hibitors aided in understanding their SAR. Although phenylglycine-based inhibitors may participate in two interactions not possible for analogous proline-based compounds, docking indicated that the lar-ger and more flexible phenylglycine group may not fit the protein as well.

� Docking studies on the �3-amino acid-substituted inhibitors sug-

gested that the P3 position is least sensitive to backbone modifica-tion. The study also stressed the importance of retaining electrostatic interactions in the oxyanion hole.

Page 62: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

62

� Docking also provided predicted bioactive conformations and the alignment necessary for the derivation of CoMFA models. Applica-tion of the CoMFA improvement methodology was successful in improving q2 from 0.31 to 0.48 and r2

pred from 0.56 to 0.68.

Page 63: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

63

6. Acknowledgements

I would like to express my sincere gratitude to the following individuals: Doc. Anders Karlén, my supervisor, for your patience, helpful advice and encouragement. Thanks also for providing such a nice modeling group! Dr. Anja Sandstöm, my co-supervisor, for your openness and appreciation for modeling and for getting me involved in so many interesting HCV pro-jects. Prof. Anders Hallberg, the department head, for providing a cheerful and open atmosphere at work. Gunilla Eriksson, the department secretary, for holding the entire department together with your great organizational skill. Dr. Wesley Schaal, for taking the time to introduce me to CoMFA, computer programming and Unix system administration. I’ve learned a lot from you and appreciate your efforts. Dr. Christian Sköld, my officemate, friend and colleague. Thanks for pa-tiently answering all of my stupid questions over the years and for all of your interesting “Christianisms” that lack all rational basis. One of these days I’ll find a way to annoy you. Dr. Robert Rönn: Thanks for all of your patient and helpful HCV coaching over the past few months! All other members of the Comp Chem group, both past and present: Susi, Daniel, Yogesh, Micael, Anna, Anneli, Patrik and Aparna. Good stuff, keep on truckin’. Sorin Srbu for your continuing cooperation in sys admin. Fun times. To the HCV group: Anja, Eva, Anna, Johanna, Pernilla and Robert for con-tinuing to provide a real application for molecular modeling.

Page 64: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

64

To the rest of OFK: Thanks for all of the fun parties, coffee breaks and inter-esting discussions. Prof. Joe Francisco for the wonderful introduction to computational chemis-try. Susan Schlachtenhaufen for inspiring me to explore other cultures and lan-guages. My father, for encouraging hard work, my mother for encouraging creativity, my brothers for all of the hilarity over the years and my grandma for the fantastic memories. Ralph and Gunilla for all the fun weekends, vacations and adventures in Sweden. To Cecilia for all of your help, love and support. I couldn’t have done it without you. To Clara for making life even more worthwhile.

Page 65: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

65

7. References

(1) Jorgensen, W. L. The Many Roles of Computation in Drug Discov-ery. Science 2004, 303, 1813-1818.

(2) Barril, X.; Soliva, R. Molecular Modelling. Mol. Biosyst. 2006, 2, 660-681.

(3) Tramontano, A. The role of molecular modelling in biomedical re-search. FEBS Lett. 2006, 580, 2928-2934.

(4) Brooijmans, N.; Kuntz, I. D. Molecular Recognition and Docking Algorithms. Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 335-373.

(5) Sousa, S. F.; Fernandes, P. A.; Ramos, M. J. Protein-Ligand Dock-ing: Current Status and Future Challenges. Proteins: Struct., Funct., Bioinf. 2006, 65, 15-26.

(6) Shoichet, B. K.; McGovern, S. L.; Wei, B.; Irwin, J. J. Lead discov-ery using molecular docking. Curr. Opin. Chem. Biol. 2002, 6, 439-446.

(7) Kitchen, D. B.; Decornez, H.; Furr, J. R.; Bajorath, J. Docking and Scoring in Virtual Screening for Drug Discovery: Methods and Ap-plications. Nat. Rev. Drug Disc. 2004, 3, 935-949.

(8) Warren, G. L.; Andrews, C. W.; Capelli, A.-M.; Clarke, B.; LaLonde, J.; Lambert, M. H.; Lindvall, M.; Nevins, N.; Semus, S. F.; Senger, S.; Tedesco, G.; Wall, I. D.; Woolven, J. M.; Peishoff, C. E.; Head, M. S. A Critical Assessment of Docking Programs and Scoring Functions. J. Med. Chem. 2006, 49, 5912-5931.

(9) Kubinyi, H. QSAR and 3D QSAR in drug design. Part 1: methodol-ogy. Drug Discov. Today 1997, 2, 457-467.

(10) Kubinyi, H. QSAR and 3D QSAR in drug design. Part 2: applica-tions and problems. Drug Discov. Today 1997, 2, 538-546.

(11) Prathipati, P.; Dixit, A.; Saxena, A. K. Computer-aided drug design: integration of structure-based and ligand-based approaches in drug design. Curr. Comput.-Aided Drug Des. 2007, 3, 133-148.

(12) Choo, Q. L.; Kuo, G.; Weiner, A. J.; Overby, L. R.; Bradley, D. W.; Houghton, M. Isolation of a cDNA Clone Derived from a Blood-borne Non-A, Non-B Viral Hepatitis Genome. Science 1989, 244, 359-362.

(13) Chisari, F. V. Unscrambling hepatitis C virus-host interactions. Na-ture 2005, 436, 930-932.

(14) Shepard, C. W.; Finelli, L.; Alter, M. J. Global Epidemiology of Hepatitis C Virus Infection. Lancet Infect. Dis. 2005, 5, 558-567:.

(15) Brown, R. S. Hepatitis C and Liver Transplantation. Nature 2005, 436, 973-978.

Page 66: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

66

(16) Kim, W. R. The Burden of Hepatitis C in the United States. Hepa-tology 2002, 36, S30-34.

(17) Pearlman, B. L. Hepatitis C Treatment Update. Am. J. Med. 2004, 117, 344-352.

(18) Houghton, M.; Abrignani, S. Prospects for a vaccine against the hepatitis C virus. Nature 2005, 436, 961-966.

(19) Pawlotsky, J.-M.; Chevaliez, S.; McHutchison, J. G. The Hepatitis C Virus Life Cycle as a Target for New Antiviral Therapies. Gastroen-terology 2007, 132, 1979-1998.

(20) Pawlotsky, J.-M. Treatment of hepatitis C: don't put all your eggs in one basket! Gastroenterology 2007, 132, 1611-1615.

(21) Bartenschlager, R.; Lohmann, V. Replication of Hepatitis C virus. J. Gen. Virol. 2000, 81, 1631-1648.

(22) Grakoui, A.; McCourt, D. W.; Wychowski, C.; Feinstone, S. M.; Rice, C. M. Characterization of the Hepatitis C Virus-Encoded Ser-ine Proteinase: Determination of Proteinase-Dependent Polyprotein Cleavage Sites. J. Virol. 1993, 67, 2832-2843.

(23) Bartenschlager, R.; Ahlborn-Laake, L.; Mous, J.; Jacobsen, H. Non-structural Protein 3 of the Hepatitis C Virus Encodes a Serine-Type Proteinase Required for Cleavage at the NS3/4 and NS4/5 Junctions. J. Virol. 1993, 67, 3835-3844.

(24) Tomei, L.; Failla, C.; Santolini, E.; De Francesco, R.; La Monica, N. NS3 is a Serine Protease Required for Processing of Hepatitis C Vi-rus Polyprotein. J. Virol. 1993, 67, 4017-4026.

(25) Kim, J. L.; Morgenstern, K. A.; Lin, C.; Fox, T.; Dwyer, M. D.; Landro, J. A.; Chambers, S. P.; Markland, W.; Lepre, C. A.; O'Mal-ley, E. T.; Harbeson, S. L.; Rice, C. M.; Murcko, M. A.; Caron, P. R.; Thomson, J. A. Crystal Structure of the Hepatitis C virus NS3 Protease Domain Complexed with a Synthetic NS4A Cofactor Pep-tide. Cell 1996, 87, 343-355.

(26) Love, R. A.; Parge, H. E.; Wickersham, J. A.; Hostomsky, Z.; Habuka, N.; Moomaw, E. W.; Adachi, T.; Hostomska, Z. The Crys-tal Structure of Hepatitis C Virus NS3 Proteinase Reveals a Trypsin-like Fold and a Structural Zinc Binding Site. Cell 1996, 87, 331-342.

(27) Yao, N.; Reichert, P.; Taremi, S. S.; Prosise, W. W.; Weber, P. C. Molecular Views of Viral Polyprotein Processing Revealed by the Crystal Structure of the Hepatitis C Virus Bifunctional Protease-Helicase. Structure 1999, 7, 1353-1363.

(28) Foy, E.; Li, K.; Wang, C.; Sumpter, R., Jr.; Ikeda, M.; Lemon, S. M.; Gale, M., Jr. Regulation of Interferon Regulatory Factor-3 by the Hepatitis C Virus Serine Protease. Science 2003, 300, 1145-1148.

(29) Foy, E.; Li, K.; Sumpter, R., Jr.; Loo, Y.-M.; Johnson, C. L.; Wang, C.; Fish, P. M.; Yoneyama, M.; Fujita, T.; Lemon, S. M.; Gale, M., Jr. Control of Antiviral Defenses Through Hepatitis C Virus Disrup-tion of Retinoic Acid-Inducible Gene-I Signaling. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 2986-2991.

Page 67: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

67

(30) Li, K.; Foy, E.; Ferreon, J. C.; Nakamura, M.; Ferreon, A. C. M.; Ikeda, M.; Ray, S. C.; Gale, M., Jr.; Lemon, S., M. Immune evasion by hepatitis C virus NS3/4A protease-mediated cleavage of the Toll-like receptor 3 adaptor protein TRIF. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 2992-2997.

(31) Johnson, C. L.; Owen, D. M.; Gale, M., Jr. Functional and Thera-peutic Analysis of Hepatitis C Virus NS3.4A Protease Control of Antiviral Immune Defense. J. Biol. Chem. 2007, 282, 10792-10803.

(32) Lamarre, D.; Anderson, P. C.; Bailey, M.; Beaulieu, P.; Bolger, G.; Bonneau, P.; Boes, M.; Cameron, D. R.; Cartier, M.; Cordingley, M. G.; Faucher, A.-M.; Goudreau, N.; Kawai, S. H.; Kukolj, G.; La-gace, L.; LaPlante, S. R.; Narjes, H.; Poupart, M.-A.; Rancourt, J.; Sentjens, R. E.; St. George, R.; Simoneau, B.; Steinmann, G.; Thibeault, D.; Tsantrizos, Y. S.; Weldon, S. M.; Yong, C.-L.; Llinas-Brunet, M. An NS3 Protease Inhibitor with Antiviral Effects in Hu-mans Infected with Hepatitis C Virus. Nature 2003, 426, 186-189.

(33) Lin, K.; Perni, R. B.; Kwong, A. D.; Lin, C. VX-950, A Novel Hepatitis C Virus (HCV) NS3-4A Protease Inhibitor, Exhibits Potent Antiviral Activities in HCV Replicon Cells. Antimicrob. Agents Chemother. 2006, 50, 1813-1822.

(34) Press Release at the Vertex Website: http://www.schering-plough.com/schering_plough/news/release.jsp?releaseID=983489 (Accessed April, 2007).

(35) Malcolm, B. A.; Liu, R.; Lahser, F.; Agrawal, S.; Belanger, B.; But-kiewicz, N.; Chase, R.; Gheyas, F.; Hart, A.; Hesk, D.; Ingravallo, P.; Jiang, C.; Kong, R.; Lu, J.; Pichardo, J.; Prongay, A.; Skelton, A.; Tong, X.; Venkatraman, S.; Xia, E.; Girijavallabhan, V.; Njoroge, F. G. SCH 503034, A Mechanism-Based Inhibitor of Hepatitis C Virus NS3 Protease, Suppresses Polyprotein Maturation and Enhances the Antiviral Activity of Alpha Interferon in Replicon Cells. Antimicrob. Agents Chemother. 2006, 50, 1013-1020.

(36) Press Release at the Schering-Plough Website: http://www.schering-plough.com/schering_plough/news/release.jsp?releaseID=983489 (Accessed April, 2007).

(37) Condroski, K. R.; Zhang, H.; Seiwart, S. D.; Ballard, J. A.; Bernat, B. A.; Brandhuber, B. J.; Andrews, S. W.; Josey, J. A.; Blatt, L. M. Structure-Based Design of Novel Isoindoline Inhibitors of HCV NS3/4A Protease and Binding Mode Analysis of ITMN-191 by X-ray Crystallography. Digestive Disease Week, May 20-25, Los Ange-les 2006, Poster T1794.

(38) Press Release at the InterMune Website: http://www.corporate-ir.net/ireye/ir_site.zhtml?ticker=ITMN&script=410&layout=6&item_id=983548 (Accessed May, 2007).

(39) Press Release at the Medivir Website: http://www.medivir.se/v3/en/ir_media/press_releases.aspx?id=121 (Accessed May, 2007).

Page 68: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

68

(40) Richardson, B. W. Lectures on Experimental and Practical Medi-cine. Medical Times and Gazette 1869, 2, 703.

(41) Free, S. M., Jr.; Wilson, J. W. A Mathematical Contribution to Structure-Activity Studies. J. Med. Chem. 1964, 7, 395-399.

(42) Hansch, C.; Fujita, T. �-�-�Analysis; A Method for the Correlation of Biological Activity and Chemical Structure. J. Am. Chem. Soc. 1964, 86, 1616-1626.

(43) Kubinyi, H. From Narcosis to Hyperspace: The History of QSAR. Quant. Struct.-Act. Relat. 2002, 21, 348-356.

(44) Green, S. M.; Marshall, G. R. 3D-QSAR: a current perspective. Trends Pharmacol. Sci. 1995, 16, 285-291.

(45) Oprea, T. I.; Waller, C. L. Theoretical and Practical Aspects of Three-Dimensional Quantitative Structure-Activity Relationships. Reviews in Computational Chemistry; Wiley-VCH: New York, 1997; pp 127-182.

(46) Doweyko, A. M. 3D-QSAR illusions. J. Comput.-Aided Mol. Des. 2004, 18, 587-596.

(47) Oprea, T. I. 3D QSAR Modeling in Drug Design. In Computational Medicinal Chemistry for Drug Discovery; Marcel Dekker: New York, 2004; pp 571-616.

(48) Simon, Z.; Badilescu, I.; Racovitan, T. Mapping of Dihydrofolate-Reductase Receptor Site by Correlation with Minimal Topological (Steric) Differences. J. Theor. Biol. 1977, 66, 485-495.

(49) Simon, Z.; Dragomir, N.; Plauchithiu, M. G.; Holban, S.; Glatt, H.; Kerek, F. Receptor site mapping for cardiotoxic aglycones by the minimal steric difference method. Eur. J. Med. Chem. 1980, 15, 521-527.

(50) Hopfinger, A. J. A QSAR Investigation of Dihydrofolate Reductase Inhibition by Baker Triazines Based Upon Molecular Shape Analy-sis. J. Am. Chem. Soc. 1980, 102, 7196-7206.

(51) Ghose, A. K.; Crippen, G. M. Use of Physicochemical Parameters in distance Geometry and Related Three-Dimensional Quantitative Structure-Activity Relationships: A Demonstration Using Es-cherichia Coli Dihydrofolate Reductase Inhibitors. J. Med. Chem. 1985, 28, 333-346.

(52) Cramer, R. D., III; Bunce, J. D. The DYLOMMS Method: Initial Results From A Comparative Study of Approaches to 3D QSAR. Pharmacochem. Libr. 1987, 10, 3-12.

(53) Wold, S.; Ruhe, A.; Wold, H.; Dunn, W. J. SIAM J. Sci. Stat. Com-put. 1984, 5, 735.

(54) Cramer, R. D., III; Patterson, D. E.; Bunce, J. D. Comparative Mo-lecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 1988, 110, 5959-5967.

(55) Tripos Inc. SYBYL; 7.0-7.3 ed.: 1699 South Hanley Rd., St. Louis, Missouri, 63144, USA.

Page 69: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

69

(56) Clark, M.; Cramer, R. D., III; Van Opdenbosch, N. Validation of the General Purpose Tripos 5.2 Force Field. J. Comput. Chem. 1989, 10, 982-1012.

(57) Goodford, P. J. A Computational Procedure for Determining Ener-getically Favorable Binding Sites on Biologically Important Macro-molecules. J. Med. Chem. 1985, 28, 849-857.

(58) Kim, K. H.; Greco, G.; Novellino, E. A Critical Review of Recent CoMFA Applications. Perspect. Drug Discovery Des. 1998, 12/13/14, 257-315.

(59) Norinder, U. Recent Progress in CoMFA Methodology and Related Techniques. Perspect. Drug Discovery Des. 1998, 12/13/14, 25-39.

(60) Novellino, E.; Fattorusso, C.; Greco, G. Use of Comparative Mo-lecular Field Analysis and Cluster Analysis in Series Design. Phar-maceutica Acta Helvetiae 1995, 70, 149-154.

(61) Golbraikh, A.; Tropsha, A. Predictive QSAR Modeling Based on Diversity Sampling of Experimental Datasets for the Training and Test Set Selection. Mol. Diversity 2002, 5, 231-243.

(62) Golbraikh, A.; Shen, M.; Xiao, Z.; Xiao, Y.-D.; Lee, K.-H.; Tropsha, A. Rational Selection of Training and Test Sets for the Development of Validated QSAR Models. J. Comput.-Aided Mol. Des. 2003, 17, 241-253.

(63) Clark, R. D. Boosted Leave-Many-Out Cross-Validation: the Effect of Training and Test Set Diversity on PLS Statistics. J. Comput.-Aided Mol. Des. 2003, 17, 265-275.

(64) Leonard, J. T.; Roy, K. On Selection of Training and Test Sets for the Development of Predictive QSAR Models. QSAR Comb. Sci. 2006, 25, 235-251.

(65) Wang, R.; Gao, Y.; Liu, L.; Lai, L. All-Orientation Search and All-Placement Search in Comparative Molecular Field Analysis. J. Mol. Model. 1998, 4, 276-283.

(66) Cho, S. J.; Tropsha, A. Cross-Validated R2-Guided Region Selection for Comparative Molecular Field Analysis: A Simple Method To Achieve Consistent Results. J. Med. Chem. 1995, 38, 1060-1066.

(67) Pastor, M.; Cruciani, G.; Clementi, S. Smart Region Definition: A New Way to Improve the Predictive Ability and Interpretability of Three-Dimensional Quantitative Structure-Activity Relationships. J. Med. Chem. 1997, 40, 1455-1464.

(68) Tropsha, A.; Cho, S. J. Cross-Validated R2 Guided Region Selection for CoMFA Studies. Perspect. Drug Discovery Des. 1998, 12/13/14, 57-69.

(69) Bucholtz, E. C.; Tropsha, A. The Effect of Region Size on CoMFA Analyses. Med. Chem. Res. 1999, 9, 675-685.

(70) DePriest, S. A.; Mayer, D.; Naylor, C. B.; Marshall, G. R. 3D-QSAR of Angiotensin-Converting Enzyme and Thermolysin Inhibi-tors: A Comparison of CoMFA Models Based on Deduced and Ex-perimentally Determined Active Site Geometries. J. Am. Chem. Soc. 1993, 115, 5372-5384.

Page 70: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

70

(71) Klebe, G.; Abraham, U. On the Prediction of Binding Properties of Drug Molecules by Comparative Molecular Field Analysis. J. Med. Chem. 1993, 36, 70-80.

(72) Waller, C. L.; Oprea, T. I.; Giolitti, A.; Marshall, G. R. Three-Dimensional QSAR of Human Immunodeficiency Virus (I) Protease Inhibitors. 1. A CoMFA Study Employing Experimentally-Determined Alignment Rules. J. Med. Chem. 1993, 36, 4152-4160.

(73) Waller, C. L.; Marshall, G. R. Three-Dimensional Quantitative Structure-Activity Relationship of Angiotesin-Converting Enzyme and Thermolysin Inhibitors. II. A Comparison of CoMFA Models Incorporating Molecular Orbital Fields and Desolvation Free Ener-gies Based on Active-Analog and Complementary-Receptor-Field Alignment Rules. J. Med. Chem. 1993, 36, 2390-2403.

(74) Oprea, T. I.; Waller, C. L.; Marshall, G. R. Three-Dimensional Quantitative Structure-Activity Relationship of Human Immunode-ficiency Virus (I) Protease Inhibitors. 2. Predictive Power Using Limited Exploration of Alternate Binding Modes. J. Med. Chem. 1994, 37, 2206-2215.

(75) Kroemer, R. T.; Ettmayer, P.; Hecht, P. 3D-Quantitative Structure-Activity Relationships of Human Immunodeficiency Virus Type-1 Proteinase Inhibitors: Comparative Molecular Field Analysis of 2-Heterosubstituted Statine Derivatives-Implications for the Design of Novel Inhibitors. J. Med. Chem. 1995, 38, 4917-4928.

(76) Cho, S. J.; Serrano, G. M. L.; Bier, J.; Tropsha, A. Structure-Based Alignment and Comparative Molecular Field Analysis of Acetylcho-linesterase Inhibitors. J. Med. Chem. 1996, 39, 5064-5071.

(77) Berglund, A.; De Rosa, M. C.; Wold, S. Alignment of Flexible Molecules at their Receptor Site Using 3D Descriptors and Hi-PCA. J. Comput.-Aided Mol. Des. 1997, 11, 601-612.

(78) Lemmen, C.; Lengauer, T. Computational Methods for the Struc-tural Alignment of Molecules. J. Comput.-Aided Mol. Des. 2000, 14, 215-232.

(79) Lemmen, C.; Zimmermann, M.; Lengauer, T. Multiple Molecular Superpositioning as an Effective Tool for Virtual Database Screen-ing. Perspect. Drug Discovery Des. 2000, 20, 43-62.

(80) Golbraikh, A.; Bernard, P.; Chretien, J. R. Validation of Protein-Based Alignment in 3D Quantitative Structure-Activity Relation-ships With CoMFA Models. Eur. J. Med. Chem. 2000, 35, 123-136.

(81) Jewell, N. E.; Turner, D. B.; Willett, P.; Sexton, G. J. Automatic Generation of Alignments for 3D QSAR Analyses. J. Mol. Graph Model. 2001, 20, 111-121.

(82) Buolamwini, J. K.; Assefa, H. CoMFA and CoMSIA 3D QSAR and Docking Studies on Conformationally-Restrained Cinnamoyl HIV-1 Integrase Inhibitors: Exploration of a Binding Mode at the Active Site. J. Med. Chem. 2002, 45, 841-852.

(83) Chen, H.; Li, Q.; Yao, X.; Fan, B.; Yuan, S.; Panaye, A.; Doucet, J. P. 3D-QSAR and Docking Study of the Binding Mode of Steroids to

Page 71: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

71

Progesterone Receptor in Active Site. QSAR Comb. Sci. 2003, 22, 604-613.

(84) Medina-Franco, J. L.; Rodriguez-Morales, S.; Juarez-Gordiano, C.; Hernandez-Campos, A.; Castillo, R. Docking-Based CoMFA and CoMSIA Studies of Non-Nucleoside Reverse Transcriptase Inhibi-tors of the Pyridinone Derivative Type. J. Comput.-Aided Mol. Des. 2004, 18, 345-360.

(85) Tervo, A. J.; Nyroenen, T. H.; Roenkkoe, T.; Poso, A. Comparing the Quality and Predictiveness between 3D QSAR Models Obtained from Manual and Automated Alignment. J. Chem. Inf. Comput. Sci. 2004, 44, 807-816.

(86) Zhou, Z.; Madura, J. D. CoMFA 3D-QSAR Analysis of HIV-1 RT Nonnucleoside Inhibitors, TIBO Derivatives Based on Docking Conformation and Alignment. J. Chem. Inf. Comput. Sci. 2004, 44, 2167-2178.

(87) Prathipati, P.; Pandey, G.; Saxena, A. K. CoMFA and Docking Stud-ies on Glycogen Phosphorylase a Inhibitors as Antidiabetic Agents. J. Chem. Inf. Comput. Sci. 2005, 45, 136-145.

(88) Taha, M. O.; AlDamen, M. A. Effects of Variable Docking Condi-tions and Scoring Functions on Corresponding Protein-Aligned Comparative Molecular Field Analysis Models Constructed from Diverse Human Protein Tyrosine Phosphatase 1B Inhibitors. J. Med. Chem. 2005, 48, 8016-8034.

(89) Horwitz, J. P.; Massova, I.; Wiese, T. E.; Besler, B. H.; Corbett, T. H. Comparative molecular field analysis of the antitumor activity of 9H-thioxanthen-9-one derivatives against pancreatic ductal carci-noma 03. J. Med. Chem. 1994, 37, 781-786.

(90) Hocart, S. J.; Reddy, V.; Murphy, W. A.; Coy, D. H. Three-Dimensional Quantitative Structure-Activity Relationships of Soma-tostatin Analogs. 1. Comparative Molecular Field Analysis of Growth Hormone Release-Inhibiting Potencies. J. Med. Chem. 1995, 38, 1974-1989.

(91) Krystek, S. R., Jr.; Hunt, J. T.; Stein, P. D.; Stouch, T. R. Three-Dimensional Quantitative Structure-Activity Relationships of Sul-fonamide Endothelin Inhibitors. J. Med. Chem. 1995, 38, 659-668.

(92) Recanatini, M. Comparative Molecular Field Analysis of Non-Steroidal Aromatase Inhibitors Related to Fadrozole. J. Comput.-Aided Mol. Des. 1996, 10, 74-82.

(93) Kim, K. H.; Martin, Y. C. Direct Prediction of Linear Free Energy Substituent Effects from 3D Structures Using Comparative Molecu-lar Field Analysis. 1. Electronic Effects of Substituted Benzoic Ac-ids. J. Org. Chem. 1991, 56, 2723-2729.

(94) Belvisi, L.; Bravi, G.; Catalano, G.; Mabilia, M.; Salimbeni, A.; Scolastico, C. A 3D QSAR CoMFA study of non-peptide angio-tensin II receptor antagonists. J. Comput.-Aided Mol. Des. 1996, 10, 567-582.

Page 72: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

72

(95) Bureau, R.; Lancelot, J. C.; Prunier, H.; Rault, S. Conformational Analysis and 3D QSAR Study on Novel Partial Agonists of 5-HT3 Receptors. Quant. Struct.-Act. Relat. 1996, 15, 373-381.

(96) Kroemer, R. T.; Hecht, P.; Liedl, K. R. Different Electrostatic De-scriptors in Comparative Molecular Field Analysis: A Comparison of Molecular Electrostatic and Coulomb Potentials. J. Comput. Chem. 1996, 17, 1296-1308.

(97) Carrieri, A.; Altomare, C.; Barreca, M. L.; Contento, A.; Carotti, A.; Hansch, C. Papain Catalyzed Hydrolysis of Aryl Esters: A Compari-son of the Hansch, Docking and CoMFA Methods. Farmaco 1994, 49, 573-585.

(98) Yliniemela, A.; Konschin, H.; Neagu, C.; Pajunen, A.; Hase, T.; Brunow, G.; Teleman, O. Design and Synthesis of a Transition State Analog for the Ene Reaction between Maleimide and 1-Alkenes. J. Am. Chem. Soc. 1995, 117, 5120-5126.

(99) de Laszlo, S. E.; Glinka, T. W.; Greenlee, W. J.; Ball, R.; Nachbar, R. B.; Prendergast, K. The Design, Binding Affinity Prediction and Synthesis of Macrocyclic Angiotensin II AT1 and AT2 Receptor Antagonists. Bioorg. Med. Chem. Lett. 1996, 6, 923-928.

(100) Corelli, F.; Manetti, F.; Tafi, A.; Campiani, G.; Nacci, V.; Botta, M. Diltiazem-Like Calcium Entry Blockers: A Hypothesis of the Re-ceptor-Binding Site Based on a Comparative Molecular Field Analysis Model. J. Med. Chem. 1997, 40, 125-131.

(101) Hasegawa, K.; Arakawa, M.; Funatsu, K. Rational Choice of Bioac-tive Conformations Through Use of Conformation Analysis and 3-way Partial Least Squares Modeling. Chemomet. Intell. Lab. Sys. 2000, 50, 253-261.

(102) Norinder, U. Experimental Design Based 3-D QSAR Analysis of Steroid-Protein Interactions: Application to Human CBG Com-plexes. J. Comput.-Aided Mol. Des. 1990, 4, 381-389.

(103) Jain, A. N.; Koile, K.; Chapman, D. Compass: Predicting Biological Activities from Molecular Surface Properties. Performance Com-parisons on a Steroid Benchmark. J. Med. Chem. 1994, 37, 2315-2327.

(104) Klebe, G.; Abraham, U.; Mietzner, T. Molecular Similarity Indices in a Comparative Analysis (CoMSIA) of Drug Molecules to Corre-late and Predict their Biological Activity. J. Med. Chem. 1994, 37, 4130-4146.

(105) Wagener, M.; Sadowski, J.; Gasteiger, J. Autocorrelation of Molecu-lar Surface Properties for Modeling Corticosteroid Binding Globulin and Cytosolic Ah Receptor Activity by Neural Networks. J. Am. Chem. Soc. 1995, 117, 7769-7775.

(106) Norinder, U. 3D-QSAR Investigation of the Tripos Benchmark Steroids and Some Protein-Tyrosine Kinase Inhibitors of Styrene Type Using the TDQ Approach. J. Chemom. 1996, 10, 533-545.

Page 73: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

73

(107) Silverman, B. D.; Platt, D. E. Comparative Molecular Moment Analysis (CoMMA): 3D-QSAR without Molecular Superposition. J. Med. Chem. 1996, 39, 2129-2140.

(108) Pastor, M.; Cruciani, G.; McLay, I.; Pickett, S.; Clementi, S. GRid-INdependent Descriptors (GRIND): A Novel Class of Alignment-Independent Three-Dimensional Molecular Descriptors. J. Med. Chem. 2000, 43, 3233-3243.

(109) Cramer, R. D. Topomer CoMFA: A Design Methodology for Rapid Lead Optimization. J. Med. Chem. 2003, 46, 374-388.

(110) Hasegawa, K.; Arakawa, M.; Funatsu, K. Simultaneous Determina-tion of Bioactive Conformations and Alignment Rules by Multi-Way PLS Modeling. Comput. Biol. Chem. 2003, 27, 211-216.

(111) Kellogg, G. E.; Kier, L. B.; Gaillard, P.; Hall, L. H. E-state fields: applications to 3D QSAR. J. Comput.-Aided Mol. Des. 1996, 10, 513-520.

(112) Kellogg, G. E. Finding Optimum Field Models for 3D QSAR. Med. Chem. Res. 1997, 7, 417-427.

(113) Melville, J. L.; Hirst, J. D. On the Stability of CoMFA Models. J. Chem. Inf. Comput. Sci. 2004, 44, 1294-1300.

(114) Bohacek, R. S.; McMartin, C. Definition and Display of Steric, Hy-drophobic, and Hydrogen Bonding Properties of Ligand Binding Sites in Proteins Using Lee and Richards Accessible Surface: Vali-dation of a High-Resolution Graphical Tool for Drug Design. J. Med. Chem. 1992, 35, 1671-1684.

(115) Sulea, T.; Oprea, T. I.; Muresan, S.; Chan, S. L. A Different Method for Steric Field Evaluation in CoMFA Improves Model Robustness. J. Chem. Inf. Comput. Sci. 1997, 37, 1162-1170.

(116) Kroemer, R. T.; Hecht, P.; Guessregen, S.; Liedl, K. R. Improving the Predictive Quality of CoMFA Models. Perspect. Drug Discovery Des. 1998, 12/13/14, 41-56.

(117) Kroemer, R. T.; Hecht, P. Replacement of Steric 6-12 Potential-Derived Interaction Energies by Atom-Based Indicator Variables in CoMFA Leads to Models of Higher Consistency. J. Comput.-Aided Mol. Des. 1995, 9, 205-212.

(118) Festing, M. F. W. Principles: The need for better experimental de-sign. Trends Pharmacol. Sci. 2003, 24, 341-345.

(119) Hahn, G. J. Designing experiments. II. Chem. Technol. 1975, 5, 561-562.

(120) Hahn, G. J. Designing experiments. I. Chem. Technol. 1975, 5, 496-498.

(121) Steinberg, D. M.; Hunter, W. G. Experimental Design: Review and Comment. Technometrics 1984, 26, 71-97.

(122) Araujo, P. W.; Brereton, R. G. Experimental design. I. Screening. Trends Anal. Chem. 1996, 15, 26-31.

(123) Araujo, P. W.; Brereton, R. G. Experimental design II. Optimization. Trends Anal. Chem. 1996, 15, 63-70.

Page 74: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

74

(124) Araujo, P. W.; Brereton, G. Experimental design. III. Quantification. Trends Anal. Chem. 1996, 15, 156-163.

(125) Lundstedt, T.; Seifert, E.; Abramo, L.; Thelin, B.; Nyström, A.; Pet-tersen, J.; Bergman, R. Experimental design and optimization. Chemomet. Intell. Lab. Sys. 1998, 42, 3-40.

(126) Hanrahan, G.; Lu, K. Application of Factorial and Response Surface Methodology in Modern Experimental Design and Optimization. Crit. Rev. Anal. Chem. 2006, 36, 141-151.

(127) Atkinson, A. C.; Bailey, R. A. One hundred years of the design of experiments on and off the pages of Biometrika. Biometrika 2001, 88, 53-97.

(128) Taylor, R. Simulation Analysis of Experimental Design Strategies for Screening Random Compounds as Potential New Drugs and Ag-rochemicals. J. Chem. Inf. Comput. Sci. 1995, 35, 59-67.

(129) Young, S. S.; Farmen, M.; Rusinko, A. I. Random Versus Rational. Which is Better for General Compound Screening? Network Sci. 1996.

(130) Pötter, T.; Matter, H. Random or Rational Design? Evaluation of Diverse Compound Subsets from Chemical Structure Databases. J. Med. Chem. 1998, 41, 478-488.

(131) Snarey, M.; Terrett, N. K.; Willett, P.; Wilton, D. J. Comparison of algorithms for dissimilarity-based compound selection. J. Mol. Graph Model. 1998, 15, 372-385.

(132) Holliday, J. D.; Ranade, S. S.; Willett, P. A Fast Algorithm For Se-lecting Sets of Dissimilar Molecules From Large Chemical Data-bases. Quant. Struct.-Act. Relat. 1995, 14, 501-506.

(133) Andersson, P. M.; Sjostrom, M.; Wold, S.; Lundstedt, T. Strategies for subset selection of parts of an in-house chemical library. J. Chemom. 2001, 15, 353-369.

(134) Pascual, R.; Borrell, J. I.; Teixido, J. Analysis of Selection Method-ologies for Combinatorial Library Design. Mol. Diversity 2003, 6, 121-133.

(135) Kennard, R. W.; Stone, L. A. Computer Aided Design of Experi-ments. Technometrics 1969, 11, 137-148.

(136) Schulz-Gasch, T.; Stahl, M. Binding site characteristics in structure-based virtual screening: evaluation of current docking tools. J. Mol. Model. 2003, 9, 47-57.

(137) Kellenberger, E.; Rodrigo, J.; Muller, P.; Rognan, D. Comparative Evaluation of Eight Docking Tools for Docking and Virtual Screen-ing Accuracy. Proteins: Struct., Funct., Bioinf. 2004, 57, 225-242.

(138) Kontoyianni, M.; McClellan, L. M.; Sokol, G. S. Evaluation of Docking Performance: Comparative Data on Docking Algorithms. J. Med. Chem. 2004, 47, 558-565.

(139) Perola, E.; Walters, W. P.; Charifson, P. S. A Detailed Comparison of Current Docking and Scoring Methods on Systems of Pharmaceu-tical Relevance. Proteins: Struct., Funct., Bioinf. 2004, 56, 235-249.

Page 75: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

75

(140) Schulz-Gasch, T.; Stahl, M. Scoring functions for protein-ligand interactions: a critical perspective. Drug Discov. Today Technol. 2004, 1, 231-239.

(141) David, L.; Nielsen, P. A.; Hedstroem, M.; Norden, B. Scope and Limitation of Ligand Docking: Methods, Scoring Functions and Pro-tein Targets. Curr. Comput.-Aided Drug Des. 2005, 1, 275-306.

(142) Chen, H.; Lyne, P. D.; Giordanetto, F.; Lovell, T.; Li, J. On Evaluat-ing Molecular-Docking Methods for Pose Prediction and Enrich-ment Factors. J. Chem. Inf. Model. 2006, 46, 401-415.

(143) Sippl, W. Application of Structure-Based Alignment Methods for 3D QSAR Analyses. Methods Princ. Med. Chem. 2006, 32, 223-249.

(144) Muegge, I.; Enyedy, I. Docking and Scoring. Computational Me-dicinal Chemistry for Drug Discovery: New York, 2004; pp 405-436.

(145) Taylor, R. D.; Jewsbury, P. J.; Essex, J. W. A review of protein-small molecule docking methods. J. Comput.-Aided Mol. Des. 2002, 16, 151-166.

(146) Cole, J. C.; Murray, C. W.; Nissink, J. W. M.; Taylor, R. D.; Taylor, R. Comparing Protein-Ligand Docking Programs is Difficult. Pro-teins: Struct., Funct., Bioinf. 2005, 60, 325-332.

(147) Schaal, W. Computational Studies of HIV-1 Protease Inhibitors. In Comprehensive Summaries of Uppsala Dissertations From the Fac-ulty of Pharmacy; Uppsala University: Uppsala, Sweden, 2002; pp 88.

(148) Sutherland, J. J.; O'Brien, L. A.; Weaver, D. F. A Comparison of Methods for Modeling Quantitative Structure-Activity Relation-ships. J. Med. Chem. 2004, 47, 5541-5554.

(149) Coats, E. A. The CoMFA Steroids as a Benchmark Dataset for De-velopment of 3D QSAR Methods. Perspect. Drug Discovery Des. 1998, 12/13/14, 199-213.

(150) Gasteiger, J. Dataset of 31 Steroids Binding to the Corticosteroid Binding Globulin (CBG) Receptor; Gasteiger Research Group.

(151) Gasteiger, J.; Rudolph, C.; Sadowski, J. Automatic Generation of 3D Atomic Coordinates for Organic Molecules. Tetrahedron Comput. Methodol. 1990, 3, 537-547.

(152) Hinze, J.; Jaffe, H. H. Electronegativity. I. Orbital Electronegativity of Neutral Atoms. J. Am. Chem. Soc. 1962, 84, 540-546.

(153) Gasteiger, J.; Marsili, M. Iterative Partial Equalization of Orbital Electronegativity: A Rapid Access to Atomic Charges. Tetrahedron 1980, 36, 3219-3222.

(154) Gohlke, H.; Klebe, G. DrugScore Meets CoMFA: Adaptation of Fields for Molecular Comparison (AFMoC) or How to Tailor Knowledge-Based Pair-Potentials to a Particular Protein. J. Med. Chem. 2002, 45, 4153-4170.

(155) Böhm, M.; Stürzebecher, J.; Klebe, G. Three-Dimensional Quantita-tive Structure-Activity Relationship Analyses Using Comparative

Page 76: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

76

Molecular Field Analysis and Comparative Molecular Similarity In-dices Analysis To Elucidate Selectivity Differences of Inhibitors Binding to Trypsin, Thrombin, and Factor Xa. J. Med. Chem. 1999, 42, 458-477.

(156) Tropsha, A.; Gramatica, P.; Gombar, V. K. The Importance of Being Earnest: Validation is the Absolute Essential for Successful Applica-tion and Interpretation of QSPR Models. QSAR Comb. Sci. 2003, 22, 69-77.

(157) Kubinyi, H. Variable Selection in QSAR Studies. I. An Evolutionary Algorithm. Quant. Struct.-Act. Relat. 1994, 13, 285-294.

(158) Clark, M.; Cramer, R. D., III The Probability of Chance Correlation Using Partial Least Squares (PLS). Quant. Struct.-Act. Relat. 1993, 12, 137-145.

(159) Bush, B. L.; Nachbar, R. B., Jr. Sample-Distance Partial Least squares: PLS Optimized for Many variables, With Application to CoMFA. J. Comput.-Aided Mol. Des. 1993, 7, 587-619.

(160) Hawkins, D. M. The Problem of Overfitting. J. Chem. Inf. Comput. Sci. 2004, 44, 1-12.

(161) Baroni, M.; Clementi, S.; Cruciani, G.; Costantino, G.; Riganelli, D.; Oberrauch, E. Predictive Ability of Regression Models. Part II. Se-lection of the Best Predictive PLS Model. J. Chemom. 1992, 6, 347-356.

(162) Cruciani, G.; Baroni, M.; Clementi, S.; Costantino, G.; Riganelli, D.; Skagerberg, B. Predictive Ability of Regression Models. Part I. Standard Deviation of Prediction Errors (SDEP). J. Chemom. 1992, 6, 335-346.

(163) Golbraikh, A.; Tropsha, A. Beware of q2. J. Mol. Graph Model. 2002, 20, 269-276.

(164) Hawkins, D. M.; Basak, S. C.; Mills, D. Assessing Model Fit by Cross-Validation. J. Chem. Inf. Comput. Sci. 2003, 43, 579-586.

(165) Wilcox, R. E.; Huang, W.-H.; Brusniak, M.-Y. K.; Wilcox, D. M.; Pearlman, R. S.; Teeter, M. M.; DuRand, C. J.; Wiens, B. L.; Neve, K. A. CoMFA-Based Prediction of Agonist Affinities at Recombi-nant Wild Type Versus Serine to Alanine Point Mutated D2 Dopa-mine Receptors. J. Med. Chem. 2000, 43, 3005-3019.

(166) Wold, S.; Eriksson, L. Statistical Validation of QSAR Results. Vali-dation Tools. Methods Princ. Med. Chem. 1995, 2, 309-318.

(167) Oloff, S.; Mailman, R. B.; Tropsha, A. Application of Validated QSAR Models of D1 Dopaminergic Antagonists for Database Min-ing. J. Med. Chem. 2005, 48, 7322-7332.

(168) van der Voet, H. Comparing the Predictive Accuracy of Models Using a Simple Randomization Test. Chemomet. Intell. Lab. Sys. 1994, 25, 313-323.

(169) Aptula, A. O.; Jeliazkova, N. G.; Schultz, T. W.; Cronin, M. T. D. The Better Predictive Model: High q2 For the Training Set or Low Root Mean Square Error of Prediction for the Test Set? QSAR Comb. Sci. 2005, 24, 385-396.

Page 77: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

77

(170) Schultz, T. W.; Netzeva, T. I.; Cronin, M. T. D. Evaluation of QSARs for ecotoxicity: A method for assigning quality and confi-dence. SAR QSAR Environ. Res. 2004, 15, 385-397.

(171) Kolossov, E.; Stanforth, R. The quality of QSAR models: problems and solutions. SAR QSAR Environ. Res. 2007, 18, 89-100.

(172) Hurst, J. R.; Heritage, T. W. Molecular Hologram QSAR: US Patent 5,751,605, 1996.

(173) Michailidis, G. d. L., J. The GIFI System of Descriptive Multivariate Analysis. Stat. Sci. 1998, 13, 307-336.

(174) Berglund, A.; Kettaneh, N.; Uppgard, L.-L.; Wold, S.; Bendwell, N.; Cameron, D. R. The GIFI Approach to Non-Linear PLS Modeling. J. Chemom. 2001, 15, 321-336.

(175) Schmuker, M.; Givehchi, A.; Schneider, G. Impact of Different Software Implementations on the Performance of the Maxmin Method for Diverse Subset Selection. Mol. Diversity 2004, 8, 421-425.

(176) Moore, D. R. J.; Breton, R. L.; MacDonald, D. B. A Comparison of Model Performance for Six Quantitative Structure-Activity Rela-tionship Packages that Predict Acute Toxicity to Fish. Environ. Toxicol. Chem. 2003, 22, 1799-1809.

(177) Simmonds, P. Genetic diversity and evolution of hepatitis C virus - 15 years on. J. Gen. Virol. 2004, 85, 3173-3188.

(178) Cristina, J. Hepatitis C virus: quasispecies dynamics, virus persis-tence and antiviral therapy. Expert Opin. Ther. Patents 2007, 17, 499-510.

(179) Ingallinella, P.; Altamura, S.; Bianchi, E.; Taliani, M.; Ingenito, R.; Cortese, R.; De Francesco, R.; Steinkuehler, C.; Pessi, A. Potent Peptide Inhibitors of Human Hepatitis C Virus NS3 Protease Are Obtained by Optimizing the Cleavage Products. Biochemistry 1998, 37, 8906-8914.

(180) Rönn, R.; Sabnis, Y. A.; Gossas, T.; Åkerblom, E.; Helena, D. U.; Hallberg, A.; Johansson, A. Exploration of acyl sulfonamides as carboxylic acid replacements in protease inhibitors of the hepatitis C virus full-length NS3. Bioorg. Med. Chem. 2006, 14, 544-559.

(181) McMartin, C.; Bohacek, R. S. QXP: Powerful, rapid computer algo-rithms for structure-based drug design. J. Comput.-Aided Mol. Des. 1997, 11, 333-344.

(182) Bursavich, M. G.; Rich, D. H. Designing Non-Peptide Peptidomi-metics in the 21st Century: Inhibitors Targeting Conformational En-sembles. J. Med. Chem. 2002, 45, 541-558.

(183) Carlson, H. A. Protein Flexibility is an Important Component of Structure-Based Drug Discovery. Curr. Pharm. Design 2002, 8, 1571-1578.

(184) Teague, S. J. Implications of Protein Flexibility for Drug Discovery. Nat. Rev. Drug Disc. 2003, 2, 527-541.

Page 78: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

78

(185) Cavasotto, C. N.; Abagyan, R. A. Protein Flexibility in Ligand Docking and Virtual Screening to Protein Kinases. J. Mol. Biol. 2004, 337, 209-225.

(186) Alberts, I. L.; Todorov, N. P.; Kaellblad, P.; Dean, P. M. Ligand Docking and Design in a Flexible Receptor Site. QSAR Comb. Sci. 2005, 24, 503-507.

(187) Perni, R. B.; Pitlik, J.; Britt, S. D.; Court, J. J.; Courtney, L. F.; Deininger, D. D.; Farmer, L. J.; Gates, C. A.; Harbeson, S. L.; Levin, R. B.; Lin, C.; Lin, K.; Moon, Y.-C.; Luong, Y.-P.; O'Malley, E. T.; Rao, B. G.; Thomson, J. A.; Tung, R. D.; Van Drie, J. H.; Wei, Y. Inhibitors of Hepatitis C Virus NS3.4A Protease 2. Warhead SAR and Optimization. Bioorg. Med. Chem. Lett. 2004, 14, 1441-1446.

(188) Narjes, F.; Brunetti, M.; Colarusso, S.; Gerlach, B.; Koch, U.; Bi-asiol, G.; Fattori, D.; De Francesco, R.; Matassa, V. G.; Steinkuehler, C. �-Ketoacids Are Potent Slow Binding Inhibitors of the Hepatitis C Virus NS3 Protease. Biochemistry 2000, 39, 1849-1861.

(189) Attwood, M. R.; Bennett, J. M.; Campbell, A. D.; Canning, G. G. M.; Carr, M. G.; Conway, E.; Dunsdon, R. M.; Greening, J. R.; Jones, P. S.; Kay, P. B.; Handa, B. K.; Hurst, D. N.; Jennings, N. S.; Jordan, S.; Keech, E.; O'Brien, M. A.; Overton, H. A.; King-Underwood, J.; Raynham, T. M.; Stenson, K. P.; Wilkinson, C. S.; Wilkinson, T. C. I.; Wilson, F. X. The design and synthesis of potent inhibitors of hepatitis C virus NS3-4A proteinase. Antiviral Chem. Chemother. 1999, 10, 259-273.

(190) Llinas-Brunet, M.; Bailey, M. D.; Ghiro, E.; Gorys, V.; Halmos, T.; Poirier, M.; Rancourt, J.; Goudreau, N. A Systematic Approach to the Optimization of Substrate-Based Inhibitors of the Hepatitis C Vi-rus NS3 Protease: Discovery of Potent and Specific Tripeptide In-hibitors. J. Med. Chem. 2004, 47, 6584-6594.

(191) Victor, F.; Lamar, J.; Snyder, N.; Yip, Y.; Guo, D.; Yumibe, N.; Johnson, R. B.; Wang, Q. M.; Glass, J. I.; Chen, S.-H. P1 and P3 op-timization of novel bicycloproline P2 bearing tetrapeptidyl �-ketoamide based HCV protease inhibitors. Bioorg. Med. Chem. Lett. 2004, 14, 257-261.

(192) LaPlante, S. R.; Llinas-Brunet, M. Dynamics and structure-based design of drugs targeting the critical serine protease of the hepatitis C virus - from a peptidic substrate to BILN 2061. Curr. Med. Chem. Anti-Infect. Agents 2005, 4, 111-132.

(193) Tsantrizos, Y. S.; Bolger, G.; Bonneau, P.; Cameron, D. R.; Goudreau, N.; Kukolj, G.; LaPlante, S. R.; Llinas-Brunet, M.; Nar, H.; Lamarre, D. Macrocyclic Inhibitors of the NS3 Protease as Po-tential Therapeutic Agents of Hepatitis C Virus Infection. Angew. Chem., Int. Ed. 2003, 42, 1356-1360.

(194) Seco, J. M.; Quinoa, E.; Riguera, R. Boc-Phenylglycine: The Re-agent of Choice for the Assignment of the Absolute Configuration of

Page 79: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

79

�-Chiral Primary Amines by 1H NMR Spectroscopy. J. Org. Chem. 1999, 64, 4669-4675.

(195) DeGrado, W. F.; Schneider, J. P.; Hamuro, Y. The Twists and Turns of �-Peptides. J. Peptide Res. 1999, 54, 206-217.

(196) Steer, D. L.; Lew, R. A.; Perlmutter, P.; Smith, A. I.; Aguilar, M.-I. �-Amino Acids: Versatile Peptidomimetics. Curr. Med Chem. 2002, 9, 811-822.

(197) Lew, R. A.; Boulos, E.; Stewart, K. M.; Perlmutter, P.; Harte, M. F.; Bond, S.; Aguilar, M.-I.; Smith, A. I. Bradykinin Analogues with �-Amino Acid Substitutions Reveal Subtle Differences in Substrate Specificity Between the Endopeptidases EC 3.4.24.15 and EC 3.4.24.16. J. Pept. Sci. 2000, 6, 440-445.

(198) Steer, D. L.; Lew, R. A.; Perlmutter, P.; Smith, A. I.; Aguilar, M. I. Design and Synthesis of Inhibitors Incorporating �-Amino Acids of Metalloendopeptidase EC 3.4.24.15. J. Pept. Sci. 2000, 6, 470-477.

(199) Lew, R. A.; Boulos, E.; Stewart, K. M.; Perlmutter, P.; Harte, M. F.; Bond, S.; Reeve, S. B.; Norman, M. U.; Lew, M. J.; Aguilar, M.-I.; Smith, I. Substrate Analogues Incorporating �-Amino Acids: Poten-tial Application for Peptidase Inhibition. FASEB J. 2001, 15, 1664-1666, 1610.1096/fj.1600-0805fje.

(200) Steer, D.; Lew, R.; Perlmutter, P.; Smith, A. I.; Aguilar, M.-I. Inhibi-tors of Metalloendopeptidase EC 3.4.24.15 and EC 3.4.24.16 Stabi-lized against Proteolysis by the Incorporation of �-Amino Acids. Biochemistry 2002, 41, 10819-10826.

(201) Bromfield, K. M.; Quinsey, N. S.; Duggan, P. J.; Pike, R. N. Ap-proaches to Selective Peptidic Inhibitors of Factor Xa. Chem. Biol. Drug Des. 2006, 68, 11-19.

(202) Colarusso, S.; Gerlach, B.; Koch, U.; Muraglia, E.; Conte, I.; Stans-field, I.; Matassa, V. G.; Narjes, F. Evolution, Synthesis and SAR of Tripeptide �-Ketoacid Inhibitors of the Hepatitis C virus NS3/NS4A Serine Protease. Bioorg. Med. Chem. Lett. 2002, 12, 705-708.

(203) Cardillo, G.; Gentilucci, L.; Qasem, A. R.; Sgarzi, F.; Spampinato, S. Endomorphin-1 Analogues Containing �-Proline Are �-Opioid Receptor Agonists and Display Enhanced Enzymatic Hydrolysis Re-sistance. J. Med. Chem. 2002, 45, 2571-2578.

(204) Sagan, S.; Milcent, T.; Ponsinet, R.; Convert, O.; Tasseau, O.; Chas-saing, G.; Lavielle, S.; Lequin, O. Structural and Biological Effects of a �2- or �3-Amino Acid Insertion in a Peptide. Application to Mo-lecular Recognition of Substance P by the Neurokinin-1 Receptor. Eur. J. Biochem. 2003, 270, 939-949.

(205) Campbell, J. A.; Good, A. WO 02/060926, 2002. (206) Di Marco, S.; Rizzi, M.; Volpari, C.; Walsh, M. A.; Narjes, F.; Cola-

russo, S.; De Francesco, R.; Matassa, V. G.; Sollazzo, M. Inhibition of the Hepatitis C Virus NS3/4A Protease: The Crystal Structures of Two Protease-Inhibitor Complexes. J. Biol. Chem. 2000, 275, 7152-7157.

Page 80: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

80

(207) Poliakov, A.; Johansson, A.; Åkerblom, E.; Oscarsson, K.; Samuels-son, B.; Hallberg, A.; Danielson, U. H. Structure-activity relation-ships for the selectivity of hepatitis C virus NS3 protease inhibitors. Biochim. Biophys. Acta 2004, 1672, 51-59.

(208) Örtqvist, P.; Peterson, S. D.; Åkerblom, E.; Gossas, T.; Sabnis, Y. A.; Fransson, R.; Lindeberg, G.; Helena, D. U.; Karlén, A.; Sand-ström, A. Phenylglycine as a novel P2 scaffold in hepatitis C virus NS3 protease inhibitors. Bioorg. Med. Chem. 2007, 15, 1448-1474.

(209) Poliakov, A.; Sandström, A.; Åkerblom, E.; Danielson, U. H. Mechanistic studies of electrophilic protease inhibitors of full length hepatic C virus (HCV) NS3. J. Enzym. Inhib. Med. Chem. 2007, 22, 191-199.

(210) Han, W.; Hu, Z.; Jiang, X.; Wasserman, Z. R.; Decicco, C. P. Gly-cine �-Ketoamides as HCV NS3 Protease Inhibitors. Bioorg. Med. Chem. Lett. 2003, 13, 1111-1114.

(211) Venkatraman, S.; Bogen, S. L.; Arasappan, A.; Bennett, F.; Chen, K.; Jao, E.; Liu, Y.-T.; Lovey, R.; Hendrata, S.; Huang, Y.; Pan, W.; Parekh, T.; Pinto, P.; Popov, V.; Pike, R.; Ruan, S.; Santhanam, B.; Vibulbhan, B.; Wu, W.; Yang, W.; Kong, J.; Liang, X.; Wong, J.; Liu, R.; Butkiewicz, N.; Chase, R.; Hart, A.; Agrawal, S.; Ingra-vallo, P.; Pichardo, J.; Kong, R.; Baroudy, B.; Malcolm, B.; Guo, Z.; Prongay, A.; Madison, V.; Broske, L.; Cui, X.; Cheng, K.-C.; Hsieh, T. Y.; Brisson, J.-M.; Prelusky, D.; Korfmacher, W.; White, R.; Bogdanowich-Knipp, S.; Pavlovsky, A.; Bradley, P.; Saksena, A. K.; Ganguly, A.; Piwinski, J.; Girijavallabhan, V.; Njoroge, F. G. Dis-covery of (1R,5S)-N-[3-Amino-1-(cyclobutylmethyl)-2,3-dioxopropyl]- 3-[2(S)-[[[(1,1-dimethylethyl)amino]carbonyl]amino]-3,3-dimethyl-1-oxobutyl]- 6,6-dimethyl-3-azabicyclo[3.1.0]hexan-2(S)-carboxamide (SCH 503034), a Selective, Potent, Orally Bioavailable Hepatitis C Virus NS3 Protease Inhibitor: A Potential Therapeutic Agent for the Treatment of Hepatitis C Infection. J. Med. Chem. 2006, 49, 6074-6086.

(212) Rönn, R. Design and Synthesis of Inhibitors Targeting the Hepatitis C Virus NS3 Protease: Focus on C-Terminal Acyl Sulfonamides. In Comprehensive Summaries of Uppsala Dissertations From the Fac-ulty of Pharmacy; Uppsala University: Uppsala, Sweden, 2007; pp 79.

(213) Purcell, W. P.; Singer, J. A. A Brief Review and Table of Semiem-pirical Parameters Used in the Hückel Molecular Orbital Method. J. Chem. Eng. Data 1967, 12, 235-246.

(214) Peterson, S. D.; Schaal, W.; Karlén, A. Improved CoMFA Modeling by Optimization of Settings. J. Chem. Inf. Model. 2006, 46, 355-364.

(215) Fersht, A. R.; Shi, J. P.; Knill-Jones, J.; Lowe, D. M.; Wilkinson, A. J.; Blow, D. M.; Brick, P.; Carter, P.; Waye, M. M. Y.; Winter, G. Hydrogen bonding and biological specificity analyzed by protein engineering. Nature 1985, 314, 235-238.

Page 81: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease
Page 82: Improved CoMFA Modeling by Optimization of Settings170499/fulltext01.pdf · Improved CoMFA Modeling by Optimization of Settings Toward the Design of Inhibitors of the HCV NS3 Protease

Acta Universitatis UpsaliensisDigital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Pharmacy 61

Editor: The Dean of the Faculty of Pharmacy

A doctoral dissertation from the Faculty of Pharmacy, UppsalaUniversity, is usually a summary of a number of papers. A fewcopies of the complete dissertation are kept at major Swedishresearch libraries, while the summary alone is distributedinternationally through the series Digital ComprehensiveSummaries of Uppsala Dissertations from the Faculty ofPharmacy. (Prior to January, 2005, the series was publishedunder the title “Comprehensive Summaries of UppsalaDissertations from the Faculty of Pharmacy”.)

Distribution: publications.uu.seurn:nbn:se:uu:diva-8140

ACTA

UNIVERSITATIS

UPSALIENSIS

UPPSALA

2007