Most QSPR models are useful, but occasionally models are produced that are not at all reliable.[1] The reliability problems can be classified as (i) overfitting and (ii) models that arise by chance. The latter problem can only be addressed by subjective human judgment, based on how justifiable the assessment of the uncertainty of a particular prediction is.[2, 3]

Overfitting is one of the more common problems in the development of any kind of model, and QSPR is no exception. It is usually addressed using objective validation criteria. The most common validation method in chemometrics is cross-validation.[4]
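
As an illustration of how cross-validation exposes overfitting, the following is a minimal sketch rather than a prescribed procedure. It assumes a multiple linear regression model, leave-one-out cross-validation, and synthetic data; the function names (fit_mlr, predict_mlr, r_squared, q_squared_loo) and the descriptor counts are hypothetical choices made for this example. The cross-validated coefficient is computed here as 1 - PRESS/SS with SS taken about the mean of all responses, which is one common convention among several.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + X @ b; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses for the rows of X with fitted coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_squared(X, y):
    """Fitted R^2: every compound is used both to fit and to evaluate."""
    y_fit = predict_mlr(fit_mlr(X, y), X)
    return 1.0 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)

def q_squared_loo(X, y):
    """Leave-one-out Q^2: each compound is predicted by a model
    that was fitted without it."""
    n = len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        y_pred = predict_mlr(coef, X[i:i + 1])[0]
        press += (y[i] - y_pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic data (an assumption for illustration): 20 compounds, one
# descriptor that truly drives the property, 14 irrelevant descriptors.
rng = np.random.default_rng(0)
n_compounds = 20
relevant = rng.normal(size=(n_compounds, 1))
y = 2.0 * relevant[:, 0] + rng.normal(scale=0.5, size=n_compounds)
X_small = relevant
X_big = np.hstack([relevant, rng.normal(size=(n_compounds, 14))])

for name, X in [("1 descriptor", X_small), ("15 descriptors", X_big)]:
    print(f"{name}: R2 = {r_squared(X, y):.3f}, Q2 = {q_squared_loo(X, y):.3f}")
```

Adding descriptors can never lower the fitted R2 of a least-squares model, so R2 alone cannot reveal overfitting. Q2 is computed from predictions for compounds the model has not seen, so a large gap between R2 and Q2 suggests that the model is fitting noise.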

References:

  1. Eriksson, L.; Johansson, E.; Muller, M.; Wold, S. On the selection of the training set in environmental QSAR analysis when compounds are clustered. J. Chemometrics 2000, 14, 599-616.
  2. Stone, M.; Jonathan, P. Statistical Thinking and Technique for QSAR and related studies. 1. General Theory. J. Chemometrics 1993, 7, 455-475.
  3. Stone, M.; Jonathan, P. Statistical Thinking and Technique for QSAR and related studies. 2. Specific Methods. J. Chemometrics 1994, 8, 1-20.
  4. Xu, Q.-S.; Liang, Y.-Z. Monte Carlo cross validation. Chemom. Intell. Lab. Syst. 2001, 56, 1-11.

University of Florida 2001-2005.
All rights reserved