Overfitting: Statistical inference






In statistics, an inference is drawn from a statistical model, which has been selected via some procedure. Burnham & Anderson, in their much-cited text on model selection, argue that to avoid overfitting, we should adhere to the principle of parsimony. The authors also state the following.



Overfitted models … are often free of bias in the parameter estimators, but have estimated (and actual) sampling variances that are needlessly large (the precision of the estimators is poor, relative to what could have been accomplished with a more parsimonious model). Spurious treatment effects tend to be identified, and spurious variables are included in overfitted models. … A best approximating model is achieved by properly balancing the errors of underfitting and overfitting.



Overfitting is more likely to be a serious concern when there is little theory available to guide the analysis, in part because there then tend to be a large number of models to select from. The book Model Selection and Model Averaging (2008) puts it this way.



Given a data set, you can fit thousands of models at the push of a button, but how do you choose the best? With so many candidate models, overfitting is a real danger. Is the monkey who typed Hamlet actually a good writer?
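
To make the danger concrete, here is a minimal numpy sketch (an assumed illustration, not taken from either cited text): the response and every candidate predictor are pure noise, yet selecting the best of thousands of single-variable fits by in-sample R-squared still turns up an apparently strong, entirely spurious relationship.

```python
# Hypothetical sketch: many candidate models fitted to pure noise.
import numpy as np

rng = np.random.default_rng(0)
n, n_candidates = 30, 2000

y = rng.normal(size=n)                    # response: pure noise
X = rng.normal(size=(n, n_candidates))    # candidate predictors: pure noise

# Fit a one-variable least-squares line for each candidate and record R^2.
r_squared = np.empty(n_candidates)
for j in range(n_candidates):
    A = np.column_stack([np.ones(n), X[:, j]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r_squared[j] = 1 - resid.var() / y.var()

best = np.argmax(r_squared)
print(f"best of {n_candidates} candidates: R^2 = {r_squared[best]:.2f}")
# Typically the winning R^2 exceeds 0.3 even though no predictor has any
# true relationship with y: a spurious "treatment effect" found by search.
```

The point of the sketch is that the selection step itself, not any single fit, produces the overfitting: the more candidates are screened, the more impressive the best in-sample fit looks by chance alone.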



Regression

In regression analysis, overfitting occurs frequently. As an extreme case, if there are p variables in a linear regression with p data points, the fitted line can go exactly through every point. One study suggests that two observations per independent variable may be sufficient for linear regression. For logistic regression or Cox proportional hazards models, there are a variety of rules of thumb (e.g. 5-9, 10, and 10-15; the guideline of 10 observations per independent variable is known as the "one in ten rule"). In the process of regression model selection, the mean squared error of the random regression function can be split into random noise, approximation bias, and variance in the estimate of the regression function, and the bias-variance tradeoff is used to overcome overfit models. A sketch of the extreme case follows below.
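
The following numpy sketch is an assumed illustration of that extreme case (the true function, noise level, and degrees are arbitrary choices). A polynomial regression is a linear regression on powers of x, so a degree-(p-1) fit to p points has p coefficients and passes through every training point, yet its error on fresh data from the same process is typically far worse than that of a more parsimonious fit.

```python
# Hypothetical sketch: p parameters fitted to p points interpolate the
# training data exactly but generalize poorly (low bias, huge variance).
import numpy as np

rng = np.random.default_rng(1)
p = 10                                    # number of training points

def sample(n):
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.1, size=n)  # assumed process
    return x, y

x_train, y_train = sample(p)
x_test, y_test = sample(200)

for degree in (3, p - 1):                 # parsimonious fit vs. p-parameter fit
    coef = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {mse_train:.4f}, test MSE {mse_test:.4f}")
# The degree-(p-1) fit drives the training error to essentially zero but
# typically has a much larger test error than the degree-3 fit.
```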







