*********************************************** * BOOTSTRAPPING MLR POPULATION R-SQUARE * *********************************************** * (C) Marta García-Granero 07/2008 * * send questions to: mgarciagranero@gmail.com * * Downloaded from: http://gjyp.nl/marta/ * * Feel free to use or modify this code, but * * acknowledge the author and the web site * *********************************************** * For the whole set of equations used, see Set #1, #2 & #5 at http://www.anthony-vba.kefra.com/vba/vba9.htm *. * Sample dataset (50 random cases from Rosner's dataset FEV.sav) *. SET LOCALE=ENGLISH. DATA LIST FREE/fev(F8.3) age(F8.0) hgt(F8.1) gender smoke (2 F8.0). BEGIN DATA 1.415 6 56.0 0 0 2.646 10 60.0 1 0 3.519 19 66.0 0 1 3.000 9 65.5 1 0 3.428 14 64.0 0 1 1.694 11 60.0 1 1 3.957 14 72.0 1 1 1.962 8 57.0 1 0 2.384 12 63.5 0 1 2.679 15 66.0 0 1 2.387 10 66.0 0 1 1.794 8 54.5 1 0 2.646 13 61.5 0 0 2.198 15 62.0 0 1 3.345 19 65.5 0 1 2.599 13 62.5 0 1 3.082 17 67.0 1 1 2.903 16 63.0 0 1 3.004 15 64.0 0 1 1.603 7 51.0 0 0 1.196 5 46.5 0 0 1.697 8 59.0 0 0 2.813 10 61.5 0 0 3.985 15 71.0 1 0 4.309 14 69.0 1 1 1.947 9 56.5 0 0 3.169 11 62.5 0 1 3.406 17 69.0 1 1 2.358 10 59.0 0 0 1.933 9 58.0 0 0 3.297 13 65.0 0 1 3.680 14 67.0 1 0 1.953 9 58.0 1 1 3.247 11 65.5 1 0 4.086 18 67.0 1 1 3.585 14 70.0 1 0 3.498 10 68.0 1 1 2.953 11 67.0 0 1 3.127 10 62.0 1 0 1.338 6 51.5 0 0 2.569 12 63.0 0 0 3.320 11 65.5 1 0 3.780 14 70.0 1 0 4.404 18 70.5 1 1 4.637 11 72.0 1 1 3.727 15 68.0 1 1 4.203 12 71.0 1 0 2.564 7 58.0 0 0 3.152 13 62.0 0 1 2.391 10 59.5 1 0 END DATA. VAR LABEL fev 'FEV (liters)' /age 'Age (years)' /hgt 'Height (inches)'. VALUE LABEL gender 0 'Female' 1 'Male' /smoke 0'Non Smoker' 1'Smoker'. DATASET NAME OriginalData. * Adding an interaction term to the model *. COMPUTE smokehgt=smoke*hgt. FORMAT smokehgt (F8). * Using SPSS *. REGRESSION /STATISTICS COEFF OUTS R ANOVA /DEPENDENT fev /METHOD=ENTER age hgt gender smoke smokehgt. * Sample R-square=0.846; Adjusted R-square=0.829 * Confidence interval for Rho-square: * Using R2.exe: Lower Limit: 0.713; Upper Limit: 0.900 (program by Steiger&Fouladi) * Using bootstrap: Lower Limit: 0.762; Upper Limit: 0.907 *. * Be patient: it takes around 6 seconds with a 2.0 GHz processor & 1 Gb RAM *. PRESERVE. SET MXLOOPS=20000. /* Should be at least equal to K *. MATRIX. PRINT /TITLE='BOOTSTRAP CONFIDENCE INTERVAL FOR RHO-SQUARE'. * Bootstraping conditions *. COMPUTE k=20000. /* Nr of bootsamples (if you increase it, increase also MXLOOPS) *. * Input data (DV goes first) with listwise deletion *. GET data /VAR=fev age hgt gender smoke smokehgt/MISSING=OMIT /NAME=vnames. COMPUTE vname=vnames(1). * Statistics *. COMPUTE n=NROW(data). COMPUTE p=NCOL(data). COMPUTE x={MAKE(n,1,1),data(:,(2:p))}. COMPUTE y=data(:,1). COMPUTE meany=CSUM(y)/n. COMPUTE b=INV(T(x)*x)*T(X)*y. COMPUTE ESS=T(b)*T(x)*y-n*meany&**2. COMPUTE TSS=T(Y)*y-n*meany&**2. COMPUTE rsquare=ESS/TSS. COMPUTE psquare=1-(1-rsquare)*(n-1)/(n-p). COMPUTE Fvalue=(rsquare/(p-1))/((1-rsquare)/(n-p)). COMPUTE Fsig=1-FCDF(Fvalue,(p-1),(n-p)). COMPUTE sampleP2=psquare. * Reports *. COMPUTE vnames={'Constant',vnames(2:p)}. PRINT /TITLE='*** SAMPLE STATISTICS ***'. PRINT {b} /FORMAT='F8.3' /RNAMES=vnames /TITLE='Unstandardized coefficients'. PRINT {rsquare,psquare} /FORMAT='F8.3' /CLABEL='R-Square','P-Square' /TITLE='Sample R-square & Adjusted (Population) Rho-square'. PRINT {Fvalue,Fsig} /FORMAT='F8.4' /CLABEL='F value','Sig.' /TITLE='R-square test (F value & Significance)'. * Bootstrapping P2 (Rho-square) *. COMPUTE bootP2= MAKE(k,1,0). COMPUTE bootsamp=MAKE(n,p,0). LOOP i=1 TO k. /* Extracting K bootstrap samples of *. . COMPUTE flipcoin=1+TRUNC(n*UNIFORM(n,1)). /* size n *. . COMPUTE bootsamp=data(flipcoin,:). * Statistics for every bootsample *. . COMPUTE x={MAKE(n,1,1),bootsamp(:,(2:p))}. . COMPUTE y=bootsamp(:,1). . COMPUTE meany=CSUM(y)/n. . COMPUTE b=INV(T(x)*x)*T(X)*y. . COMPUTE ESS=T(b)*T(x)*y-n*meany&**2. . COMPUTE TSS=T(Y)*y-n*meany&**2. . COMPUTE rsquare=ESS/TSS. . COMPUTE psquare=1-(1-rsquare)*(n-1)/(n-p). . COMPUTE bootP2(i)=psquare. END LOOP. * Gran mean of bootstrapped R-square *. COMPUTE mean=CSUM(bootP2)/k. * Bootstrap estimator of the standard error of the R-square *. COMPUTE BootSEM=SQRT((CSSQ(bootP2)-k&*(mean&**2))/(k-1)). PRINT {mean,BootSEM} /FORMAT='F8.3' /CLABEL='Mean-Rho','SE(Rho)*' /RNAME=vname /TITLE='Bootstrapped Statistics for Rho-square'. PRINT/TITLE='(*) Std. Deviation of bootstrapped Rho-squares'. * Ordered array: sorting algorithm by R Ristow & J Peck *. COMPUTE sortedP2=bootP2. COMPUTE sortedP2(GRADE(bootP2))=bootP2. * Empiric confidence interval (percentiles 2.5 & 97.5) *. COMPUTE lower1=sortedP2(k*0.025). COMPUTE upper1=sortedP2(1+k*0.975). * Parametric confidence intervals (BV1&BV2)*. COMPUTE z = 1.959964. COMPUTE lower2=mean-z*BootSEM. COMPUTE upper2=mean+z*BootSEM. COMPUTE lower3=sampleP2-z*BootSEM. COMPUTE upper3=sampleP2+z*BootSEM. * Reports *. PRINT /TITLE='******* BOOTSTRAP ********'. PRINT k /FORMAT='F8.0' /TITLE='Boostrapping conditions: k (Nr. reps.)'. PRINT {lower1,upper1;lower2,upper2;lower3,upper3} /FORMAT='F8.3' /CLABEL='Lower CL','Upper CL' /RLABEL='BP','BV1','BV2' /TITLE='95%CI: Non parametric (Pctiles. 2.5 & 97.5) & Parametric (Z based) BV1&BV2'. END MATRIX. RESTORE.