*********************************************** * BOOTSTRAPPING THE ARITHMETIC MEAN * *********************************************** * (C) Marta García-Granero 07/2008 * * send questions to: mgarciagranero@gmail.com * * Downloaded from: http://gjyp.nl/marta/ * * Feel free to use or modify this code, but * * acknowledge the author and the web site * *********************************************** * Sample dataset *. DATA LIST FREE/copper(F8.2). BEGIN DATA 0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77 0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55 0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04 0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88 END DATA. VARIABLE LABEL copper 'Urinary copper (µmol/24hr)'. DATASET NAME OriginalData. PRESERVE. * Initialize a random seed *. SET RNG=MT. SET MTINDEX=RANDOM. SET MXLOOPS=20000. MATRIX. GET data /VAR=ALL /NAMES=vname. * Sample statistics *. COMPUTE n=NROW(data). COMPUTE mean=CSUM(data)/n. COMPUTE variance=(CSSQ(data)-n&*(mean&**2))/(n-1). COMPUTE StdError=SQRT(variance/n). PRINT {mean,variance,StdError} /FORMAT='F8.3' /RNAME=vname /CLABEL='Mean','Variance','StdError' /TITLE='Sample statistics'. * Bootstrap samples & means *. COMPUTE k=20000. /* Number of bootsamples (change it if you want) *. COMPUTE bootmean=MAKE(k,1,0). COMPUTE bootsamp=MAKE(n,1,0). LOOP i=1 TO k. /* Extracting bootstrap samples *. - COMPUTE flipcoin=1+TRUNC(n*UNIFORM(n,1)). - COMPUTE bootsamp=data(flipcoin). - COMPUTE bootmean(i)=CSUM(bootsamp)/n. END LOOP. * Gran mean of bootstrapped means *. COMPUTE mean=CSUM(bootmean)/k. * Bootstrap estimator of the standard error of the mean *. COMPUTE BootSEM=SQRT((CSSQ(bootmean)-k&*(mean&**2))/(k-1)). PRINT {mean,BootSEM} /FORMAT='F8.3' /CLABEL='Mean','SDErr(*)' /RNAME=vname /TITLE='Bootstrapped Statistics'. PRINT/TITLE='(*) Std. Deviation of bootstrapped means'. * NP confidence interval *. * Ordered array: sorting algorithm by R Ristow & J Peck *. COMPUTE sortedbm=bootmean. COMPUTE sortedbm(GRADE(bootmean))=bootmean. COMPUTE lower1=sortedbm(k*0.025). COMPUTE upper1=sortedbm(1+k*0.975). * Parametric confidence interval *. LOOP tvalue95=1960 to 12706. - COMPUTE t95=tvalue95/1000. - COMPUTE onetail=1-tcdf(t95,(n-1)). - DO IF ABS(onetail-0.025) lt 0.00005. - BREAK. - END IF. END LOOP. COMPUTE lower2=mean-t95*BootSEM. COMPUTE upper2=mean+t95*BootSEM. PRINT {lower1,upper1;lower2,upper2} /FORMAT='F8.3' /CLABEL='Lower CL','Upper CL' /RLABEL='NonPar','Par' /TITLE='95%CI: Non parametric (percentiles 2.5 & 97.5) & Parametric (t based)'. SAVE bootmean /OUTFILE='C:\Temp\BootStrappedMeans.sav' /NAMES=vname. PRINT k /FORMAT='F8' /RLABEL='K=' /TITLE='K bootsampled means saved to C:\Temp\BootStrappedMeans.sav'. END MATRIX. RESTORE. GET FILE ='C:\Temp\BootStrappedMeans.sav' . DATASET NAME BootstrappedMeans. APPLY DICTIONARY /FROM OriginalData /SOURCE VARIABLES=ALL /FILEINFO /VARINFO VARLABEL. FREQUENCIES VARIABLES=copper /FORMAT=NOTABLE /HISTOGRAM NORMAL /STATISTICS=SKEWNESS KURTOSIS.