Octave has a great built-in function called

`nchoosek`

which makes shuffling a breeze. Called with scalar arguments it returns the value of the binomial coefficient, which is the number of ways you can choose k things from n things (k<=n). For fixed n, nchoosek is maximum when k=n/2. That is plotted below for n=2:24.Notice that's a log-log scale, as n increases it quickly becomes intractable to perform a complete permutation test.

Suppose we have two data vectors, and we want to know if they are from populations with different means. We can use the

`normrnd()`

function to draw 8 samples from a normal distribution with 0 mean and 8 samples from a normal distribution with a mean of 1.n=8;

x1 = normrnd(0,1,n,1);

x2 = normrnd(1,1,n,1);

perms = nchoosek( [x1;x2], n );

We stored all 12870 permutations of the vectors (16 choose 8) in the matrix

`perms`

. Now we use the built-in function `complement`

to find the difference of the means under each of those labellings.for(i=1:length(m1) )

comps(i,:) = complement( perms(i,:), [x1;x2] );

endfor

m1 = mean( perms, 2 ); % take row-wise averages, DIM=2

m2 = mean( comps, 2 ); % take row-wise averages, DIM=2

m_shuff = m1 - m2;

Now we can use the distribution of

`m_shuff`

to decide if the difference in the means of our two data vectors is significant.Octave script with all of these calculations: shuffle.m.

What about the bootstrap you say? That's really easy in Octave too:

ReplyDeletex = empirical_rnd( n, data );

Draws a bootstrap sample (that's with replacement) of size n from your data vector.