# randsampleFS

randsampleFS generates a random sample of k elements from the integers 1 to n ($k \le n$).

## Syntax

• y = randsampleFS(n,k)
• y = randsampleFS(n,k,method)
• y = randsampleFS(n,k,method,after2011b)

## Description

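 y = randsampleFS(n,k) draws k elements at random from the integers 1 to n using the default sampling method.

 y = randsampleFS(n,k,method) specifies the sampling method: a scalar from 0 to 3, or a vector of n weights for weighted sampling without replacement.

 y = randsampleFS(n,k,method,after2011b) also passes the logical flag after2011b, which indicates whether the MATLAB release in use is later than R2011b.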

## Examples


### randsampleFS with default options.

The default method is used (see input argument method).

    randsampleFS(1000,10)

    ans =

      Columns 1 through 6

       251   277   132   864   977   427

      Columns 7 through 10

       395  1000   971   152



### randsampleFS with optional argument set to method (2).

    method = 2;
    randsampleFS(100,10,method)

    ans =

        69    79    89    99     9    19    29    39    49    59



### randsampleFS with optional argument set to method (3).

    method = 3;
    % Method 3 does not guarantee unique elements, so repeated values may appear.
    randsampleFS(100,10,method)


### randsampleFS Weighted Sampling Without Replacement.

Extract k = 10 numbers from [-1000, -900] with gamma-distributed weights.

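    population = -1000:1:-900;
    n = numel(population);
    % Gamma-distributed weights, sorted in decreasing order
    wgts = sort(random('gamma',0.3,2,n,1),'descend');

    k = 10;
    y = randsampleFS(n,k,wgts);   % indices of the selected elements
    sample = population(y);       % the corresponding numbers

    % Plot the weights and mark the selected positions with an X
    plot(wgts,'.r')
    hold on;
    text(y,wgts(y),'X');
    title('Weight distribution with the extracted numbers superimposed')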


## Input Arguments

### n — Population size: the sample is drawn from the integers 1 to n. Scalar, a positive integer.

Data Types: single|double

### k — The number of elements to be selected. Nonnegative integer not greater than n.

Data Types: single|double

### method — Sampling method. Scalar or vector.

Method used to extract the sample. See the notes on the sampling methods after the output arguments for details.

Default is method = 0.

- Scalar from 0 to 3 selecting the method used to extract the random sample. Methods 0, 1 and 2 sample without replacement; method 3 does not guarantee unique elements.

- Vector of n weights: a weighted sampling without replacement algorithm is applied using those weights.

Example: randsampleFS(100,10,2) 

Data Types: single|double

### after2011b — MATLAB version flag. Logical.

Indicates whether the MATLAB release in use is R2012a (7.14) or later, that is, later than R2011b. Passing this flag speeds up computations in function subsets and, more generally, in simulation experiments that call randsampleFS intensively.

Example: randsampleFS(100,10,2,true) or randsampleFS(100,10,2,~verLessThan('MATLAB','7.14'))

Data Types: logical

## Output Arguments

### y — A vector of k values sampled at random from the integers 1:n. For methods 0, 1, 2 and weighted sampling the extracted elements are unique; for method 3 (included for historical reasons) there is no guarantee that the extracted elements are unique.

Data Types: single|double

## More About

Method 0 uses the MATLAB function randperm. In old MATLAB releases randperm was slower than the FSDA function shuffling, which is used by method 1 (for example, in R2009a - MATLAB 7.8 - randperm was at least 50 slower).

If method=1 the approach depends on the population and sample sizes:

- if $n < 1000$ and $k < n/(10 + 0.007n)$, that is if the population is relatively small and the desired sample is small compared to the population, we repeatedly sample with replacement until there are k unique values;

- otherwise, we do a random permutation of the population and return the first k elements.

The threshold $k < n/(10 + 0.007n)$ was determined by simulation under MATLAB R2016b. Previously, the threshold was $n < 4k$.
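The two-branch strategy of method 1 can be sketched in Python as follows. This is a minimal illustration, not the FSDA code: the function name `sample_method1` is hypothetical, the permutation branch uses a partial Fisher-Yates shuffle as a stand-in for the FSDA function shuffling, and indices are 1-based to match MATLAB.

```python
import random

def sample_method1(n, k, rng=random):
    """Hypothetical sketch of the method-1 strategy: k distinct integers from 1..n."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if n < 1000 and k < n / (10 + 0.007 * n):
        # Small population and small sample: draw with replacement and
        # discard duplicates until k unique values have been collected.
        seen = set()
        out = []
        while len(out) < k:
            v = rng.randint(1, n)  # uniform on 1..n, inclusive
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    # Otherwise: partial Fisher-Yates shuffle of 1..n, keeping the first k.
    pop = list(range(1, n + 1))
    for i in range(k):
        j = rng.randint(i, n - 1)  # choose among the positions not yet fixed
        pop[i], pop[j] = pop[j], pop[i]
    return pop[:k]
```

The rejection branch is cheap when k is small relative to n because duplicates are rare, while the permutation branch needs O(n) memory but a fixed number of random draws.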

If method = 2, systematic sampling is used: both the starting point and the step are random. The selected positions wrap around from n back to 1, as the method-2 example shows.
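A minimal Python sketch of circular systematic sampling is shown below. The function name `sample_systematic` is hypothetical, and drawing the step uniformly from 1 to floor(n/k) is an assumption (the documentation does not say how the step is chosen); that range guarantees uniqueness, because every offset j·step with j < k is smaller than n.

```python
import random

def sample_systematic(n, k, rng=random):
    """Hypothetical sketch: circular systematic sample of k distinct integers from 1..n."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    start = rng.randint(1, n)       # random starting point in 1..n
    step = rng.randint(1, n // k)   # assumed step range: keeps (k-1)*step < n
    # Walk around the population circularly, wrapping from n back to 1.
    return [(start - 1 + j * step) % n + 1 for j in range(k)]
```

With start = 69 and step = 10, this sketch reproduces the method-2 output shown in the examples for n = 100 and k = 10.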

If method = 3, random sampling is based on the old but well-known Linear Congruential Generator (LCG) method. In this case the extracted numbers are not guaranteed to be unique. The method is included for historical reasons.
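The LCG recurrence $x_{i+1} = (a x_i + c) \bmod m$ can be sketched in Python as below. The function name `sample_lcg`, the constants a = 1664525, c = 1013904223, m = 2^32 (the widely used Numerical Recipes parameters) and the scaling of the state to 1..n are all assumptions for illustration; the documentation does not state which constants method 3 uses. Nothing in the loop prevents repeated values.

```python
def sample_lcg(n, k, seed, a=1664525, c=1013904223, m=2**32):
    """Hypothetical sketch: k integers from 1..n via an LCG (duplicates possible)."""
    x = seed % m
    out = []
    for _ in range(k):
        x = (a * x + c) % m          # LCG recurrence x_{i+1} = (a*x_i + c) mod m
        out.append(x * n // m + 1)   # scale the state in [0, m) to an integer in 1..n
    return out
```

Because each value depends only on the previous state, the same seed always yields the same sequence, and no uniqueness check is performed.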

If method is a vector of n weights, Weighted Sampling Without Replacement is applied. Our implementation follows Efraimidis and Spirakis (2006). The MATLAB function datasample follows Wong and Easton (1980), which is also quite fast; note, however, that datasample may be very slow when called repeatedly, because of the time it spends checking options.
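In the Efraimidis and Spirakis (2006) scheme each item i receives a random key $u_i^{1/w_i}$, with $u_i$ uniform on (0, 1), and the k items with the largest keys form the sample. A minimal Python sketch follows; the function name `weighted_sample_es` is hypothetical, weights are assumed strictly positive, indices are 1-based, and the FSDA implementation may differ in its details.

```python
import heapq
import random

def weighted_sample_es(weights, k, rng=random):
    """Hypothetical sketch of Efraimidis-Spirakis weighted sampling without replacement."""
    if not 0 <= k <= len(weights):
        raise ValueError("need 0 <= k <= len(weights)")
    keys = []
    for i, w in enumerate(weights, start=1):
        if w <= 0:
            raise ValueError("weights must be positive")
        u = rng.random()                      # uniform in [0, 1)
        keys.append((u ** (1.0 / w), i))      # larger weight -> key closer to 1
    # Keep the k items with the largest keys; return their 1-based indices.
    return [i for _, i in heapq.nlargest(k, keys)]
```

Using `heapq.nlargest` keeps the selection at O(n log k); sorting all keys would give the same sample at O(n log n) cost.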

Remark on computational performance. Method 2 (systematic sampling) is by far the fastest for any practical population size $n$. For example, for $n \approx 10^6$ method 2 is two orders of magnitude faster than method 1. With recent MATLAB releases (after R2011b), method 0 (which uses the compiled MATLAB function randperm) has comparable performance, at least for reasonably small $k$. In releases before R2012a, randperm was considerably slower.

## References

For Method 1. Fisher, R.A. and Yates, F. (1948) [1938]. Statistical Tables for Biological, Agricultural and Medical Research (3rd ed.). London: Oliver & Boyd, pp. 26-27.

For Method 2. Cochran, W.G. (1977). Sampling Techniques (3rd ed.). Wiley.

For Method 3. Knuth, D.E. (1997). The Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd ed.). Addison-Wesley. Section 3.2.1: The Linear Congruential Method, pp. 10-26.

For Weighted Sampling Without Replacement:

Efraimidis, P.S. and Spirakis, P.G. (2006). Weighted random sampling with a reservoir. Information Processing Letters, 97, 181-185.

Wong, C.K. and Easton, M.C. (1980). An efficient method for weighted sampling without replacement. SIAM Journal on Computing, 9(1), 111-113.