wthin

wthin thins a uni/bi-dimensional dataset

Syntax

Description

Last modified 06-Feb-2016

Computes retention probabilities and Bernoulli (0/1) weights on the basis of a data density estimate.

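Wt = wthin(X) computes, with the default options, the Bernoulli weights Wt for the units in X (example: Univariate thinning).

Wt = wthin(X, Name, Value) specifies optional Name-Value pair arguments (example: Bi-dimensional thinning).

[Wt, pretain] = wthin(___) also returns the retention probabilities pretain (example: Use of 'retainby' option).

[Wt, pretain, varargout] = wthin(___) also returns, as optional output Xt, the retained units (example: Optional output Xt).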
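The description above can be illustrated with a minimal, self-contained sketch for univariate data. It is not the FSDA implementation of wthin: it assumes a hand-coded Gaussian kernel density estimate (instead of ksdensity), a fixed illustrative bandwidth h = 0.3 and the default 'inverse' retention rule. The names pdfe, pretain, Wt and Xt only mirror those used on this page.

```matlab
% Minimal sketch of density-based thinning for univariate data.
% Assumptions (not FSDA code): hand-coded Gaussian KDE, fixed bandwidth h,
% and the 'inverse' retention rule documented for the 'retainby' option.
x = [randn(1000,1); 8 + randn(100,1)];   % dense group plus sparse group
n = numel(x);
h = 0.3;                                 % illustrative bandwidth (assumption)

% Gaussian kernel density estimate evaluated at every data point
D = (x - x.') / h;                       % n-by-n scaled pairwise differences
pdfe = sum(exp(-0.5 * D.^2), 2) / (n * h * sqrt(2*pi));

% 'inverse' rule: the lower the density, the higher the retention probability
pretain = (1 ./ pdfe) / max(1 ./ pdfe);

% Bernoulli (0/1) weights: unit i is retained with probability pretain(i)
Wt = rand(n,1) < pretain;
Xt = x(Wt);                              % retained units
fprintf('Retained %d of %d units\n', sum(Wt), n);
```

Because the sparse group around 8 has a lower estimated density, its units tend to receive higher retention probabilities than the units of the dense group.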

Examples


  • Univariate thinning.
        clear all; close all;
        % The dataset is bi-dimensional and contains two collinear groups with
        % regression structure. One group is dense, with 1000 units; the second
        % has 100 units. Thinning is done according to the density of the values
        % predicted by the OLS fit.
        x1 = randn(1000,1);
        x2 = 8 + randn(100,1);
        x = [x1 ; x2];
        y = 5*x + 0.9*randn(1100,1);
        b = [ones(1100,1) , x] \ y;
        yhat = [ones(1100,1) , x] * b;
        plot(x,y,'.',x,yhat,'--');
    
        % thinning over the predicted values
        [Wt,pretain] = wthin(yhat, 'retainby','comp2one');
    
        figure;
        plot(x(Wt,:),y(Wt,:),'k.',x(~Wt,:),y(~Wt,:),'r.');
        drawnow;
        axis manual;
        title('univariate thinning over predicted ols values')
        clickableMultiLegend(['Retained: ' num2str(sum(Wt))],['Thinned:   ' num2str(sum(~Wt))]);
    
    

  • Bi-dimensional thinning.
  • Same dataset, but thinning is done on the original bi-variate data.

       
        plot(x,y,'.');
    
        % thinning over the original bi-variate data
        [Wt2,pretain2] = wthin([x,y]);
    
        figure;
        plot(x(Wt2,:),y(Wt2,:),'k.',x(~Wt2,:),y(~Wt2,:),'r.');
        drawnow;
        axis manual;
        title('bivariate thinning')
        clickableMultiLegend(['Retained: ' num2str(sum(Wt2))],['Thinned:   ' num2str(sum(~Wt2))]);
    

  • Use of 'retainby' option.
  • Thinning the original bivariate data with the default retention method ('inverse') removes too many units, so here we try the less conservative 'comp2one' option.

       
        plot(x,y,'.');
    
        % thinning over the original bi-variate data
        [Wt2,pretain2] = wthin([x,y], 'retainby','comp2one');
    
        figure;
        plot(x(Wt2,:),y(Wt2,:),'k.',x(~Wt2,:),y(~Wt2,:),'r.');
        drawnow;
        axis manual
        clickableMultiLegend(['Retained: ' num2str(sum(Wt2))],['Thinned:   ' num2str(sum(~Wt2))]);
        title('"comp2one" thinning over the original bi-variate data');
        
    

  • Optional output Xt.
  • Same dataset; the retained data are also returned as the optional third output (varargout).

       
        % thinning over the original bi-variate data
        [Wt2,pretain2,RetUnits] = wthin([x,y]);
        RetUnits
    

    Related Examples

  • Thinning on the fishery dataset.
        load fishery;
        X=fishery.data;
        % some jittering is necessary because duplicate units are not handled
        % in tclustreg: this needs to be addressed
        X = X + 10^(-8) * abs(randn(size(X)));
    
        % thinning over the original bi-variate data
        [Wt3,pretain3,RetUnits3] = wthin(X ,'retainby','comp2one');
        figure;
        plot(X(Wt3,1),X(Wt3,2),'k.',X(~Wt3,1),X(~Wt3,2),'rx');
        drawnow;
        axis manual
        clickableMultiLegend(['Retained: ' num2str(sum(Wt3))],['Thinned:   ' num2str(sum(~Wt3))]);
        title('"comp2one" thinning on the fishery dataset');
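
The effect of the jittering step can be checked by counting duplicated rows before and after it. The sketch below uses a small hypothetical matrix rather than the fishery data, and the variable names are illustrative only.

```matlab
% Count duplicated rows before and after a tiny jitter (hypothetical data).
X = [1 2; 1 2; 3 4; 3 4; 5 6];
nDupBefore = size(X,1) - size(unique(X, 'rows'), 1);    % -> 2
Xj = X + 1e-8 * abs(randn(size(X)));                     % jittered copy
nDupAfter = size(Xj,1) - size(unique(Xj, 'rows'), 1);   % almost surely 0
fprintf('Duplicates before: %d, after: %d\n', nDupBefore, nDupAfter);
```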
    
    

  • Univariate thinning with fewer than 100 units.
  • As in the first example above, but with fewer than 100 units in the data.

        x1 = randn(85,1);
        x2 = 8 + randn(10,1);
        x = [x1 ; x2];
        y = 5*x + 0.9*randn(95,1);
        b = [ones(95,1) , x] \ y;
        yhat = [ones(95,1) , x] * b;
        plot(x,y,'.',x,yhat,'--');
    
        % thinning over the predicted values
        [Wt,pretain] = wthin(yhat, 'retainby','comp2one');
    
        figure;
        plot(x(Wt,:),y(Wt,:),'k.',x(~Wt,:),y(~Wt,:),'r.');
        drawnow;
        axis manual
        title('univariate thinning over ols values predicted on a small dataset')
        clickableMultiLegend(['Retained: ' num2str(sum(Wt))],['Thinned:   ' num2str(sum(~Wt))]);
    
    

    Input Arguments


    X — Input data. Vector or 2-column matrix.

    X contains the univariate or bivariate data to be thinned on the basis of a probability density estimate.

    Data Types: single | double

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'bandwidth',0.35,'retainby','comp2one'

    bandwidth — Bandwidth value. Scalar.

    The bandwidth used to estimate the density. It can be estimated from the data using function bwe.

    Example: 'bandwidth',0.35

    Data Types: scalar
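    When no bandwidth is at hand, one common choice for a univariate Gaussian kernel is Silverman's normal-reference rule of thumb. The sketch below uses that rule as an assumed stand-in: it is not necessarily what FSDA's bwe computes, and the variable h is illustrative.

```matlab
% Silverman's rule-of-thumb bandwidth for a univariate Gaussian kernel.
% Assumption: this is a stand-in for FSDA's bwe, whose rule may differ.
x = [randn(1000,1); 8 + randn(100,1)];   % bimodal sample
n = numel(x);
sigma = std(x);
h = 1.06 * sigma * n^(-1/5);
fprintf('Rule-of-thumb bandwidth: %.4f\n', h);
```

    The value could then be passed as 'bandwidth',h. With clearly bimodal data such as this, the overall standard deviation is inflated by the gap between the groups, so this rule tends to oversmooth and a smaller value may be preferable.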

    retainby — Retention method. String.

    The function used to compute the retention probabilities from the density estimate pdfe. It can be:

    - 'comp2one': pretain = 1 - pdfe/max(pdfe);
    - 'inverse' (default): pretain = (1./pdfe) / max(1./pdfe).

    Example: 'retainby','comp2one'

    Data Types: char
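    To see how the two rules differ, the short sketch below applies both formulas to a hypothetical density vector pdfe with three values; the numbers are illustrative and are not output of wthin.

```matlab
% Compare the two retention rules on a hypothetical density vector.
pdfe = [0.1; 0.2; 0.4];                      % low, medium, high density

pComp2one = 1 - pdfe / max(pdfe);            % -> [0.75; 0.5; 0]
pInverse  = (1 ./ pdfe) / max(1 ./ pdfe);    % -> [1; 0.5; 0.25]

disp([pdfe pComp2one pInverse]);
```

    Note the behaviour at the extremes: 'comp2one' assigns probability 0 to the unit with the highest density, so that unit is always thinned, whereas 'inverse' assigns probability 1 to the unit with the lowest density, so that unit is always retained.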

    Output Arguments


    Wt — Vector of Bernoulli weights. Vector.

    Contains 1 for retained units and 0 for thinned units.

    Data Types: single | double
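    A Bernoulli weight for unit i is a 0/1 draw that equals 1 with probability pretain(i). One way to generate such weights, sketched here under the assumption that uniform random numbers are used (FSDA may draw them differently), is to compare pretain with uniform draws. The expected number of retained units is sum(pretain).

```matlab
% Draw Bernoulli (0/1) weights from retention probabilities.
pretain = [1; 0.75; 0.5; 0.25; 0];     % hypothetical retention probabilities
n = numel(pretain);

% rand returns values in the open interval (0,1), so units with
% pretain = 1 are always retained and units with pretain = 0 never are.
Wt = rand(n,1) < pretain;              % logical vector: true = retained

expectedRetained = sum(pretain);       % -> 2.5
fprintf('Retained %d units (expected %.1f)\n', sum(Wt), expectedRetained);
```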

    pretain — Vector of retention probabilities. Vector.

    These are the probabilities that each unit in X is retained. They are derived from a Gaussian kernel density estimate computed with function ksdensity.

    Data Types: single | double
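    For two-column X the density has to be estimated in two dimensions. The sketch below is a hand-coded Gaussian product-kernel estimate evaluated at the data points, followed by the 'comp2one' rule. It assumes a single bandwidth h = 0.5 shared by both coordinates and is not a replica of ksdensity, which by default may use a different bandwidth in each dimension.

```matlab
% Bivariate Gaussian product-kernel density estimate at the data points.
% Assumption: the same bandwidth h in both coordinates.
X = [randn(300,2); 6 + randn(30,2)];    % dense cloud plus sparse cloud
n = size(X,1);
h = 0.5;                                 % illustrative bandwidth

D1 = (X(:,1) - X(:,1).') / h;            % scaled differences, 1st coordinate
D2 = (X(:,2) - X(:,2).') / h;            % scaled differences, 2nd coordinate
pdfe = sum(exp(-0.5 * (D1.^2 + D2.^2)), 2) / (n * 2*pi * h^2);

% 'comp2one' retention probabilities
pretain = 1 - pdfe / max(pdfe);
```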

    varargout — Xt: retained units. Vector or matrix.

    Xt is X(Wt,:), i.e. the rows of X retained by the thinning.

    Data Types: single | double
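    Recovering Xt is a logical-indexing operation. If the weights are held in a double 0/1 vector rather than a logical one, converting them with logical first avoids an indexing error, since 0 is not a valid numeric index. The small matrix below is hypothetical.

```matlab
% Recover retained and thinned rows from 0/1 weights.
X  = [1 10; 2 20; 3 30; 4 40];   % hypothetical 2-column data
Wt = [1; 0; 1; 0];               % 0/1 weights stored as double

Xt       = X(logical(Wt), :);    % -> [1 10; 3 30]
Xthinned = X(~Wt, :);            % -> [2 20; 4 40]
```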

    References

    A.W. Bowman and A. Azzalini (1997), "Applied Smoothing Techniques for Data Analysis," Oxford University Press.

    This page has been automatically generated by our routine publishFS