FowlkesMallowsIndex

FowlkesMallowsIndex computes the Fowlkes and Mallows index.

Syntax

  • ABk=FowlkesMallowsIndex(c1,c2)
  • [ABk,Bk]=FowlkesMallowsIndex(___)
  • [ABk,Bk,EBk]=FowlkesMallowsIndex(___)
  • [ABk,Bk,EBk,VarBk]=FowlkesMallowsIndex(___)

Description

The Fowlkes-Mallows index (see References) is an external evaluation measure of the similarity between two clusterings, that is, two partitions of the same units produced by clustering algorithms. It can compare two clusterings (for example two hierarchical clusterings), or a clustering with a benchmark (true) classification. A higher value of the Fowlkes-Mallows index indicates greater similarity between the two partitions.
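
For reference, the unadjusted index Bk can be written in terms of the contingency table of the two partitions. Following Fowlkes and Mallows (1983), let n_{ij} be the number of units in group i of the first partition and group j of the second, n_{i\cdot} and n_{\cdot j} the row and column totals, and n the number of units. These are the definitions of the cited paper; that FowlkesMallowsIndex uses exactly these formulas is an assumption of this note.

\[ Bk = \frac{T_k}{\sqrt{P_k Q_k}}, \qquad T_k = \sum_{i}\sum_{j} n_{ij}^2 - n, \qquad P_k = \sum_{i} n_{i\cdot}^2 - n, \qquad Q_k = \sum_{j} n_{\cdot j}^2 - n, \qquad E(Bk) = \frac{\sqrt{P_k Q_k}}{n(n-1)} \]

Here T_k/2 is the number of pairs of units placed together in both partitions, while P_k/2 and Q_k/2 count the pairs placed together in the first and in the second partition, respectively.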

FowlkesMallowsIndex accepts either two vectors of labels or their contingency table. The adjusted Fowlkes-Mallows index (ABk), i.e. the index corrected for chance, is defined as

\[ ABk = \frac{Bk - E(Bk)}{\mbox{Max Index} - E(Bk)} \]

where Bk is the unadjusted index and E(Bk) is its expected value under the null hypothesis of no relation between the two partitions.
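
A minimal base-MATLAB sketch of the formula above, assuming the Fowlkes and Mallows (1983) definitions of Bk and E(Bk) and taking Max Index = 1. It illustrates the arithmetic only and is not the code of FowlkesMallowsIndex; the names Tk, Pk and Qk are ours.

```matlab
% Contingency table used in the contingency-table example on this page
N = [1 1 0;
     1 2 1;
     0 0 4];

n  = sum(N(:));                  % total number of units
Tk = sum(N(:).^2) - n;           % 2 x pairs together in both partitions
Pk = sum(sum(N, 2).^2) - n;      % 2 x pairs together in the first partition
Qk = sum(sum(N, 1).^2) - n;      % 2 x pairs together in the second partition

Bk  = Tk / sqrt(Pk * Qk);        % unadjusted Fowlkes-Mallows index
EBk = sqrt(Pk * Qk) / (n * (n - 1));   % expected value under no relation

% Assumption: Max Index = 1 in the adjusted formula
ABk = (Bk - EBk) / (1 - EBk);

fprintf('Bk = %.4f, E(Bk) = %.4f, ABk = %.4f\n', Bk, EBk, ABk)
```

With this table Tk = 14, Pk = 26 and Qk = 28, and the sketch prints Bk = 0.5189, E(Bk) = 0.2998 and ABk = 0.3129.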

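ABk = FowlkesMallowsIndex(c1,c2) returns the adjusted Fowlkes and Mallows index between the partitions whose labels are stored in the vectors c1 and c2. If c1 is a contingency table, c2 is not needed, as in FowlkesMallowsIndex(T).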

[ABk,Bk] = FowlkesMallowsIndex(___) also returns the unadjusted Fowlkes and Mallows index Bk.

[ABk,Bk,EBk] = FowlkesMallowsIndex(___) also returns EBk, the expected value of Bk under the null hypothesis of no relation between the two partitions.

[ABk,Bk,EBk,VarBk] = FowlkesMallowsIndex(___) also returns VarBk, the variance of Bk under the same null hypothesis.

Examples


  • FowlkesMallowsIndex (adjusted) with the two vectors as input.
        % FowlkesMallowsIndex (adjusted) with the two vectors as input.
        c=[1 1;
           1 2;
           2 1;
           2 2;
           2 2;
           2 3;
           3 3;
           3 3;
           3 3;
           3 3];
        % c1 = numeric vector containing the labels of the first partition
        c1=c(:,1);
        % c2 = numeric vector containing the labels of the second partition
        c2=c(:,2);
        FM=FowlkesMallowsIndex(c1,c2);
    

  • FM index (adjusted) with the contingency table as input.
        T=[1 1 0;
        1 2 1;
        0 0 4];
        FM=FowlkesMallowsIndex(T);
    

  • Compare FM (unadjusted) for iris data (true classification against tclust classification).
            load fisheriris
            % first partition c1 is the true partition
            c1=species;
            % second partition c2 is the output of tclust clustering procedure
            out=tclust(meas,3,0,100,'msg',0);
            c2=out.idx;
            [~,FM,EFM,VARFM]=FowlkesMallowsIndex(c1,c2);
    

  • Compare FM index (unadjusted) for iris data (exclude unassigned units from tclust).
            load fisheriris
            % first partition c1 is the true partition
            c1=species;
            % second partition c2 is the output of tclust clustering procedure
            out=tclust(meas,3,0.1,100,'msg',0);
            c2=out.idx;
            % Units of c2 labelled 0 are the trimmed observations:
            % exclude them before computing the index
            noisecluster=0;
            keep=c2~=noisecluster;
            [~,FM,EFM,VARFM]=FowlkesMallowsIndex(c1(keep),c2(keep));
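
The first two examples describe the same data: cross-tabulating the label vectors c1 and c2 of the first example gives exactly the contingency table T of the second, so both calls should return the same index. A small base-MATLAB check, using accumarray as a simple cross-tabulation (this works directly only because the labels are the positive integers 1, 2, 3):

```matlab
% Label vectors of the first example
c = [1 1; 1 2; 2 1; 2 2; 2 2; 2 3; 3 3; 3 3; 3 3; 3 3];
c1 = c(:,1);
c2 = c(:,2);

% Entry (i,j) counts the units with label i in c1 and label j in c2
Tvec = accumarray([c1 c2], 1);

% Contingency table of the second example
T = [1 1 0;
     1 2 1;
     0 0 4];

% Displays 1 (true): both inputs encode the same pair of partitions
disp(isequal(Tvec, T))
```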
    

    Related Examples

  • FM index (unadjusted) for iris data with 3 groups coming from single linkage.
        % FM index between true and empirical classification

        load fisheriris
        d = pdist(meas);
        Z = linkage(d);
        C = cluster(Z,'maxclust',3);
        [AFM,FM,FMexp,FMvar]=FowlkesMallowsIndex(C,species);
        disp('FM index is equal to')
        disp(FM)
        disp('Expectation of FM index is')
        disp(FMexp)
        disp('Variance of FM index is')
        disp(FMvar)
        disp('Adjusted FM index is equal to')
        disp(AFM)
    

  • Monitoring of (adjusted) FM index for iris data using true classification as benchmark.
        load fisheriris
        % Build the agglomerative hierarchical cluster tree (single linkage)
        d = pdist(meas);
        Z = linkage(d);
        kk=1:15;
        % Cut the tree into 1, 2, ..., 15 groups (one column of C per value of kk)
        C = cluster(Z,'maxclust',kk);
        FM =zeros(length(kk),1);
        for j=1:length(kk)
            FM(j)=FowlkesMallowsIndex(C(:,j),species);
        end
        plot(kk,FM)
        xlabel('Number of groups')
        ylabel('Adjusted Fowlkes and Mallows index')
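
When one partition consists of a single group, as in the first column of C, the adjusted index is exactly zero, so the curve should start at 0. A short derivation, assuming the Fowlkes and Mallows (1983) definitions Bk = T_k / sqrt(P_k Q_k) and E(Bk) = sqrt(P_k Q_k) / (n(n-1)), where n_{ij} is the contingency table, n_{\cdot j} its column totals, n the number of units, and Q_k > 0:

\[ \begin{aligned}
n_{1j} = n_{\cdot j} \;\Rightarrow\; T_k &= \sum_j n_{\cdot j}^2 - n = Q_k, \qquad P_k = n^2 - n = n(n-1), \\
Bk &= \frac{T_k}{\sqrt{P_k Q_k}} = \frac{Q_k}{\sqrt{n(n-1)\,Q_k}} = \sqrt{\frac{Q_k}{n(n-1)}}, \\
E(Bk) &= \frac{\sqrt{P_k Q_k}}{n(n-1)} = \frac{\sqrt{n(n-1)\,Q_k}}{n(n-1)} = \sqrt{\frac{Q_k}{n(n-1)}}, \\
ABk &= \frac{Bk - E(Bk)}{\mbox{Max Index} - E(Bk)} = 0 .
\end{aligned} \]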
    

    Input Arguments


    c1 — labels of first partition or contingency table. Numeric or character vector, or numeric matrix.

    A numeric or character vector containing the class labels of the first partition, or a 2-dimensional numeric matrix containing the cross-tabulation (contingency table) of the cluster assignments.

    Data Types: single | double | char | logical

    c2 — labels of second partition. Numeric or character vector.

    A numeric or character vector containing the class labels of the second partition. The length of c2 must be equal to the length of c1. This second input is required only if c1 is not a 2-dimensional numeric matrix (contingency table).

    Data Types: single | double | char | logical
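
When c1 and c2 are label vectors, the contingency table is simply their cross-tabulation. The sketch below shows one way to build it in base MATLAB for arbitrary labels (numbers, characters, or a cell array of character vectors such as species). The data are hypothetical, and this is an illustration, not necessarily how FowlkesMallowsIndex builds the table internally.

```matlab
% Hypothetical label vectors: labels may be text or non-consecutive numbers
c1 = {'setosa'; 'setosa'; 'virginica'; 'versicolor'; 'virginica'};
c2 = [2; 2; 7; 7; 7];

% The third output of unique maps each distinct label to an integer 1..k
[~, ~, i1] = unique(c1);
[~, ~, i2] = unique(c2);

% N(i,j) = number of units in group i of c1 and group j of c2
N = accumarray([i1(:) i2(:)], 1);
disp(N)
```

Rows follow the sorted labels of c1 (setosa, versicolor, virginica) and columns the sorted labels of c2 (2, 7), so N is [2 0; 0 1; 0 2].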

    Output Arguments


    ABk — Adjusted Fowlkes and Mallows index. Scalar.

    A number between -1 and 1.

    The adjusted Fowlkes and Mallows index is the corrected-for-chance version of the Fowlkes and Mallows index.

    Bk — Value of the Fowlkes and Mallows index. Scalar.

    A number between 0 and 1.
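
The bounds follow from the Fowlkes and Mallows (1983) definitions, assuming FowlkesMallowsIndex uses them. Write n_{ij} for the contingency table, n_{i\cdot} and n_{\cdot j} for its row and column totals and n for the number of units, so that Bk = T_k / sqrt(P_k Q_k) with T_k, P_k, Q_k as in the first line below. Every cell is a non-negative integer no larger than its row or column total, hence:

\[ \begin{aligned}
T_k = \sum_i\sum_j n_{ij}^2 - n, \quad P_k &= \sum_i n_{i\cdot}^2 - n, \quad Q_k = \sum_j n_{\cdot j}^2 - n, \\
n_{ij}^2 \ge n_{ij} \;&\Rightarrow\; T_k = \sum_i\sum_j n_{ij}(n_{ij}-1) \ge 0, \\
\sum_j n_{ij}^2 \le n_{i\cdot}\sum_j n_{ij} = n_{i\cdot}^2 \;&\Rightarrow\; T_k \le P_k, \quad \mbox{and likewise } T_k \le Q_k, \\
T_k \le \min(P_k, Q_k) \le \sqrt{P_k Q_k} \;&\Rightarrow\; 0 \le Bk \le 1 .
\end{aligned} \]

Bk reaches 1 when the two partitions coincide, since then T_k = P_k = Q_k.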

    EBk — Expectation of the Fowlkes and Mallows index. Scalar.

    Expected value of Bk under the null hypothesis of no relation between the two partitions.

    VarBk — Variance of the Fowlkes and Mallows index. Scalar.

    Variance of Bk under the null hypothesis of no relation between the two partitions.
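
Both quantities describe the distribution of Bk when the two partitions are unrelated. Assuming, as in Fowlkes and Mallows (1983), that this null hypothesis means randomly relabelling the units of one partition while keeping all group sizes fixed, the null mean and variance can be computed exactly for a small example by enumerating all n! permutations. The base-MATLAB sketch below uses hypothetical data with n = 6 (720 permutations) and checks the enumerated mean against the closed form E(Bk) = sqrt(P_k Q_k)/(n(n-1)). It does not reproduce the closed-form variance used by FowlkesMallowsIndex, which this page does not state.

```matlab
% Hypothetical partitions of n = 6 units (positive integer labels)
c1 = [1 1 1 2 2 3];
c2 = [1 1 2 2 3 3];
n  = numel(c1);

% Cross-tabulation and the Fowlkes-Mallows quantities Tk, Pk, Qk
tab = @(a, b) accumarray([a(:) b(:)], 1);
Tk  = @(N) sum(N(:).^2) - sum(N(:));
Pk  = @(N) sum(sum(N, 2).^2) - sum(N(:));
Qk  = @(N) sum(sum(N, 1).^2) - sum(N(:));
Bk  = @(N) Tk(N) / sqrt(Pk(N) * Qk(N));

% Bk for every relabelling of c2 (group sizes, hence Pk and Qk, stay fixed)
P = perms(1:n);
B = zeros(size(P, 1), 1);
for r = 1:size(P, 1)
    B(r) = Bk(tab(c1, c2(P(r, :))));
end

N0 = tab(c1, c2);
EBkFormula = sqrt(Pk(N0) * Qk(N0)) / (n * (n - 1));
EBkExact   = mean(B);     % exact mean over all n! permutations
VarBkExact = var(B, 1);   % exact variance (normalised by the number of permutations)

fprintf('E(Bk): formula %.6f, enumeration %.6f; Var(Bk) = %.6f\n', ...
    EBkFormula, EBkExact, VarBkExact)
```

Enumeration is only feasible for very small n; it is meant as a check of the null model, not as a practical way to compute EBk or VarBk.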

    References

    Fowlkes, E.B. and Mallows, C.L. (1983), A Method for Comparing Two Hierarchical Clusterings, Journal of the American Statistical Association, Vol. 78, No. 383, pp. 553-569.

    http://en.wikipedia.org/wiki/Fowlkes-Mallows_index


    This page has been automatically generated by our routine publishFS