APPROXIMATE ENVELOPES FOR FINDING AN UNKNOWN NUMBER OF MULTIVARIATE OUTLIERS IN LARGE DATA SETS

| Anthony C. Atkinson | Marco Riani | Fabrizio Laurini |
|---|---|---|
| The London School of Economics | Dipartimento di Economia | Dipartimento di Economia |
| London WC2A 2AE, UK | Università di Parma, Italy | Università di Parma, Italy |
| a.c.atkinson@lse.ac.uk | mriani@unipr.it | fabrizio.laurini@unipr.it |

**Abstract**

We provide thresholds for a test statistic for detecting multiple outliers in multivariate normal samples. Except in small problems, directly simulating every required combination of sample size, number of outliers, dimension of the observations and significance level is excessively time-consuming. We instead find the thresholds by rescaling a paradigmatic curve obtained by simulation. The method is illustrated on an example with 1,827 observations.
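To show why direct simulation becomes expensive, here is a minimal Monte Carlo sketch in Python with NumPy. It assumes (this page does not define the statistic) that the quantity monitored is the minimum Mahalanobis distance among units outside the current subset of a forward search. The simplified search, the median-based starting subset and the function names `forward_search_min_distance` and `simulated_envelope` are hypothetical illustrations, not the authors' procedure, and the rescaling of a paradigmatic curve is not shown.

```python
import numpy as np


def forward_search_min_distance(y, m0):
    """Hypothetical, simplified forward search on data y (n x v).

    Returns one value per subset size m = m0, ..., n - 1: the minimum
    Mahalanobis distance among the units outside the subset of size m.
    """
    n, v = y.shape
    # Assumed start: the m0 units closest (in Euclidean distance) to the
    # coordinate-wise median. m0 must exceed v so the covariance is invertible.
    med = np.median(y, axis=0)
    subset = np.argsort(np.sum((y - med) ** 2, axis=1))[:m0]
    mins = np.empty(n - m0)
    for i, m in enumerate(range(m0, n)):
        # Estimate location and scatter from the current subset only.
        mu = y[subset].mean(axis=0)
        sigma_inv = np.linalg.inv(np.cov(y[subset], rowvar=False))
        diff = y - mu
        # Squared Mahalanobis distance of every unit from the subset fit.
        d2 = np.einsum("ij,jk,ik->i", diff, sigma_inv, diff)
        outside = np.ones(n, dtype=bool)
        outside[subset] = False
        mins[i] = np.sqrt(d2[outside].min())
        # Grow the subset: keep the m + 1 units with the smallest distances.
        subset = np.argsort(d2)[: m + 1]
    return mins


def simulated_envelope(n, v, m0, nsim, quantiles, seed=0):
    """Pointwise quantiles of the statistic over nsim null (normal) samples.

    Returns an array of shape (len(quantiles), n - m0).
    """
    rng = np.random.default_rng(seed)
    mins = np.empty((nsim, n - m0))
    for s in range(nsim):
        y = rng.standard_normal((n, v))
        mins[s] = forward_search_min_distance(y, m0)
    return np.quantile(mins, quantiles, axis=0)


env = simulated_envelope(n=100, v=3, m0=10, nsim=200, quantiles=[0.01, 0.5, 0.99])
print(env.shape)
```

Each call repeats the whole search `nsim` times for a single pair of sample size and dimension, and extreme significance levels need many more replications for stable tail quantiles. Repeating this for every combination quickly becomes costly, which is the expense that rescaling one simulated curve is intended to avoid.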

**Additional material (figures using unscaled distances)**

| Figure | Format |
|---|---|
| Figure 3 | ps |
| Figure 4 | ps |
| Figure 5 | ps |
| Figure 7 | ps |
| Figure 8 | ps |
| Figure 9 | ps |

**Additional material (scatter plot matrices of supermarket data)**

| Scatter plot matrix | Format |
|---|---|
| First 3 variables before transformation | ps |
| First 3 variables after transformation | ps |
| First 3 variables after transformation with outliers highlighted | ps |
| Variables 4, 5 and 9 after transformation | ps |
| Variables 4, 5 and 9 after transformation with outliers highlighted | ps |

**Supermarket data (transformed data)**

| Data file | Format |
|---|---|
| Excel format | xls |
| S-PLUS format | sdd |

Last modified 02/06/2009 00.40.58