Nonparametric estimation of flood quantiles by the kernel method
P. F. Rasmussen
Determining the flood discharge for a given return period requires estimating the distribution of annual floods. This paper examines the use of nonparametric distributions as an alternative to statistical distributions. The main challenge in kernel estimation lies in computing the parameter that determines the degree of smoothing of the nonparametric density. We compared several methods and retained the plug-in method and the least-squares cross-validation method as the most promising.
Several interesting conclusions were drawn from this study. Among others, for the estimation of flood quantiles, it appears preferable to use estimators based directly on the distribution function rather than on the density function. A comparison of the plug-in method with the fitting of three statistical distributions led to the conclusion that the kernel method is an attractive alternative to traditional parametric methods.
Keywords: Flood, extreme values, statistical analysis, kernel method, smoothing parameter, comparison study
Nonparametric estimation of quantiles by the kernel method
Traditional flood frequency analysis involves the fitting of a statistical distribution to observed annual peak flows. The choice of statistical distribution is crucial, since it can have significant impact on design flow estimates. Unfortunately, it is often difficult to determine in an objective way which distribution is the most appropriate.
To avoid the inherent arbitrariness associated with the choice of distribution in parametric frequency analysis, one can employ a method based on nonparametric density estimation. Although the resulting quantile estimates may have larger standard errors, the use of nonparametric densities eliminates the need to select a particular distribution and the potential bias associated with a wrong choice.
The kernel method is a conceptually simple approach, similar in nature to a smoothed histogram. The critical parameter in kernel estimation is the smoothing parameter, which determines the degree of smoothing. Methods for estimating the smoothing parameter have already been compared in a number of statistical papers. The novelty of our work is its emphasis on quantile estimation, especially the estimation of quantiles outside the range of observed data. The flood estimation problem is unique in this sense and has been the motivating factor for this study.
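The idea can be illustrated with a minimal sketch, assuming a Gaussian kernel and a fixed smoothing parameter `h`. The function names, the flow values, and the value `h = 30` are hypothetical and are not taken from the study. The density estimate averages one kernel per observation, the distribution estimate averages the corresponding kernel CDFs, and a quantile is obtained by inverting the estimated distribution function numerically, which also works beyond the largest observation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian kernel (an assumption of this sketch)


def kernel_density(x, data, h):
    """Fixed-bandwidth kernel density estimate at x."""
    return sum(STD_NORMAL.pdf((x - xi) / h) for xi in data) / (len(data) * h)


def kernel_cdf(x, data, h):
    """Kernel estimate of the cumulative distribution function at x."""
    return sum(STD_NORMAL.cdf((x - xi) / h) for xi in data) / len(data)


def kernel_quantile(p, data, h, iterations=100):
    """Quantile of order p, found by bisection on the kernel CDF."""
    lo = min(data) - 10.0 * h  # kernel CDF is essentially 0 here
    hi = max(data) + 10.0 * h  # kernel CDF is essentially 1 here
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical annual peak flows (m3/s), for illustration only.
flows = [120.0, 150.0, 180.0, 210.0, 260.0, 340.0]
h = 30.0
# The T-year event corresponds to non-exceedance probability 1 - 1/T.
q100 = kernel_quantile(1.0 - 1.0 / 100.0, flows, h)
print("estimated 100-year flow:", q100)
```

With so few observations, the 100-year estimate falls above the largest observed flow, and how far above depends almost entirely on `h`. This is why the choice of smoothing parameter matters so much for extrapolated quantiles.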
Seven methods for estimating the smoothing parameter are compared in the paper. All methods are based on a goodness-of-fit measure. More specifically, we considered the least-squares cross-validation method, the maximum likelihood cross-validation method, Adamowski's (1985) method, a plug-in method developed by Altman and Leger (1995) and modified by the authors (Faucher et al., 2001), Breiman's goodness-of-fit criterion method (Breiman, 1977), the variable-kernel maximum likelihood method, and the variable-kernel least-squares cross-validation method.
The estimation methods can be classified according to whether they are based on fixed or variable kernels, and whether they are based on the goodness-of-fit of the density function or cumulative distribution function.
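As an illustration of one of the density-based criteria named above, here is a minimal sketch of least-squares cross-validation for a fixed Gaussian kernel. It is a textbook form of the criterion, not the authors' implementation. The function names, the candidate grid, and the flow values are hypothetical. The score is the integral of the squared density estimate minus twice the average leave-one-out density at the observations, and for a Gaussian kernel both terms have closed forms.

```python
import math
from statistics import NormalDist


def lscv_score(h, data):
    """Least-squares cross-validation score for a Gaussian kernel, h > 0.

    score(h) = integral of fhat^2 - (2/n) * sum_i fhat_{-i}(x_i)
    """
    n = len(data)
    # The integral of a product of two Gaussian kernels with bandwidth h
    # is a Gaussian density with standard deviation sqrt(2) * h.
    wide = NormalDist(0.0, math.sqrt(2.0) * h)
    narrow = NormalDist(0.0, h)
    integral_sq = 0.0
    leave_one_out = 0.0
    for i, xi in enumerate(data):
        for j, xj in enumerate(data):
            d = xi - xj
            integral_sq += wide.pdf(d)
            if i != j:
                leave_one_out += narrow.pdf(d)
    return integral_sq / n**2 - 2.0 * leave_one_out / (n * (n - 1))


def lscv_bandwidth(data, candidates):
    """Return the candidate bandwidth with the smallest LSCV score."""
    return min(candidates, key=lambda h: lscv_score(h, data))


# Hypothetical annual peak flows (m3/s) and a hypothetical search grid.
flows = [120.0, 150.0, 180.0, 210.0, 260.0, 340.0]
grid = [5.0 * k for k in range(1, 41)]
print("LSCV bandwidth:", lscv_bandwidth(flows, grid))
```

A grid search is used here only for transparency. A one-dimensional numerical minimizer would serve the same purpose.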
The quality of the different estimation methods was explored in a Monte Carlo study. One hundred (100) samples of sizes 10, 20, 50, and 100 were simulated from an LP3 distribution. The nonparametric estimation methods were then applied to each of the simulated samples, and quantiles with return periods of 10, 20, 50, 100, 200, and 1000 years were estimated. Bias and root-mean-square error of the quantile estimates were the key figures used to compare methods. The results of the study can be summarized as follows:
1. Comparison of kernels. The literature reports that the kernel choice is relatively unimportant compared to the choice of the smoothing parameter. To determine whether this assertion also holds in the case of the estimation of large quantiles outside the range of data, we compared six different kernel candidates. We found no major differences between the biweight, the Normal, the Epanechnikov, and the EV1 kernels. However, the rectangular and the Cauchy kernels should be avoided.
2. Comparison of sample size. The quality of estimates, whether parametric or nonparametric, deteriorates as sample size decreases. To examine the degree of sensitivity to sample size, we compared estimates of the 200-year event obtained by assuming a GEV distribution and a nonparametric density estimated by maximum likelihood cross-validation. The main conclusion is that the root-mean-square error of the parametric model (GEV) is more sensitive to sample size than that of the nonparametric model.
3. Comparison of estimators of the smoothing parameter. Among the methods considered in the study, the plug-in method developed by Altman and Leger (1995) and modified by the authors (Faucher et al., 2001) performed best, together with the least-squares cross-validation method, which performed similarly. Adamowski's method had to be excluded because it consistently failed to converge. The methods based on variable kernels generally did not perform as well as the fixed-kernel methods.
4. Comparison of density-based and cumulative distribution-based methods. The only cumulative distribution-based method considered in the comparison study was the plug-in method. Adamowski's method is also based on the cumulative distribution function, but was rejected for the reasons mentioned above. Although the plug-in method did well in the comparison, it is not clear whether this can be attributed to the fact that it is based on estimation of the cumulative distribution function. However, one could hypothesize that when the objective is to estimate quantiles, a method that emphasizes the cumulative distribution function rather than the density should have certain advantages.
5. Comparison of parametric and nonparametric methods. Nonparametric methods were compared with conventional parametric methods. The LP3, the 2-parameter lognormal, and the GEV distributions were fitted to the simulated samples. It was found that nonparametric methods perform quite similarly to the parametric methods. This is a significant result: because the data were generated from an LP3 distribution, one would intuitively expect the LP3 model to be superior, but this was not the case. In actual applications, flood distributions are often irregular, and in such cases nonparametric methods would likely be superior to parametric methods.
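The structure of such a Monte Carlo comparison can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the study. The parent distribution is a lognormal (the zero-skew special case of LP3), chosen because its true quantiles are available in closed form from the standard library. The parameters `mu` and `sigma` are hypothetical. The smoothing parameter comes from the normal-reference rule of thumb, a stand-in that is not one of the seven methods compared in the paper.

```python
import math
import random
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()


def kernel_quantile(p, data, h, iterations=60):
    """Quantile of order p by bisection on a Gaussian kernel CDF estimate."""
    lo, hi = min(data) - 10.0 * h, max(data) + 10.0 * h
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        cdf = sum(STD_NORMAL.cdf((mid - xi) / h) for xi in data) / len(data)
        if cdf < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reference_bandwidth(data):
    """Normal-reference rule of thumb (stand-in for the paper's methods)."""
    return 1.06 * stdev(data) * len(data) ** (-0.2)


def monte_carlo(n, return_period, n_samples=100, mu=5.0, sigma=0.5, seed=1):
    """Bias and RMSE of the kernel estimate of the T-year quantile."""
    rng = random.Random(seed)
    p = 1.0 - 1.0 / return_period  # non-exceedance probability
    true_q = math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))  # lognormal quantile
    errors = []
    for _ in range(n_samples):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        estimate = kernel_quantile(p, sample, reference_bandwidth(sample))
        errors.append(estimate - true_q)
    bias = sum(errors) / n_samples
    rmse = math.sqrt(sum(e * e for e in errors) / n_samples)
    return bias, rmse


for size in (10, 20, 50):
    bias, rmse = monte_carlo(size, return_period=100)
    print(f"n={size}: bias={bias:.1f}, rmse={rmse:.1f}")
```

Swapping in another bandwidth selector, or a parametric fit, in place of `reference_bandwidth` and `kernel_quantile` gives the kind of side-by-side bias and RMSE comparison summarized in the list above.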
Keywords: Annual peak flow, extreme values, statistical analysis, kernel method, smoothing parameter, comparison analysis
|Authors:||D. Faucher, P. F. Rasmussen and B. Bobée|
|Title:||Estimation non paramétrique des quantiles de crue par la méthode des noyaux|
|Journal:||Revue des sciences de l'eau / Journal of Water Science, Volume 15, Number 2, 2002, pp. 515-541|
All rights reserved © Revue des sciences de l'eau, 2002