The notion of partial membership of a hydrometric station in a hydrologic region is modeled by a membership function obtained by applying the concepts of fuzzy analysis. Hydrometric stations are represented in planes whose axes are hydrologic and/or physiographic attributes. Hydrologic regions are treated as fuzzy subsets. A clustering method based on coherence (Iphigénie) establishes equivalence classes for the fuzzy relation "there is no incoherence between the elements of the same class"; these equivalence classes represent the fuzzy regions. In this case the membership function is crisp. By contrast, the second method, of the fuzzy mobile-centers type (ISODATA), assigns each station a degree of membership in a fuzzy region within the interval [0,1]. This degree reflects how strongly the station belongs to a given group (the number of groups being chosen beforehand, heuristically). For the case studied (Tunisian hydrometric network, annual maximum flood discharges), however, the fuzziness of the stations turns out not to be very pronounced. Regional estimates of the 100-year maximum flood discharge are computed from the clusters obtained with Iphigénie and the fuzzy regions obtained with ISODATA. These are then compared with the regional estimate obtained by the region-of-influence method and with the estimate based on at-site data only, under the hypothesis that the parent populations follow two-parameter Gamma and three-parameter Pareto distributions.
- 100-year flood,
- fuzzy sets,
A fuzzy approach to the delineation of region of influence for hydrometric stations
The concept of partial membership of a hydrometric station in a hydrologic region is modeled using fuzzy set theory. Hydrometric stations are represented in spaces of hydrologic attributes (coefficient of variation CV, coefficient of skewness CS, and their L-moment counterparts L-CV and L-CS) and/or physiographic attributes (watershed area S, specific flow Qs = Qmoyen/S, where Qmoyen is the mean flow, and a shape index Ic). Two fuzzy clustering methods are considered.
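The hydrologic attributes can be computed from each station's series of annual maximum floods. The sketch below is a minimal illustration, assuming the usual sample estimators: a standard deviation with an n-1 divisor, the bias-adjusted moment skewness coefficient, and L-moments from unbiased probability-weighted moments. The paper does not state which estimators were used, and the function name `hydrologic_attributes` is hypothetical.

```python
import math


def hydrologic_attributes(flows):
    """Return CV, CS, L-CV and L-CS for one station's annual maximum floods.

    Assumed estimators (not stated in the source): n-1 standard deviation,
    bias-adjusted moment skewness, unbiased probability-weighted moments.
    """
    x = sorted(flows)
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    cv = s / mean
    # Adjusted Fisher-Pearson skewness coefficient.
    cs = n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in x)
    # Unbiased probability-weighted moments b0, b1, b2
    # (x sorted ascending, j is the 0-based rank).
    b0 = mean
    b1 = sum(j / (n - 1) * x[j] for j in range(n)) / n
    b2 = sum(j * (j - 1) / ((n - 1) * (n - 2)) * x[j] for j in range(n)) / n
    # First three sample L-moments.
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return {"CV": cv, "CS": cs, "L-CV": l2 / l1, "L-CS": l3 / l2}
```

For the symmetric sample [1, 2, 3, 4, 5], both skewness measures are zero and L-CV equals 1/3. The specific flow Qs is a simple ratio, while the shape index Ic is not defined here, so neither is coded.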
The first is a clustering method based on coherence (Iphigénie), which relies on a transitivity principle: if the pairs of stations (A,B) and (B,C) are both known to be "close", then it is incoherent to state that A is "far" from C. All pairs of stations are sorted by Euclidean distance, from the closest to the farthest. The pairs at the beginning and at the end of this list are then removed and classified as "close" and "far", respectively, and the process is repeated until an incoherence is detected. Clusters of stations are then determined from the graph of "close" stations. A disadvantage of Iphigénie is that it yields crisp (non-fuzzy) membership functions.
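The coherence procedure can be sketched as follows. This is a minimal illustration, not the original Iphigénie implementation. It assumes that an incoherence is any triplet in which two "close" pairs share a station while the third pair is "far" (the transitivity rule stated above), it alternates one "close" and one "far" classification per step, and it returns as a threshold the distance of the pair whose classification first caused an incoherence. The function name `iphigenie_sketch` is hypothetical.

```python
import itertools
import math


def iphigenie_sketch(points):
    """Coherence-based clustering of stations given as coordinates in one plane.

    Returns (clusters, threshold): clusters are the connected components of the
    graph of "close" pairs; threshold is the distance of the pair whose
    classification first produced an incoherence (None if none occurred).
    """
    n = len(points)
    # All station pairs, sorted from the closest to the farthest (Euclidean).
    pairs = sorted(itertools.combinations(range(n), 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
    close = {i: set() for i in range(n)}  # adjacency of the "close" graph
    far = set()                           # pairs labelled "far", as frozensets
    lo, hi = 0, len(pairs) - 1
    take_close = True                     # alternate: head -> close, tail -> far
    threshold = None
    while lo <= hi:
        i, j = pairs[lo] if take_close else pairs[hi]
        if take_close:
            # (i, k) and (i, j) close forbid (j, k) far; same for neighbours of j.
            incoherent = (any(frozenset((j, k)) in far for k in close[i])
                          or any(frozenset((i, k)) in far for k in close[j]))
        else:
            # A common "close" neighbour k of i and j forbids (i, j) far.
            incoherent = bool(close[i] & close[j])
        if incoherent:
            threshold = math.dist(points[i], points[j])
            break
        if take_close:
            close[i].add(j)
            close[j].add(i)
            lo += 1
        else:
            far.add(frozenset((i, j)))
            hi -= 1
        take_close = not take_close

    # Clusters = connected components of the "close" graph (iterative DFS).
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in close[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(comp))
    return clusters, threshold
```

Consider two well-separated groups of three points, [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]. The sketch returns the clusters [[0, 1, 2], [3, 4, 5]]. It stops when a cross-group pair at distance sqrt(181) is about to be declared "close". Pairs left in the middle of the sorted list remain unclassified.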
The second clustering method (ISODATA) minimizes the fuzziness of the clusters, as measured by an objective function. It can assign to a station any degree of membership between 0 and 1, reflecting the station's partial membership in a hydrologic region. It is a generalization of the classical mobile-centers method, in which crisp clusters minimizing entropy are obtained. With Iphigénie the number of clusters is determined automatically by the method, whereas for ISODATA it must be set beforehand.
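A fuzzy mobile-centers algorithm of this kind can be sketched with the standard fuzzy c-means updates (Bezdek's formulation of fuzzy ISODATA). These updates minimize the objective J_m, the sum over stations k and clusters i of u_ki^m times the squared distance from station k to centre i. This formulation is an assumption. The exponent m = 2, the farthest-point initialisation and the stopping rule are illustrative choices rather than details from the paper, and `fuzzy_isodata` is a hypothetical name.

```python
import math


def fuzzy_isodata(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means on a list of coordinate tuples.

    Returns (centers, u) with u[k][i] the membership of station k in cluster i;
    each row of u sums to 1.
    """
    dim = len(points[0])
    # Assumed deterministic start: farthest-point choice of the c initial centres.
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(math.dist(p, v) for v in centers)))
    expo = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, v), 1e-12) for v in centers]  # avoid division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** expo for j in range(c)) for i in range(c)])
        # Centre update: mean of the stations weighted by u_ki^m.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            sw = sum(w)
            new_centers.append(tuple(sum(wk * p[axis] for wk, p in zip(w, points)) / sw
                                     for axis in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centres have stabilised
            break
    return centers, u
```

Every row of `u` sums to 1. A station whose largest membership stays close to 1 is effectively assigned to a crisp cluster. This is the behaviour that makes a partition "not very fuzzy".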
Both clustering methods are applied to the Tunisian hydrometric network (39 stations, see Figure 1), with the objective of obtaining regional estimates of the flood frequency curves. Based on a correlation study of the available variables (Table 1), four planes are considered: P1: (Qs, CV), P2: (CS, CV), P3: (L-CS, L-CV), and P4: (S, Ic).
Figures 2, 3a, 4 and 5 show the clusters obtained with Iphigénie for planes P1 through P4. Because skewness estimates (CS) are strongly biased and highly variable for small samples, the influence of sample size on the clusters obtained for P2 was examined. Figure 3b shows the clusters obtained when the network is restricted to the 20 stations with at least 20 observations of the maximum annual flood. Fewer clusters are obtained than in Figure 3a, but the structure is the same: the additional clusters of Figure 3a correspond to the splitting of some of the large clusters of Figure 3b. In Figure 3c, the sample size of each of the 39 stations is plotted in the plane (CS, CV) to check whether extreme estimated values of CS and CV were caused by small samples. This does not seem to be the case, since many of the most extreme points correspond to long series.
ISODATA was also applied to the network. Based on entropy criteria (Table 2, Figures 6a and 6b), the number of clusters for ISODATA was set to 4. The groups obtained with ISODATA turn out not to be very fuzzy. Each fuzzy group is generally determined by a single variable, as shown in Figures 7a-7d, which show the fuzzy clusters obtained for planes P1-P4, respectively. Only the iso-membership lines of level 0.9 are plotted, to facilitate the analysis. For the hydrologic spaces (P2 and P3) this variable is skewness (CS and L-CS), and for the physiographic spaces (P1 and P4) it is the area-related variable (Qs and S).
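The paper does not detail its entropy criteria. A common choice, assumed here, is Bezdek's partition entropy H = -(1/n) sum_k sum_i u_ki ln u_ki, which is 0 for a crisp partition and ln c when every membership equals 1/c. It is often reported together with the partition coefficient F = (1/n) sum_k sum_i u_ki^2. Evaluating them for several candidate numbers of clusters and favouring low H (high F) is one way to choose c. The function names are hypothetical.

```python
import math


def partition_entropy(u):
    """Partition entropy of a membership matrix u[k][i] (0 ln 0 taken as 0)."""
    n = len(u)
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / n


def partition_coefficient(u):
    """Partition coefficient: 1 for a crisp partition, 1/c when all memberships are equal."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n
```

Both indices summarise how fuzzy a membership matrix is. Applied to the output of a fuzzy clustering run for c = 2, 3, 4, ..., they give the kind of curve from which a number of clusters can be read.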
Regionalization of the 100-year flood is performed with an index-flood method based on the homogeneous groups obtained. It is compared with the well-known region of influence (ROI) approach, under the hypothesis of both a 2-parameter Gamma distribution and a 3-parameter Pareto distribution. For the ROI approach, the threshold defining the size of the ROI of a station is taken to be the distance at which an incoherence first appeared when applying Iphigénie. For space P1, the correlation between regional and local (at-site) estimates is 0.91 for Iphigénie and 0.85 for both ISODATA and the ROI approach. The relative bias of the regional estimates of the 100-year flood based on P1 is plotted in Figure 9 (Gamma distribution) and Figure 10 (Pareto distribution). The three methods give similar results for the Gamma distribution, but the Iphigénie estimates are less biased under the Pareto distribution. In this case, therefore, Iphigénie appears superior to ISODATA and ROI. Values of bias and standard error for all four planes are given for Iphigénie in Table 3.
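An index-flood estimate can be sketched as the product of a site's flood index (mean annual maximum flood) and a regional dimensionless growth factor. For T = 100 years, the growth factor is evaluated at the non-exceedance probability F = 1 - 1/T = 0.99. The sketch makes several assumptions that the paper does not state:

- a mean-one Gamma distribution is fitted by moments to a regional CV;
- a generalized Pareto distribution (Hosking's parameterization, shape k) is fitted to the regional L-CV and L-CS;
- at-site statistics are averaged with record-length weights.

All function names are hypothetical.

```python
import math


def regional_average(values, record_lengths):
    """Record-length-weighted mean of at-site statistics (assumed weighting)."""
    return sum(v * n for v, n in zip(values, record_lengths)) / sum(record_lengths)


def gamma_cdf(y, shape):
    """Regularized lower incomplete gamma P(shape, y), via its power series."""
    if y <= 0:
        return 0.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= y / (shape + n)
        total += term
    return math.exp(shape * math.log(y) - y - math.lgamma(shape + 1.0)) * total


def gamma_growth(F, cv):
    """Quantile of order F of a Gamma law with mean 1 and coefficient of variation cv."""
    shape, scale = 1.0 / cv ** 2, cv ** 2
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < F:   # bracket the quantile
        hi *= 2.0
    for _ in range(200):              # bisection on the standardized variable
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < F:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * scale


def gpa_growth(F, lcv, lcs):
    """Generalized Pareto quantile with L-moments l1 = 1, L-CV = lcv, L-CS = lcs."""
    l1, l2 = 1.0, lcv
    k = (1.0 - 3.0 * lcs) / (1.0 + lcs)
    alpha = l2 * (1.0 + k) * (2.0 + k)
    xi = l1 - l2 * (2.0 + k)
    if abs(k) < 1e-9:                 # exponential limit k -> 0
        return xi - alpha * math.log(1.0 - F)
    return xi + alpha * (1.0 - (1.0 - F) ** k) / k


def relative_bias(q_regional, q_site):
    """Relative difference of a regional estimate from the at-site estimate."""
    return (q_regional - q_site) / q_site
```

A regional 100-year estimate for a station with flood index Qbar would then be Qbar * gamma_growth(0.99, cv_region) or Qbar * gpa_growth(0.99, lcv_region, lcs_region). For CV = 1 the Gamma law reduces to the exponential, and the growth factor is ln(100).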
Applying an index-flood regionalization approach at ungauged sites requires estimating the mean flow (also called the flood index) from physiographic attributes. A regression study shows that the best explanatory variables are the watershed area S, the shape index Ic and the average slope of the river. Figure 8 plots the observed flood index against the flood index obtained by regression; the correlation coefficient is 0.93.
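The flood-index regression can be sketched as an ordinary least-squares fit in log space, ln Qmoyen = b0 + b1 ln S + b2 ln Ic + b3 ln(slope). This model form is an assumption: the paper names the explanatory variables but not the functional form. The fit is followed by the correlation between observed and back-transformed fitted flood indices. The normal equations are solved by plain Gaussian elimination to keep the example dependency-free, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_regression(predictors, q):
    """OLS fit of ln q on [1, ln p1, ln p2, ...] for each site's predictor tuple."""
    rows = [[1.0] + [math.log(p) for p in ps] for ps in predictors]
    y = [math.log(v) for v in q]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, ps):
    """Back-transformed flood index for one site's predictors (S, Ic, slope)."""
    return math.exp(coef[0] + sum(b * math.log(p) for b, p in zip(coef[1:], ps)))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
```

The fitted coefficients are elasticities of the flood index with respect to each attribute. The correlation is computed on the original scale, as in a plot of observed against predicted flood index.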
Iphigénie and ISODATA could also be used together with other regionalization methods. For example, the ROI approach requires a rather arbitrary choice of the ROI threshold; as shown here, this threshold is obtained as a byproduct of Iphigénie. ISODATA is most useful for pattern identification when the data are strongly fuzzy, unlike the example considered in this paper. Even for the Tunisian network, however, its application indicates which variables (skewness and area) are most useful for clustering.
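In the ROI approach each station receives its own group: the stations lying within the threshold distance of it in the attribute plane. Here that threshold would be the distance at which Iphigénie first detects an incoherence. A minimal sketch follows, assuming Euclidean distance in one attribute plane and that the target station belongs to its own region. The function name `region_of_influence` is hypothetical.

```python
import math


def region_of_influence(points, target, threshold):
    """Indices of the stations within `threshold` of station `target` (inclusive)."""
    centre = points[target]
    return [k for k, p in enumerate(points) if math.dist(p, centre) <= threshold]
```

Unlike a fixed clustering, the regions overlap. Each station's 100-year estimate is then pooled only over its own neighbourhood.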