Drought-prone areas mapping using fuzzy c-means method in Gunungkidul district

Gunungkidul district is one of the districts in the Special Region of Yogyakarta that is frequently affected by drought. The purpose of this study is to map drought-prone areas in Gunungkidul district using the fuzzy c-means method, making it easier for the government to allocate water-dropping assistance to drought-affected areas. The research variables are rainfall, soil type, infiltration, slope, and land use. The variables are measured on an ordinal scale, so they must be transformed using the successive interval method before being analyzed with the fuzzy c-means method. Three cluster validity indices, the Xie and Beni index, the partition coefficient, and the modified partition coefficient, were used to find the optimal number of clusters. Fuzzy c-means clustering produced three clusters: a low level of vulnerability consisting of 7 sub-districts, a moderate level of vulnerability consisting of 8 sub-districts, and a high level of vulnerability consisting of 3 sub-districts. Ordered from the greatest to the least effect, the drought hazard factors in this study were rainfall, land use, soil type, infiltration, and slope.


INTRODUCTION
The El Niño Southern Oscillation (ENSO) phenomenon is one of the causes of climate change in the world. Many studies have examined the link between ENSO and climate anomalies throughout the world. Chiew et al. (1998) provided an overview of the relationship between ENSO and rainfall, drought, and streamflow in Australia. Baudoin et al. (2017) investigated how South Africa responded to drought over time due to the 2016 El Niño. Leitold et al. (2018) found that El Niño conditions accelerated canopy turnover in the central Amazon, particularly during drought years. Ewbank et al. (2019) studied the resilience of rural communities in Nicaragua and Ethiopia to El Niño-related drought prior to and during the agricultural season. Costa et al. (2021) identified rainfall and drought [...] (2020) used the K-means algorithm to cluster drought-prone areas based on the analysis of hotspot data in Riau, Indonesia. The K-means algorithm is widely used in various fields, but its drawbacks include the need to specify the number of clusters in advance, sensitivity to outliers, an inability to deal with non-convex clusters of varying size and density, sensitivity to the scale of the data set, and different initial centroids producing different results (Govender & Sivakumar, 2020). Moradi & Dariane (2015) demonstrated that an evolving neural network (ENN) conditioned on fuzzy c-means (FCM) outperformed both the K-means-based ENN and the regular ENN. Alam & Paul (2020) reported that the FCM algorithm outperforms K-means clustering in terms of cluster homogeneity for rainfall gauge stations in Bangladesh. Hence, this study uses FCM to cluster drought-prone areas.
The FCM algorithm is a popular fuzzy clustering method based on fuzzy theory (Simhachalam & Ganesan, 2015). It is a data clustering technique that determines the presence of each data point in a cluster through degrees of membership between 0 and 1 (Rahakbauw et al., 2017). Among fuzzy clustering approaches, FCM is the best-known method and performs well in cluster detection (Pimentel & de Souza, 2016). Another advantage of FCM is that it can show the relationship between different cluster patterns (Sharma & Kamal Borana, 2014). Although FCM is the most widely used method, it has a weakness: the optimal number of clusters must be determined separately (Yang & Nataliani, 2017). Many cluster validity indices for fuzzy clustering algorithms have been proposed in the literature, such as the Xie Beni index (XBI) (Xie & Beni, 1991), the partition coefficient (PC) (Bezdek, 1973), and the modified partition coefficient (MPC) (Dave, 1996). Pertiwi & Kurniawan (2017) used FCM with validity indices such as the Xie and Beni index, the partition coefficient, the modified partition coefficient, and others to map flood-prone areas in Indonesia.
In statistical analysis, differences in data types greatly affect the choice of models or statistical tests. The measurement scale is a rule that is required to quantify data derived from measurement variables (Febtriko, 2017). The variables used in fuzzy logic must have continuous values (Rizal & Hakim, 2015), whereas the variables in this study are on an ordinal scale. Therefore, they must be transformed from the ordinal scale into the interval scale. A common method for this transformation is the successive interval method (Maranell, 2017). It converts an ordinal scale into an interval scale by mapping the cumulative proportion of each category of a variable to the corresponding value of the standard normal curve (Ningsih & Dukalang, 2019). Many studies have used the successive interval method to transform ordinal data into interval data (Herawaty, 2014; Sofiani et al., 2017; Mardiana et al., 2020).
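The transformation described above can be illustrated with a short Python sketch. It follows one common textbook formulation of the successive interval method (category proportions, cumulative proportions, standard normal quantiles and densities, then a shift so that the lowest category scores 1); the paper does not spell out its exact computation, so the function name `successive_interval`, the shifting convention, and the sample codes are assumptions made for illustration only.

```python
from collections import Counter
from statistics import NormalDist


def successive_interval(values):
    """Map ordinal category codes to interval-scale scores (illustrative sketch)."""
    std_normal = NormalDist()            # standard normal: mean 0, sd 1
    n = len(values)
    counts = Counter(values)

    scale_values = {}
    cum_count = 0
    cum_lower, dens_lower = 0.0, 0.0     # cumulative proportion and density at the lower boundary
    for cat in sorted(counts):           # categories in ascending ordinal order
        cum_count += counts[cat]
        if cum_count < n:
            cum_upper = cum_count / n
            dens_upper = std_normal.pdf(std_normal.inv_cdf(cum_upper))
        else:
            # Last category: the upper boundary is +infinity, where the density is 0.
            cum_upper, dens_upper = 1.0, 0.0
        # Scale value = (density at lower limit - density at upper limit) / category proportion.
        scale_values[cat] = (dens_lower - dens_upper) / (cum_upper - cum_lower)
        cum_lower, dens_lower = cum_upper, dens_upper

    # Shift all scale values so that the smallest one becomes 1.
    shift = 1.0 - min(scale_values.values())
    return {cat: sv + shift for cat, sv in scale_values.items()}


# Hypothetical ordinal codes for one variable (not the study data).
codes = [1, 1, 2, 2, 2, 3, 3, 4]
scores = successive_interval(codes)
transformed = [scores[v] for v in codes]
```

Each variable would be transformed separately in this way, and the resulting scores, rather than the raw ordinal codes, would form the input matrix for clustering.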
The purpose of this study is to use the fuzzy c-means method to cluster the sub-districts of Gunungkidul district by level of drought vulnerability and to map the results. The variables used in this study are rainfall, soil type, infiltration, slope, and land use, all of which are on an ordinal scale. Hence, the successive interval method was applied to them prior to data analysis. The novel aspect of this paper is that it uses the FCM clustering method to group drought-prone areas in Gunungkidul district based on variables transformed with the successive interval method.

METHOD
Fuzzy C-Means Algorithm
Fuzzy c-means (FCM) is a data clustering technique that uses degrees of membership to determine the presence of each data point in a cluster. The FCM was presented in its general form by Bezdek (1981). The basic concept of FCM is to find the center of each cluster, which represents the average location of that cluster. The initial cluster centers are not accurate. Each data point has a degree of membership in each cluster. To make the cluster centers accurate, the centers and the membership degrees of each data point are updated repeatedly. This iteration minimizes an objective function that sums the squared distances from the data points to the cluster centers, weighted by the membership degrees of the data points. The output of fuzzy c-means is not a fuzzy inference system, but rather a set of cluster centers and membership degrees for each data point (Kusumadewi & Purnomo, 2010). The steps of the FCM algorithm are as follows (Bezdek, 1981).

a. The input data to be clustered form a matrix $X$ of order $n \times p$, where $n$ is the number of observations, $p$ is the number of variables, and $x_{ij}$ is the $i$-th observation ($i = 1, 2, \ldots, n$) of the $j$-th attribute ($j = 1, 2, \ldots, p$).

b. Determine the number of clusters ($c \geq 2$); the weighting exponent ($m \in [1, \infty)$, for example $m = 2$; the membership update in step g requires $m > 1$); the maximum number of iterations (MaxIter); the initial iteration ($t = 1$); and the smallest expected error ($\varepsilon$, a very small positive value).

c. Determine the partition matrix (degrees of membership) at random:

\[
U = \begin{bmatrix}
\mu_{11} & \mu_{12} & \cdots & \mu_{1c} \\
\mu_{21} & \mu_{22} & \cdots & \mu_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
\mu_{n1} & \mu_{n2} & \cdots & \mu_{nc}
\end{bmatrix}
\tag{1}
\]

where $U$ is the partition matrix (degrees of membership), subject to $\sum_{k=1}^{c} \mu_{ik} = 1$ for $i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, c$.

d. Calculate the cluster center $v_{kj}$ of each cluster using the equation

\[
v_{kj} = \frac{\sum_{i=1}^{n} (\mu_{ik})^{m} \, x_{ij}}{\sum_{i=1}^{n} (\mu_{ik})^{m}}
\tag{2}
\]

where $v_{kj}$ is a cluster center ($k = 1, 2, \ldots, c$; $j = 1, 2, \ldots, p$); $\mu_{ik}$ is an element of the partition matrix ($i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, c$); $x_{ij}$ is the sample data ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, p$); and $m$ is the weighting exponent. This yields the matrix of cluster centers

\[
V = \begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1p} \\
v_{21} & v_{22} & \cdots & v_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
v_{c1} & v_{c2} & \cdots & v_{cp}
\end{bmatrix}
\tag{3}
\]

e. Calculate the distance $d_{ik}$ between object $i$ and the center of cluster $k$ ($i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, c$). The distance commonly used is the Euclidean distance:

\[
d_{ik} = \sqrt{\sum_{j=1}^{p} (x_{ij} - v_{kj})^{2}}
\tag{4}
\]

f. Calculate the objective function $P_t$ in the $t$-th iteration using the equation

\[
P_t = \sum_{i=1}^{n} \sum_{k=1}^{c} (\mu_{ik})^{m} \, d_{ik}^{2}
\tag{5}
\]

where $\mu_{ik}$ is an element of the partition matrix ($i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, c$).

g. Update the degree of membership of each object in each cluster (repair the partition matrix) using the equation

\[
\mu_{ik} = \left[ \sum_{l=1}^{c} \left( \frac{d_{ik}}{d_{il}} \right)^{\frac{2}{m-1}} \right]^{-1}
\tag{6}
\]

h. Check the stopping criterion, which compares the objective function of the current iteration with that of the previous iteration. If $|P_t - P_{t-1}| < \varepsilon$ or $t >$ MaxIter, the algorithm stops. Otherwise, set $t = t + 1$ and repeat steps d to h.
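Steps a to h can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `fuzzy_c_means`, its parameter names, the fixed random seed, the handling of zero distances, and the toy data are all introduced here for illustration.

```python
import math
import random


def fuzzy_c_means(X, c=3, m=2.0, eps=1e-9, max_iter=1000, seed=0):
    """Illustrative FCM loop; returns (U, V, objective, iterations used)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    # Step c: random partition matrix U (n x c) whose rows sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])

    prev_obj = None
    for t in range(1, max_iter + 1):
        # Step d: cluster centers as membership-weighted means.
        V = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            denom = sum(w)
            V.append([sum(w[i] * X[i][j] for i in range(n)) / denom for j in range(p)])

        # Step e: Euclidean distance from every object to every center.
        D = [[math.dist(X[i], V[k]) for k in range(c)] for i in range(n)]

        # Step f: objective function for this iteration.
        obj = sum(U[i][k] ** m * D[i][k] ** 2 for i in range(n) for k in range(c))

        # Step g: update the memberships.
        for i in range(n):
            zeros = [k for k in range(c) if D[i][k] == 0.0]
            if zeros:
                # Assumption: an object lying exactly on a center belongs fully to it.
                U[i] = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                inv = [D[i][k] ** (-2.0 / (m - 1.0)) for k in range(c)]
                total = sum(inv)
                U[i] = [v / total for v in inv]

        # Step h: stop when the objective changes by less than eps.
        if prev_obj is not None and abs(obj - prev_obj) < eps:
            break
        prev_obj = obj

    return U, V, obj, t


# Toy data (hypothetical, not the study data): two well-separated groups.
toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
U, V, obj, iterations = fuzzy_c_means(toy, c=2)
labels = [row.index(max(row)) for row in U]  # hard label = cluster with the largest membership
```

The membership update divides each inverse-distance term by the row total, which is algebraically the same as the ratio form of the update and keeps every row of the partition matrix summing to 1. Because the initial partition matrix is random, the numbering of the clusters can differ between runs even when the grouping itself is identical.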

Cluster Validity Index
The clustering results are then validated with cluster validity indices. The three indices used in this study are explained as follows.

a. Xie Beni (XB) index
The Xie Beni (XB) index is the ratio of the total variation within the groups to the separation between the groups. The lower the XB value, the better the partition. The XB index is defined as (Xie & Beni, 1991)

\[
XB = \frac{\sum_{k=1}^{c} \sum_{i=1}^{n} (\mu_{ik})^{m} \, d^{2}(x_i, v_k)}{n \, \min_{k \neq l} d^{2}(v_k, v_l)}
\tag{7}
\]

where $\mu_{ik}$ is the membership value ($i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, c$); $m$ is the weighting exponent, $m \in [1, \infty)$, for example $m = 2$; $n$ is the number of objects; $c$ is the number of clusters; and $d^{2}(x_i, v_k)$ is the squared distance between object $i$ and the center of cluster $k$.

b. Partition Coefficient (PC)
The partition coefficient evaluates the membership values of each cluster without reference to the data themselves. The PC value ranges between $1/c$ and 1, and therefore lies between 0 and 1. A PC value close to 1 indicates a better clustering. The PC is defined as (Bezdek, 1973)

\[
PC = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{c} \mu_{ik}^{2}
\tag{8}
\]

where $\mu_{ik}$ is the membership value ($i = 1, 2, \ldots, n$; $k = 1, 2, \ldots, c$); $n$ is the number of objects; and $c$ is the number of clusters.

c. Modified Partition Coefficient (MPC)
The modified partition coefficient addresses a shortcoming of the PC, namely its tendency to change monotonically with $c$. The PC is therefore modified into the MPC (Dave, 1996) using the equation

\[
MPC = 1 - \frac{c}{c - 1} \, (1 - PC)
\tag{9}
\]

where $c$ is the number of clusters. The MPC value ranges between 0 and 1. The higher the MPC value, the better the clustering (Dave, 1996).
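The three indices can be computed directly from a membership matrix and a set of cluster centers. The sketch below is a plain-Python illustration under stated assumptions, not the package code used in the study; the function names and the small example matrices `U_demo`, `X_demo`, and `V_demo` are hypothetical.

```python
def partition_coefficient(U):
    """PC: mean over objects of the sum of squared memberships."""
    return sum(u * u for row in U for u in row) / len(U)


def modified_partition_coefficient(U):
    """MPC: PC rescaled so that a uniform partition scores 0."""
    c = len(U[0])
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(U))


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(X, U, V, m=2.0):
    """XB: within-cluster variation divided by n times the minimum center separation."""
    n, c = len(X), len(V)
    compactness = sum(
        U[i][k] ** m * squared_distance(X[i], V[k]) for i in range(n) for k in range(c)
    )
    separation = min(
        squared_distance(V[k], V[l]) for k in range(c) for l in range(k + 1, c)
    )
    return compactness / (n * separation)


# Hypothetical example: four one-dimensional objects, two clusters.
X_demo = [[0.0], [1.0], [4.0], [5.0]]
V_demo = [[0.5], [4.5]]
U_demo = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]

pc = partition_coefficient(U_demo)             # 0.75
mpc = modified_partition_coefficient(U_demo)   # 0.5
xb = xie_beni(X_demo, U_demo, V_demo)          # 2.11 / 64
```

To choose the number of clusters, the clustering would be repeated for several candidate values of c, and the value with the lowest XB and the highest PC and MPC would be preferred.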

RESULTS AND DISCUSSION
This study maps drought-prone areas in Gunungkidul district using a geographic information system based on the results of FCM clustering. The data used in this study were transformed from the ordinal to the interval scale using the successive interval method. The transformed data in Table 1 were used as the input matrix. The clustering uses the FCM algorithm with $c = 3$, $m = 2$, $\varepsilon = 10^{-9}$, $t = 1$, and MaxIter = 1000. The input data were processed in RStudio with the packages ppclust (Cebeci et al., 2020), factoextra (Kassambara & Mundt, 2020), dplyr (Wickham et al., 2021), cluster (Maechler et al., 2021), fclust (Ferraro et al., 2019), and psych (Revelle, 2021), using the following calls: res.fcm <- fcm(x, centers = 3); res.fcm2 <- ppclust2(res.fcm, "kmeans"); fviz_cluster(res.fcm2, data = x, ellipse.type = "convex", palette = "jco", repel = TRUE); res.fcm4 <- ppclust2(res.fcm, "fclust"); idxpc <- PC(res.fcm4$U); paste("Partition Coefficient : ", idxpc); idxmpc <- MPC(res.fcm4$U); paste("Modified Partition Coefficient : ", idxmpc); clust = FKM(Data[, 1:(ncol(Data))], k = 3, m = 2, stand = 0); xb = XB(clust$Xca, clust$U, clust$H, clust$m). Table 2 shows the cluster centers at $c = 3$. The cluster centers in Table 2 are used to update the partition matrix U (membership degrees) with equation (6), resulting in Table 3. The objective function value is calculated using equation (5). The objective function value in the 62nd (last) iteration is 4.713489170 and the value in the previous iteration is also 4.713489170, so the difference is 0 at the reported precision, which is less than the threshold of 0.000000001. According to Table 3, cluster 1 has 7 sub-districts, cluster 2 has 3 sub-districts, and cluster 3 has 8 sub-districts. The hazard category can be determined from the cluster mean value and the characteristics of the variables in each cluster. The cluster mean is obtained from the average degree of membership in each cluster.
Ordered from the greatest to the lowest effect, the drought hazard factors are rainfall, land use, distance to water sources, soil texture, and soil surface temperature (Darojati et al., 2015). Based on Darojati et al. (2015) and the cluster means in Figure 3, the drought hazard factors in this study are ordered from the greatest to the lowest effect as rainfall with a weight of 5, land use with a weight of 4, soil type with a weight of 3, infiltration with a weight of 2, and slope with a weight of 1. The hazard category is calculated by multiplying the cluster mean value of each variable by the weight of that variable and then summing the products to produce the hazard weight, as in Table 5. The percentage of each category of the variables can be used to analyze the cluster characteristics (Table 6).
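The weighted sum described above can be written as a short Python sketch. The weights are the ones stated in the text; the cluster mean values and the cluster names "A" and "B" are hypothetical placeholders chosen only to show the arithmetic, and they are not the values reported in Table 5.

```python
# Weights stated in the text: rainfall 5, land use 4, soil type 3, infiltration 2, slope 1.
WEIGHTS = {"rainfall": 5, "land_use": 4, "soil_type": 3, "infiltration": 2, "slope": 1}


def hazard_weight(cluster_mean):
    """Sum of (cluster mean x variable weight) over the five variables."""
    return sum(WEIGHTS[var] * cluster_mean[var] for var in WEIGHTS)


# Hypothetical cluster means (placeholders, not study results).
cluster_means = {
    "A": {"rainfall": 2.0, "land_use": 1.5, "soil_type": 1.0, "infiltration": 1.0, "slope": 2.0},
    "B": {"rainfall": 3.0, "land_use": 2.0, "soil_type": 2.0, "infiltration": 1.5, "slope": 1.0},
}

hazard = {name: hazard_weight(means) for name, means in cluster_means.items()}
# Clusters ordered from the largest to the smallest hazard weight.
ranking = sorted(hazard, key=hazard.get, reverse=True)
```

For the placeholder values, cluster "A" scores 5(2.0) + 4(1.5) + 3(1.0) + 2(1.0) + 1(2.0) = 23 and cluster "B" scores 33, so "B" is ranked first.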

Cluster 1
Cluster 1 includes the Wonosari, Paliyan, Ponjong, Playen, Semanu, Rongkop, and Karangmojo sub-districts. Cluster 1 is an area that is not prone to drought. Table 6 shows that the most frequent rainfall class, 1000-1500 mm/year, accounts for 44%. The area is dominated by Mediterranean soil (47%) and grumusol (47%). The infiltration rate in cluster 1 is evenly split between the slow category (50%) and the moderate category (50%). Gentle slopes are the most common slope class in this area, at 46%. The land use in cluster 1 includes 10% protected forest and 6% conservation forest, so there is still land available for water absorption and storage. As a result, the sub-districts in cluster 1 are classified as not prone to drought.

Cluster 2
The sub-districts in cluster 2 are Semin, Ngawen, and Nglipar. Table 6 shows that rainfall is split evenly between the < 1000 mm/year class (50%) and the 1000-1500 mm/year class (50%). The area is dominated by Mediterranean soil (60%). Infiltration in cluster 2 is 100% slow. The slope in this area is dominated by gentle slopes, at 60%. The land use in cluster 2 includes 8% protected forest. Therefore, the sub-districts in cluster 2 are classified as prone to drought.

Cluster 3
The sub-districts in cluster 3 are Purwosari, Girisubo, Saptosari, Patuk, Tepus, Tanjungsari, Gedangsari, and Panggang. These sub-districts are located in moderately drought-prone areas. Table 6 shows that rainfall is dominated by the 1500-2000 mm/year class, at 53%. The area is also dominated by Mediterranean soil (67%). Infiltration in cluster 3 is 100% slow. The slope in this area is dominated by moderate slopes, at 47%. The land use in cluster 3 includes 3% protected forest and 5% conservation forest. Therefore, the area in cluster 3 is classified as moderately prone to drought.

Table 6 (land use rows). Count (n) and percentage (%) of each land use category in each cluster.

Land use                     Cluster 1     Cluster 2     Cluster 3
                             n     %       n     %       n     %
(category label missing)     7     23      3     23      8     21
Rainfed rice fields          0     0       0     0       4     11
Irrigation paddy             7     23      3     23      8     21
Garden                       7     23      3     23      7     18
Production forest            5     16      3     23      8     21
Conservation forest          2     6       0     0       2     5
Protected forest             3     10      1     8       1     3
Sum                          31    100     13    100     38    100

The results of the hazard category calculation and the variable characteristics show that cluster 2 has the highest level of vulnerability, followed by cluster 3 and then cluster 1. Figure 4 shows the clustering results visualized using a Geographic Information System (GIS). The Semin, Ngawen, and Nglipar sub-districts have a high level of vulnerability. This result is in line with Sigit (2016a), who reported that these three sub-districts were frequently affected by water scarcity, resulting in frequent droughts. The Purwosari, Girisubo, Saptosari, Patuk, Tepus, Tanjungsari, Gedangsari, and Panggang sub-districts have a moderate level of vulnerability to drought. The Wonosari, Paliyan, Ponjong, Playen, Semanu, Rongkop, and Karangmojo sub-districts are the areas with low vulnerability to drought. Part of this result is supported by Fahmi (2016), who found that the Wonosari and Karangmojo sub-districts had good water absorption, which may explain their low vulnerability to drought.

Notes to Table 7:
1. News of drought events in 2016 (Kurniawan, 2016; Ika, 2016).
2. Drought incidents in 2019 from social service water-dropping data (Dinas Sosial DIY, 2019).

Table 7 shows the differences between the results of FCM clustering and the news of drought events in 2016. The FCM method shows that the Semin and Nglipar sub-districts have a high level of drought vulnerability, although the news reported no drought there (Kurniawan, 2016; Ika, 2016). The FCM method indicated that the Saptosari, Patuk, Tanjungsari, and Gedangsari sub-districts have a moderate level of vulnerability, but the news indicated that these areas did not experience drought. The FCM method assigned the Rongkop sub-district a low level of drought vulnerability, which contradicts the news reports that the area is prone to drought. Table 7 also shows that both the FCM clustering results and the 2016 drought news differ from the drought events recorded in the social service water-dropping data. Some areas did not experience drought in 2016, but in 2019 a drought occurred there and water dropping was requested.