Abstract:
Knowing where potential wetlands are distributed is crucial for the rational planning, effective management, and efficient conservation of wetland ecosystems. Wetlands are among the most productive and valuable ecosystems on Earth, providing essential ecological services such as water purification, climate regulation, flood control, and biodiversity support. However, wetlands worldwide are under significant threat from climate change, urbanization, agricultural expansion, and other human activities, which are causing rapid degradation and loss. Understanding and simulating the distribution of potential wetlands is therefore both a scientific challenge and a critical step toward sustainable wetland management and restoration. Despite growing interest in machine learning algorithms for ecological modeling, their optimization for simulating potential wetland distribution remains underdeveloped. This gap motivated our study, which was conducted in northeast China. The region is home to some of the most extensive and ecologically significant wetland resources in the country, including those of the Sanjiang Plain and the Songnen Plain. These wetlands are vital habitats for migratory birds and endangered species, and they play a crucial role in regional hydrological cycles and carbon sequestration. However, wetland loss and degradation in this region have been severe in recent decades, driven by agricultural expansion and urbanization. There is thus an urgent need for robust models that predict and explain the distribution of potential wetlands in this ecologically sensitive area. To address this challenge, our study used geographic big data and four machine learning algorithms: random forest (RF), support vector machine (SVM), deep neural network (DNN), and extreme gradient boosting (XGBoost). 
These algorithms were chosen for their ability to handle high-dimensional data, capture complex nonlinear relationships, and provide reliable predictions. We integrated a comprehensive set of environmental factors, including hydrology, soil properties, vegetation types, and topographic features, to construct a potential wetland distribution simulation system. This approach allowed us to simulate the distribution of potential wetlands in northeast China and to identify the environmental conditions most conducive to wetland formation. All four machine learning algorithms performed satisfactorily in simulating potential wetland distribution, with AUC values all exceeding 0.69. The random forest model was the most accurate, with an overall accuracy of 85.57% and a Kappa coefficient of 0.71. These metrics suggest that the random forest algorithm is particularly well suited to modeling complex ecological systems such as wetlands, where multiple environmental factors interact in nonlinear ways. We estimate the potential wetland area in northeast China at approximately 128 790 km². Potential wetlands are most likely to form in regions characterized by annual precipitation of 400 to 600 mm, semi-hydric soils, and vegetation dominated by swamps and meadows. These conditions align well with the known ecological requirements for wetland formation: adequate water availability, suitable soil moisture retention, and vegetation adapted to wet conditions. The results provide a vital data foundation for the evaluation and conservation of wetlands in northeast China and, more broadly, across the country. By identifying areas with high potential for wetland formation, our model offers valuable insights for prioritizing wetland restoration efforts and designing effective management strategies. 
Furthermore, the methodology developed in this study can be adapted and applied to other regions, contributing to global wetland conservation efforts. In conclusion, this research highlights the importance of integrating geographic big data with advanced machine learning techniques to address complex ecological challenges and support sustainable wetland management.
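
For readers less familiar with the three evaluation metrics reported above (overall accuracy, Kappa coefficient, and AUC), the following is a minimal sketch of how each can be computed for a binary wetland / non-wetland classification. It is not the evaluation code used in the study. The function names, the confusion matrix `example_cm`, and the labels and scores are hypothetical values chosen only to illustrate the formulas; they are not data or results from this work.

```python
# Illustrative sketch only: all names and numbers below are hypothetical,
# not taken from the study.

def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total           # observed agreement
    row_totals = [sum(cm[i]) for i in range(n)]              # reference counts
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical confusion matrix: rows = reference class, columns = predicted class.
# Class 0 = non-wetland, class 1 = wetland.
example_cm = [[40, 10],
              [5, 45]]

# Hypothetical reference labels and model scores for four sample points.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]

if __name__ == "__main__":
    print("overall accuracy:", overall_accuracy(example_cm))
    print("kappa:", round(cohen_kappa(example_cm), 4))
    print("AUC:", auc(example_labels, example_scores))
```

Kappa is reported alongside overall accuracy because it discounts the agreement expected by chance, which matters when wetland and non-wetland samples are unevenly represented.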