Abstract
In many studies, the data under investigation are not mutually independent but exhibit spatial dependence arising from the geographic locations of the observations. Such data, in which spatial relationships are of central importance, are referred to as spatial data. Analyzing spatial data poses particular challenges because of spatial heterogeneity and autocorrelation, which call for advanced analytical methods. With the growing availability of spatial and satellite remote sensing data, the limitations of traditional methods in handling spatial dependence have become increasingly evident. Recently, machine learning algorithms, particularly random forest, have been proposed as powerful alternatives for spatial prediction. However, standard implementations often lack a spatial component and tend to ignore the spatial correlation inherent in geographic data.
This dissertation introduces and evaluates a series of random forest approaches for analyzing spatially dependent data. First, the random forest spatial interpolation method is introduced. Its performance is evaluated on simulated datasets and on real-world precipitation and temperature data, and is compared with alternative interpolation methods: nearest neighbor, trend surface analysis, inverse distance weighting, ordinary kriging, standard random forest, and spatial random forest. Next, a random forest model termed the geographical random forest is proposed for analyzing spatial dependence. This method serves both as a predictive tool and as an exploratory approach for modeling population distribution from remote sensing variables. Finally, the geographical random forest is extended to the geographically weighted random forest, which improves prediction accuracy by applying spatial weights to the data. The importance of spatially weighting observations in local models is examined and demonstrated, and the challenges of bandwidth selection and its optimization with respect to accuracy criteria are discussed. The performance of the geographically weighted random forest is compared with that of geographically weighted regression, geographical random forest, random forest, and ordinary least squares.