The K-nearest neighbors (KNN) algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it to big data poses computational challenges. KNN determines the class of a new sample from the classes of its nearest neighbors, but identifying those neighbors in a large dataset is so computationally expensive that the method is no longer practical on a single machine. One technique proposed to make classification methods applicable to large datasets is pruning. LC-KNN is an improved KNN method that first partitions the data into smaller clusters using K-means clustering and then, for each new sample, applies KNN only within the cluster whose center is nearest to that sample. However, because clusters differ in shape and density, selecting the appropriate cluster is a challenge. In this paper, an approach is proposed to improve the pruning phase of LC-KNN by taking these factors into account. The proposed approach helps to choose a more appropriate cluster in which to search for neighbors, thereby increasing classification accuracy. Its performance is evaluated on several real datasets. The experimental results show that the proposed approach achieves higher classification accuracy and lower time cost than other recent relevant methods.
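The baseline LC-KNN idea described above (cluster with K-means, then run KNN only inside the cluster whose center is nearest to the query) can be illustrated with a minimal sketch. This is not the authors' implementation and does not include the proposed shape- and density-aware cluster selection; the function names, the Euclidean distance, the fixed iteration count, and the fallback for an empty cluster are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda c: dist(point, centers[c]))


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means (hypothetical helper).

    Returns the centers and the cluster index assigned to each point.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        assign = [nearest_center(p, centers) for p in points]
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    # Final assignment so that it is consistent with the returned centers.
    assign = [nearest_center(p, centers) for p in points]
    return centers, assign


def lc_knn_predict(query, points, labels, centers, assign, k=3):
    """Classify `query` by majority vote among its k nearest neighbors,
    searching only the cluster whose center is nearest to the query."""
    target = nearest_center(query, centers)
    candidates = [(dist(query, p), y)
                  for p, y, a in zip(points, labels, assign) if a == target]
    if not candidates:  # assumption: fall back to the full dataset
        candidates = [(dist(query, p), y) for p, y in zip(points, labels)]
    candidates.sort(key=lambda t: t[0])
    votes = Counter(y for _, y in candidates[:k])
    return votes.most_common(1)[0][0]
```

The pruning step is the single call to `nearest_center` in `lc_knn_predict`: distances are computed only to the members of one cluster rather than to every training point. Choosing the cluster by center distance alone is the step the abstract identifies as fragile when clusters differ in shape and density.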
Evaporation is one of the most critical factors in agricultural, hydrological, and meteorological studies. Because of the interactions of multiple climatic factors, evaporation is a complex, nonlinear phenomenon to model. Thus, machine learning methods have gained popularity in this field. In the present study, four machine learning methods, namely Gaussian Process Regression (GPR), K-Nearest Neighbors (KNN), Random Forest (RF) and Support Vector Regression (SVR), were used to predict pan evaporation (PE). Meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W), and sunny hours (S) were collected from 2011 through 2017. The accuracy of the studied methods was determined using the statistical indices of Root Mean Squared Error (RMSE), correlation coefficient (R) and Mean Absolute Error (MAE). Furthermore, Taylor diagrams were used to evaluate the accuracy of the models. The results showed that, at the Gonbad-e Kavus, Gorgan and Bandar Torkman stations respectively, the most suitable model of each method gave RMSE values of 1.521, 1.244, and 1.254 mm/day for GPR; 1.991, 1.775, and 1.577 mm/day for KNN; 1.614, 1.337, and 1.316 mm/day for RF; and 1.55, 1.262, and 1.275 mm/day for SVR. GPR with input parameters T, W and S at the Gonbad-e Kavus station, and GPR with input parameters T, RH, W and S at the Gorgan and Bandar Torkman stations, gave the most accurate predictions and are proposed for precise estimation of PE. The findings indicate that PE values may be accurately estimated from a few easily measured meteorological parameters.
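The three accuracy indices named above follow their standard definitions and can be sketched as below. The function names are assumptions chosen for illustration, R is taken to be the Pearson correlation coefficient, and the code is not drawn from the study itself.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error, in the same units as the data (e.g. mm/day)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error, in the same units as the data."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions.

    Assumes neither series is constant (otherwise the denominator is zero).
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

RMSE penalizes large errors more heavily than MAE, while R measures only how well predictions track the observations linearly, so a model with a constant bias can still score R close to 1. This is one reason the three indices are usually reported together.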
A novel combination of the ant colony optimization (ACO) algorithm and computational fluid dynamics (CFD) data is proposed for modeling multiphase chemical reactors. The proposed intelligent model provides a probabilistic computational strategy for predicting various levels of three-dimensional bubble column reactor (BCR) flow. The results show enhanced communication between the ant colony predictions and the CFD data in different sections of the BCR.