54 Informatik
Refine
Institute
Keywords
- Maschinelles Lernen (8)
- Machine learning (6)
- OA-Publikationsfonds2020 (4)
- machine learning (4)
- Deep learning (3)
- big data (3)
- OA-Publikationsfonds2018 (2)
- OA-Publikationsfonds2019 (2)
- clustering (2)
- data science (2)
A Hybrid Clustering and Classification Technique for Forecasting Short-Term Energy Consumption
(2018)
Electrical energy distributor companies in Iran have to announce their energy demand at least three 3-day ahead of the market opening. Therefore, an accurate load estimation is highly crucial. This research invoked methodology based on CRISP data mining and used SVM, ANN, and CBA-ANN-SVM (a novel hybrid model of clustering with both widely used ANN and SVM) to predict short-term electrical energy demand of Bandarabbas. In previous studies, researchers introduced few effective parameters with no reasonable error about Bandarabbas power consumption. In this research we tried to recognize all efficient parameters and with the use of CBA-ANN-SVM model, the rate of error has been minimized. After consulting with experts in the field of power consumption and plotting daily power consumption for each week, this research showed that official holidays and weekends have impact on the power consumption. When the weather gets warmer, the consumption of electrical energy increases due to turning on electrical air conditioner. Also, con-sumption patterns in warm and cold months are different. Analyzing power consumption of the same month for different years had shown high similarity in power consumption patterns. Factors with high impact on power consumption were identified and statistical methods were utilized to prove their impacts. Using SVM, ANN and CBA-ANN-SVM, the model was built. Sine the proposed method (CBA-ANN-SVM) has low MAPE 5 1.474 (4 clusters) and MAPE 5 1.297 (3 clusters) in comparison with SVM (MAPE 5 2.015) and ANN (MAPE 5 1.790), this model was selected as the final model. The final model has the benefits from both models and the benefits of clustering. Clustering algorithm with discovering data structure, divides data into several clusters based on similarities and differences between them. Because data inside each cluster are more similar than entire data, modeling in each cluster will present better results. For future research, we suggest using fuzzy methods and genetic algorithm or a hybrid of both to forecast each cluster. It is also possible to use fuzzy methods or genetic algorithms or a hybrid of both without using clustering. It is issued that such models will produce better and more accurate results.
This paper presents a hybrid approach to predict the electric energy usage of weather-sensitive loads. The presented methodutilizes the clustering paradigm along with ANN and SVMapproaches for accurate short-term prediction of electric energyusage, using weather data. Since the methodology beinginvoked in this research is based on CRISP data mining, datapreparation has received a gr eat deal of attention in thisresear ch. Once data pre-processing was done, the underlyingpattern of electric energy consumption was extracted by themeans of machine learning methods to precisely forecast short-term energy consumption. The proposed approach (CBA-ANN-SVM) was applied to real load data and resulting higher accu-racy comparing to the existing models.
2018 American Institute of Chemical Engineers Environ Prog, 2018
https://doi.org/10.1002/ep.12934
The K-nearest neighbors (KNN) machine learning algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it on big data comes with computational challenges. Indeed, KNN determines the class of a new sample based on the class of its nearest neighbors; however, identifying the neighbors in a large amount of data imposes a large computational cost so that it is no longer applicable by a single computing machine. One of the proposed techniques to make classification methods applicable on large datasets is pruning. LC-KNN is an improved KNN method which first clusters the data into some smaller partitions using the K-means clustering method; and then applies the KNN for each new sample on the partition which its center is the nearest one. However, because the clusters have different shapes and densities, selection of the appropriate cluster is a challenge. In this paper, an approach has been proposed to improve the pruning phase of the LC-KNN method by taking into account these factors. The proposed approach helps to choose a more appropriate cluster of data for looking for the neighbors, thus, increasing the classification accuracy. The performance of the proposed approach is evaluated on different real datasets. The experimental results show the effectiveness of the proposed approach and its higher classification accuracy and lower time cost in comparison to other recent relevant methods.
The classical Internet of things routing and wireless sensor networks can provide more precise monitoring of the covered area due to the higher number of utilized nodes. Because of the limitations in shared transfer media, many nodes in the network are prone to the collision in simultaneous transmissions. Medium access control protocols are usually more practical in networks with low traffic, which are not subjected to external noise from adjacent frequencies. There are preventive, detection and control solutions to congestion management in the network which are all the focus of this study. In the congestion prevention phase, the proposed method chooses the next step of the path using the Fuzzy decision-making system to distribute network traffic via optimal paths. In the congestion detection phase, a dynamic approach to queue management was designed to detect congestion in the least amount of time and prevent the collision. In the congestion control phase, the back-pressure method was used based on the quality of the queue to decrease the probability of linking in the pathway from the pre-congested node. The main goals of this study are to balance energy consumption in network nodes, reducing the rate of lost packets and increasing quality of service in routing. Simulation results proved the proposed Congestion Control Fuzzy Decision Making (CCFDM) method was more capable in improving routing parameters as compared to recent algorithms.
In this work, molecular separation of aqueous-organic was simulated by using combined soft computing-mechanistic approaches. The considered separation system was a microporous membrane contactor for separation of benzoic acid from water by contacting with an organic phase containing extractor molecules. Indeed, extractive separation is carried out using membrane technology where complex of solute-organic is formed at the interface. The main focus was to develop a simulation methodology for prediction of concentration distribution of solute (benzoic acid) in the feed side of the membrane system, as the removal efficiency of the system is determined by concentration distribution of the solute in the feed channel. The pattern of Adaptive Neuro-Fuzzy Inference System (ANFIS) was optimized by finding the optimum membership function, learning percentage, and a number of rules. The ANFIS was trained using the extracted data from the CFD simulation of the membrane system. The comparisons between the predicted concentration distribution by ANFIS and CFD data revealed that the optimized ANFIS pattern can be used as a predictive tool for simulation of the process. The R2 of higher than 0.99 was obtained for the optimized ANFIS model. The main privilege of the developed methodology is its very low computational time for simulation of the system and can be used as a rigorous simulation tool for understanding and design of membrane-based systems.
Highlights are, Molecular separation using microporous membranes. Developing hybrid model based on ANFIS-CFD for the separation process, Optimization of ANFIS structure for prediction of separation process
This research aims to model soil temperature (ST) using machine learning models of multilayer perceptron (MLP) algorithm and support vector machine (SVM) in hybrid form with the Firefly optimization algorithm, i.e. MLP-FFA and SVM-FFA. In the current study, measured ST and meteorological parameters of Tabriz and Ahar weather stations in a period of 2013–2015 are used for training and testing of the studied models with one and two days as a delay. To ascertain conclusive results for validation of the proposed hybrid models, the error metrics are benchmarked in an independent testing period. Moreover, Taylor diagrams utilized for that purpose. Obtained results showed that, in a case of one day delay, except in predicting ST at 5 cm below the soil surface (ST5cm) at Tabriz station, MLP-FFA produced superior results compared with MLP, SVM, and SVM-FFA models. However, for two days delay, MLP-FFA indicated increased accuracy in predicting ST5cm and ST 20cm of Tabriz station and ST10cm of Ahar station in comparison with SVM-FFA. Additionally, for all of the prescribed models, the performance of the MLP-FFA and SVM-FFA hybrid models in the testing phase was found to be meaningfully superior to the classical MLP and SVM models.
Coronary Artery Disease Diagnosis: Ranking the Significant Features Using a Random Trees Model
(2020)
Heart disease is one of the most common diseases in middle-aged citizens. Among the vast number of heart diseases, coronary artery disease (CAD) is considered as a common cardiovascular disease with a high death rate. The most popular tool for diagnosing CAD is the use of medical imaging, e.g., angiography. However, angiography is known for being costly and also associated with a number of side effects. Hence, the purpose of this study is to increase the accuracy of coronary heart disease diagnosis through selecting significant predictive features in order of their ranking. In this study, we propose an integrated method using machine learning. The machine learning methods of random trees (RTs), decision tree of C5.0, support vector machine (SVM), and decision tree of Chi-squared automatic interaction detection (CHAID) are used in this study. The proposed method shows promising results and the study confirms that the RTs model outperforms other models.
The longitudinal dispersion coefficient (LDC) plays an important role in modeling the transport of pollutants and sediment in natural rivers. As a result of transportation processes, the concentration of pollutants changes along the river. Various studies have been conducted to provide simple equations for estimating LDC. In this study, machine learning methods, namely support vector regression, Gaussian process regression, M5 model tree (M5P) and random forest, and multiple linear regression were examined in predicting the LDC in natural streams. Data sets from 60 rivers around the world with different hydraulic and geometric features were gathered to develop models for LDC estimation. Statistical criteria, including correlation coefficient (CC), root mean squared error (RMSE) and mean absolute error (MAE), were used to scrutinize the models. The LDC values estimated by these models were compared with the corresponding results of common empirical models. The Taylor chart was used to evaluate the models and the results showed that among the machine learning models, M5P had superior performance, with CC of 0.823, RMSE of 454.9 and MAE of 380.9. The model of Sahay and Dutta, with CC of 0.795, RMSE of 460.7 and MAE of 306.1, gave more precise results than the other empirical models. The main advantage of M5P models is their ability to provide practical formulae. In conclusion, the results proved that the developed M5P model with simple formulations was superior to other machine learning models and empirical models; therefore, it can be used as a proper tool for estimating the LDC in rivers.
In this study, machine learning methods of artificial neural networks (ANNs), least squares support vector machines (LSSVM), and neuro-fuzzy are used for advancing prediction models for thermal performance of a photovoltaic-thermal solar collector (PV/T). In the proposed models, the inlet temperature, flow rate, heat, solar radiation, and the sun heat have been considered as the input variables. Data set has been extracted through experimental measurements from a novel solar collector system. Different analyses are performed to examine the credibility of the introduced models and evaluate their performances. The proposed LSSVM model outperformed the ANFIS and ANNs models. LSSVM model is reported suitable when the laboratory measurements are costly and time-consuming, or achieving such values requires sophisticated interpretations.
FCS-MBFLEACH: Designing an Energy-Aware Fault Detection System for Mobile Wireless Sensor Networks
(2019)
Wireless sensor networks (WSNs) include large-scale sensor nodes that are densely distributed over a geographical region that is completely randomized for monitoring, identifying, and analyzing physical events. The crucial challenge in wireless sensor networks is the very high dependence of the sensor nodes on limited battery power to exchange information wirelessly as well as the non-rechargeable battery of the wireless sensor nodes, which makes the management and monitoring of these nodes in terms of abnormal changes very difficult. These anomalies appear under faults, including hardware, software, anomalies, and attacks by raiders, all of which affect the comprehensiveness of the data collected by wireless sensor networks. Hence, a crucial contraption should be taken to detect the early faults in the network, despite the limitations of the sensor nodes. Machine learning methods include solutions that can be used to detect the sensor node faults in the network. The purpose of this study is to use several classification methods to compute the fault detection accuracy with different densities under two scenarios in regions of interest such as MB-FLEACH, one-class support vector machine (SVM), fuzzy one-class, or a combination of SVM and FCS-MBFLEACH methods. It should be noted that in the study so far, no super cluster head (SCH) selection has been performed to detect node faults in the network. The simulation outcomes demonstrate that the FCS-MBFLEACH method has the best performance in terms of the accuracy of fault detection, false-positive rate (FPR), average remaining energy, and network lifetime compared to other classification methods.
A novel combination of the ant colony optimization algorithm (ACO)and computational fluid dynamics (CFD) data is proposed for modeling the multiphase chemical reactors. The proposed intelligent model presents a probabilistic computational strategy for predicting various levels of three-dimensional bubble column reactor (BCR) flow. The results prove an enhanced communication between ant colony prediction and CFD data in different sections of the BCR.