Meteorology and weather forecasting are crucial for anticipating future
atmospheric conditions. Forecasts are helpful when they provide information
that assists people in making better decisions. Today, big data techniques
are widely used to analyze social media content accurately, including posts
from people who rely on weather forecasts. Recent years have seen the
widespread use of machine learning and deep learning for processing
messages on social media sites such as Twitter. In this study, the authors
analyzed weather-related text in Indonesia collected from searches on
Twitter. Three machine learning algorithms were examined: support vector
machine (SVM), multinomial logistic regression (MLR), and multinomial
naive Bayes (MNB), alongside pretrained bidirectional encoder
representations from transformers (BERT), which was fine-tuned over
multiple layers to ensure effective classification. The BERT model achieved
an F1-score of 99%, higher than that of any of the machine learning
methods. These results were incorporated into a web-based weather
information system, and the classification results were mapped using the
Esri Maps application programming interface (API) based on the
geolocation of the data.
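One of the baselines above, multinomial naive Bayes, is simple enough to sketch end to end. The toy Indonesian tweets, labels, and whitespace tokenization below are illustrative assumptions, not the paper's data or pipeline:

```python
import math
from collections import Counter, defaultdict

# Toy multinomial naive Bayes for weather-tweet classification.
# Docs, labels, and tokenization are illustrative, not the study's data.
def train_mnb(docs, labels, alpha=1.0):
    vocab = {w for d in docs for w in d.split()}
    counts = defaultdict(Counter)   # per-class word counts
    priors = Counter(labels)        # class frequencies
    for d, y in zip(docs, labels):
        counts[y].update(d.split())
    def predict(text):
        words = text.split()
        best, best_lp = None, -math.inf
        for y in priors:
            total = sum(counts[y].values())
            lp = math.log(priors[y] / len(docs))
            for w in words:
                # Laplace smoothing keeps unseen words from zeroing a class
                lp += math.log((counts[y][w] + alpha) / (total + alpha * len(vocab)))
            if lp > best_lp:
                best, best_lp = y, lp
        return best
    return predict

docs = ["hujan deras banjir", "cuaca cerah panas",
        "hujan gerimis dingin", "langit cerah terang"]
labels = ["rain", "clear", "rain", "clear"]
predict = train_mnb(docs, labels)
print(predict("hujan banjir"))  # → rain
```

Fine-tuning BERT, by contrast, adapts pretrained contextual embeddings rather than word counts, which is consistent with the large accuracy gap the study reports.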
Real-time monitoring system for weather and air pollutant measurement with HT...journalBEEI
This article discusses devising an IoT system to monitor weather parameters and gas pollutants in the air, along with an HTML web-based application. The weather parameters measured include wind speed and direction, rainfall, air temperature and humidity, barometric pressure, and UV index. The gases measured are ammonia, hydrogen, methane, ozone, carbon monoxide, and carbon dioxide. The article introduces a technique for sending all parameter data: the readings from each sensor are converted into strings and joined into a single string dataset, which is sent to the server periodically. On the UI side, the dataset downloaded from the server is parsed for processing and then displayed. The system uses Google Firebase as a real-time database server for the sensor data and the GitHub platform for web hosting; the web application is built with HTML. The results of this study indicate that the device operates successfully, providing real-time information about weather and gas conditions.
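The string-dataset technique above can be sketched in a few lines. The field names and the `;` delimiter are assumptions for illustration; the article does not specify its exact format:

```python
# Sketch of the string-dataset idea: sensor readings are converted to
# strings, joined with a delimiter, and parsed back on the UI side.
# Field names and the ';' delimiter are assumptions, not the paper's format.
FIELDS = ["wind_speed", "rainfall", "temperature", "humidity", "pressure", "uv_index"]

def pack(readings):
    """Join one reading per field into a single string dataset."""
    return ";".join(str(readings[f]) for f in FIELDS)

def unpack(dataset):
    """Parse the dataset string back into labeled float values."""
    return dict(zip(FIELDS, map(float, dataset.split(";"))))

packed = pack({"wind_speed": 3.2, "rainfall": 0.0, "temperature": 29.5,
               "humidity": 78.0, "pressure": 1009.1, "uv_index": 6.0})
print(packed)                      # "3.2;0.0;29.5;78.0;1009.1;6.0"
print(unpack(packed)["humidity"])  # 78.0
```

A fixed field order keeps the payload compact for periodic uploads, at the cost of requiring sender and UI to agree on the schema.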
Experimental of vectorizer and classifier for scrapped social media dataTELKOMNIKA JOURNAL
In this study, we used several classifiers and vectorizers to see their effect on processing social media data. The classifiers used were random forest, logistic regression, Bernoulli naive Bayes (NB), and support vector clustering (SVC). Random forests are used to reduce spatial complexity and to minimize errors. Logistic regression is a statistical model whose basic form uses a logistic function to represent a binary dependent variable. The Bernoulli naive Bayes classifier operates on binary features, while SVC has so far given results that rival other supervised learning methods. Our tests use social media data. Based on the tests carried out on the classifier and vectorizer variations, the best classifier was the prediction-based logistic regression algorithm, compared with the decision-tree-based random forest, the probability-based Bernoulli NB, and SVC, which works by clustering. From the tests on the count, term frequency-inverse document frequency (TF-IDF), and hashing vectorizers, the best accuracy was achieved with the TF-IDF vectorizer, meaning that the TF-IDF vectorizer represents word feature dimensions better.
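The winning TF-IDF scheme can be computed by hand. This minimal sketch follows the common smoothed-idf convention (the `1 + log` form used by scikit-learn's defaults); the toy corpus is illustrative:

```python
import math
from collections import Counter

# Minimal TF-IDF vectorizer, the scheme the study found most accurate.
# Uses the smoothed convention idf(t) = ln((1+n)/(1+df(t))) + 1.
def tfidf(corpus):
    docs = [d.split() for d in corpus]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    vocab = sorted(df)
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in vocab]
        vectors.append(vec)
    return vocab, vectors

vocab, vecs = tfidf(["good service", "bad service", "good good food"])
print(vocab)  # ['bad', 'food', 'good', 'service']
```

Unlike a plain count vectorizer, TF-IDF down-weights words that appear in most documents, which is one plausible reason it represented the word feature dimensions better here.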
Intelligent aquaculture system for pisciculture simulation using deep learnin...nooriasukmaningtyas
The project aims to develop an intelligent system for simulating pisciculture in Taal Lake in the Philippines through a geographic information system and a deep learning algorithm. Records from 2018-2020 in the database of the Bureau of Fisheries and Aquatic Resources IV-A-Protected Area Management Board (BFAR IVA-PAMB) were collected for model development. A deep learning model was developed and integrated into the system for time-series analysis and simulation. Several technologies, including tensorflow.js, were used to successfully develop the intelligent system. The paper finds that a recurrent neural network (RNN) is a good deep learning algorithm for predicting pisciculture in Taal Lake. Further, the initial visualization of the system shows that barangay Sampaloc has the highest rate of fish production in Taal, with Tilapia nilotica sp. as its major product.
An internet of things ecosystem for planting of coriander (Coriandrum sativum...IJECEIAES
The internet of things (IoT) is a network of physical devices and is becoming a major area of innovation for computer-based systems. Agriculture is one of the areas that could be improved by this technology, from farming techniques to production efficiency. The objective of this research is to design an IoT system to monitor local vegetable (coriander; Coriandrum sativum L.) growth via sensors (light, humidity, temperature, water level), combined with an automated watering system. This would give planters the ability to monitor field conditions from anywhere at any time. In this research, a group of local vegetables including coriander, cilantro, and dill weed was studied. The prototype system consists of several smart sensors that accurately monitor vegetable growth from the seedling stage to a fully grown plant, which helps ensure high production levels in any field environment. Three different types of coriander were measured on these parameters: height, trunk width, and leaf width. The results showed that the IoT ecosystem for planting different types of coriander could produce effective and efficient plant growth, ready for harvest in a shorter time than conventional methods.
Temporal Exploration in 2D Visualization of Emotions on Twitter StreamTELKOMNIKA JOURNAL
As people freely express their opinions toward a product on Twitter streams without being bound
by time, visualizing the temporal pattern of customers' emotional behavior can play a crucial role in
decision-making. We analyze how emotions fluctuate over time and demonstrate how to turn these
patterns into useful visualizations with an appropriate framework. We manually customized the current
framework to improve the state of the art in crawling and visualizing Twitter data. The data, posts or
status updates on Twitter about the iPhone, were collected from the U.S.A., Japan, Indonesia, and Taiwan
using geographical bounding boxes, and visualized as a two-dimensional heat map, an interactive stream
graph, and a context-focus view via brushing. The results show that our proposed system can reveal
unique temporal patterns in customers' emotional behavior.
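The bounding-box collection step above amounts to a simple inclusion test per tweet. The box coordinates and tweet records below are rough, illustrative assumptions:

```python
# Sketch of collecting tweets by geographical bounding box, as in the
# study's data collection. Coordinates and tweets are illustrative.
def in_bbox(lat, lon, bbox):
    """bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

# Rough bounding box around Indonesia (approximate, for illustration)
INDONESIA = (-11.0, 95.0, 6.0, 141.0)

tweets = [
    {"text": "love my iPhone", "lat": -6.2, "lon": 106.8},  # near Jakarta
    {"text": "iPhone camera",  "lat": 35.7, "lon": 139.7},  # near Tokyo
]
indonesian = [t for t in tweets if in_bbox(t["lat"], t["lon"], INDONESIA)]
print(len(indonesian))  # 1
```

In practice each country's stream would be requested with its own box, then bucketed by timestamp to feed the heat map and stream graph.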
Implementation of Integration VaaMSN and SEMAR for Wide Coverage Air Quality ...TELKOMNIKA JOURNAL
Current air quality monitoring systems cannot cover a large area, are not real-time, and have not
implemented big data analysis technology with high accuracy. The purpose of integrating a mobile
sensor network with an Internet of Things system is to build an air quality monitoring system able to
monitor wide coverage areas. The system consists of Vehicle as a Mobile Sensors Network (VaaMSN) as
edge computing and Smart Environment Monitoring and Analytic in Real-time (SEMAR) as cloud
computing. VaaMSN is a package of air quality sensors, GPS, a 4G Wi-Fi modem, and a single-board
computer. SEMAR cloud computing has a time-series database for real-time visualization and a big data
environment whose analytics use the support vector machine (SVM) and decision tree (DT) algorithms.
The outputs of the system are map, table, and graph visualizations. The evaluation of the experimental
results shows that the accuracy of both algorithms reaches more than 90%. The mean square error (MSE)
of the SVM algorithm is about 0.03076293, while the DT algorithm has an MSE roughly 10x smaller than
the SVM algorithm.
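The MSE comparison used to rank the two algorithms above is straightforward to reproduce. The prediction arrays here are made-up toy values, not the paper's measurements:

```python
# Comparing two regressors by mean square error, as in the SVM vs. DT
# evaluation above. The prediction values are illustrative toy numbers.
def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual   = [0.50, 0.62, 0.71, 0.66]
svm_pred = [0.45, 0.60, 0.80, 0.70]
dt_pred  = [0.49, 0.61, 0.72, 0.67]

mse_svm, mse_dt = mse(actual, svm_pred), mse(actual, dt_pred)
print(mse_svm > mse_dt)  # True: the lower-MSE model fits better
```

Accuracy alone can mask this difference; two models with similar classification accuracy can still differ by an order of magnitude in MSE, as the abstract reports.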
Remote sensing and geographic information systems technics for spatial-based...IJECEIAES
Indonesia's land-use and land-cover change (LULCC) is a global concern. The planned relocation of Indonesia's capital city to East Kalimantan is becoming an environmental issue. Knowing the latest land cover change modeling and prediction research is essential fundamental knowledge for spatial planning and policies for regional development. Five articles on the integrated technology of geographic information systems (GIS) and remote sensing for spatial modeling were reviewed and compared using nine variables: title, journal (rank), keywords, objectives, data sources, variables, location, method, and main findings. The results show that the variables that significantly affect LULCC are elevation, slope, distance from roads, and distance from built-up areas. The artificial neural network-based cellular automata (ANN-CA) method could be the best approach to model LULCC. Furthermore, with the current availability of global multi-temporal and multi-sensor remote sensing data, LULCC modeling studies can be nearly limitless.
Indonesia is currently undergoing the fourth industrial revolution (Industry
4.0). This revolution concerns the application of technology in industry,
including the agricultural sector. Beyond the application of technology, the
revolution also supports the use of renewable energy sources, one of which
is solar energy. Applying technology in the agricultural sector is expected to
help farmers maintain crops and reduce the possibility of crop failure. This
motivated the researchers to design and build a system with internet of
things (IoT) technology that uses solar energy as the system's power source.
The IoT system uses the ATmega328P+ESP8266 RobotDyn microcontroller
with DHT22, MD0127, soil moisture, and BH1750FVI sensors, sending data
to ThingSpeak over the internet using the HTTP communication protocol.
The system can monitor ecological factors in gardens with fairly good
accuracy, and solar energy can run the system properly.
India is one of the developing countries most vulnerable to natural disasters, namely drought, flood,
cyclone, earthquake, landslide, and forest fire, which strike with devastating impact on human life, the
economy, and the environment. Though it is almost impossible to fully recoup the damage caused by
disasters, it is possible to minimize the potential risks by developing early warning strategies. Recent
advancements in space technology and satellite remote sensing play a crucial role in efficient disaster
mitigation. There is a pressing need to establish early warning systems in order to raise alerts and take
preventive measures before a natural hazard occurs. For a country like India, with its long coastline, one
of the most dangerous natural hazards is the tsunami. On 26 December 2004, the Indian coastline
experienced the most devastating tsunami in recorded history; tsunami inundation in coastal zones
damaged buildings, infrastructure, and property, and posed a threat to lives.
The Indian Tsunami Early Warning System (ITEWS) comprises a real-time network of seismic
stations, bottom pressure recorders (BPR), tide gauges, and a 24x7 operational tsunami warning centre to
detect tsunamigenic earthquakes, monitor tsunamis, and provide timely advisories to vulnerable
communities using the latest communication methods, with back-end support from a pre-run scenario
database and a decision support system (DSS). The National Tsunami Early Warning Centre at INCOIS
has been operational since October 2007. The centre began by exchanging service level-I earthquake
information: essentially qualitative tsunami advisories about tsunamigenic potential. India is now geared
up to provide service level-II bulletins for the Indian Ocean region (based on tsunami numerical modeling
and the open-ocean propagation tsunami scenario database) and service level-III, which adds inundation
vulnerability mapping for identified vulnerable regions.
Service level-I began operating in 2007, service level-II in 2011, and service level-III is being
initiated now. The sources used for this work are: spatial data sets of the open-ocean propagation tsunami
scenario database; spatial data sets of coastal inundation modeling inputs, namely i) identification of
highly vulnerable coastal regions from the multi-hazard vulnerability map (MHVM), ii) high-resolution
coastal topography, iii) bathymetry data, and iv) observation networks; spatial layers of the inundation
model and respective grids; and a centralized spatial database server. The workflow covers data reception,
data processing, tsunami vulnerability mapping, inundation decision-making, and tsunami advisory
generation.
Short-term wind speed forecasting system using deep learning for wind turbine...IJECEIAES
Accurately detecting wind direction and speed is very important for wind energy, one of the essential sustainable energy sources. Studies on wind speed forecasting are generally carried out for long-term predictions. One of the main reasons for long-term forecasts is the correct planning of the area where a wind turbine will be built, due to the high investment costs and long-term returns. Short-term forecasting is another important point for the efficient use of wind turbines: in addition to estimating average values, instant and dynamic short-term forecasts are necessary to control wind turbines. In this study, short-term forecasting of changes in wind speed over 1-20 minutes was performed using deep learning. Wind speed data were obtained instantaneously from the feedback of an emulated wind turbine's generator. These dynamically changing data were used as input to the deep learning algorithm. Each new data point from the generator was used as both test and training input in the proposed approach; in this way, model accuracy and enhancement were provided simultaneously. The proposed approach was turned into a modular, independent, integrated system to work in various wind turbine applications. The system was observed to predict wind speed dynamically with around 3% error in the test setup.
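The "each new sample is both test and training input" scheme above is the prequential (test-then-train) evaluation pattern. As a sketch, a simple moving-average predictor stands in for the paper's deep learning model, and the wind speed stream is made up:

```python
from collections import deque

# Test-then-train (prequential) evaluation: each new generator reading is
# first used to score the current model, then used to update it.
# A moving average stands in for the paper's deep learning model.
def prequential(stream, window=3):
    history = deque(maxlen=window)
    abs_errors = []
    for speed in stream:
        if history:                    # 1) test on the new sample
            prediction = sum(history) / len(history)
            abs_errors.append(abs(prediction - speed))
        history.append(speed)          # 2) then train (update the window)
    return sum(abs_errors) / len(abs_errors)

stream = [5.0, 5.2, 5.1, 5.4, 5.3, 5.6, 5.5]  # toy wind speeds, m/s
print(round(prequential(stream), 3))
```

The appeal of this scheme for turbine control is that the error estimate tracks the live distribution: the model is always scored on data it has not yet seen.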
Implementation of environmental monitoring based on KAA IoT platformjournalBEEI
A wireless sensor network (WSN) is a key means of access for the internet of things (IoT). The popularity of IoT and the prediction that ever more devices will be connected to the internet cause difficulties in integrating and connecting devices. Problems with IoT implementation include the lack of real-time data collection and processing and the inability to provide continuous monitoring. To overcome these problems, this paper proposes an IoT device for monitoring environmental conditions through the KAA IoT platform, which can be monitored anywhere and anytime in real time. The end device node consists of several sensors, such as temperature, humidity, carbon monoxide (CO), and carbon dioxide (CO2) sensors. The data collected by the end device node are transmitted over an IEEE 802.15.4-based link to a Raspberry Pi gateway, then sent to the KAA cloud server and saved into the database. The environmental data can be accessed via a web-based sensor application. We analyze the performance in terms of transactions, availability, data transfer, response time, transaction rate, throughput, and concurrency. The experimental results show that using the KAA IoT platform performs better than operating without a platform.
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit...CSITiaesprime
This research analyzes the sentiment of Twitter users regarding tourism in Indonesia using the keyword "wonderful Indonesia", the country's tourism promotion identity. The study aims to gain a deeper understanding of public sentiment towards "wonderful Indonesia" through social media data analysis, providing new and valuable insights about Indonesian tourism for the government and relevant stakeholders in promoting Indonesian tourism and enhancing tourist experiences. The method used is tweet analysis and classification with the K-nearest neighbor (KNN) algorithm to determine whether each tweet's sentiment is positive, neutral, or negative. The classification results show that the majority of tweets (65.1% of a total of 14,189 tweets) have neutral sentiment, indicating that most tweets with the "wonderful Indonesia" tagline relate to advertising or promoting Indonesian tourism. However, the percentage of tweets with positive sentiment (33.8%) is higher than that with negative sentiment (1.1%). The study also achieved training results with an accuracy of 98.5%, precision of 97.6%, recall of 98.5%, and F1-score of 98.1%. Reassessment will be needed in the future, as Twitter users' sentiments can change along with the development of Indonesian tourism itself.
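KNN classification as used above reduces to a majority vote among the nearest labeled examples. The two-dimensional feature vectors below (say, counts of positive and negative words per tweet) are an illustrative assumption, not the study's features:

```python
from collections import Counter

# Minimal K-nearest-neighbor classifier like the one used for the tweet
# sentiment labels above. Feature vectors are illustrative assumptions.
def knn_predict(train, point, k=3):
    # Sort labeled examples by squared Euclidean distance to the query
    dists = sorted(train, key=lambda item: sum((a - b) ** 2
                                               for a, b in zip(item[0], point)))
    # Majority vote among the k closest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [((3, 0), "positive"), ((2, 0), "positive"), ((2, 1), "neutral"),
         ((0, 3), "negative"), ((1, 1), "neutral"), ((0, 2), "negative")]
print(knn_predict(train, (3, 1)))  # → positive
```

Because KNN stores the full training set and votes at query time, the reported precision/recall depend heavily on how the tweets were vectorized and on the choice of k.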
Intelligent flood disaster warning on the fly: developing IoT-based managemen...journalBEEI
The number of natural disasters occurring yearly is increasing at an alarming rate, causing great concern over human well-being and economic sustenance. Rainfall patterns have also been affected, causing an immense number of flood cases in recent times. Flood disasters damage economies and human lives; in Asia alone, millions of people are affected by floods yearly. This has drawn the government's attention to developing a flood forecasting method to reduce flood casualties. In this article, a flood mitigation method is evaluated that incorporates a miniaturized flow sensor, a water level sensor, and a pressure gauge. The data from the sensors are used to predict flood status using a two-class neural network. Real-time monitoring of the sensor data in a ThingSpeak channel was made possible with a NodeMCU ESP8266. Furthermore, Microsoft's Azure Machine Learning (AzureML) has a built-in two-class neural network, which was used to predict flood status according to predefined rules. The prediction model was published as a web service through AzureML, enabling prediction as new data become available. The experimental results showed that using 3 hidden layers gives the highest accuracy of 98.9% and precision of 100% when the two-class neural network is used.
Weather monitoring and forecasting are very important in the agricultural sector. Several kinds of data need to be collected in real time to support weather monitoring and forecasting systems, such as temperature, humidity, air pressure, wind speed, wind direction, and rainfall. The purpose of this research is to develop a real-time weather monitoring system using a parallel computation approach and to analyze its computational performance (i.e., speedup and efficiency) using the ARIMA model. The developed system has been implemented on a wireless sensor network (WSN) platform using Arduino and Raspberry Pi devices, with a web-based platform for weather visualization and monitoring. The experimental data used in our research work is a set of weather data acquired and collected from January until March 2017 in the Bogor area. The result of this research is that computation using eight processors is three times faster than using a single processor, with an efficiency of 50%.
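The two performance metrics reported above follow the standard definitions: speedup S = T1/Tp and efficiency E = S/p for p processors. The timings below are toy values, not the paper's measurements:

```python
# Speedup and efficiency as used in the parallel ARIMA evaluation above.
# speedup S = T1 / Tp; efficiency E = S / p for p processors.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    return speedup(t_serial, t_parallel) / processors

# Toy timings in seconds (not the paper's measurements)
t1, tp = 100.0, 25.0
print(speedup(t1, tp))        # 4.0: four times faster than serial
print(efficiency(t1, tp, 8))  # 0.5: each of 8 processors is 50% utilized
```

Efficiency below 1.0 is expected: communication and synchronization overhead keep a p-processor run from being a full p times faster.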
Water monitoring and analytic based ThingSpeak IJECEIAES
Diseases associated with contaminated water cause many reported cases and deaths annually, so water quality monitoring is necessary to provide safe water. Traditional monitoring involves manually gathering samples from different points on the distributed site and then testing them in a laboratory. This procedure has proven ineffective because it is laborious, has a long lag time, and lacks online results that would enable a proactive response to water pollution. The emergence of the Internet of Things (IoT) and the step towards smart living enable successful uses of IoT. This paper presents water quality monitoring using the IoT-based ThingSpeak platform, which provides analytic and visualization tools using MATLAB programming. The proposed model tests water samples using a sensor fusion technique with TDS and turbidity sensors, then uploads the data to the ThingSpeak platform for monitoring and analysis. The system notifies authorities when water quality parameters fall outside a predefined set of normal values; a warning is sent to the user via the IFTTT protocol.
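The notification trigger above is an out-of-range check against predefined normal values. The ranges below for TDS (ppm) and turbidity (NTU) are assumptions for illustration, not values from the paper:

```python
# Sketch of the out-of-range check that triggers the system's warning.
# The "normal" ranges are illustrative assumptions, not the paper's values.
NORMAL_RANGES = {"tds": (0, 500), "turbidity": (0, 5)}

def out_of_range(sample):
    """Return the parameters whose values fall outside normal ranges."""
    alerts = []
    for param, value in sample.items():
        low, high = NORMAL_RANGES[param]
        if not (low <= value <= high):
            alerts.append(param)
    return alerts

print(out_of_range({"tds": 320, "turbidity": 2.1}))  # []
print(out_of_range({"tds": 650, "turbidity": 8.0}))  # ['tds', 'turbidity']
```

In the described system a non-empty alert list would be what fires the IFTTT notification to the user.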
Android mobile application for wildfire reporting and monitoringriyaniaes
Peat fires cause major environmental problems in Central Kalimantan Province, Indonesia, threatening human health and affecting the socio-economic sector. The lack of peat fire detection systems is one factor behind these recurring fires. Therefore, in this study, we develop an Android mobile application and a web-based application to support citizen volunteers who want to contribute wildfire reports and decision-makers who wish to collect, visualize, and evaluate those reports. In this paper, the global navigation satellite system (GNSS) and the global positioning system (GPS) sensor of a smartphone's camera are useful tools for showing the close-range location of potential fire and smoke. The exchangeable image file (EXIF) data and GPS metadata captured by a mobile phone can store raw observations and supply them to the data center through global internet communication. The result of this work is a proposed application that is easy to use for monitoring potential peat fires by location and activity data. This paper focuses on developing a mobile application for peat fire reporting and a web-based application that collects peat fire locations for decision-makers. Our main objective is to detect the potential and spread of fire in peatlands as early as possible by utilizing community reports made with smartphones.
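EXIF stores GPS coordinates as degree/minute/second values plus a hemisphere reference, so an app reading a photo's metadata needs a conversion to decimal degrees before mapping the report. The sample coordinate (roughly in the Central Kalimantan area) is illustrative:

```python
# EXIF GPS coordinates come as (degrees, minutes, seconds) plus an
# N/S or E/W hemisphere reference; maps want signed decimal degrees.
# The sample coordinate is an illustrative assumption.
def dms_to_decimal(degrees, minutes, seconds, ref):
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West hemispheres are negative in decimal notation
    return -decimal if ref in ("S", "W") else decimal

lat = dms_to_decimal(2, 12, 36.0, "S")
lon = dms_to_decimal(113, 55, 12.0, "E")
print(round(lat, 4), round(lon, 4))  # -2.21 113.92
```

The decimal pair is what the web application would plot and store alongside the report's photo and timestamp.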
Convolutional neural network with binary moth flame optimization for emotion ...IAESIJAI
Electroencephalograph (EEG) signals can reflect brain activity in real time, and using EEG signals to analyze human emotional states is a common line of study. The EEG signals associated with emotions are not distinctive and differ from one person to another, as each person has different emotional responses to the same stimuli. This is why EEG signals are subject-dependent and have proven effective for subject-dependent detection of emotions. To achieve enhanced accuracy and a high true positive rate, the proposed system uses a binary moth flame optimization (BMFO) algorithm for feature selection and convolutional neural networks (CNNs) for classification. In this proposal, optimal features are chosen using accuracy as the objective function; the optimally chosen features are then classified with a CNN to discriminate different emotional states.
A novel ensemble model for detecting fake newsIAESIJAI
Due to the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of news articles as either real or fake. For this purpose, we opt for a blending technique that combines three models, namely a bidirectional long short-term memory (Bi-LSTM) network, a stochastic gradient descent classifier, and a ridge classifier. The implementation of the proposed model (BI-LSR) on real-world datasets has shown outstanding results: it achieved an accuracy score of 99.16%. Accordingly, this ensemble learning approach has proven to perform better than individual conventional machine learning and deep learning models, as well as many ensemble learning approaches cited in the literature.
More Related Content
Similar to Twitter-based classification for integrated source data of weather observations
Indonesia is currently undergoing Industry 4.0, a revolution concerned with the application of technology across industrial sectors, one of which is agriculture. Besides the application of technology, this revolution also supports the use of renewable energy sources, among them solar energy. Technology in the agricultural sector is expected to help farmers maintain their crops and reduce the possibility of crop failure. This motivated the researchers to design and build a system based on internet of things (IoT) technology that uses solar energy as its power source. The IoT system uses the ATmega328P+ESP8266 RobotDyn microcontroller with DHT22, MD0127, soil moisture, and BH1750FVI sensors, and sends data to Thingspeak over the internet using the HTTP communication protocol. The system can monitor ecological factors in gardens with a fairly good degree of accuracy, and solar energy can run the system properly.
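The upload step described above can be sketched with ThingSpeak's channel-update REST call. The field-to-sensor mapping and the `DEMO_KEY` placeholder below are illustrative assumptions, not the paper's actual channel configuration:

```python
from urllib.parse import urlencode

THINGSPEAK_UPDATE = "https://api.thingspeak.com/update"

def build_update_url(api_key, temperature, humidity, soil_moisture, light_lux):
    """Build a ThingSpeak channel-update request URL.

    Each sensor reading is mapped to one ThingSpeak channel field
    (field1..field4); the mapping here is an assumption for illustration.
    """
    params = {
        "api_key": api_key,
        "field1": temperature,    # DHT22 temperature, deg C
        "field2": humidity,       # DHT22 relative humidity, %
        "field3": soil_moisture,  # soil moisture sensor reading
        "field4": light_lux,      # BH1750FVI illuminance, lux
    }
    return f"{THINGSPEAK_UPDATE}?{urlencode(params)}"

url = build_update_url("DEMO_KEY", 28.4, 71.0, 512, 4300)
```

An HTTP GET on the resulting URL would write one row to the channel; on a microcontroller the same URL is assembled as a string and sent over the ESP8266 link.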
India is one of the developing countries most vulnerable to natural disasters such as drought, flood, cyclone, earthquake, landslide, and forest fire, which strike with devastating impact on human life, the economy, and the environment. Although it is almost impossible to fully recoup the damage caused by disasters, the potential risks can be minimized by developing early warning strategies. Recent advancements in space technology and satellite remote sensing play a crucial role in the efficient mitigation of disasters, and there is a pressing need to establish early warning systems that raise alerts so preventive measures can be taken before a natural hazard occurs. For a country like India, with its long coastline, one of the most dangerous natural hazards is the tsunami. On 26 December 2004, the Indian coastline experienced the most devastating tsunami in recorded history; tsunami inundation in coastal zones damaged buildings, infrastructure, and property and posed a threat to lives.
The Indian Tsunami Early Warning System (ITEWS) comprises a real-time network of seismic stations, bottom pressure recorders (BPR), tide gauges, and a 24x7 operational tsunami warning centre to detect tsunamigenic earthquakes, monitor tsunamis, and provide timely advisories to vulnerable communities through the latest communication methods, with back-end support from a pre-run scenario database and a decision support system (DSS). The National Tsunami Early Warning Centre at INCOIS has been operational since October 2007. It began by exchanging service level-I earthquake information, essentially a qualitative tsunami advisory about tsunamigenic potential. India is now geared up to provide service level-II bulletins for the Indian Ocean region, based on tsunami numerical modeling and an open-ocean propagation tsunami scenario database, and service level-III, which adds inundation vulnerability mapping for identified vulnerable regions.
Service level-I became operational in 2007 and service level-II in 2011; service level-III is now being initiated. The sources used here are the spatial data sets of the open-ocean propagation tsunami scenario database and the spatial data sets of coastal inundation modeling inputs: i) identification of highly vulnerable coastal regions from the multi-hazard vulnerability map (MHVM), ii) high-resolution coastal topography, iii) bathymetry data, and iv) observation networks, together with spatial layers of the inundation model and their respective grids, and a centralized spatial database server. The workflow covers data reception, data processing, tsunami vulnerability mapping, inundation decision-making, and tsunami advisory generation.
Short-term wind speed forecasting system using deep learning for wind turbine... (IJECEIAES)
It is very important to accurately detect wind direction and speed for wind energy, one of the essential sustainable energy sources. Studies on wind speed forecasting are generally carried out for long-term prediction; one of the main reasons is the need to correctly plan the area where a wind turbine will be built, given the high investment costs and long-term returns. Short-term forecasting, however, is equally important for the efficient use of wind turbines: beyond estimating average values, instant and dynamic short-term forecasts are necessary to control the turbines. In this study, short-term forecasting of changes in wind speed over 1-20 minutes was performed using deep learning. Wind speed data were obtained instantaneously from the feedback of an emulated wind turbine's generator, and these dynamically changing data were used as input to the deep learning algorithm. Each new reading from the generator was used as both test and training input, so model accuracy and enhancement were provided simultaneously. The approach was packaged as a modular, independent, integrated system that can work in various wind turbine applications. In the test setup, the system was observed to predict wind speed dynamically with around 3% error.
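The idea of using each new sample as both test and training input can be sketched as a prequential ("test-then-train") loop. The exponential smoother below is a deliberately simple stand-in for the paper's deep learning model:

```python
def online_forecast(stream, alpha=0.3):
    """Prequential one-step forecasting: each incoming wind-speed sample
    is first compared against the current prediction (test), then folded
    into the model (train). The model here is a simple exponential
    smoother standing in for the deep network used in the paper."""
    prediction = None
    abs_errors = []
    for speed in stream:
        if prediction is not None:
            abs_errors.append(abs(speed - prediction))  # test step
        # train step: update the smoothed estimate with the new sample
        prediction = speed if prediction is None else alpha * speed + (1 - alpha) * prediction
    return prediction, abs_errors

pred, errs = online_forecast([5.0, 5.0, 5.0, 5.0])  # steady wind: errors shrink to ~0
```

The same loop structure applies regardless of the model: swap the smoother update for a gradient step on a neural network and the test-then-train bookkeeping is unchanged.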
Implementation of environmental monitoring based on KAA IoT platform (journalBEEI)
A wireless sensor network (WSN) is a key means of access to the internet of things (IoT). The popularity of IoT, and the prediction that ever more devices will be connected to the Internet, make integrating and connecting devices difficult. Problems with IoT implementations include the lack of real-time data collection and processing and the inability to provide continuous monitoring. To overcome these problems, this paper proposes an IoT device for monitoring environmental conditions through the KAA IoT platform, which can be monitored anywhere and anytime in real time. The end-device node consists of several sensors, such as temperature, humidity, carbon monoxide (CO), and carbon dioxide (CO2) sensors. The data collected from the end-device node are transmitted over an IEEE 802.15.4-based link to a Raspberry Pi gateway, then sent to the KAA cloud server and saved into the database. The environmental data can be accessed via a web-based sensor application. We analyze the performance in terms of transactions, availability, data transfer, response time, transaction rate, throughput, and concurrency. The experimental results show that using the KAA IoT platform performs better than operating without a platform.
Trends in sentiment of Twitter users towards Indonesian tourism: analysis wit... (CSITiaesprime)
This research analyzes the sentiment of Twitter users regarding tourism in Indonesia using the keyword "wonderful Indonesia" as the tourism promotion identity. The study aims to gain a deeper understanding of public sentiment towards "wonderful Indonesia" through social media data analysis; the findings provide new, valuable insights into Indonesian tourism for the government and relevant stakeholders in promoting Indonesian tourism and enhancing tourist experiences. The method used is tweet analysis and classification with the K-nearest neighbor (KNN) algorithm to determine whether a tweet's sentiment is positive, neutral, or negative. The classification results show that the majority of tweets (65.1% of a total of 14,189 tweets) carry neutral sentiment, indicating that most tweets with the "wonderful Indonesia" tagline are related to advertising or promoting Indonesian tourism; however, the percentage of tweets with positive sentiment (33.8%) is higher than that with negative sentiment (1.1%). The study achieved training results with an accuracy of 98.5%, precision of 97.6%, recall of 98.5%, and F1-score of 98.1%. Reassessment will be needed in the future, as Twitter users' sentiments can change along with the development of Indonesian tourism itself.
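The KNN step described above can be sketched in a few lines: classify a tweet's feature vector by majority vote of its k nearest neighbours. The toy 2-dimensional vectors below are illustrative stand-ins for real tweet features, not the study's actual representation:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify a feature vector by majority vote of its k nearest
    neighbours under Euclidean distance. `train` is a list of
    (vector, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 0.0), "positive"), ((0.9, 0.1), "positive"),
    ((0.0, 1.0), "negative"),
    ((0.5, 0.5), "neutral"), ((0.4, 0.6), "neutral"),
]
label = knn_classify(train, (0.95, 0.05), k=3)
```

With k=3 the query's two closest neighbours are positive and one is neutral, so the majority vote returns "positive".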
Intelligent flood disaster warning on the fly: developing IoT-based managemen... (journalBEEI)
The number of natural disasters occurring yearly is increasing at an alarming rate, raising great concern over the well-being of human lives and the sustenance of economies. Rainfall patterns have also been affected, causing an immense number of flood cases in recent times. Flood disasters damage economies and cost human lives; in Asia alone, millions of people are affected by floods every year. This has led governments to develop flood forecasting methods that reduce flood casualties. In this article, a flood mitigation method is evaluated that incorporates a miniaturized flow and water level sensor and a pressure gauge. The data from the two sensors are used to predict flood status with a 2-class neural network. Real-time monitoring of the sensor data in a Thingspeak channel was made possible with a NodeMCU ESP8266. Furthermore, Microsoft's Azure Machine Learning (AzureML) has a built-in 2-class neural network, which was used to predict flood status according to a predefined rule; the prediction model was published as a web service through AzureML, enabling prediction as new data arrive. The experimental results showed that, with the 2-class neural network, using 3 hidden layers gives the highest accuracy of 98.9% and a precision of 100%.
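The two metrics reported above, accuracy and precision, come directly from the confusion counts of a 2-class classifier. A minimal sketch (the "flood"/"no-flood" labels are illustrative):

```python
def accuracy_precision(y_true, y_pred, positive="flood"):
    """Accuracy and precision for a 2-class classifier.

    Accuracy  = correct predictions / all predictions
    Precision = true positives / (true positives + false positives)
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    return correct / len(y_true), precision

acc, prec = accuracy_precision(
    ["flood", "flood", "no-flood", "no-flood"],
    ["flood", "no-flood", "no-flood", "no-flood"],
)
```

Note that a model can reach 100% precision (no false flood alarms) while still missing some floods, which is why the paper reports both numbers.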
Weather monitoring and forecasting are very important in the agricultural sector. Several kinds of data need to be collected in real time to support weather monitoring and forecasting systems, such as temperature, humidity, air pressure, wind speed, wind direction, and rainfall. The purpose of this research is to develop a real-time weather monitoring system using a parallel computation approach and to analyze its computational performance (i.e., speed up and efficiency) using the ARIMA model. The developed system has been implemented on a wireless sensor network (WSN) platform using Arduino and Raspberry Pi devices, with a web-based platform for weather visualization and monitoring. The experimental data used in this work are weather data acquired and collected from January until March 2017 in the Bogor area. The result of this research is that computation using eight processors is three times faster than using a single processor, with an efficiency of 50%.
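The two performance measures named above have standard definitions: speed up S = T1/Tp (serial time over parallel time) and efficiency E = S/p (speed up per processor). A minimal sketch with illustrative timings, not the paper's measured runtimes:

```python
def speedup_efficiency(t_serial, t_parallel, processors):
    """Parallel speed up S = T1 / Tp and efficiency E = S / p,
    the two computational-performance measures analysed for the
    parallel ARIMA computation."""
    s = t_serial / t_parallel
    return s, s / processors

s, e = speedup_efficiency(120.0, 30.0, 8)  # hypothetical timings in seconds
```

With these example numbers, a 4x speed up on eight processors gives an efficiency of 0.5, i.e., each processor is doing useful work half the time.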
Water monitoring and analytic based ThingSpeak (IJECEIAES)
Diseases associated with contaminated water are reported in large numbers annually and lead to deaths, so water quality monitoring is necessary to provide safe water. Traditional monitoring involves manually gathering samples from different points on the distribution site and then testing them in a laboratory. This procedure has proven ineffective because it is laborious, suffers from lag time, and lacks the online results needed for a proactive response to water pollution. The emergence of the internet of things (IoT) and the step towards smart living make IoT a natural fit here. This paper presents water quality monitoring using an IoT-based ThingSpeak platform that provides analytic tools and visualization using MATLAB programming. The proposed model tests water samples using a sensor fusion technique with TDS and turbidity sensors, then uploads the data to the ThingSpeak platform for monitoring and analysis. The system notifies the authorities when water quality parameters fall outside a predefined set of normal values; a warning is sent to the user via the IFTTT protocol.
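The "predefined set of normal values" check described above is a simple range test per parameter. The limits below are illustrative assumptions, not the paper's calibrated thresholds:

```python
# Hypothetical safe ranges for illustration only.
LIMITS = {
    "tds_ppm": (0, 500),      # total dissolved solids
    "turbidity_ntu": (0, 5),  # turbidity
}

def out_of_range(sample, limits=LIMITS):
    """Return the names of parameters in `sample` that fall outside
    their predefined normal range; a non-empty result is what would
    trigger the IFTTT warning in the paper's system."""
    alerts = []
    for name, value in sample.items():
        lo, hi = limits[name]
        if not lo <= value <= hi:
            alerts.append(name)
    return alerts

alerts = out_of_range({"tds_ppm": 650, "turbidity_ntu": 2.1})
```

Here only the TDS reading exceeds its range, so a single alert would be raised for that parameter.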
Android mobile application for wildfire reporting and monitoring (riyaniaes)
Peat fires cause major environmental problems in Central Kalimantan Province, Indonesia, threaten human health, and affect the socio-economic sector. The lack of peat fire detection systems is one factor causing these recurring fires. Therefore, in this study, we develop an Android mobile application and a web-based application to support the citizen volunteers who want to contribute wildfire reports and the decision-makers who wish to collect, visualize, and evaluate those reports. The global navigation satellite system (GNSS)/global positioning system (GPS) sensor used by a smartphone's camera is a useful tool for showing the close-range location of potential fire and smoke. The exchangeable image file format (EXIF) image and GPS metadata captured by a mobile phone store the raw observation, which is sent to the data center over the internet. The result of this work is an easy-to-use application for monitoring potential peat fires by location and activity data. This paper focuses on developing a mobile application for peat fire reporting and a web-based application that collects peat fire locations for decision-makers. Our main objective is to detect the potential and spread of fire in peatlands as early as possible by utilizing community reports made with smartphones.
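EXIF stores GPS coordinates as degrees/minutes/seconds plus a hemisphere reference; mapping a report requires converting that to signed decimal degrees. A minimal sketch of the conversion (the example coordinate is illustrative):

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF GPS degrees/minutes/seconds and a hemisphere
    reference ('N', 'S', 'E', 'W') to signed decimal degrees, the
    form used to place a report on a map."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value

# Illustrative point south of the equator (Central Kalimantan lies south of it).
lat = dms_to_decimal(2, 12, 36.0, "S")
lon = dms_to_decimal(113, 55, 12.0, "E")
```

Southern and western references negate the value, so 2° 12' 36" S becomes -2.21 decimal degrees.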
Convolutional neural network with binary moth flame optimization for emotion ... (IAESIJAI)
Electroencephalograph (EEG) signals can reflect brain activity in real time, and using EEG signals to analyze human emotional states is a common line of study. EEG signatures of emotion are not distinctive and differ from one person to another, as everyone has a different emotional response to the same stimuli; for this reason, EEG signals are subject-dependent and have proven effective for subject-dependent emotion detection. To achieve enhanced accuracy and a high true positive rate, the proposed system uses a binary moth flame optimization (BMFO) algorithm for feature selection and convolutional neural networks (CNNs) for classification. Optimal features are chosen using accuracy as the objective function, and the selected features are then classified with a CNN to discriminate between different emotion states.
A novel ensemble model for detecting fake news (IAESIJAI)
Due to the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of news articles as either real or fake. For this purpose, we opt for a blending technique that combines three models: bidirectional long short-term memory (Bi-LSTM), a stochastic gradient descent classifier, and a ridge classifier. The implementation of the proposed model (BI-LSR) on real-world datasets has shown outstanding results: it achieved an accuracy score of 99.16%. This ensemble has proven to perform better than individual conventional machine learning and deep learning models, as well as many ensemble learning approaches cited in the literature.
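The combining step of an ensemble can be sketched with simple majority voting over the base models' labels. Note this is a simplified stand-in: the paper's blending technique trains a meta-model on the base models' outputs rather than voting directly:

```python
from collections import Counter

def blend_vote(predictions):
    """Combine the label lists produced by several base models by
    per-article majority vote. `predictions` is a list of lists,
    one inner list of labels per base model."""
    return [Counter(row).most_common(1)[0][0] for row in zip(*predictions)]

# Hypothetical labels from the three base models on three articles.
bilstm = ["fake", "real", "fake"]
sgd    = ["fake", "real", "real"]
ridge  = ["real", "real", "fake"]
final = blend_vote([bilstm, sgd, ridge])
```

Each article gets the label at least two of the three models agree on; a learned meta-classifier replaces this fixed rule in true blending.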
K-centroid convergence clustering identification in one-label per type for di... (IAESIJAI)
Disease prediction is a high-demand field that requires significant support from machine learning (ML) to enhance result efficiency. This research applies K-means-based supervised classification to disease prediction where each class has only one labeled sample. The K-centroid convergence clustering identification (KC3I) system is based on semi-supervised K-means clustering but requires only a single labeled sample per class for the training process, updating the centroids with the training dataset. The KC3I model also includes a dictionary box that indexes all input centroids before and after the updating process; each centroid matches a corresponding label inside this box. After training, each arriving feature vector is assigned to the cluster of the nearest trained centroid by Euclidean distance and then converted into the specific class name associated with that centroid's index. Two validation stages were carried out and met expectations in terms of precision, recall, F1-score, and absolute accuracy. The last part demonstrates the possibility of feature reduction by selecting the most crucial features with the extra trees classifier method; when fed only the most important features, the KC3I system retains the same accuracy.
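The core assign-then-update cycle can be sketched as nearest-centroid classification with a centroid move. This is a simplified illustration: the midpoint update rule below is an assumption, not the paper's exact semi-K-means update:

```python
import math

def assign_and_update(centroids, labels, x):
    """Assign feature vector `x` to the nearest centroid (Euclidean
    distance), nudge that centroid toward `x`, and return the label
    associated with the centroid's index, echoing the KC3I idea of
    starting from one labelled sample per class."""
    idx = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], x))
    c = centroids[idx]
    centroids[idx] = tuple((a + b) / 2 for a, b in zip(c, x))  # simplified update
    return labels[idx]

# One labelled seed point per class, as in the one-label-per-type setting.
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = ["healthy", "disease"]
label = assign_and_update(centroids, labels, (9.0, 11.0))
```

The new point lands near the "disease" seed, so it takes that label and pulls the corresponding centroid toward it.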
Plant leaf detection through machine learning based image classification appr... (IAESIJAI)
Since maize is a staple food, especially for vegetarians and vegans, maize leaf disease has a significant influence on the food industry, including maize crop productivity. Maize quality must therefore be optimal, which requires safeguarding the crop from several illnesses. As a result, there is great demand for an automated system that can identify a condition early on and prompt the appropriate action. Early disease identification is crucial but also poses a major obstacle. In this research project, we adopt the fundamental k-nearest neighbor (KNN) model and concentrate on building and developing an enhanced k-nearest neighbor (EKNN) model, which helps identify several classes of disease. Additional high-quality fine and coarse features are generated to gather discriminative, boundary, pattern, and structural information, which is then used in the classification process; the classification algorithm offers high-quality gradient-based features. The proposed model is assessed on the Plant-Village dataset and compared with many standard classification models using various metrics.
Backbone search for object detection for applications in intrusion warning sy... (IAESIJAI)
In this work, we propose a novel backbone search method for object detection for applications in intrusion warning systems. The goal is to find a compact model for use in embedded thermal imaging cameras widely used in intrusion warning systems. The proposed method is based on faster region-based convolutional neural network (Faster R-CNN) because it can detect small objects. Inspired by EfficientNet, the sought-after backbone architecture is obtained by finding the most suitable width scale for the base backbone (ResNet50). The evaluation metrics are mean average precision (mAP), number of parameters, and number of multiply–accumulate operations (MACs). The experimental results showed that the proposed method is effective in building a lightweight neural network for the task of object detection. The obtained model can keep the predefined mAP while minimizing the number of parameters and computational resources. All experiments are executed elaborately on the person detection in intrusion warning systems (PDIWS) dataset.
Deep learning method for lung cancer identification and classification (IAESIJAI)
Lung cancer (LC) is claiming many lives and is becoming a serious cause for concern. Detecting LC at an early stage improves the chances of recovery, and the accuracy of early detection can be improved with a convolutional neural network (CNN)-based deep learning approach. In this paper, we present two methodologies for lung cancer detection (LCD) applied to the lung image database consortium (LIDC) and image database resource initiative (IDRI) data sets. The LC images are classified as cancer or non-cancer using a support vector machine (SVM) and a deep CNN, with the CNN trained i) in multiple batches and ii) in a single batch. All of these methods are implemented in MATLAB. The classification accuracy obtained by the SVM is 65%, whereas the deep CNN produced detection accuracies of 80% and 100% for multiple-batch and single-batch training, respectively. The novelty of our experimentation is the near-100% classification accuracy obtained by our deep CNN model when tested on 25 lung computed tomography (CT) test images, each of size 512×512 pixels, in fewer than 20 iterations, compared with the research carried out by others using cropped LC nodule images.
Optically processed Kannada script realization with Siamese neural network model (IAESIJAI)
Optical character recognition (OCR) is a technology that allows computers to recognize and extract text from images or scanned documents. It is commonly used to convert printed or handwritten text into a machine-readable format. This study presents an OCR system for Kannada characters based on a Siamese neural network (SNN). The SNN, a deep neural network comprising two identical convolutional neural networks (CNNs), compares the script and ranks characters by dissimilarity; when a low dissimilarity score is found, the pair is predicted as a character match. The authors use 5 classes of Kannada characters that are first preprocessed with grey scaling and converted to PGM format. These are input directly into the deep convolutional network, which learns from matching and non-matching image pairs between the CNNs using a contrastive loss function in the Siamese architecture. The proposed OCR system takes far less time and gives more accurate results than a regular CNN. The model can become a powerful identification tool, particularly where writing styles vary widely or limited training data is available.
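The contrastive loss mentioned above has a compact closed form: matching pairs are penalized by their squared distance, non-matching pairs by how far they fall inside a margin. A minimal sketch:

```python
def contrastive_loss(distance, is_match, margin=1.0):
    """Contrastive loss for one Siamese pair.

    Matching pairs are pulled together (loss = d^2); non-matching
    pairs are pushed apart until the margin is reached
    (loss = max(0, margin - d)^2).
    """
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

loss_match = contrastive_loss(0.2, True)       # small distance, small loss
loss_far_mismatch = contrastive_loss(1.5, False)  # beyond the margin: no loss
loss_near_mismatch = contrastive_loss(0.2, False)  # inside the margin: penalized
```

A non-matching pair already separated by more than the margin contributes zero loss, which is what stops the network from pushing all embeddings infinitely far apart.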
Embedded artificial intelligence system using deep learning and raspberrypi f... (IAESIJAI)
Melanoma is a kind of skin cancer that originates in the melanocytes responsible for producing melanin; it can be a severe and potentially deadly form of cancer because it can metastasize to other regions of the body if not detected and treated early. To facilitate early detection, various low-cost, reliable, and accurate computer-assisted diagnostic systems have recently been proposed based on artificial intelligence (AI) algorithms, particularly deep learning techniques. This work proposes an innovative, intelligent system that combines the internet of things (IoT) with a Raspberry Pi connected to a camera and a deep convolutional neural network (CNN) model for real-time detection and classification of melanoma lesions. The key stages of the model before serializing it to the Raspberry Pi are: first, preprocessing, covering data cleaning, data transformation (normalization), and data augmentation to reduce overfitting during training; then feature extraction with the deep CNN; and finally classification with a sigmoid activation function. The experimental results indicate the efficiency of the proposed classification system, which achieved an accuracy of 92%, a precision of 91%, a sensitivity of 91%, and an area under the receiver operating characteristic curve (AUC-ROC) of 0.9133.
Deep learning based biometric authentication using electrocardiogram and iris (IAESIJAI)
Authentication systems play an important role in a wide range of applications. Traditional token-, certificate-, and password-based authentication systems are now being replaced by biometric authentication systems, generally based on data obtained from the face, iris, electrocardiogram (ECG), fingerprint, or palm print. However, such unimodal authentication models suffer from accuracy and reliability issues, so multimodal biometric authentication systems have gained considerable attention for developing robust authentication. Moreover, recent developments in deep learning have produced more robust architectures that overcome the issues of traditional machine-learning-based authentication systems. In this work, we adopt ECG and iris data and train the extracted features with a hybrid convolutional neural network-long short-term memory (CNN-LSTM) model. For ECG, R-peak detection is an important step for feature extraction, and morphological features are extracted; for iris data, Gabor-wavelet, gray level co-occurrence matrix (GLCM), gray level difference matrix (GLDM), and principal component analysis (PCA) based feature extraction methods are applied. The final feature vector, obtained from the MIT-BIH and IIT Delhi Iris datasets, is trained and tested using the CNN-LSTM. The experimental analysis shows that the proposed approach achieves average accuracy, precision, and F1-score of 0.985, 0.962, and 0.975, respectively.
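The R-peak detection step that anchors the ECG features can be sketched as finding local maxima above an amplitude threshold. This is a deliberately minimal stand-in on a toy signal, not the paper's actual detector:

```python
def detect_r_peaks(signal, threshold=0.5):
    """Find R peaks as local maxima above `threshold`; the peak
    indices then anchor morphological feature extraction."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if (signal[i] > threshold
                and signal[i] > signal[i - 1]
                and signal[i] >= signal[i + 1]):
            peaks.append(i)
    return peaks

# Toy ECG-like trace with two prominent R-wave spikes.
ecg = [0.0, 0.1, 0.9, 0.2, 0.0, 0.1, 1.1, 0.3, 0.0]
peaks = detect_r_peaks(ecg)
```

Once peak positions are known, inter-peak intervals and waveform shapes around each peak supply the morphological features fed to the CNN-LSTM.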
Hybrid channel and spatial attention-UNet for skin lesion segmentation (IAESIJAI)
Melanoma is a type of skin cancer that has affected many lives globally. American Cancer Society research suggests that it is a serious type of skin cancer that can lead to mortality, but it is almost 100% curable if detected and treated in its early stages. Automated computer-vision-based schemes are currently widely adopted, but these systems suffer from poor segmentation accuracy. To overcome this, deep learning (DL) has become a promising solution, performing extensive training for pattern learning and providing better classification accuracy. However, skin lesion segmentation is affected by skin hair, unclear boundaries, pigmentation, and moles. We therefore adopt a UNet-based deep learning scheme and incorporate an attention mechanism that combines low-level and high-level statistics with feedback and skip-connection modules, which helps obtain robust features without neglecting channel information. Further, we use channel attention and spatial attention modulation to achieve the final segmentation. The proposed DL scheme is evaluated on a publicly available dataset, and experimental investigation shows that the proposed hybrid attention UNet achieves average performance of 0.9715, 0.9962, and 0.9710.
Photoplethysmogram signal reconstruction through integrated compression sensi... (IAESIJAI)
The transmission of photoplethysmogram (PPG) signals in real time is extremely challenging and motivates the use of an internet of things (IoT) environment for healthcare monitoring. This paper proposes an approach to PPG signal reconstruction through integrated compression sensing and basis function aware shallow learning (CSBSL). The integrated CSBSL approach compresses PPG signals jointly across multiple channels, improving the reconstruction accuracy of the PPG signals essential to healthcare monitoring. An optimal basis-function-aware shallow learning procedure is applied to the PPG signals with prior initialization and is further fine-tuned using knowledge from the other channels, which exploits the additional sparsity of the PPG signals. The proposed learning method, combined with the PPG signals, retains knowledge of spatial and temporal correlation. The integrated CSBSL approach consists of two steps; in the first step, the basis-function-based shallow learning is carried out by training on the PPG signals. The proposed method is evaluated on multichannel PPG signal reconstruction, which can benefit clinical applications through PPG monitoring and diagnosis.
Speaker identification under noisy conditions using hybrid convolutional neur... (IAESIJAI)
Speaker identification is a biometric task that classifies or identifies a person among other speakers based on speech characteristics. Recently, deep learning models have outperformed conventional machine learning models in speaker identification. Spectrograms of speech have been used as input to deep-learning-based speaker identification on clean speech, but the performance of speaker identification systems degrades under noisy conditions. Cochleograms have shown better results than spectrograms in deep-learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrid convolutional neural network (CNN) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants alone in recent studies. However, no attempt has been made to use a hybrid CNN with enhanced RNN variants for speaker identification from cochleogram input to improve performance under noisy and mismatched conditions. In this study, speaker identification using a hybrid CNN and gated recurrent unit (GRU) with cochleogram input is proposed for noisy conditions. The VoxCeleb1 audio dataset, with real-world noise, white Gaussian noise (WGN), and without additive noise, was employed for the experiments. The results and a comparison with existing works show that the proposed model performs better than the other models in this study and in existing works.
Multi-channel microseismic signals classification with convolutional neural n... (IAESIJAI)
Identifying and classifying microseismic signals is essential to warn of mines’ dangers. Deep learning has replaced traditional methods, but labor-intensive manual identification and varying deep learning outcomes pose challenges. This paper proposes a transfer learning-based convolutional neural network (CNN) method called microseismic signals-convolutional neural network (MS-CNN) to automatically recognize and classify microseismic events and blasts. The model was instructed on a limited sample of data to obtain an optimal weight model for microseismic waveform recognition and classification. A comparative analysis was performed with an existing CNN model and classical image classification models such as AlexNet, GoogLeNet, and ResNet50. The outcomes demonstrate that the MS-CNN model achieved the best recognition and classification effect (99.6% accuracy) in the shortest time (0.31 s to identify 277 images in the test set). Thus, the MS-CNN model can efficiently recognize and classify microseismic events and blasts in practical engineering applications, improving the recognition timeliness of microseismic signals and further enhancing the accuracy of event classification.
Sophisticated face mask dataset: a novel dataset for effective coronavirus di... (IAESIJAI)
Efficient and accurate coronavirus disease (COVID-19) surveillance necessitates robust identification of individuals wearing face masks. This research introduces the sophisticated face mask dataset (SFMD), a comprehensive compilation of high-quality face mask images enriched with detailed annotations on mask types, fits, and usage patterns. Leveraging cutting-edge deep learning models (EfficientNet-B2, ResNet50, and MobileNet-V2), we compare SFMD against two established benchmarks: the real-world masked face dataset (RMFD) and the masked face recognition dataset (MFRD). Across all models, SFMD consistently outperforms RMFD and MFRD on key metrics, including accuracy, precision, recall, and F1-score. Additionally, our study demonstrates the dataset's capacity to cultivate robust models resilient to intricate scenarios such as low-light conditions and facial occlusion by accessories or facial hair.
Transfer learning for epilepsy detection using spectrogram images (IAESIJAI)
Epilepsy stands out as one of the most common neurological diseases. The neural activity of the brain is observed using electroencephalography (EEG), but manual inspection of EEG brain signals is a slow and arduous process that puts a heavy load on neurologists and affects their performance. The aim of this study is to find the best-performing transfer learning model for automatically distinguishing epileptic from normal activity, classifying EEG signals using spectrogram images that represent the percentage of energy for each coefficient of the continuous wavelet transform. The dataset comprises EEG signals recorded at an epilepsy monitoring unit. The study presents an application of transfer learning by comparing three models, AlexNet, visual geometry group (VGG19), and residual neural network (ResNet), in different combinations with seven different classifiers. Testing the models yielded different values of accuracy and the other metrics used to judge performance; the best combination was ResNet with a support vector machine (SVM) classifier, which classified the EEG signals with a high success rate across multiple performance metrics, including 97.22% accuracy and an error rate of 2.78%.
Deep neural network for lateral control of self-driving cars in urban environ...IAESIJAI
The exponential growth of the automotive industry clearly indicates that self-driving cars are the future of transportation. However, their biggest challenge lies in lateral control, particularly in urban bottlenecking environments, where disturbances and obstacles are abundant. In these situations, the ego vehicle has to follow its own trajectory while rapidly correcting deviation errors without colliding with other nearby vehicles. Various research efforts have focused on developing lateral control approaches, but these methods remain limited in terms of response speed and control accuracy. This paper presents a control strategy using a deep neural network (DNN) controller to effectively keep the car on the centerline of its trajectory and adapt to disturbances arising from deviations or trajectory curvature. The controller focuses on minimizing deviation errors. The Matlab/Simulink software is used for designing and training the DNN. Finally, simulation results confirm that the suggested controller has several advantages in terms of precision, with lateral deviation remaining below 0.65 meters, and rapidity, with a response time of 0.7 seconds, compared to traditional controllers in solving lateral control.
Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...IAESIJAI
Recently, cardiovascular diseases (CVDs) have become a rapidly growing problem in the world, especially in developing countries. The latter are facing a lifestyle change that introduces new risk factors for heart disease, that requires a particular and urgent interest. Besides, cardiomegaly is a sign of cardiovascular diseases that refers to various conditions; it is associated with the heart enlargement that can be either transient or permanent depending on certain conditions. Furthermore, cardiomegaly is visible on any imaging test including Chest X-Radiation (X-Ray) images; which are one of the most common tools used by Cardiologists to detect and diagnose many diseases. In this paper, we propose an innovative deep learning (DL) model based on an attention module and MobileNet architecture to recognize Cardiomegaly patients using the popular Chest X-Ray8 dataset. Actually, the attention module captures the spatial relationship between the relevant regions in Chest X-Ray images. The experimental results show that the proposed model achieved interesting results with an accuracy rate of 81% which makes it suitable for detecting cardiomegaly disease.
Efficient commodity price forecasting using long short-term memory modelIAESIJAI
Predicting commodity prices, particularly food prices, is a significant concern for various stakeholders, especially in regions that are highly sensitive to commodity price volatility. Historically, many machine learning models like autoregressive integrated moving average (ARIMA) and support vector machine (SVM) have been suggested to overcome the forecasting task. These models struggle to capture the multifaceted and dynamic factors influencing these prices. Recently, deep learning approaches have demonstrated considerable promise in handling complex forecasting tasks. This paper presents a novel long short-term memory (LSTM) network-based model for commodity price forecasting. The model uses five essential commodities namely bread, meat, milk, oil, and petrol. The proposed model focuses on advanced feature engineering which involves moving averages, price volatility, and past prices. The results reveal that our model outperforms traditional methods as it achieves 0.14, 3.04%, and 98.2% for root mean square error (RMSE), mean absolute percentage error (MAPE), and R-squared (R2 ), respectively. In addition to the simplicity of the model, which consists of an LSTM single-cell architecture that reduced the training time to a few minutes instead of hours. This paper contributes to the economic literature on price prediction using advanced deep learning techniques as well as provides practical implications for managing commodity price instability globally.
1-dimensional convolutional neural networks for predicting sudden cardiacIAESIJAI
Sudden cardiac arrest (SCA) is a serious heart problem that occurs without symptoms or warning. SCA causes high mortality. Therefore, it is important to estimate the incidence of SCA. Current methods for predicting ventricular fibrillation (VF) episodes require monitoring patients over time, resulting in no complications. New technologies, especially machine learning, are gaining popularity due to the benefits they provide. However, most existing systems rely on manual processes, which can lead to inefficiencies in disseminating patient information. On the other hand, existing deep learning methods rely on large data sets that are not publicly available. In this study, we propose a deep learning method based on one-dimensional convolutional neural networks to learn to use discrete fourier transform (DFT) features in raw electrocardiogram (ECG) signals. The results showed that our method was able to accurately predict the onset of SCA with an accuracy of 96% approximately 90 minutes before it occurred. Predictions can save many lives. That is, optimized deep learning models can outperform manual models in analyzing long-term signals.
A deep learning-based approach for early detection of disease in sugarcane pl...IAESIJAI
In many regions of the nation, agriculture serves as the primary industry. The farming environment now faces a number of challenges to farmers. One of the major concerns, and the focus of this research, is disease prediction. A methodology is suggested to automate a process for identifying disease in plant growth and warning farmers in advance so they can take appropriate action. Disease in crop plants has an impact on agricultural production. In this work, a novel DenseNet-support vector machine: explainable artificial intelligence (DNet-SVM: XAI) interpretation that combines a DenseNet with support vector machine (SVM) and local interpretable model-agnostic explanation (LIME) interpretation has been proposed. DNet-SVM: XAI was created by a series of modifications to DenseNet201, including the addition of a support vector machine (SVM) classifier. Prior to using SVM to identify if an image is healthy or un-healthy, images are first feature extracted using a convolution network called DenseNet. In addition to offering a likely explanation for the prediction, the reasoning is carried out utilizing the visual cue produced by the LIME. In light of this, the proposed approach, when paired with its determined interpretability and precision, may successfully assist farmers in the detection of infected plants and recommendation of pesticide for the identified disease.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Twitter-based classification for integrated source data of weather observations
IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 12, No. 1, March 2023, pp. 271-283
ISSN: 2252-8938, DOI: 10.11591/ijai.v12.i1.pp271-283
Journal homepage: http://ijai.iaescore.com

Twitter-based classification for integrated source data of weather observations
Kartika Purwandari1,2, Tjeng Wawan Cenggoro1,2, Join Wan Chanlyn Sigalingging3, Bens Pardamean1,4

1 Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia
2 Department of Computer Science, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
3 Database Center Division of BMKG, Meteorological, Climatological, and Geophysical Agency, Jakarta, Indonesia
4 Department of Computer Science, BINUS Graduate Program-Master of Computer Science Program, Bina Nusantara University, Jakarta, Indonesia
Article history: Received Nov 15, 2021; Revised Jul 13, 2022; Accepted Aug 11, 2022

ABSTRACT
Meteorology and weather forecasting are crucial for predicting future
climate conditions. Forecasts can be helpful when they provide information
that can assist people in making better decisions. People today use big data
to analyze social media information accurately, including those who rely on
the weather forecast. Recent years have seen the widespread use of machine
learning and deep learning for managing messages on social media sites like
Twitter. In this study, the authors analyzed weather-related text in Indonesia based on searches made on Twitter. Three machine learning algorithms were examined: support vector machine (SVM), multinomial logistic regression (MLR), and multinomial naive Bayes (MNB), as well as the pretrained bidirectional encoder representations from transformers (BERT), which was fine-tuned over multiple layers for effective classification. The BERT model achieved an F1-score of 99%, higher than any of the machine learning methods. These results were incorporated into a web-based weather information system, and the classification results were mapped with the Esri Maps application programming interface (API) based on the geolocation of the data.
Keywords:
Classification
Deep learning
Geolocation
Machine learning
Natural language processing
Transfer learning
Weather
This is an open access article under the CC BY-SA license.
Corresponding Author:
Kartika Purwandari
Bioinformatics and Data Science Research Center, Bina Nusantara University
Jakarta, Indonesia
Email: kartika.purwandari@binus.edu
1. INTRODUCTION
Indonesia has a sea area covering 6.22% of its total area, so the Indonesian territory is characterized by a marine climate [1]. Global warming has led to climate change, especially in the dry and rainy seasons: the dry season now lasts longer than before, whereas the rainy season is shorter and occurs at a different time [2], [3]. The characteristics of multiple physical mechanisms and the dynamic nature of rainfall make its consistency difficult to determine [4]. The Intergovernmental Panel on Climate Change (IPCC) points out that climate change will require adaptation to environmental, social, and economic factors. The climate changes often in Indonesia because of its tropical location, and the government mandates the provision of real-time weather data to support community activities [5], [6].
The advancements in technology have already led to progress in disseminating information; most of
the information that the community receives comes from social media [7]. The government distributes
publications in a variety of ways to meet the information needs of the public. Furthermore, the public aims to
stay informed about what happens around them, especially in relation to relevant events [8]. Twitter is used by people worldwide to access many different types of information. Monitoring topics and events is made easier with a structured combination of search parameters on a Twitter channel. We implemented geolocation using available application programming interfaces (APIs) and web services; using existing APIs, location-specific terms were detected in a tweet. Social media platforms continually generate and deliver information in real time from various sources to users. Topics, hashtags, geographic location, and language are extracted from tweets. In addition, followers, likes, retweets, and tweets themselves can be scraped with Twint (Twitter intelligence tool), a Python package.
A variety of Twitter accounts, mostly related to information, have emerged all over the world in the
last few years, most notably in Indonesia [9], [10]. The platform can be used to track public discussions about
several issues that have been shared via Twitter. Data and information from Twitter have been used for
classification tasks in a number of projects [11], [12]. Using the K-nearest neighbor (KNN) algorithm, a company's potential employees can be identified by their personalities: KNN assigned Myers-Briggs type indicator (MBTI) categories to potential employees based on character classifications derived from their tweets [13].
Deep learning enhances performance in various fields, covering data types such as images [14], [15], time series [16], sound [17], and text [18], [19]. Due to its time requirements and costs, bidirectional encoder representations from transformers (BERT) presents a challenge when used to classify large datasets, but it is still widely used because fine-tuning a pretrained model is relatively inexpensive. Thus, the authors used the BERT algorithm, which can only learn from datasets containing at least 256 characters [20]. In this study, we investigated whether sentiment in texts can be classified using BERT-base. Using the Pontiki dataset, known as the laptop dataset [21], BERT as fine-tuned with several additional layers by Alexander Rietzler has been successful in detecting sentiment. Before adopting the BERT method, the authors developed a machine learning classifier using the support vector machine (SVM) technique, which provided 93% accuracy; other machine learning algorithms, such as multinomial naive Bayes (MNB) and multinomial logistic regression (MLR), did not achieve highly accurate predictions when applied to Twitter data about weather conditions [22].
Data collected by the Meteorology, Climatology, and Geophysics Agency (BMKG) can be obtained
from a number of sources. In Figure 1, it can be seen that the BMKG collects data and integrates them with
each other to provide information on meteorology, climatology, and geophysics. The integration capabilities
of BMKG can be enhanced by implementing a big data system that integrates multiple data sources. The first
step to gathering weather data is to use automatic surface air instruments like automatic weather stations
(AWS) and automatic rain gauges (ARG). An ARG is an instrument that measures rainfall; data can be recorded either manually (non-recording) or automatically (self-recording). In addition to rainfall, weather forecasts also require data such as temperature, wind speed, and air humidity, which can be obtained from an AWS.
Figure 1. Global observing system on meteorology, climatology, and geophysics
The national weather service (NWS) forecasts and issues warnings for weather and hydrologic
conditions in the United States, its territories, and adjacent waters and oceans, for the purpose of protecting
lives and property and enhancing the nation's economy. As a complementary service, the NWS delivers
Twitter feeds as a means of enhancing the reach of its information. In addition to disseminating
environmental information, NWS will engage in outreach and education to increase awareness of weather
conditions.
In this paper, we propose a machine learning method to integrate real-time weather data about
Indonesia to support data diversity. Data from Twitter is used as the basis for this machine learning process.
According to the Twitter location data, the crawled data is geolocated and entered into the database. The
paper first describes how weather information was collected from Twitter, followed by a description of the methodology used to analyze the data; finally, the results are summarized and discussed in detail.
2. METHOD
A Twitter framework for providing weather information is shown in Figure 2, which illustrates how data are stored in a database and reported in real time to users through an Android application. In the text preprocessing phase, uniform resource locators (URLs) are removed and unused words, including Indonesian stop words, are eliminated; special characters are also removed. In the classification phase, the authors assign classes based on the labels generated during training, with input from a weather consultant. Geolocation is filled in based on the name of the district or city mentioned in the tweet.
Figure 2. Integrated source data for the weather information system
2.1. Dataset
GetOldTweets3, a Python 3 library for retrieving old tweets, was used to crawl the dataset. As shown in Table 1, tweets were crawled from January to May 2019 using keywords derived from Indonesian (translated here into English). Indonesian tweets are marked by the language code 'id'. A total of 506 tweets were labeled. Following the Pareto (80/20) ratio, this experiment held out 20% of the dataset for testing, yielding 404 tweets for training and 102 tweets for testing.
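As a sketch of the split above (`pareto_split` is a hypothetical helper, not from the paper), the 80/20 Pareto ratio reproduces the reported counts:

```python
import math

def pareto_split(n_total, train_frac=0.8):
    """Split a dataset size according to the Pareto (80/20) ratio."""
    n_train = math.floor(n_total * train_frac)  # 80% for training
    n_test = n_total - n_train                  # remaining 20% for testing
    return n_train, n_test

# 506 labeled tweets -> 404 for training, 102 for testing
print(pareto_split(506))
```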
Table 1. Keywords for each class
Class Keywords
Cloudy Thick clouds, cloudy clouds, dark clouds
Sunny Body wet with sweat, bright light, ill, clear, hot
Rainy Rainy, rain, rainfall
Heavy rain Lightning, thunderstorm, thunder, soaking wet
Light rain Light rain, spatter, drizzle
In all, five classes of data were analyzed, namely "light rain", "heavy rain", "rainy", "sunny" and
"cloudy". Figure 3 summarizes how these labels were distributed. Due to the similarity of the keywords for
"rainy", "heavy rain" and "light rain", the amount of data in these classes is lower than the amount of data in
the "sunny" class.
Figure 3. Data distribution
2.2. Pre-processing
Tweets are lowercased. As shown in Figure 4, the following are removed from the content: excessive newline characters and whitespace, URLs, Twitter and Instagram formatting, and non-American Standard Code for Information Interchange (ASCII) characters. Emojis are translated using a lookup table of 116 emoji symbols stored in a .txt file, and slang words are normalized using a list of 2,879 slang words stored in a text file. For BERT classification, a [CLS] token is added to the beginning of each text.
Figure 4. Comparison of tweet before and after text preprocessing
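A minimal sketch of this cleanup in Python (the emoji and slang lookup tables mentioned above are omitted, and the function name is illustrative):

```python
import re

def preprocess(tweet):
    """Minimal cleanup: lowercase, strip URLs, handles/hashtags, non-ASCII, extra whitespace."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", "", text)        # remove URLs
    text = re.sub(r"[@#]\w+", "", text)             # remove Twitter/Instagram handles and hashtags
    text = text.encode("ascii", "ignore").decode()  # drop non-ASCII characters (e.g., emojis)
    return re.sub(r"\s+", " ", text).strip()        # collapse newlines and whitespace
```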
2.3. Feature extraction for machine learning algorithms
TfidfVectorizer is a feature extraction method based on term frequency-inverse document frequency (TF-IDF) that processes the words in a document [23]. With this method, the inverse document frequency of a word (term) can be tracked [24]. The TF is calculated by counting how many times a term appears in a document, and the IDF then determines which terms carry more
weight; in other words, TF and IDF are combined to decide which terms matter. To calculate the weight (W) of each document against the keywords, the TF-IDF algorithm uses (1),

W_dt = Tf_dt * Idf_dt (1)

where W_dt is the weight of document d with respect to term t and Tf_dt is the frequency of occurrence of term t in document d divided by the total number of terms in document d, as in (2),

Tf_dt = f_d(t) / Σ_k f_d(k) (2)

Idf_dt reduces the weight of a term whose appearance is scattered throughout the documents, as in (3),

Idf_dt = log( N / (df_t + 1) ) (3)

where df_t = |{d ∈ D : t ∈ d}| is the number of documents containing term t and N = |D| is the total number of documents in the corpus; adding 1 avoids division by 0 when df_t is not present in the corpus [25].
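Equations (1)-(3) can be illustrated in plain Python (a sketch only; the scikit-learn TfidfVectorizer used in practice applies a slightly different IDF formula and normalization):

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Compute W_dt = Tf_dt * Idf_dt per equations (1)-(3); docs are lists of tokens."""
    N = len(docs)
    df = Counter()                       # df_t: number of documents containing term t
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)                 # total terms in this document
        weights.append({t: (c / total) * math.log(N / (df[t] + 1))
                        for t, c in counts.items()})
    return weights
```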
2.4. Classification method based on machine learning approaches
SVMs are supervised learning classification methods. In the SVM method, the original training data are mapped into a higher dimension using nonlinear mappings. The goal of this technique is to find the best separating function (hyperplane) among all possible functions to separate pairs of objects. In general, the best hyperplane can be defined as the line separating two classes of objects. The SVM constructs this hyperplane by maximizing the margin, that is, the distance between the two different classes [26], [27].
Multinomial naive Bayes is a development of the naive Bayes method that determines the probability of a word from the frequency with which it appears in a sentence. A problem arises, however, when a word does not appear in any class: the multinomial naive Bayes method then assigns it a probability of zero [28].
The scikit-learn Python package provides the Laplace smoothing method, which avoids zero probabilities. It works by adding a smoothing value α > 0 to each count; by default, α is set to 1.
P(c_j) = (count(w_i, c_j) + α) / (count(c_j) + |V|) (4)
where P(c_j) is the probability value of word i against class j, count(w_i, c_j) is the number of occurrences of word i in class j, and α is the Laplace smoothing value (default α = 1). Then, count(c_j) is the total word count of class j and |V| is the vocabulary size, i.e., the number of distinct words across all classes.
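A one-line sketch of (4) in Python (function and argument names are ours; note that scikit-learn's own MultinomialNB uses α·|V| rather than |V| in the denominator):

```python
def smoothed_prob(word_count_in_class, class_total, vocab_size, alpha=1):
    # (count(w_i, c_j) + alpha) / (count(c_j) + |V|), as in (4)
    return (word_count_in_class + alpha) / (class_total + vocab_size)

# a word never seen in a class still gets a small non-zero probability
p_unseen = smoothed_prob(0, 100, 50)  # 1/150 rather than 0
```

This is exactly how the zero-probability problem described above is avoided: the unseen word contributes a small, non-zero factor instead of collapsing the whole class probability to zero.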
In machine learning, MLR, also called softmax regression, is a method of separating feature vectors belonging to several classes. This method generalizes the logistic regression classification scheme to multiclass problems [29]. The main difference between the methods is the activation function: binary logistic regression uses the sigmoid activation function, while MLR uses the softmax activation function. The scikit-learn logistic regression package in Python can be set up for MLR by setting the multi_class parameter to "multinomial".
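The difference between the two activation functions can be made concrete in pure Python (a sketch; function names are ours):

```python
import math

def sigmoid(z):
    # binary logistic regression: squashes one score into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    # MLR / softmax regression: turns K class scores into a distribution over K classes
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# e.g., scores for three weather classes; the probabilities sum to 1
probs = softmax([2.0, 1.0, 0.1])
```

In scikit-learn, LogisticRegression(multi_class="multinomial") selects this softmax formulation (in recent versions, multinomial behavior is already the default for multiclass problems).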
2.5. BERT as a deep learning method
BERT is a bidirectional method based on the transformer architecture; it replaces the sequential processing of long short-term memory (LSTM) and gated recurrent unit (GRU) networks with a faster attention approach. Additionally, the method is pre-trained on two unsupervised tasks: masked language modeling and next-sentence prediction. The pre-trained BERT method is utilized to perform downstream tasks such as sentiment classification, intent detection, and question answering [30].
In multi-label classification, documents may be assigned multiple labels or classes simultaneously and independently. Multi-label classification has numerous real-world applications, such as categorizing businesses or assigning multiple genres to a film [31]. It can also be used in customer service to determine multiple intentions in a customer email [32].
ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 1, March 2023: 271-283
BERT-Base has a vocabulary of 30,522 words. Tokenization consists of splitting input text into tokens within this vocabulary. BERT uses WordPiece tokenization for words that are not in its vocabulary: out-of-vocabulary words are gradually subdivided into sub-words and then represented by groups of sub-words [33].
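WordPiece's greedy longest-match-first splitting can be sketched in a few lines of Python (a simplified illustration with a toy vocabulary; the real BERT-Base vocabulary has 30,522 entries and additional special tokens):

```python
def wordpiece(word, vocab):
    # greedily peel off the longest known prefix; continuation pieces are
    # marked with '##'; fall back to [UNK] when no piece matches
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"rain", "##ing", "##y", "sun", "cloud"}
tokens = wordpiece("raining", vocab)  # ['rain', '##ing']
```

The '##' prefix tells the model that a piece continues the previous one, so "raining" and "rainy" share the sub-word "rain" even though neither full word is in the toy vocabulary.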
2.6. Fine-tuning BERT
BERT is a network architecture that has been trained on large datasets from a wide variety of articles in multiple languages. Consequently, rather than retraining the BERT layers, which already have very good weights, researchers need only fine-tune them for text classification [34]. Figure 5 depicts the input layer of the BERT method used to feed pre-processed tweets, followed by one dense layer with a tanh activation function, two dropout layers (rate 0.5), and one output layer with a softmax activation function and cross-entropy loss. The pre-trained BERT model is thus fine-tuned by adding two dropout layers (0.5), one dense layer, and one output layer. The two dropout layers are intended to prevent overfitting. Overfitting occurs when a model performs very well during training but depends too heavily on the training data, so its results are incorrect when new data are provided for classification [35]. In this model, 10 epochs were used with a batch size of 5 and a sequence length based on the length of the dictionary from a tweet, which is the maximum for the previously trained model. The AdamW optimizer was used with a learning rate of 3e-5.
Figure 5. BERT fine-tuning model
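The fine-tuning head described above can be sketched as a model definition, assuming PyTorch and the Hugging Face transformers library (the class and attribute names are ours; only the layer arrangement and hyperparameters come from the text):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class WeatherClassifier(nn.Module):
    """Sketch of the fine-tuning head: BERT -> dropout -> dense(tanh) -> dropout -> output."""

    def __init__(self, n_classes=5, dropout=0.5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size        # 768 for BERT-Base
        self.drop1 = nn.Dropout(dropout)             # first 0.5 dropout layer
        self.dense = nn.Linear(hidden, hidden)       # dense layer with tanh activation
        self.drop2 = nn.Dropout(dropout)             # second 0.5 dropout layer
        self.out = nn.Linear(hidden, n_classes)      # output layer

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).pooler_output
        x = self.drop1(pooled)
        x = torch.tanh(self.dense(x))
        x = self.drop2(x)
        # logits; softmax + cross-entropy are applied together by nn.CrossEntropyLoss
        return self.out(x)

# optimizer with the learning rate reported in the text:
# optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
```

This is an architecture sketch under stated assumptions, not the authors' published implementation; in particular, the choice of the "bert-base-uncased" checkpoint is ours.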
2.7. Evaluation metrics
Confusion matrices are commonly used for calculating accuracy. The confusion matrix compares the results generated by the model (system) with those actually observed [36]. As shown in Table 2, there are four terms representing the classification results: TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively. The F1-score, recall, and precision values are computed to determine the quality of the results. Models are evaluated based on their F1-score, as this metric performs well on imbalanced datasets [37]. The F1-score in (5) can be calculated for each class and gives the same weight to recall and precision.
F1-score = (2 × Precision × Recall) / (Precision + Recall) × 100% (5)
There is also a weighted Fβ-score in which recall and precision can be assigned different weights:
Fβ = ((1 + β²) × Precision × Recall) / (β² × Precision + Recall) × 100% (6)
β reflects how much more important recall is than precision; the value of β is 2 if recall is twice as significant as precision [38].
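Equations (5) and (6) can be sketched in a single Python function (the function name is ours; β = 1 recovers the F1-score):

```python
def f_beta(precision, recall, beta=1.0):
    # (1 + beta^2) * P * R / (beta^2 * P + R), as in (5) and (6)
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

f1 = f_beta(0.9, 0.8)          # equal weighting of precision and recall
f2 = f_beta(0.9, 0.8, beta=2)  # recall weighted twice as heavily
```

With β = 2, a classifier with high recall but modest precision scores better than under F1, which is the behavior the paragraph above describes.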
Table 2. Confusion matrix
                                      Actual class
                                      Relevant                  Non-relevant
Predicted class   Retrieved           Correct result:           Unexpected result:
                                      true positive (TP)        false positive (FP)
                  Not retrieved       Missing result:           Correct absence of result:
                                      false negative (FN)       true negative (TN)
Precision (7) indicates the system's ability to return the most relevant documents and is defined as the fraction of retrieved documents that are relevant to the query. Recall (8) measures the system's ability to locate all relevant items in a document collection and is defined as the fraction of relevant documents that are retrieved. Accuracy (9) compares correctly identified cases with the total number of cases, while the error rate (10) measures the incorrectly identified cases.
TP = the number of relevant items correctly predicted as relevant.
FP = the number of irrelevant items incorrectly predicted as relevant.
FN = the number of relevant items incorrectly predicted as irrelevant.
TN = the number of irrelevant items correctly predicted as irrelevant.
Precision = TP / (TP + FP) × 100% (7)

Recall = TP / (TP + FN) × 100% (8)

Accuracy = (TP + TN) / (TP + FP + TN + FN) × 100% (9)

Error rate = (FP + FN) / (TP + FP + TN + FN) × 100% (10)
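Equations (7)–(10) can be computed directly from the four confusion-matrix counts (a sketch; the function name and example counts are ours). Note that the error rate is simply the complement of accuracy, counting the misclassified cases FP + FN:

```python
def metrics(tp, fp, fn, tn):
    # precision (7), recall (8), accuracy (9), and error rate (10)
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,  # complement of accuracy
    }

m = metrics(tp=90, fp=10, fn=5, tn=95)
```

For these illustrative counts, accuracy and error rate sum to 1, mirroring the accuracy/error-rate columns of Tables 3 and 4.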
2.8. Database management and geolocation filling
Purwandari et al. developed a database management system that manages weather data for Indonesia. Three users are involved in this system: netizens, forecasters, and data engineers. Data from netizens are collected as tweets from Twitter, forecasters rely on data from BMKG sensors throughout Indonesia, and all data are analyzed by data engineers before being reported to the public. A data dictionary, entity-relationship diagrams, and use cases have been used to visualize all completed data [39].
However, less than 1% of the crawled tweet posts include geolocation information. It is therefore very important to predict the locations of non-geo-tagged tweet posts accurately when analyzing data in different domains. Moreover, the city/district database can be modified by adding district/city aliases that reflect the wording of the crawled tweets. Using this method, tweets from remote areas of Indonesia can still be displayed with longitude and latitude even when the global positioning system (GPS) is not turned on. Although only Twitter itself has access to precise location information, the content of tweets and their metadata can be used to identify a user's location; third parties must rely on such sources to identify the geolocation of a user or tweet.
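A simplified sketch of this alias-based geolocation filling (the lookup table structure and function name are ours; the two coordinate pairs come from the examples discussed in this paper, and a real deployment would use the full BPS city/district database):

```python
# district/city names and aliases mapped to (lat, lon)
DISTRICTS = {
    "jakarta selatan": (-6.2615, 106.8106),
    "south jakarta": (-6.2615, 106.8106),  # alias added to match crawled tweets
    "karo": (3.1053, 98.2651),
}

def fill_geolocation(tweet_text):
    # scan the tweet for a known district/city name or alias;
    # return its coordinates, or None when no name is matched
    text = tweet_text.lower()
    for name, coords in DISTRICTS.items():
        if name in text:
            return coords
    return None

loc = fill_geolocation("Heavy rain in South Jakarta this morning")
```

This substring matching is also where the ambiguity discussed later arises: a regional-language word that coincides with a district name (such as "karo") would be mapped to that district's coordinates regardless of the tweet's true origin.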
3. RESULTS AND DISCUSSION
3.1. Evaluation of machine learning algorithms
This study compared three machine learning methods; the results are shown in Table 3. According to this table, SVM can successfully classify Twitter texts about the weather, with a recall of 87.3% and a precision of 90.6%. The MLR method yields 83.3% recall and 90.3% precision. With the MNB method, the recall is 73.6% and the precision is 86.3%. SVM provided the most accurate results, followed by MLR and then MNB, and also displayed the lowest error rate. SVM has thus proven to be effective in text classification, especially on weather-related Twitter documents. For all three methods, the recall was lower than the precision, meaning the classifiers were more precise in what they retrieved than exhaustive in what they found. The test confirmed the strong accuracy and recall for which SVM is popular.
Table 3. Classifier evaluation using machine learning approaches (%)
Model Precision Recall F1-score Accuracy Error rate
SVM 90.6 87.3 88.1 87.3 12.7
MLR 90.3 83.3 85.6 83.3 16.7
MNB 86.3 73.6 77.5 73.5 26.5
An understanding of machine learning models requires a confusion matrix. The columns of the
confusion matrix represent instances of the prediction class, whereas the rows represent instances of the
actual class. The confusion matrix results illustrated in Figures 6(a) to 6(c) support the aforementioned
results. Figure 6(a) illustrates the confusion matrix results from using SVM. Based on Figure 6(b), the
method of MLR is also quite efficient for classifying weather-related tweets on Twitter. Figure 6(c) shows
that the results of the MNB method are poor for the "light rain" class, since no TPs are generated in this class
as indicated by the confusion matrix.
Figure 6. The confusion matrix results for (a) SVM, (b) MLR, and (c) MNB
3.2. Evaluation of BERT method
The confusion matrix of the BERT method is depicted in Figure 7. The cloudy, sunny, and light rain classes are classified perfectly, meaning that these three classes have exactly the same number of TP results as actual sentences. One data point in the "rainy" class was predicted as "heavy rain", counting as an FN for the "rainy" class and an FP for the "heavy rain" class. Correct predictions lie on the diagonal of the confusion matrix, so incorrect predictions are visually obvious outside the diagonal. As shown in Table 4, the precision, recall, F1-score, accuracy, and error rate of the BERT method are 99.1%, 99%, 99%, 99%, and 1%, respectively.
Figure 7. Confusion matrix of BERT model
Table 4. Classifier evaluation using BERT method (%)
Model Precision Recall F1-score Accuracy Error rate
BERT 99.1 99 99 99 1
Table 5 provides the precision, recall, and F1-score for each class. The BERT model achieved the maximum F1-score for the "cloudy", "sunny", and "light rain" classes, and also performed well for the "rainy" and "heavy rain" classes. The model's behavior can further be examined through the training and validation loss. Figure 8 shows that the training loss is mostly constant, rising from epoch 4 to epoch 5, whereas the validation loss is more unstable, also increasing from epoch 4 to epoch 5. The validation loss tends to fluctuate because each epoch receives randomly shuffled input data. Overall, the BERT model is robust and stable; however, due to the imbalanced data distribution shown in Figure 3, some overfitting to the training data occurred.
Table 5. Evaluation metrics for individual classes using BERT model (%)
Class Precision Recall F1-score
Cloudy 100 100 100
Sunny 100 100 100
Rainy 100 96.2 98.1
Heavy rain 89 100 94.2
Light rain 100 100 100
Additionally, Figure 9 displays the F1-score for each epoch, complementing Figure 8. Despite a dip from epoch 8 to epoch 9, the F1-score generally increases with each succeeding epoch. Every epoch contains five batches, and each batch must complete before the weights are updated. Weights are updated based on the estimated sum of losses, where the loss function is computed on the output of the classification layers on top of BERT. The weights of the best quality are saved for testing after the final epoch.
3.3. Web-based weather report
After classification with the BERT model, the next step is to fill in the empty geolocations of the tweets. Geolocations are determined from the latitude and longitude coordinates of the cities and districts from the Central Agency on Statistics (BPS), which have been integrated into the database. This step is necessary before integrating the weather information into a website. Once the geolocation point has been filled, the latitude and longitude are plotted on Esri Maps. An example of plotting weather reports submitted by netizens on Esri Maps is shown in Figure 10. In that report, the first geolocation contains the words "South Jakarta"; consequently, the tweet is positioned at 6.2615° S, 106.8106° E, the coordinates of South Jakarta.
Figure 8. The plot of model loss on training and validation datasets
Figure 9. The plot of F1-score results on the validation dataset
Figure 10. Example of tweets integrated into a geographic information system (GIS) with weather classification and geolocation plotting
3.4. Discussion
This study focused on the comparison of basic machine learning models (SVM, MLR, and MNB) and a deep learning model (BERT) for text classification, with the best classification results to be applied to a website-based information system. Among the machine learning models, SVM gave the best results. These classical natural language processing results are compared primarily against an advanced model, the BERT transformer. In recent years, BERT has achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks [40]. The application of BERT
transfer learning to a weather text dataset has proven able to provide good results. BERT is a big neural network architecture with a huge number of parameters, ranging from 100 million to over 300 million, so training a BERT model from scratch on a small dataset would result in overfitting [41].
The training and validation loss curves shown in Figure 8 give evidence of overfitting. This can happen when the model is fit too closely to one training dataset and therefore cannot predict correctly when given another, similar dataset [40], [42]. Figure 3 shows the distribution of the training data. Tweets about sunny weather are the most numerous in the period between January and May 2019; the dry season begins in April and May, with March being the transitional period between the rainy and dry seasons. Cloudy weather and heavy rain are almost equally frequent. According to BMKG data, Indonesia entered its rainy season only in January or early February 2019. Because of the similarity in wording between heavy rain, light rain, and rain in the tweets, the heavy rain, light rain, and rainy classes differ only slightly. This ambiguity in the labels makes it difficult to determine which category a tweet belongs to. For example: "There is a high probability that rain will drench the entire DKI Jakarta area today. We are expecting light rain to heavy rain in the morning".
When filling in the geolocation, ambiguity in the district/city names mentioned in a tweet also affects the plotting results on Esri Maps. Because of the diversity of ethnic groups in Indonesia, regional languages are part of everyday speech. According to data from BPS, there are 1,340 tribes or ethnic groups in Indonesia, while according to the national language development agency, the number of regional languages in Indonesia was 646 at the beginning of 2017. Similarity between regional-language words and place names affects geolocation filling, which can cause the plotting on Esri Maps to mismatch the area names mentioned in a tweet. For example, the word "karo" can be read as a regional-language word from the Central Java region, but there is also a district in North Sumatra called "Karo". In this case, the text will be plotted at 3.1053° N, 98.2651° E on Esri Maps, the location of the Karo district.
4. CONCLUSION
Twitter has proven to be an effective tool for opinion mining and polling, especially for predicting weather conditions. On the dataset used, the pretrained BERT-based model is effective for classifying Twitter texts. Identifying the dataset before modeling algorithms for different classifications or scenarios is imperative. Besides categorizing short sentences, BERT-Base is useful for other purposes. The model achieved an F1-score of 99%, which is very good in comparison with the machine learning classification algorithms (SVM, MNB, and MLR). The sentences classified by the BERT model were then used for the geolocation filling task, based on the district/city names mentioned in tweets, and the tweets were mapped on Esri Maps according to their geolocation points. For future work, the authors will continue mining and analyzing more Twitter data using smart crawling to obtain more accurate predictions of weather conditions in Indonesia.
ACKNOWLEDGEMENTS
The authors thank Faiz Ayyas Munawwar for providing the illustration in this paper.
REFERENCES
[1] Y. Marini and K. T. Setiawan, “Indonesia sea surface temperature from TRMM Microwave Imaging (TMI) sensor,” IOP
Conference Series: Earth and Environmental Science, vol. 149, no. 1, 2018, doi: 10.1088/1755-1315/149/1/012055.
[2] K. E. Trenberth, “Changes in precipitation with climate change,” Climate Research, vol. 47, no. 1–2, pp. 123–138, 2011, doi:
10.3354/cr00953.
[3] M. R. Mozell and L. Thachn, “The impact of climate change on the global wine industry: Challenges & solutions,” Wine
Economics and Policy, vol. 3, no. 2, pp. 81–89, 2014, doi: 10.1016/j.wep.2014.08.001.
[4] R. E. Caraka, S. A. Bakar, M. Tahmid, H. Yasin, and I. D. Kurniawan, “Neurocomputing fundamental climate analysis,”
Telkomnika (Telecommunication Computing Electronics and Control), vol. 17, no. 4, pp. 1818–1827, 2019, doi:
10.12928/TELKOMNIKA.v17i4.11788.
[5] R. Caraka, R. C. Chen, T. Toharudin, M. Tahmid, B. Pardamean, and R. M. Putra, “Evaluation performance of SVR genetic
algorithm and hybrid PSO in rainfall forecasting,” ICIC Express Letters, Part B: Applications, vol. 11, no. 7 631, p. 639, 2020,
doi: 10.24507/icicelb.11.07.631.
[6] M. G. De Giorgi, A. Ficarella, and M. Tarantino, “Assessment of the benefits of numerical weather predictions in wind power
forecasting based on statistical methods,” Energy, vol. 36, no. 7, pp. 3968–3978, 2011, doi: 10.1016/j.energy.2011.05.006.
[7] J. Zhuang, T. Mei, S. C. H. Hoi, X. S. Hua, and S. Li, “Modeling social strength in social media community via kernel-based
learning,” MM’11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops, pp. 113–122, 2011, doi:
10.1145/2072298.2072315.
[8] S. Z. Jannah, “Clustering and Visualizing Surabaya Citizen Aspirations by Using Text Mining. Case Study: Media Center
Surabaya,” Institut Teknologi Sepuluh Nopember, 2018, [Online]. Available: https://repository.its.ac.id/58200/.
[9] C. Castillo, M. Mendoza, and B. Poblete, “Information credibility on Twitter,” Proceedings of the 20th International Conference
Companion on World Wide Web, WWW 2011, pp. 675–684, 2011, doi: 10.1145/1963405.1963500.
[10] A. Lenhart, K. Purcell, A. Smith, and K. Zickuhr, “Social Media & Mobile Internet Use among Teens and Young Adults.
Millennials.,” Pew Internet & American Life Project, vol. 01, pp. 1–16, 2010, [Online]. Available:
http://eric.ed.gov/?id=ED525056.
[11] R. Rahutomo, A. Budiarto, K. Purwandari, A. S. Perbangsa, T. W. Cenggoro, and B. Pardamean, “Ten-year compilation of
#savekpk twitter dataset,” Proceedings of 2020 International Conference on Information Management and Technology,
ICIMTech 2020, pp. 185–190, 2020, doi: 10.1109/ICIMTech50083.2020.9211246.
[12] A. Budiarto, R. Rahutomo, H. N. Putra, T. W. Cenggoro, M. F. Kacamarga, and B. Pardamean, “Unsupervised News Topic
Modelling with Doc2Vec and Spherical Clustering,” Procedia Computer Science, vol. 179, pp. 40–46, 2021, doi:
10.1016/j.procs.2020.12.007.
[13] B. Y. Pratama and R. Sarno, “Personality classification based on Twitter text using Naive Bayes, KNN and SVM,” Proceedings
of 2015 International Conference on Data and Software Engineering, ICODSE 2015, pp. 170–174, 2016, doi:
10.1109/ICODSE.2015.7436992.
[14] L. Yung-Hui, Y. Nai-Ning, K. Purwandari, and L. N. Harfiya, “Clinically applicable deep learning for diagnosis of diabetic
retinopathy,” Proceedings - 2019 12th International Conference on Ubi-Media Computing, Ubi-Media 2019, pp. 124–129, 2019,
doi: 10.1109/Ubi-Media.2019.00032.
[15] T. B. Pramono et al., “A Model of Visual Intelligent System for Genus Identification of Fish in the Siluriformes Order,” IOP
Conference Series: Earth and Environmental Science, vol. 794, no. 1, 2021, doi: 10.1088/1755-1315/794/1/012114.
[16] F. E. Gunawan et al., “Multivariate Time-Series Deep Learning for Joint Prediction of Temperature and Relative Humidity in a
Closed Space,” Conference: 2021 International Conference on Computer Science and Computational Intelligence, 2021.
[17] A. A. Hidayat, T. W. Cenggoro, and B. Pardamean, “Convolutional Neural Networks for Scops Owl Sound Classification,”
Procedia Computer Science, vol. 179, pp. 81–87, 2021, doi: 10.1016/j.procs.2020.12.010.
[18] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, and J. Gao, “Deep Learning Based Text Classification: A
Comprehensive Review,” Apr. 2020, [Online]. Available: http://arxiv.org/abs/2004.03705.
[19] I. Nurlaila, R. Rahutomo, K. Purwandari, and B. Pardamean, “Provoking Tweets by Indonesia Media Twitter in the Initial Month
of Coronavirus Disease Hit,” in 2020 International Conference on Information Management and Technology (ICIMTech), Aug.
2020, pp. 409–414, doi: 10.1109/ICIMTech50083.2020.9211179.
[20] A. Rietzler, S. Stabinger, P. Opitz, and S. Engl, “Adapt or Get Left Behind: Domain Adaptation through BERT Language Model
Finetuning for Aspect-Target Sentiment Classification,” Aug. 2019, [Online]. Available: http://arxiv.org/abs/1908.11860.
[21] M. Pontiki, D. Galanis, H. Papageorgiou, S. Manandhar, and I. Androutsopoulos, “SemEval-2015 Task 12: Aspect Based
Sentiment Analysis,” SemEval 2015 - 9th International Workshop on Semantic Evaluation, co-located with the 2015 Conference
of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT
2015 - Proceedings, pp. 486–495, 2015, doi: 10.18653/v1/s15-2082.
[22] K. Purwandari, J. W. C. Sigalingging, T. W. Cenggoro, and B. Pardamean, “Multi-class Weather Forecasting from Twitter Using
Machine Learning Aprroaches,” Procedia Computer Science, vol. 179, pp. 47–54, 2021, doi: 10.1016/j.procs.2020.12.006.
[23] B. Komer, J. Bergstra, and C. Eliasmith, “Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn,”
Proceedings of the 13th Python in Science Conference, pp. 32–37, 2014, doi: 10.25080/majora-14bd3278-006.
[24] S. Robertson, “Understanding inverse document frequency: On theoretical arguments for IDF,” Journal of Documentation, vol.
60, no. 5, pp. 503–520, 2004, doi: 10.1108/00220410410560582.
[25] S. Sintia, S. Defit, and G. W. Nurcahyo, “Product Codefication Accuracy With Cosine Similarity And Weighted Term Frequency
And Inverse Document Frequency (TF-IDF),” Journal of Applied Engineering and Technological Science (JAETS), vol. 2, no. 2,
pp. 62–69, 2021, doi: 10.37385/jaets.v2i2.210.
[26] I. Aljarah, A. M. Al-Zoubi, H. Faris, M. A. Hassonah, S. Mirjalili, and H. Saadeh, “Simultaneous Feature Selection and Support
Vector Machine Optimization Using the Grasshopper Optimization Algorithm,” Cognitive Computation, vol. 10, no. 3, pp. 478–
495, 2018, doi: 10.1007/s12559-017-9542-9.
[27] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, and F. Herrera, “An overview of ensemble methods for binary classifiers in
multi-class problems: Experimental study on one-vs-one and one-vs-all schemes,” Pattern Recognition, vol. 44, no. 8, pp. 1761–
1776, 2011, doi: 10.1016/j.patcog.2011.01.017.
[28] B. Heap, M. Bain, W. Wobcke, A. Krzywicki, and S. Schmeidl, “Word Vector Enrichment of Low Frequency Words in the Bag-
of-Words Model for Short Text Multi-class Classification Problems,” 2017, [Online]. Available: http://arxiv.org/abs/1709.05778.
[29] A. Zeggada, F. Melgani, and Y. Bazi, “A Deep Learning Approach to UAV Image Multilabeling,” IEEE Geoscience and Remote
Sensing Letters, vol. 14, no. 5, pp. 694–698, 2017, doi: 10.1109/LGRS.2017.2671922.
[30] Y. Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” 2019, [Online]. Available:
http://arxiv.org/abs/1907.11692.
[31] R. B. Mangolin et al., “A multimodal approach for multi-label movie genre classification,” Multimedia Tools and Applications,
vol. 81, no. 14, pp. 19071–19096, 2022, doi: 10.1007/s11042-020-10086-2.
[32] N. Kampani and D. Jhamb, “Analyzing the role of E-CRM in managing customer relations: A critical review of the literature,”
Journal of Critical Reviews, vol. 7, no. 4, pp. 221–226, 2020, doi: 10.31838/jcr.07.04.41.
[33] A. K. B. Singh, M. Guntu, A. R. Bhimireddy, J. W. Gichoya, and S. Purkayastha, “Multi-label natural language processing to
identify diagnosis and procedure codes from MIMIC-III inpatient notes,” Mar. 2020, arXiv: 2003.07507v1.
[34] C. Sun, X. Qiu, Y. Xu, and X. Huang, “How to Fine-Tune BERT for Text Classification?,” Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11856 LNAI, pp. 194–206,
2019, doi: 10.1007/978-3-030-32381-3_16.
[35] J. V. Tu, “Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical
outcomes,” Journal of Clinical Epidemiology, vol. 49, no. 11, pp. 1225–1231, 1996, doi: 10.1016/S0895-4356(96)00002-9.
[36] X. Deng, Q. Liu, Y. Deng, and S. Mahadevan, “An improved method to construct basic probability assignment based on the
confusion matrix for classification problem,” Information Sciences, vol. 340–341, pp. 250–261, 2016, doi:
10.1016/j.ins.2016.01.033.
[37] Y. S. Aurelio, G. M. de Almeida, C. L. de Castro, and A. P. Braga, “Learning from imbalanced data sets with weighted cross-
entropy function,” Neural Processing Letters, vol. 50, no. 2, pp. 1937–1949, 2019, doi: 10.1007/s11063-018-09977-1.
[38] D. Tran, H. Mac, V. Tong, H. A. Tran, and L. G. Nguyen, “A LSTM based framework for handling multiclass imbalance in DGA
botnet detection,” Neurocomputing, vol. 275, pp. 2401–2413, 2018, doi: 10.1016/j.neucom.2017.11.018.
[39] K. Purwandari, A. S. Perbangsa, J. W. C. Sigalingging, A. A. Krisna, S. Anggrayani, and B. Pardamean, “Database management
system design for automatic weather information with twitter data collection,” Proceedings of 2021 International Conference on
Information Management and Technology, ICIMTech 2021, pp. 326–330, 2021, doi: 10.1109/ICIMTech53080.2021.9535009.
[40] S. González-Carvajal and E. C. Garrido-Merchán, “Comparing BERT against traditional machine learning text classification,”
2020, [Online]. Available: http://arxiv.org/abs/2005.13012.
[41] L. Gong, D. He, Z. Li, T. Qin, L. Wang, and T. Y. Liu, “Efficient training of BERT by progressively stacking,” 36th International
Conference on Machine Learning, ICML 2019, vol. 2019-June, pp. 4202–4211, 2019.
[42] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership Inference Attacks Against Machine Learning Models,”
Proceedings - IEEE Symposium on Security and Privacy, pp. 3–18, 2017, doi: 10.1109/SP.2017.41.
BIOGRAPHIES OF AUTHORS
Kartika Purwandari received the bachelor’s degree in Information Technology
from Brawijaya University and the master’s degree in Computer Science from National
Central University Taiwan. She is currently a lecturer at Computer Science Department in
Bina Nusantara University, Jakarta, Indonesia. She has also been a specialist lecturer (S2) in basic
programming at Bina Nusantara University since December 2021. For the past two years, she
has been a research assistant at the Bioinformatics and Data Science Research Center
(BDSRC) of Bina Nusantara University. Since joining BDSRC she has developed AI- and
bioinformatics-based programs as part of the colorectal cancer project. She is also active in
BDSRC AI projects, helping process data on lidar, air quality, crowd counting, fishery
images, text, and pap smears. She can be contacted at
email: kartika.purwandari@binus.edu.
Tjeng Wawan Cenggoro is an AI researcher whose focus is in the development
of deep learning algorithms for application in computer vision, natural language processing,
and bioinformatics. He has led several research projects that utilize deep learning for
computer vision, which is applied to indoor video analytics and plant phenotyping. He has
published over 20 peer-reviewed publications and reviewed for prestigious journals such as
Scientific Reports and IEEE Access. He also holds 2 copyrights for AI-based video analytics
software. He received his master’s degree in Information Technology from Bina Nusantara
University as well as bachelor’s degree in Information Technology from STMIK Widya Cipta
Dharma. He is also a certified instructor at the NVIDIA Deep Learning Institute. He can be
contacted at email: wcenggoro@binus.edu.
Join Wan Chanlyn Sigalingging holds a Bachelor of Engineering in Electrical
and Electronics Engineering from Sumatera Utara University and a Master of Science in
Computer Science and Information Engineering from National Central University. He is
currently working at the meteorology, climatology, and geophysics agency (BMKG) in
Indonesia, as a member of the database department. His research interests include data
engineering, NLP, image processing, artificial intelligence, and digital signal processing.
He can be contacted at email: join.wan.chanlyn@bmkg.go.id.
Dr. Bens Pardamean has over thirty years of global experience in information
technology, bioinformatics, and education, including a strong background in database
systems, computer networks, and quantitative research. His professional experience includes
being a practitioner, researcher, consultant, entrepreneur, and lecturer. His current research
interests are in developing and analyzing genetic data in cancer studies and genome-wide
association studies (GWAS) for agriculture genetic research. After successfully leading the
Bioinformatics Research Interest Group, he currently holds a dual appointment as the Director
of Bioinformatics & Data Science Research Center (BDSRC) and as a Professor of Computer
Science at the University of Bina Nusantara (BINUS) in Jakarta, Indonesia. He earned a
doctoral degree in informative research from the University of Southern California (USC), as
well as a master’s degree in computer education and a bachelor’s degree in computer science
from California State University, Los Angeles. He can be contacted at email:
bpardamean@binus.edu.