A large audience of users and typically a long time frame are needed to produce sensible and useful log data, making it an expensive task.
To address this limitation, we propose a method that focuses on the generation of REALISTIC NAVIGATIONAL PATHS, i.e., web logs.
Our approach is relevant because it simultaneously tackles the lack of publicly available web navigation log data and can be adopted in industry for the AUTOMATIC GENERATION OF REALISTIC TEST SETTINGS for web sites yet to be deployed.
The generation has been implemented using two deep learning methods for producing more realistic navigation activities:
• Recurrent Neural Networks, which are very well suited to temporally evolving data;
• Generative Adversarial Networks, neural networks aimed at generating new data (such as images or text) very similar to the original and sometimes indistinguishable from it, which have become increasingly popular in recent years.
We ran experiments using open data sets of weblogs for training, and we ran tests to assess the performance of the methods. The results in generating new weblog data are quite good with respect to the two evaluation metrics adopted (BLEU and human evaluation).
Our study is described in detail in the paper published at ICWE 2020 (International Conference on Web Engineering), DOI: 10.1007/978-3-030-50578-3, available online on the Springer web site.
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs
1. Industrial and Information Engineering
Generation of Realistic Navigation Paths for Web Site Testing using Recurrent Neural Networks and Generative Adversarial Neural Networks
Silvio Pavanetto and Marco Brambilla
Semantic Web and Linked Open Data
Helsinki, Finland, online on 9 – 12 June 2020
2. Introduction and Motivations
Why weblog generation?
1. Improve products even before release
2. Generate open, high-quality data for research
3. Related work has no focus on high-quality weblog generation
3.1 Only a few open-source libraries exist
3. Introduction and Motivations
Why weblog generation?
4. Problem Definition
Challenges to be Faced
1. Understand whether deep learning algorithms can generate better weblog data than statistical methods
2. Understand what a "better weblog" means
3. Among the various deep learning techniques, apply GANs (Generative Adversarial Networks) to a new task
5. Problem Definition
Roadmap for solving the problem
• Pre-process a publicly available weblog
• Develop a statistical algorithm
• Develop a recurrent neural network
• Develop a GAN
• Evaluate the quality of the generated data
6. Proposed Approach
Pre-processing algorithm
Cleaning
• Remove entries with a response code other than 200
• Remove activities coming from bots
• Remove non-HTML pages
Knowledge extraction
• List of possible entry points
• Navigation patterns extracted using data mining (Apriori)
• Generation of the datasets that will be used by the other algorithms
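The cleaning step above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's actual pre-processing code: the log format is assumed to be the Common Log Format (as in the NASA weblog), and the bot heuristic and file-extension list are assumptions.

```python
import re

# Common Log Format entry: host, identd, user, [timestamp], "request", status, size.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[.*?\] "(?P<request>.*?)" (?P<status>\d{3}) \S+')

NON_HTML = ('.png', '.gif', '.jpg', '.css', '.js', '.ico')   # assumed extension list
BOT_HINTS = ('bot', 'crawler', 'spider')                      # assumed bot heuristic

def clean(lines):
    """Keep only successful (200) requests for HTML pages from non-bot hosts."""
    kept = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue                                  # malformed entry
        if m.group('status') != '200':
            continue                                  # response code other than 200
        parts = m.group('request').split()
        url = parts[1] if len(parts) > 1 else ''
        if url.lower().endswith(NON_HTML):
            continue                                  # non-HTML resource
        if any(h in m.group('host').lower() for h in BOT_HINTS):
            continue                                  # likely bot traffic
        kept.append((m.group('host'), url))
    return kept
```

The surviving (host, URL) pairs can then be grouped into per-host navigation sessions for the knowledge-extraction step.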
7. Proposed Approach
Deep Learning - RNN
Why Recurrent Neural Networks?
• Well suited for processing sequential data
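The recurrence can be illustrated with a minimal NumPy sketch (a toy Elman cell, not the paper's actual architecture; vocabulary size, hidden size, and the example session are made up). At each step the hidden state combines the current URL with a summary of the pages visited so far, and a softmax over the hidden state scores the candidate next URLs.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 8                         # distinct URLs, hidden units (toy sizes)
Wxh = rng.normal(0, 0.1, (H, V))    # input -> hidden
Whh = rng.normal(0, 0.1, (H, H))    # hidden -> hidden (the recurrence)
Why = rng.normal(0, 0.1, (V, H))    # hidden -> next-URL scores

def step(url_id, h):
    x = np.zeros(V)
    x[url_id] = 1.0                         # one-hot encoding of the current URL
    h = np.tanh(Wxh @ x + Whh @ h)          # new state depends on the old state
    p = np.exp(Why @ h)
    p /= p.sum()                            # softmax over the next URL
    return h, p

h = np.zeros(H)
for url_id in [0, 2, 1]:            # a toy navigation session
    h, p = step(url_id, h)
```

Training (not shown) would adjust the three weight matrices so that `p` assigns high probability to the URL actually visited next.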
8. Proposed Approach
Generative Adversarial Networks
• A new type of neural network (first introduced in 2014) with impressive generation capabilities
• So far used almost exclusively in computer vision
Key concept: put two neural networks against each other in a two-player game
9. Proposed Approach
GAN Implementation – Possible Solution
GANs are designed for generating continuous data, while a navigation session is a discrete sequence of URLs.
Possible solution:
• Treat the generative model as a reinforcement learning (RL) agent
• The state is the sequence of URLs generated so far, and the action is the next URL to generate
• Reward: the discriminator produces the probability that the sequence is real
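The RL formulation above can be sketched as a toy REINFORCE loop (in the spirit of SeqGAN, not the paper's exact implementation). Everything here is an assumption made for illustration: the discriminator is a hand-written stub that rewards sessions visiting the entry page (URL id 0), and the generator ignores the state for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)
V = 4                      # number of distinct URLs in the toy site
theta = np.zeros(V)        # generator parameters: logits over the next URL

def discriminator(seq):
    # Stub reward: in this toy, "real" sessions mostly visit the entry page
    # (URL id 0), so the probability of being real grows with its count.
    return seq.count(0) / len(seq)

def sample_session(length=5):
    probs = np.exp(theta)
    probs /= probs.sum()
    return [int(rng.choice(V, p=probs)) for _ in range(length)]

# REINFORCE: scale the log-likelihood gradient of each session by its reward,
# so actions in well-rewarded (realistic-looking) sessions become more likely.
lr = 0.5
for _ in range(500):
    seq = sample_session()            # actions = generated URLs
    reward = discriminator(seq)       # probability the session looks real
    probs = np.exp(theta)
    probs /= probs.sum()
    grad = np.zeros(V)
    for a in seq:
        g = -probs.copy()
        g[a] += 1.0                   # d log pi(a) / d theta
        grad += g
    theta += lr * reward * grad
```

After training, the policy concentrates on the URLs the stub discriminator rewards; with a real discriminator network the same loop pushes the generator toward sessions it cannot tell apart from logged ones.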
10. Experiments
Understand if a weblog is good
Evaluation Metric: BLEU
BLEU (Bilingual Evaluation Understudy) is a score for comparing a candidate translation of a text to one or more reference translations; in other words, it is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another.
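Treating each URL as a "word", BLEU can be computed over navigation sequences. Below is a simplified sketch (clipped n-gram precision up to bigrams plus a brevity penalty), not the full definition or the implementation used in the paper; the example sequences are made up.

```python
import math
from collections import Counter

def ngrams(seq, n):
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def bleu(candidate, references, max_n=2):
    """Clipped n-gram precision BLEU where each URL plays the role of a word."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        if not cand:
            return 0.0
        # clip each n-gram count by its maximum count in any reference
        max_ref = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        precisions.append(clipped / sum(cand.values()))
    if min(precisions) == 0:
        return 0.0
    # brevity penalty against the closest reference length
    ref_len = min((len(r) for r in references), key=lambda L: abs(L - len(candidate)))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A generated session identical to a real one scores 1.0; a session sharing no URLs with the references scores 0.0.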
11. Experiments
Understand if a weblog is good
BLEU is not enough: human evaluation!
Evaluation game:
• 50 real sequences and 50 generated by the algorithms, mixed together
• 6 judges are invited to check the 100 sequences
• +1 point for the algorithm if the judge is fooled
• +0 points if the judge discovers that the sequence is not real
• Scores are averaged among all the judges
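The scoring of the evaluation game reduces to a simple average. The sketch below shows that computation; the judge verdicts are made-up illustrative data (3 judges, 4 generated sequences), not the paper's results.

```python
def fooling_score(verdicts):
    """verdicts: one list per judge, with True for each generated sequence
    the judge labelled as real (i.e. the judge was fooled)."""
    per_judge = [sum(v) / len(v) for v in verdicts]   # fraction fooled, per judge
    return sum(per_judge) / len(per_judge)            # average over judges

# 3 hypothetical judges, 4 generated sequences each
verdicts = [
    [True, False, True, True],    # fooled on 3 of 4
    [True, True, False, False],   # fooled on 2 of 4
    [False, True, True, True],    # fooled on 3 of 4
]
```

A score near 0.5 would mean judges cannot do better than chance at spotting generated sequences.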
12. Experiments
Evaluation – Final Comparison
Weblog generation performance comparison
13. Conclusions
We proposed a step forward towards the automatic production of high-quality weblogs using deep learning techniques, namely recurrent neural networks and generative adversarial networks.
Deep learning methods are suitable for weblog generation:
• The GAN is the best algorithm: it outperforms the baseline by
• 0.2116 on the human evaluation metric
• 0.1432 on the BLEU metric
14. Future Work
• Integration with model-driven approaches, useful for visualizing statistics about weblogs graphically
• Addition of more variables to the training of the network, which could improve the quality of the generated weblogs
• Evaluation with other weblogs, belonging to different websites
Editor's Notes
(like .png, .gif or other file types loaded inside a web page)
(this task and its related issues will be discussed later)
RNN: an artificial neural network (ANN) where connections between nodes form a directed graph along a sequence. This allows it to exhibit temporal dynamic behavior for a time sequence.
In the diagram, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next.
These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren't all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.
Consider the sequence generation procedure as a sequential decision-making process.
Quality is considered to be the correspondence between a machine's output and that of a human. Although BLEU is usually used for evaluating text, we already mentioned that the task faced in this work can be associated with text translation, because of the conceptual similarity between the sequence of pages in a single navigation session and the sequence of words in a phrase. In fact, every URL is treated as a unique "word" in the vocabulary, composed of all the pages of a particular website.
Using this metric, scores are calculated for individual translated segments (generally sentences) by comparing them with a set of good-quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation's overall quality. Transferring this to our case, the translated segments are the generated navigation sequences, while the good-quality reference translations correspond to our original dataset: the NASA weblog.
Humans are good at evaluating this type of data, since a weblog is a composition of navigation sequences, and every sequence is something decided and created by a human.