Industrial and Information
Engineering
Generation of Realistic Navigation Paths for Web Site Testing
using Recurrent Neural Networks and Generative Adversarial
Neural Networks
Silvio Pavanetto and Marco Brambilla
Semantic Web and Linked Open Data Helsinki,
Finland, Online on 9 – 12 June 2020
Introduction and Motivations
Why weblog generation?
1. Improve products even before the release
2. Generate open high-quality data for research
3. Related work does not focus on high-quality weblog generation
3.1 Only a few open-source libraries exist
Problem Definition
Challenges to be Faced
1. Understand whether deep learning algorithms can generate better weblog data than statistical methods
2. Understand what a "better" weblog means
3. Among the various deep learning techniques, apply GANs (Generative Adversarial Networks) to a new task
Problem Definition
Roadmap for solving the problem
1. Pre-process a publicly available weblog
2. Develop a statistical algorithm
3. Develop a recurrent neural network
4. Develop a GAN
5. Evaluate the quality of the generated data
Proposed Approach
Pre-processing algorithm
Cleaning
• Remove entries with a response code other than 200
• Remove activity coming from bots
• Remove requests for non-HTML resources (such as .png or .gif files loaded inside a web page)
Knowledge extraction
• Extract the list of possible entry points
• Mine navigation patterns with the Apriori algorithm
• Generate the datasets that will be used by the other algorithms
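The three cleaning rules can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the weblog is in Common Log Format, that a host requesting /robots.txt is a bot, and that HTML pages are paths ending in .html, .htm or "/". The names `LOG_LINE`, `parse` and `clean` are hypothetical.

```python
import re

# Assumed Common Log Format: host ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+$')
# Assumed definition of an HTML page for this sketch
HTML_SUFFIXES = (".html", ".htm", "/")


def parse(line):
    """Return a dict for a well-formed log line, or None otherwise."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    host, timestamp, _method, path, status = match.groups()
    return {"host": host, "time": timestamp, "path": path, "status": int(status)}


def clean(lines):
    """Keep successful, non-bot requests for HTML pages."""
    entries = [e for e in map(parse, lines) if e is not None]
    # Heuristic (assumption): a host that fetched /robots.txt is a crawler.
    bot_hosts = {e["host"] for e in entries if e["path"] == "/robots.txt"}
    return [
        e for e in entries
        if e["status"] == 200
        and e["host"] not in bot_hosts
        and e["path"].lower().endswith(HTML_SUFFIXES)
    ]


sample = [
    'a.example - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245',
    'a.example - - [01/Jul/1995:00:00:02 -0400] "GET /logo.gif HTTP/1.0" 200 1204',
    'a.example - - [01/Jul/1995:00:00:03 -0400] "GET /missing.html HTTP/1.0" 404 -',
    'crawler.example - - [01/Jul/1995:00:00:04 -0400] "GET /robots.txt HTTP/1.0" 200 68',
    'crawler.example - - [01/Jul/1995:00:00:05 -0400] "GET /index.html HTTP/1.0" 200 6245',
    'b.example - - [01/Jul/1995:00:00:06 -0400] "GET /docs/ HTTP/1.0" 200 3985',
]
kept = clean(sample)
```

In this sample only the two HTML page views from a.example and b.example survive. A real pipeline would probably need a more robust bot filter than the robots.txt heuristic.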
Proposed Approach
Deep Learning - RNN
Why Recurrent Neural Network?
• Well suited for processing sequential data
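A small Python sketch shows why recurrence suits sequential data: the hidden state after each page depends on the pages seen before it. It uses the textbook recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The weights, the three-page vocabulary and the one-hot encoding are illustrative assumptions, not the network used in this work.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One recurrence step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def run_rnn(inputs, W_xh, W_hh, b):
    """Feed a sequence through the same cell, carrying the hidden state along."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states


def one_hot(index, size):
    """Encode a page index as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical hand-picked weights: 3 pages, hidden size 2.
W_xh = [[0.5, -0.5, 0.0], [0.0, 0.5, -0.5]]
W_hh = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

# Two sessions that end on the same page (index 2) but start differently.
states_a = run_rnn([one_hot(0, 3), one_hot(2, 3)], W_xh, W_hh, b)
states_b = run_rnn([one_hot(1, 3), one_hot(2, 3)], W_xh, W_hh, b)
```

Both sessions end on the same page, yet their final hidden states differ because the earlier page is carried forward through `W_hh`.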
Proposed Approach
Generative Adversarial Network
• A new type of neural network (introduced in 2014) with remarkable generation capabilities
• Used almost exclusively in computer vision
Key concept: pit two neural networks against each other in a two-player game
Proposed Approach
GAN Implementation – Possible Solution
GANs are designed for generating continuous data, while a sequence of URLs is discrete
Possible solution:
• Treat the generative model as a reinforcement learning (RL) agent
• The state is composed of the URLs generated so far, and the action is the next URL to be generated
• Reward: the discriminator outputs the probability that the sequence is real
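The state, action and reward roles can be made concrete with a toy policy-gradient (REINFORCE) sketch in Python. Everything in it is an assumption made for illustration: the three-page site, the stand-in `discriminator` that returns fixed probabilities, the tabular policy that conditions only on the last URL rather than the full sequence, and the constant 0.5 baseline. It is not the model from this work.

```python
import math
import random

URLS = ["/", "/a", "/b"]      # hypothetical three-page site
REAL = {("/", "/a", "/b")}    # hypothetical "real" navigation session


def discriminator(sequence):
    """Stand-in discriminator: probability that the sequence is real."""
    return 0.9 if tuple(sequence) in REAL else 0.1


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=3000, length=3, lr=0.5, seed=0):
    """Train a tabular generator with REINFORCE, rewarded by the discriminator."""
    rng = random.Random(seed)
    # Simplification (assumption): the policy sees only the last generated URL.
    logits = {u: [0.0] * len(URLS) for u in URLS}
    for _ in range(episodes):
        sequence, steps = ["/"], []
        while len(sequence) < length:
            state = sequence[-1]
            probs = softmax(logits[state])
            action = rng.choices(range(len(URLS)), weights=probs)[0]
            steps.append((state, action, probs))
            sequence.append(URLS[action])
        # Reward for the whole sequence, centred with a constant baseline.
        advantage = discriminator(sequence) - 0.5
        for state, action, probs in steps:
            for k in range(len(URLS)):
                grad = (1.0 if k == action else 0.0) - probs[k]
                logits[state][k] += lr * advantage * grad
    return {u: softmax(v) for u, v in logits.items()}


policy = train()
```

After training, `policy["/"]` and `policy["/a"]` should put most of their mass on the transitions of the session the stand-in discriminator accepts. In an actual GAN the discriminator would be trained alongside the generator instead of being fixed.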
Experiments
Understand if a weblog is good
Evaluation Metric: BLEU
BLEU (Bilingual Evaluation Understudy) is a score for comparing a candidate translation of a text against one or more reference translations. It was designed to evaluate the quality of text that has been machine-translated from one natural language to another.
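To show how BLEU carries over to navigation data, the sketch below treats each URL as a "word" and each session as a "sentence". It is a simplified sentence-level BLEU (clipped n-gram precision up to bigrams, geometric mean, brevity penalty, no smoothing) written for illustration. The function names and the `max_n=2` default are assumptions, and the scores reported in this work may have been computed differently.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def bleu(candidate, references, max_n=2):
    """Simplified BLEU of one generated session against reference sessions."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:
            return 0.0
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty against the reference closest in length.
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    if len(candidate) >= ref_len:
        penalty = 1.0
    else:
        penalty = math.exp(1 - ref_len / len(candidate))
    return penalty * math.exp(sum(log_precisions) / max_n)


reference = [["/", "/a", "/b"]]
exact = bleu(["/", "/a", "/b"], reference)
partial = bleu(["/", "/a", "/c"], reference)
disjoint = bleu(["/x", "/y", "/z"], reference)
```

A session identical to a reference scores 1, a session sharing no pages scores 0, and a session that diverges on the last page falls in between.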
Experiments
Understand if a weblog is good
BLEU is not enough: human evaluation is also needed.
Evaluation game:
• 50 real sequences are mixed with 50 sequences generated by the algorithms
• 6 judges are invited to check the 100 sequences
• +1 point for the algorithm if the judge is fooled
• +0 points if the judge discovers that the sequence is not real
• Scores are averaged across all the judges
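The scoring rule of the evaluation game can be sketched in a few lines of Python. The slides do not state how points are normalized, so dividing each judge's points by the number of generated sequences is an assumption, as are the function name and the example votes.

```python
def fooling_score(judgements):
    """Average, over judges, of the fraction of generated sequences taken as real.

    judgements: one list per judge, holding True for each generated sequence
    the judge labelled as real (+1 point) and False otherwise (+0 points).
    """
    per_judge = [sum(votes) / len(votes) for votes in judgements]
    return sum(per_judge) / len(per_judge)


# Two hypothetical judges, four generated sequences each.
score = fooling_score([
    [True, False, True, True],    # fooled on 3 of 4
    [False, False, True, True],   # fooled on 2 of 4
])
```

With these made-up votes the per-judge scores are 0.75 and 0.5, giving an average of 0.625.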
Experiments
Evaluation – Final Comparison
Weblog generation performance comparison
Conclusions
We proposed a step forward towards the automatic production of high-quality weblogs using deep learning techniques, such as recurrent neural networks and generative adversarial neural networks.
Deep learning methods are suitable for weblog generation:
• The GAN is the best algorithm; it outperforms the baseline by:
• 0.2116 on the human metric
• 0.1432 on the BLEU metric
Future Work
• Integration with model-driven approaches, useful for visualizing statistics about weblogs graphically
• Addition of more variables to the training of the network, which could improve the quality of the generated weblogs
• Evaluation on other weblogs belonging to different websites


Editor's Notes

  1. (like .png, .gif or other file types loaded inside a web page) (this task and its related issues will be discussed later)
  2. RNN: an artificial neural network (ANN) in which connections between nodes form a directed graph along a sequence. This allows it to exhibit temporal dynamic behavior for a time sequence. In the slide's diagram, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next. These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren't all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.
  3. Consider the sequence generation procedure as a sequential decision-making process.
  4. Quality is considered to be the correspondence between a machine's output and that of a human. Although BLEU is usually used for evaluating text, the task faced in this work can be associated with text translation because of the conceptual similarity between the sequence of pages in a single navigation session and the sequence of words in a phrase. In fact, every URL is treated as a unique "word" in the vocabulary, which is composed of all the pages of a particular website. Using this metric, scores are calculated for individual translated segments (generally sentences) by comparing them with a set of good-quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation's overall quality. Transferring this to our case, the translated segments are the generated navigation sequences, while the good-quality reference translations correspond to our original dataset: the NASA weblog.
  5. Humans are good at evaluating this type of data, since a weblog is a composition of navigation sequences and every sequence is decided and created by a human.