A large audience of users and typically a long time frame are needed to produce sensible and useful log data, making it an expensive task.
To address this limitation, we propose a method that focuses on the generation of REALISTIC NAVIGATIONAL PATHS, i.e., web logs.
Our approach is relevant because it simultaneously tackles the lack of publicly available web navigation log data and can be adopted in industry for the AUTOMATIC GENERATION OF REALISTIC TEST SETTINGS for web sites yet to be deployed.
The generation has been implemented using two deep learning methods for producing more realistic navigation activities:
• Recurrent Neural Networks, which are very well suited to temporally evolving data;
• Generative Adversarial Networks, neural networks aimed at generating new data (such as images or text) very similar to the original and sometimes indistinguishable from it, which have become increasingly popular in recent years.
We ran experiments using open data sets of weblogs for training, and we ran tests to assess the performance of the methods. The results in generating new weblog data are quite good with respect to the two evaluation metrics adopted (BLEU and human evaluation).
Our study is described in detail in the paper published at ICWE 2020 (International Conference on Web Engineering), DOI: 10.1007/978-3-030-50578-3, available online on the Springer web site.
Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs
1. Industrial and Information Engineering
Generation of Realistic Navigation Paths for Web Site Testing using Recurrent Neural Networks and Generative Adversarial Neural Networks
Silvio Pavanetto and Marco Brambilla
Semantic Web and Linked Open Data
Helsinki, Finland, online on 9 – 12 June 2020
2. Introduction and Motivations
Why weblog generation?
1. Improve products even before release
2. Generate open, high-quality data for research
3. Related work has no focus on high-quality weblog generation
3.1 Only a few open-source libraries exist
3. Introduction and Motivations
Why weblog generation?
4. Problem Definition
Challenges to be Faced
1. Understand whether deep learning algorithms can generate better weblog data than statistical methods
2. Understand what a "better weblog" means
3. Among the various deep learning techniques, apply GANs (Generative Adversarial Networks) to a new task
5. Problem Definition
Roadmap for solving the problem
• Pre-process a publicly available weblog
• Develop a statistical algorithm
• Develop a recurrent neural network
• Develop a GAN
• Evaluate the quality of the generated data
6. Proposed Approach
Pre-processing algorithm
Cleaning
• Remove entries with a response code other than 200
• Remove activities coming from bots
• Remove non-HTML pages
Knowledge extraction
• List of possible entry points
• Navigation patterns extracted using data mining (Apriori)
• Generation of the datasets that will be used by the other algorithms
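The cleaning step above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's actual pre-processing code: the log format is assumed to be the Common Log Format (as in the NASA weblog), and the bot heuristic and file-extension list are assumptions.

```python
import re

# Common Log Format entry: host, identd, user, [timestamp], "request", status, size.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[.*?\] "(?P<request>.*?)" (?P<status>\d{3}) \S+')

NON_HTML = ('.png', '.gif', '.jpg', '.css', '.js', '.ico')   # assumed extension list
BOT_HINTS = ('bot', 'crawler', 'spider')                      # assumed bot heuristic

def clean(lines):
    """Keep only successful (200) requests for HTML pages from non-bot hosts."""
    kept = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue                                  # malformed entry
        if m.group('status') != '200':
            continue                                  # response code other than 200
        parts = m.group('request').split()
        url = parts[1] if len(parts) > 1 else ''
        if url.lower().endswith(NON_HTML):
            continue                                  # non-HTML resource
        if any(h in m.group('host').lower() for h in BOT_HINTS):
            continue                                  # likely bot traffic
        kept.append((m.group('host'), url))
    return kept
```

The surviving (host, URL) pairs can then be grouped into per-host navigation sessions for the knowledge-extraction step.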
7. Proposed Approach
Deep Learning - RNN
Why Recurrent Neural Networks?
• Well suited for processing sequential data
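The recurrence can be illustrated with a minimal NumPy sketch (a toy Elman cell, not the paper's actual architecture; vocabulary size, hidden size, and the example session are made up). At each step the hidden state combines the current URL with a summary of the pages visited so far, and a softmax over the hidden state scores the candidate next URLs.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 8                         # distinct URLs, hidden units (toy sizes)
Wxh = rng.normal(0, 0.1, (H, V))    # input -> hidden
Whh = rng.normal(0, 0.1, (H, H))    # hidden -> hidden (the recurrence)
Why = rng.normal(0, 0.1, (V, H))    # hidden -> next-URL scores

def step(url_id, h):
    x = np.zeros(V)
    x[url_id] = 1.0                         # one-hot encoding of the current URL
    h = np.tanh(Wxh @ x + Whh @ h)          # new state depends on the old state
    p = np.exp(Why @ h)
    p /= p.sum()                            # softmax over the next URL
    return h, p

h = np.zeros(H)
for url_id in [0, 2, 1]:            # a toy navigation session
    h, p = step(url_id, h)
```

Training (not shown) would adjust the three weight matrices so that `p` assigns high probability to the URL actually visited next.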
8. Proposed Approach
Generative Adversarial Networks
• A new type of neural network (first introduced in 2014) with impressive generation capabilities
• So far used almost exclusively in computer vision
Key concept: put two neural networks against each other in a two-player game
9. Proposed Approach
GAN Implementation – Possible Solution
GANs are designed for generating continuous data, while a navigation session is a discrete sequence of URLs.
Possible solution:
• Treat the generative model as a reinforcement learning (RL) agent
• The state is the sequence of URLs generated so far, and the action is the next URL to generate
• Reward: the discriminator produces the probability that the sequence is real
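The RL formulation above can be sketched as a toy REINFORCE loop (in the spirit of SeqGAN, not the paper's exact implementation). Everything here is an assumption made for illustration: the discriminator is a hand-written stub that rewards sessions visiting the entry page (URL id 0), and the generator ignores the state for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)
V = 4                      # number of distinct URLs in the toy site
theta = np.zeros(V)        # generator parameters: logits over the next URL

def discriminator(seq):
    # Stub reward: in this toy, "real" sessions mostly visit the entry page
    # (URL id 0), so the probability of being real grows with its count.
    return seq.count(0) / len(seq)

def sample_session(length=5):
    probs = np.exp(theta)
    probs /= probs.sum()
    return [int(rng.choice(V, p=probs)) for _ in range(length)]

# REINFORCE: scale the log-likelihood gradient of each session by its reward,
# so actions in well-rewarded (realistic-looking) sessions become more likely.
lr = 0.5
for _ in range(500):
    seq = sample_session()            # actions = generated URLs
    reward = discriminator(seq)       # probability the session looks real
    probs = np.exp(theta)
    probs /= probs.sum()
    grad = np.zeros(V)
    for a in seq:
        g = -probs.copy()
        g[a] += 1.0                   # d log pi(a) / d theta
        grad += g
    theta += lr * reward * grad
```

After training, the policy concentrates on the URLs the stub discriminator rewards; with a real discriminator network the same loop pushes the generator toward sessions it cannot tell apart from logged ones.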
10. Experiments
Understand if a weblog is good
Evaluation Metric: BLEU
BLEU (Bilingual Evaluation Understudy) is a score for comparing a candidate translation of a text to one or more reference translations; in other words, it is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another.
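Treating each URL as a "word", BLEU can be computed over navigation sequences. Below is a simplified sketch (clipped n-gram precision up to bigrams plus a brevity penalty), not the full definition or the implementation used in the paper; the example sequences are made up.

```python
import math
from collections import Counter

def ngrams(seq, n):
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def bleu(candidate, references, max_n=2):
    """Clipped n-gram precision BLEU where each URL plays the role of a word."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        if not cand:
            return 0.0
        # clip each n-gram count by its maximum count in any reference
        max_ref = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        precisions.append(clipped / sum(cand.values()))
    if min(precisions) == 0:
        return 0.0
    # brevity penalty against the closest reference length
    ref_len = min((len(r) for r in references), key=lambda L: abs(L - len(candidate)))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A generated session identical to a real one scores 1.0; a session sharing no URLs with the references scores 0.0.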
11. Experiments
Understand if a weblog is good
BLEU is not enough: human evaluation!
Evaluation game:
• 50 real sequences and 50 generated by the algorithms, mixed together
• 6 judges are invited to check the 100 sequences
• +1 point for the algorithm if the judge is fooled
• +0 points if the judge discovers that the sequence is not real
• Scores are averaged among all the judges
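The scoring of the evaluation game reduces to a simple average. The sketch below shows that computation; the judge verdicts are made-up illustrative data (3 judges, 4 generated sequences), not the paper's results.

```python
def fooling_score(verdicts):
    """verdicts: one list per judge, with True for each generated sequence
    the judge labelled as real (i.e. the judge was fooled)."""
    per_judge = [sum(v) / len(v) for v in verdicts]   # fraction fooled, per judge
    return sum(per_judge) / len(per_judge)            # average over judges

# 3 hypothetical judges, 4 generated sequences each
verdicts = [
    [True, False, True, True],    # fooled on 3 of 4
    [True, True, False, False],   # fooled on 2 of 4
    [False, True, True, True],    # fooled on 3 of 4
]
```

A score near 0.5 would mean judges cannot do better than chance at spotting generated sequences.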
12. Experiments
Evaluation – Final Comparison
Weblog generation performance comparison
13. Conclusions
We proposed a step forward towards the automatic production of high-quality weblogs using deep learning techniques, namely recurrent neural networks and generative adversarial networks.
Deep learning methods are suitable for weblog generation:
• The GAN is the best algorithm: it outperforms the baseline by
• 0.2116 on the human evaluation metric
• 0.1432 on the BLEU metric
14. Future Work
• Integration with model-driven approaches, useful for visualizing statistics about weblogs graphically
• Addition of more variables to the training of the network, which could improve the quality of the generated weblogs
• Evaluation with other weblogs, belonging to different websites
Editor's Notes
(like .png, .gif or other file types loaded inside a web page)
(this task and its related issues will be discussed later)
RNN: an artificial neural network (ANN) where connections between nodes form a directed graph along a sequence. This allows it to exhibit temporal dynamic behavior for a time sequence.
In the diagram, a chunk of neural network, A, looks at some input x_t and outputs a value h_t. A loop allows information to be passed from one step of the network to the next.
These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren't all that different from a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.
Consider the sequence generation procedure as a sequential decision-making process.
Quality is considered to be the correspondence between a machine's output and that of a human. Although BLEU is usually used for evaluating text, we already mentioned that the task faced in this work can be associated with text translation, because of the conceptual similarity between the sequence of pages in a single navigation session and the sequence of words in a phrase. In fact, every URL is treated as a unique "word" in the vocabulary, composed of all the pages of a particular website.
Using this metric, scores are calculated for individual translated segments (generally sentences) by comparing them with a set of good-quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation's overall quality. Transferring this to our case, the translated segments are the generated navigation sequences, while the good-quality reference translations correspond to our original dataset: the NASA weblog.
Humans are good at evaluating this type of data, since a weblog is a composition of navigation sequences, and every sequence is something decided and created by a human.