This document provides an overview of representation learning techniques used at Red Hat, including word2vec, doc2vec, url2vec, and customer2vec. Word2vec is used to learn word embeddings from text, while doc2vec extends it to learn embeddings for documents. Url2vec and customer2vec apply the same technique to learn embeddings for URLs and customer accounts based on browsing behavior. These embeddings can be used for tasks like search, troubleshooting, and data-driven customer segmentation. Duplicate detection is another application, where title and content embeddings are compared. Representation learning is also explored for baseball players to model player value.
Michael Alcorn, Sr. Software Engineer, Red Hat Inc. at MLconf SF 2017
1. REPRESENTATION LEARNING @ RED HAT
Michael A. Alcorn (malcorn@redhat.com)
Machine Learning Engineer - Information Retrieval
https://sites.google.com/view/michaelaalcorn/
3. Background
Why?
Small amount (zero?) of labeled data for task
Lots of unlabeled data (labeled data for a different task?)
Can we use large amounts of unlabeled data to make better predictions?
Representation learning
Transfer learning
Not the same as traditional unsupervised learning!
Excellent chapter in Goodfellow et al.'s Deep Learning textbook
Article by Bengio et al.
6. word2vec
Analogies
"x is to y as ? is to z": x - y + z = ?
bash - shellshock + heartbleed = openssl
firefox - linux + windows = internet_explorer
openshift - cloud + storage = gluster
rhn_register - rhn + rhsm = subscription-manager
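The analogy arithmetic above can be sketched with plain NumPy: subtract and add word vectors, then take the nearest remaining word by cosine similarity. The embedding values below are made-up toy numbers, not real word2vec output from the talk's model.

```python
import numpy as np

# Toy embeddings (illustrative values only, not trained word2vec vectors).
vecs = {
    "bash":       np.array([0.9, 0.1, 0.0]),
    "shellshock": np.array([0.8, 0.0, 0.3]),
    "heartbleed": np.array([0.1, 0.0, 0.9]),
    "openssl":    np.array([0.2, 0.1, 0.6]),
    "firefox":    np.array([0.0, 0.9, 0.1]),
}

def analogy(x, y, z, vocab):
    """Solve "x is to y as ? is to z" via x - y + z, nearest by cosine."""
    target = vocab[x] - vocab[y] + vocab[z]

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Exclude the query words themselves from the candidates.
    candidates = {w: v for w, v in vocab.items() if w not in (x, y, z)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("bash", "shellshock", "heartbleed", vecs))  # -> openssl
```

Real systems do the same nearest-neighbor search over a full trained vocabulary (e.g., gensim's `most_similar` with `positive`/`negative` word lists).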
7. Naming Colors
Blog post by Janelle Shane mapping RGB values to color names
Results are pretty underwhelming for those in the know
Can word embeddings improve (GitHub)?
8. url2vec
Tasks concerning URLs
Search - returning relevant content
Troubleshooting - recommending related articles
Obvious method - look at text
Alternative/enhanced method - use customer
browsing behavior as additional contextual clues
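The url2vec idea amounts to treating each customer's browsing session as a "sentence" of URLs and training a word2vec-style model on those sequences. A minimal sketch of the data preparation, generating skip-gram (center, context) pairs from sessions; the session data and URL paths are invented for illustration:

```python
# Each session is an ordered list of URLs one customer visited.
# Paths below are made-up stand-ins for Customer Portal URLs.
sessions = [
    ["/errata/a", "/solutions/b", "/articles/c"],
    ["/solutions/b", "/articles/c"],
]

def skipgram_pairs(session, window=1):
    """Emit (center, context) pairs a skip-gram model would train on."""
    pairs = []
    for i, center in enumerate(session):
        lo, hi = max(0, i - window), min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, session[j]))
    return pairs

for s in sessions:
    print(skipgram_pairs(s))
```

From here, any word2vec implementation (e.g., gensim's `Word2Vec` fed the sessions as sentences) would learn URL embeddings in which co-browsed pages land near each other.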
15. Duplicate Detection
There are a number of "duplicate" KCS solutions on the Customer Portal
Muddy search results
How can we identify candidate duplicate documents?
Obvious approach - compare text (e.g., tf-idf)
Bag-of-words loses any structural meaning behind text
Can we learn better representations?
Title is essentially a summary of the solution content
Learn representations of body that are similar to title representations (like the DSSM; my code)
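The "obvious approach" on this slide, scoring candidate duplicates by tf-idf cosine similarity, can be sketched in a few lines. The documents below are invented stand-ins for KCS solutions, and whitespace tokenization is a simplification:

```python
import math
from collections import Counter

# Made-up stand-ins for KCS solution texts.
docs = [
    "bash update fixes shellshock vulnerability",
    "update bash to fix the shellshock vulnerability",
    "configure gluster storage for openshift",
]

def tfidf_vectors(docs):
    """Sparse tf-idf vectors as {term: weight} dicts."""
    tokenized = [d.split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in tokenized
    ]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tfidf_vectors(docs)
# The two shellshock docs score high; the unrelated doc scores 0.
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

This is exactly where the bag-of-words weakness shows up: "fixes" vs. "to fix the" contribute nothing to the match, which motivates the learned-representation approach on the next slide.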
16. Deep Semantic Similarity Model
Jianfeng Gao - "Deep Learning for Web Search and Natural Language Processing"
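The DSSM matching objective pairs a query-side embedding (here, a title) with one relevant document embedding (the body) and several negatives, then maximizes the softmax of smoothed cosine similarities. A minimal sketch of that objective; the embeddings are random stand-ins for the outputs of the two towers, and `gamma` plays the role of the DSSM smoothing factor:

```python
import numpy as np

rng = np.random.default_rng(0)
title = rng.normal(size=8)          # title-tower output (stand-in)
bodies = rng.normal(size=(4, 8))    # bodies[0] is the matching body

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

gamma = 10.0  # smoothing factor on the cosine scores
scores = np.array([gamma * cos(title, b) for b in bodies])
probs = np.exp(scores - scores.max())  # numerically stable softmax
probs /= probs.sum()
loss = -np.log(probs[0])  # training maximizes P(matching body | title)
print(probs, loss)
```

Training the two towers to drive this loss down is what makes body representations land near their title representations, which is the property the duplicate-detection slide relies on.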
17. (batter|pitcher)2vec (GitHub)
Can we learn meaningful representations of MLB players?
Accurate representations could be used to simulate games and inform trades
Find undervalued/overvalued players