AI in multi billion search engines. Building AI and Search teams

www.globalbigdataconference.com
Twitter : @bigdataconf

My goal for this talk is to explain
1. How AI/ML in search engines improves customer experience, revenue/GMV and
operational costs, and helps to get more customers and serve them better
2. How to build AI systems for search
3. How to build global AI teams and make them successful

My message
Successful AI in Search is an AI infrastructure, an engineering and science
culture, and toolsets that make it possible to continuously introduce, measure and
improve AI features in every part of the search engine, rather than several SOTA
models in ranking or query understanding.
The same applies to every large scale consumer or business facing platform
(recommendation engines, call center analytics etc).
Developing this type of AI solution places certain requirements on the
teams that build large scale consumer facing AI
software systems.

Multi billion? (customers, dollars, documents)
We focus on search engines with many billions of dollars of revenue/GMV, billions of
users (or hundreds of millions) and billions of documents. At this scale, investment in
building AI infrastructure is justified because it improves revenue by hundreds of
millions or billions of dollars, or saves infrastructure costs at large scale, which in turn
justifies developing many AI applications for search.

Typical Search Engine - High Level View

Search Engine - high level view
Many other *key* parts are not in the picture
● Experimentation and other frameworks to support ‘Search Science’: design,
analysis, debugging and deployment of various query understanding, ranking etc
models
● Evaluation - continuous monitoring of the quality of results and analysis of user
behaviour
● Logging, monitoring, alerting (to serve all the others, feed consumer behavior
systems, and react to operational problems)
etc

AI is everywhere
Using AI components to improve data acquisition by 10%, indexing by 10%, query
understanding by 10%, ranking by 10% and the result page by 10% gives more gains in
customer satisfaction and revenue/GMV than applying SOTA and improving only
one component, such as query understanding or ranking, by 30%.
AI development should be driven by creating infrastructure, culture and
toolsets for continuous AI deployment, improvement and measurement everywhere in
the search stack. There are no ‘engineering’ teams, every team is an AI team.
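A quick way to see why many small gains can beat one big gain is a short arithmetic sketch. It assumes that per-stage improvements compound multiplicatively on the end metric. That is my simplifying assumption for illustration; real search stacks rarely compose this cleanly.

```python
def compounded_gain(stage_gains):
    """Overall relative gain, assuming per-stage gains multiply."""
    total = 1.0
    for gain in stage_gains:
        total *= 1.0 + gain
    return total - 1.0

five_stages = compounded_gain([0.10] * 5)  # 10% in each of five stages
one_stage = compounded_gain([0.30])        # 30% in a single stage

print(f"five stages at 10%: {five_stages:.1%}")  # 61.1%
print(f"one stage at 30%:  {one_stage:.1%}")     # 30.0%
```

Even under a purely additive assumption the five small gains sum to 50%, which is still above 30%.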

AI is not separable from engineering
AI development is not separable from other engineering development.
Improving index selection by 10% makes it possible either to accommodate other data
sources and improve coverage (by getting more documents) or to improve ranking
(with fewer documents to rank, the freed computation can be used for more advanced
ranking functions).
Improving infrastructure to decrease latency by 10% makes it possible to deploy more
sophisticated ranking or query understanding functions (10% more time).
Good engineering quality and culture are not separable from AI development, they are a
mandatory part of AI culture.

AI ‘rank labs’ and AI platforms
Search engine teams benefit from a unified environment to train, deploy and
serve models: it reduces work on infrastructure and MLOps, lets teams share metrics,
and makes it easy to measure end to end metrics.
There is no single environment which will handle all types of AI development, AI
serving and AI measurement.
Search systems are naturally complex, with different tasks, different environments,
different languages and CI/CD systems, but the never ending work on unifying AI
development infrastructure across search teams helps.

Multiple ways to ‘deploy’ AI models
Deploy models to TF Serving, TorchServe etc
Deploy models served in a container
Compile a model directly into machine code or into the source code of a search
component (GBDT models into C++/Java code to be used in ranking)
Relearn and change the parameters of existing models that are already being served
Tons of other deployment scenarios
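To make the "compile a model into source code" option concrete, here is a minimal sketch that turns one decision tree into a C++ function. The tree format (nested dicts with `feature`, `threshold`, `left`, `right`, `value`) and the function names are hypothetical and not taken from any GBDT library. A full GBDT would emit one such function per tree and sum their outputs.

```python
def tree_to_cpp(node, indent=1):
    """Recursively emit C++ if/else code for one regression tree."""
    pad = "    " * indent
    if "value" in node:  # leaf node: return its score
        return f"{pad}return {node['value']};\n"
    return (
        f"{pad}if (f[{node['feature']}] <= {node['threshold']}) {{\n"
        + tree_to_cpp(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_cpp(node["right"], indent + 1)
        + f"{pad}}}\n"
    )

def compile_tree(name, tree):
    """Wrap the generated branches into a C++ function over a feature array."""
    return f"double {name}(const double* f) {{\n{tree_to_cpp(tree)}}}\n"

# Hypothetical toy tree with two splits and three leaves.
toy_tree = {
    "feature": 0, "threshold": 0.5,
    "left": {"value": -0.2},
    "right": {
        "feature": 1, "threshold": 3.0,
        "left": {"value": 0.1},
        "right": {"value": 0.7},
    },
}

cpp_source = compile_tree("tree_0", toy_tree)
print(cpp_source)
```

The generated function has no model loading and no interpreter in the hot path, which is the usual reason to prefer this style for latency critical ranking.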

Multiple ways to serve AI models in Search
Streaming (ex: document updates)
Batch (ex: offline processing of queries or users or documents)
Serving runtime services (ranking, query understanding)

Multiple ways to improve AI in Search
Change evaluation methods and metrics, train models to new metrics
Change sampling procedures
Change training procedures
Change modeling techniques
Model previously unmodeled tasks
New data and new features
Infrastructure changes in serving etc etc etc

AI platforms
So, it’s almost impossible to make one platform that handles all types of AI
development and deployment (see the variety in the previous slides).
But unifying some of the tasks reduces development and operation effort and
costs, and increases the velocity of AI development, which brings a lot of money and
customer satisfaction.
Every AI driven search company has created a “Rank Lab” AI platform for search.
Now there are open source tools such as Kubeflow and MLflow that simplify
development.

AI services in prod
Good if they are decoupled, so multiple small teams can work on services
independently.
But wiring is needed (for example, a ‘signal’ from query understanding to be used in
ranking).
Processing a query may call dozens of AI services; processing a document
in data acquisition and indexing may call hundreds of AI services. Performance
considerations are extremely important.
AI infrastructure benefits greatly from common software practices, protocols and
orchestration to organize this ‘AI chaos’ and make order out of it.

AI in prod
Besides ML objective functions and metrics,
a. Latency
b. Resiliency
c. Throughput
d. Resource utilization
are super important factors in the design of every AI service in the search engine
stack.
AI services should be tested and benchmarked against them.
Your model will serve billions of document updates or billions of queries; every
1 ms of delay, 1 ms of downtime etc will cost either a bad user experience or millions in
ops.
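As an illustration of benchmarking a service against latency and throughput, here is a minimal single-process sketch. `toy_scorer` is a hypothetical stand-in for a real model call, and a production benchmark would also cover concurrency, resiliency and resource utilization.

```python
import statistics
import time

def benchmark(fn, inputs, warmup=10):
    """Return latency percentiles (ms) and throughput (calls/s) for fn."""
    for x in inputs[:warmup]:  # warm caches before measuring
        fn(x)
    latencies_ms = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        fn(x)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "qps": len(inputs) / elapsed,
    }

def toy_scorer(query):
    """Hypothetical placeholder for a ranking or query understanding model."""
    return sum(len(term) for term in query.split())

report = benchmark(toy_scorer, ["red running shoes size 10"] * 1000)
print(report)
```

Tail percentiles (p95, p99) matter more than the mean here, because a query that fans out to dozens of services is as slow as its slowest call.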

AI in Search
Several use cases to demonstrate that AI is
driving search in every part of search stack

Indexing - index selection
Perhaps one of the first AI applications in the search domain. It started in the 90s,
when web volume increased and index selection strategy started to be important.
Which documents should be indexed? In which index layers should they be
placed? (Many modern search engines are multi level: a smaller index for frequently
searched items, and a very big and comprehensive index for rarely searched items.)
AI for quality and popularity assessment of ‘documents’

Indexing - duplicate resolution
Which ‘documents’ are essentially the same (represent the same item)? Or have
highly duplicated content, so that the second document does not carry more
information?
Which documents are the same with respect to a particular query? (We do not want to
show the collocated Target and Target Pharmacy for the local search query ‘target’, but
they are different entities for the query ‘pharmacy’.)
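One classic baseline for the "highly duplicated content" question is word shingling with Jaccard similarity. The sketch below is only an illustration of that baseline, not the method of any particular search engine. The example documents are made up, and query dependent duplicates (the Target example) need more than text overlap.

```python
def shingles(text, k=3):
    """Set of k-word shingles (word n-grams) of a document."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc_a = "Cozy two bedroom house with a large garden near the lake"
doc_b = "Cozy two bedroom house with a large garden near the park"
doc_c = "Modern downtown studio apartment close to public transport"

sim_ab = jaccard(shingles(doc_a), shingles(doc_b))
sim_ac = jaccard(shingles(doc_a), shingles(doc_c))
print(sim_ab, sim_ac)  # 0.8 0.0
```

At billions of documents, pairwise comparison is not feasible, so this kind of similarity is usually approximated with hashing techniques such as MinHash.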

Indexing - attributes extraction
Given documents (full text descriptions of houses, ecommerce products,
businesses etc), extract the significant attributes that search needs in order to
understand the items of interest
(size, wheel size, weight, number of pages, location, view)
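A rule based baseline shows the shape of the task: free text in, typed attributes out. The patterns and attribute names below are hypothetical examples. Production systems typically use learned sequence taggers, with rules like these as a fallback or as a source of training labels.

```python
import re

# Hypothetical patterns for three of the attributes listed above.
PATTERNS = {
    "weight_kg": re.compile(r"(\d+(?:\.\d+)?)\s*kg\b", re.IGNORECASE),
    "wheel_size_in": re.compile(r"(\d+(?:\.\d+)?)\s*(?:inch|in)\s+wheels?", re.IGNORECASE),
    "pages": re.compile(r"(\d+)\s*pages\b", re.IGNORECASE),
}

def extract_attributes(text):
    """Return {attribute_name: numeric_value} for every pattern that matches."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = float(match.group(1))
    return found

attrs = extract_attributes("Mountain bike, 29 inch wheels, aluminium frame, weighs 13.5 kg")
print(attrs)  # {'weight_kg': 13.5, 'wheel_size_in': 29.0}
```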

Index - statistical tasks
Evaluation of the quality and size of the index.
Does our index provide good coverage? What categories are missing? What data
quality problems are there?
Evaluation of the index size of external systems

Index - data quality
Which attributes are important and must be mandatory, and which attributes can be
optional in data acquisition? Which types of data and which categories should be acquired?
AI processes continuously look into search logs to determine customer priorities and
what drives conversion, and use this information to drive data acquisition.
Also detection of spam, fraud and adult content.

Demand generation beyond just data
The same type of AI evaluation procedures can compute and forecast future
demand for items, to drive purchase decisions if the search engine is used to sell items

AI for query understanding
Query understanding is the mapping of a customer’s query into a machine
understandable format, used to retrieve a set of relevant items and rank them by
probability of customer engagement (view, purchase, etc).
It includes synonym expansion for better retrieval, removal of insignificant terms,
correction of spelling and other errors, term weighting, attribute and entity extraction,
compound and phrase extraction, classification (novelty, price range etc) etc

Query understanding Classification
Mapping a query into a certain set of categories to be used in retrieval and ranking
-> most probable document category (italian -> restaurants in local search),
-> most probable distance (gas -> 5 miles distance, Michelin restaurant -> 50
miles distance in local search)
-> novelty: printers -> released within 1 year, pillows -> release date does not
matter
Typically: hundreds of classifiers per search engine, with significant impact on quality /
revenue
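One of these classifiers could look like the toy sketch below: a multinomial naive Bayes model with add-one smoothing that maps a query to its most probable category. The training pairs and category names are invented for illustration. In practice the labels would come from click or conversion logs and the model would be far richer.

```python
import math
from collections import Counter, defaultdict

class QueryCategoryClassifier:
    """Tiny multinomial naive Bayes over query terms (add-one smoothing)."""

    def __init__(self):
        self.cat_counts = Counter()
        self.term_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, labeled_queries):
        for query, category in labeled_queries:
            self.cat_counts[category] += 1
            for term in query.lower().split():
                self.term_counts[category][term] += 1
                self.vocab.add(term)
        return self

    def predict(self, query):
        total = sum(self.cat_counts.values())
        best, best_score = None, -math.inf
        for cat, n in self.cat_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.term_counts[cat].values()) + len(self.vocab)
            for term in query.lower().split():
                score += math.log((self.term_counts[cat][term] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best

# Hypothetical labeled queries for a local search engine.
training = [
    ("italian food", "restaurants"),
    ("best italian pizza", "restaurants"),
    ("sushi near me", "restaurants"),
    ("gas station", "fuel"),
    ("cheap gas", "fuel"),
    ("gas prices", "fuel"),
]
clf = QueryCategoryClassifier().fit(training)
print(clf.predict("italian"), clf.predict("gas"))  # restaurants fuel
```

The predicted category can then feed retrieval filters or ranking features, and the same pattern extends to distance or novelty classes.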

Query understanding Similar Queries
Given queries q1 and q2, how similar are they (how good will the results for one query
be as results for the other query)?
Tons of applications in query understanding and ranking: given features for one
query, apply them to another query for ranking, extend the retrieval set etc
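One simple way to approximate "results for one query are good for the other" is to compare which documents users clicked for each query. The sketch below uses cosine similarity over click count vectors. The click data is made up, and this is just one possible signal among many (embeddings, query reformulation sessions etc).

```python
import math
from collections import Counter

def click_cosine(clicks_a, clicks_b):
    """Cosine similarity between two {doc_id: click_count} vectors."""
    shared = clicks_a.keys() & clicks_b.keys()
    dot = sum(clicks_a[d] * clicks_b[d] for d in shared)
    norm_a = math.sqrt(sum(v * v for v in clicks_a.values()))
    norm_b = math.sqrt(sum(v * v for v in clicks_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical aggregated click logs per query.
clicks = {
    "sneakers": Counter({"doc1": 10, "doc2": 5}),
    "running shoes": Counter({"doc1": 8, "doc2": 6, "doc3": 1}),
    "garden hose": Counter({"doc9": 7}),
}

sim_close = click_cosine(clicks["sneakers"], clicks["running shoes"])
sim_far = click_cosine(clicks["sneakers"], clicks["garden hose"])
print(round(sim_close, 3), sim_far)
```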

Query understanding: entity and attribute extraction
Given a query, map it into a structured representation of entities and attributes to be
used for better retrieval and ranking

AI for ranking
Learning to rank / machine learning based ranking technologies to rank documents
(LeToR/MLR)
AI for unbiased ranking
User interaction based LeToR / counterfactual LeToR
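To illustrate why click based ranking needs debiasing, here is a small sketch of inverse propensity scoring (IPS), one common counterfactual technique. The slide does not say which method is meant, so treat the choice as my assumption. The examination propensities per position and the click log are hypothetical.

```python
from collections import defaultdict

# Assumed probability that a user examines each position (hypothetical values).
PROPENSITY = {1: 1.0, 2: 0.5, 3: 0.25}

def relevance_estimates(log):
    """log: iterable of (doc_id, position, clicked).

    Returns {doc_id: (naive_ctr, ips_estimate)}; the IPS estimate weights each
    click by 1 / propensity of the position where it happened.
    """
    shown = defaultdict(int)
    clicks = defaultdict(float)
    weighted = defaultdict(float)
    for doc, pos, clicked in log:
        shown[doc] += 1
        if clicked:
            clicks[doc] += 1.0
            weighted[doc] += 1.0 / PROPENSITY[pos]
    return {d: (clicks[d] / shown[d], weighted[d] / shown[d]) for d in shown}

# Document A is always shown at position 1, document B at position 3.
log = (
    [("A", 1, True)] * 30 + [("A", 1, False)] * 70
    + [("B", 3, True)] * 10 + [("B", 3, False)] * 90
)
est = relevance_estimates(log)
print(est)  # {'A': (0.3, 0.3), 'B': (0.1, 0.4)}
```

Naive click-through rate favours A, but after correcting for position B looks more relevant. A ranker trained on raw clicks would keep reinforcing the existing order.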

AI for search assistance
Typeahead prediction - language modeling plus other contextual information: location
of the user, previous searches of the user
Query dependent, user dependent navigational panels and guided search
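The simplest typeahead baseline completes a prefix with the most frequent past queries. The sketch below does exactly that. The query counts are invented, and a real system would add the language model and the user and location context mentioned above.

```python
import bisect

class Typeahead:
    """Prefix completion ranked by historical query frequency."""

    def __init__(self, query_counts):
        self.counts = dict(query_counts)
        self.sorted_queries = sorted(self.counts)

    def suggest(self, prefix, k=3):
        # Binary search to the first query >= prefix, then scan while it matches.
        lo = bisect.bisect_left(self.sorted_queries, prefix)
        matches = []
        for query in self.sorted_queries[lo:]:
            if not query.startswith(prefix):
                break
            matches.append(query)
        # Most frequent first, ties broken alphabetically.
        return sorted(matches, key=lambda q: (-self.counts[q], q))[:k]

# Hypothetical query log counts.
ta = Typeahead({"pillow": 50, "pillow case": 20, "pizza": 90,
                "printer": 70, "printer ink": 40})
suggestions = ta.suggest("pi")
print(suggestions)  # ['pizza', 'pillow', 'pillow case']
```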

AI Whole page
Given the output of several search engines, how to combine them to construct the
best customer experience.
Ex: music, video, books, podcasts as in iTunes;
web search, maps, YouTube, news, images, books, scholar etc in Google

AI SERP snippet
How to generate the best descriptions of items on the search result page, so that
customers understand the relevance of items without clicking on them
How to select the best chunk of text representing the item, the picture and the format,
depending on the query and the user

AI price prediction
Predict the price of the item (for search engines that sell items)
which will maximize item conversion, customer satisfaction and the revenue of the
company
(an economics problem, but tightly connected to search: it depends on item position in
search, relevance, exposure, and prices of other search results)

AI conversational search
Conversational interfaces for search: multi turn interactions with customers to
understand their search intent and help them to express it, or
even to find it by making latent intent explicit
NLP/NLU, dialog state management, deep reinforcement learning, text generation
ASR for voice based systems

Post search AI
Given a set of queries relevant to a user (saved queries, previous sessions) and a
set of items relevant to the user,
generate email and other notifications about new items, price changes and availability
changes, which will help users to find/buy/discover what they want

Building Search Teams
Running search engines which are the front face of a business, for example real
estate (Zillow, Trulia) or eCommerce (Walmart, eBay), is both different from and similar to
running search engines such as Google web search. I’ll focus mostly on search engines
which are the front face of a business

Do you need a search team?
Some companies buy a search SaaS service, some companies ask consultants to
build a search engine for them.
It might work when search is not the core of your business, and your customers’
satisfaction and profits do not depend on it. Say, search of web
forum threads on your web site, or any other non-critical search.
When a multi billion business and the satisfaction of hundreds of millions of users depend
on your search engine, and improving it may significantly improve revenue and customer
satisfaction numbers, the only way is to control
and own the search engine and have your organization own and develop it

Do you need a search team?
“We believe that we need to own and control the primary technologies behind the
products we make,” was added to Apple values when Tim Cook became the CEO
of Apple
Quite a good value, and proven to be a very effective one. This value is effective in
any other business besides Apple
If the search engine is the primary technology behind your business/products, the
only way to operate it is to own and control it

Typical roles in the Search team
There are multiple critical roles in Search. I’ll describe some of them; there are
more. The exact composition of your search team depends on your
business: some teams are more AI ranking heavy, where search ranking is the
most business critical part; some teams are more backend engineering heavy, when
business success depends on integration with many other systems
(availability, pricing)
Success factor: all roles must be ‘AI aware’ and know how AI works to make the
whole team successful

Skills - search infrastructure
When building a search engine, you’ll either build it on top of existing open source such as
Solr or Elasticsearch, or build it from scratch.
You’ll need experts in the technology you use, since you can improve
performance, operation costs, resilience etc only if your team knows this
technology deeply
Performance is an important characteristic of a search engine, so you’ll need experts in
the technologies used (for example, Java backend engineers for Solr, which is written
in Java, or experts in building high load distributed systems in C++ if you implement
your engine in C++)

Skills - search infrastructure
Depending on the scale of your operations, a lot of search operations and performance
improvements may come from algorithmic improvements in particular, for example more
efficient index structures or more efficient fuzzy match algorithms.
You’ll need experts in algorithms in those particular areas

Skills - Search Quality aka Search Science
Search quality consists of multiple subcomponents: natural language processing
for query understanding, machine learned ranking, and others (index
selection etc, depending on your business)
You’ll need good machine learning engineers and applied scientists, preferably with a
background in IR, Search, LTR, NLP (the distribution depends on your business). All
of these are well established, long standing areas and there are people who are experts
already. Typically, a good generalist ML engineer catches up quite fast, but you need a set
of people who are deep experts in search quality to be the core of your team and to guide
the others

Skills - Search Quality aka Search Science
What I have noticed: abstract applied scientists who only know how to train models do not
work well in search
You need more of an MLE type profile: good in science, but capable of building and
improving efficient systems. A lot of search development is not about mining new
features or training new models, but about building new components

Skills - Operations / SRE
Do not forget: typically, the search engine is the core of your business.
If it goes down, customers can not find what they want and they go to competitors. It
costs a lot
Having a good SRE/ops team who can operate a high load, complex distributed
system with multiple dependencies on other systems is indispensable for your
search engine operations

Skills - operations / SRE
There are multiple dimensions of complexity.
Your scientists, search quality and search infrastructure people will be continuously
improving your search engine from performance, efficiency, search quality and other
points of view.
You need to build a devops and operations team who can support such complexity

Skills - UX
Everybody talks about LTR and query understanding, but the satisfaction of your
customers and the revenue of your business depend a lot on UX
I have seen surprisingly trivial UX changes cause huge conversion/revenue gains
You need designers who know how to build Search UX
But these designers must be data driven, understanding how to run UX A/B or
other experiments and how to interpret their results.

Skills - UX
Search pages are necessarily complex. There are many search results, there is a
lot of information about each search result in every snippet, there are other interaction
elements (filters, maps)
Any UX performance problem causes lost customer satisfaction and revenue
You need ‘full stack’ engineers who know how to build the whole stack, from the search
engine API/query language to the final efficient rendering of pages
Skills - Product managers
Product managers have multiple different roles in search engine
development, which perhaps even deserve different titles
1. Handling a continuous stream of feature requests from the business. Working with
data and business leadership to understand whether the business truly needs these
features or not. Frequently, the business is disconnected from consumer behavior and
other data, and may not have the right assessment of the importance of a certain feature.
Good PMs create a good connection between business and engineering,
sometimes giving higher priority to a feature request, sometimes proving that it should
not be implemented (engineering and other costs do not justify the business gain, or
the feature may actually cause negative business results, which is not obvious without
data)

Skills - Product managers
2. Building search metrics which reflect the true interests of the business but which
are implementable and pursuable for the engineering team. It’s not enough to say
we want higher revenue/profit, a higher CSAT score etc. Many other metrics
may serve customers and the business, be understandable and usable by the business,
and be useful to train ML models
3. Designing a roadmap which will improve search metrics while taking into account
tons of development constraints: efficiency, data and others
Being a good search PM requires deep technical skills, business
understanding, and communication with both sides and more

Skills - statisticians/Data Scientists
Design and analysis of search experiments, getting insights from search
experiments
Analysing metrics and connecting them with customer experience and business
Analysing customer behaviour, getting insights into what’s right or wrong in the
search
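As a small example of experiment analysis, the sketch below runs a two-sided two-proportion z-test on conversion counts from control (A) and treatment (B). The numbers are made up. Real search experiments also need care with randomization units, multiple metrics and repeated looks at the data.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 5.0% vs 5.5% conversion, 20,000 users per arm.
z, p_value = two_proportion_ztest(conv_a=1000, n_a=20000, conv_b=1100, n_b=20000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A half point lift in conversion is only just distinguishable from noise at this sample size, which is why experiment design (power, duration) matters as much as the analysis.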

Skills - Data Engineering
A typical search engine produces billions of customer events (search, click
on result page, refinement, map view etc) per day.
It consumes billions of other events (update of a web page or other source of
information) per day.
A typical search engine lives on streams of petabytes of data per day, and their
processing is crucial both for operations and for improvements (model training etc)

Running a search team
Invest heavily in continuous training and professional development in every
stream. Each area is actively developing, and there is a huge margin between good
and better in performance and impact on your business in every workstream. All
education efforts pay back well
Invest heavily in a good collaboration culture, visibility, alignment and team
connections - create a clique of connections. In Search, everyone may
surprisingly improve (or hurt) the performance of anyone else, and may contribute a lot to
your business. Most areas are heavily interdependent. High visibility/alignment
within the whole org helps to launch bigger impact features/products with better
quality. People are happier when they know all the details of what they are doing.

Invest heavily in an engineering culture. Search engines are very complex systems
(at a certain moment Google was the biggest system by lines of code, I believe) and
such complex systems can not function well without high quality engineering at
every step. People are happier when they produce high quality work
Invest heavily in experimentation infrastructure (everybody knows about it) and
experimentation culture (little known), available to everybody. Businesses have got
huge gains from search experiments run by PMs and even business owners, rather
than by scientists only. But it’s a matter of culture and education across the whole org,
not limited to engineering

Invest in high visibility of the work of the search team to other teams, stakeholders and
business owners. Search has a huge impact on the business. But due to its natural
complexity, its impact is not always fully understandable and visible to non
engineers. Visibility affects prioritization, resource allocation and many other things.
It is important that everybody else in the organization can see what happens in search,
what results it brought and how it works

Addendum
What makes a good Search Engineer

What makes a good search engineer
This part of the presentation is about the qualities of a good search engineer
and how to build a career in AI/Search
1. How to be successful in your search projects and what makes you a good
search engineer
2. How to be successful in long term career building
Qualities of a good Search engineer
Required knowledge for long term success in search (to be able to deliver
multiple successful projects with company level impact):
1. Machine Learning: new models, new features
2. Engineering: implementing software solutions with performance, quality etc
requirements
3. Metrics / Customer: transforming customer experience into metrics which can
be used for ML training and experiments/analysis
4. Statistics: design and analysis of experiments
5. Business: understanding the business, how to transform business development
into metrics/OKRs, and consequently into new search features and new
search products

Many search features require changes in many parts of the search stack: indexing,
ranking, query understanding, evaluation setups
This requires collaboration with many different teams: engineering, MLE, research,
statisticians.
Ability to collaborate at large scale with multiple diverse teams: communication,
document writing, project organization at multiple levels from coding to project
management to product management

Sometimes, search development work requires a long effort by one person or a small team,
where help from management or from colleagues will not change much
This requires the ability to keep long term focus and to work in an isolated, result
focused environment (PhD style work)

Ability to work on long term projects with no guaranteed outcomes
Many search projects are focused on improving certain customer satisfaction
metrics (the number of local results, the number of new relevant results etc),
improving the model, the feature set, or something else.
Frequently, there is no guarantee that it’s achievable. Some search projects
require multiple unsuccessful tries before finding a good solution
This requires a certain persistence to go from failure to failure before finding a
successful solution

Qualities of good Search engineer
Understanding the customer, and the skill of transforming an understanding of
customer needs into actionable metrics
A lot of search development is not about continuous improvement of one
relevance, query understanding, index size etc metric, but about discovering and
understanding various aspects of customer satisfaction and transforming this
understanding into new metrics, which can be used for training models and for
measurement and improvement of the search

Continuous awareness of new developments in many areas of
ML/IR/NLP/statistics which can be used to improve search
Continuous professional development, learning and reading in machine
learning/AI/NLP/IR, engineering/programming, and other professional skills

Success of many big projects and initiatives depends on collaboration with
multiple teams from other technology teams to business departments (legal,
marketing, etc)
Ability to find support and convince people with very different points of view
about the importance, criteria of success and impact of technology projects
and
Ability to listen to feedback and proposals from very different people, from business to
tech, objectively understand them and incorporate them into technology development

Qualities of a good Search Engineer
The engineering part is super important and frequently underestimated in many
articles and books. Only a small part of search development is training new
models. The other part is development of new product features, building
infrastructure to serve models etc. Software engineering is a part of the job.
Search engines have strict performance limits, and the search engine is the face of your
business. If it’s down, the business is down. Quality engineering matters.
Knowing how to write good, high quality, performant code, and how to test it, tune it,
document it etc, is a crucial part of search engineering success.

Long term career success as a Search engineer
Reputation is the number 1 success criterion of a long term career.
Your reputation as an engineer, MLE, leader, collaborator. The reputation of you,
the teams you built, etc
Reputation among engineering teams, business teams, your peers, partners and
your leadership.
Reputation based on different qualities, from building large scale systems to
success in ML projects to understanding business needs and transforming them
into engineering products
The first 15 years of a career are focused on building a reputation

Long term career success
Select only jobs which truly suit you
Next job offer: analyze the company: values, culture, technology area, business
vision - is it what you want?
This is very important for the first job you get after college or a PhD - a good initial fit is
crucial
Assess companies: will you relate to their business, culture and people?
What you learn there will define your career for several decades
Do a systematic assessment of every job offer, but especially of the first job after
college or a PhD

The best job is a job with a company that suits you
When you select the next step, be sure that the company’s culture, values, product and
engineering fit you, your development goals and your values. Do not move because
of a popular technology, a big title, a sudden unexpected salary increase, hype, or
other reasons that are accidental to your long term career

Focus on development of long term professional relationships
Develop a diverse base of meaningful work connections with colleagues from
different technology departments, different lines of business, marketing, legal,
recruiters etc, based on joint work and on your reputation from your work with them

Within your company, move to more strategic projects with big impact on the
company business
Strategic projects bring more opportunities for career development, more meaningful
work connections, more things to learn for long term career goals, typically more
interesting technologies, more to learn about business, technology and customers,
more opportunity for self development, more skills, more knowledge

Move to more strategic and bigger impact contributions in your area of work
First job - develop models and software features as requested by your mentor or
manager
Move from individual projects to team projects, from coding and model training to
defining vision, strategy, roadmap, execution, building teams
In *every* role and project, widen your scope, take on more challenging tasks, and aim
for a bigger impact on the company business

Do not complain, make changes
It applies to code, technology, org structure, culture, relationships, products,
anything you believe can be improved
Do not just complain about things going wrong. Fix them whenever possible, by
coding, writing documentation, making people aware of what is wrong and
proposing solutions. At every level of your career, you can make bigger changes
than are expected of you at that step. Bring changes rather than whine.
Even if a problem is well above your role, propose solutions, notify the relevant
people and bring value to solve it, rather than just complain.

Continuous professional development is crucial at every step of the career
Every year, ask yourself these questions.
Over the last 12 months:
1. How much have I learned about the technologies, the products, the services, the
markets? What part of this knowledge is relevant to my work? How much did
it help to improve my performance (or the performance of my team)?
2. How many new people have I gotten to know at work? How diverse is this
set of people? How many people have I improved relationships with?

Continuous professional development is crucial at every step of the career
Over the last 12 months:
1. What new results and accomplishments have I achieved? What have I launched or
improved? How much does it add to my reputation and track record?
2. What new skills have I developed? Am I better in communication?
Technology? Analytical skills? Judgement? In which areas?
How can I do better next year? What should I improve? How can I apply these new
skills, relationships and knowledge?
