www.karakun.com
Bringing AI to SME projects:
Addressing customer needs with a
flexible set of tools and services
Holger Keibel
Elisabeth Maier
AI-SDV 2020
2
Background
• Karakun AG (Basel, 50 employees)
• Builds custom software where no standard solution exists on the market
• Uses open-source components where possible
• Offers software platforms to boost development efficiency, e.g. HIBU platform offering pre-built functionalities for solutions around Enterprise Search, Language Analytics, and AI
3
Our customers’ most frequent AI needs
Text classification
• Assign categories to texts
• Predefined set of categories
Information extraction
• Identify relevant pieces of information within a text
• Entities, keywords, values etc.
Topic identification
• Assign a label to a text, summarizing its main topic
• Generally use terms found in the text
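As a minimal sketch of topic identification "using terms found in the text", the snippet below labels a text with its most frequent non-stopword terms. This is only an illustration, not the method used in the HIBU platform; the function name `topic_label` and the tiny `STOPWORDS` set are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "with", "are"}

def topic_label(text: str, max_terms: int = 2) -> str:
    """Return the most frequent non-stopword terms of `text` as a topic label."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count only content-bearing tokens (not stopwords, longer than two letters).
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return " ".join(term for term, _ in counts.most_common(max_terms))

label = topic_label(
    "The invoice lists the payment terms. Payment of the invoice is due in May."
)
print(label)
```

The label consists only of words that occur in the text itself, which matches the slide's description; it does not attempt any summarization beyond term frequency.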
4
Built-in classifiers & extractors
5
Custom classifiers & extractors
• Fine-tune built-in classifier/extractor to customer’s domain
• Extend built-in classifier/extractor by additional categories/information types
• Create new classifier/extractor for custom set of categories/information types
• Assign editorial content to newsletters
• E-mail triage
• Recognize tax-relevant documents / specific contract types / …
• Recognize country-specific payment slips and extract relevant data
• …
6
AI choices for custom classifiers/extractors
Rule-based
• Regular expressions
• Ontologies / terminologies
(Supervised) Learning
• Statistical: SVMs, Naive Bayes, decision trees
• Neural networks (deep learning)
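The rule-based option (regular expressions) can be illustrated with a small sketch that both classifies a document and extracts a value from it. The category names, patterns, and function names below are hypothetical and chosen only for this example; they are not taken from the presented system.

```python
import re

# Hypothetical rules for illustration; patterns and category names are assumptions.
CATEGORY_RULES = {
    "invoice": re.compile(r"\b(?:invoice|amount due|payment terms)\b", re.IGNORECASE),
    "contract": re.compile(r"\b(?:agreement|hereby|party|parties)\b", re.IGNORECASE),
}
# Currency code followed by an amount with optional two decimal places.
AMOUNT = re.compile(r"\b(CHF|EUR|USD)\s?(\d+(?:\.\d{2})?)\b")

def classify(text):
    """Return the category whose rule matches most often, or None if no rule matches."""
    hits = {cat: len(pattern.findall(text)) for cat, pattern in CATEGORY_RULES.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

def extract_amounts(text):
    """Return (currency, value) pairs found in the text."""
    return [(currency, float(value)) for currency, value in AMOUNT.findall(text)]

doc = "Invoice no. 17: amount due CHF 1250.00, payment terms 30 days."
print(classify(doc), extract_amounts(doc))
```

Rules like these need no training data, but every new category, language, or document layout means writing and maintaining more patterns by hand.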
7
Cost factors and quality aspects
Aspect | Rule-based | Supervised learning
Required data volume | low | high
Required data quality | rather low | high
Initial ramp-up costs | rather high | rather high
Maintenance costs | high | moderate
Costs of scaling system to new domains, applications and languages (→ time to market) | high | moderate
Sensitive to context | low | high
Recall (→ false negatives) | low (1) | high
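For reference, recall relates to false negatives as TP / (TP + FN): the more relevant items a system misses, the lower its recall. A minimal sketch follows; the function name and the example counts are made up for illustration.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of relevant items that were actually found: TP / (TP + FN)."""
    relevant = true_positives + false_negatives
    if relevant == 0:
        raise ValueError("recall is undefined when there are no relevant items")
    return true_positives / relevant

# Illustrative counts only: 80 relevant documents found, 20 missed.
print(recall(80, 20))
```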
8
Training data for supervised learning
Rule of thumb (up until recently):
Training a document classifier with N target categories requires on the order of 10,000 × N training documents.
→ For SMEs, suitable and sufficient training data are …
• In general: not readily available
• Costly to procure
• Investments generally don’t pay off for SMEs’ business cases
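To make the rule of thumb concrete, the arithmetic can be written out as below. The function name is hypothetical, and the result is an order-of-magnitude estimate, not a precise requirement.

```python
DOCS_PER_CATEGORY = 10_000  # order of magnitude from the rule of thumb above

def estimated_training_docs(num_categories: int) -> int:
    """Rough number of training documents for a classifier with N categories."""
    if num_categories < 1:
        raise ValueError("need at least one target category")
    return DOCS_PER_CATEGORY * num_categories

# A classifier with 5 target categories would call for roughly 50,000 documents.
print(estimated_training_docs(5))
```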
9
Examples from previous projects
Classification task | Training data
Assign editorial content to newsletters (finance) | Large number readily available: all articles from past newsletters
Extract key data from invoices | None available; generation of synthetic data not suitable here
Detect whether a message talks about adverse effects of a medication | Hardly any existed; collected some by web search (medications & known adverse effects), but highly biased: missing unknown adverse effects
10
Customer project by DSwiss: Encrypted digital safes
• Users can upload any type of document
• Classifier and extractors used for search filters
• Frequently need to extend to new categories and languages
• But:
  • Classifier is rule-based
  • Difficult to obtain large amount of suitable training data
11
Our approach in previous projects
• Assess classification/extraction task
• Inspect relevant data that are readily available
• Do our built-in classifiers/extractors suffice?
• If new classifier/extractor is needed, consider all approaches:
Rule-based: Sometimes the best choice
Statistical: Often a good choice if a decent amount of training data is available and features can be engineered efficiently
Neural: In practice rarely used in specific customer projects, as there is not enough data to gain an advantage over statistical approaches
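As an illustration of the statistical option, here is a minimal Naive Bayes text classifier with bag-of-words features and add-one smoothing, written in plain Python. The class name, the training texts, and the labels are invented for this sketch; a real project would more likely use an established library and far more training data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace-separated, lowercased words."""

    def __init__(self):
        self.class_docs = Counter()              # number of documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.class_docs[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy data, for illustration only.
train_texts = [
    "invoice amount due payment",
    "payment reminder invoice overdue",
    "newsletter market update stocks",
    "weekly newsletter fund performance",
]
train_labels = ["invoice", "invoice", "newsletter", "newsletter"]
clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("overdue invoice payment"))
```

The feature engineering here is trivial (raw word counts); in practice, the choice of features is where much of the effort for statistical classifiers goes.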
12
A true game-changer:
Pre-trained language models
• Based on the Transformer architecture (e.g. BERT, GPT-2/GPT-3)
• Pre-training model with prediction tasks
• On massive data
• Using only plain text → self-supervised learning
• Build up rich contextualized representations of words (vs. non-contextual word embeddings such as word2vec and GloVe)
• Fine-tuning model to a target task
• Transfer learning
13
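[Figure: BERT architecture in two panels, Pre-training and Fine-tuning. In both panels the input tokens (cls, w1, w2, …, wN) feed an embedding layer followed by a stack of encoders; the pre-training panel ends in an output layer, the fine-tuning panel in an output layer for the target task.]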
14
Fine-tuning: much less data needed
[Figure: performance after training plotted against training samples (log scale) for two curves, fine-tuning a pre-trained system and training a network from scratch, with the acceptable performance level marked.]
15
Advantages of using BERT
• Re-use architecture and trained model
• Only need to replace the output layer with a task-specific layer
• Significantly less training data needed
→ For SMEs, suitable and sufficient training data are …
• In many cases feasible to procure
• Investments do pay off for SMEs’ business cases
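The idea of re-using a trained model and replacing only the output layer can be sketched with a toy example in plain Python. Everything here is a stand-in: `PRETRAINED_VECTORS` plays the role of a pre-trained encoder (it is not BERT), and the two training examples are invented. The sketch also keeps the encoder frozen and trains only the new output layer, which is a simplification; when fine-tuning BERT, the encoder weights are typically updated as well.

```python
import math

# Stand-in for a pre-trained model: fixed word vectors that are reused unchanged.
# Hypothetical values; a real encoder such as BERT yields learned contextual features.
PRETRAINED_VECTORS = {
    "invoice": [1.0, 0.0],
    "payment": [0.9, 0.1],
    "newsletter": [0.0, 1.0],
    "market": [0.1, 0.9],
}

def encode(text):
    """Average the pre-trained vectors of the known words in the text."""
    vecs = [PRETRAINED_VECTORS[w] for w in text.lower().split() if w in PRETRAINED_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def train_output_layer(examples, epochs=200, lr=0.5):
    """Train a new task-specific output layer (logistic regression) on top of encode()."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - label  # gradient of the log loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(text, weights, bias):
    """Return 1 (invoice-related) or 0 (newsletter-related)."""
    x = encode(text)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Only two labeled examples are used to train the output layer.
examples = [("invoice payment", 1), ("market newsletter", 0)]
weights, bias = train_output_layer(examples)
print(predict("payment", weights, bias), predict("market", weights, bias))
```

Because the representations already separate the two topics, very few labeled examples suffice for the new layer, which mirrors the slide's point about needing less training data.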
16
Cost factors and quality aspects
Aspect | Rule-based | Supervised learning | Pre-trained
Required data volume | low | high | moderate
Required data quality | rather low | high | moderate
Initial ramp-up costs | rather high | rather high | rather high ??
Maintenance costs | high | moderate | moderate
Costs of scaling system to new domains, applications and languages (→ time to market) | high | moderate | moderate
Sensitive to context | low | high | high
Recall (→ false negatives) | low (1) | high | high
17
Joint research project
• Partners: SUPSI (Lugano) and DSwiss (Zurich)
• Co-funded by Innosuisse
Goals:
• Create core classifiers and extractors by fine-tuning BERT
• Increase coverage of document types
• Improve performance
• Extend to new tasks, e.g.
• Extract data from invoices
• Extract data from ID cards
18
Thank you!
hibu-platform.com
karakun.com
