SlideShare a Scribd company logo
1 of 67
Data ethics and machine learning
Discrimination, algorithmic bias, and
how to discover them.
DINO PEDRESCHI
KDDLAB, DIPARTIMENTO DI INFORMATICA, UNIVERSITÀ DI PISA
Opportunities of
big data
4
5
Spot business trends
Prevent diseases
Fight crime
Improve transportation
Personalised services
Improve wellbeing
Event Detection
Detecting events in a geographic area
classifying the different kinds of users.
City of Rome
Metropolitan area
Covered geographical region: city of Rome
Dataset size per snapshot: ≈ 1.2 GBytes per day
Number of records: ≈ 5.6 million lines per day
8 months between 2015 and 2016
San Pietro
San Giovanni
Circo Massimo
Stadio Olimpico
End users
Traveler
Mobility
Manager
City
Personal mobility assistant
12
Carpooling
Network
Estimating wellbeing with mobility data
AI and Big Data 13
A
B
C
H
W
Predicting GDP with Retail Market data
14
generic utility
function
(rationality)
personal utility
function
(diversity)
Product
Price
Quantity
Needed
Sophistication
R2 = 17.25% R2 = 32.38%
R2 = 85.72%
Risks of big data
15
Big Data, Big Risks
Big data is algorithmic, therefore it cannot be biased! And yet…
• All traditional evils of social discrimination, and many new ones, exhibit
themselves in the big data ecosystem
• Because of its tremendous power, massive data analysis must be used
responsibly
• Technology alone won’t do: also need policy, user involvement and
education efforts
16
By 2018, 50% of business ethics
violations will occur through
improper use of big data analytics
[source: Gartner, 2016]
AI and Big Data 17
AI and Big Data 18
19
The danger of black boxes - 1
The COMPAS score (Correctional Offender Management Profiling for
Alternative Sanctions)
A 137-questions questionnaire and a predictive model for “risk of
crime recidivism.” The model is a proprietary secret of Northpointe,
Inc.
The data journalists at propublica.org have shown that
• the prediction accuracy of recidivism is rather low (around 60%)
• the model has a strong ethnic bias
◦ blacks who did not reoffend are classified as high risk twice as much as
whites who did not reoffend
◦ whites who did reoffend were classified as low risk twice as much as
blacks who did reoffend.
AI and Big Data 20
The danger of black boxes -2
The three major US credit bureaus, Experian, TransUnion, and
Equifax, providing credit scoring for millions of individuals, are
often discordant.
In a study of 500,000 records, 29% of consumers received credit
scores that differ by at least fifty points between credit bureaus, a
difference that may mean tens of thousands dollars over the life of
a mortgage [CRS+16].
AI and Big Data 21
The danger of black boxes - 3
In 2010, some homeowners with a regular payment
history of their mortgage reported a sudden drop of forty
points in their credit score, soon after their own enquiry.
AI and Big Data 22
The danger of black boxes - 4
During the 1970s and 1980s, St. George’s Hospital
Medical School in London used a computer program for
initial screening of job applicants.
The program used information from applicants’ forms,
which contained no reference to ethnicity.
The program was found to unfairly discriminate against
female applicants and ethnic minorities (inferred from
surnames and place of birth), less likely to be selected for
interview [LM88].
AI and Big Data 23
The danger of black boxes - 5
In a recent paper at SIGKDD 2016 [RSG16] the authors
show how an accurate but untrustworthy classifier may
result from an accidental bias in the training data.
In a task of discriminating wolves from huskies in a
dataset of images, the resulting deep learning model is
shown to classify a wolf in a picture based solely on …
AI and Big Data 24
The danger of black boxes - 5
In a recent paper at SIGKDD 2016 [RSG16] the authors
show how an accurate but untrustworthy classifier may
result from an accidental bias in the training data.
In a task of discriminating wolves from huskies in a
dataset of images, the resulting deep learning model is
shown to classify a wolf in a picture based solely on …
the presence of snow in the background!
[RSG16] “Why Should I Trust You?” Explaining the Predictions of Any Classifier
SIGKDD 2016 Conference Paper
AI and Big Data 25
Deep learning is creating computer
systems we don't fully understand
www.theverge.com/2016/7/12/12158238/first-click-deep-learning-algorithmic-
black-boxes
AI and Big Data 26
Is AI Permanently Inscrutable?
nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable
27
The danger of black boxes - 6
In a recent study at Princeton Univ, the authors show
how the semantics derived automatically from large
text/web corpora contains human biases
◦ E.g., names associated with whites were found to be
significantly easier to associate with pleasant than
unpleasant terms, compared to names associated with
black people.
Therefore, any machine learning model trained on text
data for, e.g., sentiment or opinion mining has a strong
chance of inheriting the prejudices reflected in the
human-produced training data.
AI and Big Data 28
Human Bias
AI and Big Data 29
Human Bias can be Learned - 7
AI and Big Data 30
As we stated in our 2008 SIGKDD paper that started the field of
discrimination-aware data mining [PRT08]:
“learning from historical data recording human decision making
may mean to discover traditional prejudices that are endemic in
reality, and to assign to such practices the status of general rules,
maybe unconsciously, as these rules can be deeply hidden within
the learned classifier.”
AI and Big Data 31
Policies
BIG DATA ETHICS
Satya Nadella's rules for AI
www.theverge.com/2016/6/29/12057516/satya-nadella-ai-robot-laws
AI and Big Data 33
U.S. – F.T.C.
Salvatore Ruggieri 34
www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-or-
exclusion-understanding-issues/160106big-data-rpt.pdf (Sept. 2014)
U.S. – White House
Salvatore Ruggieri 35
www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1
_2014.pdf (May 2014)
U.S. – White House
Salvatore Ruggieri
36
www.whitehouse.gov/sites/default/files/microsites/ostp/2016_0504_data_disc
rimination.pdf (May 2016)
U.S. – White House
www.whitehouse.gov/sites/default/files/whitehouse_files/microsites/ostp/NST
C/preparing_for_the_future_of_ai.pdf (October 2016)
AI and Big Data 37
E.U. - EDPS
Salvatore Ruggieri 38
secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Con
sultation/Opinions/2015/15-11-19_Big_Data_EN.pdf
E.U. - EDPS
Salvatore Ruggieri 39
secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Con
sultation/Opinions/2015/15-09-11_Data_Ethics_EN.pdf
Netherlands
www.knaw.nl/en/news/publications/ethical-and-legal-aspects-of-informatics-
research (September 2016)
AI and Big Data 40
Big Data Ethics
informationaccountability.org/big-data-ethics-initiative/
AI and Big Data 41
Value-Sensitive Design
Design for privacy
Design for security
Design for inclusion
Design for sustainability
Design for democracy
Design for safety
Design for transparency
Design for accountability
Design for human capabilities
AI and Big Data 42
EU Projects: SoBigData.eu
Social Mining & Big Data Ecosystem project (SoBigData, H2020-INFRAIA-2014-2015,
duration: 2015-2019, www.sobigdata.eu
AI and Big Data 43
Master Universitario Di II Livello
BigData Technology
BigData Sensing&Procurement
BigData Mining
BigData StoryTelling
BigData Ethics
Il Master Big Data ha l’obiettivo di formare“data scientists”,dei
professionisti dotati di un mix di competenze multidisciplinari
che permettono non solo di acquisire dati ed estrarne conos-
cenza, ma anche di raccontare“storie” attraverso questi dati, a
supporto delle decisioni, della creatività e dello sviluppo di
servizi innovativi, e di saper gestire le ripercussioni etiche e
legali dei Big Data, che spesso contengono informazioni
personali e suscitano problematiche relative alla privacy, alla
trasparenza,alla consapevolezza.
Aree di innovazione socio-economica:
BigData for Social Good
BigData forBusiness
Big Data AnalyticsESocial Mining
SoBigData
Data Ethics Literacy
Rapporto MIUR su Big Data, 28 Luglio 2016
◦ www.istruzione.it/allegati/2016/bigdata.pdf
Master UNIPI in Big Data Analytics & Social Mining
◦ masterbigdata.it
AI and Big Data 44
Data ethics
technologies
DISCRIMINATION DISCOVERY FROM DATA
AI and Big Data 46
Discrimination discovery
Given:
◦ an historical database of decision records, each describing
features of an applicant to a benefit
◦ e.g., a credit request to a bank and the corresponding on credit approval/denial
◦ some designated categories of applicants, such as groups
protected by anti-discrimination laws,
find whether, and in which circumstances, there are
evidences of discrimination of the designated categories
that emerge from the data.
DCUBE: Discrimination Discovery in Databases 47
German Credit dataset
DCUBE: Discrimination Discovery in Databases 48
How? Fight with the same weapons
Idea: use data mining to discover discrimination
◦ the decision policies hidden in a database can be represented by
decision rules and discovered by frequent pattern mining
◦ Once found all such decision rules, highlight all potential niches
of discrimination by filtering the rules using a measure that
quantifies the discrimination risk.
DCUBE: Discrimination Discovery in Databases 49
Discrimination discovery from data
FOREIGN_WORKER=yes
& PURPOSE=new_car & HOUSING=own
 CREDIT=bad
◦ elift = 5,19 supp = 56 conf = 0,37
elift = 5,19 means that foreign workers have more than 5
times more probability of being refused credit than the
average population (even if they own their house).
50
 Outcome:
 Funded
 Not funded
 Conditionally funded
Case Study: grant evaluation
51
Dataset attributes
52
Features of the PI
Project costs
Research Area
Project Evaluation
A potentially discriminatory rule
Antecedent
◦ Project proposals in “Physical and Analytical
Chemical Sciences”
◦ Young females
◦ Total cost of 1,358,000 Euros or above
Possible interpretation
◦ “Peer-reviewers of panel PE4 trusted young females
requiring high budgets less than males leading
similar projects”
53
Case study: US Harmonized Tariff System
US Harmonized Tariff System (HTS)
https://hts.usitc.gov/
Detailed tariff classification system for
merchandise imported to US
Chapter 61, 62, 64, 65: apparels
◦ Different taxes for same garments
separately produced for male and female
◦ Description is at semi-structured form
64.4¢/kg + 18.8%96¢/doz + 1.4%8.5%Women and
girls
38.6¢/kg + 10%08.9%Men and boys
CoatsFur felt hatsCotton pajamas
Different
taxes for
same
apparels for
men and
women
64.4¢/kg + 18.8%96¢/doz + 1.4%8.5%Women and
girls
38.6¢/kg + 10%08.9%Men and boys
CoatsFur felt hatsCotton pajamas
Different
taxes for
same
apparels for
men and
women
54
Women: 14%
Men: 9%
1.3 billions USD!!!
AI and Big Data 55
Totes-Isotoner Corp. v. U.S.
Rack Room Shoes Inc. and
Forever 21 Inc. vs U.S.
Court of International Trade
U.S. Court of Appeals for the Federal
Circuit (2014)
“[…] the courts may have concluded that
Congress had no discriminatory intent when
ruling the HTS, but there is little
doubt that gender-based tariffs have
discriminatory impact”
Sample rule from the HTS dataset
AI and Big Data 56
Soccer Player Ratings
Soccer Player Ratings
How humans
evaluate sports
performance?
Human evaluation line
Technical
features
Machine
performance
Human evaluation line
Technical
features
Technical+Contextual
features
Machine
performance
Wrapping up
AI AND BIG DATA 62
Right of explanation
• Applying AI within many domains requires
transparency and responsibility:
• health care
• finance
• surveillance
• autonomous vehicles
• Government
• EU General Data Protection Regulation (April
2016) establishes (?) a right of explanation
for all individuals to obtain “meaningful
explanations of the logic involved” when
automated (algorithmic) individual decision-
making, including profiling, takes place.
• In sharp contrast, (big) data-driven AI/ML
models are often black boxes.
AI and Big Data 63
Accountability
“Why exactly was my loan application rejected?”
“What could I have done differently so that my application
would not have been rejected?”
AI and Big Data 64
Social Mining & Big Data Ecosystem
www.sobigdata.eu
66
Knowledge Discovery
& Data Mining Lab
http://kdd.isti.cnr.it
Special thanks
• Salvatore Ruggieri
• Franco Turini
• Fosca Giannotti
• Anna Monreale
• Luca Pappalardo
SMARTCATs

More Related Content

What's hot

What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsShilpaKrishna6
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)SiamAhmed16
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
History of Data Science
History of Data ScienceHistory of Data Science
History of Data ScienceDaniel Caesar
 
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...Denis Parra Santander
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceNiko Vuokko
 
Data Management is Data Governance
Data Management is Data GovernanceData Management is Data Governance
Data Management is Data GovernanceDATAVERSITY
 
AI in Law Enforcement - Applications and Implications of Machine Vision and M...
AI in Law Enforcement - Applications and Implications of Machine Vision and M...AI in Law Enforcement - Applications and Implications of Machine Vision and M...
AI in Law Enforcement - Applications and Implications of Machine Vision and M...Daniel Faggella
 
[Webinar Slides] Developing a Successful Data Retention Policy
[Webinar Slides] Developing a Successful Data Retention Policy [Webinar Slides] Developing a Successful Data Retention Policy
[Webinar Slides] Developing a Successful Data Retention Policy AIIM International
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesT.S. Lim
 
Data Ethics in the Workplace: Beyond AI, Privacy and Security
Data Ethics in the Workplace: Beyond AI, Privacy and SecurityData Ethics in the Workplace: Beyond AI, Privacy and Security
Data Ethics in the Workplace: Beyond AI, Privacy and SecurityCase IQ
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data GovernanceTuba Yaman Him
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data ScienceActonRoy
 

What's hot (20)

What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data Applications
 
Big data.
Big data.Big data.
Big data.
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
History of Data Science
History of Data ScienceHistory of Data Science
History of Data Science
 
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Big Data Readiness Assessment
Big Data Readiness AssessmentBig Data Readiness Assessment
Big Data Readiness Assessment
 
Chapter 12 outlier
Chapter 12 outlierChapter 12 outlier
Chapter 12 outlier
 
Data Management is Data Governance
Data Management is Data GovernanceData Management is Data Governance
Data Management is Data Governance
 
AI in Law Enforcement - Applications and Implications of Machine Vision and M...
AI in Law Enforcement - Applications and Implications of Machine Vision and M...AI in Law Enforcement - Applications and Implications of Machine Vision and M...
AI in Law Enforcement - Applications and Implications of Machine Vision and M...
 
[Webinar Slides] Developing a Successful Data Retention Policy
[Webinar Slides] Developing a Successful Data Retention Policy [Webinar Slides] Developing a Successful Data Retention Policy
[Webinar Slides] Developing a Successful Data Retention Policy
 
Big data
Big dataBig data
Big data
 
Big Data
Big DataBig Data
Big Data
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in Businesses
 
Data Ethics in the Workplace: Beyond AI, Privacy and Security
Data Ethics in the Workplace: Beyond AI, Privacy and SecurityData Ethics in the Workplace: Beyond AI, Privacy and Security
Data Ethics in the Workplace: Beyond AI, Privacy and Security
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data Science
 

Similar to Data ethics and machine learning: discrimination, algorithmic bias, and how to discover them. Dino Pedreschi

JanData-mining-to-knowledge-discovery.ppt
JanData-mining-to-knowledge-discovery.pptJanData-mining-to-knowledge-discovery.ppt
JanData-mining-to-knowledge-discovery.pptgeorgejustymirobi1
 
Big data for development
Big data for development Big data for development
Big data for development Junaid Qadir
 
Transparency in ML and AI (humble views from a concerned academic)
Transparency in ML and AI (humble views from a concerned academic)Transparency in ML and AI (humble views from a concerned academic)
Transparency in ML and AI (humble views from a concerned academic)Paolo Missier
 
Making sense of big data
Making sense of big dataMaking sense of big data
Making sense of big databis_foresight
 
June 2015 (142) MIS Quarterly Executive 67The Big Dat.docx
June 2015 (142)  MIS Quarterly Executive   67The Big Dat.docxJune 2015 (142)  MIS Quarterly Executive   67The Big Dat.docx
June 2015 (142) MIS Quarterly Executive 67The Big Dat.docxcroysierkathey
 
BigData & Supply Chain: A "Small" Introduction
BigData & Supply Chain: A "Small" IntroductionBigData & Supply Chain: A "Small" Introduction
BigData & Supply Chain: A "Small" IntroductionIvan Gruer
 
Algocracy and the state of AI in public administrations.
Algocracy and the state of AI in public administrations.Algocracy and the state of AI in public administrations.
Algocracy and the state of AI in public administrations.Sandra Bermúdez
 
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...Denis Parra Santander
 
Smart Data Module 5 d drive_legislation
Smart Data Module 5 d drive_legislationSmart Data Module 5 d drive_legislation
Smart Data Module 5 d drive_legislationcaniceconsulting
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and DefenseKishor Datta Gupta
 
The REAL Impact of Big Data on Privacy
The REAL Impact of Big Data on PrivacyThe REAL Impact of Big Data on Privacy
The REAL Impact of Big Data on PrivacyClaudiu Popa
 
Big data 4 4 the art of the possible 4-en-web
Big data 4 4 the art of the possible 4-en-webBig data 4 4 the art of the possible 4-en-web
Big data 4 4 the art of the possible 4-en-webRick Bouter
 
Unveiling the Power of Data Science.pdf
Unveiling the Power of Data Science.pdfUnveiling the Power of Data Science.pdf
Unveiling the Power of Data Science.pdfKajal Digital
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementationSandip Tipayle Patil
 

Similar to Data ethics and machine learning: discrimination, algorithmic bias, and how to discover them. Dino Pedreschi (20)

JanData-mining-to-knowledge-discovery.ppt
JanData-mining-to-knowledge-discovery.pptJanData-mining-to-knowledge-discovery.ppt
JanData-mining-to-knowledge-discovery.ppt
 
Big data for development
Big data for development Big data for development
Big data for development
 
Transparency in ML and AI (humble views from a concerned academic)
Transparency in ML and AI (humble views from a concerned academic)Transparency in ML and AI (humble views from a concerned academic)
Transparency in ML and AI (humble views from a concerned academic)
 
Making sense of big data
Making sense of big dataMaking sense of big data
Making sense of big data
 
Big Data Analytics (1).ppt
Big Data Analytics (1).pptBig Data Analytics (1).ppt
Big Data Analytics (1).ppt
 
Social Νetworks Data Mining
Social Νetworks Data MiningSocial Νetworks Data Mining
Social Νetworks Data Mining
 
June 2015 (142) MIS Quarterly Executive 67The Big Dat.docx
June 2015 (142)  MIS Quarterly Executive   67The Big Dat.docxJune 2015 (142)  MIS Quarterly Executive   67The Big Dat.docx
June 2015 (142) MIS Quarterly Executive 67The Big Dat.docx
 
BigData & Supply Chain: A "Small" Introduction
BigData & Supply Chain: A "Small" IntroductionBigData & Supply Chain: A "Small" Introduction
BigData & Supply Chain: A "Small" Introduction
 
Algocracy and the state of AI in public administrations.
Algocracy and the state of AI in public administrations.Algocracy and the state of AI in public administrations.
Algocracy and the state of AI in public administrations.
 
Business with Big data
Business with Big dataBusiness with Big data
Business with Big data
 
Data mining and knowledge Discovery
Data mining and knowledge DiscoveryData mining and knowledge Discovery
Data mining and knowledge Discovery
 
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
Human-Centered Machine Learning: Harnessing Visualization and Interactivity f...
 
Data Mining With Big Data
Data Mining With Big DataData Mining With Big Data
Data Mining With Big Data
 
3.2
3.23.2
3.2
 
Smart Data Module 5 d drive_legislation
Smart Data Module 5 d drive_legislationSmart Data Module 5 d drive_legislation
Smart Data Module 5 d drive_legislation
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and Defense
 
The REAL Impact of Big Data on Privacy
The REAL Impact of Big Data on PrivacyThe REAL Impact of Big Data on Privacy
The REAL Impact of Big Data on Privacy
 
Big data 4 4 the art of the possible 4-en-web
Big data 4 4 the art of the possible 4-en-webBig data 4 4 the art of the possible 4-en-web
Big data 4 4 the art of the possible 4-en-web
 
Unveiling the Power of Data Science.pdf
Unveiling the Power of Data Science.pdfUnveiling the Power of Data Science.pdf
Unveiling the Power of Data Science.pdf
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementation
 

More from Data Driven Innovation

Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...Data Driven Innovation
 
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...Data Driven Innovation
 
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...Data Driven Innovation
 
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...Data Driven Innovation
 
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...Data Driven Innovation
 
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Data Driven Innovation
 
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...Data Driven Innovation
 
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...Data Driven Innovation
 
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...Data Driven Innovation
 
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...Data Driven Innovation
 
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)Data Driven Innovation
 
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...Data Driven Innovation
 
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)Data Driven Innovation
 
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...Data Driven Innovation
 
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...Data Driven Innovation
 
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...Data Driven Innovation
 
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...Data Driven Innovation
 
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)Data Driven Innovation
 
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)Data Driven Innovation
 
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...Data Driven Innovation
 

More from Data Driven Innovation (20)

Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
Integrazione della mobilità elettrica nei sistemi urbani (Stefano Carrese, Un...
 
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
La statistica ufficiale e i trasporti marittimi nell'era dei big data (Vincen...
 
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
How can we realize the Mobility as a Service (Maas) (Andrea Paletti, London S...
 
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
 
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
CHNet-DHLab: Servizi Cloud a supporto dei beni culturali (Fabio Proietti, INF...
 
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
 
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
Una infrastruttura per l’accesso al patrimonio culturale: il Progetto del Por...
 
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
Utilizzo dei Big data per l’analisi dei flussi veicolari e della mobilità (Ma...
 
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
I dati personali nell'analisi comportamentale della mobilità di dipendenti e ...
 
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
Estrarre valore dai dati: tecnologie per ottimizzare la mobilità del futuro (...
 
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
Le piattaforme dati per la mobilità nelle città italiane (Marco Mena, EY)
 
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
WiseTown, un ecosistema di applicazioni e strumenti per migliorare la qualità...
 
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
CityOpenSource as a civic tech tool (Ilaria Vitellio, CityOpenSource)
 
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...Big Data Confederation: toward the local urban data market place (Renzo Taffa...
Big Data Confederation: toward the local urban data market place (Renzo Taffa...
 
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
Making citizens the eyes of policy makers: a sweet spot for hybrid AI? (Danie...
 
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
Dall'Agenda Digitale alla Smart City: il percorso di Roma Capitale verso il D...
 
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
Reusing open data: how to make a difference (Vittorio Scarano, Università di ...
 
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
Gestire i beni culturali con i big data (Sandro Stancampiano, Istat)
 
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
Data Governance: cos’è e perché è importante? (Elena Arista, Erwin)
 
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
Data driven economy: bastano i dati per avviare una start up? (Gabriele Anton...
 

Recently uploaded

Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 

Recently uploaded (20)

Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

Data ethics and machine learning: discrimination, algorithmic bias, and how to discover them. Dino Pedreschi

  • 1. Data ethics and machine learning Discrimination, algorithmic bias, and how to discover them. DINO PEDRESCHI KDDLAB, DIPARTIMENTO DI INFORMATICA, UNIVERSITÀ DI PISA
  • 2.
  • 3.
  • 5. 5 Spot business trends Prevent diseases Fight crime Improve transportation Personalised services Improve wellbeing
  • 6. Event Detection Detecting events in a geographic area classifying the different kinds of users. City of Rome Metropolitan area Covered geographical region: city of Rome Dataset size per snapshot: ≈ 1.2 GBytes per day Number of records: ≈ 5.6 million lines per day 8 months between 2015 and 2016
  • 13. Estimating wellbeing with mobility data AI and Big Data 13 A B C H W
  • 14. Predicting GDP with Retail Market data 14 generic utility function (rationality) personal utility function (diversity) Product Price Quantity Needed Sophistication R2 = 17.25% R2 = 32.38% R2 = 85.72%
  • 15. Risks of big data 15
  • 16. Big Data, Big Risks Big data is algorithmic, therefore it cannot be biased! And yet… • All traditional evils of social discrimination, and many new ones, exhibit themselves in the big data ecosystem • Because of its tremendous power, massive data analysis must be used responsibly • Technology alone won’t do: also need policy, user involvement and education efforts 16
  • 17. By 2018, 50% of business ethics violations will occur through improper use of big data analytics [source: Gartner, 2016] AI and Big Data 17
  • 18. AI and Big Data 18
  • 19. 19
  • 20. The danger of black boxes - 1 The COMPAS score (Correctional Offender Management Profiling for Alternative Sanctions) A 137-questions questionnaire and a predictive model for “risk of crime recidivism.” The model is a proprietary secret of Northpointe, Inc. The data journalists at propublica.org have shown that • the prediction accuracy of recidivism is rather low (around 60%) • the model has a strong ethnic bias ◦ blacks who did not reoffend are classified as high risk twice as much as whites who did not reoffend ◦ whites who did reoffend were classified as low risk twice as much as blacks who did reoffend. AI and Big Data 20
  • 21. The danger of black boxes -2 The three major US credit bureaus, Experian, TransUnion, and Equifax, providing credit scoring for millions of individuals, are often discordant. In a study of 500,000 records, 29% of consumers received credit scores that differ by at least fifty points between credit bureaus, a difference that may mean tens of thousands dollars over the life of a mortgage [CRS+16]. AI and Big Data 21
  • 22. The danger of black boxes - 3 In 2010, some homeowners with a regular payment history of their mortgage reported a sudden drop of forty points in their credit score, soon after their own enquiry. AI and Big Data 22
  • 23. The danger of black boxes - 4 During the 1970s and 1980s, St. George’s Hospital Medical School in London used a computer program for initial screening of job applicants. The program used information from applicants’ forms, which contained no reference to ethnicity. The program was found to unfairly discriminate against female applicants and ethnic minorities (inferred from surnames and place of birth), less likely to be selected for interview [LM88]. AI and Big Data 23
  • 24. The danger of black boxes - 5 In a recent paper at SIGKDD 2016 [RSG16] the authors show how an accurate but untrustworthy classifier may result from an accidental bias in the training data. In a task of discriminating wolves from huskies in a dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely on … AI and Big Data 24
  • 25. The danger of black boxes - 5 In a recent paper at SIGKDD 2016 [RSG16] the authors show how an accurate but untrustworthy classifier may result from an accidental bias in the training data. In a task of discriminating wolves from huskies in a dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely on … the presence of snow in the background! [RSG16] “Why Should I Trust You?” Explaining the Predictions of Any Classifier SIGKDD 2016 Conference Paper AI and Big Data 25
  • 26. Deep learning is creating computer systems we don't fully understand www.theverge.com/2016/7/12/12158238/first-click-deep-learning-algorithmic- black-boxes AI and Big Data 26
  • 27. Is AI Permanently Inscrutable? nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable 27
  • 28. The danger of black boxes - 6 In a recent study at Princeton Univ, the authors show how the semantics derived automatically from large text/web corpora contains human biases ◦ E.g., names associated with whites were found to be significantly easier to associate with pleasant than unpleasant terms, compared to names associated with black people. Therefore, any machine learning model trained on text data for, e.g., sentiment or opinion mining has a strong chance of inheriting the prejudices reflected in the human-produced training data. AI and Big Data 28
  • 29. Human Bias AI and Big Data 29
  • 30. Human Bias can be Learned - 7 AI and Big Data 30
  • 31. As we stated in our 2008 SIGKDD paper that started the field of discrimination-aware data mining [PRT08]: “learning from historical data recording human decision making may mean to discover traditional prejudices that are endemic in reality, and to assign to such practices the status of general rules, maybe unconsciously, as these rules can be deeply hidden within the learned classifier.” AI and Big Data 31
  • 33. Satya Nadella's rules for AI www.theverge.com/2016/6/29/12057516/satya-nadella-ai-robot-laws AI and Big Data 33
  • 34. U.S. – F.T.C. Salvatore Ruggieri 34 www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-or- exclusion-understanding-issues/160106big-data-rpt.pdf (Sept. 2014)
  • 35. U.S. – White House Salvatore Ruggieri 35 www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1 _2014.pdf (May 2014)
  • 36. U.S. – White House Salvatore Ruggieri 36 www.whitehouse.gov/sites/default/files/microsites/ostp/2016_0504_data_disc rimination.pdf (May 2016)
  • 37. U.S. – White House www.whitehouse.gov/sites/default/files/whitehouse_files/microsites/ostp/NST C/preparing_for_the_future_of_ai.pdf (October 2016) AI and Big Data 37
  • 38. E.U. - EDPS Salvatore Ruggieri 38 secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Con sultation/Opinions/2015/15-11-19_Big_Data_EN.pdf
  • 39. E.U. - EDPS Salvatore Ruggieri 39 secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Con sultation/Opinions/2015/15-09-11_Data_Ethics_EN.pdf
  • 42. Value-Sensitive Design Design for privacy Design for security Design for inclusion Design for sustainability Design for democracy Design for safety Design for transparency Design for accountability Design for human capabilities AI and Big Data 42
  • 43. EU Projects: SoBigData.eu Social Mining & Big Data Ecosystem project (SoBigData, H2020-INFRAIA-2014-2015, duration: 2015-2019, www.sobigdata.eu AI and Big Data 43
  • 44. Master Universitario Di II Livello BigData Technology BigData Sensing&Procurement BigData Mining BigData StoryTelling BigData Ethics Il Master Big Data ha l’obiettivo di formare“data scientists”,dei professionisti dotati di un mix di competenze multidisciplinari che permettono non solo di acquisire dati ed estrarne conos- cenza, ma anche di raccontare“storie” attraverso questi dati, a supporto delle decisioni, della creatività e dello sviluppo di servizi innovativi, e di saper gestire le ripercussioni etiche e legali dei Big Data, che spesso contengono informazioni personali e suscitano problematiche relative alla privacy, alla trasparenza,alla consapevolezza. Aree di innovazione socio-economica: BigData for Social Good BigData forBusiness Big Data AnalyticsESocial Mining SoBigData Data Ethics Literacy Rapporto MIUR su Big Data, 28 Luglio 2016 ◦ www.istruzione.it/allegati/2016/bigdata.pdf Master UNIPI in Big Data Analytics & Social Mining ◦ masterbigdata.it AI and Big Data 44
  • 46. AI and Big Data 46
  • 47. Discrimination discovery Given: ◦ an historical database of decision records, each describing features of an applicant to a benefit ◦ e.g., a credit request to a bank and the corresponding on credit approval/denial ◦ some designated categories of applicants, such as groups protected by anti-discrimination laws, find whether, and in which circumstances, there are evidences of discrimination of the designated categories that emerge from the data. DCUBE: Discrimination Discovery in Databases 47
  • 48. German Credit dataset DCUBE: Discrimination Discovery in Databases 48
  • 49. How? Fight with the same weapons Idea: use data mining to discover discrimination ◦ the decision policies hidden in a database can be represented by decision rules and discovered by frequent pattern mining ◦ Once found all such decision rules, highlight all potential niches of discrimination by filtering the rules using a measure that quantifies the discrimination risk. DCUBE: Discrimination Discovery in Databases 49
  • 50. Discrimination discovery from data FOREIGN_WORKER=yes & PURPOSE=new_car & HOUSING=own  CREDIT=bad ◦ elift = 5,19 supp = 56 conf = 0,37 elift = 5,19 means that foreign workers have more than 5 times more probability of being refused credit than the average population (even if they own their house). 50
  • 51.  Outcome:  Funded  Not funded  Conditionally funded Case Study: grant evaluation 51
  • 52. Dataset attributes 52 Features of the PI Project costs Research Area Project Evaluation
  • 53. A potentially discriminatory rule Antecedent ◦ Project proposals in “Physical and Analytical Chemical Sciences” ◦ Young females ◦ Total cost of 1,358,000 Euros or above Possible interpretation ◦ “Peer-reviewers of panel PE4 trusted young females requiring high budgets less than males leading similar projects” 53
  • 54. Case study: US Harmonized Tariff System US Harmonized Tariff System (HTS) https://hts.usitc.gov/ Detailed tariff classification system for merchandise imported to US Chapter 61, 62, 64, 65: apparels ◦ Different taxes for same garments separately produced for male and female ◦ Description is at semi-structured form 64.4¢/kg + 18.8%96¢/doz + 1.4%8.5%Women and girls 38.6¢/kg + 10%08.9%Men and boys CoatsFur felt hatsCotton pajamas Different taxes for same apparels for men and women 64.4¢/kg + 18.8%96¢/doz + 1.4%8.5%Women and girls 38.6¢/kg + 10%08.9%Men and boys CoatsFur felt hatsCotton pajamas Different taxes for same apparels for men and women 54 Women: 14% Men: 9% 1.3 billions USD!!!
  • 55. AI and Big Data 55 Totes-Isotoner Corp. v. U.S. Rack Room Shoes Inc. and Forever 21 Inc. vs U.S. Court of International Trade U.S. Court of Appeals for the Federal Circuit (2014) “[…] the courts may have concluded that Congress had no discriminatory intent when ruling the HTS, but there is little doubt that gender-based tariffs have discriminatory impact”
  • 56. Sample rule from the HTS dataset AI and Big Data 56
  • 58. Soccer Player Ratings How humans evaluate sports performance?
  • 59.
  • 62. Wrapping up AI AND BIG DATA 62
  • 63. Right of explanation • Applying AI within many domains requires transparency and responsibility: • health care • finance • surveillance • autonomous vehicles • Government • EU General Data Protection Regulation (April 2016) establishes (?) a right of explanation for all individuals to obtain “meaningful explanations of the logic involved” when automated (algorithmic) individual decision- making, including profiling, takes place. • In sharp contrast, (big) data-driven AI/ML models are often black boxes. AI and Big Data 63
  • 64. Accountability “Why exactly was my loan application rejected?” “What could I have done differently so that my application would not have been rejected?” AI and Big Data 64
  • 65. Social Mining & Big Data Ecosystem www.sobigdata.eu
  • 66. 66 Knowledge Discovery & Data Mining Lab http://kdd.isti.cnr.it
  • 67. Special thanks • Salvatore Ruggieri • Franco Turini • Fosca Giannotti • Anna Monreale • Luca Pappalardo SMARTCATs