Unlocking Indexing and Search Data Goldmine
Written question data
• 1.5 million written questions
• Many fields; we currently use only:
• uri - unique identifier
- A uri is assigned when the question is tabled. Once it is answered, the tabled record is deleted and an answered question is created with a new uri
• uin - not a unique identifier; can be reused in different sessions, and can be missing
• title – can be missing
• questionText
• answerText
• askingMember_ses – members can share the same ses id; disambiguate by their incumbency dates
• answeringMember_ses – members can share the same ses id
• answeringDept_ses
• dateTabled
• dateOfAnswer
• dateForAnswer
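The disambiguation by incumbency dates mentioned above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual implementation: the `Incumbency` structure, the `member_id` values and the example dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Incumbency:
    """One period in office for a member (hypothetical structure)."""
    member_id: str        # hypothetical platform identifier
    ses_id: int           # ses id, possibly shared with other members
    start: date
    end: Optional[date]   # None means the incumbency is still open


def resolve_member(ses_id: int, tabled: date,
                   incumbencies: list[Incumbency]) -> Optional[str]:
    """Return the single member whose incumbency covers the tabling date.

    Returns None when no member, or more than one member, matches,
    so that ambiguous records can be reviewed by hand.
    """
    matches = {
        inc.member_id
        for inc in incumbencies
        if inc.ses_id == ses_id
        and inc.start <= tabled
        and (inc.end is None or tabled <= inc.end)
    }
    return next(iter(matches)) if len(matches) == 1 else None


# Made-up example data: two members sharing ses id 1234.
incumbencies = [
    Incumbency("memberA", 1234, date(1997, 5, 1), date(2005, 4, 11)),
    Incumbency("memberB", 1234, date(2010, 5, 6), None),
]
print(resolve_member(1234, date(2001, 3, 14), incumbencies))  # memberA
print(resolve_member(1234, date(2008, 1, 1), incumbencies))   # None
```

Returning None for both "no match" and "several matches" keeps uncertain records out of the import rather than guessing.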
Schema
Implementation
Answering department ses id
• 191 unique answering department ses ids
• Top 5:
Department of Health (10%)
Home Office (8%)
Ministry of Defence (6%)
Foreign and Commonwealth Office (6%)
Treasury (5%)
• We have only 39 answering bodies in the triple store
• Departments have evolved and changed names; these changes need to be modelled
• 601,991 (40.1%) questions with answering bodies not in the triple store
• Top 5 missing answering bodies
Department of Health
Department of Trade and Industry
Department for Communities and Local Government
Department of the Environment
Department for Culture, Media and Sport
• 108,128 (7.2%) have null answering dept ses id
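Figures like the ones above could be produced by a coverage check along these lines. It is only a sketch: the function name, the report keys and the sample ids are invented for illustration, and it assumes null ids are counted separately from ids that are present but unknown to the triple store.

```python
from collections import Counter
from typing import Iterable, Optional


def coverage_report(dept_ids: Iterable[Optional[int]], known_ids: set[int]) -> dict:
    """Summarise how many questions reference ids missing from a known set."""
    counts = Counter(dept_ids)
    total = sum(counts.values())
    null_count = counts.pop(None, 0)   # questions with no department id at all
    unknown = {d: n for d, n in counts.items() if d not in known_ids}
    unknown_questions = sum(unknown.values())
    return {
        "total": total,
        "unique_ids": len(counts),
        "null": null_count,
        "unknown_ids": len(unknown),
        "unknown_questions": unknown_questions,
        "unknown_share": unknown_questions / total if total else 0.0,
        # Largest gaps first, so the most valuable bodies to model come out on top.
        "top_unknown": sorted(unknown.items(), key=lambda kv: -kv[1])[:5],
    }


# Made-up sample: one id per question, None where the field is null.
report = coverage_report([1, 1, 2, 3, 3, 3, None], known_ids={1})
print(report["unknown_questions"], report["top_unknown"])  # 4 [(3, 3), (2, 1)]
```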
Asking member ses id
• 2,836 unique asking member ses ids
• Top 5
John Bercow (0.8%)
Jim Cunningham (0.7%)
Norman Baker (0.6%)
Paul Flynn (0.6%)
Andrew Rosindell (0.6%)
• Three missing from the triple store
Rt Hon Lord Aberdare
Elaine Thomson
Jeff Cuthbert (National Assembly for Wales)
• 6,942 (0.6%) have null asking member ses id
Answering member ses id
• 834 unique answering member ses ids
• Top 5
Dawn Primarolo (1%)
Adam Ingram (0.8%)
Rosie Winterton (0.8%)
Ben Bradshaw (0.8%)
Elliot Morley (0.7%)
• One missing from the triple store
Rt Hon Lord Aberdare
• 6,744 (0.4%) have null answering member ses id
Other
• Days between Date Tabled and Date Of Answer
• Average 14 days
• Outliers: -748 days, 1,317 days
• Days between Date For Answer and Date Of Answer
• Average 3.8 days
• Outliers: -7,930 days, 7,895 days
• Null uin value
• 347,671 (23%), mainly old data before 2000
• Null title value
• 202,213 (13%), mainly old data before 1993
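The day differences and outliers listed above could be computed along the following lines. This is a sketch with made-up dates; the `low` and `high` bounds are illustrative assumptions, not thresholds taken from the slides.

```python
from datetime import date
from statistics import mean


def day_gaps(pairs):
    """Days from the first date to the second, skipping pairs with a missing date."""
    return [(later - earlier).days for earlier, later in pairs if earlier and later]


def split_outliers(gaps, low=0, high=365):
    """Separate gaps inside [low, high] from those outside it.

    The bounds are illustrative assumptions, not values from the slides.
    """
    plausible = [g for g in gaps if low <= g <= high]
    outliers = [g for g in gaps if g < low or g > high]
    return plausible, outliers


# Made-up (dateTabled, dateOfAnswer) pairs.
pairs = [
    (date(2018, 5, 1), date(2018, 5, 8)),
    (date(2018, 5, 1), date(2018, 5, 15)),
    (date(2018, 5, 1), date(2018, 4, 1)),   # answered before tabled: suspect
    (date(2018, 5, 1), None),               # not yet answered: skipped
]
gaps = day_gaps(pairs)
plausible, outliers = split_outliers(gaps)
print(gaps)             # [7, 14, -30]
print(mean(plausible))  # 10.5
print(outliers)         # [-30]
```

Reporting the average over the plausible range only, and listing outliers separately, stops a handful of bad dates from distorting the mean.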
Recent data
• 70,880 questions tabled since Jan 1, 2017
• Answering department
• 36 unique vs. 191 (all data)
• 3 not in the triple store vs. 152 (all data)
• 9,644 (13.6%) questions with answering bodies not in the triple store vs. 40.1% (all data)
• Asking member
• 1,025 unique vs. 2,836 (all data)
• 1,970 (2.8%) missing vs. 0.6% (all data)
• Answering member
• 150 unique vs. 834 (all data)
• 1,970 (2.8%) missing vs. 0.4% (all data)
• Days between Date Tabled and Date Of Answer
• Average 9 days vs. 14 days (all data)
• Days between Date For Answer and Date Of Answer
• Average 2.7 days vs. 3.8 days (all data)
Querying data
• Fixed query (packaged SPARQL queries)
• Questions asked by a member
https://api.parliament.uk/query/questions_askedby_member?member_id=4fn7q5Wl
• Questions answered by a member
https://api.parliament.uk/query/questions_answeredby_member?member_id=SWXSOmi9
• Search questions by terms in heading
https://api.parliament.uk/query/questions_search_by_title?lowercase_string=health
• OData (you can query in almost any way!)
• Total number of questions https://api.parliament.uk/OData/Question/$count
• Total number of answers https://api.parliament.uk/OData/Answer/$count
• Questions by a member https://api.parliament.uk/OData/Member('0FqjjgNp')/AskingPersonHasQuestion
• Answers by a member https://api.parliament.uk/OData/Member('0FqjjgNp')/AnsweringPersonHasAnswer
• Questions asked on a date
https://api.parliament.uk/OData/Question?$filter=QuestionAskedAt%20eq%202018-05-23T00:00:00Z
• Questions asked between two dates
https://api.parliament.uk/OData/Question?$filter=QuestionAskedAt%20gt%202018-04-23T00:00:00Z%20and%20QuestionAskedAt%20lt%202018-04-26T00:00:00Z
• Correcting answers expanded with corrected answers
https://api.parliament.uk/OData/CorrectingAnswer?$expand=AnswerReplacesAnswer
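The OData URLs above can also be built programmatically. The sketch below only constructs the strings and makes no network request, so it does not assume the service is still reachable. The helper names are hypothetical; the entity and property names are the ones shown on the slide.

```python
from urllib.parse import quote

BASE = "https://api.parliament.uk/OData"


def member_questions_url(member_id: str) -> str:
    """URL for the questions asked by one member."""
    return f"{BASE}/Member('{member_id}')/AskingPersonHasQuestion"


def questions_between_url(start: str, end: str) -> str:
    """URL for questions asked strictly between two ISO 8601 timestamps."""
    flt = f"QuestionAskedAt gt {start} and QuestionAskedAt lt {end}"
    # Percent-encode spaces but keep ':' readable, as in the slide's examples.
    return f"{BASE}/Question?$filter={quote(flt, safe=':')}"


print(member_questions_url("0FqjjgNp"))
print(questions_between_url("2018-04-23T00:00:00Z", "2018-04-26T00:00:00Z"))
```

The resulting strings can be passed to any HTTP client, for example `urllib.request.urlopen` from the standard library.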
Distributions of data
• Follow a power law distribution
[Figure: Distribution of number of questions for asking members (y-axis 0 to 12,000; x-axis 1 to 2,809)]
[Figure: Distribution of number of questions for answering members (y-axis 0 to 16,000; x-axis 1 to 826)]
[Figure: Distribution of number of questions for answering bodies (y-axis 0 to 180,000; x-axis 1 to 190)]
[Figure: Distribution of number of questions for tabling date (y-axis 0 to 1,800)]
[Figure: Distribution of number of questions for tabling date (January 2017 to Now) (y-axis 0 to 1,800; x-axis monthly ticks from 1/6/2017 to 5/6/2018)]
[Figure: Distribution of number of questions for answering date (y-axis 0 to 3,000)]
[Figure: Distribution of number of questions for answering date (January 2017 to Now) (y-axis 0 to 1,200; x-axis monthly ticks from 1/3/2017 to 5/3/2018)]
[Figure: Distribution of number of questions vs. days between date for and of answer (y-axis 0 to 1,000,000; x-axis -40 to 180 days)]
[Figure: Distribution of number of questions vs. days between table date and date of answer (y-axis 0 to 180,000; x-axis 0 to 175 days)]
[Figure: Distribution of term counts in question headings (y-axis 0 to 300,000; x-axis labels are heading terms, a sample of which runs from "dept", "treasury", "minister" and "finance" down to "prescriptions", "ashton" and "inspections")]
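One rough way to check the power-law claim for the distributions above is to sort the counts by rank and fit a straight line on log-log axes. The sketch below uses synthetic counts, not the real data, and an ordinary least-squares fit. A near-straight log-log line is only an informal indication of a power law, not a rigorous statistical test.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), largest count first."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic counts that fall off roughly as 1/rank (assumed shape, not real data).
synthetic = [round(12000 / rank) for rank in range(1, 201)]
slope = loglog_slope(synthetic)
print(round(slope, 2))
```

For counts proportional to rank to the power of minus alpha, the fitted slope is close to minus alpha.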
Member question network
• A way to get an overview of question data
• Nodes: 2,893 members
• Edges:
• 175,484 (member A’s question answered by member B)
• Properties of the network (using Python NetworkX)
• Average Node Degree: 121.3
• Network diameter: 6
• Network radius: 3
• Average shortest path length: 2.6
• Clustering coefficient: 0.3
• Network density: 0.04
• Network Centre:
• Earl Attlee, Lord Hylton, Lord Wallace of Saltaire, Lord Stoddart of Swindon, Earl Howe, Lord Bates, Lord Patten, Lord Pearson of Rannoch, Lord Hoyle, Lord Howell of Guildford, Earl of Shrewsbury, Lord Davies of Oldham, Baroness Chalker of Wallasey, Lord Braine of Wheatley, Lord Waddington, Baroness Neville-Rolfe
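To make the network properties above concrete, here is a dependency-free sketch of how several of them are defined, run on a made-up toy graph. It assumes the network is treated as undirected, which the slides do not state. In NetworkX the corresponding functions are `nx.density`, `nx.diameter`, `nx.radius` and `nx.center`.

```python
from collections import deque


def build_graph(pairs):
    """Undirected adjacency sets from (asking member, answering member) pairs."""
    graph = {}
    for a, b in pairs:
        if a == b:
            continue  # ignore self-loops in this sketch
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def eccentricity(graph, source):
    """Longest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    if len(dist) != len(graph):
        raise ValueError("graph is not connected")
    return max(dist.values())


def summarise(graph):
    """Basic size, degree and distance statistics for a connected graph."""
    n = len(graph)
    m = sum(len(neighbours) for neighbours in graph.values()) // 2
    ecc = {node: eccentricity(graph, node) for node in graph}
    radius = min(ecc.values())
    return {
        "nodes": n,
        "edges": m,
        "average_degree": 2 * m / n,
        "density": 2 * m / (n * (n - 1)),
        "diameter": max(ecc.values()),
        "radius": radius,
        # The centre is the set of nodes whose eccentricity equals the radius.
        "centre": sorted(node for node, e in ecc.items() if e == radius),
    }


# Made-up toy network, not real members.
toy = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")])
print(summarise(toy))
```

As a consistency check on the slide's figures, 2 × 175,484 / 2,893 gives the reported average degree of about 121.3.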
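[Figure: Member question network, all data: 2,893 nodes, 175,484 edges]
[Figure: Member question network for Abortion: 1,281 questions (House of Commons, House of Lords)]
[Figure: Member question network for Brexit: 420 questions (House of Commons)]
[Figure: Member question network for Education: 44,714 questions]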
• We are only scratching the surface of the goldmine
• More question data to import
• Other data fields to import
• Subject indexing and related items data to import
• Other types of data to import
• Much more to learn from the data
• Some ideas
• Incorporate answering departments, and terms and topics, in answer networks
• Improve network visualisation
• Navigation, link direction, weights, zoom in to view details of members, etc.
• The public can access question data through the data platform and do fantastic research and discovery!
Further reading
• https://pds.blog.parliament.uk/2017/06/23/a-new-data-service-for-parliament/
• https://pds.blog.parliament.uk/2018/01/24/accessing-semantic-data-with-odata-web-interface/
• https://github.com/ukparliament/ontologies/tree/master/question-and-answer
• https://medium.com/@langsamu/api-parliament-uk-7b87597019a4
• http://odata.github.io/
• http://www.iaeng.org/IJCS/issues_v43/issue_2/IJCS_43_2_03.pdf

Unlocking the Indexing and Search Data Goldmine

  • 2.
  • 3.
  • 4. Written question data • 1.5 million written questions • Many fields, we currently only use: • uri - unique identifier - when tabled, given a uri. Later the tabled one deleted, and an answered question created with new uri • uin - not unique identifier, can be reused in different sessions, and can be missing • title – can be missing • questionText • answerText • askingMember_ses – members share the same ses Id, disambiguate by their incumbency dates • answeringMember_ses – members share the same ses Id • answeringDept_ses • dateTabled • dateOfAnswer • dateForAnswer
  • 6. Answering department ses id • 191 unique answering department ses ids • Top 5: Department of Health (10%) Home Office (8%) Ministry of Defence (6%) Foreign and Commonwealth Office (6%) Treasury (5%) • We only have 39 answering bodies in triple store • Departments evolved and changed names, need to model these • 601,991 (40.1%) questions with answering bodies not in triple store • Top 5 missing answering bodies Department of Health Department of Trade and Industry Department for Communities and Local Government Department of the Environment Department for Culture, Media and Sport • 108,128 (7.2%) have null answering dept ses id
  • 7. Asking member ses id • 2,836 unique asking member ses ids • Top 5 John Bercow (0.8%) Jim Cunningham (0.7%) Norman Baker (0.6%) Paul Flynn (0.6%) Andrew Rosindell (0.6%) • Three missing in the triple store RtHonLord Aberdare Elaine Thomson Jeff Cuthbert (National Assembly for Wales) • 6,942 (0.6%) have null asking member ses id
  • 8. Answering member ses id • 834 unique answering member ses ids • Top 5 Dawn Primarolo (1%) Adam Ingram (0.8%) Rosie Winterton (0.8%) Ben Bradshaw (0.8%) Elliot Morley (0.7%) • One missing in the triple store RtHonLord Aberdare • 6,744 (0.4%) have null answering member ses id
  • 9. Other • Days between Date Tabled and Date Of Answer • Average 14 days • Outliers: -748 days, 1317 days • Days between Date For Answer and Date Of Answer • Average 3.8 days • Outliers: -7930 days, 7895 days • Null uin value • 347671 (23%), mainly old data before 2000 • Null title value • 202213 (13%), mainly old data before 1993
  • 10. Recent data • 70,880 questions tabled since Jan 1, 2017 • Answering department • 36 unique vs. 191 (all data) • 3 not in triple store vs. 152 (all data) • 9,644 (13.6%) questions with answering bodies not in triple store vs. 40.1% (all data) • Asking member • 1025 unique vs. 2,836 (all data) • 1,970 (2.8%) missing vs. 0.6% (all data) • Answering member • 150 unique vs. 834 (all data) • 1,970 (2.8%) missing vs. 0.4% (all data) • Days between Date Tabled and Date Of Answer • Average 9 days vs. 14 days (all data) • Days between Date For Answer and Date Of Answer • Average 2.7 days vs. 3.8 days (all data)
  • 11. Querying data • Fixed query (packaged SPARQL queries) • Questions asked by a member https://api.parliament.uk/query/questions_askedby_member?member_id=4fn7q5Wl • Questions answered by a member https://api.parliament.uk/query/questions_answeredby_member?member_id=SWXSOmi9 • Questions search by terms in heading https://api.parliament.uk/query/questions_search_by_title?lowercase_string=health • OData (you can query in almost any way!) • Total number of questions https://api.parliament.uk/OData/Question/$count • Total number of answers https://api.parliament.uk/OData/Answer/$count • Questions by a member https://api.parliament.uk/OData/Member('0FqjjgNp')/AskingPersonHasQuestion • Answers by a member https://api.parliament.uk/OData/Member('0FqjjgNp')/AnsweringPersonHasAnswer • Questions asked on a date https://api.parliament.uk/OData/Question?$filter=QuestionAskedAt%20eq%202018-05-23T00:00:00Z • Questions asked between two dates https://api.parliament.uk/OData/Question?$filter=QuestionAskedAt%20gt%202018-04- 23T00:00:00Z%20and%20QuestionAskedAt%20lt%202018-04-26T00:00:00Z • Correcting answers expanded with corrected answers https://api.parliament.uk/OData/CorrectingAnswer?$expand=AnswerReplacesAnswer
  • 12. Distributions of data • Follow a power law distribution • [Chart: Distribution of number of questions for asking members] • [Chart: Distribution of number of questions for answering members]
  • 14. [Chart: Distribution of number of questions for tabling date] • [Chart: Distribution of number of questions for tabling date (January 2017 to Now)]
  • 15. [Chart: Distribution of number of questions for answering date] • [Chart: Distribution of number of questions for answering date (January 2017 to Now)]
  • 16. [Chart: Distribution of number of questions vs. days between date for and of answer] • [Chart: Distribution of number of questions vs. days between table date and date of answer]
  • 18. Member question network • A way to get an overview of question data • Nodes: 2,893 members • Edges: • 175,484 (member A’s question answered by member B) • Properties of the network (using Python NetworkX) • Average Node Degree: 121.3 • Network diameter: 6 • Network radius: 3 • Average shortest path length: 2.6 • Clustering coefficient: 0.3 • Network density: 0.04 • Network Centre: • Earl Attlee, Lord Hylton, Lord Wallace of Saltaire, Lord Stoddart of Swindon, Earl Howe, Lord Bates, Lord Patten, Lord Pearson of Rannoch, Lord Hoyle, Lord Howell of Guildford, Earl of Shrewsbury, Lord Davies of Oldham, Baroness Chalker of Wallasey, Lord Braine of Wheatley, Lord Waddington, Baroness Neville-Rolfe
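To make the network properties concrete, here is a minimal sketch of how diameter, radius, centre, density and average degree are defined, computed on a five-member toy graph. The slides used NetworkX; this example swaps in a plain breadth-first search from the standard library so it runs without extra packages. The member labels and edges are invented, and the graph is treated as undirected and connected.

```python
from collections import deque

def build_graph(pairs):
    """Undirected adjacency sets from (asking, answering) member pairs."""
    graph = {}
    for asker, answerer in pairs:
        graph.setdefault(asker, set()).add(answerer)
        graph.setdefault(answerer, set()).add(asker)
    return graph

def eccentricities(graph):
    """Longest shortest-path distance from each node (connected graph assumed)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        result[source] = max(dist.values())
    return result

# Hypothetical edges: the first member's question was answered by the second.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]
graph = build_graph(pairs)
ecc = eccentricities(graph)

diameter = max(ecc.values())   # largest eccentricity
radius = min(ecc.values())     # smallest eccentricity
centre = sorted(n for n, e in ecc.items() if e == radius)

node_count = len(graph)
edge_count = sum(len(neighbours) for neighbours in graph.values()) // 2
density = 2 * edge_count / (node_count * (node_count - 1))
average_degree = 2 * edge_count / node_count

print(diameter, radius, centre, density, average_degree)
```

The same formulas are consistent with the figures on the slide: 2 × 175,484 / 2,893 ≈ 121.3 for the average degree, and 2 × 175,484 / (2,893 × 2,892) ≈ 0.04 for the density.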
  • 19. All data - 2,893 nodes, 175,484 edges
  • 20. Abortion – 1,281 questions House of Commons House of Lords
  • 21. Brexit – 420 questions House of Commons
  • 23. • We are only scratching the surface of the goldmine • More question data to import • Other data fields to import • Subject indexing and related items data to import • Other types of data to import • Much more to learn from the data • Some ideas • Incorporate answering departments, and terms and topics in answer networks • Improve network visualisation • Navigation, link direction, weights, zoom in to view details of members etc • Public can access question data through data platform, and do fantastic research and discovery!
  • 24. Further reading • https://pds.blog.parliament.uk/2017/06/23/a-new-data-service-for-parliament/ • https://pds.blog.parliament.uk/2018/01/24/accessing-semantic-data-with-odata-web-interface/ • https://github.com/ukparliament/ontologies/tree/master/question-and-answer • https://medium.com/@langsamu/api-parliament-uk-7b87597019a4 • http://odata.github.io/ • http://www.iaeng.org/IJCS/issues_v43/issue_2/IJCS_43_2_03.pdf