4. Written question data
• 1.5 million written questions
• Many fields; we currently use only:
• uri - unique identifier
- a question is given a uri when tabled; later the tabled record is deleted and an answered question is created with a new uri
• uin - not a unique identifier; can be reused in different sessions, and can be missing
• title – can be missing
• questionText
• answerText
• askingMember_ses – different members can share the same ses id; disambiguate by their incumbency dates
• answeringMember_ses – different members can share the same ses id
• answeringDept_ses
• dateTabled
• dateOfAnswer
• dateForAnswer
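The disambiguation by incumbency dates mentioned above could look roughly like the sketch below. The `Incumbency` record, the `resolve_member` helper, the ses id 1001 and the member names are all hypothetical and made up for illustration; the real data model may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Incumbency:
    """One member's period in office (hypothetical record shape)."""
    member_name: str
    ses_id: int
    start: date
    end: Optional[date]  # None means the incumbency is still open


def resolve_member(ses_id: int, date_tabled: date,
                   incumbencies: List[Incumbency]) -> List[str]:
    """Return names of members with this ses id who were in office on date_tabled."""
    matches = []
    for inc in incumbencies:
        if inc.ses_id != ses_id:
            continue
        if inc.start <= date_tabled and (inc.end is None or date_tabled <= inc.end):
            matches.append(inc.member_name)
    return matches


# Made-up data: two members sharing one ses id.
incumbencies = [
    Incumbency("Member A", 1001, date(1992, 4, 9), date(1997, 5, 1)),
    Incumbency("Member B", 1001, date(2001, 6, 7), None),
]

print(resolve_member(1001, date(1995, 3, 1), incumbencies))
print(resolve_member(1001, date(2010, 1, 15), incumbencies))
```

An empty result (a date that falls in no incumbency) or more than one match would still need manual review.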
6. Answering department ses id
• 191 unique answering department ses ids
• Top 5:
Department of Health (10%)
Home Office (8%)
Ministry of Defence (6%)
Foreign and Commonwealth Office (6%)
Treasury (5%)
• We only have 39 answering bodies in the triple store
• Departments have evolved and changed names; these changes need to be modelled
• 601,991 (40.1%) questions with answering bodies not in triple store
• Top 5 missing answering bodies
Department of Health
Department of Trade and Industry
Department for Communities and Local Government
Department of the Environment
Department for Culture, Media and Sport
• 108,128 (7.2%) have null answering dept ses id
7. Asking member ses id
• 2,836 unique asking member ses ids
• Top 5
John Bercow (0.8%)
Jim Cunningham (0.7%)
Norman Baker (0.6%)
Paul Flynn (0.6%)
Andrew Rosindell (0.6%)
• Three missing in the triple store
Rt Hon Lord Aberdare
Elaine Thomson
Jeff Cuthbert (National Assembly for Wales)
• 6,942 (0.6%) have null asking member ses id
8. Answering member ses id
• 834 unique answering member ses ids
• Top 5
Dawn Primarolo (1%)
Adam Ingram (0.8%)
Rosie Winterton (0.8%)
Ben Bradshaw (0.8%)
Elliot Morley (0.7%)
• One missing in the triple store
Rt Hon Lord Aberdare
• 6,744 (0.4%) have null answering member ses id
9. Other
• Days between Date Tabled and Date Of Answer
• Average 14 days
• Outliers: -748 days, 1317 days
• Days between Date For Answer and Date Of Answer
• Average 3.8 days
• Outliers: -7930 days, 7895 days
• Null uin value
• 347,671 (23%), mainly old data from before 2000
• Null title value
• 202,213 (13%), mainly old data from before 1993
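Outliers such as a negative gap between tabling and answering suggest data-entry errors, so it may help to separate them before averaging. The sketch below shows one way to do that; the sample records are made up, and the 365-day cut-off (`MAX_PLAUSIBLE_DAYS`) is an assumption, not a rule taken from the data.

```python
from datetime import date
from statistics import mean

MAX_PLAUSIBLE_DAYS = 365  # assumed cut-off for a believable gap


def days_between(earlier, later):
    """Whole days from earlier to later, or None if either date is missing."""
    if earlier is None or later is None:
        return None
    return (later - earlier).days


def split_gaps(gaps, max_days=MAX_PLAUSIBLE_DAYS):
    """Split known gaps into plausible ones and suspicious outliers."""
    known = [g for g in gaps if g is not None]
    plausible = [g for g in known if 0 <= g <= max_days]
    suspicious = [g for g in known if g < 0 or g > max_days]
    return plausible, suspicious


# Made-up (dateTabled, dateOfAnswer) pairs.
records = [
    (date(2018, 5, 1), date(2018, 5, 10)),
    (date(2018, 5, 2), date(2018, 5, 16)),
    (date(2018, 5, 20), date(2018, 5, 15)),  # answered before tabled
    (date(2018, 5, 21), None),               # not yet answered
]

gaps = [days_between(tabled, answered) for tabled, answered in records]
plausible, suspicious = split_gaps(gaps)
print("average plausible gap:", mean(plausible))
print("suspicious gaps:", suspicious)
```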
10. Recent data
• 70,880 questions tabled since Jan 1, 2017
• Answering department
• 36 unique vs. 191 (all data)
• 3 not in triple store vs. 152 (all data)
• 9,644 (13.6%) questions with answering bodies not in triple store vs. 40.1% (all data)
• Asking member
• 1,025 unique vs. 2,836 (all data)
• 1,970 (2.8%) missing vs. 0.6% (all data)
• Answering member
• 150 unique vs. 834 (all data)
• 1,970 (2.8%) missing vs. 0.4% (all data)
• Days between Date Tabled and Date Of Answer
• Average 9 days vs. 14 days (all data)
• Days between Date For Answer and Date Of Answer
• Average 2.7 days vs. 3.8 days (all data)
11. Querying data
• Fixed query (packaged SPARQL queries)
• Questions asked by a member
https://api.parliament.uk/query/questions_askedby_member?member_id=4fn7q5Wl
• Questions answered by a member
https://api.parliament.uk/query/questions_answeredby_member?member_id=SWXSOmi9
• Search questions by terms in the heading
https://api.parliament.uk/query/questions_search_by_title?lowercase_string=health
• OData (you can query in almost any way!)
• Total number of questions https://api.parliament.uk/OData/Question/$count
• Total number of answers https://api.parliament.uk/OData/Answer/$count
• Questions by a member https://api.parliament.uk/OData/Member('0FqjjgNp')/AskingPersonHasQuestion
• Answers by a member https://api.parliament.uk/OData/Member('0FqjjgNp')/AnsweringPersonHasAnswer
• Questions asked on a date
https://api.parliament.uk/OData/Question?$filter=QuestionAskedAt%20eq%202018-05-23T00:00:00Z
• Questions asked between two dates
https://api.parliament.uk/OData/Question?$filter=QuestionAskedAt%20gt%202018-04-23T00:00:00Z%20and%20QuestionAskedAt%20lt%202018-04-26T00:00:00Z
• Correcting answers expanded with corrected answers
https://api.parliament.uk/OData/CorrectingAnswer?$expand=AnswerReplacesAnswer
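The OData URLs above can also be assembled in code rather than by hand. The sketch below only builds URL strings with the standard library and makes no network call; the helper names are hypothetical, and whether the endpoints are still live is not checked here.

```python
from urllib.parse import quote

BASE = "https://api.parliament.uk/OData"


def questions_by_member(member_id: str) -> str:
    """URL for questions asked by a member, following the pattern on the slide."""
    return f"{BASE}/Member('{member_id}')/AskingPersonHasQuestion"


def questions_between(start_iso: str, end_iso: str) -> str:
    """URL for questions asked strictly between two ISO 8601 timestamps."""
    expr = f"QuestionAskedAt gt {start_iso} and QuestionAskedAt lt {end_iso}"
    # Percent-encode spaces but keep ':' readable, matching the slide's URLs.
    return f"{BASE}/Question?$filter={quote(expr, safe=':')}"


print(questions_by_member("0FqjjgNp"))
print(questions_between("2018-04-23T00:00:00Z", "2018-04-26T00:00:00Z"))
```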
12. Distributions of data
• Follow a power law distribution
[Figure: Distribution of number of questions for asking members. Y-axis: 0 to 12,000; x-axis: 1 to 2,809.]
[Figure: Distribution of number of questions for answering members. Y-axis: 0 to 16,000; x-axis: 1 to 826.]
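A rough way to check the power-law claim is to rank members by question count and fit a straight line on log-log axes; a roughly linear fit with a negative slope is consistent with a power law, though it is not a rigorous test. The sketch below uses synthetic counts (not the real data), and `loglog_slope` is a hypothetical helper.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), largest count first."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic counts that fall off as 1/rank, so the slope should be close to -1.
synthetic_counts = [round(12000 / rank) for rank in range(1, 51)]
slope = loglog_slope(synthetic_counts)
print(f"log-log slope: {slope:.2f}")
```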
14. [Figure: Distribution of number of questions for tabling date. Y-axis: 0 to 1,800.]
[Figure: Distribution of number of questions for tabling date (January 2017 to Now). Y-axis: 0 to 1,800; x-axis: monthly ticks from January 2017 to May 2018.]
15. [Figure: Distribution of number of questions for answering date. Y-axis: 0 to 3,000.]
[Figure: Distribution of number of questions for answering date (January 2017 to Now). Y-axis: 0 to 1,200; x-axis: monthly ticks from January 2017 to May 2018.]
16. [Figure: Distribution of number of questions vs. days between date for answer and date of answer. Y-axis: 0 to 1,000,000; x-axis: -40 to 180 days.]
[Figure: Distribution of number of questions vs. days between date tabled and date of answer. Y-axis: 0 to 180,000; x-axis: 0 to 175 days.]
18. Member question network
• A way to get an overview of question data
• Nodes: 2,893 members
• Edges:
• 175,484 (member A’s question answered by member B)
• Properties of the network (using Python NetworkX)
• Average Node Degree: 121.3
• Network diameter: 6
• Network radius: 3
• Average shortest path length: 2.6
• Clustering coefficient: 0.3
• Network density: 0.04
• Network Centre:
• Earl Attlee, Lord Hylton, Lord Wallace of Saltaire, Lord Stoddart of Swindon, Earl Howe,
Lord Bates, Lord Patten, Lord Pearson of Rannoch, Lord Hoyle, Lord Howell of Guildford,
Earl of Shrewsbury, Lord Davies of Oldham, Baroness Chalker of Wallasey, Lord Braine of
Wheatley, Lord Waddington, Baroness Neville-Rolfe
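The figures above were computed with NetworkX (which offers `nx.diameter`, `nx.radius` and `nx.center` for these measures). To show what the measures mean, here is a standard-library sketch that treats edges as undirected, an assumption that is consistent with the reported average degree and density. The toy graph and the helper names are made up for illustration.

```python
from collections import deque


def average_degree(n_nodes: int, n_edges: int) -> float:
    """Average degree of an undirected graph: each edge touches two nodes."""
    return 2 * n_edges / n_nodes


def density(n_nodes: int, n_edges: int) -> float:
    """Fraction of possible undirected edges that are present."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def eccentricities(adj):
    """Eccentricity of each node: its greatest shortest-path distance (via BFS)."""
    ecc = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        ecc[source] = max(dist.values())
    return ecc


# Check the slide's figures from its node and edge counts.
print(round(average_degree(2893, 175484), 1), round(density(2893, 175484), 2))

# Toy undirected graph: A-B, B-C, C-D, B-E.
adj = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}
ecc = eccentricities(adj)
diameter = max(ecc.values())   # largest eccentricity
radius = min(ecc.values())     # smallest eccentricity
centre = sorted(n for n, e in ecc.items() if e == radius)
print(diameter, radius, centre)
```

The network centre is the set of nodes whose eccentricity equals the radius, which is why it can contain many members at once.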
[Figure: example network diagram with nodes A, B and C, labelled 10, 5, 1 and 250.]
23. We are only scratching the surface of the goldmine
• More question data to import
• Other data fields to import
• Subject indexing and related items data to import
• Other types of data to import
• Much more to learn from the data
• Some ideas
• Incorporate answering departments, and terms and topics, in answer networks
• Improve network visualisation
• Navigation, link direction, weights, zoom in to view details of members etc
• The public can access question data through the data platform, and do fantastic research and discovery!