Andrej Zwitter - The speed of development in Big Data and associated phenomena, such as social media, has surpassed the capacity of the average consumer to understand his or her actions and their knock-on effects. Responses will have to take these limitations into account and shift the responsibility for ethical conduct to the engineering and data-science side of data-driven innovation. A code of conduct is not enough - innovation in a datafied society needs to abide by principles guiding "ethics by design" and the responsible use of data.
What is ethics by design? User experience comes first? User safety comes first? User rights come first? All quite unclear propositions. Aren't safety and rights themselves forms of user experience? Could too much care for rights limit the user experience? Isn't the user herself responsible for her own actions?
We must realize that big data, like any other tool, can be used for good and bad purposes. In this sense, the decision by the European Court of Justice against the Safe Harbour Agreement on human rights grounds is understandable.
States, international organizations and private actors now employ big data in a variety of spheres. It is important that all those who profit from big data are aware of their moral responsibility. For this reason, the Data for Humanity Initiative was established, with the goal of disseminating an ethical code of conduct for big data use. This initiative advances five fundamental ethical principles for big data users.
States are out of their depth. Corporations are steered towards profit. Profit is easier to accumulate with power over customers.
If corporations do not abide by ethical standards, individuals have to be even more ethical.
Indeed, ethics by design would suggest that only as much data is produced and stored as is strictly necessary, because the power that data (and eventually knowledge) gives to data collectors lacks any checks and balances. However, the lure of omnipotence through omniscience is very powerful. Sensing has become so pervasive and detailed that virtually any human action is captured by sensors in our vicinity, whether cellphones, sensors in our cars, our homes, etc. The internet of things projects us into the digital world. For now, much of the sensory data that does not come from our phones or personal devices is dissociated from our identity. However, RFID chips or similar identifiers in our phones, or even our bodies, might allow external sensors to associate any data with individuals. Already today, we are adding to our biological DNA a digital DNA that describes our behavior, our preferences and our characters, a DNA that is in part accessible to anyone with the right technical skills. In this sense we have truly reached the era of the homo digitalis.
I will not go through these issues in detail. They are complex matters, cited here to illustrate that even research has not yet reached a clear-cut conclusion.
Biases in datasets. Non-intentional biases: sensor placement, conventional collection, sensor unit of analysis, north-south divide. Intentional biases: target group, preconceptions, confirmation bias. Structural bias: what reads as discrimination in the representation is actually the result of socio-economic structural discrimination effects reflected in the data.
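A minimal sketch of non-intentional sensor-placement bias, using entirely hypothetical numbers: if air-quality sensors cover affluent districts far more densely than poor ones, the aggregate they report misrepresents the city.

```python
import random

random.seed(0)

# Hypothetical city: pollution is far higher in poor districts.
population = (
    [{"district": "affluent", "pollution": random.gauss(20, 5)} for _ in range(5000)]
    + [{"district": "poor", "pollution": random.gauss(60, 10)} for _ in range(5000)]
)

# Sensor placement bias: 90% coverage of affluent blocks, 10% of poor ones.
sampled = [p for p in population
           if random.random() < (0.9 if p["district"] == "affluent" else 0.1)]

true_mean = sum(p["pollution"] for p in population) / len(population)
sampled_mean = sum(p["pollution"] for p in sampled) / len(sampled)

print(f"true city-wide mean pollution: {true_mean:.1f}")
print(f"sensor-reported mean:          {sampled_mean:.1f}")  # biased low
```

No one intended the distortion; it falls straight out of where the sensors happen to stand, which is the point of the "non-intentional" category above.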
Data cleaning - technical standards of cleaning, how to fill gaps, what levels of aggregation/analysis, meta-data and the loss of context
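The gap-filling problem can be made concrete with a toy example (the rainfall series and its values are invented for illustration): three defensible cleaning choices yield three different totals from the same raw data.

```python
# Hypothetical daily rainfall readings; None marks a missing sensor value.
raw = [2.0, None, 0.0, 8.0, None, None, 1.0, 0.0]

def drop_missing(xs):
    """Cleaning choice 1: discard gaps entirely."""
    return [x for x in xs if x is not None]

def mean_fill(xs):
    """Cleaning choice 2: impute gaps with the observed mean."""
    obs = drop_missing(xs)
    m = sum(obs) / len(obs)
    return [m if x is None else x for x in xs]

def forward_fill(xs):
    """Cleaning choice 3: carry the last observed value forward."""
    out, last = [], 0.0
    for x in xs:
        last = last if x is None else x
        out.append(last)
    return out

for name, cleaned in [("drop", drop_missing(raw)),
                      ("mean-fill", mean_fill(raw)),
                      ("forward-fill", forward_fill(raw))]:
    print(f"{name:>12}: total rainfall = {sum(cleaned):.1f}")
```

Each method is a "technical standard of cleaning" in its own right, yet the downstream analysis inherits whichever assumption was baked in, usually with the meta-data about that choice lost along the way.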
Machine learning - supervised learning carries the biases of the person doing the labeling; unsupervised learning carries the biases in the data; nudging effects; bias cascade in analysis, given two or more datasets that contain reinforcing biases.
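A toy sketch of "biases in the person" for supervised learning, with invented numbers and a deliberately naive model: a labeler who historically rejected equally qualified group-B candidates produces training labels that any frequency-based learner will faithfully reproduce.

```python
import random

random.seed(1)

def historical_label(candidate):
    """Hypothetical biased labeler: correct rule plus extra group-B rejections."""
    qualified = candidate["score"] > 0.5          # the real qualification rule
    if candidate["group"] == "B" and random.random() < 0.4:
        return False                               # biased extra rejection
    return qualified

data = [{"group": random.choice("AB"), "score": random.random()}
        for _ in range(10000)]
for c in data:
    c["label"] = historical_label(c)

def approval_rate(group):
    """What a naive model 'learns': per-group approval frequency in the labels."""
    members = [c for c in data if c["group"] == group]
    return sum(c["label"] for c in members) / len(members)

print(f"learned approval rate, group A: {approval_rate('A'):.2f}")
print(f"learned approval rate, group B: {approval_rate('B'):.2f}")
```

The qualification distribution is identical across groups; the gap in learned rates comes entirely from the labeler. Feed that output into a second biased dataset and the cascade mentioned above begins.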
Analysis, input-output disconnect: the dashboard problem - looking at the dashboard as a representation of reality rather than as an image painted by an artist (data visualization).
Belief in the objectivity of data, analysis, etc.
Representation: Visualization is an art rather than a science, its interpretation and action guidance relies solely on the analytic skills of the interpreter and his/her ability to recontextualize the data.
The output-action disconnect is the discrepancy between the output, its interpretation and the suggested path of action versus the actual path of action.
The action-input disconnect is the same discrepancy when feeding action-dependent data back into the loop ("evidence-based learning"), with the human factor as the big black-box variable.
The DSA code's Rule 3 is shockingly weak: abide by the client's wishes, abide by the law. Too little.
I think regulating the internet is quite impossible without also sacrificing many of its advantages. That some will use the new tools modern communication technology provides to gain power over others is an obvious risk. However, it helps to realize that their power is mostly based on having and controlling knowledge about people. In my opinion, the best way to counteract this kind of power is through the dissemination of knowledge to everyone, from school kids to pensioners, from lawyers to IT and computer engineers. In essence, the more the average person knows about the technology that shapes her daily life, the more she is able to make conscious choices. This knowledge should also cover the ethical baselines of our society. Ethics by design simply means that those designing new technology and software have to uphold not only the ethical standards of their profession but also the ethical standards that pertain to society at large. If this is clear, then engineers in the service of big private corporations will think twice before they implement engineering marvels that might have unethical societal implications.
Data for humanity: 1. “Do no harm”. The digital footprint that everyone now leaves behind exposes individuals, social groups and society as a whole to a certain degree of transparency and vulnerability. Those who have access to the insights afforded by big data must not harm third parties.
2. Ensure that data is used in such a way that the results will foster the peaceful coexistence of humanity. The selection of content and access to data influences the world view of a society. Peaceful coexistence is only possible if data scientists are aware of their responsibility to provide even and unbiased access to data.
3. Use data to help people in need. In addition to being economically beneficial, innovation in the sphere of big data could also create additional social value. In the age of global connectivity, it is now possible to create innovative big data tools which could help to support people in need.
4. Use data to protect nature and reduce pollution of the environment. One of the biggest achievements of big data analysis is the development of efficient processes and synergy effects. Big data can only offer a sustainable economic and social future if such methods are also used to create and maintain a healthy and stable natural environment.
5. Use data to eliminate discrimination and intolerance and to create a fair system of social coexistence. Social media has created a strengthened social network. This can only lead to long-term global stability if it is built on the principles of fairness, equality and justice.
The engineer is the expert on the subject. Opt-outs are tools that shift the responsibility to the laymen.
"Do not shift expert decisions to laymen"
Take responsibility as an expert and responsibility as a member of society.
Ensure that people are clearly presented with a choice.
Identify the purpose of the innovation → Assess the societal impact of the innovation → Avoid/reduce negative societal impacts → Take responsibility for ethical impacts → Design → Monitor & evaluate
Big Data and Ethical Innovation
A CODE OF CONDUCT FOR BIG DATA INNOVATION
DATA INNOVATION AND ETHICS BY DESIGN
HOW DIFFERENT IS BIG DATA
• Proclivities and Differences:
• From Tidy to Messy
• From Causality to Correlation
• From Individuals to Groups (from PII to DII)
• From Individual Agency to Networked Agency
• From Regulation to Nudging
• From Prediction to Prevention
• A code of conduct for Big Data or Data Science in general
DATA HANDLING, ANALYSIS,
• Biases in datasets – non-intentional biases; intentional biases; structural biases
• Data cleaning – raw / technically correct / tidy / aggregate / meta-data
• Analysis – input-output disconnect
• Machine learning – supervised / unsupervised
• Objectiveness of dataset, cleaning process & analysis
• Values or Graphs: audience dependency, audience ability, interpretation and recontextualization
• Output-action disconnect (leads further into an action-input disconnect)
HUMAN(E) BIG DATA
• Technical and ethical standards as prerequisites of
• Basic principles of the practice
• Skillful execution
• Loyalty to the trade
• Representation of the trade
• Basic principles of societal ethical standards
• Harm principle
• Trust principle
• Charity principle
DATA SCIENCE CODE OF CONDUCT (DSA)
• Rule 2 – Competence
• Rule 3 – Scope of Data Science Professional Services Between Client and Data Scientist
• Rule 4 – Communication with Clients
• Rule 5 – Confidential Information
• Rule 6 – Conflicts of Interest
• Rule 7 – Duties to Prospective Client
• Rule 8 – Data Science Evidence, Quality of Data and Quality of Evidence
• Rule 9 – Misconduct
DATA FOR HUMANITY
• Big data as a tool in need of rules (Zicari & Zwitter 2015)
• Passive and active duties dependent on the profession
• Do no harm
• Ensure peaceful coexistence
• Help people in need
• Protect the environment
• Eliminate discrimination
UNIVERSAL CODE OF PROFESSIONAL
• Do no harm
• Do your best
• The right reasons (Intentio Recta)
• Living with your deeds