Andrej Zwitter - The speed of development in Big Data and associated phenomena, such as social media, has surpassed the capacity of the average consumer to understand his or her actions and their knock-on effects. Responses will have to take these limitations into account and shift the responsibility for ethical conduct to the engineering and data-science side of data-driven innovation. A code of conduct is not enough - innovation in a datafied society needs to abide by principles guiding "ethics by design" and the responsible use of data.
What is ethics by design? User experience comes first? User safety comes first? User rights come first? All quite unclear propositions. Aren't safety and rights themselves forms of user experience? Could too much care for rights limit the user experience? Isn't the user herself responsible for her own actions?
We must realize that big data, like any other tool, can be used for good and bad purposes. In this sense, the decision by the European Court of Justice against the Safe Harbour Agreement on human rights grounds is understandable.
States, international organizations and private actors now employ big data in a variety of spheres. It is important that all those who profit from big data are aware of their moral responsibility. For this reason, the Data for Humanity Initiative was established, with the goal of disseminating an ethical code of conduct for big data use. This initiative advances five fundamental ethical principles for big data users.
States are out of their depth. Corporations are steered towards profit. Profit is easier to accumulate with power over customers.
If corporations do not abide by ethical standards, individuals have to be even more ethical.
Indeed, ethics by design would suggest that only as much data is produced and stored as is strictly necessary, because the power that data (and eventually knowledge) gives to data collectors lacks any checks and balances. However, the lure of omnipotence through omniscience is very powerful. Sensing has become so pervasive and detailed that virtually any human action is captured by sensors in our vicinity, whether cellphones, sensors in our cars, our homes, etc. The internet of things projects us into the digital world. For now, much of the sensory data that does not come from our phones or personal devices is dissociated from our identity. However, RFID chips or similar identifiers in our phones, or even our bodies, might allow external sensors to associate any data with individuals. Already today, we are adding to our biological DNA a digital DNA that describes our behavior, our preferences and our characters, a DNA that is in part accessible to anyone with the right technical skills. In this sense we have truly reached the era of the homo digitalis.
I will not go through these issues in detail. They are complex matters, cited here to illustrate that even research has not yet reached a clear-cut conclusion.
Biases in datasets. Non-intentional biases: sensor placement, conventional collection, sensor unit of analysis, north-south divide. Intentional biases: target group, preconceptions, confirmation bias. Structural bias: what reads as discrimination in the representation is actually the result of socio-economic structural discrimination effects reflected in the data.
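A minimal sketch of non-intentional sensor-placement bias, using entirely hypothetical numbers: if air-quality sensors cover affluent districts far more densely than poor ones, the aggregate they report misrepresents the city.

```python
import random

random.seed(0)

# Hypothetical city: pollution is far higher in poor districts.
population = (
    [{"district": "affluent", "pollution": random.gauss(20, 5)} for _ in range(5000)]
    + [{"district": "poor", "pollution": random.gauss(60, 10)} for _ in range(5000)]
)

# Sensor placement bias: 90% coverage of affluent blocks, 10% of poor ones.
sampled = [p for p in population
           if random.random() < (0.9 if p["district"] == "affluent" else 0.1)]

true_mean = sum(p["pollution"] for p in population) / len(population)
sampled_mean = sum(p["pollution"] for p in sampled) / len(sampled)

print(f"true city-wide mean pollution: {true_mean:.1f}")
print(f"sensor-reported mean:          {sampled_mean:.1f}")  # biased low
```

No one intended the distortion; it falls straight out of where the sensors happen to stand, which is the point of the "non-intentional" category above.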
Data cleaning - technical standards of cleaning, how to fill gaps, what levels of aggregation/analysis, meta-data and the loss of context
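The gap-filling problem can be made concrete with a toy example (the rainfall series and its values are invented for illustration): three defensible cleaning choices yield three different totals from the same raw data.

```python
# Hypothetical daily rainfall readings; None marks a missing sensor value.
raw = [2.0, None, 0.0, 8.0, None, None, 1.0, 0.0]

def drop_missing(xs):
    """Cleaning choice 1: discard gaps entirely."""
    return [x for x in xs if x is not None]

def mean_fill(xs):
    """Cleaning choice 2: impute gaps with the observed mean."""
    obs = drop_missing(xs)
    m = sum(obs) / len(obs)
    return [m if x is None else x for x in xs]

def forward_fill(xs):
    """Cleaning choice 3: carry the last observed value forward."""
    out, last = [], 0.0
    for x in xs:
        last = last if x is None else x
        out.append(last)
    return out

for name, cleaned in [("drop", drop_missing(raw)),
                      ("mean-fill", mean_fill(raw)),
                      ("forward-fill", forward_fill(raw))]:
    print(f"{name:>12}: total rainfall = {sum(cleaned):.1f}")
```

Each method is a "technical standard of cleaning" in its own right, yet the downstream analysis inherits whichever assumption was baked in, usually with the meta-data about that choice lost along the way.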
Machine learning - supervised learning carries the biases of the person doing the labeling; unsupervised learning carries the biases in the data; nudging effects; bias cascade in analysis, given two or more datasets that contain reinforcing biases.
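A toy sketch of "biases in the person" for supervised learning, with invented numbers and a deliberately naive model: a labeler who historically rejected equally qualified group-B candidates produces training labels that any frequency-based learner will faithfully reproduce.

```python
import random

random.seed(1)

def historical_label(candidate):
    """Hypothetical biased labeler: correct rule plus extra group-B rejections."""
    qualified = candidate["score"] > 0.5          # the real qualification rule
    if candidate["group"] == "B" and random.random() < 0.4:
        return False                               # biased extra rejection
    return qualified

data = [{"group": random.choice("AB"), "score": random.random()}
        for _ in range(10000)]
for c in data:
    c["label"] = historical_label(c)

def approval_rate(group):
    """What a naive model 'learns': per-group approval frequency in the labels."""
    members = [c for c in data if c["group"] == group]
    return sum(c["label"] for c in members) / len(members)

print(f"learned approval rate, group A: {approval_rate('A'):.2f}")
print(f"learned approval rate, group B: {approval_rate('B'):.2f}")
```

The qualification distribution is identical across groups; the gap in learned rates comes entirely from the labeler. Feed that output into a second biased dataset and the cascade mentioned above begins.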
Analysis, input-output disconnect: the dashboard problem - looking at the dashboard as a representation of reality rather than as an image painted by an artist (data visualization).
Belief in the objectivity of data, analysis, etc.
Representation: Visualization is an art rather than a science, its interpretation and action guidance relies solely on the analytic skills of the interpreter and his/her ability to recontextualize the data.
The output-action disconnect is the discrepancy between the output, its interpretation and the suggested path of action versus the actual path of action.
The action-input disconnect is the same discrepancy when feeding action-dependent data back into the loop ("evidence-based learning"), with the human factor as the big black-box variable.
The DSA code's Rule 3 is shockingly weak: abide by the client's wishes, abide by the law. Too little.
I think regulating the internet is quite impossible without also sacrificing many of its advantages. That some will use the new tools modern communication technology provides to gain power over others is an obvious risk. However, it helps to realize that their power is mostly based on having and controlling knowledge about people. In my opinion, the best way to counteract this kind of power is through the dissemination of knowledge to everyone, from school kids to pensioners, from lawyers to IT and computer engineers. In essence, the more the average person knows about the technology that shapes her daily life, the more she is able to make conscious choices. This knowledge should also cover the ethical baselines of our society. Ethics by design simply means that those designing new technology and software have to uphold not only the ethical standards of their profession but also the ethical standards that pertain to society at large. If this is clear, then engineers in the service of big private corporations will think twice before they implement engineering marvels that might have unethical societal implications.
Data for humanity: 1. “Do no harm”. The digital footprint that everyone now leaves behind exposes individuals, social groups and society as a whole to a certain degree of transparency and vulnerability. Those who have access to the insights afforded by big data must not harm third parties.
2. Ensure that data is used in such a way that the results will foster the peaceful coexistence of humanity. The selection of content and access to data influences the world view of a society. Peaceful coexistence is only possible if data scientists are aware of their responsibility to provide even and unbiased access to data.
3. Use data to help people in need. In addition to being economically beneficial, innovation in the sphere of big data could also create additional social value. In the age of global connectivity, it is now possible to create innovative big data tools which could help to support people in need.
4. Use data to protect nature and reduce pollution of the environment. One of the biggest achievements of big data analysis is the development of efficient processes and synergy effects. Big data can only offer a sustainable economic and social future if such methods are also used to create and maintain a healthy and stable natural environment.
5. Use data to eliminate discrimination and intolerance and to create a fair system of social coexistence. Social media has created a strengthened social network. This can only lead to long-term global stability if it is built on the principles of fairness, equality and justice.
The engineer is the expert on the subject. Opt-outs are tools that shift the responsibility to the laymen.
"Do not shift expert decisions to laymen"
Take responsibility as an expert and responsibility as a member of society.
Ensure that people are clearly presented with a choice.
Identify the purpose of the innovation → Assess the societal impact of the innovation → Avoid/reduce negative societal impacts → Take responsibility for ethical impacts → Design → Monitor & evaluate
Big Data and Ethical Innovation
A CODE OF CONDUCT FOR BIG DATA INNOVATION
DATA INNOVATION AND ETHICS BY DESIGN
HOW DIFFERENT IS BIG DATA
• Proclivities and Differences:
• From Tidy to Messy
• From Causality to Correlation
• From Individuals to Groups (from PII to DII)
• From Individual Agency to Networked Agency
• From Regulation to Nudging
• From Prediction to Prevention
• A code of conduct for Big Data or Data Science in general
DATA HANDLING, ANALYSIS,
• Biases in datasets – non-intentional biases; intentional biases; structural biases
• Data cleaning – raw / technically correct / tidy / aggregate / meta-data
• Analysis – input-output disconnect
• Machine learning – supervised / unsupervised
• Objectiveness of dataset, cleaning process & analysis
• Values or Graphs: audience dependency, audience ability, interpretation and recontextualization
• Output-action disconnect (leads further into an action-input disconnect)
HUMAN(E) BIG DATA
• Technical and ethical standards as prerequisites of
• Basic principles of the practice
• Skillful execution
• Loyalty to the trade
• Representation of the trade
• Basic principles of societal ethical standards
• Harm principle
• Trust principle
• Charity principle
DATA SCIENCE CODE OF CONDUCT (DSA)
• Rule 2 – Competence
• Rule 3 – Scope of Data Science Professional Services Between Client and Data Scientist
• Rule 4 – Communication with Clients
• Rule 5 – Confidential Information
• Rule 6 – Conflicts of Interest
• Rule 7 – Duties to Prospective Client
• Rule 8 – Data Science Evidence, Quality of Data and Quality of Evidence
• Rule 9 – Misconduct
DATA FOR HUMANITY
• Big data as a tool in need of rules (Zicari & Zwitter 2015)
• Passive and active duties dependent on the profession
• Do no harm
• Ensure peaceful coexistence
• Help people in need
• Protect the environment
• Eliminate discrimination
UNIVERSAL CODE OF PROFESSIONAL
• Do no harm
• Do your best
• The right reasons (Intentio Recta)
• Living with your deeds