How Graphs Help
Investigative Journalists to
Connect the Dots
Michael.Hunger@neo4j.com
YOW! Conference Australia
December 2019
(Michael Hunger)-[:WORKS_FOR]->(Neo4j)
michael@neo4j.com | @mesirii
Java Champion - Head of Developer Relations @Neo4j
What enables investigative journalism (IJ) like the Panama Papers?
1. Whistleblower + Data Leak
2. Journalistic Collaboration
3. Technology to handle the data
The Data
Whistleblowers risking a lot to expose the truth
Our world today
• Billions of exchanges
• Messages, data, transactions
• Sometimes hidden in plain sight
• Between people, companies,
organizations, governments
• Which includes criminals
panamapapers.sueddeutsche.de/en/
Inside the 2.6 TB of Data
The data had everything
Mossack Fonseca – Mosfon – Panama – „Black Hole“
Jürgen Mossack – Ramon Fonseca
Est. 1977 – Data from 1977 to 2015
The offshore model
You Australians are pretty good at it
The Collaboration
The ICIJ - A global trust network
Investigative Journalism
• Individual Reporters
• Working the paper trail
• Much like a detective story
• Long turnaround times
• Local impact
• Large amounts of data were not easily shareable
Investigative Journalism Today
• Benefit from whistleblowers & leaks
• Sharing a large amount of data
• Use technology and data engineers
• Collaborate globally (Trust!)
• Corroborate suspicions with
other sources locally
• Affects the world at large
• „Golden Age of IJ, never been as important“
Organization of ca. 200 journalists
Based in 65 countries
“Our aim is to bring journalists from different countries
together in teams - eliminating rivalry and promoting
collaboration. Together, we aim to be the
world’s best cross-border investigative team.”
icij.org/about
Collaboration
Supported by OSS tools & encryption
+370 journalists in 80 countries
Panama Papers Timeline
• Early 2015
John Doe first contacts SZ
• Spring 2015
ICIJ gets involved
• Summer 2015
Start of investigations
• April 4, 2016
Public launch
#panamapapers
Exposed the offshore holdings
of 12 current and former
world leaders.
Dealings of 128 more
politicians and public officials
around the world.
Exposure of hidden secrets
Main goals, achieved:
1) Uncover the truth
2) Ensure whistleblower safety
The Tech
Behind the ICIJ investigation
Remember the 2.6 TB of data?
Who was working on it?
+370 journalists
+100 media organizations
80 countries
1 Year
Data Team:
3 Data Journalists + 3 Developers!
[Diagram: Raw Files, Raw Text, Meta-Data, Database, Search, Discovery]
Data Processing
Nuix Investigator
• OCR
• Entity Extraction
• Analytical tools
• Philanthropic donation to ICIJ
Nuix is an Australian company (founded in Sydney in 2000) focused on data
extraction from unstructured sources.
3 million files x 10 seconds per file = 1 yr
1 yr / 35 servers = 1.5 weeks
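The arithmetic above can be checked with a short calculation. This is a minimal sketch, assuming the work divides perfectly evenly across the 35 servers (an idealization the slide does not state); the variable names are illustrative only.

```javascript
// Back-of-the-envelope check of the processing-time figures on the slide.
const files = 3000000;          // documents to process (from the slide)
const secondsPerFile = 10;      // processing time per file (from the slide)
const servers = 35;             // parallel servers (from the slide)

const SECONDS_PER_DAY = 24 * 60 * 60;

// Total time if a single machine processed every file in sequence.
const sequentialDays = (files * secondsPerFile) / SECONDS_PER_DAY;

// Idealized parallel time: assumes perfect scaling across servers (an assumption).
const parallelDays = sequentialDays / servers;
const parallelWeeks = parallelDays / 7;

console.log(`sequential: ${sequentialDays.toFixed(0)} days`);
console.log(`parallel: ${parallelDays.toFixed(1)} days (${parallelWeeks.toFixed(1)} weeks)`);
```

The sequential figure comes out at about 347 days and the parallel one at roughly 10 days, in line with the slide's rounded "1 yr" and "1.5 weeks".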
Nuix Investigator
Lucene syntax queries with proximity matching!
400 users
Disconnected Documents
Context is King
[Diagram: entities extracted from the documents and the connections between them]
PERSON name: "John", last: "Miller", role: "Negotiator"
PERSON name: "Maria", last: "Osara"
PERSON name: "Jose", last: "Pereia", position: "Governor"
PERSON name: "Alice", last: "Smith", role: "Advisor"
name: "Some Media Ltd", value: "$70M"
MENTIONS
since: Jan 10, 2011
Journalists say: „It‘s like Magic“
Need to store and query
our connections!
Real, inferred and integrated
Neo4j
A native graph database
• Manage and store your
connected data as a graph
• Query relationships
easily and quickly
• Evolve model and applications
to support new requirements and
insights
• Built to solve relational pains
Whiteboard to Graph
Property Graph Model
Nodes
• The entities in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Diagram: two NODEs connected by a RELATIONSHIP, each with key: "value" properties]
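The property graph model described above can be pictured as a small data structure. The following is only an illustrative in-memory sketch, not how Neo4j stores data; addNode, addRelationship and outgoing are hypothetical helper names, and the sample data borrows the Dan/Ann example used in the Cypher slides of this deck.

```javascript
// A tiny in-memory property graph: nodes carry labels and properties,
// relationships carry a type, a direction (from -> to) and properties.
const nodes = new Map();
const relationships = [];

function addNode(id, labels, properties = {}) {
  const node = { id, labels, properties };
  nodes.set(id, node);
  return node;
}

function addRelationship(from, type, to, properties = {}) {
  const rel = { from, type, to, properties };
  relationships.push(rel);
  return rel;
}

// Follow outgoing relationships of one type from a node and return the targets.
function outgoing(fromId, type) {
  return relationships
    .filter(r => r.from === fromId && r.type === type)
    .map(r => nodes.get(r.to));
}

addNode(1, ['Person'], { name: 'Dan' });
addNode(2, ['Person'], { name: 'Ann' });
addRelationship(1, 'KNOWS', 2);

console.log(outgoing(1, 'KNOWS').map(n => n.properties.name));
```

Direction matters in this sketch: asking for the outgoing KNOWS relationships of node 2 returns nothing, because the relationship was created from Dan to Ann.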
Neo4j: All about Patterns
(:Person { name:"Dan"} ) -[:KNOWS]-> (:Person {name:"Ann"})
[Diagram: NODE Dan -KNOWS-> NODE Ann; Person is the LABEL, name is the PROPERTY]
neo4j.com/developer/cypher
Neo4j: Create Patterns
CREATE (:Person { name:"Dan"} ) -[:KNOWS]-> (:Person {name:"Ann"})
[Diagram: NODE Dan -KNOWS-> NODE Ann; Person is the LABEL, name is the PROPERTY]
neo4j.com/developer/cypher
Cypher: Clauses
CREATE
(:Intermediary {name:"Deutsche Bank"})
-[:REPRESENTS]->(e:Entity {name:"..."})
-[:LOCATED]->(:Address {address:"..."})
-[:IN]->(:Country {name:"PAN"})
Cypher: Find Patterns
MATCH (:Person { name:"Dan"} ) -[:KNOWS]-> (who:Person) RETURN who
[Diagram: NODE Dan -KNOWS-> unknown NODE (???); Person is the LABEL, name is the PROPERTY, who is the ALIAS]
neo4j.com/developer/cypher
Cypher: Clauses
MATCH
(o:Officer)-[owns]->(e:Entity)<--(a:Address)
WHERE a.address CONTAINS "Sydney"
RETURN o.name, owns.shares, e.name
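To make the shape of the query above concrete, here is a rough in-memory equivalent written as nested loops. It is a sketch only: every record below is made-up sample data, findByCity is a hypothetical helper, and a graph database would answer this by traversing relationships rather than scanning arrays.

```javascript
// Hypothetical sample records; names and values are invented for illustration.
const officers = [{ id: 'o1', name: 'Officer A' }, { id: 'o2', name: 'Officer B' }];
const entities = [{ id: 'e1', name: 'Entity X' }, { id: 'e2', name: 'Entity Y' }];
const addresses = [
  { id: 'a1', address: '1 Example Street, Sydney' },
  { id: 'a2', address: '2 Example Road, Panama City' },
];
// Relationship lists: officer -> entity (with a shares property), address -- entity.
const owns = [
  { officer: 'o1', entity: 'e1', shares: 100 },
  { officer: 'o2', entity: 'e2', shares: 50 },
];
const addressOf = [
  { address: 'a1', entity: 'e1' },
  { address: 'a2', entity: 'e2' },
];

// Start at matching addresses, hop to their entities, then to the owning officers.
function findByCity(city) {
  const rows = [];
  for (const a of addresses.filter(x => x.address.includes(city))) {
    for (const link of addressOf.filter(x => x.address === a.id)) {
      const e = entities.find(x => x.id === link.entity);
      for (const o of owns.filter(x => x.entity === e.id)) {
        const officer = officers.find(x => x.id === o.officer);
        rows.push({ officer: officer.name, shares: o.shares, entity: e.name });
      }
    }
  }
  return rows;
}

const sydneyRows = findByCity('Sydney');
console.log(sydneyRows);
```

Each extra hop adds another nested lookup here, which is the kind of bookkeeping the Cypher pattern expresses in a single line.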
Getting Data into Neo4j
• Bulk Load from CSV Files
• Update Graph from
  ○ Web APIs (JSON, XML)
  ○ Other Databases
  ○ CSV Files
  ○ User Activity (Logs, Callbacks)
Import Demo – CSV dump
==> /Users/mh/Downloads/panama/import/Addresses.csv <==
address,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID
27 ROSEWOOD DRIVE #16-19 SINGAPORE 737920,6991059DFFB057DF310B9BF31CC4A0E6,The Panama Papers data is current through 2015,SGP,Singapore,14000001,Panama Papers
==> /Users/mh/Downloads/panama/import/Entities.csv <==
name,original_name,former_name,jurisdiction,jurisdiction_description,company_type,address,internal_id,incorporation_date,inactivation_date,struck_off_date,dorm_date,status,service_provider,ibcRUC,country_codes,countries,note,valid_until,node_id:ID,sourceID
"TIANSHENG INDUSTRY AND TRADING CO., LTD.","TIANSHENG INDUSTRY AND TRADING CO., LTD.",,SAM,Samoa,,ORION HOUSE SERVICES (HK) LIMITED ROOM 1401; 14/F.; WORLD COMMERCE CENTRE; HARBOUR CITY; 7-11 CANTON ROAD; TSIM SHA TSUI; KOWLOON; HONG KONG,1001256,23-MAR-2006,18-FEB-2013,15-FEB-2013,,Defaulted,Mossack Fonseca,25221,HKG,Hong Kong,,The Panama Papers data is current through 2015,10000001,Panama Papers
==> /Users/mh/Downloads/panama/import/Intermediaries.csv <==
name,internal_id,address,valid_until,country_codes,countries,status,node_id:ID,sourceID
"MICHAEL PAPAGEORGE, MR.",10001,MICHAEL PAPAGEORGE; MR. 106 NICHOLSON STREET BROOKLYN PRETORIA 0002; GAUTENG (PWV) SOUTH AFRICA,The Panama Papers data is current through 2015,ZAF,South Africa,ACTIVE,11000001,Panama Papers
==> /Users/mh/Downloads/panama/import/Officers.csv <==
name,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID
KIM SOO IN,E72326DEA50F1A9C2876E112AAEB42BC,The Panama Papers data is current through 2015,KOR,"Korea, Republic of",12000001,Panama Papers
==> /Users/mh/Downloads/panama/import/all_edges.csv <==
node_id:START_ID,rel_type:TYPE,node_id:END_ID
11000001,intermediary of,10208879
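In the headers above, the suffix after the colon (ID, START_ID, TYPE, END_ID) marks the role a column plays for the importer, while columns without a suffix become ordinary properties. The sketch below shows one way such a header could be interpreted for the edge file; parseHeader and parseEdge are hypothetical helpers, not part of Neo4j, and the naive comma split assumes no quoted commas, which holds for the edge sample but not for the node files.

```javascript
// Split each "name:ROLE" header cell; cells without a role are plain properties.
function parseHeader(line) {
  return line.split(',').map(cell => {
    const [name, role = null] = cell.split(':');
    return { name, role };
  });
}

// Turn one relationship row into { start, type, end } using the header roles.
// Assumption: the row contains no quoted fields with embedded commas.
function parseEdge(header, line) {
  const values = line.split(',');
  const edge = {};
  header.forEach((field, i) => {
    if (field.role === 'START_ID') edge.start = values[i];
    else if (field.role === 'TYPE') edge.type = values[i];
    else if (field.role === 'END_ID') edge.end = values[i];
  });
  return edge;
}

// Header and sample row taken from the all_edges.csv excerpt above.
const header = parseHeader('node_id:START_ID,rel_type:TYPE,node_id:END_ID');
const edge = parseEdge(header, '11000001,intermediary of,10208879');
console.log(edge);
```

The start id 11000001 is the node_id of the intermediary row shown in the Intermediaries.csv excerpt, which is how the importer links edges to nodes.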
Import Demo - Run
$NEO4J_HOME/bin/neo4j-import --into $DATA/panama.db \
  --nodes:Address $DATA/Addresses_fixed.csv \
  --nodes:Entity $DATA/Entities.csv \
  --nodes:Intermediary $DATA/Intermediaries.csv \
  --nodes:Officer $DATA/Officers.csv \
  --relationships $DATA/all_edges_header.csv,$DATA/all_edges_cleaned.csv
IMPORT DONE in 20s 747ms. Imported:
839434 nodes
1253582 relationships
8211010 properties
+-----------------------------+
| labels(n) | count(*) |
+-----------------------------+
| ["Officer"] | 344455 |
| ["Entity"] | 319150 |
| ["Address"] | 151054 |
| ["Intermediary"] | 23636 |
+-----------------------------+
The Basic ICIJ Data Model
The Real ICIJ Data Model
Visualized with Linkurious UI
Data is available
• 785,000 Offshore Leaks Companies from several investigations
• For online browsing and visualization
• offshoreleaks.icij.org
• As CSV dump download
• As Neo4j Database download
• offshoreleaks.icij.org/pages/database
• As Neo4j sandboxes sandbox.neo4j.com
Data exposed as interactive Visualization
• Public figures and leaders
• Different shell companies & involvements
Try it yourself? sandbox.neo4j.com
Demo Time
sandbox.neo4j.com
More steps for the ICIJ and all of us
• Data integration with other sources
• Entity extraction
• Email pattern analysis
• Content & Data mining
• Machine learning
• Alerts with real time news / social media
• Investigative recommendations
• Active search for new sources ...
Current Investigation of Mossack Fonseca
Jürgen Mossack – Ramon Fonseca
Arrested, free on bail, ongoing investigations
• Flow to US tax havens
• Transparency laws in the UK and EU
• $1.3bn taxes recouped
publicly
• Investigations into banks, public figures, companies
• Criminal cases solved & ongoing
World Wide Results of the Offshore leaks investigations
icij.org/investigations/panama-papers/panama-papers-helps-recover-more-than-1-2-billion-around-the-world/
Australia
Most recent SEB Bank Investigation in Sweden
• Operations of Nordic banks
in Baltic states
• Danske Bank,
Swedbank, SEB
• Resignations,
Investigations
• Straw-men mentioned in reporting can be found
in offshore leaks db
• SEB bank in Nov 2019 !
Murder of Daphne Caruana Galizia
• Maltese Investigative
Reporter
• Reported on Panama Papers
appearances of influential
Maltese Politicians
• Murdered Oct 16 2017
with car bomb
• „The Daphne Report“
• PM finally resigning
en.wikipedia.org/wiki/Daphne_Caruana_Galizia
Murder of Ján Kuciak
• Slovak Investigative
Reporter
• Reported on criminal
behavior of businessmen
• Ján & fiancée shot Feb 2018
• Massive reactions +
political crisis
• PM and several ministers
resigned
en.wikipedia.org/wiki/Murder_of_J%C3%A1n_Kuciak
The ICIJ didn’t stop there
#BahamasLeak
Read & Watch More
How can you investigate
large complex data
using Graphs ?
Apply full set of available tools.
Source: John Swain - Twitter Analytics Right Relevance Talk
Russia Twitter Trolls
democrats-intelligence.house.gov/uploadedfiles/exhibit_b.pdf
● 2752 Twitter accounts tied to Russia’s
Internet Research Agency
● Accounts suspended by Twitter
○ Data deleted
● What were they tweeting about?
Internet Research Agency
345k Tweets, 41k Users (454 Russian Trolls)
Your typical American Citizen?
Your typical Local News Publication?
Your typical Local Political Party?
@LeroyLovesUSA
@TEN_GOP
@ClevelandOnline
Your typical Russian Troll
@LeroyLovesUSA
@TEN_GOP
@ClevelandOnline
Natural Language Processing
With Cypher and Neo4j
NLP w/ Graph Databases: Annotations
NLP
Process
http://www.lyonwj.com/2017/11/15/entity-extraction-russian-troll-tweets-neo4j/
Graph Algorithms
Gain new insights from context & topology
Graph & ML Algorithms in Neo4j (+35)
neo4j.com/graph-algorithms-book/
• Pathfinding & Search: finds optimal paths or evaluates route availability and quality
• Centrality / Importance: determines the importance of distinct nodes in the network
• Community Detection: detects group clustering or partition options
• Similarity: evaluates how alike nodes are
• Link Prediction: estimates the likelihood of nodes forming a future relationship
Neo4j Native Graph Database
• Analytics Integrations
• Cypher Query Language
• Wide Range of APOC Procedures
• Optimized Graph Algorithms
Inferred Relationships
AMPLIFIED
MATCH (r1:Troll)-[:POSTED]->(:Tweet)
<-[:RETWEETED]-(:Tweet)
<-[:POSTED]-(r2:Troll)
WHERE r1 <> r2
WITH r1,r2, count(*) as freq
MERGE (r2)-[a:AMPLIFIED]->(r1)
SET a.weight = freq
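The query above derives a weighted AMPLIFIED relationship from retweets between trolls. As a rough illustration of the same idea outside the database, the sketch below counts retweets per (retweeter, original author) pair in memory. The sample data and the inferAmplified helper are hypothetical, and the sketch assumes every tweet has exactly one author.

```javascript
// Hypothetical sample data: who posted which tweet, and which tweet retweets which.
const posted = [
  { troll: 'a', tweet: 't1' },
  { troll: 'b', tweet: 't2' },
  { troll: 'b', tweet: 't3' },
  { troll: 'c', tweet: 't4' },
];
const retweeted = [
  { retweet: 't2', original: 't1' }, // b retweets a
  { retweet: 't3', original: 't1' }, // b retweets a again
  { retweet: 't4', original: 't2' }, // c retweets b
];

function inferAmplified(posted, retweeted) {
  const authorOf = new Map(posted.map(p => [p.tweet, p.troll]));
  const edges = new Map(); // one entry per (amplifier, amplified) pair
  for (const { retweet, original } of retweeted) {
    const r1 = authorOf.get(original); // troll who posted the original tweet
    const r2 = authorOf.get(retweet);  // troll who posted the retweet
    // Skip unknown authors and self-retweets (the WHERE r1 <> r2 condition).
    if (r1 === undefined || r2 === undefined || r1 === r2) continue;
    const key = JSON.stringify([r2, r1]);
    const edge = edges.get(key) || { from: r2, to: r1, type: 'AMPLIFIED', weight: 0 };
    edge.weight += 1; // plays the role of count(*) as freq
    edges.set(key, edge);
  }
  return [...edges.values()];
}

const amplified = inferAmplified(posted, retweeted);
console.log(amplified);
```

As in the Cypher version, the edge points from the retweeting troll (r2) to the troll whose tweet was amplified (r1), and the weight is the number of such retweets.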
PageRank on AMPLIFIED Graph
CALL algo.pageRank('Troll', 'AMPLIFIED');
MATCH (t:Troll)
WITH t ORDER BY t.pagerank DESC LIMIT 1
MATCH path = (t)-[:AMPLIFIED*2]-()
RETURN path
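For readers curious what a PageRank computation does with such a graph, here is a textbook power-iteration sketch. It is not the implementation behind the procedure call above, and its scores are not expected to match the library's output. The pageRank function, its default parameters and the sample edges are assumptions for illustration, and every edge is assumed to reference a node in the node list.

```javascript
// Textbook PageRank by power iteration on an unweighted, directed edge list.
function pageRank(nodeIds, edges, { damping = 0.85, iterations = 20 } = {}) {
  const n = nodeIds.length;
  const outDegree = new Map(nodeIds.map(id => [id, 0]));
  for (const e of edges) outDegree.set(e.from, outDegree.get(e.from) + 1);

  let rank = new Map(nodeIds.map(id => [id, 1 / n]));
  for (let i = 0; i < iterations; i++) {
    // Rank held by nodes without outgoing edges is spread evenly over all nodes.
    let dangling = 0;
    for (const id of nodeIds) {
      if (outDegree.get(id) === 0) dangling += rank.get(id);
    }
    const base = (1 - damping) / n + (damping * dangling) / n;
    const next = new Map(nodeIds.map(id => [id, base]));
    // Each node passes its rank in equal shares along its outgoing edges.
    for (const e of edges) {
      const share = (damping * rank.get(e.from)) / outDegree.get(e.from);
      next.set(e.to, next.get(e.to) + share);
    }
    rank = next;
  }
  return rank;
}

// Hypothetical AMPLIFIED edges: b and c amplify a, c also amplifies b.
const trolls = ['a', 'b', 'c'];
const amplifiedEdges = [
  { from: 'b', to: 'a' },
  { from: 'c', to: 'a' },
  { from: 'c', to: 'b' },
];
const ranks = pageRank(trolls, amplifiedEdges);
const top = [...ranks].sort((x, y) => y[1] - x[1])[0][0];
console.log(top, ranks);
```

In this toy graph the most amplified account, a, ends up with the highest score, which is the intuition behind sizing nodes by PageRank in the visualization.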
Graph Visualization
Graph Visualizations
Centrality & community detection
AMPLIFIED relationships
Node size → PageRank
Color → community detection
Rel Thickness → weight
Graph Visualization
github.com/neo4j-contrib/neovis.js
DIY?
Neo4j Drivers & Integrations
• Drivers for most
programming languages
• Bolt: binary wire protocol
• Out-of-the-box integrations for
Spring Data, GraphQL, Kafka
• Pluggable into rich data
visualization frameworks
Drivers (over Bolt): JavaScript, Java, .NET, Python, Go, ...
neo4j.com/developer/language-guides/
Minimal Example: JavaScript -> Visualization
// Assumes `driver` was created with neo4j-driver and ForceGraph3D comes from 3d-force-graph.
const session = driver.session();
session
  .run(`MATCH (n:Troll)-[:AMPLIFIED]->(m)
        RETURN id(n) as source, id(m) as target`)
  .then(function (result) {
    // Convert Neo4j integers to plain JS numbers for the link list.
    const links = result.records.map(r => {
      return {source: r.get('source').toNumber(),
              target: r.get('target').toNumber()};
    });
    session.close();
    // Collect the distinct node ids referenced by any link.
    const ids = new Set();
    links.forEach(l => { ids.add(l.source); ids.add(l.target); });
    const nodes = Array.from(ids).map(id => { return {id: id}; });
    const graphData = {nodes: nodes, links: links};
    const elem = document.getElementById('3d-graph');
    ForceGraph3D()(elem).graphData(graphData);
  });
medium.com/neo4j/tagged/data-visualization
Twitter Import
gist.github.com/jexp/dc59ea550186d49e5e17ff3a08d5ec5b
Tools for Investigative
Journalists
ICIJ Datashare
• datashare.icij.org
• Local installation
• Collaboration
• Text extraction
• Entity Recognition
github.com/ICIJ/datashare
OCCRP Investigative Dashboard
Browse / Search / Visualize
OCCRP Aleph
173M entries from 271 datasets
aleph.occrp.org
github.com/alephdata
GraphCommons
• Graph based modeling for
researchers and journalists
• Intuitive, collaborative
Graph creation
• Embedding in Websites
Encourage Sharing
What will YOU connect?
• User and Social Networks ?
• Money, Accounts, Contracts ?
• Products, Prices, Reviews, Tags ?
• Software, Dependencies, Services ?
• Machines, Devices, Sensors ?
• Genes, Proteins, Reactions ?
• Laws, Regulations ?
Want to learn more? - Free ebooks!
neo4j.com/books
Thank you! Questions?
Learn more:
neo4j.com/developer
Me: @mesirii | @neo4j

 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 

How Graphs Help Investigative Journalists Connect the Dots

  • 1. How Graphs Help Investigative Journalists to Connect the Dots Michael.Hunger@neo4j.com YOW! Conference Australia December 2019
  • 2. (Michael Hunger)-[:WORKS_FOR]->(Neo4j) michael@neo4j.com | @mesirii Java Champion - Head of Developer Relations @Neo4j
  • 3. What enables IJ like PanamaPapers? 1. Whistleblower + Data Leak 2. Journalistic Collaboration 3. Technology to handle the data
  • 5. The Data Whistleblowers risking a lot to expose the truth
  • 6. Our world today • Billions of exchanges • Messages, data, transactions • Sometimes hidden in plain sight • Between people, companies, organizations, governments • Which includes criminals
  • 11. Inside the 2.6 TB of Data
  • 12. The data had everything
  • 13. Mossack Fonseca – Mosfon – Panama – „Black Hole“ Jürgen Mossack – Ramon Fonseca Est. 1977 – Data from 1977 to 2015
  • 16. You Australians are pretty good at it
  • 17. The Collaboration The ICIJ - A global trust network
  • 18. Investigative Journalism • Individual Reporters • Working the Paper trail • Much like a detective story • Long turnaround times • Local impact • Large amounts of data were not easily shareable
  • 19. Investigative Journalism Today • Benefit from whistleblowers & leaks • Sharing a large amount of data • Use technology and data engineers • Collaborate globally (Trust!) • Corroborate suspicions with other sources locally • Affects the world at large • „Golden Age of IJ, never been as important“
  • 20. Organization of ca. 200 journalists Based in 65 countries “Our aim is to bring journalists from different countries together in teams - eliminating rivalry and promoting collaboration. Together, we aim to be the world’s best cross-border investigative team.” icij.org/about
  • 23. +370 journalists in 80 countries
  • 24. Panama Papers Timeline • early 2015 First contact John Doe with SZ • Spring 2015 Involving ICIJ • Summer 2015 Start of investigations • April 4. 2016 Public Launch
  • 26. Exposed the offshore holdings of 12 current and former world leaders. Dealings of 128 more politicians and public officials around the world. Exposure of hidden secrets
  • 27. Main goals, achieved: 1) Uncover the truth 2) Assure whistleblower safety
  • 28. The Tech Behind the ICIJ investigation
  • 29. Remember the 2.6 TB of data?
  • 30. Who was working on it? +370 journalists, +100 media organizations, 80 countries, 1 year. Data team: 3 data journalists + 3 developers!
  • 32. Nuix Investigator • OCR • Entity Extraction • Analytical tools • Philanthropic donation to ICIJ Nuix is an Australian company (founded 2000 in Sydney) focused on data extraction from unstructured sources.
  • 33. Nuix Investigator: 3 million files x 10 seconds per file ≈ 1 year on a single server; spread across 35 servers ≈ 1.5 weeks
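The estimate on slide 33 can be checked with a few lines of arithmetic. This is only a sketch: the file count, per-file time, and server count come from the slide, while the function name `processingEstimate` and the assumption of perfectly even parallelism are illustrative.

```javascript
// Back-of-the-envelope check of the processing-time estimate from slide 33.
// Assumes the work splits evenly across servers (no coordination overhead).
function processingEstimate(files, secondsPerFile, servers) {
  const SECONDS_PER_DAY = 24 * 60 * 60;
  const serialDays = (files * secondsPerFile) / SECONDS_PER_DAY;
  const parallelDays = serialDays / servers;
  return { serialDays, parallelDays, parallelWeeks: parallelDays / 7 };
}

const estimate = processingEstimate(3e6, 10, 35);
console.log(estimate.serialDays.toFixed(0) + " days on one server");
console.log(estimate.parallelDays.toFixed(1) + " days on 35 servers");
```

The result is roughly 347 days on one server and just under 10 days on 35 servers, which matches the slide's rounded "1 yr" and "1.5 weeks".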
  • 34. Lucene syntax queries with proximity matching! 400 users
  • 36. Context is King [Diagram: PERSON nodes (name: "John", last: "Miller", role: "Negotiator"; name: "Maria", last: "Osara"; name: "Jose", last: "Pereia", position: "Governor"; name: "Alice", last: "Smith", role: "Advisor") and a node with name: "Some Media Ltd", value: "$70M"]
  • 37. Context is King [Same diagram, plus a MENTIONS relationship and the property since: Jan 10, 2011]
  • 39. Journalists say: „It’s like Magic“
  • 41. Need to store and query our connections! Real, inferred and integrated
  • 42. Neo4j A native graph database • Manage and store your connected data as a graph • Query relationships easily and quickly • Evolve model and applications to support new requirements and insights • Built to solve relational pains
  • 44. NODE key: “value” properties Property Graph Model Nodes • The entities in the graph • Can have name-value properties • Can be labeled Relationships • Relate nodes by type and direction • Can have name-value properties RELATIONSHIP NODE NODE key: “value” properties key: “value” properties key: “value” properties$
  • 45. Neo4j: All about Patterns (:Person { name:"Dan"} ) -[:KNOWS]-> (:Person {name:"Ann"}) KNOWS Dan Ann NODE NODE LABEL PROPERTY neo4j.com/developer/cypher LABEL PROPERTY
  • 46. Neo4j: Create Patterns CREATE (:Person { name:"Dan"} ) -[:KNOWS]-> (:Person {name:"Ann"}) KNOWS Dan Ann NODE NODE LABEL PROPERTY neo4j.com/developer/cypher LABEL PROPERTY
  • 47. Cypher: Clauses CREATE (:Intermediary {name:"Deutsche Bank"}) -[:REPRESENTS]->(e:Entity {name:"..."}) -[:LOCATED]->(:Address {address:"..."}) -[:IN]->(:Country {name:"PAN"})
  • 48. Cypher: Find Patterns MATCH (:Person { name:"Dan"} ) -[:KNOWS]-> (who:Person) RETURN who KNOWS Dan ??? LABEL NODE NODE LABEL PROPERTY ALIAS ALIAS neo4j.com/developer/cypher
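To make the property graph model and the pattern on slide 48 concrete, here is a minimal in-memory sketch. It only illustrates the idea of labeled nodes, typed and directed relationships, and pattern matching; it is not how Neo4j stores or executes queries, and the names `nodes`, `rels`, and `whoIsKnownBy` are hypothetical.

```javascript
// Minimal in-memory property graph (illustration only, not Neo4j internals).
const nodes = [
  { id: 1, labels: ["Person"], props: { name: "Dan" } },
  { id: 2, labels: ["Person"], props: { name: "Ann" } },
];
const rels = [
  { type: "KNOWS", start: 1, end: 2, props: {} }, // (Dan)-[:KNOWS]->(Ann)
];

// Rough equivalent of:
//   MATCH (:Person {name: $name})-[:KNOWS]->(who:Person) RETURN who
function whoIsKnownBy(name) {
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const isPerson = (n) => n.labels.includes("Person");
  return rels
    .filter((r) => r.type === "KNOWS")
    .filter((r) => {
      const start = byId.get(r.start);
      return isPerson(start) && start.props.name === name;
    })
    .map((r) => byId.get(r.end))
    .filter(isPerson);
}

console.log(whoIsKnownBy("Dan").map((n) => n.props.name)); // [ 'Ann' ]
```

Because the relationship is directed, `whoIsKnownBy("Ann")` returns an empty list, just as the Cypher pattern would when anchored on Ann.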
  • 49. Cypher: Clauses MATCH (o:Officer)-[owns]->(e:Entity)<--(a:Address) WHERE a.address CONTAINS "Sydney" RETURN o.name, owns.shares, e.name
  • 50. Getting Data into Neo4j • Bulk Load from CSV Files • Update Graph from: Web APIs (JSON, XML), Other Databases, CSV Files, User Activity (Logs, Callbacks)
  • 51. Import Demo – CSV dump ==> /Users/mh/Downloads/panama/import/Addresses.csv <== address,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID 27 ROSEWOOD DRIVE #16-19 SINGAPORE 737920,6991059DFFB057DF310B9BF31CC4A0E6,The Panama Papers data is current through 2015,SGP,Singapore,14000001,Panama Papers ==> /Users/mh/Downloads/panama/import/Entities.csv <== name,original_name,former_name,jurisdiction,jurisdiction_description,company_type,address,internal_id,incorporation_date,inactivation_date, struck_off_date,dorm_date,status,service_provider,ibcRUC,country_codes,countries,note,valid_until,node_id:ID,sourceID "TIANSHENG INDUSTRY AND TRADING CO., LTD.","TIANSHENG INDUSTRY AND TRADING CO., LTD.",,SAM,Samoa,,ORION HOUSE SERVICES (HK) LIMITED ROOM 1401; 14/F.; WORLD COMMERCE CENTRE; HARBOUR CITY; 7-11 CANTON ROAD; TSIM SHA TSUI; KOWLOON; HONG KONG,1001256,23-MAR-2006, 18-FEB-2013,15-FEB-2013,,Defaulted,Mossack Fonseca,25221,HKG,Hong Kong,,The Panama Papers data is current through 2015,10000001,Panama Papers ==> /Users/mh/Downloads/panama/import/Intermediaries.csv <== name,internal_id,address,valid_until,country_codes,countries,status,node_id:ID,sourceID "MICHAEL PAPAGEORGE, MR.",10001,MICHAEL PAPAGEORGE; MR. 106 NICHOLSON STREET BROOKLYN PRETORIA 0002; GAUTENG (PWV) SOUTH AFRICA, The Panama Papers data is current through 2015,ZAF,South Africa,ACTIVE,11000001,Panama Papers ==> /Users/mh/Downloads/panama/import/Officers.csv <== name,icij_id,valid_until,country_codes,countries,node_id:ID,sourceID KIM SOO IN,E72326DEA50F1A9C2876E112AAEB42BC,The Panama Papers data is current through 2015,KOR,"Korea, Republic of",12000001,Panama Papers ==> /Users/mh/Downloads/panama/import/all_edges.csv <== node_id:START_ID,rel_type:TYPE,node_id:END_ID 11000001,intermediary of,10208879
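The header fields in the dump follow a `name:FIELD_TYPE` convention (`node_id:ID`, `:START_ID`, `:TYPE`, `:END_ID`). The sketch below reads the edge file's header and sample row from slide 51 into relationship records. It assumes the edge file has no quoted fields or embedded commas, and the type normalization (`intermediary of` to `INTERMEDIARY_OF`) is a hypothetical cleanup step, not something shown on the slides.

```javascript
// Sketch: parse rows of the edge CSV into relationship records.
// Assumption: no quoted fields or embedded commas in this file.
const csv = [
  "node_id:START_ID,rel_type:TYPE,node_id:END_ID",
  "11000001,intermediary of,10208879",
].join("\n");

function parseEdges(text) {
  const [header, ...rows] = text.trim().split("\n");
  // Keep only the field type after the colon: START_ID, TYPE, END_ID.
  const columns = header.split(",").map((h) => h.split(":")[1]);
  return rows.map((row) => {
    const values = row.split(",");
    const record = {};
    columns.forEach((column, i) => { record[column] = values[i]; });
    return {
      start: record.START_ID,
      end: record.END_ID,
      // Hypothetical normalization into an upper-case relationship type.
      type: record.TYPE.trim().toUpperCase().replace(/\s+/g, "_"),
    };
  });
}

console.log(parseEdges(csv));
```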
  • 52. Import Demo - Run $NEO4J_HOME/bin/neo4j-import --into $DATA/panama.db --nodes:Address $DATA/Addresses_fixed.csv --nodes:Entity $DATA/Entities.csv --nodes:Intermediary $DATA/Intermediaries.csv --nodes:Officer $DATA/Officers.csv --relationships $DATA/all_edges_header.csv,$DATA/all_edges_cleaned.csv IMPORT DONE in 20s 747ms. Imported: 839434 nodes 1253582 relationships 8211010 properties +-----------------------------+ | labels(n) | count(*) | +-----------------------------+ | ["Officer"] | 344455 | | ["Entity"] | 319150 | | ["Address"] | 151054 | | ["Intermediary"] | 23636 | +-----------------------------+
  • 53. The Basic ICIJ Data Model
  • 54. The Real ICIJ Data Model
  • 56. Data is available
  • 57. Data is available • 785,000 Offshore Leaks Companies from several investigations • For online browsing and visualization • offshoreleaks.icij.org • As CSV dump download • As Neo4j Database download • offshoreleaks.icij.org/pages/database • As Neo4j sandboxes sandbox.neo4j.com
  • 59. Data exposed as interactive Visualization • Public figures and leaders • Different shell companies & involvements
  • 60. Try it yourself? sandbox.neo4j.com
  • 63. More steps for the ICIJ and all of us • Data integration with other sources • Entity extraction • Email pattern analysis • Content & Data mining • Machine learning • Alerts with real time news / social media • Investigative recommendations • Active search for new sources ...
  • 64. Current Investigation of Mossack Fonseca Jürgen Mossack – Ramon Fonseca Arrested, free on bail, ongoing investigations
  • 65. World Wide Results of the Offshore leaks investigations • Flow to US tax havens • Transparency laws UK, EU • $1.3bn taxes recouped publicly • Investigations into banks, public figures, companies • Criminal cases solved & ongoing icij.org/investigations/panama-papers/panama-papers-helps-recover-more-than-1-2-billion-around-the-world/
  • 67. Most recent SEB Bank Investigation in Sweden • Operations of Nordic banks In Baltic states • Danske Bank, Swedbank, SEB • Resignations, Investigations • Straw-men mentioned in reporting can be found in offshore leaks db • SEB bank in Nov 2019 !
  • 68. Murder Daphne Caruana Galizia • Maltese Investigative Reporter • Reported on Panama Papers appearances of influential Maltese Politicians • Murdered Oct 16 2017 with car bomb • „The Daphne Report“ • PM finally resigning en.wikipedia.org/wiki/Daphne_Caruana_Galizia
  • 69. Murder Jan Kuciak • Slovak Investigative Reporter • Reported on criminal behavior of businessmen • Jan & fiancée shot Feb 2018 • Massive reactions + political crisis • PM and several ministers resigned en.wikipedia.org/wiki/Murder_of_J%C3%A1n_Kuciak
  • 70. The ICIJ didn’t stop there #BahamasLeak
  • 71. Read & Watch More
  • 72. How can you investigate large complex data using Graphs ? Apply full set of available tools.
  • 73. Source: John Swain - Twitter Analytics Right Relevance Talk
  • 75. Russia Twitter Trolls democrats-intelligence.house.gov/uploadedfiles/exhibit_b.pdf ● 2752 Twitter accounts tied to Russia’s Internet Research Agency ● Accounts suspended by Twitter ○ Data deleted ● What were they tweeting about?
  • 77. 345k Tweets, 41k Users (454 Russian Trolls)
  • 78. Your typical American Citizen? Your typical Local News Publication? Your typical Local Political Party? @LeroyLovesUSA @TEN_GOP @ClevelandOnline
  • 79. Your typical Russian Troll Your typical Russian Troll Your typical Russian Troll @LeroyLovesUSA @TEN_GOP @ClevelandOnline
  • 84. NLP w/ Graph Databases: Annotations, NLP Process
  • 86. Graph Algorithms Gain new insights from context & topology
  • 87. Graph & ML Algorithms in Neo4j (+35) neo4j.com/graph-algorithms-book/ • Pathfinding & Search: finds optimal paths or evaluates route availability and quality • Centrality / Importance: determines the importance of distinct nodes in the network • Community Detection: detects group clustering or partition options • Similarity: evaluates how alike nodes are • Link Prediction: estimates the likelihood of nodes forming a future relationship
  • 88. Neo4j Native Graph Database Analytics Integrations Cypher Query Language Wide Range of APOC Procedures Optimized Graph Algorithms
  • 90. PageRank on AMPLIFIED Graph MATCH (r1:Troll)-[:POSTED]->(:Tweet) <-[:RETWEETED]-(:Tweet) <-[:POSTED]-(r2:Troll) WHERE r1 <> r2 WITH r1,r2, count(*) as freq MERGE (r2)-[a:AMPLIFIED]->(r1) SET a.weight = freq
  • 91. PageRank on AMPLIFIED Graph CALL algo.pageRank('Troll', 'AMPLIFIED'); MATCH (t:Troll) WITH t ORDER BY t.pagerank DESC LIMIT 1 MATCH path = (t)-[:AMPLIFIED*2]-() RETURN path
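To show what PageRank does with the AMPLIFIED relationships, here is a simplified weighted PageRank by power iteration on a tiny made-up graph. It is a sketch of the general technique, not Neo4j's implementation; the node ids, weights, damping factor, and iteration count are all illustrative assumptions.

```javascript
// Simplified weighted PageRank via power iteration (illustration only).
function pageRank(nodeIds, edges, damping = 0.85, iterations = 50) {
  const n = nodeIds.length;
  let rank = new Map(nodeIds.map((id) => [id, 1 / n]));
  // Total outgoing weight per node, used to split a node's rank.
  const outWeight = new Map(nodeIds.map((id) => [id, 0]));
  for (const e of edges) outWeight.set(e.source, outWeight.get(e.source) + e.weight);

  for (let i = 0; i < iterations; i++) {
    const next = new Map(nodeIds.map((id) => [id, (1 - damping) / n]));
    // Rank held by nodes without outgoing edges is spread evenly.
    let dangling = 0;
    for (const id of nodeIds) if (outWeight.get(id) === 0) dangling += rank.get(id);
    for (const id of nodeIds) next.set(id, next.get(id) + (damping * dangling) / n);
    // Each edge passes on a weight-proportional share of its source's rank.
    for (const e of edges) {
      const share = (rank.get(e.source) * e.weight) / outWeight.get(e.source);
      next.set(e.target, next.get(e.target) + damping * share);
    }
    rank = next;
  }
  return rank;
}

// (source)-[:AMPLIFIED {weight}]->(target): source retweeted target `weight` times.
const trolls = ["a", "b", "c"];
const amplified = [
  { source: "b", target: "a", weight: 3 },
  { source: "c", target: "a", weight: 1 },
  { source: "c", target: "b", weight: 1 },
];
const ranks = pageRank(trolls, amplified);
const top = [...ranks.entries()].sort((x, y) => y[1] - x[1])[0][0];
console.log(top);
```

In this toy graph account `a` ends up with the highest score because the other accounts amplify it most, which is the same signal the query on slide 91 uses to pick the top troll.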
  • 94. Graph Visualizations Centrality & community detection AMPLIFIED relationships Node size → PageRank Color → community detection Rel Thickness → weight
  • 96. DIY?
  • 97. Neo4j Drivers & Integrations • Drivers for most programming languages • Bolt: binary wire protocol • Out-of-the-box integrations for Spring Data, GraphQL, Kafka • Pluggable into rich data visualization frameworks JavaScript Java .NET Python GO, .... Drivers Bolt neo4j.com/developer/language-guides/
  • 98. Minimal Example: JavaScript -> Visualization const session = driver.session(); session.run(`MATCH (n:Troll)-[:AMPLIFIED]->(m) RETURN id(n) as source, id(m) as target`) .then(function (result) { const links = result.records.map(r => { return {source:r.get('source').toNumber(), target:r.get('target').toNumber()}}); session.close(); const ids = new Set(); links.forEach(l => {ids.add(l.source);ids.add(l.target);}); const nodes = Array.from(ids).map(id => {return {id:id}}); const graphData = { nodes: nodes, links: links}; const elem = document.getElementById('3d-graph'); ForceGraph3D()(elem).graphData(graphData); }) medium.com/neo4j/tagged/data-visualization
  • 102. ICIJ Datashare • datashare.icij.org • Local installation • Collaboration • Text extraction • Entity Recognition github.com/ICIJ/datashare
  • 103. OCCRP Investigative Dashboard Browse / Search / Visualize
  • 104. OCCRP Aleph 173M entries from 271 datasets aleph.occrp.org github.com/alephdata
  • 105. GraphCommons • Graph based modeling for researchers and journalists • Intuitive, collaborative Graph creation • Embedding in Websites Encourage Sharing
  • 106. What will YOU connect? • User and Social Networks ? • Money, Accounts, Contracts ? • Products, Prices, Reviews, Tags ? • Software, Dependencies, Services ? • Machines, Devices, Sensors ? • Genes, Proteins, Reactions ? • Laws, Regulations ?
  • 107. Want to learn more? - Free ebooks! neo4j.com/books
  • 108. Thank you! Questions? Learn more: neo4j.com/developer Me: @mesirii | @neo4j