1. A Perspective on Graph Theory and Network Science
Marko A. Rodriguez
http://markorodriguez.com
http://twitter.com/twarko
http://www.slideshare.net/slidarko
Santa Fe Public School District – Santa Fe, New Mexico – July 6, 2010
July 5, 2010
2. Abstract
The graph/network domain has been driven by the creativity of numerous
individuals from disparate areas of the academic and the commercial
sector. Examples of contributing academic disciplines include mathematics,
physics, sociology, and computer science. Given the interdisciplinary nature
of the domain, it is difficult for any single individual to objectively realize
and speak about the space as a whole. Any presentation of the ideas is
ultimately biased by the formal training and expertise of the individual. For
this reason, I will simply present on the domain from my
perspective—from my personal experiences. More specifically, from my
perspective biased by cognitive and computer science.
This is an autobiographical lecture on my life (so far) with
graphs/networks.
3. The Graph/Network
The term graph is used primarily in mathematics and the term network is used primarily
in physics. Both refer to a type of structure in which there exist vertices (i.e. nodes,
dots) and edges (i.e. links, lines). There are numerous types of graphs/networks which
yield more or less expressivity (i.e. more or less structure).
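A minimal sketch of the structure described above, assuming a directed graph held as a set of vertices plus outgoing edges. The names `Graph`, `add_edge`, and `neighbors`, and the example vertices, are illustrative assumptions and do not come from the lecture.

```python
from collections import defaultdict


class Graph:
    """A tiny directed graph: vertices (nodes, dots) joined by edges (links, lines)."""

    def __init__(self):
        self.vertices = set()
        self.out_edges = defaultdict(set)  # tail vertex -> set of head vertices

    def add_edge(self, tail, head):
        # Adding an edge implicitly adds both of its endpoints.
        self.vertices.update((tail, head))
        self.out_edges[tail].add(head)

    def neighbors(self, vertex):
        # Vertices reachable from `vertex` in exactly one step.
        return set(self.out_edges.get(vertex, ()))


g = Graph()
g.add_edge("marko", "ucsd")
g.add_edge("marko", "ucsc")
g.add_edge("ucsc", "lanl")
```

Richer graph types (labeled edges, weights, properties) add structure to this same vertex/edge core.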
4. The Purpose of a Graph for Mathematicians
• Mathematicians are concerned with the abstract structure of a graph.
• Mathematicians define operations to analyze and manipulate graphs.
Moreover, they develop theorems based upon structural axioms.
5. The Purpose of a Network for Physicists
• Physicists are concerned with modeling real-world structures with
networks.
• Physicists define algorithms that compress the information in a network
into simpler values (e.g. statistical analysis).
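As one hedged illustration of compressing a network into simpler values, the sketch below reduces an undirected edge list to its degree distribution and mean degree. The function names and the toy edge list are assumptions made for this example; many other statistics would serve equally well.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of vertices having that degree."""
    degree = Counter()
    for u, v in edges:
        # Each undirected edge contributes one to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def mean_degree(edges):
    """Average number of edges touching a vertex."""
    vertices = {v for edge in edges for v in edge}
    return 2 * len(edges) / len(vertices) if vertices else 0.0


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
distribution = degree_distribution(edges)
average = mean_degree(edges)
```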
6. Much of the World has a Graphical/Network Structure
• Social networks: define how persons interact (collaborators, friends,
kin).
• Biological networks: define how biological components interact
(protein, food chains, gene regulation).
• Transportation networks: define how cities are joined by air and road
routes.
• Dependency networks: define how software modules use each other.
• Communication networks: define the relationships between Internet
routers.
• Language networks: define the relationships between words.
7. The Tour
• University of California at San Diego (1997-2001)
• University of California at Santa Cruz (2001-2007)
• Vrije Universiteit Brussel (2004-2005)
• Los Alamos National Laboratory (2005-2010)
• AT&T Interactive (2010-Present)
8. Undergrad at the University of California at San Diego
• Studied Cognitive Science (B.S.) and Computer Music (Minor) at the
University of California at San Diego. (1997-2001)
9. Cognitive Science at UCSD
• Neural networks: simplified models of how the brain encodes and
processes information.[1]
  - Neural networks exclude seemingly non-relevant aspects of the
    biological counterpart (e.g. neurotransmitters, axon/soma/dendrite
    distinctions).
  - No two signals on the brain are ever the same, yet we perceive a
    consistent (object-oriented) world.
  - Can be generally applied to classification irrespective of the signal
    being “human oriented” (e.g. non-sensory information).
  - Neural networks are usually trained through experience.
[1] Please see: http://arxiv.org/abs/0811.3584
10. Cognitive Science at UCSD
[Figure labels, right panel: Signal from the World, Neural Network, Classification of the Signal]
Mouse cortical networks are grown on multi-electrode arrays in order to study the
information properties of the structure through its development (left – done at LANL
during my PostDoc). Artificial neural networks are simplified models of the sufficient
components needed to process and classify information (right).
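To make "trained through experience" concrete, here is a minimal sketch using a single-unit perceptron, which is a stand-in chosen for brevity and not a model named in the lecture. The function names, integer weights, and the logical-AND training set are all assumptions for illustration.

```python
def train_perceptron(samples, epochs=20, rate=1):
    """Learn weights for a single threshold unit from (inputs, target) pairs."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            output = classify(weights, bias, inputs)
            error = target - output  # -1, 0, or +1
            # Nudge each weight toward the target in proportion to its input.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


def classify(weights, bias, inputs):
    """Fire (1) when the weighted sum of the signal exceeds the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# "Experience": labeled examples of the logical AND of two signals.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
predictions = [classify(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

The unit is never told the rule; it only adjusts its weights whenever a classification turns out wrong.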
11. Computer Music at UCSD
• Spatial compositions: focused on the composition of music which
accounted for/represented sound in 3D space.
  - Amplitude (loud/quiet), pitch (high/low), timbre (guitar/drum), but
    what about music beyond stereo (left/right)?
  - Developed algorithms to “trick the ear” into hearing sounds at
    particular points in space.
  - Made use of a data flow sound processing language called Max/MSP
    (see http://cycling74.com/).
    ∗ Data flow languages allow one to define “process graphs”
      (dependencies between functions represented as a graph).
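The idea of a process graph can be sketched in a few lines, assuming each node is a function plus the names of the nodes feeding it. This is not Max/MSP code; `run_process_graph` and the toy signal chain are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def run_process_graph(nodes, inputs):
    """Evaluate a data flow graph.

    nodes:  name -> (function, [names of upstream nodes])
    inputs: name -> value for external sources that have no function
    """
    dependencies = {name: deps for name, (_, deps) in nodes.items()}
    values = dict(inputs)
    # static_order() yields every node after all of the nodes it depends on.
    for name in TopologicalSorter(dependencies).static_order():
        if name in values:  # external input, nothing to compute
            continue
        func, deps = nodes[name]
        values[name] = func(*(values[d] for d in deps))
    return values


# A toy signal chain: one sample is amplified, inverted, and mixed back together.
nodes = {
    "gain": (lambda x: x * 2, ["source"]),
    "invert": (lambda x: -x, ["source"]),
    "mix": (lambda a, b: a + b, ["gain", "invert"]),
}
values = run_process_graph(nodes, {"source": 0.5})
```

The functions never call each other directly; the graph alone decides what runs when.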
12. Computer Music at UCSD
My data flow programs (left) take/generate sound, process it algorithmically, and emit it
through a 6-channel circular surround sound system (right). My senior thesis was a live
concert using a computer music system I developed called Monkey Space Colony 6.
13. Graduate at the University of California at Santa Cruz
• Studied Computer Science (M.S. and Ph.D.) at the University of
California at Santa Cruz. (2001-2007)
14. Collective Intelligence at UCSC
• Collective decision making: applications of collective intelligence to
the design of techno-government architectures.[2] (2001-2004)
  - We do not have the same restrictions as our founding fathers
    (e.g. communication limited by space).
  - Is it possible to remove the representative layer of government by
    leveraging expertise/representation in social networks?
  - What does a modern day direct democracy look like?
  - Can any actively participating subset of the population yield an
    accurate model of the population as a whole?
  - Maintaining fidelity in that subset model is the point of dynamically
    distributed democracy.
[2] Please see: 1.) http://arxiv.org/abs/cs/0412047 2.) http://arxiv.org/abs/cs/0609034 3.)
http://arxiv.org/abs/0901.3929 4.) http://escholarship.org/uc/item/04h3h1cr
15. Collective Intelligence at UCSC
[Figure: two plots comparing direct democracy with dynamically distributed democracy. One plots the proportion of error against the percentage of active citizens (n); the other plots the proportion of correct decisions against the percentage of active citizens.]
Fig. 5. The relationship between k and e^vote_k for direct democracy (gray line) and
dynamically distributed democracy (black line). The plot provides the proportion of
identical, correct decisions over a simulation that was run with 1000 artificially
generated networks composed of 100 citizens each.
People do not vote for a representative. Instead, they maintain an ego-network of whose
ideas they respect in certain domains (e.g. health care, military, etc.). People in one’s
network can be friends, family members, scientists, public figures, etc. Anyone, through
the Internet, can vote on any decision. However, the moment they abstain from voting,
their vote power is transferred through their network (according to the domain of
decision). Power aggregates at those that participate in the current decision.
[Excerpt from the paper page shown on the slide:] As previously stated, let x ∈ [0, 1]^n
denote the political tendency of each citizen in this population, where x_i is the
tendency of citizen i and, for the purpose of simulation, is determined from a uniform
distribution. Assume that every citizen in a population of n citizens uses some social
network-based system to create links to those individuals that they believe reflect their
tendency the best. In practice, these links may point to a close friend, a relative, or
some public figure whose political tendencies resonate with the individual. In other
words, representatives are any citizens, not political ...
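The transfer of vote power from abstaining citizens to participating ones can be sketched as a simple flow over the trust network. This is an illustration under stated assumptions, not the published algorithm: every citizen starts with 1/n of the vote power, an abstaining citizen splits their power equally among the people they link to, and the function name `distribute_vote_power` and the four-citizen network are hypothetical.

```python
def distribute_vote_power(links, active, max_steps=100):
    """Flow vote power through the network until it rests with active citizens.

    links:  citizen -> list of citizens whose judgment they trust
    active: set of citizens voting on the current decision
    """
    n = len(links)
    power = {citizen: 1.0 / n for citizen in links}
    for _ in range(max_steps):
        moved = False
        nxt = {citizen: 0.0 for citizen in links}
        for citizen, amount in power.items():
            if amount == 0.0:
                continue
            if citizen in active or not links[citizen]:
                # Voters keep what they hold; power with nowhere to go stays put.
                nxt[citizen] += amount
            else:
                # Abstainers pass their power on, split equally over their links.
                share = amount / len(links[citizen])
                for trusted in links[citizen]:
                    nxt[trusted] += share
                moved = True
        power = nxt
        if not moved:
            break
    return power


links = {"a": ["b"], "b": ["c"], "c": [], "d": ["c", "a"]}
power = distribute_vote_power(links, active={"c"})
```

In this toy run only "c" votes, yet "c" ends up carrying the weight of all four citizens, which is how a small active subset can stand in for the whole population.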
16. Visiting Researcher at the Vrije Universiteit Brussel
• Studied collective intelligence as a Visiting Researcher at the Center
for Evolution, Complexity, and Cognition of the Vrije Universiteit Brussel.
(2004-2005)
17. Collective Intelligence at the Vrije Universiteit Brussel
• Automating the scholarly process: Designed algorithms that exploit
bibliographic networks in order to support the scholarly communication
process. (2004-2005)[3]
  - Can the network of scholars, articles, journals, universities, conferences,
    funding sources, etc. be leveraged to algorithmically support the
    scholarly process?
    ∗ Can you find me articles related to my interests?
    ∗ Can you find me collaborators to work with me on my ideas?
    ∗ Can you find me a venue to publish my work in?
    ∗ Can you find me experts to peer-review a submitted article?
    ∗ Can you find me people to talk to (and concepts to talk about) at
      the conference I’m going to?
[3] Please see: 1.) http://arxiv.org/abs/cs/0601121 2.) http://arxiv.org/abs/cs/0605112 3.)
http://arxiv.org/abs/0905.1594
18. Collective Intelligence at the Vrije Universiteit Brussel
Example: Determining experts to peer-review an article can be done automatically and
with a sensitivity to conflict of interest situations. The spreading activation algorithm
used is analogous, in many ways, to neural networks. Can we think of the networks we (as
a society) implicitly create as some sort of “collective neural substrate?” Can we then
apply similar algorithms that are found in biological systems? Can our implicitly generated
networks serve as a substrate for problem-solving?
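A hedged sketch of spreading activation for referee selection follows. It assumes energy is seeded at a few scholars (hypothetically, authors the submission cites), decays as it spreads across a co-authorship network, and that conflicted scholars are simply filtered out at the end. The function names, the decay value, and the toy network are assumptions, not details taken from the lecture.

```python
from collections import defaultdict


def spread_activation(graph, seeds, steps=2, decay=0.5):
    """Push energy outward from the seed vertices, recording what each vertex receives."""
    energy = {seed: 1.0 for seed in seeds}
    received = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for vertex, amount in energy.items():
            neighbours = graph.get(vertex, [])
            if not neighbours:
                continue
            # Energy decays at every hop and is split equally among neighbours.
            share = decay * amount / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
                received[neighbour] += share
        energy = nxt
    return dict(received)


def rank_reviewers(graph, seeds, conflicted, steps=2, decay=0.5):
    """Order candidate referees by received energy, skipping conflicts of interest."""
    scores = spread_activation(graph, seeds, steps, decay)
    candidates = {v: s for v, s in scores.items() if v not in conflicted}
    return sorted(candidates, key=lambda v: (-candidates[v], v))


coauthors = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob"],
}
reviewers = rank_reviewers(coauthors, seeds=["ann"], conflicted={"cat"})
```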
19. Graduate Researcher at Los Alamos National Laboratory
• Studied bibliometrics as a graduate student on the Digital Library
Research and Prototyping Team of the Los Alamos National Laboratory.
(2005-2007)
20. Bibliometrics at Los Alamos National Laboratory
• Bibliometrics: the study of the scholarly process through the digital
footprint left by scholars (“the science of science”). (2005-2007)[4]
  - Wrote my dissertation while with the Digital Library Research and
    Prototyping Team (Johan Bollen, Herbert Van de Sompel, and Alberto
    Pepe). A very fruitful time in my academic career.
  - Continued my work with problem-solving in scholarly networks.
  - Studied how scholars use information by studying how they download
    articles (see http://mesur.org).
[4] Please see: 1.) http://arxiv.org/abs/cs/0601030 2.) http://arxiv.org/abs/0708.1150
3.) http://arxiv.org/abs/0804.3791 4.) http://arxiv.org/abs/0801.2345
5.) http://arxiv.org/abs/0807.0023 6.) http://dx.doi.org/10.1371/journal.pone.0004803
7.) http://arxiv.org/abs/0911.4223 8.) http://arxiv.org/abs/cs/0605110
21. Bibliometrics at Los Alamos National Laboratory
Each vertex (node) is a particular journal. Colors denote the journal domain. A directed edge (link) denotes
that a scholar read an article in journal A and then one in journal B. This map provides us with a collectively
generated representation of the knowledge transfer between domains (i.e. a “folksonomy” of domains).
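A map of this kind can be sketched by counting consecutive reads within each scholar's session. The input format (one ordered list of journal names per session), the function name `journal_transitions`, and the choice to ignore consecutive reads within the same journal are assumptions made for this illustration.

```python
from collections import Counter


def journal_transitions(sessions):
    """Count directed edges (journal A, journal B) from ordered reading sessions."""
    edges = Counter()
    for journals in sessions:
        # Pair every read with the one that immediately follows it.
        for a, b in zip(journals, journals[1:]):
            if a != b:  # assumption: same-journal steps add no edge
                edges[(a, b)] += 1
    return edges


sessions = [
    ["Nature", "Science", "Cell"],
    ["Nature", "Science"],
    ["Cell", "Cell", "Nature"],
]
edges = journal_transitions(sessions)
```

The resulting edge weights are what a layout algorithm would then draw as the thickness or pull of each link.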
22. Web of Data at Los Alamos National Laboratory
• Web of Data: the representation of the world’s data within the global
URI (a superclass of URL) address space.[5]
  - For the most part, data is local to a computer with no easy way for
    data on one computer to reference data on another.
    ∗ The World Wide Web provided a way to link documents across
      computers, but what about data?
  - By placing data “on the Web” in a similar manner to how we place
    documents on the Web, we can turn the Web into a distributed
    database.
    ∗ This heterogeneous network/graph of data opens the door to new
      types of problem-solving.
[5] Please see: 1.) http://arxiv.org/abs/0904.0027 2.) http://arxiv.org/abs/0908.0373
3.) http://arxiv.org/abs/1006.1080 4.) http://arxiv.org/abs/0905.3378
5.) http://arxiv.org/abs/0704.3395 6.) http://arxiv.org/abs/0802.3492
7.) http://arxiv.org/abs/0903.0194
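A minimal sketch of data referencing data across machines, assuming each data set is a list of (subject, predicate, object) statements whose subjects are URIs. The two data sets, the example.org and example.com URIs, the plain-string predicates, and the helper `objects` are all hypothetical simplifications for illustration.

```python
def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Two independently published data sets. The book data set refers to a
# person by URI without holding any information about that person itself.
books = [
    ("http://books.example.org/b1", "title", "Graphs"),
    ("http://books.example.org/b1", "author", "http://people.example.com/marko"),
]
people = [
    ("http://people.example.com/marko", "name", "Marko"),
]

# Because both use the same global address space, merging is concatenation.
web_of_data = books + people

authors = objects(web_of_data, "http://books.example.org/b1", "author")
names = [name for author in authors for name in objects(web_of_data, author, "name")]
```

The second lookup crosses from one data set into the other by following the shared URI, which is the sense in which the Web behaves like one distributed database.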
23. Web of Data at Los Alamos National Laboratory
data set          domain      data set          domain     data set            domain
audioscrobbler    music       govtrack          government pubguide            books
bbclatertotp      music       homologene        biology    qdos                social
bbcplaycountdata  music       ibm               computer   rae2001             computer
bbcprogrammes     media       ieee              computer   rdfbookmashup       books
budapestbme       computer    interpro          biology    rdfohloh            social
chebi             biology     jamendo           music      resex               computer
crunchbase        business    laascnrs          computer   riese               government
dailymed          medical     libris            books      semanticweborg      computer
dblpberlin        computer    lingvoj           reference  semwebcentral       social
dblphannover      computer    linkedct          medical    siocsites           social
dblprkbexplorer   computer    linkedmdb         movie      surgeradio          music
dbpedia           general     magnatune         music      swconferencecorpus  computer
doapspace         social      musicbrainz       music      taxonomy            reference
drugbank          medical     myspacewrapper    social     umbel               general
eurecom           computer    opencalais        reference  uniref              biology
eurostat          government  opencyc           general    unists              biology
flickrexporter    images      openguides        reference  uscensusdata        government
flickrwrappr      images      pdb               biology    virtuososponger     reference
foafprofiles      social      pfam              biology    w3cwordnet          reference
freebase          general     pisa              computer   wikicompany         business
geneid            biology     prodom            biology    worldfactbook       government
geneontology      biology     projectgutenberg  books      yago                general
geonames          geographic  prosite           biology    ...
24. Web of Data at Los Alamos National Laboratory
[Figure: graph of data sets on the Web of Data (for example dbpedia, freebase, geonames, uniprot, and musicbrainz) and the references between them.]
Each vertex (node) represents a data set. A directed edge (link) denotes that data set A
makes reference to data in data set B.
25. Web of Data at Los Alamos National Laboratory
[Figure: left, Application 1, Application 2, and Application 3 each pair their own processes with their own structures on separate hosts (127.0.0.1, 127.0.0.2, 127.0.0.3); right, the same three applications and their processes operate over structures shared through the Web of Data.]
Data is currently in silos (left). For example, Amazon.com can only recommend other
Amazon.com products. What about recommending a job to take based upon the books
you read, the people you know, etc. (right)? Can a collectively generated model of the
world help people to find their place in life? (http://bit.ly/cLWL3F)
26. Web of Data at Los Alamos National Laboratory
[Figure: left, a graph of instructions identified by urn:uuid URIs (Block, Branch, Equals, PushValue, LocalDirect, Return) joined by edges such as nextInst, trueInst, falseInst, hasLeft, and hasRight; right, a diagram of the virtual machine (RVM, Fhat) with its frames, blocks, instructions, variables, and operand and return stacks.]
A more esoteric body of work was developed at this time that dealt with the encoding of
not only data into the Web of Data, but also process. This included the distributed
representation of computing instructions (left) and virtual machines (right).
27. PostDoc Researcher at Los Alamos National Laboratory
• Studied graph theory and ethics as a Director’s Fellow PostDoc at
the Center for Nonlinear Studies of the Los Alamos National Laboratory.
(2007-2010)
28. Path Algebra at Los Alamos National Laboratory
• Path Algebra: concerned with how to move through a graph in an
intelligent, directed manner in order to solve problems using graphs.[6]
  - The algebra contains a set of elements: vertices and edges.
  - The algebra contains a set of operations: traverse, filter, clip, merge,
    split, not, etc.
  - The algebra provides a theory for how to develop graph traversal
    engines (i.e. graph processors).
[6] Please see: 1.) http://arxiv.org/abs/0806.2274 2.) http://arxiv.org/abs/0803.4355 3.)
http://gremlin.tinkerpop.com 4.) http://pipes.tinkerpop.com
29. Path Algebra at Los Alamos National Laboratory
The general theme of controlling how a walker moves through a graph has numerous
applications including searching, ranking, scoring, recommendation, etc. within a graph.
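Controlling a walker can be sketched by composing two small steps, a traverse and a filter, into a recommendation. This is an illustration in plain Python, not the path algebra itself and not Gremlin or Pipes; the helpers `traverse` and `keep`, the edge labels `likes` and `liked_by`, and the toy graph are all assumptions.

```python
from collections import Counter


def traverse(graph, vertices, label):
    """Move every walker one step along the edges carrying the given label."""
    return [head for v in vertices for lbl, head in graph.get(v, []) if lbl == label]


def keep(vertices, predicate):
    """Filter: drop the walkers whose current vertex fails the predicate."""
    return [v for v in vertices if predicate(v)]


graph = {
    "marko": [("likes", "graphs"), ("likes", "music")],
    "peter": [("likes", "graphs"), ("likes", "databases")],
    "josh": [("likes", "music"), ("likes", "databases"), ("likes", "ethics")],
    "graphs": [("liked_by", "marko"), ("liked_by", "peter")],
    "music": [("liked_by", "marko"), ("liked_by", "josh")],
    "databases": [("liked_by", "peter"), ("liked_by", "josh")],
    "ethics": [("liked_by", "josh")],
}

# What marko likes -> who else likes it -> what they like that marko does not.
liked = traverse(graph, ["marko"], "likes")
peers = keep(traverse(graph, liked, "liked_by"), lambda v: v != "marko")
found = keep(traverse(graph, peers, "likes"), lambda v: v not in liked)

# Walkers are kept as a list, so the number arriving at a vertex is its score.
ranking = Counter(found).most_common()
```

Swapping the labels or the filters changes the question being asked of the same graph, which is what makes the traversal pattern reusable for search, ranking, and scoring as well.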
30. Eudaemonics at Los Alamos National Laboratory
• Eudaemonics: an ethical theory stating that it is everyone’s moral
responsibility to be “happy” (i.e. to live engaged in the world). See the
work of Aristotle and David L. Norton.[7]
  - Are recommender systems evolving to become eudaemonic engines?
    ∗ Movies (e.g. NetFlix), books (e.g. GoodReads), life partners
      (e.g. Match.com), careers (e.g. Monster), etc.
    ∗ Can we interrelate all this data and traverse it for problem-solving?
[7] Please see: 1.) http://arxiv.org/abs/0903.0200 2.) http://arxiv.org/abs/0904.0027
31. Graph Systems Architect at AT&T Interactive
• Work in theoretical and applied models of problem-solving with graph
traversals and graph databases. (2010-present)
32. Graphs at AT&T Interactive
• Graph Traversal: the development of theories and applications of graph
traversals in real-world problem-solving situations.[8]
  - Continue to work on path algebra (extensions to include a non-matrix
    based, ring theoretic model and a diffusion model).
  - Continue to work on open source graph-related technologies to support
    graph-related efforts at AT&Ti (see http://www.tinkerpop.com).
• Recommender Systems: the development of applications for real-time,
“themed” recommendations (i.e. a problem-solving graph engine).
  - AT&Ti maintains a collection of interesting data sets.
  - Make use of such data for numerous types of recommendation.
[8] Please see: 1.) http://arxiv.org/abs/1004.1001 2.) http://arxiv.org/abs/1006.2361
33. Conclusion
• Graphs/networks touch numerous disciplines.
• Many aspects of the world can be modeled as a graph/network.
• Graph traversal algorithms show promise as a general-purpose
style/pattern for computing.