Scoda openrefine-directordata
1. A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL. For more information, contact: schoolOfData.org
2. Here’s the sort of thing we’re starting with – a list of companies…
3. Here’s the sort of thing we want – lists of directors associated with each company (where that information is available).
4. The first step is to create a web address/URL to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.
5. The URLs should take the form:
http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID
If you already have company page URLs in a column, add a column based on that column using:
value.replace('http://','http://api.')
If you have JURISDICTION/COMPANY_ID in a column, use the formula:
"http://api.opencorporates.com/companies/"+value
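Outside OpenRefine, the two URL-building expressions above can be sketched in Python. This is an illustration only: the function names are made up for this sketch, the company identifier is a placeholder rather than a real company, and the page URL is assumed to have the form http://opencorporates.com/companies/JURISDICTION/COMPANY_ID.

```python
API_PREFIX = "http://api.opencorporates.com/companies/"


def api_url_from_page_url(page_url: str) -> str:
    """Turn a company page URL into an API URL by adding the 'api.' subdomain."""
    # Same idea as the OpenRefine expression value.replace('http://','http://api.')
    return page_url.replace("http://", "http://api.")


def api_url_from_id(jurisdiction_and_id: str) -> str:
    """Build an API URL from a 'JURISDICTION/COMPANY_ID' string."""
    return API_PREFIX + jurisdiction_and_id


# Placeholder identifier (assumption), not a real company.
print(api_url_from_page_url("http://opencorporates.com/companies/gb/00000000"))
print(api_url_from_id("gb/00000000"))
```

Both calls should yield the same API address, which is a quick way to check that a column of page URLs and a column of IDs have been converted consistently.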
6. The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list down the path:
value.parseJson()['results']['company']['officers']
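As a rough Python equivalent of following that path, the sketch below parses a response and returns the officers list, or an empty list where the information is not available. The sample response is hand-written for illustration: only the results / company / officers nesting comes from the path above, and the fields inside each officer entry are assumptions.

```python
import json


def extract_officers(response_text: str) -> list:
    """Follow results -> company -> officers, returning [] if any level is missing."""
    data = json.loads(response_text)
    company = data.get("results", {}).get("company", {})
    return company.get("officers") or []


# Hand-written sample (assumption): a real API response carries many more fields.
sample_response = json.dumps({
    "results": {
        "company": {
            "officers": [
                {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS"},
                {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS"},
            ]
        }
    }
})

officers = extract_officers(sample_response)
print(len(officers))
```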
7. Let’s parse the JSON data and put the directors’ information into another column…
8. What we are aiming for is a contrivance based on the form:
32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
…
where we list director ID, name, position, appointment date, termination date.
9. This function will parse the data into a string with the form:
32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||
32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||
32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…
The function reads as follows: “for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||”. The use of two different – and hopefully unique – delimiters means we can split the data on each delimiter type separately.
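The slide's function itself is not reproduced in this text, so the following is only a Python sketch of the logic as described: join each officer's fields with ::, then join the officers with ||. The dictionary keys (id, name, position, start_date, end_date) and the flat layout of each officer record are assumptions made for the sketch, and missing values are written as the string "null" to match the example output.

```python
# Assumed field names, in the order the description lists them.
FIELDS = ["id", "name", "position", "start_date", "end_date"]


def officer_to_string(officer: dict) -> str:
    """Join one officer's fields with '::', writing 'null' for missing values."""
    return "::".join(
        "null" if officer.get(field) is None else str(officer[field])
        for field in FIELDS
    )


def officers_to_string(officers: list) -> str:
    """Join all officer descriptions with '||'."""
    return "||".join(officer_to_string(officer) for officer in officers)


officers = [
    {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS", "position": "director",
     "start_date": "2010-04-07", "end_date": None},
    {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS", "position": "director",
     "start_date": "2006-01-17", "end_date": "2012-02-22"},
]

print(officers_to_string(officers))
```

The two-level join only round-trips cleanly if neither :: nor || appears inside a name or position, which is why the delimiters need to be unusual.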
10. The parsed data is put into a new column in this combined list form.
11. We can then split the data so that we create a new row for each director using the delimiter we defined: ||
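To make the effect of this split concrete, here is a small Python model of it, with a table represented as a list of dictionaries. It is a sketch of the behaviour described in these slides, not OpenRefine's implementation: the first piece stays on the original row and each further piece gets a new row whose other columns are left empty. The column names and company name are invented for the example.

```python
def split_into_rows(rows, column, separator="||"):
    """Split a multi-valued cell into one row per value.

    Only the first resulting row keeps the values of the other columns;
    the rows created for the remaining values have those columns blank.
    """
    result = []
    for row in rows:
        cell = row.get(column)
        parts = cell.split(separator) if cell else [cell]
        for index, part in enumerate(parts):
            if index == 0:
                new_row = dict(row)  # original row keeps its other values
            else:
                new_row = {key: None for key in row}  # new row starts blank
            new_row[column] = part
            result.append(new_row)
    return result


# Invented example data (assumption).
table = [{"company": "Example Co", "directors": "1::A::director||2::B::director"}]
for row in split_into_rows(table, "directors"):
    print(row)
```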
12. Note that values from the other columns will not be copied into any newly created rows – we will have to do that ourselves either now or later.
13. For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
14. We can do this by splitting on the other separator type we used: ::
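The column split can be sketched in Python as follows. The column names are illustrative choices for this sketch, standing in for whatever names the new columns are eventually given.

```python
# Illustrative column names (assumption), one per data field.
COLUMN_NAMES = ["director_id", "name", "position", "appointed", "terminated"]


def split_director(cell: str, separator: str = "::") -> dict:
    """Split one director description into named fields."""
    parts = cell.split(separator)
    return dict(zip(COLUMN_NAMES, parts))


record = split_director(
    "32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22"
)
print(record)
```

Note that a missing termination date arrives here as the literal text "null", so it may need converting to a true blank before export.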
15. The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
16. Finally, we can do a little more tidying. For any columns we want to export, such as company name or company ID, we can Fill down using the corresponding values from the original row the directors’ information was pulled from.
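As an illustration of what filling down does, the Python sketch below copies the most recent non-blank value of a column into the blank cells beneath it. It models the idea rather than OpenRefine's own implementation, and the table contents are invented for the example.

```python
def fill_down(rows, column):
    """Replace blank cells in `column` with the last non-blank value above them."""
    last_seen = None
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if new_row.get(column) in (None, ""):
            new_row[column] = last_seen
        else:
            last_seen = new_row[column]
        filled.append(new_row)
    return filled


# Invented example data (assumption): blank company cells belong to the row above.
table = [
    {"company": "Example Co", "name": "A"},
    {"company": None, "name": "B"},
    {"company": "Other Ltd", "name": "C"},
    {"company": "", "name": "D"},
]
for row in fill_down(table, "company"):
    print(row)
```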