SlideShare a Scribd company logo
1 of 67
Download to read offline
Course	
  Introduc-on	
  
Designing	
  Data	
  Bases	
  with	
  Advanced	
  Data	
  Models	
  	
  
Dr. Fabio Fumarola
Enterprise	
  compu-ng	
  evolu-on	
  
•  We’ve	
  spent	
  several	
  years	
  in	
  the	
  world	
  of	
  enterprise	
  
compu-ng	
  
•  We’ve	
  seen	
  many	
  things	
  change	
  in:	
  
–  Languages,	
  Architectures,	
  
–  PlaAorms,	
  and	
  Processes.	
  
•  But	
  in	
  one	
  thing	
  we	
  stayed	
  constant	
  –	
  “rela-onal	
  
database	
  stored	
  the	
  data”	
  
•  The	
  data	
  storage	
  ques-on	
  for	
  architects	
  was:	
  “which	
  
rela-onal	
  database	
  to	
  use”	
  
1	
  
The	
  stability	
  of	
  the	
  reign	
  
•  Why?	
  
•  An	
  organiza-on’s	
  data	
  lasts	
  much	
  longer	
  than	
  its	
  
programs	
  (COBOL?)	
  
•  It’s	
  valuable	
  to	
  have	
  stable	
  data	
  storage	
  which	
  is	
  
accessible	
  from	
  many	
  applica-ons	
  
•  In	
  the	
  last	
  decades	
  RDBMS	
  have	
  been	
  successful	
  in	
  
solving	
  problems	
  related	
  to	
  storing,	
  serving	
  and	
  
processing	
  data.	
  
2	
  
FROM	
  BUSINESS	
  TO	
  DECISION	
  
SUPPORT	
  
Before	
  going	
  deep	
  on	
  the	
  course	
  arguments	
  let’s	
  understand	
  the	
  
evolu-on	
  of	
  decision	
  support	
  systems.	
  
	
  
	
  
3	
  
From	
  business	
  to	
  decision	
  support:	
  ‘60	
  
•  Star-ng	
  from	
  ’60	
  data	
  were	
  stored	
  using	
  magne-c	
  
disks.	
  
•  Supported	
  analysis	
  where	
  sta-c,	
  only	
  aggregated	
  and	
  
pre[y	
  limited	
  
•  For	
  instance,	
  it	
  was	
  possible	
  to	
  extract	
  the	
  total	
  
amount	
  of	
  last	
  month	
  sales	
  
4	
  
From	
  business	
  to	
  decision	
  support:	
  ‘80	
  
•  With	
  rela-onal	
  databases	
  and	
  SQL,	
  data	
  analysis	
  start	
  
to	
  be	
  somehow	
  dynamics.	
  
•  SQL	
  allows	
  us	
  to	
  extract	
  data	
  at	
  detailed	
  and	
  
aggregated	
  level	
  
•  Transac-onal	
  ac-vi-es	
  are	
  stored	
  in	
  Online	
  
Transac-on	
  Process	
  databases	
  	
  
•  OLTP	
  are	
  used	
  in	
  several	
  applica-ons	
  such	
  as	
  orders,	
  
salary,	
  invoices…	
  
5	
  
From	
  business	
  to	
  decision	
  support:	
  ‘80	
  
•  The	
  best	
  hypothesis	
  was	
  that	
  the	
  described	
  modules	
  
are	
  included	
  into	
  Enterprise	
  Resource	
  Planning	
  (ERP)	
  
sobware	
  
•  Examples	
  of	
  such	
  vendors	
  are	
  SAP,	
  Microsob,	
  HP	
  and	
  
Oracle.	
  
•  Normally,	
  what	
  	
  happen	
  is	
  that	
  each	
  module	
  is	
  
implemented	
  as	
  an	
  ad-­‐hoc	
  sobware	
  with	
  is	
  own	
  
database.	
  	
  
•  Cons:	
  Data	
  representa-on	
  and	
  integra-on.	
  
6	
  
OLTP	
  design	
  
•  Such	
  kind	
  of	
  databases	
  are	
  designed	
  to	
  be:	
  
–  Strongly	
  normalized,	
  
–  Fast	
  in	
  inser-ng	
  data.	
  
•  However,	
  data	
  normaliza-on:	
  
–  Do	
  not	
  foster	
  the	
  read	
  of	
  huge	
  quan-ty	
  of	
  data,	
  
–  Increase	
  the	
  number	
  of	
  tables	
  used	
  to	
  store	
  records.	
  
7	
  
Foster	
  decision	
  support	
  
•  In	
  order	
  to	
  support	
  “Decisions”	
  we	
  need	
  to	
  extract	
  
de-­‐normalized	
  data	
  via	
  several	
  JOINS.	
  
•  Moreover,	
  opera-onal	
  databases	
  offer	
  a	
  no/limited	
  
visibility	
  on	
  historical	
  data.	
  
Considera-ons:	
  
•  These	
  factors	
  make	
  hard	
  data	
  analysis	
  made	
  on	
  OLTP	
  
databases…	
  
8	
  
From	
  business	
  to	
  decision	
  support:	
  ‘90	
  
•  Thus	
  in	
  ’90	
  bore	
  databases	
  designed	
  to	
  support	
  
analysis	
  from	
  OLTP	
  databases.	
  
•  This	
  is	
  the	
  arise	
  of	
  Data	
  Warehouses	
  
•  DWs	
  are	
  central	
  repositories	
  of	
  integrated	
  data	
  from	
  
one	
  or	
  more	
  disparate	
  sources.	
  
•  They	
  store	
  current	
  and	
  historical	
  data	
  and	
  are	
  used	
  
for	
  crea-ng	
  trending	
  reports	
  for	
  management	
  
repor-ng	
  such	
  as	
  annual	
  and	
  quarterly	
  comparisons.	
  
9	
  
Types	
  of	
  DWs	
  systems	
  
•  Data	
  Mart	
  	
  
–  is	
  a	
  simple	
  form	
  of	
  a	
  data	
  warehouse	
  that	
  is	
  focused	
  on	
  a	
  
single	
  subject	
  (or	
  func-onal	
  area),	
  such	
  as	
  sales,	
  finance	
  or	
  
marke-ng.	
  
•  Online	
  analy-cal	
  processing	
  (OLAP):	
  	
  
–  is	
  characterized	
  by	
  a	
  rela-vely	
  low	
  volume	
  of	
  transac-ons.	
  	
  
–  Queries	
  are	
  oben	
  very	
  complex	
  and	
  involve	
  aggrega-ons.	
  
–  OLAP	
  databases	
  store	
  aggregated,	
  historical	
  data	
  in	
  mul--­‐
dimensional	
  schemas.	
  
10	
  
Types	
  of	
  DWs	
  systems	
  
•  Predic-ve	
  analysis	
  
–  Predic-ve	
  analysis	
  is	
  about	
  finding	
  and	
  quan-fying	
  hidden	
  
pa[erns	
  in	
  the	
  data	
  using	
  complex	
  mathema-cal	
  models	
  
that	
  can	
  be	
  used	
  to	
  predict	
  future	
  outcomes.	
  	
  
–  Predic-ve	
  analysis	
  is	
  different	
  from	
  OLAP	
  in	
  that	
  OLAP	
  
focuses	
  on	
  historical	
  data	
  analysis	
  and	
  is	
  reac-ve	
  in	
  
nature,	
  while	
  predic-ve	
  analysis	
  focuses	
  on	
  the	
  future.	
  	
  
11	
  
Data	
  Warehouse	
  
•  Data	
  stored	
  in	
  DWs	
  are	
  the	
  star-ng	
  point	
  for	
  the	
  
Business	
  Intelligence	
  (BI).	
  
•  Def.	
  “Business	
  intelligence	
  (BI)	
  is	
  the	
  set	
  of	
  
techniques	
  and	
  tools	
  for	
  the	
  transforma7on	
  of	
  raw	
  
data	
  into	
  meaningful	
  and	
  useful	
  informa7on	
  for	
  
business	
  analysis	
  purposes”.	
  
•  With	
  the	
  evolu-on	
  of	
  BI	
  systems	
  we	
  moved	
  from	
  SQL	
  
based	
  analysis	
  to	
  Visual	
  Instruments.	
  
12	
  
Example	
  DW	
  Cube	
  
13	
  
OLAP	
  features	
  
•  OLAP	
  systems	
  support	
  data	
  explora-on	
  through	
  drill-­‐
down,	
  drill-­‐up,	
  slicing	
  e	
  dicing	
  opera-ons.	
  
•  However,	
  we	
  s-ll	
  have	
  an	
  historical	
  point	
  of	
  view	
  of	
  
what	
  happened	
  in	
  the	
  business	
  but	
  now	
  on	
  what	
  is	
  
happening.	
  
•  We	
  cannot	
  make	
  predic-on	
  on	
  the	
  future	
  
14	
  
From	
  business	
  to	
  decision	
  support:	
  ‘00	
  
•  Star-ng	
  from	
  2000	
  it	
  arises	
  the	
  necessity	
  to	
  do	
  
predic-ve	
  analysis.	
  
•  The	
  techniques	
  in	
  this	
  scenario	
  are	
  in	
  the	
  field	
  of	
  Data	
  
Mining	
  
•  Data	
  Mining	
  is	
  the	
  computa-onal	
  process	
  of	
  
discovering	
  pa[erns	
  in	
  large	
  dataset	
  involving	
  
methods	
  at	
  the	
  intersec-on	
  of	
  ar-ficial	
  intelligence,	
  
machine	
  learning,	
  sta-s-cs,	
  and	
  database	
  systems.	
  
15	
  
From	
  business	
  to	
  decision	
  support:	
  ‘00	
  
•  The	
  overall	
  goal	
  of	
  the	
  data	
  mining	
  process	
  is	
  to	
  
extract	
  novel	
  informa-on	
  from	
  a	
  data	
  set	
  and	
  
transform	
  it	
  into	
  understandable	
  knowledge.	
  
•  Aside	
  from	
  the	
  raw	
  analysis	
  step,	
  it	
  involves	
  database	
  
and	
  data	
  management	
  aspects,	
  data	
  pre-­‐processing,	
  
model	
  and	
  inference	
  considera-ons,	
  interes-ngness	
  
metrics,	
  complexity	
  considera-ons,	
  post-­‐processing	
  
of	
  discovered	
  structures,	
  visualiza-on,	
  and	
  online	
  
upda-ng.	
  
16	
  
From	
  business	
  to	
  decision	
  support:	
  Recap	
  
•  In	
  the	
  last	
  decades	
  RDBMS	
  have	
  been	
  successful	
  in	
  
solving	
  problems	
  related	
  to	
  storing,	
  serving	
  and	
  
processing	
  data.	
  
	
  
17	
  
From	
  business	
  to	
  decision	
  support:	
  Recap	
  
•  Vendors	
  such	
  as	
  Oracle,	
  Ver-ca,	
  Teradata,	
  Microsob	
  
and	
  IBM	
  proposed	
  their	
  solu-on	
  based	
  on	
  Rela-onal	
  
Algebra	
  and	
  SQL.	
  
•  With	
  Data	
  Mining	
  we	
  are	
  able	
  to	
  extract	
  knowledge	
  
from	
  data	
  which	
  can	
  be	
  used	
  to	
  support	
  predic-ve	
  
analysis	
  	
  (Data	
  Mining	
  is	
  not	
  only	
  predic-on!!!).	
  
18	
  
Challenges	
  of	
  Scale	
  Differ	
  
THERE	
  IS	
  SOMETHING	
  THAT	
  DOES	
  
NOT	
  WORK!	
  
But	
  
20	
  
1.	
  Scaling	
  Up	
  Databases	
  
A	
  ques-on	
  I’m	
  oben	
  asked	
  about	
  Heroku	
  is:	
  “How	
  do	
  you	
  scale	
  the	
  SQL	
  
database?”	
  There’s	
  a	
  lot	
  of	
  things	
  I	
  can	
  say	
  about	
  using	
  caching,	
  sharding,	
  and	
  
other	
  techniques	
  to	
  take	
  load	
  off	
  the	
  database.	
  But	
  the	
  actual	
  answer	
  is:	
  we	
  
don’t.	
  SQL	
  databases	
  are	
  fundamentally	
  non-­‐scalable,	
  and	
  there	
  is	
  no	
  magical	
  
pixie	
  dust	
  that	
  we,	
  or	
  anyone,	
  can	
  sprinkle	
  on	
  them	
  to	
  suddenly	
  make	
  them	
  
scale.	
  	
  
Adam	
  Wiggins	
  Heroku	
  
	
  
Adam	
  Wiggins,	
  Heroku	
  Pa[erson,	
  David;	
  Fox,	
  Armando	
  (2012-­‐07-­‐11).	
  Engineering	
  Long-­‐LasGng	
  SoHware:	
  An	
  
Agile	
  Approach	
  Using	
  SaaS	
  and	
  Cloud	
  CompuGng,	
  Alpha	
  Edi-on	
  (Kindle	
  Loca-ons	
  1285-­‐1288).	
  Strawberry	
  
Canyon	
  LLC.	
  Kindle	
  Edi-on.	
  	
  
	
  
21	
  
2.	
  Data	
  Variety	
  
•  RDBMs	
  have	
  problems	
  with	
  Unstructured	
  and	
  Semi-­‐
Structured	
  Data	
  (varied	
  data)	
  
22	
  
3.	
  Connec-vity	
  
23	
  
4.	
  P2P	
  Knowledge	
  
24	
  
5.	
  Concurrency	
  
25	
  
6.	
  Concurrency	
  
26	
  
6.	
  Diversity	
  
27	
  
7.	
  Cloud	
  
28	
  
What	
  is	
  the	
  problem	
  with	
  RDBMs	
  
Caching	
  
Master/Slave	
  
Master/Master	
  
Cluster	
  
Table	
  Par--oning	
  
Federated	
  Tables	
  
Sharding	
  
Distributed	
  DBs	
  
29	
  
http://codefutures.com/database-sharding/
What	
  is	
  the	
  problem	
  with	
  RDBMs	
  
•  RDBMS	
  can	
  somehow	
  deal	
  with	
  this	
  aspects,	
  but	
  
they	
  have	
  issues	
  related	
  to:	
  	
  
–  expensive	
  licensing,	
  
–  requiring	
  complex	
  applica-on	
  logic,	
  
–  Dealing	
  with	
  evolving	
  data	
  models	
  
•  There	
  were	
  a	
  need	
  for	
  systems	
  that	
  could:	
  
–  work	
  with	
  different	
  kind	
  of	
  data	
  format,	
  
–  Do	
  not	
  require	
  strict	
  schema,	
  
–  and	
  are	
  easily	
  scalable.	
  
30	
  
NOSQL:	
  THE	
  NEW	
  CHALLENGER!	
  
Help!!!	
  
31	
  
NoSQL	
  
•  It	
  is	
  born	
  out	
  of	
  a	
  need	
  to	
  handle	
  large	
  data	
  volumes	
  
•  It	
  forces	
  a	
  fundamental	
  shib	
  to	
  building	
  large	
  
hardware	
  plaAorms	
  through	
  clusters	
  of	
  commodity	
  
servers.	
  
•  This	
  need	
  raises	
  from	
  the	
  difficul-es	
  of	
  making	
  
applica-on	
  code	
  play	
  well	
  with	
  rela-onal	
  databases	
  
32	
  
NoSQL	
  
•  The	
  term	
  “NoSQL”	
  is	
  very	
  ill-­‐defined.	
  	
  
•  It’s	
  generally	
  applied	
  to	
  a	
  number	
  of	
  recent	
  non	
  
rela-onal	
  databases:	
  Cassandra,	
  Mongo,	
  Neo4j,	
  
Hbase	
  and	
  Redis,…	
  
•  They	
  embrace	
  	
  
–  schemaless	
  data,	
  	
  
–  run	
  on	
  a	
  cluster,	
  	
  
–  and	
  have	
  the	
  ability	
  to	
  trade	
  off	
  tradi-onal	
  consistency	
  for	
  
other	
  useful	
  proper-es	
  
33	
  
Why	
  are	
  NoSQL	
  Databases	
  
Interes-ng	
  
1.	
  	
  Applica-on	
  development	
  produc-vity:	
  
–  A	
  lot	
  of	
  applica-on	
  development	
  is	
  spent	
  on	
  mapping	
  data	
  
between	
  in	
  memory	
  data	
  structures	
  and	
  rela-onal	
  
databases	
  
–  A	
  NoSQL	
  database	
  may	
  provide	
  a	
  data	
  model	
  that	
  can	
  
simplify	
  that	
  interac-on	
  resul-ng	
  in	
  less	
  code	
  to	
  write,	
  
debug,	
  and	
  evolve.	
  
34	
  
Why	
  are	
  NoSQL	
  Databases	
  
Interes-ng	
  
2.  Large-­‐scale	
  data:	
  
–  Organiza-ons	
  are	
  finding	
  it	
  valuable	
  to	
  capture	
  mode	
  
data	
  and	
  process	
  it	
  more	
  quickly.	
  
–  They	
  are	
  finding	
  it	
  expensive	
  to	
  do	
  so	
  with	
  rela-onal	
  
databases.	
  
–  NoSQL	
  database	
  are	
  more	
  economic	
  if	
  ran	
  on	
  large	
  
cluster	
  of	
  many	
  smaller	
  an	
  cheaper	
  machines.	
  
–  Many	
  NoSQL	
  database	
  are	
  designed	
  to	
  run	
  on	
  clusters,	
  so	
  
they	
  be[er	
  fit	
  on	
  Big	
  Data	
  scenarios.	
  
35	
  
WHY	
  NOSQL	
  
Internet	
  Hypertext,	
  RSS,	
  Wikis,	
  blogs,	
  wikis,	
  tagging,	
  user	
  generated	
  
content,	
  RDF,	
  ontologies	
  
36	
  
Connectedness	
  
37	
  
The	
  Value	
  of	
  Rela-onal	
  Databases	
  
•  Rela-onal	
  databases	
  have	
  become	
  such	
  an	
  
embedded	
  part	
  of	
  our	
  compu-ng	
  culture.	
  
•  What	
  are	
  the	
  benefits	
  they	
  provide?	
  
38	
  
Getng	
  at	
  Persistent	
  Data	
  
•  The	
  most	
  obvious	
  value	
  of	
  a	
  database	
  is	
  keeping	
  
large	
  amount	
  of	
  persistent	
  data	
  
•  Two	
  areas	
  of	
  memory:	
  
–  Main	
  memory:	
  fast,	
  vola-le,	
  limited	
  in	
  space	
  and	
  lose	
  data	
  
when	
  it	
  loses	
  the	
  power	
  
–  Backing	
  store:	
  larger	
  but	
  slower,	
  commonly	
  seen	
  as	
  a	
  disk	
  
39	
  
Getng	
  at	
  Persistent	
  Data	
  
•  The	
  most	
  obvious	
  value	
  of	
  a	
  database	
  is	
  keeping	
  
large	
  amount	
  of	
  persistent	
  data	
  
•  Two	
  areas	
  of	
  memory:	
  
–  Main	
  memory:	
  fast,	
  vola-le,	
  limited	
  in	
  space	
  and	
  lose	
  data	
  
when	
  it	
  loses	
  the	
  power	
  
–  Backing	
  store:	
  larger	
  but	
  slower,	
  commonly	
  seen	
  as	
  a	
  disk	
  
40	
  
Getng	
  at	
  Persistent	
  Data	
  
•  The	
  backing	
  store	
  can	
  be	
  organized	
  in	
  all	
  sort	
  of	
  
ways.	
  
•  For	
  many	
  produc-vity	
  applica-ons	
  (such	
  as	
  word	
  
processors)	
  it	
  is	
  a	
  file	
  in	
  the	
  file	
  system.	
  
•  For	
  most	
  enterprise	
  applica-ons,	
  however,	
  the	
  
backing	
  store	
  is	
  a	
  database.	
  
•  A	
  database	
  allows	
  more	
  flexibility	
  than	
  a	
  file	
  system.	
  
41	
  
Concurrency	
  
•  Concurrency	
  is	
  notoriously	
  difficult	
  to	
  get	
  right.	
  
•  Object	
  oriented	
  is	
  not	
  the	
  right	
  programming	
  model	
  
to	
  deal	
  with	
  concurrency.	
  
•  Since	
  enterprise	
  applica-ons	
  can	
  have	
  a	
  lot	
  of	
  
concurrent	
  users,	
  there	
  is	
  a	
  lot	
  of	
  rooms	
  for	
  bad	
  
things	
  to	
  happen.	
  
•  Rela-on	
  databases	
  have	
  transac-ons	
  that	
  help	
  
mi-ga-ng	
  this	
  problem,	
  but….	
  
42	
  
Concurrency	
  
•  You	
  s-ll	
  have	
  to	
  deal	
  with	
  transac-onal	
  error	
  when	
  
you	
  try	
  to	
  book	
  a	
  room	
  that	
  is	
  just	
  gone.	
  
•  The	
  reality	
  is	
  that	
  the	
  transac-onal	
  mechanism	
  has	
  
worked	
  well	
  to	
  contain	
  the	
  complexity	
  of	
  
concurrency.	
  
•  Transac-on	
  with	
  rollback	
  allows	
  as	
  to	
  deal	
  with	
  
errors.	
  
43	
  
Integra-on	
  
•  Enterprise	
  applica-ons	
  live	
  in	
  a	
  rich	
  ecosystem	
  	
  
•  mul-ple	
  applica-on	
  wri[en	
  by	
  different	
  teams	
  need	
  to	
  
collaborate	
  in	
  order	
  to	
  get	
  things	
  done	
  
•  This	
  collabora-on	
  is	
  done	
  via	
  data	
  sharing.	
  
•  A	
  common	
  way	
  to	
  do	
  this	
  is	
  shared	
  database	
  integraGon	
  
[Hohpe	
  and	
  Woolf]	
  where	
  mul-ple	
  applica-ons	
  store	
  their	
  
data	
  into	
  a	
  single	
  database	
  
•  Using	
  a	
  single	
  database,	
  allows	
  all	
  the	
  applica-on	
  to	
  share	
  
data	
  easily,	
  while	
  the	
  database	
  concurrency	
  control	
  
applica-ons	
  such	
  as	
  users.	
  
44	
  
A	
  (Mostly)	
  Standard	
  Model	
  
•  Rela-onal	
  database	
  have	
  succeeded	
  because	
  they	
  
have	
  a	
  standard	
  model	
  
•  As	
  a	
  result,	
  developers	
  and	
  database	
  professionals	
  
can	
  apply	
  the	
  same	
  knowledge	
  in	
  several	
  projects.	
  
•  Although	
  there	
  are	
  differences	
  between	
  different	
  
RDBMs,	
  the	
  core	
  mechanism	
  remain	
  the	
  same.	
  
45	
  
Impedance	
  Mismatch	
  
•  It	
  is	
  the	
  difference	
  between	
  the	
  rela-onal	
  model	
  and	
  
the	
  in-­‐memory	
  data	
  structures.	
  
•  Rela-onal	
  data	
  organizes	
  data	
  into	
  table	
  and	
  rows	
  
(rela-on	
  and	
  tuples).	
  	
  
–  A	
  tuple	
  is	
  a	
  set	
  of	
  name-­‐value	
  pairs	
  
–  A	
  rela-on	
  is	
  a	
  set	
  of	
  tuples.	
  
•  All	
  the	
  SQL	
  opera-ons	
  consume	
  and	
  return	
  rela-ons.	
  
46	
  
Impedance	
  Mismatch:	
  Example	
  
47	
  
Impedance	
  Mismatch	
  
•  Tuples	
  and	
  rela-on	
  provides	
  elegance	
  and	
  simplicity,	
  
but	
  it	
  also	
  introduces	
  limita-ons.	
  
•  In	
  par-cular,	
  the	
  values	
  in	
  a	
  rela-onal	
  tuple	
  have	
  to	
  
be	
  simple.	
  
•  They	
  cannot	
  contain	
  any	
  structure,	
  such	
  as	
  nested	
  
record	
  or	
  a	
  list.	
  
•  This	
  limita-on	
  is	
  not	
  true	
  for	
  in	
  memory	
  data-­‐
structures.	
  
48	
  
Impedance	
  Mismatch	
  
•  As	
  a	
  result,	
  if	
  we	
  want	
  to	
  use	
  richer	
  in-­‐memory	
  data	
  
structure,	
  we	
  have	
  to	
  translate	
  it	
  to	
  a	
  rela-onal	
  
representa-on	
  to	
  store	
  in	
  on	
  disk.	
  
•  While	
  object-­‐oriented	
  language	
  succeeded,	
  object-­‐
oriented	
  databases	
  faded	
  into	
  obscurity.	
  
•  Impedance	
  mismatch	
  has	
  been	
  made	
  much	
  easier	
  to	
  
deal	
  with	
  Object-­‐Rela-onal	
  Mapping	
  	
  (ORM)	
  
frameworks	
  such	
  as	
  Eclipse-­‐Link,	
  Hibernate	
  and	
  
other	
  JPA	
  implementa-ons.	
  
49	
  
Impedance	
  Mismatch	
  
50	
  
•  ORMs	
  remove	
  a	
  lot	
  of	
  work,	
  but	
  can	
  become	
  a	
  
problem	
  when	
  people	
  try	
  to	
  ignore:	
  
–  the	
  database,	
  and	
  
–  query	
  performance	
  suffer	
  
•  This	
  is	
  where	
  NoSQL	
  database	
  works	
  greatly,	
  why?	
  
Applica-on	
  and	
  Integra-on	
  DBs	
  
•  This	
  is	
  a	
  event	
  that	
  happen	
  several	
  -mes	
  in	
  SW	
  
projects.	
  
•  In	
  this	
  scenario,	
  the	
  database	
  acts	
  as	
  an	
  integra-on	
  
database.	
  
•  The	
  downsides	
  to	
  share	
  database	
  are.	
  
51	
  
Applica-on	
  and	
  Integra-on	
  DBs	
  
•  The	
  downsides	
  to	
  share	
  database	
  are:	
  
–  Its	
  structure	
  tend	
  to	
  be	
  more	
  complex	
  than	
  any	
  single	
  
applica-on	
  needs,	
  
–  If	
  an	
  applica-on	
  want	
  to	
  make	
  changes	
  to	
  its	
  data	
  storage,	
  
it	
  needs	
  to	
  coordinate	
  with	
  all	
  the	
  other	
  applica-ons,	
  
–  Performance	
  degrada-on	
  due	
  to	
  huge	
  number	
  of	
  access	
  
–  Errors	
  in	
  database	
  usage	
  since	
  it	
  is	
  accessed	
  by	
  applica-on	
  
wri[en	
  by	
  different	
  teams.	
  
•  This	
  is	
  different	
  from	
  single	
  applica-on	
  databases	
  
52	
  
Applica-on	
  and	
  Integra-on	
  DBs	
  
•  Interoperability	
  concerns	
  can	
  now	
  shit	
  to	
  interfaces	
  
of	
  the	
  applica-on	
  allowing	
  interac-on	
  over	
  HTTTP.	
  
•  This	
  is	
  what	
  happen	
  with	
  micro-­‐services	
  (
h[p://www.-kalk.com/java/micro-­‐services/)	
  
•  Micro-­‐services	
  and	
  Web	
  services	
  in	
  general	
  enable	
  a	
  
form	
  of	
  communica-ons	
  based	
  on	
  data.	
  
•  Data	
  is	
  represented	
  as	
  documents	
  using	
  before	
  XML	
  
and	
  now	
  JSON	
  format.	
  
53	
  
Applica-on	
  and	
  Integra-on	
  DBs	
  
•  If	
  you	
  are	
  going	
  to	
  use	
  service	
  integra-on	
  using	
  text	
  
over	
  HTTP	
  is	
  the	
  way	
  to	
  go.	
  
•  However,	
  if	
  we	
  are	
  dealing	
  with	
  performance	
  there	
  
are	
  binary	
  protocols.	
  
54	
  
A[ack	
  of	
  the	
  Clusters	
  
•  In	
  2000s	
  several	
  large	
  web	
  proper-es	
  drama-cally	
  
increase	
  in	
  scale:	
  
–  Websites	
  started	
  tracking	
  ac-vity	
  and	
  structure	
  in	
  detail	
  
(analy-cs)	
  
–  Large	
  sets	
  of	
  data	
  appeared:	
  links,	
  social	
  networks,	
  ac-vity	
  
in	
  logs,	
  mapping	
  data.	
  
–  With	
  this	
  growth	
  in	
  data	
  came	
  a	
  growth	
  in	
  users	
  
•  Coping	
  with	
  this	
  increase	
  in	
  data	
  and	
  traffic	
  required	
  
more	
  compu-ng	
  resources.	
  
55	
  
A[ack	
  of	
  the	
  Clusters	
  
•  There	
  were	
  to	
  choices:	
  
–  Scaling	
  up	
  
–  Scaling	
  out	
  
•  Scaling	
  up	
  implies	
  bigger	
  machines,	
  more	
  processors,	
  
disk	
  storage	
  and	
  memory	
  ($$$).	
  
•  Scaling	
  out	
  was	
  the	
  alterna-ve:	
  
–  Use	
  a	
  lot	
  of	
  small	
  machine	
  in	
  a	
  cluster.	
  
–  It	
  is	
  cheap	
  and	
  also	
  more	
  resilient	
  (individual	
  failures)	
  
56	
  
A[ack	
  of	
  the	
  Clusters	
  
•  This	
  revealed	
  a	
  new	
  problem,	
  rela-onal	
  databases	
  
are	
  note	
  designed	
  to	
  run	
  on	
  clusters.	
  
•  Clustered	
  rela-onal	
  databases	
  (e.g.	
  Oracle	
  RAC	
  or	
  
Microsob	
  SQL	
  Server)	
  work	
  on	
  the	
  concept	
  of	
  shared	
  
disk	
  subsystem.	
  
•  RDBMs	
  can	
  also	
  run	
  on	
  separate	
  server	
  with	
  
different	
  sets	
  of	
  data	
  (sharding)	
  
57	
  
A[ack	
  of	
  the	
  Clusters	
  
•  However,	
  it	
  needs	
  an	
  applica-on	
  to	
  control	
  the	
  
sharded-­‐database.	
  
•  Also	
  we	
  lose	
  querying,	
  referen-al	
  integrity,	
  
transac-ons	
  or	
  consistency	
  control	
  cross	
  shard.	
  
•  These	
  technical	
  issues	
  are	
  exacerbated	
  by	
  licensing	
  
cots.	
  
•  This	
  mismatch	
  between	
  DBs	
  and	
  clusters	
  led	
  some	
  
organiza-ons	
  to	
  consider	
  different	
  solu-ons	
  
58	
  
The	
  emergence	
  of	
  NoSQL	
  
•  Two	
  companies	
  in	
  par-cular	
  –	
  Google	
  and	
  Amazon	
  –	
  
have	
  been	
  very	
  influen-al.	
  
•  They	
  were	
  capturing	
  a	
  large	
  amount	
  of	
  data	
  and	
  
their	
  business	
  is	
  on	
  data	
  management	
  
•  Both	
  companies	
  produces	
  influen-al	
  papers:	
  
–  BigTable	
  from	
  Google	
  
–  Dynamo	
  DB	
  from	
  Amazon	
  
59	
  
The	
  emergence	
  of	
  NoSQL	
  
•  As	
  part	
  of	
  innova-on	
  in	
  data	
  management	
  system,	
  several	
  
new	
  technologies	
  where	
  built:	
  
–  2003	
  -­‐	
  Google	
  File	
  System,	
  
–  2004	
  -­‐	
  MapReduce,	
  
–  2006	
  -­‐	
  BigTable,	
  
–  2007	
  -­‐	
  Amazon	
  DynamoDB	
  
–  2012	
  Google	
  Cloud	
  Engine	
  
•  Each	
  solved	
  different	
  use	
  cases	
  and	
  had	
  a	
  different	
  set	
  of	
  
assump-ons.	
  
•  All	
  these	
  mark	
  the	
  beginning	
  of	
  a	
  different	
  way	
  of	
  thinking	
  
about	
  data	
  management.	
  
60	
  
The	
  emergence	
  of	
  NoSQL	
  
•  It	
  is	
  irony	
  that	
  the	
  term	
  “NoSQL”	
  appeared	
  in	
  late	
  
90s	
  from	
  a	
  rela-onal	
  database	
  made	
  by	
  Carlo	
  Strozzi.	
  
•  The	
  name	
  comes	
  from	
  the	
  fact	
  that	
  it	
  does	
  not	
  used	
  
SQL	
  as	
  query	
  language.	
  
•  However,	
  the	
  usage	
  of	
  “NoSQL”	
  as	
  we	
  consider	
  
today	
  come	
  from	
  a	
  meetup	
  on	
  2009	
  in	
  San	
  Francisco.	
  
•  They	
  want	
  a	
  term	
  that	
  can	
  be	
  used	
  as	
  Twi[er	
  
hashtag.	
  #NoSQL	
  
61	
  
NoSQL	
  Characteris-cs	
  
1.  They	
  don’t	
  use	
  SQL.	
  (HBase,	
  Cassandra,	
  Redis…)	
  
2.  They	
  are	
  generally	
  open-­‐source	
  projects.	
  
3.  Most	
  of	
  them	
  are	
  designed	
  to	
  run	
  on	
  clusters.	
  	
  
4.  RDBMs	
  used	
  ACID	
  transac-ons	
  to	
  handle	
  
consistency	
  across	
  the	
  whole	
  database.	
  NoSQL	
  
resort	
  to	
  other	
  op-ons	
  (CAP	
  theorem).	
  
5.  No	
  all	
  are	
  cluster	
  oriented	
  (Graph	
  DBs)	
  
6.  NoSQL	
  operate	
  without	
  a	
  schema.	
  (schema	
  free)	
  
62	
  
The	
  emergence	
  of	
  NoSQL	
  
•  NoSQL	
  does	
  not	
  stands	
  for	
  Not-­‐Only	
  SQL.	
  
•  It	
  is	
  be[er	
  to	
  NoSQL	
  as	
  a	
  movement	
  rather	
  than	
  a	
  
techonology.	
  
•  RDBMs	
  are	
  not	
  going	
  away.	
  
•  The	
  change	
  is	
  that	
  rela-onal	
  databases	
  are	
  an	
  op-on	
  
•  This	
  point	
  of	
  view	
  is	
  oben	
  referred	
  to	
  as	
  polyglot	
  
persistence	
  
63	
  
The	
  emergence	
  of	
  NoSQL	
  
•  Instead	
  of	
  just	
  picking	
  a	
  rela-onal	
  database,	
  we	
  need	
  
to	
  understand:	
  
1.  The	
  nature	
  of	
  the	
  data	
  we	
  are	
  storing,	
  and	
  
2.  How	
  we	
  want	
  to	
  manipulate	
  it.	
  
•  In	
  order	
  to	
  deal	
  with	
  this	
  change	
  most	
  organiza-ons	
  
need	
  to	
  shib	
  from	
  integra-on	
  database	
  to	
  
applica-on	
  database.	
  
•  In	
  this	
  course	
  we	
  concentrate	
  on	
  Big	
  Data	
  running	
  
on	
  clusters.	
  
64	
  
The	
  emergence	
  of	
  NoSQL	
  
•  The	
  Big	
  Data	
  concerns	
  have	
  created	
  an	
  opportunity	
  
for	
  people	
  to	
  think	
  freshly	
  about	
  their	
  storage	
  needs.	
  
•  NoSQL	
  help	
  developer	
  produc-vity	
  by	
  simplifying	
  
their	
  database	
  access	
  even	
  if	
  they	
  have	
  no	
  need	
  to	
  
scale	
  beyond	
  single	
  machine.	
  
65	
  
66	
  

More Related Content

What's hot

Chapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsChapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsnehabsairam
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2Fabio Fumarola
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsnehabsairam
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra nehabsairam
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databasesAshwani Kumar
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLCloudera, Inc.
 
Using SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesUsing SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesCode Mastery
 
1. introduction to no sql
1. introduction to no sql1. introduction to no sql
1. introduction to no sqlAnuja Gunale
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5RojaT4
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processingnurmeen1
 
Sql vs NoSQL-Presentation
 Sql vs NoSQL-Presentation Sql vs NoSQL-Presentation
Sql vs NoSQL-PresentationShubham Tomar
 

What's hot (20)

Chapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsChapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortals
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortals
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databases
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 
NoSql
NoSqlNoSql
NoSql
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETL
 
4. hbase overview
4. hbase overview4. hbase overview
4. hbase overview
 
Nosql
NosqlNosql
Nosql
 
Using SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesUsing SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS Cubes
 
1. introduction to no sql
1. introduction to no sql1. introduction to no sql
1. introduction to no sql
 
Open Source Datawarehouse
Open Source DatawarehouseOpen Source Datawarehouse
Open Source Datawarehouse
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
 
OLAP
OLAPOLAP
OLAP
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Sql vs NoSQL-Presentation
 Sql vs NoSQL-Presentation Sql vs NoSQL-Presentation
Sql vs NoSQL-Presentation
 
Database Technologies
Database TechnologiesDatabase Technologies
Database Technologies
 

Viewers also liked

Neo4j - 7 databases in 7 weeks
Neo4j - 7 databases in 7 weeksNeo4j - 7 databases in 7 weeks
Neo4j - 7 databases in 7 weeksLandier Nicolas
 
Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Michael Mathioudakis
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory Fabio Fumarola
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
Make Sure Your Applications Crash
Make Sure Your  Applications CrashMake Sure Your  Applications Crash
Make Sure Your Applications CrashMoshe Zadka
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
A Beginners Guide to noSQL
A Beginners Guide to noSQLA Beginners Guide to noSQL
A Beginners Guide to noSQLMike Crabb
 

Viewers also liked (8)

Neo4j - 7 databases in 7 weeks
Neo4j - 7 databases in 7 weeksNeo4j - 7 databases in 7 weeks
Neo4j - 7 databases in 7 weeks
 
Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02
 
8. key value databases laboratory
8. key value databases laboratory 8. key value databases laboratory
8. key value databases laboratory
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Make Sure Your Applications Crash
Make Sure Your  Applications CrashMake Sure Your  Applications Crash
Make Sure Your Applications Crash
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
A Beginners Guide to noSQL
A Beginners Guide to noSQLA Beginners Guide to noSQL
A Beginners Guide to noSQL
 

Similar to 1. Introduction to the Course "Designing Data Bases with Advanced Data Models (NoSQL & Hadoop)"

Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousingEr. Nawaraj Bhandari
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biA P
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationSunderland City Council
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxSalehaMariyam
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processingSamraiz Tejani
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introductionMurli Jha
 
The final frontier
The final frontierThe final frontier
The final frontierTerry Bunio
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Vibrant Technologies & Computers
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationDATAVERSITY
 
Data Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiData Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiProfessor Lili Saghafi
 

Similar to 1. Introduction to the Course "Designing Data Bases with Advanced Data Models (NoSQL & Hadoop)" (20)

Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
Lecture1
Lecture1Lecture1
Lecture1
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
Online analytical processing
Online analytical processingOnline analytical processing
Online analytical processing
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
Data Vault Introduction
Data Vault IntroductionData Vault Introduction
Data Vault Introduction
 
Big Data Boom
Big Data BoomBig Data Boom
Big Data Boom
 
The final frontier
The final frontierThe final frontier
The final frontier
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
 
Big data rmoug
Big data rmougBig data rmoug
Big data rmoug
 
BI Introduction
BI IntroductionBI Introduction
BI Introduction
 
Data Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiData Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili Saghafi
 

More from Fabio Fumarola

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2Fabio Fumarola
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2Fabio Fumarola
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases LabFabio Fumarola
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases labFabio Fumarola
 
8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab8b. Column Oriented Databases Lab
8b. Column Oriented Databases LabFabio Fumarola
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with DockerFabio Fumarola
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databasesFabio Fumarola
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and DockerFabio Fumarola
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbtFabio Fumarola
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and dockerFabio Fumarola
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce Fabio Fumarola
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and consFabio Fumarola
 

More from Fabio Fumarola (17)

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
 
11. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:211. From Hadoop to Spark 1:2
11. From Hadoop to Spark 1:2
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases Lab
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab
 
8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab8b. Column Oriented Databases Lab
8b. Column Oriented Databases Lab
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
 
3 Git
3 Git3 Git
3 Git
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and Docker
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
Hbase an introduction
Hbase an introductionHbase an introduction
Hbase an introduction
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbt
 
Develop with linux containers and docker
Develop with linux containers and dockerDevelop with linux containers and docker
Develop with linux containers and docker
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
08 datasets
08 datasets08 datasets
08 datasets
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and cons
 

Recently uploaded

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 

Recently uploaded (20)

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 

1. Introduction to the Course "Designing Data Bases with Advanced Data Models (NoSQL & Hadoop)"

  • 1. Course  Introduc-on   Designing  Data  Bases  with  Advanced  Data  Models     Dr. Fabio Fumarola
  • 2. Enterprise  compu-ng  evolu-on   •  We’ve  spent  several  years  in  the  world  of  enterprise   compu-ng   •  We’ve  seen  many  things  change  in:   –  Languages,  Architectures,   –  PlaAorms,  and  Processes.   •  But  in  one  thing  we  stayed  constant  –  “rela-onal   database  stored  the  data”   •  The  data  storage  ques-on  for  architects  was:  “which   rela-onal  database  to  use”   1  
  • 3. The  stability  of  the  reign   •  Why?   •  An  organiza-on’s  data  lasts  much  longer  than  its   programs  (COBOL?)   •  It’s  valuable  to  have  stable  data  storage  which  is   accessible  from  many  applica-ons   •  In  the  last  decades  RDBMS  have  been  successful  in   solving  problems  related  to  storing,  serving  and   processing  data.   2  
  • 4. FROM  BUSINESS  TO  DECISION   SUPPORT   Before  going  deep  on  the  course  arguments  let’s  understand  the   evolu-on  of  decision  support  systems.       3  
  • 5. From  business  to  decision  support:  ‘60   •  Star-ng  from  ’60  data  were  stored  using  magne-c   disks.   •  Supported  analysis  where  sta-c,  only  aggregated  and   pre[y  limited   •  For  instance,  it  was  possible  to  extract  the  total   amount  of  last  month  sales   4  
  • 6. From  business  to  decision  support:  ‘80   •  With  rela-onal  databases  and  SQL,  data  analysis  start   to  be  somehow  dynamics.   •  SQL  allows  us  to  extract  data  at  detailed  and   aggregated  level   •  Transac-onal  ac-vi-es  are  stored  in  Online   Transac-on  Process  databases     •  OLTP  are  used  in  several  applica-ons  such  as  orders,   salary,  invoices…   5  
  • 7. From  business  to  decision  support:  ‘80   •  The  best  hypothesis  was  that  the  described  modules   are  included  into  Enterprise  Resource  Planning  (ERP)   sobware   •  Examples  of  such  vendors  are  SAP,  Microsob,  HP  and   Oracle.   •  Normally,  what    happen  is  that  each  module  is   implemented  as  an  ad-­‐hoc  sobware  with  is  own   database.     •  Cons:  Data  representa-on  and  integra-on.   6  
  • 8. OLTP  design   •  Such  kind  of  databases  are  designed  to  be:   –  Strongly  normalized,   –  Fast  in  inser-ng  data.   •  However,  data  normaliza-on:   –  Do  not  foster  the  read  of  huge  quan-ty  of  data,   –  Increase  the  number  of  tables  used  to  store  records.   7  
  • 9. Foster  decision  support   •  In  order  to  support  “Decisions”  we  need  to  extract   de-­‐normalized  data  via  several  JOINS.   •  Moreover,  opera-onal  databases  offer  a  no/limited   visibility  on  historical  data.   Considera-ons:   •  These  factors  make  hard  data  analysis  made  on  OLTP   databases…   8  
  • 10. From  business  to  decision  support:  ‘90   •  Thus  in  ’90  bore  databases  designed  to  support   analysis  from  OLTP  databases.   •  This  is  the  arise  of  Data  Warehouses   •  DWs  are  central  repositories  of  integrated  data  from   one  or  more  disparate  sources.   •  They  store  current  and  historical  data  and  are  used   for  crea-ng  trending  reports  for  management   repor-ng  such  as  annual  and  quarterly  comparisons.   9  
  • 11. Types  of  DWs  systems   •  Data  Mart     –  is  a  simple  form  of  a  data  warehouse  that  is  focused  on  a   single  subject  (or  func-onal  area),  such  as  sales,  finance  or   marke-ng.   •  Online  analy-cal  processing  (OLAP):     –  is  characterized  by  a  rela-vely  low  volume  of  transac-ons.     –  Queries  are  oben  very  complex  and  involve  aggrega-ons.   –  OLAP  databases  store  aggregated,  historical  data  in  mul--­‐ dimensional  schemas.   10  
  • 12. Types  of  DWs  systems   •  Predic-ve  analysis   –  Predic-ve  analysis  is  about  finding  and  quan-fying  hidden   pa[erns  in  the  data  using  complex  mathema-cal  models   that  can  be  used  to  predict  future  outcomes.     –  Predic-ve  analysis  is  different  from  OLAP  in  that  OLAP   focuses  on  historical  data  analysis  and  is  reac-ve  in   nature,  while  predic-ve  analysis  focuses  on  the  future.     11  
  • 13. Data  Warehouse   •  Data  stored  in  DWs  are  the  star-ng  point  for  the   Business  Intelligence  (BI).   •  Def.  “Business  intelligence  (BI)  is  the  set  of   techniques  and  tools  for  the  transforma7on  of  raw   data  into  meaningful  and  useful  informa7on  for   business  analysis  purposes”.   •  With  the  evolu-on  of  BI  systems  we  moved  from  SQL   based  analysis  to  Visual  Instruments.   12  
  • 15. OLAP  features   •  OLAP  systems  support  data  explora-on  through  drill-­‐ down,  drill-­‐up,  slicing  e  dicing  opera-ons.   •  However,  we  s-ll  have  an  historical  point  of  view  of   what  happened  in  the  business  but  now  on  what  is   happening.   •  We  cannot  make  predic-on  on  the  future   14  
  • 16. From  business  to  decision  support:  ‘00   •  Star-ng  from  2000  it  arises  the  necessity  to  do   predic-ve  analysis.   •  The  techniques  in  this  scenario  are  in  the  field  of  Data   Mining   •  Data  Mining  is  the  computa-onal  process  of   discovering  pa[erns  in  large  dataset  involving   methods  at  the  intersec-on  of  ar-ficial  intelligence,   machine  learning,  sta-s-cs,  and  database  systems.   15  
  • 17. From  business  to  decision  support:  ‘00   •  The  overall  goal  of  the  data  mining  process  is  to   extract  novel  informa-on  from  a  data  set  and   transform  it  into  understandable  knowledge.   •  Aside  from  the  raw  analysis  step,  it  involves  database   and  data  management  aspects,  data  pre-­‐processing,   model  and  inference  considera-ons,  interes-ngness   metrics,  complexity  considera-ons,  post-­‐processing   of  discovered  structures,  visualiza-on,  and  online   upda-ng.   16  
  • 18. From  business  to  decision  support:  Recap   •  In  the  last  decades  RDBMS  have  been  successful  in   solving  problems  related  to  storing,  serving  and   processing  data.     17  
  • 19. From  business  to  decision  support:  Recap   •  Vendors  such  as  Oracle,  Ver-ca,  Teradata,  Microsob   and  IBM  proposed  their  solu-on  based  on  Rela-onal   Algebra  and  SQL.   •  With  Data  Mining  we  are  able  to  extract  knowledge   from  data  which  can  be  used  to  support  predic-ve   analysis    (Data  Mining  is  not  only  predic-on!!!).   18  
  • 20. Challenges  of  Scale  Differ  
  • 21. THERE  IS  SOMETHING  THAT  DOES   NOT  WORK!   But   20  
  • 22. 1.  Scaling  Up  Databases   A  ques-on  I’m  oben  asked  about  Heroku  is:  “How  do  you  scale  the  SQL   database?”  There’s  a  lot  of  things  I  can  say  about  using  caching,  sharding,  and   other  techniques  to  take  load  off  the  database.  But  the  actual  answer  is:  we   don’t.  SQL  databases  are  fundamentally  non-­‐scalable,  and  there  is  no  magical   pixie  dust  that  we,  or  anyone,  can  sprinkle  on  them  to  suddenly  make  them   scale.     Adam  Wiggins  Heroku     Adam  Wiggins,  Heroku  Pa[erson,  David;  Fox,  Armando  (2012-­‐07-­‐11).  Engineering  Long-­‐LasGng  SoHware:  An   Agile  Approach  Using  SaaS  and  Cloud  CompuGng,  Alpha  Edi-on  (Kindle  Loca-ons  1285-­‐1288).  Strawberry   Canyon  LLC.  Kindle  Edi-on.       21  
  • 23. 2.  Data  Variety   •  RDBMs  have  problems  with  Unstructured  and  Semi-­‐ Structured  Data  (varied  data)   22  
  • 30. What  is  the  problem  with  RDBMs   Caching   Master/Slave   Master/Master   Cluster   Table  Par--oning   Federated  Tables   Sharding   Distributed  DBs   29   http://codefutures.com/database-sharding/
  • 31. What  is  the  problem  with  RDBMs   •  RDBMS  can  somehow  deal  with  this  aspects,  but   they  have  issues  related  to:     –  expensive  licensing,   –  requiring  complex  applica-on  logic,   –  Dealing  with  evolving  data  models   •  There  were  a  need  for  systems  that  could:   –  work  with  different  kind  of  data  format,   –  Do  not  require  strict  schema,   –  and  are  easily  scalable.   30  
  • 32. NOSQL:  THE  NEW  CHALLENGER!   Help!!!   31  
  • 33. NoSQL   •  It  is  born  out  of  a  need  to  handle  large  data  volumes   •  It  forces  a  fundamental  shib  to  building  large   hardware  plaAorms  through  clusters  of  commodity   servers.   •  This  need  raises  from  the  difficul-es  of  making   applica-on  code  play  well  with  rela-onal  databases   32  
  • 34. NoSQL   •  The  term  “NoSQL”  is  very  ill-­‐defined.     •  It’s  generally  applied  to  a  number  of  recent  non   rela-onal  databases:  Cassandra,  Mongo,  Neo4j,   Hbase  and  Redis,…   •  They  embrace     –  schemaless  data,     –  run  on  a  cluster,     –  and  have  the  ability  to  trade  off  tradi-onal  consistency  for   other  useful  proper-es   33  
  • 35. Why  are  NoSQL  Databases   Interes-ng   1.    Applica-on  development  produc-vity:   –  A  lot  of  applica-on  development  is  spent  on  mapping  data   between  in  memory  data  structures  and  rela-onal   databases   –  A  NoSQL  database  may  provide  a  data  model  that  can   simplify  that  interac-on  resul-ng  in  less  code  to  write,   debug,  and  evolve.   34  
  • 36. Why  are  NoSQL  Databases   Interes-ng   2.  Large-­‐scale  data:   –  Organiza-ons  are  finding  it  valuable  to  capture  mode   data  and  process  it  more  quickly.   –  They  are  finding  it  expensive  to  do  so  with  rela-onal   databases.   –  NoSQL  database  are  more  economic  if  ran  on  large   cluster  of  many  smaller  an  cheaper  machines.   –  Many  NoSQL  database  are  designed  to  run  on  clusters,  so   they  be[er  fit  on  Big  Data  scenarios.   35  
  • 37. WHY  NOSQL   Internet  Hypertext,  RSS,  Wikis,  blogs,  wikis,  tagging,  user  generated   content,  RDF,  ontologies   36   Connectedness  
  • 38. 37  
  • 39. The  Value  of  Rela-onal  Databases   •  Rela-onal  databases  have  become  such  an   embedded  part  of  our  compu-ng  culture.   •  What  are  the  benefits  they  provide?   38  
  • 40. Getng  at  Persistent  Data   •  The  most  obvious  value  of  a  database  is  keeping   large  amount  of  persistent  data   •  Two  areas  of  memory:   –  Main  memory:  fast,  vola-le,  limited  in  space  and  lose  data   when  it  loses  the  power   –  Backing  store:  larger  but  slower,  commonly  seen  as  a  disk   39  
  • 41. Getng  at  Persistent  Data   •  The  most  obvious  value  of  a  database  is  keeping   large  amount  of  persistent  data   •  Two  areas  of  memory:   –  Main  memory:  fast,  vola-le,  limited  in  space  and  lose  data   when  it  loses  the  power   –  Backing  store:  larger  but  slower,  commonly  seen  as  a  disk   40  
  • 42. Getng  at  Persistent  Data   •  The  backing  store  can  be  organized  in  all  sort  of   ways.   •  For  many  produc-vity  applica-ons  (such  as  word   processors)  it  is  a  file  in  the  file  system.   •  For  most  enterprise  applica-ons,  however,  the   backing  store  is  a  database.   •  A  database  allows  more  flexibility  than  a  file  system.   41  
  • 43. Concurrency   •  Concurrency  is  notoriously  difficult  to  get  right.   •  Object  oriented  is  not  the  right  programming  model   to  deal  with  concurrency.   •  Since  enterprise  applica-ons  can  have  a  lot  of   concurrent  users,  there  is  a  lot  of  rooms  for  bad   things  to  happen.   •  Rela-on  databases  have  transac-ons  that  help   mi-ga-ng  this  problem,  but….   42  
  • 44. Concurrency   •  You  s-ll  have  to  deal  with  transac-onal  error  when   you  try  to  book  a  room  that  is  just  gone.   •  The  reality  is  that  the  transac-onal  mechanism  has   worked  well  to  contain  the  complexity  of   concurrency.   •  Transac-on  with  rollback  allows  as  to  deal  with   errors.   43  
  • 45. Integra-on   •  Enterprise  applica-ons  live  in  a  rich  ecosystem     •  mul-ple  applica-on  wri[en  by  different  teams  need  to   collaborate  in  order  to  get  things  done   •  This  collabora-on  is  done  via  data  sharing.   •  A  common  way  to  do  this  is  shared  database  integraGon   [Hohpe  and  Woolf]  where  mul-ple  applica-ons  store  their   data  into  a  single  database   •  Using  a  single  database,  allows  all  the  applica-on  to  share   data  easily,  while  the  database  concurrency  control   applica-ons  such  as  users.   44  
  • 46. A  (Mostly)  Standard  Model   •  Rela-onal  database  have  succeeded  because  they   have  a  standard  model   •  As  a  result,  developers  and  database  professionals   can  apply  the  same  knowledge  in  several  projects.   •  Although  there  are  differences  between  different   RDBMs,  the  core  mechanism  remain  the  same.   45  
  • 47. Impedance  Mismatch   •  It  is  the  difference  between  the  rela-onal  model  and   the  in-­‐memory  data  structures.   •  Rela-onal  data  organizes  data  into  table  and  rows   (rela-on  and  tuples).     –  A  tuple  is  a  set  of  name-­‐value  pairs   –  A  rela-on  is  a  set  of  tuples.   •  All  the  SQL  opera-ons  consume  and  return  rela-ons.   46  
  • 49. Impedance  Mismatch   •  Tuples  and  rela-on  provides  elegance  and  simplicity,   but  it  also  introduces  limita-ons.   •  In  par-cular,  the  values  in  a  rela-onal  tuple  have  to   be  simple.   •  They  cannot  contain  any  structure,  such  as  nested   record  or  a  list.   •  This  limita-on  is  not  true  for  in  memory  data-­‐ structures.   48  
  • 50. Impedance  Mismatch   •  As  a  result,  if  we  want  to  use  richer  in-­‐memory  data   structure,  we  have  to  translate  it  to  a  rela-onal   representa-on  to  store  in  on  disk.   •  While  object-­‐oriented  language  succeeded,  object-­‐ oriented  databases  faded  into  obscurity.   •  Impedance  mismatch  has  been  made  much  easier  to   deal  with  Object-­‐Rela-onal  Mapping    (ORM)   frameworks  such  as  Eclipse-­‐Link,  Hibernate  and   other  JPA  implementa-ons.   49  
  • 51. Impedance  Mismatch   50   •  ORMs  remove  a  lot  of  work,  but  can  become  a   problem  when  people  try  to  ignore:   –  the  database,  and   –  query  performance  suffer   •  This  is  where  NoSQL  database  works  greatly,  why?  
  • 52. Applica-on  and  Integra-on  DBs   •  This  is  a  event  that  happen  several  -mes  in  SW   projects.   •  In  this  scenario,  the  database  acts  as  an  integra-on   database.   •  The  downsides  to  share  database  are.   51  
  • 53. Applica-on  and  Integra-on  DBs   •  The  downsides  to  share  database  are:   –  Its  structure  tend  to  be  more  complex  than  any  single   applica-on  needs,   –  If  an  applica-on  want  to  make  changes  to  its  data  storage,   it  needs  to  coordinate  with  all  the  other  applica-ons,   –  Performance  degrada-on  due  to  huge  number  of  access   –  Errors  in  database  usage  since  it  is  accessed  by  applica-on   wri[en  by  different  teams.   •  This  is  different  from  single  applica-on  databases   52  
  • 54. Applica-on  and  Integra-on  DBs   •  Interoperability  concerns  can  now  shit  to  interfaces   of  the  applica-on  allowing  interac-on  over  HTTTP.   •  This  is  what  happen  with  micro-­‐services  ( h[p://www.-kalk.com/java/micro-­‐services/)   •  Micro-­‐services  and  Web  services  in  general  enable  a   form  of  communica-ons  based  on  data.   •  Data  is  represented  as  documents  using  before  XML   and  now  JSON  format.   53  
  • 55. Applica-on  and  Integra-on  DBs   •  If  you  are  going  to  use  service  integra-on  using  text   over  HTTP  is  the  way  to  go.   •  However,  if  we  are  dealing  with  performance  there   are  binary  protocols.   54  
  • 56. A[ack  of  the  Clusters   •  In  2000s  several  large  web  proper-es  drama-cally   increase  in  scale:   –  Websites  started  tracking  ac-vity  and  structure  in  detail   (analy-cs)   –  Large  sets  of  data  appeared:  links,  social  networks,  ac-vity   in  logs,  mapping  data.   –  With  this  growth  in  data  came  a  growth  in  users   •  Coping  with  this  increase  in  data  and  traffic  required   more  compu-ng  resources.   55  
  • 57. A[ack  of  the  Clusters   •  There  were  to  choices:   –  Scaling  up   –  Scaling  out   •  Scaling  up  implies  bigger  machines,  more  processors,   disk  storage  and  memory  ($$$).   •  Scaling  out  was  the  alterna-ve:   –  Use  a  lot  of  small  machine  in  a  cluster.   –  It  is  cheap  and  also  more  resilient  (individual  failures)   56  
  • 58. A[ack  of  the  Clusters   •  This  revealed  a  new  problem,  rela-onal  databases   are  note  designed  to  run  on  clusters.   •  Clustered  rela-onal  databases  (e.g.  Oracle  RAC  or   Microsob  SQL  Server)  work  on  the  concept  of  shared   disk  subsystem.   •  RDBMs  can  also  run  on  separate  server  with   different  sets  of  data  (sharding)   57  
  • 59. A[ack  of  the  Clusters   •  However,  it  needs  an  applica-on  to  control  the   sharded-­‐database.   •  Also  we  lose  querying,  referen-al  integrity,   transac-ons  or  consistency  control  cross  shard.   •  These  technical  issues  are  exacerbated  by  licensing   cots.   •  This  mismatch  between  DBs  and  clusters  led  some   organiza-ons  to  consider  different  solu-ons   58  
  • 60. The  emergence  of  NoSQL   •  Two  companies  in  par-cular  –  Google  and  Amazon  –   have  been  very  influen-al.   •  They  were  capturing  a  large  amount  of  data  and   their  business  is  on  data  management   •  Both  companies  produces  influen-al  papers:   –  BigTable  from  Google   –  Dynamo  DB  from  Amazon   59  
  • 61. The  emergence  of  NoSQL   •  As  part  of  innova-on  in  data  management  system,  several   new  technologies  where  built:   –  2003  -­‐  Google  File  System,   –  2004  -­‐  MapReduce,   –  2006  -­‐  BigTable,   –  2007  -­‐  Amazon  DynamoDB   –  2012  Google  Cloud  Engine   •  Each  solved  different  use  cases  and  had  a  different  set  of   assump-ons.   •  All  these  mark  the  beginning  of  a  different  way  of  thinking   about  data  management.   60  
  • 62. The  emergence  of  NoSQL   •  It  is  irony  that  the  term  “NoSQL”  appeared  in  late   90s  from  a  rela-onal  database  made  by  Carlo  Strozzi.   •  The  name  comes  from  the  fact  that  it  does  not  used   SQL  as  query  language.   •  However,  the  usage  of  “NoSQL”  as  we  consider   today  come  from  a  meetup  on  2009  in  San  Francisco.   •  They  want  a  term  that  can  be  used  as  Twi[er   hashtag.  #NoSQL   61  
  • 63. NoSQL  Characteris-cs   1.  They  don’t  use  SQL.  (HBase,  Cassandra,  Redis…)   2.  They  are  generally  open-­‐source  projects.   3.  Most  of  them  are  designed  to  run  on  clusters.     4.  RDBMs  used  ACID  transac-ons  to  handle   consistency  across  the  whole  database.  NoSQL   resort  to  other  op-ons  (CAP  theorem).   5.  No  all  are  cluster  oriented  (Graph  DBs)   6.  NoSQL  operate  without  a  schema.  (schema  free)   62  
  • 64. The  emergence  of  NoSQL   •  NoSQL  does  not  stands  for  Not-­‐Only  SQL.   •  It  is  be[er  to  NoSQL  as  a  movement  rather  than  a   techonology.   •  RDBMs  are  not  going  away.   •  The  change  is  that  rela-onal  databases  are  an  op-on   •  This  point  of  view  is  oben  referred  to  as  polyglot   persistence   63  
  • 65. The  emergence  of  NoSQL   •  Instead  of  just  picking  a  rela-onal  database,  we  need   to  understand:   1.  The  nature  of  the  data  we  are  storing,  and   2.  How  we  want  to  manipulate  it.   •  In  order  to  deal  with  this  change  most  organiza-ons   need  to  shib  from  integra-on  database  to   applica-on  database.   •  In  this  course  we  concentrate  on  Big  Data  running   on  clusters.   64  
  • 66. The  emergence  of  NoSQL   •  The  Big  Data  concerns  have  created  an  opportunity   for  people  to  think  freshly  about  their  storage  needs.   •  NoSQL  help  developer  produc-vity  by  simplifying   their  database  access  even  if  they  have  no  need  to   scale  beyond  single  machine.   65  
  • 67. 66