From Oracle to MongoDB
A real use case at Telefónica PDI




Pablo Enfedaque
pev@tid.es
06.10.2012
Content
      01 Introduction
      • Telefónica PDI. Who?
      • Personalisation Server. Why? What?

      02 The SQL version
      • Data model and architecture
      • Integrations, problems and improvements

      03 The NoSQL version
      • Data model and architecture
      • Performance boost
      • The bad

      04 Conclusions
      • Conclusions
      • Personal thoughts
01
Introduction
01
      Telefónica PDI. Who?


      •  Telefónica
             §  Fifth largest telecommunications company in the world
             §  Operations in Europe (7 countries), the United States and Latin America
                (15 countries)


      •  Telefónica Digital
             §  Web and mobile digital contents and services division


      •  Product Development and Innovation unit
             §  Formerly Telefónica R&D
             §  Product & service development, platforms development, research,
                technology strategy, user experience and deployment & operation
             §  Around 70 different ongoing projects at any time.




Telefónica PDI
                                 4
01
      Personalisation Server. What?


      •  User profiling system

      •  Machine learning

      •  Recommendations

      •  Customer’s profile storage


01
      Opt-in and profile module. Why?


      •  User data (profile and permissions) was scattered across different
           storages

             §  IPTV service
                  •  Gender
                  •  Film and music preferences
             §  Mobile service
                  •  Permission to contact by SMS?
                  •  Gender
             §  Music tickets service
                  •  Address
                  •  Music preferences
             §  Location based offers
                  •  Address
                  •  Permission to contact by SMS?

      Customer: "So you want to know my address… AGAIN?!"



01
      Opt-in and profile module. Why?


      •  Provide a module to become the master storage for customer data

             §  One shared profile: Gender, Film and music preferences,
                Permission to contact by SMS?, Address
             §  Used by: IPTV service, Mobile service, Music tickets service,
                Location based offers


01
      Opt-in and profile module. What?


      •  Features:
             §  Flexible profile definition, classified in services

             §  Profile sharing options between different services

             §  Real time API

             §  Supplementary offline batch interface

             §  Authorization system

             §  High availability

             §  Inexpensive solution & hardware




02
The SQL solution
02
      Data model
      Services, users and their profile


      •  Services defined a set of attributes (their profile), with default
         value and data type
      •  Users were registered in services
      •  Users defined values for some of the services attributes
      •  Each attribute value had an update date to avoid overwriting newer
           changes through batch loads
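
A minimal sketch of the update-date rule above, assuming each stored value carries its own update date. The function name and the `attrib_value` / `update_date` field names are illustrative choices (borrowed from the MongoDB model shown later), not the original Oracle schema.

```python
from datetime import datetime


def merge_value(stored, incoming):
    """Keep whichever entry has the newest update date.

    `stored` may be None when the user has no value for the attribute yet.
    """
    if stored is None or incoming["update_date"] > stored["update_date"]:
        return incoming
    return stored


# A value already in the DB, plus two batch rows for the same attribute.
stored = {"attrib_value": "yes", "update_date": datetime(2012, 5, 2)}
stale = {"attrib_value": "no", "update_date": datetime(2012, 4, 30)}
fresh = {"attrib_value": "no", "update_date": datetime(2012, 5, 3)}

kept_after_stale = merge_value(stored, stale)   # the older batch row is ignored
kept_after_fresh = merge_value(stored, fresh)   # the newer batch row wins
```

With this rule a late batch file cannot overwrite a change that arrived earlier through the real time API.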




02
      Data model
      Services profile sharing matrix


      •  Services could access attributes declared inside other services
      •  There were sharing rights for read or read and write
      •  The user had to be registered in both services
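
The three sharing rules can be sketched as a single check. This is only an illustration: the service names, the attribute name and the shape of the `sharing` dictionary are hypothetical, not the PS schema.

```python
READ, READ_WRITE = "R", "RW"


def can_access(registered, requester, owner, attrib, sharing, write=False):
    """Decide whether `requester` may read (or write) `attrib` owned by `owner`.

    registered: set of services the user is registered in
    sharing:    {(owner, attrib, requester): READ or READ_WRITE}
    """
    # The user has to be registered in both services.
    if requester not in registered or owner not in registered:
        return False
    # A service always has full access to its own attributes.
    if requester == owner:
        return True
    mode = sharing.get((owner, attrib, requester))
    if mode is None:
        return False
    return mode == READ_WRITE or not write


# Hypothetical matrix: the tickets service may read IPTV music preferences.
sharing = {("iptv", "music_prefs", "tickets"): READ}
```

For example, `can_access({"iptv", "tickets"}, "tickets", "iptv", "music_prefs", sharing)` allows the read, while the same call with `write=True` is rejected.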




02
      Data model
      Authorization system


      •  Everything that could be accessed in the PS was a resource
      •  Roles defined access rights (read or read and write) of resources
      •  Auth users had roles
      •  Roles could include other roles
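
Because roles could include other roles, working out what a role grants means walking the includes. A possible sketch, assuming a plain dictionary layout (`resources`, `includes`) and role names that are purely illustrative:

```python
def effective_rights(role, roles, seen=None):
    """Collect {resource: right} for a role, following included roles."""
    seen = set() if seen is None else seen
    if role in seen:  # guard against circular includes
        return {}
    seen.add(role)
    rights = dict(roles[role].get("resources", {}))
    for included in roles[role].get("includes", []):
        for resource, right in effective_rights(included, roles, seen).items():
            # Keep the strongest right: "RW" beats "R".
            if rights.get(resource) != "RW":
                rights[resource] = right
    return rights


# Hypothetical roles: ADMIN includes STATS.
roles = {
    "ADMIN": {"resources": {"users": "RW"}, "includes": ["STATS"]},
    "STATS": {"resources": {"stats": "R", "users": "R"}, "includes": []},
}
admin_rights = effective_rights("ADMIN", roles)
```

In a relational model each level of this walk tends to become another join, which is consistent with the join counts reported later for the authorization queries.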




02
      Data model
      Bonus features!


      •  Multiple IDs:
             §  A user's profile could be accessed with different equivalent IDs
                depending on the service
             §  Each user ID was defined by an ID type (phone number, email, portal ID,
                hash…) and the ID value




02
      High level logical architecture




             §  Everything running on Red Hat EL 5.4 64 bits


02
      Integration
      Planned integration


   •  PS replaces all customers profile and
        permissions DBs

   •  All systems access this data through
        PS real time API

   •  In special cases, some PS-consumers
        could use the batch interface.

   •  The same way new services could be
        added quite easily




02
       Integration
       Problems arise


   •  Budget restrictions: adapting all services
        to use the API was too expensive

   •  Keep the independent systems' DBs and
        synchronize PS through batch

   •  Use the DBs' built-in massive extraction
        feature to generate daily batch files

   •  However… in most cases those DBs
        were not able to generate Delta
        (only changes) extractions
         §  Provide full daily snapshots!
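
To make the Delta versus full snapshot distinction concrete, here is a small sketch of what a Delta contains when derived from two consecutive snapshots. It is not part of the PS batch process; the dictionaries, IDs and function name are made up for illustration.

```python
def delta(previous, current):
    """Return only the rows of `current` that are new or changed.

    Both arguments map a user ID to that user's profile row.
    Deleted users are not reported by this sketch.
    """
    return {key: row for key, row in current.items() if previous.get(key) != row}


# Two hypothetical daily snapshots from a source system.
yesterday = {"011234": {"gender": "F"}, "015678": {"gender": "M"}}
today = {
    "011234": {"gender": "F"},   # unchanged
    "015678": {"gender": "F"},   # changed
    "019999": {"gender": "M"},   # new
}
changes = delta(yesterday, today)
```

A source that can only produce full snapshots forces the receiver to process every row every day, even though only the two rows in `changes` actually matter.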




02
      First version performance
      Ireland


      •  1.8M customers, 180 profile attributes, 6 services
      •  Sizes
             §  Tables + indexes size: 65Gb
             §  30% of the size were indexes


      •  Batch
             §  Full DWH customer’s profile import: > 24 hours
             §  Delta extractions: 4 - 6 hours
             §  Loads and extractions performance proportional to data size


      •  API:
             §  Response time with average traffic: 110ms



03
The SQL solution
Second version
03
      Second version
      High level logical architecture




      •  New approach: batch processes access the DB directly
03
      Second version
      Batch processes


      •  Batch processes had to
             §  Validate authentication and authorization

             §  Verify user, service and attribute existence

             §  Check equivalent IDs

             §  Validate sharing matrix rights

             §  Validate values data type

             §  Check the update date of the existing values




03
      Second version
      DB Batch processing




                             
      Our DBAs



03
      Second version
      New DB-based batch loading process


      •  Preprocess incoming batch file in BE servers
             §  Validate format, services and attributes existence and values data types
             §  Generate intermediate file with structure like target DB table


      •  Load intermediate file (Oracle’s SQL*Loader) into a temporary table
      •  Switch DB to “deferred writing”, storing all incoming modifications
      •  Merge temporary table and final table, checking each value's update date
      •  Replace the old user attribute values table with the merge result
      •  Apply deferred writing operations
03
      Second version
      New batch extraction process


      •  Generate a temporary DB table with a format similar to the final batch file.
           Two loops over the user attribute values table are required:
             §  Select the format of the table: number and order of columns / attributes
             §  Fill the new table

      •  Loop over the whole temporary table for final formatting (empty fields…)
      •  From the batch side, loop across the whole table (SELECT * FROM …)

      •  Write each retrieved row as a line in the resulting file
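
The extraction steps above amount to pivoting one row per attribute value into one line per user. A rough in-memory sketch, assuming string values, a `|` separator and made-up IDs (the real process ran in PL/SQL against Oracle tables):

```python
def pivot(values):
    """Turn (user_id, attrib_id, value) rows into one delimited line per user."""
    values = list(values)
    # First loop: decide the number and order of columns.
    columns = sorted({attrib_id for _, attrib_id, _ in values})
    # Second loop: fill one wide row per user, empty string for missing values.
    rows = {}
    for user_id, attrib_id, value in values:
        rows.setdefault(user_id, dict.fromkeys(columns, ""))[attrib_id] = value
    # Final formatting: one line per user, empty fields left blank.
    return [
        "|".join([user_id] + [row[column] for column in columns])
        for user_id, row in sorted(rows.items())
    ]


# Hypothetical attribute values for two users.
lines = pivot([
    ("u1", 10140, "Open"),
    ("u1", 10141, "yes"),
    ("u2", 10141, "no"),
])
```

Every pass touches the whole values table, which fits the observation that extraction time was proportional to data size.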



03
      Second version performance
      Ireland performance requirements


      •  Batch time window: 3:30 hours
             §  Full DWH load
             §  Two Delta loads
             §  Three Delta extractions



      •  API:
             §  Ireland requirement: < 500ms




03
      Second version performance
      Ireland


      •  1.8M customers, 180 profile attributes, 6 services
      •  Sizes
             §  Tables + indexes size: 65Gb
             §  30% of the size were indexes
             §  Temporary tables size increased almost exponentially: 15Gb and above
             §  Intermediate file size: from 700Mb to 7Gb
      •  Batch
             §  Full DWH customer’s profile import: 2:30 hours
             §  Delta extractions: 1:00 hour
             §  Loads performance worsened quickly (almost exponentially): 6:00 hours
             §  Extractions performance proportional to data size
             §  Concurrent batch processes may halt the DB
      •  API:
             §  Response time with average traffic: 80ms
             §  Response time while loading was unpredictable: >300ms

04
The SQL solution
Third version
      Third version
      Speed up DB Batch processes




                    
      Our DBAs (again)

04
      Third version
      New (second) DB-based batch loading process


      •  Minor preprocessing of incoming batch file in BE servers
             §  Just validate format
             §  No intermediate file needed!


      •  Load validated file (Oracle’s SQL*Loader) into a temporary table

      •  Loop over the temporary table merging the values into the final table,
           checking each value's update date and data type
             §  Use several concurrent writing jobs


      •  Store results on real table, no need to replace!
      •  No “deferred writing”!

04
      Third version
      Enhancements to extraction process


      •  Optimized loops to generate the temporary output table.
             §  Use several concurrent writing jobs
             §  We achieved a speed-up of between 1.5x and 2x


      •  Loop over the whole temporary table for final formatting (empty fields…)
      
      •  Download and write lines directly inside Oracle’s sqlplus
      •  No SELECT * FROM … query from Batch side!



04
      Third version performance
      Ireland


      •  1.8M customers, 180 profile attributes, 6 services
      •  Sizes
             §  Tables + indexes size: 65Gb
             §  30% of the size were indexes
             §  Temporary tables: 15Gb


      •  Batch
             §  Full DWH customer’s profile import: 1:10 hours (vs. 2:30 hours)
             §  Three Delta extractions: 2:15 hours (vs. 3:00 hours)
             §  Loads and extractions performance proportional to data size
             §  Concurrent batch processes not so harmful

      •  API:
             §  Response time with average traffic: 110ms
             §  Response time while loading: 400ms

      Our DBAs: F**K YEAH

04
      Third version performance
      United Kingdom


      •  25M customers, 150 profile attributes, 15 services
      •  Sizes
             §  Tables + indexes size: 700Gb
             §  40% of the size were indexes


      •  Batch
             §  Two Delta imports: < 2:00 hours
             §  Two Delta extractions: < 2:00 hours
             §  Loads and extractions performance proportional to data size


      •  API:
             §  Response time with average traffic: 90ms

      Our DBAs: F**K YEAH

04
      Third version performance


      Ireland                  3rd version            2nd version
        DB size                65Gb + 15Gb (temp)     65Gb + > 15Gb
        Full DWH load          1:10 hours             2:30 hours
        Three Delta exports    2:15 hours             3:00 hours
        Batch stability        Stable, linear         Unstable, exponential
        API response time      110ms                  110ms
        API while loading      400ms                  Unpredictable

      United Kingdom           3rd version
        DB size                700Gb
        Two Delta loads        < 2:00 hours
        Three Delta exports    < 2:00 hours
        API response time      90ms

      Our DBAs: F**K YEAH


04
      Third version performance
      DB stats


      •  20 database tables
      •  API: several queries with up to 35 joins and even some unions
      •  Authorization: 5 joins to validate auth users access
      •  Batch:
             §  Load: 1700 lines of PL/SQL
             §  Extraction: 1200 lines of PL/SQL




04
      Mission completed?




04
      Third version performance
      Mexico


      •  20M customers, 200 profile attributes, 10 services
      •  Mexico time window: 4:00 hours
             §  Full DWH load!
             §  Additional Delta feeds loads
             §  At least two Delta extractions



                                                       
      Our DBAs




05
The NoSQL solution
05
      MongoDB Data Model
      Services and their profile + sharing matrix
                  { _id : 7,
                     service_name : "root",
                     id_type : 1,
                     default_values: false,
                     owned_attribs :
                     [
                         {
                              attrib_id : 70005,
                              attrib_name : "marketing.consent",
                              attrib_data_type : 1,
                              attrib_def_value : "no",
                              attrib_status : 1
                         },
                         ...
                     ],
                     shared_attribs :
                     [
                        {attrib_id : 20144, sharing_mode : 0},
                         ...
                     ]
                  }

                  attrib_id = service_id * 10000 + num attribs + 1
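
The attrib_id rule can be checked against the sample document: service 7 with four existing attributes yields 70005. The sketch below encodes the formula from the slide; the reverse lookup `owning_service` is an inference of ours that only holds while a service has fewer than 9999 attributes.

```python
def next_attrib_id(service_id, num_attribs):
    """attrib_id = service_id * 10000 + num attribs + 1 (formula from the slide)."""
    return service_id * 10000 + num_attribs + 1


def owning_service(attrib_id):
    """Recover the owning service from an attrib_id.

    Assumption (not stated in the slides): fewer than 9999 attributes per service.
    """
    return attrib_id // 10000


new_id = next_attrib_id(7, 4)        # fifth attribute of service 7
shared_owner = owning_service(20144)  # the shared attribute in the sample document
```

Under that assumption an attribute ID alone identifies its owning service, so the shared attribute 20144 in the example would belong to service 2.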

05
      MongoDB Data Model
      Users and their profile + multiple IDs
             User document:
             {
                  _id : "011234",
                  services_list :
                  [
                      {
                           service_id : 1,
                           reg_date : {"$date" : 1318040693000}
                      },
                      ...
                  ],
                  user_values :
                  [
                      {
                           attrib_id : 10140,
                           attrib_value : "Open",
                           update_date : {"$date" : 1317110161000}
                      },
                      ...
                  ]
             }

             Equivalent ID document:
             {
                  _id : "05abcd",
                  ue : "011234"
             }

             _id = "id type" + "user ID"
             attrib_id = service_id * 10000 + num attribs + 1
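
A sketch of how a lookup through an equivalent ID might work, using a plain dictionary in place of the users collection. The `ue` field name is copied from the slide as it appears; splitting "011234" into ID type "01" and user ID "1234" is an assumption based on the sample values.

```python
def make_id(id_type, user_id):
    """_id = "id type" + "user ID" (rule from the slide)."""
    return id_type + user_id


def resolve(users, key):
    """Return the main user document for `key`, following an equivalent ID."""
    doc = users.get(key)
    if doc is not None and "ue" in doc:
        # Equivalent ID document: it only points at the main user document.
        doc = users.get(doc["ue"])
    return doc


# In-memory stand-in for the users collection, keyed by _id.
users = {
    "011234": {"_id": "011234", "user_values": []},
    "05abcd": {"_id": "05abcd", "ue": "011234"},
}
main_doc = resolve(users, make_id("05", "abcd"))
```

Resolving an equivalent ID costs at most one extra lookup by `_id`, with no join involved.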

05
      MongoDB Data Model
      Authorization system
        AUTH USERS COLLECTION:

        {
          _id: "admin",
          auth_pswd: "XXX",
          auth_roles: ['PS_ADMIN_ROLE', …],
          auth_uris: [
              {uri_path: "/**", method: 'R'},
              {uri_path: "/stats/**", method: 'RW'},
              {uri_path: "/kpis/**", method: 'IMPORT'},
              ...
          ]
        }

        ROLES COLLECTION:

        {
          _id: 'PS_ADMIN_ROLE',
          roles_resources: [
              { resource_id: "admin.**", method: 'R' },
              { resource_id: "stats.**", method: 'IMPORT' },
              ...
          ]
        }

        RESOURCES COLLECTION:

        {
          _id: "admin.**",
          role_uri: "/**"
        }

        Auth users replicate uris (from resources) and methods (from roles)
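
The replication noted above means an access check only has to read the auth user document. The sketch below shows one way this could work. Two points are assumptions of ours, not confirmed by the slides: that `**` behaves like a shell-style wildcard (matched here with Python's `fnmatch`), and that 'RW' grants both reading and writing.

```python
from fnmatch import fnmatchcase

# Assumed meaning of each method value; the slides do not define them.
GRANTS = {"R": {"R"}, "RW": {"R", "W"}, "IMPORT": {"IMPORT"}}


def denormalize(role_ids, roles, resources):
    """Copy uris (from resources) and methods (from roles) into auth_uris."""
    return [
        {"uri_path": resources[grant["resource_id"]]["role_uri"],
         "method": grant["method"]}
        for role_id in role_ids
        for grant in roles[role_id]["roles_resources"]
    ]


def is_allowed(auth_user, path, operation):
    """Check a request using only the auth user document."""
    return any(
        operation in GRANTS.get(entry["method"], set())
        and fnmatchcase(path, entry["uri_path"])
        for entry in auth_user["auth_uris"]
    )


roles = {"PS_ADMIN_ROLE": {"roles_resources": [{"resource_id": "admin.**", "method": "R"}]}}
resources = {"admin.**": {"role_uri": "/**"}}
admin = {
    "_id": "admin",
    "auth_uris": denormalize(["PS_ADMIN_ROLE"], roles, resources)
    + [{"uri_path": "/stats/**", "method": "RW"},
       {"uri_path": "/kpis/**", "method": "IMPORT"}],
}
```

The price of the single-collection check is that `auth_uris` has to be rebuilt whenever a role or resource changes.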


05
      MongoDB Data Model
      DB stats


      •  Only 5 collections
      •  API: typically 2 accesses (services and users collections)
      •  Authorization: access only 1 collection to grant access
      •  Batch: all processing done outside DB




05
      NoSQL version
      High level logical architecture




             §  Everything running on Red Hat EL 6.2 64 bits


05
      NoSQL version performance
      Ireland (at PDI lab)


      •  1.8M customers, 180 profile attributes, 6 services
      •  Sizes
             §  Collections + indexes size: 20Gb (vs. 65Gb)
             §  < 5% of the size are indexes (vs. 30%)


      •  Batch
             §    Full DWH customer’s profile import: 0:12 hours (vs. 1:10 hours)
             §    Three Delta extractions: 0:40 hours (vs. 2:15 hours)
             §    Loads and extractions performance proportional to data size
             §    Concurrent batch processes have no performance impact


      •  API:
             §  Response time with average traffic: < 10ms (vs. 110ms)
             §  Response time while loading: the same
             §  High load (600 TPS) response time while loading: 300ms

05
      NoSQL version performance
      United Kingdom (at PDI lab)


      •  25M customers, 150 profile attributes, 15 services
      •  Sizes
             §  Collections + indexes size: 210Gb (vs. 700Gb)
             §  < 5% of the size were indexes


      •  Batch
             §  Two Delta imports: < 0:40 hours (vs. 2:00 hours)
             §  Loads and extractions performance proportional to data size




05
      NoSQL version performance
      Mexico


      •  20M customers, 200 profile attributes, 15 services
      •  Sizes
             §  Collections + indexes size: 320Gb
             §  Indexes size: 1.2Gb


      •  Batch
             §  Initial Full import (20M, 40 attributes): 2:00 hours
             §  Small Full import (20M, 6 attributes): 0:40 hours


      •  API:
             §  Response time with average traffic: < 10ms (vs. 90ms)
             §  Response time while loading: the same
             §  High load (500 TPS) response time while loading: 270ms




05
      NoSQL version performance

      Ireland                       NoSQL version    SQL version
        DB size                     20Gb             80Gb
        Full DWH load               0:12 hours       1:10 hours
        Three Delta exports         0:40 hours       2:15 hours
        API while loading           < 10ms           400ms
        API 600TPS + loading        300ms            Timeout / failure

      United Kingdom                NoSQL version    SQL version
        DB size                     210Gb            700Gb
        Two Delta loads             < 0:40 hours     < 2:00 hours

      Mexico                        NoSQL version
        DB size                     320Gb
        Initial Full load (40 attr) 2:00 hours
        Daily Full load (6 attr)    0:40 hours
        API while loading           < 10ms
        API 500TPS + loading        270ms

      Our DBAs
05
      Mission completed?




05
      The bad


      •  Batch load process was too fast
             §  To keep secondary nodes synched we needed an oplog of 16 or 24Gb
             §  We had to disable journaling for the first migrations

      •  Labels of document fields take up disk space
             §  Reduced them to just 2 chars: “attribute_id” -> “ai”

      •  Respect the unwritten law of keeping at least 70% of the data size in RAM
      •  Take care with compound indexes, order matters
             §  You can save one index… or you can have problems
             §  Put the most important key (never nullable) first

      •  DBAs whining and complaining about NoSQL
             §  “If we had enough RAM for all data, Oracle would outperform MongoDB”
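
On the compound index point above: a MongoDB compound index can serve queries on any prefix of its fields, which is why key order decides whether one index is enough. A toy model of that prefix rule follows; the field names are hypothetical and real query planning involves more than this.

```python
def usable_prefix(index_fields, query_fields):
    """Return the leading index fields that a query on `query_fields` can use."""
    prefix = []
    for field in index_fields:
        if field not in query_fields:
            break
        prefix.append(field)
    return prefix


# Hypothetical compound index on (ai, ud).
index = ["ai", "ud"]
by_ai = usable_prefix(index, {"ai"})            # prefix usable: no separate index on ai needed
by_ud = usable_prefix(index, {"ud"})            # no usable prefix: this query needs its own index
by_both = usable_prefix(index, {"ud", "ai"})    # whole index usable
```

This is the "save one index… or have problems" trade-off: with the most queried key first, the compound index also covers queries on that key alone.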


05
      The ugly


      •  Second migration once the PS is already running
             §  Full import adding 30 new attributes values: 10:00 hours
             §  Full import adding 150 new attributes values: 40:00 hours


      •  Considerably increasing document size (i.e. adding lots of new values
           to the users) makes MongoDB relocate the documents, which runs
           around 5 times slower
             §  That’s a problem when you are updating 10k documents per second


      •  Solutions?
             §  Avoid this situation at all cost. Run away!
             §  Normalize users values; move to a new individual collection
             §  Preallocate the size with a faux field
                  •  You could waste space!
             §  Load in new collection, merge and swap, like we did in Oracle

06
Conclusions
06
      Conclusions & personal thoughts


      •  Awesome performance boost
             §  But not all use cases fit in a MongoDB / NoSQL solution!

      •  New technology, different limitations
      •  Fear of the unknown
             §  SSDs performance?
             §  Long term performance and stability?

      •  Python + MongoDB + pymongo = fast development
             §  I mean, really fast

      •  MongoDB Monitoring Service (MMS)
      •  10gen people were very helpful
06
      Questions?




0X
      SQL Physical architecture




             §  Scale horizontally adding more BE or DB servers or disks in the SAN
             §  Virtualized or physical servers depending on the deployment


0X
      MongoDB Physical architecture




             §  MongoDB arbiters running on BE servers
             §  Scale horizontally adding more BE servers or disks in the SAN
             §  Sharding may already be configured to scale adding more replica sets


 
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
 
MongoDB and Our Journey from Old, Slow and Monolithic to Fast and Agile Micro...
MongoDB and Our Journey from Old, Slow and Monolithic to Fast and Agile Micro...MongoDB and Our Journey from Old, Slow and Monolithic to Fast and Agile Micro...
MongoDB and Our Journey from Old, Slow and Monolithic to Fast and Agile Micro...
 
NoSQL Now: Postgres - The NoSQL Cake You Can Eat
NoSQL Now: Postgres - The NoSQL Cake You Can EatNoSQL Now: Postgres - The NoSQL Cake You Can Eat
NoSQL Now: Postgres - The NoSQL Cake You Can Eat
 
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarBuild and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
 
What's New in SharePoint 2016 for End Users Webinar with Intlock
What's New in SharePoint 2016 for End Users Webinar with IntlockWhat's New in SharePoint 2016 for End Users Webinar with Intlock
What's New in SharePoint 2016 for End Users Webinar with Intlock
 
31st TWNC IP OPM and TWNOG: RDAP and RPKI
31st TWNC IP OPM and TWNOG: RDAP and RPKI31st TWNC IP OPM and TWNOG: RDAP and RPKI
31st TWNC IP OPM and TWNOG: RDAP and RPKI
 
SharePoint 2016 Beta 2 What's new (End users and IT Pros) Microsoft Innovat...
SharePoint 2016   Beta 2 What's new (End users and IT Pros) Microsoft Innovat...SharePoint 2016   Beta 2 What's new (End users and IT Pros) Microsoft Innovat...
SharePoint 2016 Beta 2 What's new (End users and IT Pros) Microsoft Innovat...
 
API Economy, Realizing the Business Value of APIs
API Economy, Realizing the Business Value of APIsAPI Economy, Realizing the Business Value of APIs
API Economy, Realizing the Business Value of APIs
 

From Oracle to MongoDB

  • 1. From Oracle to MongoDB A real use case at Telefónica PDI Pablo Enfedaque pev@tid.es 06.10.2012
  • 2. Content
      01 Introduction
          - Telefónica PDI. Who?
          - Personalisation Server. Why? What?
      02 The SQL version
          - Data model and architecture
          - Integrations, problems and improvements
      03 The NoSQL version
          - Data model and architecture
          - Performance boost
          - The bad
      04 Conclusions
          - Conclusions
          - Personal thoughts
  • 4. 01 Telefónica PDI. Who?
      • Telefónica
          - Fifth largest telecommunications company in the world
          - Operations in Europe (7 countries), the United States and Latin America (15 countries)
      • Telefónica Digital
          - Web and mobile digital contents and services division
      • Product Development and Innovation unit
          - Formerly Telefónica R&D
          - Product & service development, platforms development, research, technology strategy, user experience and deployment & operation
          - Around 70 different ongoing projects at any time
      Telefónica PDI 4
  • 5. 01 Personalisation Server. What?
      • User profiling system
      • Machine learning
      • Recommendations
      • Customer profile storage
  • 6. 01 Opt-in and profile module. Why?
      • User data (profile and permissions) was scattered across different storages:
          - IPTV service: gender, film and music preferences
          - Mobile service: permission to contact by SMS?, gender
          - Music tickets service: address, music preferences
          - Location based offers: address, permission to contact by SMS?
      • The customer: "So you want to know my address… AGAIN?!"
  • 7. 01 Opt-in and profile module. Why?
      • User data (profile and permissions) was scattered across different storages:
          - IPTV service: gender, film and music preferences
          - Mobile service: permission to contact by SMS?, gender
          - Music tickets service: address, music preferences
          - Location based offers: address, permission to contact by SMS?
  • 8. 01 Opt-in and profile module. Why?
      • Provide a module that becomes the master storage of customer data
          - Data: gender, film and music preferences, permission to contact by SMS?, address
          - Services: IPTV service, Mobile service, Music tickets service, Location based offers
  • 9. 01 Opt-in and profile module. What?
      • Features:
          - Flexible profile definition, classified in services
          - Profile sharing options between different services
          - Real time API
          - Supplementary offline batch interface
          - Authorization system
          - High availability
          - Inexpensive solution & hardware
  • 10. 02 The SQL solution
  • 11. 02 Data model: Services, users and their profile
      • Services defined a set of attributes (their profile), each with a default value and a data type
      • Users were registered in services
      • Users defined values for some of the services' attributes
      • Each attribute value had an update date to avoid overwriting newer changes through batch loads
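The update-date rule on this slide can be sketched in a few lines. This is only an illustration: the function name, the in-memory dictionary and the tie-breaking choice (an equal date does not overwrite) are assumptions, not details taken from the talk.

```python
from datetime import datetime


def apply_batch_value(stored, attrib_id, new_value, new_date):
    """Write new_value only when it is strictly newer than the stored one.

    `stored` maps attrib_id -> (value, update_date). Returns True if written.
    """
    current = stored.get(attrib_id)
    if current is not None and current[1] >= new_date:
        return False  # stored value is as recent or more recent: keep it
    stored[attrib_id] = (new_value, new_date)
    return True


# Hypothetical profile with one value last updated on 2011-09-27
profile = {10140: ("Open", datetime(2011, 9, 27))}
stale_applied = apply_batch_value(profile, 10140, "Closed", datetime(2011, 9, 1))
fresh_applied = apply_batch_value(profile, 10140, "Closed", datetime(2011, 10, 8))
```

A batch line older than the stored value is skipped, so a late batch file cannot undo a change made through the real time API.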
  • 12. 02 Data model: Services profile sharing matrix
      • Services could access attributes declared inside other services
      • There were sharing rights for read or read and write
      • The user had to be registered in both services
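A minimal sketch of how the three sharing rules might be checked together. The matrix layout (a dictionary keyed by owner service, attribute and requesting service) and the numeric mode constants are hypothetical; the slides do not describe the actual tables.

```python
READ, READ_WRITE = 0, 1  # hypothetical encoding of the sharing modes


def can_access(requester, owner, attrib, user_services, sharing, write=False):
    """Return True if `requester` may read (or write) `attrib` owned by `owner`."""
    # The user has to be registered in both services
    if requester not in user_services or owner not in user_services:
        return False
    if requester == owner:
        return True  # a service always reaches its own attributes
    mode = sharing.get((owner, attrib, requester))
    if mode is None:
        return False  # attribute is not shared with this service
    return mode == READ_WRITE or not write


sharing = {
    ("mobile", "gender", "iptv"): READ,
    ("mobile", "sms_consent", "iptv"): READ_WRITE,
}
registered = {"mobile", "iptv"}
```

With this data the IPTV service can read `gender` but not write it, can write `sms_consent`, and loses all access for a user who is not registered in the mobile service.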
  • 13. 02 Data model: Authorization system
      • Everything that could be accessed in the PS was a resource
      • Roles defined access rights (read or read and write) of resources
      • Auth users had roles
      • Roles could include other roles
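Because roles could include other roles, working out what a role grants means walking the inclusion graph. The sketch below assumes a simple dictionary layout with hypothetical `includes` and `resources` keys and hypothetical role names; it only illustrates the idea.

```python
def stronger(a, b):
    """Return the wider of two access rights ('R' < 'RW'); None means no right."""
    return "RW" if "RW" in (a, b) else (a or b)


def effective_rights(role, roles, seen=None):
    """Collect resource -> right for a role and every role it includes."""
    seen = set() if seen is None else seen
    if role in seen:  # already expanded; this also protects against cycles
        return {}
    seen.add(role)
    rights = {}
    for included in roles[role].get("includes", []):
        for resource, method in effective_rights(included, roles, seen).items():
            rights[resource] = stronger(rights.get(resource), method)
    for resource, method in roles[role].get("resources", {}).items():
        rights[resource] = stronger(rights.get(resource), method)
    return rights


roles = {
    "STATS_READER": {"resources": {"stats": "R"}},
    "OPERATOR": {"includes": ["STATS_READER"], "resources": {"users": "R"}},
    "ADMIN": {"includes": ["OPERATOR"], "resources": {"stats": "RW"}},
}
admin_rights = effective_rights("ADMIN", roles)
```

In a relational schema each level of this walk tends to become another join, which is consistent with the join counts reported later in the deck.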
  • 14. 02 Data model: Bonus features!
      • Multiple IDs:
          - A user's profile could be accessed with different equivalent IDs depending on the service
          - Each user ID was defined by an ID type (phone number, email, portal ID, hash…) and the ID value
  • 15. 02 High level logical architecture
      - Everything running on Red Hat EL 5.4 64 bits
  • 16. 02 High level logical architecture
      - Everything running on Red Hat EL 5.4 64 bits
  • 17. 02 Integration: Planned integration
      • PS replaces all customer profile and permissions DBs
      • All systems access this data through the PS real time API
      • In special cases, some PS-consumers could use the batch interface
      • In the same way, new services could be added quite easily
  • 18. 02 Integration: Problems arise
      • Budget restrictions: adapting all services to use the API was too expensive
      • Keep the independent systems' DBs and synchronize PS through batch
      • Use the DBs' built-in massive extraction feature to generate daily batch files
      • However… in most cases those DBs were not able to generate Delta (only changes) extractions
          - Provide full daily snapshots!
  • 19. 02 First version performance: Ireland
      • 1.8M customers, 180 profile attributes, 6 services
      • Sizes
          - Tables + indexes size: 65GB
          - 30% of the size were indexes
      • Batch
          - Full DWH customer profile import: > 24 hours
          - Delta extractions: 4 - 6 hours
          - Load and extraction performance proportional to data size
      • API:
          - Response time with average traffic: 110ms
  • 20. 03 The SQL solution: Second version
  • 21. 03 Second version: High level logical architecture
      • New approach: batch processes access the DB directly
  • 22. 03 Second version: Batch processes
      • Batch processes had to
          - Validate authentication and authorization
          - Verify user, service and attribute existence
          - Check equivalent IDs
          - Validate sharing matrix rights
          - Validate values data type
          - Check the update date of the existing values
  • 23. 03 Second version: DB batch processing
      • [Image caption: Our DBAs]
  • 24. 03 Second version: New DB-based batch loading process
      • Preprocess the incoming batch file in BE servers
          - Validate format, services and attributes existence and values data types
          - Generate an intermediate file with a structure like the target DB table
      • Load the intermediate file (Oracle's SQL*Loader) into a temporary table
      • Switch the DB to "deferred writing", storing all incoming modifications
      • Merge the temporary table and the final table, checking the values' update date
      • Replace the old users attributes values table with the merge result
      • Apply the deferred writing operations
  • 25. 03 Second version: New batch extraction process
      • Generate a temporary DB table with a format similar to the final batch file. Two loops over the users attributes values table are required:
          - Select the format of the table: number and order of columns / attributes
          - Fill the new table
      • Loop over the whole temporary table for final formatting (empty fields…)
      • From the batch side, loop across the whole table (SELECT * FROM …)
      • Write each retrieved row as a line in the resulting file
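The two loops of the extraction process amount to pivoting a narrow (user, attribute, value) table into one wide row per user. The sketch below does this in memory; the row layout, the `|` separator and the function names are assumptions for illustration, since the real process ran as PL/SQL inside Oracle.

```python
def pivot_values(rows):
    """rows: iterable of (user_id, attrib_id, value). Returns (columns, table)."""
    rows = list(rows)
    # First loop: decide the number and order of columns / attributes
    columns = sorted({attrib_id for _, attrib_id, _ in rows})
    position = {attrib_id: i for i, attrib_id in enumerate(columns)}
    # Second loop: fill one wide row per user, empty string for missing values
    table = {}
    for user_id, attrib_id, value in rows:
        table.setdefault(user_id, [""] * len(columns))[position[attrib_id]] = value
    return columns, table


def to_lines(table, separator="|"):
    """Final formatting: one output line per user."""
    return [separator.join([user_id] + values) for user_id, values in table.items()]


rows = [("u1", 10140, "Open"), ("u2", 10001, "F"), ("u1", 10001, "M")]
columns, table = pivot_values(rows)
lines = to_lines(table)
```

Both passes touch every stored value, which fits the slide's later observation that extraction time grew with data size.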
  • 26. 03 Second version performance: Ireland performance requirements
      • Batch time window: 3:30 hours
          - Full DWH load
          - Two Delta loads
          - Three Delta extractions
      • API:
          - Ireland requirement: < 500ms
  • 27. 03 Second version performance: Ireland
      • 1.8M customers, 180 profile attributes, 6 services
      • Sizes
          - Tables + indexes size: 65GB
          - 30% of the size were indexes
          - Temporary tables size increases almost exponentially: 15GB and above
          - Intermediate file size: from 700MB to 7GB
      • Batch
          - Full DWH customer profile import: 2:30 hours
          - Delta extractions: 1:00 hour
          - Loads performance worsened quickly (almost exponentially): 6:00 hours
          - Extractions performance proportional to data size
          - Concurrent batch processes may halt the DB
      • API:
          - Response time with average traffic: 80ms
          - Response time while loading was unpredictable: > 300ms
  • 28. 04 The SQL solution: Third version
  • 29. 04 Third version: Speed up DB batch processes
      • [Image caption: Our DBAs (again)]
  • 30. 04 Third version: New (second) DB-based batch loading process
      • Minor preprocessing of the incoming batch file in BE servers
          - Just validate format
          - No intermediate file needed!
      • Load the validated file (Oracle's SQL*Loader) into a temporary table
      • Loop over the temporary table merging the values into the final table, checking the values' update date and data types
          - Use several concurrent writing jobs
      • Store results in the real table, no need to replace!
      • No "deferred writing"!
  • 31. 04 Third version: Enhancements to extraction process
      • Optimized loops to generate the temporary output table
          - Use several concurrent writing jobs
          - We achieved a speed-up of between 1.5 and 2
      • Loop over the whole temporary table for final formatting (empty fields…)
      • Download and write lines directly inside Oracle's sqlplus
      • No SELECT * FROM … query from the batch side!
  • 32. 04 Third version performance: Ireland
      • 1.8M customers, 180 profile attributes, 6 services
      • Sizes
          - Tables + indexes size: 65GB
          - 30% of the size were indexes
          - Temporary tables: 15GB
      • Batch
          - Full DWH customer profile import: 1:10 hours (vs. 2:30 hours)
          - Three Delta extractions: 2:15 hours (vs. 3:00 hours)
          - Load and extraction performance proportional to data size
          - Concurrent batch processes not so harmful
      • API:
          - Response time with average traffic: 110ms
          - Response time while loading: 400ms
      • [Image caption: Our DBAs: F**K YEAH]
  • 33. 04 Third version performance: United Kingdom
      • 25M customers, 150 profile attributes, 15 services
      • Sizes
          - Tables + indexes size: 700GB
          - 40% of the size were indexes
      • Batch
          - Two Delta imports: < 2:00 hours
          - Two Delta extractions: < 2:00 hours
          - Load and extraction performance proportional to data size
      • API:
          - Response time with average traffic: 90ms
      • [Image caption: Our DBAs: F**K YEAH]
  • 34. 04 Third version performance
      Ireland                  3rd version           2nd version
      DB size                  65GB + 15GB (temp)    65GB + > 15GB
      Full DWH load            1:10 hours            2:30 hours
      Three Delta exports      2:15 hours            3:00 hours
      Batch stability          Stable, linear        Unstable, exponential
      API response time        110ms                 110ms
      API while loading        400ms                 Unpredictable

      United Kingdom           3rd version
      DB size                  700GB
      Two Delta loads          < 2:00 hours
      Three Delta exports      < 2:00 hours
      API response time        90ms

      [Image caption: Our DBAs: F**K YEAH]
  • 35. 04 Third version performance: DB stats
      • 20 database tables
      • API: several queries with up to 35 joins and even some unions
      • Authorization: 5 joins to validate auth users access
      • Batch:
          - Load: 1700 lines of PL/SQL
          - Extraction: 1200 lines of PL/SQL
  • 36. 04 Mission completed?
  • 37. 04 Third version performance: Mexico
      • 20M customers, 200 profile attributes, 10 services
      • Mexico time window: 4:00 hours
          - Full DWH load!
          - Additional Delta feeds loads
          - At least two Delta extractions
      • [Image caption: Our DBAs]
  • 38. 05 The NoSQL solution
  • 39. 05 MongoDB Data Model: Services and their profile + sharing matrix
      {
        _id : 7,
        service_name : "root",
        id_type : 1,
        default_values : false,
        owned_attribs : [
          {
            attrib_id : 70005,
            attrib_name : "marketing.consent",
            attrib_data_type : 1,
            attrib_def_value : "no",
            attrib_status : 1
          },
          ...
        ],
        shared_attribs : [
          { attrib_id : 20144, sharing_mode : 0 },
          ...
        ]
      }
      Note on both attrib_id fields: attrib_id = service_id * 10000 + num attribs + 1
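The `attrib_id` rule on this slide is plain arithmetic, so it can be checked against the slide's own example (service 7, attribute 70005). The function names below are hypothetical, and splitting an id back into its parts assumes that a service never owns more than 9999 attributes, which the slides do not state.

```python
ATTRIBS_PER_SERVICE = 10000  # multiplier taken from the slide's formula


def next_attrib_id(service_id, num_attribs):
    """Id for a new attribute of a service that already owns num_attribs."""
    return service_id * ATTRIBS_PER_SERVICE + num_attribs + 1


def split_attrib_id(attrib_id):
    """Recover (service_id, position within the service) from an attrib_id.

    Only valid under the assumption of fewer than 10000 attributes per service.
    """
    return divmod(attrib_id, ATTRIBS_PER_SERVICE)


fifth_root_attrib = next_attrib_id(7, 4)   # service 7 already owns 4 attributes
owner, position = split_attrib_id(20144)   # the shared attribute on the slide
```

Encoding the owning service in the id means a shared attribute such as 20144 can be traced to its service without an extra lookup.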
  • 40. 05 MongoDB Data Model: Users and their profile + multiple IDs
      {
        _id : "011234",
        services_list : [
          {
            service_id : 1,
            reg_date : {"$date" : 1318040693000}
          },
          ...
        ],
        user_values : [
          {
            attrib_id : 10140,
            attrib_value : "Open",
            update_date : {"$date" : 1317110161000}
          },
          ...
        ]
      }
      Equivalent ID document:
      {
        _id : "05abcd",
        ue : "011234"
      }
      Notes: _id = "id type" + "user ID" (in both documents); attrib_id = service_id * 10000 + num attribs + 1
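A lookup through an equivalent ID might work as sketched below. A plain dictionary stands in for the collection, and placing equivalent-ID documents next to the user documents is an assumption; the slide only shows the two document shapes and the `ue` pointer.

```python
def find_user(collection, id_type, user_id):
    """Return the main user document for an ID, following an equivalent ID."""
    doc = collection.get(id_type + user_id)  # _id = "id type" + "user ID"
    if doc is not None and "ue" in doc:
        doc = collection.get(doc["ue"])      # follow the pointer to the main document
    return doc


# Stand-in for the collection, keyed by _id (values taken from the slide)
collection = {
    "011234": {"_id": "011234", "services_list": [], "user_values": []},
    "05abcd": {"_id": "05abcd", "ue": "011234"},
}
by_main_id = find_user(collection, "01", "1234")
by_equivalent_id = find_user(collection, "05", "abcd")
```

Under this layout an equivalent ID costs at most one extra read by primary key.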
  • 41. 05 MongoDB Data Model: Authorization system
      ROLES COLLECTION:
      {
        _id: 'PS_ADMIN_ROLE',
        roles_resources: [
          { resource_id: "admin.**", method: 'R' },
          { resource_id: "stats.**", method: 'IMPORT' },
          ...
        ]
      }
      RESOURCES COLLECTION:
      {
        _id: "admin.**",
        role_uri: "/**"
      }
      AUTH USERS COLLECTION:
      {
        _id: "admin",
        auth_pswd: "XXX",
        auth_roles: ['PS_ADMIN_ROLE', …],
        auth_uris: [
          {uri_path: "/**", method: 'R'},
          {uri_path: "/stats/**", method: 'RW'},
          {uri_path: "/kpis/**", method: 'IMPORT'},
          ...
        ]
      }
      Note on auth_uris: replicate uris (from resources) and methods (from roles)
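Since `auth_uris` replicates the URIs and methods inside each auth user, a request could be authorized from that single document. The matching rules in this sketch are assumptions: a trailing `/**` is treated as "this path and everything below it", `'RW'` is taken to cover both reads and writes, and any matching entry is enough to grant access.

```python
def uri_matches(pattern, path):
    """Match a path against a pattern that may end in '/**' (assumed semantics)."""
    if pattern.endswith("/**"):
        prefix = pattern[:-3]  # "/stats/**" -> "/stats", "/**" -> ""
        return path == prefix or path.startswith(prefix + "/")
    return pattern == path


def is_allowed(auth_user, path, operation):
    """operation is 'R', 'W' or 'IMPORT'; one matching entry grants access."""
    for entry in auth_user["auth_uris"]:
        if uri_matches(entry["uri_path"], path):
            method = entry["method"]
            if method == operation or (method == "RW" and operation in ("R", "W")):
                return True
    return False


admin = {
    "_id": "admin",
    "auth_uris": [
        {"uri_path": "/**", "method": "R"},
        {"uri_path": "/stats/**", "method": "RW"},
        {"uri_path": "/kpis/**", "method": "IMPORT"},
    ],
}
```

The price of this denormalization is that a change to a role or resource has to be copied again into every affected auth user document.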
  • 42. 05 MongoDB Data Model: DB stats
      • Only 5 collections
      • API: typically 2 accesses (services and users collections)
      • Authorization: access only 1 collection to grant access
      • Batch: all processing done outside the DB
  • 43. 05 NoSQL version: High level logical architecture
      - Everything running on Red Hat EL 6.2 64 bits
  • 44. 05 NoSQL version performance: Ireland (at PDI lab)
      • 1.8M customers, 180 profile attributes, 6 services
      • Sizes
          - Collections + indexes size: 20GB (vs. 65GB)
          - < 5% of the size are indexes (vs. 30%)
      • Batch
          - Full DWH customer profile import: 0:12 hours (vs. 1:10 hours)
          - Three Delta extractions: 0:40 hours (vs. 2:15 hours)
          - Load and extraction performance proportional to data size
          - Concurrent batch processes do not affect performance
      • API:
          - Response time with average traffic: < 10ms (vs. 110ms)
          - Response time while loading: the same
          - High load (600 TPS) response time while loading: 300ms
  • 45. 05 NoSQL version performance: United Kingdom (at PDI lab)
      • 25M customers, 150 profile attributes, 15 services
      • Sizes
          - Collections + indexes size: 210GB (vs. 700GB)
          - < 5% of the size were indexes
      • Batch
          - Two Delta imports: < 0:40 hours (vs. 2:00 hours)
          - Load and extraction performance proportional to data size
  • 46. 05 NoSQL version performance: Mexico
      • 20M customers, 200 profile attributes, 15 services
      • Sizes
          - Collections + indexes size: 320GB
          - Indexes size: 1.2GB
      • Batch
          - Initial Full import (20M, 40 attributes): 2:00 hours
          - Small Full import (20M, 6 attributes): 0:40 hours
      • API:
          - Response time with average traffic: < 10ms (vs. 90ms)
          - Response time while loading: the same
          - High load (500 TPS) response time while loading: 270ms
  • 47. 05 NoSQL version performance

      Ireland                       NoSQL version    SQL version
      DB size                       20GB             80GB
      Full DWH load                 0:12 hours       1:10 hours
      Three Delta exports           0:40 hours       2:15 hours
      API while loading             < 10ms           400ms
      API 600 TPS + loading         300ms            Timeout / failure

      United Kingdom                NoSQL version    SQL version
      DB size                       210GB            700GB
      Two Delta loads               < 0:40 hours     < 2:00 hours

      Mexico                        NoSQL version
      DB size                       320GB
      Initial Full load (40 attr)   2:00 hours
      Daily Full load (6 attr)      0:40 hours
      API while loading             < 10ms
      API 500 TPS + loading         270ms

      [Image caption: Our DBAs]
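The size and batch rows of the summary table can be turned into improvement factors with a little arithmetic. The Python sketch below uses only the exact figures from the table. It leaves out the rows given as upper bounds ("<") and the API rows. The helper names are illustrative, not part of the original deck.

```python
def to_minutes(hhmm):
    """Convert a duration written as "H:MM" (as in the slides) to minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# (NoSQL value, SQL value) pairs copied from the summary table.
IRELAND = {
    "DB size (GB)": (20, 80),
    "Full DWH load (min)": (to_minutes("0:12"), to_minutes("1:10")),
    "Three Delta exports (min)": (to_minutes("0:40"), to_minutes("2:15")),
}
UNITED_KINGDOM = {
    "DB size (GB)": (210, 700),
}


def improvement_factors(rows):
    """Return SQL / NoSQL for each row, rounded to one decimal place."""
    return {label: round(sql / nosql, 1) for label, (nosql, sql) in rows.items()}


if __name__ == "__main__":
    for country, rows in (("Ireland", IRELAND), ("United Kingdom", UNITED_KINGDOM)):
        for label, factor in improvement_factors(rows).items():
            print(f"{country:15} {label:28} {factor}x")
```

A factor above 1 means the NoSQL version was smaller or faster by that multiple for that row.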
  • 48. 05 Mission completed?
  • 49. 05 The bad

      •  The batch load process was too fast
             §  To keep secondary nodes in sync we needed an oplog of 16 or 24GB
             §  We had to disable journaling for the first migrations
      •  Document field names take up disk space
             §  We reduced them to just 2 characters: “attribute_id” -> “ai”
      •  Respect the unwritten rule of keeping at least 70% of the data size in RAM
      •  Take care with compound indexes: order matters
             §  You can save one index… or you can have problems
             §  Put the most important key (never nullable) first
      •  DBAs whining and complaining about NoSQL
             §  “If we had enough RAM for all data, Oracle would outperform MongoDB”
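Field names cost space because BSON stores each element's name inside every document. Renaming “attribute_id” to “ai” therefore saves 10 bytes per occurrence. Below is a minimal Python sketch of a mapping layer that shortens names before writing and expands them after reading. Only the “attribute_id” -> “ai” pair comes from the slides. The other names, the document shape and the helper functions are hypothetical.

```python
# Long name -> short name. Only "attribute_id" -> "ai" is from the slides;
# "value" -> "v" is an assumed extra entry for illustration.
SHORT_NAMES = {
    "attribute_id": "ai",
    "value": "v",
}
LONG_NAMES = {short: long_name for long_name, short in SHORT_NAMES.items()}


def rename_keys(value, mapping):
    """Recursively rename dict keys found in `mapping`; other keys are kept."""
    if isinstance(value, dict):
        return {mapping.get(k, k): rename_keys(v, mapping) for k, v in value.items()}
    if isinstance(value, list):
        return [rename_keys(item, mapping) for item in value]
    return value


def key_bytes(value):
    """Total UTF-8 bytes spent on dict field names in a nested document.

    Ignores BSON's per-element overhead and array index keys, which are
    the same before and after renaming.
    """
    if isinstance(value, dict):
        return sum(len(k.encode("utf-8")) + key_bytes(v) for k, v in value.items())
    if isinstance(value, list):
        return sum(key_bytes(item) for item in value)
    return 0


# Hypothetical user document with two attribute values.
user = {
    "_id": 1,
    "attributes": [
        {"attribute_id": 7, "value": "M"},
        {"attribute_id": 9, "value": "rock"},
    ],
}

stored = rename_keys(user, SHORT_NAMES)       # what would be written to the DB
restored = rename_keys(stored, LONG_NAMES)    # what the application sees again
saved = key_bytes(user) - key_bytes(stored)   # bytes saved in this document
```

The saving scales with the number of attribute values per user, which is why it mattered for profiles with well over a hundred attributes.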
  • 50. 05 The ugly

      •  Second migration once the PS is already running
             §  Full import adding values for 30 new attributes: 10:00 hours
             §  Full import adding values for 150 new attributes: 40:00 hours
      •  Considerably increasing document size (e.g. adding lots of new values to the users) makes MongoDB relocate the documents, which performs around 5 times slower
             §  That’s a problem when you are updating 10k documents per second
      •  Solutions?
             §  Avoid this situation at all costs. Run away!
             §  Normalize the users’ values: move them to a new, separate collection
             §  Preallocate the size with a dummy field
                    •  You could waste space!
             §  Load into a new collection, merge and swap, like we did in Oracle
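The “preallocate with a dummy field” option can be sketched in Python as two small helpers. One pads a document at insert time. The other builds an update that writes the new values and drops the padding in the same operation, so the document grows into space it already occupies. The field name `_pad`, the padding size and the helper names are assumptions made for illustration. `$set` and `$unset` are standard MongoDB update operators.

```python
PAD_FIELD = "_pad"  # hypothetical name for the dummy field


def with_padding(doc, extra_bytes):
    """Return a copy of `doc` carrying a filler string of `extra_bytes` bytes."""
    padded = dict(doc)
    padded[PAD_FIELD] = "x" * extra_bytes
    return padded


def growth_update(new_values):
    """Build an update spec that adds `new_values` and removes the padding."""
    if PAD_FIELD in new_values:
        raise ValueError("new values must not use the padding field name")
    return {"$set": dict(new_values), "$unset": {PAD_FIELD: ""}}


# Example: reserve roughly 2 KB per user for attributes that arrive later.
initial = with_padding({"_id": 42, "gender": "F"}, 2048)
later = growth_update({"music": "rock", "sms_optin": True})

# With pymongo these would be used roughly as follows (not executed here):
#   collection.insert_one(initial)
#   collection.update_one({"_id": 42}, later)
```

As the slide warns, the padding is wasted space for every document that never grows, so the reserved size has to be weighed against the expected number of new attributes.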
  • 51. 06 Conclusions
  • 52. 06 Conclusions & personal thoughts

      •  Awesome performance boost
             §  But not all use cases fit a MongoDB / NoSQL solution!
      •  New technology, different limitations
      •  Fear of the unknown
             §  SSD performance?
             §  Long term performance and stability?
      •  Python + MongoDB + pymongo = fast development
             §  I mean, really fast
      •  MongoDB Monitoring Service (MMS)
      •  10gen people were very helpful
  • 53. 06 Questions?
  • 55. 0X SQL
      Physical architecture

      §  Scale horizontally by adding more BE or DB servers, or disks in the SAN
      §  Virtualized or physical servers, depending on the deployment
  • 56. 0X MongoDB
      Physical architecture

      §  MongoDB arbiters running on BE servers
      §  Scale horizontally by adding more BE servers, or disks in the SAN
      §  Sharding may already be configured, to scale by adding more replica sets