SlideShare a Scribd company logo
1 of 49
Schema Design
(and its performance implications)
Jay Runkel
Principal Solutions Architect
jay.runkel@mongodb.com
@jayrunkel
2
Agenda
1. Today’s Example
2. MongoDB Schema Design vs. Relational
3. Modeling Relationships
4. Schema Design and Performance
Today’s Example
4
Medical Records
• Collects all patient information in a central repository
• Provide central point of access for
– Patients
– Care providers: physicians, nurses, etc.
– Billing
– Insurance reconciliation
• Hospitals, physicians, patients, procedures, records
Patient
Records
Medications
Lab Results
Procedures
Hospital
Records
Physicians
Patients
Nurses
Billing
5
Medical Record Data
• Hospitals
– have physicians
• Physicians
– Have patients
– Perform procedures
– Belong to hospitals
• Patients
– Have physicians
– Are the subject of procedures
• Procedures
– Associated with a patient
– Associated with a physician
– Have a record
– Variable meta data
• Records
– Associated with a procedure
– Binary data
– Variable fields
6
Lot of Variability
Relational View
Schema Design:
MongoDB vs. Relational
MongoDB Relational
Collections Tables
Documents Rows
Data Use Data Storage
What questions do I have? What answers do I have?
MongoDB versus Relational
Attribute MongoDB Relational
Storage N-dimensional Two-dimensional
Field Values 0, 1, many, or embed Single value
Query Any field or level Any field
Schema Flexible Very structured
Complex Normalized Schemas
Complex Normalized Schemas
13
Documents are Rich Data Structures
{
first_name: ‘Paul’,
surname: ‘Miller’,
cell: ‘+447557505611’
city: ‘London’,
location: [45.123,47.232],
Profession: [banking, finance, trader],
cars: [
{ model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
]
}
Fields can contain an array of sub-documents
Fields
Typed field values
Fields can contain arrays
Relationships
Modeling One-to-One Relationships
16
Referencing
Procedure
• patient
• date
• type
• physician
• type
Results
• dataType
• size
• content: {…}
Use two collections with a reference
Similar to relational
17
Procedure
• patient
• date
• type
• results
• equipmentId
• data1
• data2
• physician
• Results
• type
• size
• content: {…}
Embedding
Document Schema
18
Referencing
Procedure
{
"_id" : 333,
"date" : "2003-02-09T05:00:00"),
"hospital" : “County Hills”,
"patient" : “John Doe”,
"physician" : “Stephen Smith”,
"type" : ”Chest X-ray",
”result" : 134
}
Results
{
“_id” : 134
"type" : "txt",
"size" : NumberInt(12),
"content" : {
value1: 343,
value2: “abc”,
…
}
}
19
Embedding
Procedure
{
"_id" : 333,
"date" : "2003-02-09T05:00:00"),
"hospital" : “County Hills”,
"patient" : “John Doe”,
"physician" : “Stephen Smith”,
"type" : ”Chest X-ray",
”result" : {
"type" : "txt",
"size" : NumberInt(12),
"content" : {
value1: 343,
value2: “abc”,
…
}
}
}
20
Embedding
• Advantages
– Retrieve all relevant information in a single query/document
– Avoid implementing joins in application code
– Update related information as a single atomic operation
• MongoDB doesn’t offer multi-document transactions
• Limitations
– Large documents mean more overhead if most fields are not relevant
– 16 MB document size limit
21
Atomicity
• Document operations are atomic
db.patients.update({_id: 12345},
{$inc : {numProcedures : 1},
$push : {procedures : “proc123”},
$set : {addr.state : “TX”}})
• No multi-document transactions
db.beginTransaction();
db.patients.update({_id: 12345}, …);
db.procedure.insert({_id: “proc123”, …});
db.records.insert({_id: “rec123”, …});
db.endTransaction();
22
Embedding
• Advantages
– Retrieve all relevant information in a single query/document
– Avoid implementing joins in application code
– Update related information as a single atomic operation
• MongoDB doesn’t offer multi-document transactions
• Limitations
– Large documents mean more overhead if most fields are not relevant
– 16 MB document size limit
23
Referencing
• Advantages
– Smaller documents
– Less likely to reach 16 MB document limit
– Infrequently accessed information not accessed on every query
– No duplication of data
• Limitations
– Two queries required to retrieve information
– Cannot update related information atomically
24
One to One: General Recommendations
• Embed
– No additional data duplication
– Can query or index on
embedded field
• e.g., “result.type”
• Exceptional cases…
• Embedding results in large
documents
• Set of infrequently access
fields
{
"_id" : 333,
"date" : "2003-02-09T05:00:00"),
"hospital" : “County Hills”,
"patient" : “John Doe”,
"physician" : “Stephen Smith”,
"type" : ”Chest X-ray",
”result" : {
"type" : "txt",
"size" : NumberInt(12),
"content" : {
value1: 343,
value2: “abc”,
…
}
}
}
Modeling One-to-Many Relationships
26
{
_id: 2,
first: “Joe”,
last: “Patient”,
addr: { …},
procedures: [
{
id: 12345,
date: 2015-02-15,
type: “Cat scan”,
…},
{
id: 12346,
date: 2015-02-15,
type: “blood test”,
…}]
}
Patients
Embed
One-to-Many Relationships
Modeled in 2 possible ways
{
_id: 2,
first: “Joe”,
last: “Patient”,
addr: { …},
procedures: [12345, 12346]}
{
_id: 12345,
date: 2015-02-15,
type: “Cat scan”,
…}
{
_id: 12346,
date: 2015-02-15,
type: “blood test”,
…}
Patients
Reference
Procedures
27
One to Many: General Recommendations
• Embed, when possible
– Access all information in a single query
– Take advantage of update atomicity
– No additional data duplication
– Can query or index on any field
• e.g., { “phones.type”: “mobile” }
• Exceptional cases:
– 16 MB document size
– Large number of infrequently accessed fields
{
_id: 2,
first: “Joe”,
last: “Patient”,
addr: { …},
procedures: [
{
id: 12345,
date: 2015-02-15,
type: “Cat scan”,
…},
{
id: 12346,
date: 2015-02-15,
type: “blood test”,
…}]
}
Modeling Many-to-Many Relationships
29
Many to Many
Traditional Relational Association
Join table
Physicians
name
specialty
phone
Hospitals
name
HosPhysicanRel
hospitalId
physicianId
X
Use arrays instead
30
{
_id: 1,
name: “Oak Valley Hospital”,
city: “New York”,
beds: 131,
physicians: [
{
id: 12345,
name: “Joe Doctor”,
address: {…},
…},
{
id: 12346,
name: “Mary Well”,
address: {…},
…}]
}
Many-to-Many Relationships
Embedding physicians in hospitals collection
{
_id: 2,
name: “Plainmont Hospital”,
city: “Omaha”,
beds: 85,
physicians: [
{
id: 63633,
name: “Harold Green”,
address: {…},
…},
{
id: 12345,
name: “Joe Doctor”,
address: {…},
…}]
}
Data Duplication
31
{
_id: 1,
name: “Oak Valley Hospital”,
city: “New York”,
beds: 131,
physicians: [12345, 12346]
}
Many-to-Many Relationships
Referencing
{
id: 63633,
name: “Harold Green”,
address: {…},
…}
Hospitals
{
_id: 2,
name: “Plainmont Hospital”,
city: “Omaha”,
beds: 85,
physicians: [63633, 12345]
}
Physicians
{
id: 12345,
name: “Joe Doctor”,
address: {…},
…}
{
id: 12346,
name: “Mary Well”,
address: {…},
…}
32
Many to Many
General Recommendation
• Use case determines whether to reference or
embed:
1. Data Duplication
• Embedding may result in data duplication
• Duplication may be okay if reads
dominate updates
2. Referencing may be required if many
related items
3. Hybrid approach
• Potentially do both
{
_id: 2,
name: “Oak Valley Hospital”,
city: “New York”,
beds: 131,
physicians: [12345, 12346]}
{
_id: 12345,
name: “Joe Doctor”,
address: {…},
…}
{
_id: 12346,
name: “Mary Well”,
address: {…},
…}
Hospitals
Reference
Physicians
What If I Want to Store Large Files in MongoDB?
34
GridFS
Driver
GridFS API
doc.jpg
(meta
data)
doc.jpg
(1)doc.jpg
(1)doc.jpg
(1)
fs.files fs.chunks
doc.jpg
mongofiles utility provides command line GridFS interface
Schema Design and Performance
Two Examples
Example 1: Hybrid Approach
Embed and Reference
37
Healthcare Example
patients
procedures
Tailor Schema to Queries (cont.)
{
"_id" : 593340651,
"first" : "Gregorio",
"last" : "Lang",
"addr" : {
"street" : "623 Flowers Rd",
"city" : "Groton",
"state" : "NH",
"zip" : 3266
},
"physicians" : [10387 33456],
"procedures” : ["551ac”, “343fs”]
}
{
"_id" : "551ac”,
"date" :"2000-04-26”,
"hospital" : 161,
"patient" : 593340651,
"physician" : 10387,
"type" : "Chest X-ray",
"records" : [ “67bc6”]
}
Patient Procedure
Find all patients from NH that
have had chest x-rays
Tailor Schema to Queries (cont.)
{
"_id" : 593340651,
"first" : "Gregorio",
"last" : "Lang",
"addr" : {
"street" : "623 Flowers Rd",
"city" : "Groton",
"state" : "NH",
"zip" : 3266
},
"physicians" : [10387 33456],
"procedures” : [
{id : "551ac”,
type : “Chest X-ray”},
{id : “343fs”,
type : “Blood Test”}]
}
{
"_id" : "551ac”,
"date" :"2000-04-26”,
"hospital" : 161,
"patient" : 593340651,
"physician" : 10387,
"type" : "Chest X-ray",
"records" : [ “67bc6”]
}
Patient Procedure
Find all patients from NH that
have had chest x-rays
Example 2: Time Series Data
Medical Devices
41
Vital Sign Monitoring Device
Vital Signs Measured:
• Blood Pressure
• Pulse
• Blood Oxygen Levels
Produces data at regular intervals
• Once per minute
42
We have a hospital(s) of devices
43
Data From Vital Signs Monitoring Device
{
deviceId: 123456,
spO2: 88,
pulse: 74,
bp: [128, 80],
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• One document per minute per device
• Relational approach
44
Document Per Hour (By minute)
{
deviceId: 123456,
spO2: { 0: 88, 1: 90, …, 59: 92},
pulse: { 0: 74, 1: 76, …, 59: 72},
bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78]},
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-minute data at the hourly level
• Update-driven workload
• 1 document per device per hour
45
Characterizing Write Differences
• Example: data generated every minute
• Recording the data for 1 patient for 1 hour:
Document Per Event
60 inserts
Document Per Hour
1 insert, 59 updates
46
Characterizing Read Differences
• Want to graph 24 hour of vital signs for a patient:
• Read performance is greatly improved
Document Per Event
1440 reads
Document Per Hour
24 reads
47
Characterizing Memory and Storage Differences
Document Per Minute Document Per Hour
Number Documents 52.6 B 876 M
Total Index Size 6364 GB 106 GB
_id index 1468 GB 24.5 GB
{ts: 1, deviceId: 1} 4895 GB 81.6 GB
Document Size 92 Bytes 758 Bytes
Database Size 4503 GB 618 GB
• 100K Devices
• 1 years worth of data
100000 * 365 *
24 * 60
100000 * 365 *
24
100000 * 365 *
24 * 60 * 130
100000 * 365 *
24 * 130
100000 * 365 *
24 * 60 * 92
100000 * 365 *
24 * 758
48
Summary
• Relationships can be modeled by embedding or references
• Decision should be made in context of application data and query workload
– Tailor schema to application workload
• It is okay recommended to violate RDBMS schema design principles
– No duplication of data
– Normalization
• Different schemas may result in dramatically different
– Query performance
– Hardware requirements
Questions?
jay.runkel@mongodb.com
@jayrunkel

More Related Content

What's hot

Chapter 6(introduction to documnet databse) no sql for mere mortals
Chapter 6(introduction to documnet databse) no sql for mere mortalsChapter 6(introduction to documnet databse) no sql for mere mortals
Chapter 6(introduction to documnet databse) no sql for mere mortalsnehabsairam
 
MongoDB Days Silicon Valley: Introducing MongoDB 3.2
MongoDB Days Silicon Valley: Introducing MongoDB 3.2MongoDB Days Silicon Valley: Introducing MongoDB 3.2
MongoDB Days Silicon Valley: Introducing MongoDB 3.2MongoDB
 
Multi-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated PolystoresMulti-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated PolystoresJiaheng Lu
 
Painting the Future of Big Data with Apache Spark and MongoDB
Painting the Future of Big Data with Apache Spark and MongoDBPainting the Future of Big Data with Apache Spark and MongoDB
Painting the Future of Big Data with Apache Spark and MongoDBMongoDB
 
Multi-model database
Multi-model databaseMulti-model database
Multi-model databaseJiaheng Lu
 
Back to Basics 1: Thinking in documents
Back to Basics 1: Thinking in documentsBack to Basics 1: Thinking in documents
Back to Basics 1: Thinking in documentsMongoDB
 
Chapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsChapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsnehabsairam
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseMongoDB
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB
 
Jumpstart: Introduction to MongoDB
Jumpstart: Introduction to MongoDBJumpstart: Introduction to MongoDB
Jumpstart: Introduction to MongoDBMongoDB
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataeXascale Infolab
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaMongoDB
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsnehabsairam
 
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB
 
Comprehensive Self-Service Lif Science Data Federation with SADI semantic Web...
Comprehensive Self-Service Lif Science Data Federation with SADI semantic Web...Comprehensive Self-Service Lif Science Data Federation with SADI semantic Web...
Comprehensive Self-Service Lif Science Data Federation with SADI semantic Web...Alexandre Riazanov
 
Creating, Updating and Deleting Document in MongoDB
Creating, Updating and Deleting Document in MongoDBCreating, Updating and Deleting Document in MongoDB
Creating, Updating and Deleting Document in MongoDBWildan Maulana
 
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...MongoDB
 

What's hot (19)

Chapter 6(introduction to documnet databse) no sql for mere mortals
Chapter 6(introduction to documnet databse) no sql for mere mortalsChapter 6(introduction to documnet databse) no sql for mere mortals
Chapter 6(introduction to documnet databse) no sql for mere mortals
 
Mongodb
MongodbMongodb
Mongodb
 
MongoDB Days Silicon Valley: Introducing MongoDB 3.2
MongoDB Days Silicon Valley: Introducing MongoDB 3.2MongoDB Days Silicon Valley: Introducing MongoDB 3.2
MongoDB Days Silicon Valley: Introducing MongoDB 3.2
 
Multi-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated PolystoresMulti-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated Polystores
 
Painting the Future of Big Data with Apache Spark and MongoDB
Painting the Future of Big Data with Apache Spark and MongoDBPainting the Future of Big Data with Apache Spark and MongoDB
Painting the Future of Big Data with Apache Spark and MongoDB
 
Multi-model database
Multi-model databaseMulti-model database
Multi-model database
 
Back to Basics 1: Thinking in documents
Back to Basics 1: Thinking in documentsBack to Basics 1: Thinking in documents
Back to Basics 1: Thinking in documents
 
Chapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortalsChapter 8(designing of documnt databases)no sql for mere mortals
Chapter 8(designing of documnt databases)no sql for mere mortals
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Jumpstart: Introduction to MongoDB
Jumpstart: Introduction to MongoDBJumpstart: Introduction to MongoDB
Jumpstart: Introduction to MongoDB
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortals
 
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
 
Comprehensive Self-Service Lif Science Data Federation with SADI semantic Web...
Comprehensive Self-Service Lif Science Data Federation with SADI semantic Web...Comprehensive Self-Service Lif Science Data Federation with SADI semantic Web...
Comprehensive Self-Service Lif Science Data Federation with SADI semantic Web...
 
Creating, Updating and Deleting Document in MongoDB
Creating, Updating and Deleting Document in MongoDBCreating, Updating and Deleting Document in MongoDB
Creating, Updating and Deleting Document in MongoDB
 
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to ...
 

Viewers also liked

Webinar: Schema Design
Webinar: Schema DesignWebinar: Schema Design
Webinar: Schema DesignMongoDB
 
Leveraging MongoDB as a Data Store for Security Data
Leveraging MongoDB as a Data Store for Security DataLeveraging MongoDB as a Data Store for Security Data
Leveraging MongoDB as a Data Store for Security DataMongoDB
 
Mongo at Sailthru (MongoNYC 2011)
Mongo at Sailthru (MongoNYC 2011)Mongo at Sailthru (MongoNYC 2011)
Mongo at Sailthru (MongoNYC 2011)ibwhite
 
Breaking the oracle tie
Breaking the oracle tieBreaking the oracle tie
Breaking the oracle tieagiamas
 
Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)MongoSF
 
Webinar: Schema Patterns and Your Storage Engine
Webinar: Schema Patterns and Your Storage EngineWebinar: Schema Patterns and Your Storage Engine
Webinar: Schema Patterns and Your Storage EngineMongoDB
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
 
Retail Reference Architecture Part 2: Real-Time, Geo Distributed Inventory
Retail Reference Architecture Part 2: Real-Time, Geo Distributed InventoryRetail Reference Architecture Part 2: Real-Time, Geo Distributed Inventory
Retail Reference Architecture Part 2: Real-Time, Geo Distributed InventoryMongoDB
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsMongoDB
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDBMongoDB
 
First Review(Ppt)
First Review(Ppt)First Review(Ppt)
First Review(Ppt)smjagadish
 
Final Year Project Presentation
Final Year Project PresentationFinal Year Project Presentation
Final Year Project PresentationSyed Absar
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema DesignMongoDB
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMike Friedman
 

Viewers also liked (15)

Webinar: Schema Design
Webinar: Schema DesignWebinar: Schema Design
Webinar: Schema Design
 
Leveraging MongoDB as a Data Store for Security Data
Leveraging MongoDB as a Data Store for Security DataLeveraging MongoDB as a Data Store for Security Data
Leveraging MongoDB as a Data Store for Security Data
 
Mongo at Sailthru (MongoNYC 2011)
Mongo at Sailthru (MongoNYC 2011)Mongo at Sailthru (MongoNYC 2011)
Mongo at Sailthru (MongoNYC 2011)
 
Breaking the oracle tie
Breaking the oracle tieBreaking the oracle tie
Breaking the oracle tie
 
Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)
 
Webinar: Schema Patterns and Your Storage Engine
Webinar: Schema Patterns and Your Storage EngineWebinar: Schema Patterns and Your Storage Engine
Webinar: Schema Patterns and Your Storage Engine
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
Retail Reference Architecture Part 2: Real-Time, Geo Distributed Inventory
Retail Reference Architecture Part 2: Real-Time, Geo Distributed InventoryRetail Reference Architecture Part 2: Real-Time, Geo Distributed Inventory
Retail Reference Architecture Part 2: Real-Time, Geo Distributed Inventory
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
 
Final Year Project
Final Year ProjectFinal Year Project
Final Year Project
 
First Review(Ppt)
First Review(Ppt)First Review(Ppt)
First Review(Ppt)
 
Final Year Project Presentation
Final Year Project PresentationFinal Year Project Presentation
Final Year Project Presentation
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World Examples
 

Similar to Schema Design and Performance Implications

Working With Large-Scale Clinical Datasets
Working With Large-Scale Clinical DatasetsWorking With Large-Scale Clinical Datasets
Working With Large-Scale Clinical DatasetsCraig Smail
 
Introduction to High-performance In-memory Genome Project at HPI
Introduction to High-performance In-memory Genome Project at HPI Introduction to High-performance In-memory Genome Project at HPI
Introduction to High-performance In-memory Genome Project at HPI Matthieu Schapranow
 
Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...lucenerevolution
 
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by GlobusHealth Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by GlobusGlobus
 
Accelerate pharmaceutical r&d with mongo db
Accelerate pharmaceutical r&d with mongo dbAccelerate pharmaceutical r&d with mongo db
Accelerate pharmaceutical r&d with mongo dbMongoDB
 
Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Kees van Bochove
 
Big Medical Data – Challenge or Potential?
Big Medical Data – Challenge or Potential?Big Medical Data – Challenge or Potential?
Big Medical Data – Challenge or Potential?Matthieu Schapranow
 
Processing of Big Medical Data in Personalized Medicine: Challenge or Potential
Processing of Big Medical Data in Personalized Medicine: Challenge or PotentialProcessing of Big Medical Data in Personalized Medicine: Challenge or Potential
Processing of Big Medical Data in Personalized Medicine: Challenge or PotentialMatthieu Schapranow
 
Throw the Semantic Web at Today's Health-care
Throw the Semantic Web at Today's Health-careThrow the Semantic Web at Today's Health-care
Throw the Semantic Web at Today's Health-carehoot72
 
Data Management 2: Conquering Data Proliferation
Data Management 2: Conquering Data ProliferationData Management 2: Conquering Data Proliferation
Data Management 2: Conquering Data ProliferationMongoDB
 
Fostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataFostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataMuhammad Saleem
 
Steffen Frederiksen: DATA, DITA, DOCX
Steffen Frederiksen: DATA, DITA, DOCXSteffen Frederiksen: DATA, DITA, DOCX
Steffen Frederiksen: DATA, DITA, DOCXJack Molisani
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
CDISC Presentation
CDISC PresentationCDISC Presentation
CDISC Presentationhoot72
 
The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologySnow Owl
 
How MongoDB is Transforming Healthcare Technology
How MongoDB is Transforming Healthcare TechnologyHow MongoDB is Transforming Healthcare Technology
How MongoDB is Transforming Healthcare TechnologyMongoDB
 
GTC Europe 2017 - How to predict ICU mortality with digital health data
GTC Europe 2017 - How to predict ICU mortality with digital health dataGTC Europe 2017 - How to predict ICU mortality with digital health data
GTC Europe 2017 - How to predict ICU mortality with digital health dataMax Pumperla
 
A Data Mining Framework for the Analysis of Patient Arrivals into Healthcare ...
A Data Mining Framework for the Analysis of Patient Arrivals into Healthcare ...A Data Mining Framework for the Analysis of Patient Arrivals into Healthcare ...
A Data Mining Framework for the Analysis of Patient Arrivals into Healthcare ...Gurdal Ertek
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit
 

Similar to Schema Design and Performance Implications (20)

Working With Large-Scale Clinical Datasets
Working With Large-Scale Clinical DatasetsWorking With Large-Scale Clinical Datasets
Working With Large-Scale Clinical Datasets
 
Introduction to High-performance In-memory Genome Project at HPI
Introduction to High-performance In-memory Genome Project at HPI Introduction to High-performance In-memory Genome Project at HPI
Introduction to High-performance In-memory Genome Project at HPI
 
Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...
 
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by GlobusHealth Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
 
Accelerate pharmaceutical r&d with mongo db
Accelerate pharmaceutical r&d with mongo dbAccelerate pharmaceutical r&d with mongo db
Accelerate pharmaceutical r&d with mongo db
 
Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019
 
Big Data in Clinical Research
Big Data in Clinical ResearchBig Data in Clinical Research
Big Data in Clinical Research
 
Big Medical Data – Challenge or Potential?
Big Medical Data – Challenge or Potential?Big Medical Data – Challenge or Potential?
Big Medical Data – Challenge or Potential?
 
Processing of Big Medical Data in Personalized Medicine: Challenge or Potential
Processing of Big Medical Data in Personalized Medicine: Challenge or PotentialProcessing of Big Medical Data in Personalized Medicine: Challenge or Potential
Processing of Big Medical Data in Personalized Medicine: Challenge or Potential
 
Throw the Semantic Web at Today's Health-care
Throw the Semantic Web at Today's Health-careThrow the Semantic Web at Today's Health-care
Throw the Semantic Web at Today's Health-care
 
Data Management 2: Conquering Data Proliferation
Data Management 2: Conquering Data ProliferationData Management 2: Conquering Data Proliferation
Data Management 2: Conquering Data Proliferation
 
Fostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataFostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked Data
 
Steffen Frederiksen: DATA, DITA, DOCX
Steffen Frederiksen: DATA, DITA, DOCXSteffen Frederiksen: DATA, DITA, DOCX
Steffen Frederiksen: DATA, DITA, DOCX
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
CDISC Presentation
CDISC PresentationCDISC Presentation
CDISC Presentation
 
The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to Terminology
 
How MongoDB is Transforming Healthcare Technology
How MongoDB is Transforming Healthcare TechnologyHow MongoDB is Transforming Healthcare Technology
How MongoDB is Transforming Healthcare Technology
 
GTC Europe 2017 - How to predict ICU mortality with digital health data
GTC Europe 2017 - How to predict ICU mortality with digital health dataGTC Europe 2017 - How to predict ICU mortality with digital health data
GTC Europe 2017 - How to predict ICU mortality with digital health data
 
A Data Mining Framework for the Analysis of Patient Arrivals into Healthcare ...
A Data Mining Framework for the Analysis of Patient Arrivals into Healthcare ...A Data Mining Framework for the Analysis of Patient Arrivals into Healthcare ...
A Data Mining Framework for the Analysis of Patient Arrivals into Healthcare ...
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short Time
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Schema Design and Performance Implications

  • 1. Schema Design (and its performance implications) Jay Runkel Principal Solutions Architect jay.runkel@mongodb.com @jayrunkel
  • 2. 2 Agenda 1. Today’s Example 2. MongoDB Schema Design vs. Relational 3. Modeling Relationships 4. Schema Design and Performance
  • 4. 4 Medical Records • Collects all patient information in a central repository • Provide central point of access for – Patients – Care providers: physicians, nurses, etc. – Billing – Insurance reconciliation • Hospitals, physicians, patients, procedures, records Patient Records Medications Lab Results Procedures Hospital Records Physicians Patients Nurses Billing
  • 5. 5 Medical Record Data • Hospitals – have physicians • Physicians – Have patients – Perform procedures – Belong to hospitals • Patients – Have physicians – Are the subject of procedures • Procedures – Associated with a patient – Associated with a physician – Have a record – Variable meta data • Records – Associated with a procedure – Binary data – Variable fields
  • 9. MongoDB Relational Collections Tables Documents Rows Data Use Data Storage What questions do I have? What answers do I have? MongoDB versus Relational
  • 10. Attribute MongoDB Relational Storage N-dimensional Two-dimensional Field Values 0, 1, many, or embed Single value Query Any field or level Any field Schema Flexible Very structured
  • 13. 13 Documents are Rich Data Structures { first_name: ‘Paul’, surname: ‘Miller’, cell: ‘+447557505611’ city: ‘London’, location: [45.123,47.232], Profession: [banking, finance, trader], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Fields can contain an array of sub-documents Fields Typed field values Fields can contain arrays
  • 16. 16 Referencing Procedure • patient • date • type • physician • type Results • dataType • size • content: {…} Use two collections with a reference Similar to relational
  • 17. 17 Procedure • patient • date • type • results • equipmentId • data1 • data2 • physician • Results • type • size • content: {…} Embedding Document Schema
  • 18. 18 Referencing Procedure { "_id" : 333, "date" : "2003-02-09T05:00:00"), "hospital" : “County Hills”, "patient" : “John Doe”, "physician" : “Stephen Smith”, "type" : ”Chest X-ray", ”result" : 134 } Results { “_id” : 134 "type" : "txt", "size" : NumberInt(12), "content" : { value1: 343, value2: “abc”, … } }
  • 19. 19 Embedding Procedure { "_id" : 333, "date" : "2003-02-09T05:00:00"), "hospital" : “County Hills”, "patient" : “John Doe”, "physician" : “Stephen Smith”, "type" : ”Chest X-ray", ”result" : { "type" : "txt", "size" : NumberInt(12), "content" : { value1: 343, value2: “abc”, … } } }
  • 20. 20 Embedding • Advantages – Retrieve all relevant information in a single query/document – Avoid implementing joins in application code – Update related information as a single atomic operation • MongoDB doesn’t offer multi-document transactions • Limitations – Large documents mean more overhead if most fields are not relevant – 16 MB document size limit
  • 21. 21 Atomicity • Document operations are atomic db.patients.update({_id: 12345}, {$inc : {numProcedures : 1}, $push : {procedures : “proc123”}, $set : {addr.state : “TX”}}) • No multi-document transactions db.beginTransaction(); db.patients.update({_id: 12345}, …); db.procedure.insert({_id: “proc123”, …}); db.records.insert({_id: “rec123”, …}); db.endTransaction();
  • 22. 22 Embedding • Advantages – Retrieve all relevant information in a single query/document – Avoid implementing joins in application code – Update related information as a single atomic operation • MongoDB doesn’t offer multi-document transactions • Limitations – Large documents mean more overhead if most fields are not relevant – 16 MB document size limit
  • 23. 23 Referencing • Advantages – Smaller documents – Less likely to reach 16 MB document limit – Infrequently accessed information not accessed on every query – No duplication of data • Limitations – Two queries required to retrieve information – Cannot update related information atomically
  • 24. 24 One to One: General Recommendations • Embed – No additional data duplication – Can query or index on embedded field • e.g., “result.type” • Exceptional cases… • Embedding results in large documents • Set of infrequently access fields { "_id" : 333, "date" : "2003-02-09T05:00:00"), "hospital" : “County Hills”, "patient" : “John Doe”, "physician" : “Stephen Smith”, "type" : ”Chest X-ray", ”result" : { "type" : "txt", "size" : NumberInt(12), "content" : { value1: 343, value2: “abc”, … } } }
  • 26. 26 { _id: 2, first: “Joe”, last: “Patient”, addr: { …}, procedures: [ { id: 12345, date: 2015-02-15, type: “Cat scan”, …}, { id: 12346, date: 2015-02-15, type: “blood test”, …}] } Patients Embed One-to-Many Relationships Modeled in 2 possible ways { _id: 2, first: “Joe”, last: “Patient”, addr: { …}, procedures: [12345, 12346]} { _id: 12345, date: 2015-02-15, type: “Cat scan”, …} { _id: 12346, date: 2015-02-15, type: “blood test”, …} Patients Reference Procedures
  • 27. 27 One to Many: General Recommendations • Embed, when possible – Access all information in a single query – Take advantage of update atomicity – No additional data duplication – Can query or index on any field • e.g., { “phones.type”: “mobile” } • Exceptional cases: – 16 MB document size – Large number of infrequently accessed fields { _id: 2, first: “Joe”, last: “Patient”, addr: { …}, procedures: [ { id: 12345, date: 2015-02-15, type: “Cat scan”, …}, { id: 12346, date: 2015-02-15, type: “blood test”, …}] }
  • 29. 29 Many to Many Traditional Relational Association Join table Physicians name specialty phone Hospitals name HosPhysicanRel hospitalId physicianId X Use arrays instead
  • 30. 30 { _id: 1, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [ { id: 12345, name: “Joe Doctor”, address: {…}, …}, { id: 12346, name: “Mary Well”, address: {…}, …}] } Many-to-Many Relationships Embedding physicians in hospitals collection { _id: 2, name: “Plainmont Hospital”, city: “Omaha”, beds: 85, physicians: [ { id: 63633, name: “Harold Green”, address: {…}, …}, { id: 12345, name: “Joe Doctor”, address: {…}, …}] } Data Duplication
  • 31. 31 { _id: 1, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [12345, 12346] } Many-to-Many Relationships Referencing { id: 63633, name: “Harold Green”, address: {…}, …} Hospitals { _id: 2, name: “Plainmont Hospital”, city: “Omaha”, beds: 85, physicians: [63633, 12345] } Physicians { id: 12345, name: “Joe Doctor”, address: {…}, …} { id: 12346, name: “Mary Well”, address: {…}, …}
  • 32. 32 Many to Many General Recommendation • Use case determines whether to reference or embed: 1. Data Duplication • Embedding may result in data duplication • Duplication may be okay if reads dominate updates 2. Referencing may be required if many related items 3. Hybrid approach • Potentially do both { _id: 2, name: “Oak Valley Hospital”, city: “New York”, beds: 131, physicians: [12345, 12346]} { _id: 12345, name: “Joe Doctor”, address: {…}, …} { _id: 12346, name: “Mary Well”, address: {…}, …} Hospitals Reference Physicians
  • 33. What If I Want to Store Large Files in MongoDB?
  • 35. Schema Design and Performance Two Examples
  • 36. Example 1: Hybrid Approach Embed and Reference
  • 38. Tailor Schema to Queries (cont.) { "_id" : 593340651, "first" : "Gregorio", "last" : "Lang", "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 }, "physicians" : [10387 33456], "procedures” : ["551ac”, “343fs”] } { "_id" : "551ac”, "date" :"2000-04-26”, "hospital" : 161, "patient" : 593340651, "physician" : 10387, "type" : "Chest X-ray", "records" : [ “67bc6”] } Patient Procedure Find all patients from NH that have had chest x-rays
  • 39. Tailor Schema to Queries (cont.) { "_id" : 593340651, "first" : "Gregorio", "last" : "Lang", "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 }, "physicians" : [10387 33456], "procedures” : [ {id : "551ac”, type : “Chest X-ray”}, {id : “343fs”, type : “Blood Test”}] } { "_id" : "551ac”, "date" :"2000-04-26”, "hospital" : 161, "patient" : 593340651, "physician" : 10387, "type" : "Chest X-ray", "records" : [ “67bc6”] } Patient Procedure Find all patients from NH that have had chest x-rays
  • 40. Example 2: Time Series Data Medical Devices
  • 41. 41 Vital Sign Monitoring Device Vital Signs Measured: • Blood Pressure • Pulse • Blood Oxygen Levels Produces data at regular intervals • Once per minute
  • 42. 42 We have a hospital(s) of devices
  • 43. 43 Data From Vital Signs Monitoring Device { deviceId: 123456, spO2: 88, pulse: 74, bp: [128, 80], ts: ISODate("2013-10-16T22:07:00.000-0500") } • One document per minute per device • Relational approach
  • 44. 44 Document Per Hour (By minute) { deviceId: 123456, spO2: { 0: 88, 1: 90, …, 59: 92}, pulse: { 0: 74, 1: 76, …, 59: 72}, bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78]}, ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-minute data at the hourly level • Update-driven workload • 1 document per device per hour
  • 45. 45 Characterizing Write Differences • Example: data generated every minute • Recording the data for 1 patient for 1 hour: Document Per Event 60 inserts Document Per Hour 1 insert, 59 updates
  • 46. 46 Characterizing Read Differences • Want to graph 24 hour of vital signs for a patient: • Read performance is greatly improved Document Per Event 1440 reads Document Per Hour 24 reads
  • 47. 47 Characterizing Memory and Storage Differences Document Per Minute Document Per Hour Number Documents 52.6 B 876 M Total Index Size 6364 GB 106 GB _id index 1468 GB 24.5 GB {ts: 1, deviceId: 1} 4895 GB 81.6 GB Document Size 92 Bytes 758 Bytes Database Size 4503 GB 618 GB • 100K Devices • 1 years worth of data 100000 * 365 * 24 * 60 100000 * 365 * 24 100000 * 365 * 24 * 60 * 130 100000 * 365 * 24 * 130 100000 * 365 * 24 * 60 * 92 100000 * 365 * 24 * 758
  • 48. 48 Summary • Relationships can be modeled by embedding or references • Decision should be made in context of application data and query workload – Tailor schema to application workload • It is okay recommended to violate RDBMS schema design principles – No duplication of data – Normalization • Different schemas may result in dramatically different – Query performance – Hardware requirements