Data Architecture (i.e., normalization / relational algebra) and Database Security
IDEAS - Int'l Data Engineering and Science Association
Presented by Samuel Berger, CIO, CLEAR Information, Inc.
1.
Changing the Way the Financial World Processes & Utilizes Information
Copyright © 2017 CLEAR Information, Inc., a United States Class C corporation, all rights reserved.
2.
Introduction
Speaker: Samuel Berger
Topic: Data architecture (i.e., normalization / relational algebra) and database security
Description: This presentation underscores the importance of creating precise data structures when handling, processing and manipulating mass amounts of data. As data has become key to the operations of virtually all major companies around the world, keeping that data easy to maintain and use is pivotal. Companies often live or die in today's hyper-competitive business climate by their ability to advantageously manipulate their data. It is therefore paramount that this enterprise-critical data is housed in well-organized structures that are intuitive for developers to work on. The bulk of this presentation offers tips and examples on how to do so, and quantifies the benefits using a large data example.
3.
Background
I am a Fintech entrepreneur and developer as well as a data scientist, and I have worked mostly on mass-data financial projects since 1989. I started in these fields with SBIC, using technology and massive amounts of data to predict the world's largest financial market, FOREX. My systems earned my clients (Daiwa Securities, Bank of Montreal, Julius Bär Group Ltd., Société Générale, royalty and national treasuries, to name a few) returns of over 18% per annum non-compounded over the 5 ½ years we traded. At its peak, my company traded the equivalent of over $1 billion in a day.
Some of my other projects included working on E*Trade's E*Advisor system, VeriSign, SGI, two industry-founding VOIP unified messaging companies, and serving as Enterprise Architect for Capital Group Companies (which managed over $1.3 trillion at the time). I am currently working on a large project for CLEAR.
4.
Discussion Points
I. Relational Algebra / Normalization Familiarity
II. Brief History
III. 1st Normal Form Simply Stated
IV. Best Practices
V. Practical Examples
VI. Key Structures
VII. Performance and Data Space / Maintenance
VIII. Overloaded Domains (Columns)
IX. Theory Modified Slightly by Practice
X. Locking
XI. Normalization Conclusion
XII. Securing the Data Layer
XIII. Problems with Outsourcing IT
XIV. Data Theft: Primary Weaknesses
XV. Q & A
5.
Relational Algebra Database Normalization
Have you worked on a database that was in at least the First Normal Form (1NF)?
Does anyone know at what point in Normalization duplicates are no longer allowed?
Does anyone know at what point NULLs are no longer allowed?
6.
Brief History
Relational Algebra was primarily developed by Edgar F. Codd from 1969 to 1973, and primarily documented by Chris Date, both IBM employees.
Codd also created his "12 rules" (really 13, as he started from zero) that were used to define the qualifications of a relational database management system (RDBMS).
Codd's work heavily influenced System R, IBM's first RDBMS, in the 1970s. Ray Boyce and Don Chamberlin, both on the System R project, designed its query language, SEQUEL, which was later renamed the Structured Query Language (SQL). This is why SQL Server is still often pronounced "Sequel" Server.
Codd and Boyce also teamed up to create the Boyce-Codd Normal Form, which is slightly stricter than the 3rd Normal Form.
7.
1st Normal Form (1NF)
My goal here is not to confuse but to simplify.
Key principles:
1) Each row must have at least one unique key, also referred to as a Candidate Key (i.e., no duplicate rows). A Candidate Key is the minimum column grouping on a table that creates a unique record. Columns that do not help to define uniqueness are attributes of the Candidate Key, or of the Candidate Keys should more than one unique column set exist.
2) Every row-column intersection must have a value.
3) Every row-column intersection can only contain one value, not a list of values.
4) Every row-column intersection must have a valid value from the pool of potential valid values (i.e., a plane parts table cannot have a column for engine parts and then hold both engine parts and plane max speeds in it).
5) The functionality of the table does not depend on the order of the rows or columns (i.e., the query determines the column order and the row order of the output).
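The principles above can be sketched with Python's built-in `sqlite3` module. The `EnginePart` table, its columns and the list of part types are hypothetical names invented for this illustration; they follow the slide's plane-parts example but are not from the presentation. Principle 3 (one value per cell) is a modelling discipline rather than something a constraint can enforce, so it appears only as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EnginePart (
    PartNumber   TEXT NOT NULL,   -- principle 2: every cell has a value
    Manufacturer TEXT NOT NULL,
    PartType     TEXT NOT NULL    -- principle 4: only values from the legal pool
                 CHECK (PartType IN ('piston', 'turbine', 'fuel pump')),
    UNIQUE (PartNumber, Manufacturer)  -- principle 1: a candidate key, no duplicate rows
);
""")
# Principle 3: each cell holds one part type, never a list such as 'piston,turbine'.
conn.execute("INSERT INTO EnginePart VALUES ('P-100', 'Acme', 'piston')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO EnginePart VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(("P-100", "Acme", "piston"))                # violates the candidate key
missing_ok = try_insert(("P-200", "Acme", None))                      # violates NOT NULL
wrong_domain_ok = try_insert(("P-300", "Acme", "max speed 900 km/h"))  # violates the domain check

# Principle 5: the query, not the storage order, decides the output order.
ordered = conn.execute(
    "SELECT PartNumber FROM EnginePart ORDER BY PartNumber"
).fetchall()
```

All three rejected inserts leave the table with its single valid row.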
8.
Best Practices
Modern database terminology uses the term Primary Key as a binder of the data more than as a concept of a unique row identifier based on data properties. As such, the Primary Key is now a separate concept from the Primary Candidate Key. It should always be an auto-incrementing integer that starts at 1 and increases sequentially, and the server prefers that it be the first column. My naming convention is the table name plus "_ID".
Table and column names should always be descriptive, even if verbose. Mistakes occur most commonly due to a lack of understanding of the data model and the purpose of each container.
Never use database keyword names for column names (e.g., name is a keyword, as is filename).
Data should be related for the data's sake and not for the current application requirements. Requirements change; if the data is structured accurately, the data model will remain accurate.
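A minimal sketch of these conventions, again using `sqlite3`. The `Currency` table and its columns are hypothetical names chosen for illustration: the surrogate Primary Key is the first column, is named table name plus "_ID", and auto-increments from 1, while the candidate key that comes from the data itself is declared separately as `UNIQUE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Currency (
    Currency_ID         INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, first column
    IsoCurrencyCode     TEXT NOT NULL UNIQUE,               -- candidate key from the data
    CurrencyDescription TEXT NOT NULL                       -- descriptive, not a keyword like "name"
);
""")
conn.executemany(
    "INSERT INTO Currency (IsoCurrencyCode, CurrencyDescription) VALUES (?, ?)",
    [("USD", "United States dollar"), ("CHF", "Swiss franc"), ("JPY", "Japanese yen")],
)

# The surrogate keys are assigned sequentially, starting from 1.
currency_ids = [
    row[0]
    for row in conn.execute("SELECT Currency_ID FROM Currency ORDER BY Currency_ID")
]

# Other tables would reference Currency_ID; the UNIQUE constraint still
# stops the same real-world currency from being entered twice.
try:
    conn.execute(
        "INSERT INTO Currency (IsoCurrencyCode, CurrencyDescription) VALUES ('USD', 'Duplicate')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```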
9.
Audit Columns
Also a component of best practices is to include audit columns to cover the following key pieces of data:
1) The date and time the record was created.
2) The person ID or process name that created the record.
3) The date and time the record was last updated.
4) The person ID or process name that made the update.
5) Update count: a truly critical column for every table that will be covered a bit later in this discussion.
6) A column indicating whether the record is currently active.
Note: In blockchain projects records are never updated, only added. Also, additional information must be included to indicate whether or not the data is in sync with its partners.
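The six audit columns listed above could look like the following sketch. The `Account` table, the column names and the two helper functions are assumptions made for this example; the presentation does not show its own column names. Setting the "last updated" columns to the creation values on insert is also a design choice of the sketch, made so that no audit column is ever empty.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Account (
    Account_ID  INTEGER PRIMARY KEY AUTOINCREMENT,
    AccountCode TEXT NOT NULL UNIQUE,
    CreatedAt   TEXT NOT NULL,               -- 1) when the record was created
    CreatedBy   TEXT NOT NULL,               -- 2) person ID or process that created it
    UpdatedAt   TEXT NOT NULL,               -- 3) when it was last updated
    UpdatedBy   TEXT NOT NULL,               -- 4) person ID or process that updated it
    UpdateCount INTEGER NOT NULL DEFAULT 0,  -- 5) number of updates so far
    IsActive    INTEGER NOT NULL DEFAULT 1 CHECK (IsActive IN (0, 1))  -- 6) active flag
)""")


def utc_now():
    return datetime.now(timezone.utc).isoformat()


def create_account(account_code, created_by):
    """Insert a record; the update columns start out equal to the creation values."""
    stamp = utc_now()
    cursor = conn.execute(
        "INSERT INTO Account (AccountCode, CreatedAt, CreatedBy, UpdatedAt, UpdatedBy) "
        "VALUES (?, ?, ?, ?, ?)",
        (account_code, stamp, created_by, stamp, created_by),
    )
    return cursor.lastrowid


def deactivate_account(account_id, updated_by):
    """Soft-delete a record and maintain the audit columns in the same statement."""
    conn.execute(
        "UPDATE Account SET IsActive = 0, UpdatedAt = ?, UpdatedBy = ?, "
        "UpdateCount = UpdateCount + 1 WHERE Account_ID = ?",
        (utc_now(), updated_by, account_id),
    )


account_id = create_account("ACC-001", "loader_process")
deactivate_account(account_id, "jsmith")
audit_row = conn.execute(
    "SELECT CreatedBy, UpdatedBy, UpdateCount, IsActive FROM Account WHERE Account_ID = ?",
    (account_id,),
).fetchone()
```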
10.
Practical Example
The following example is designed to help illustrate normalization. For this example I created a database schema for how I would build the Microsoft Explorer application from scratch. I believe everyone here has used Microsoft Explorer and can fully visualize this exercise and see some of the power of the Normalized architectural design. This small database meets the requirements of the Boyce-Codd Normal Form.
NOTES: This is not the actual Microsoft Explorer data model. This is how I would design it. Also, I added silver keys to denote the Candidate Key of each table. Most tables have only one Candidate Key. For those with multiple, I only used one to simplify the model for easier conceptual understanding. As such, all silver keys on a table are used to create a single Candidate Key for each table. Lastly, audit columns do not apply to this simple application.
11.
Explorer Data Model
12.
Key Database Structure
13.
Second Example “Persons”
This example is far more interesting. I have taken a very common (maybe the single most common) database architecture and restructured it using relational algebra properly. It should be noted that, much to my surprise, I have never seen this structure used anywhere in the world. I do use it in CLEAR in multiple locations (i.e., not just with the Persons information). It is the correct usage of the mathematics and has massive benefits when applied to very large data sets; those benefits are covered numerically in the statistics slides.
14.
Data and Architecture Notes
1) Loaded just over 265 million records into both structures to give proper time and size comparisons in a big data environment.
2) As there are duplicate records with respect to first, middle and last names along with birthdays, no candidate key is possible in the traditional structure.
3) I used the 2010 US Census data for a list of last names and their frequency. It only included non-concatenated last names, so I had to create my own concatenated examples.
4) I used a list of 2016 baby names in Scotland as it was the largest first name database that I could locate with a breakout between male and female names.
5) I randomly generated the first and middle names from this list of known names. I also randomly generated the order to enter the names into both table schemas to prevent bias.
6) I did not add the audit columns or display the field specifications for conceptual simplicity.
15.
Typical Persons Table
Notes:
1) No way to properly identify a candidate key in spite of important defining data.
2) MiddleName and Suffix will have to allow NULL or absent values.
3) Multiple middle names almost never supported.
16.
Normalized Persons Table Structure
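The slide's diagram is not part of this transcript, so the following is only one hypothetical way to normalize person names, not the presenter's actual schema. Every table and column name here (`NameText`, `Person`, `PersonName`, `PartOrder`) is an assumption. The sketch stores each distinct name string once, needs no NULL for a missing middle name or suffix, and supports any number of middle names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NameText (                     -- each distinct name string is stored once
    NameText_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    NameValue   TEXT NOT NULL UNIQUE
);
CREATE TABLE Person (
    Person_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    BirthDate TEXT NOT NULL
);
CREATE TABLE PersonName (                   -- one row per name a person actually has
    Person_ID   INTEGER NOT NULL REFERENCES Person (Person_ID),
    NamePart    TEXT NOT NULL CHECK (NamePart IN ('first', 'middle', 'last', 'suffix')),
    PartOrder   INTEGER NOT NULL,           -- 1, 2, ... allows several middle names
    NameText_ID INTEGER NOT NULL REFERENCES NameText (NameText_ID),
    UNIQUE (Person_ID, NamePart, PartOrder)
);
""")


def name_text_id(value):
    """Return the ID for a name string, inserting it only if it is new."""
    conn.execute("INSERT OR IGNORE INTO NameText (NameValue) VALUES (?)", (value,))
    return conn.execute(
        "SELECT NameText_ID FROM NameText WHERE NameValue = ?", (value,)
    ).fetchone()[0]


def add_person(birth_date, first, middles, last):
    person_id = conn.execute(
        "INSERT INTO Person (BirthDate) VALUES (?)", (birth_date,)
    ).lastrowid
    parts = [("first", 1, first), ("last", 1, last)]
    parts += [("middle", order, value) for order, value in enumerate(middles, start=1)]
    for name_part, part_order, value in parts:
        conn.execute(
            "INSERT INTO PersonName VALUES (?, ?, ?, ?)",
            (person_id, name_part, part_order, name_text_id(value)),
        )
    return person_id


add_person("1980-05-05", "Isla", [], "Smith")                 # no middle name, no NULL needed
add_person("1992-11-23", "Jack", ["Isla", "Lewis"], "Smith")  # two middle names

distinct_names = conn.execute("SELECT COUNT(*) FROM NameText").fetchone()[0]
person_name_rows = conn.execute("SELECT COUNT(*) FROM PersonName").fetchone()[0]
```

In this sketch only `NameText.NameValue` holds text that a name search would need indexed, and it grows with the number of distinct names rather than with the number of people.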
17.
1st Key Database Structure
18.
2nd Key Database Structure
19.
Statistics – Traditional vs. Normalized
Traditional data structure
Total data size that requires indexing for increased performance: 19.9 gigabytes (also requires multiple columns to be indexed and most likely multiple indexes)
Time to count number of people born on May 5 of any year: 3 minutes 57.4 seconds
Total record count: 729,836
Time to return the ID for 1 person given first, last and middle names plus birthdate (without indexing): 50.4 seconds
20.
Statistics – Traditional vs. Normalized (Continued)
Normalized data structures
Total data size that would require indexes for increased performance: 5.2 megabytes (only requires a single column to be indexed)
Time to count number of people born on May 5: 1.5 seconds
Total records: 729,836
Time to return the ID for 1 person given first, last and middle names, and birthdate (without indexing): 1.2 seconds
21.
Statistics – Traditional vs. Normalized (Continued)
Difference
Data requiring indexing: the traditional structure has 3,943.3 times the data!
Time to retrieve information by date (not a lookup that an index can help with, so it is what it is): normalized is 153.9 times faster!
Time to retrieve a single record by customer-specific data, without indexing either table structure: normalized is 43.1 times faster!
Conclusion: Normalization is always faster and massively more efficient with respect to data maintenance within a production transaction environment.
22.
Overloaded Data Column
The First Normal Form requires each column to be a domain.
A domain is a column that contains data from the “pool of legal values”. Legal values for a ZipCode field are all known zip codes, not, for example, a street name.
Columns that contain more than one informational piece are referred to as “overloaded”.
It can be accurately argued that the datetime data type is the most commonly used within databases and is an overloaded column. In Microsoft SQL Server the function “DATEPART” allows the following retrievals:
year, quarter, month, dayofyear, day, week, weekday, hour, minute, second, millisecond, microsecond, nanosecond, TZoffset, ISO_WEEK
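To make the "overloaded" point concrete, the sketch below pulls several independent pieces of information out of a single timestamp value using Python's standard `datetime` module. It mirrors a subset of the SQL Server `DATEPART` names listed above; the `date_parts` helper is a hypothetical function written for this illustration and is not a database API.

```python
from datetime import datetime


def date_parts(stamp):
    """Split one datetime value into the separate facts it carries."""
    _iso_year, iso_week, _iso_weekday = stamp.isocalendar()
    return {
        "year": stamp.year,
        "quarter": (stamp.month - 1) // 3 + 1,
        "month": stamp.month,
        "dayofyear": stamp.timetuple().tm_yday,
        "day": stamp.day,
        "iso_week": iso_week,
        "hour": stamp.hour,
        "minute": stamp.minute,
        "second": stamp.second,
        "microsecond": stamp.microsecond,
    }


parts = date_parts(datetime(2017, 5, 5, 14, 30, 15, 250000))
# A single column value answers many different questions:
# which year, which quarter, which day of the year, which ISO week, and so on.
```

A search such as "born on May 5 of any year" asks about the month and day parts only, which is why it cuts across the natural ordering of a single datetime column.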
23.
Multiple Candidate Keys
Boyce-Codd Normal Form Slight Flaw
A table must be in third normal form (3NF). Additionally, every domain that determines the value of another domain must be, in part or in whole, a key (Candidate Key) that has no overlapping domains with another key.
This rule is 99% true. The exceptions are primarily with reference data, where the difficulties of maintaining data are moot at best. The DateInfo table shown on the slide is an excellent example, as the table data will never change. I typically add records to support multiple centuries (36,525 days per century, a tiny amount of data for a table to support).
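As a rough illustration of the DateInfo idea, the following Python sketch uses the standard-library sqlite3 module as a stand-in for a production RDBMS. The table and column names (DateInfo, Person, BirthDateID and so on) and the sample birthdates are assumptions made for the example, not the schema from the slides. It builds one row per calendar day for the years 1901 through 2000, then counts people by birth month and day through a join on the small reference table.

```python
import sqlite3
from datetime import date, timedelta


def build_date_info(conn, first, last):
    """Create a DateInfo reference table with one row per calendar day."""
    conn.execute(
        "CREATE TABLE DateInfo ("
        " DateID INTEGER PRIMARY KEY,"
        " Year INTEGER, Month INTEGER, Day INTEGER)"
    )
    rows = []
    current = first
    while current <= last:
        # toordinal() gives a stable integer key for each calendar day.
        rows.append((current.toordinal(), current.year, current.month, current.day))
        current += timedelta(days=1)
    conn.executemany("INSERT INTO DateInfo VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def count_born_on(conn, month, day):
    """Count people born on a month/day of any year via the reference table."""
    sql = (
        "SELECT COUNT(*) FROM Person p"
        " JOIN DateInfo d ON d.DateID = p.BirthDateID"
        " WHERE d.Month = ? AND d.Day = ?"
    )
    return conn.execute(sql, (month, day)).fetchone()[0]


conn = sqlite3.connect(":memory:")
days_in_century = build_date_info(conn, date(1901, 1, 1), date(2000, 12, 31))

conn.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, BirthDateID INTEGER)")
sample_births = [date(1970, 5, 5), date(1985, 5, 5), date(1990, 12, 1)]  # made-up data
conn.executemany(
    "INSERT INTO Person (BirthDateID) VALUES (?)",
    [(d.toordinal(),) for d in sample_births],
)

may_fifth_count = count_born_on(conn, 5, 5)
print(days_in_century, may_fifth_count)
```

The Person table stores only an integer key, so the month and day filter runs against the small DateInfo table rather than against a datetime value on every person row.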
24.
Database Table Locking
The Power of Logical Level Locking
1) In most RDBMS the default table locking for select statements is a shared lock.
2) Shared locks easily escalate in high volume production environments, leading to poor performance and deadlocks.
3) In the vast majority of cases the lock proves unnecessary.
4) In almost all cases the locks can be completely avoided, without creating concurrency issues, by using an UpdateCount field, selecting data with NOLOCK (or an equivalent table locking hint), and applying logic checks when conducting updates, inserts and deletes based on the previously selected data.
5) In my experience the difference in high volume production environments is in almost all cases massive.
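The UpdateCount approach in point 4 can be sketched as follows. This is a minimal Python example using the standard-library sqlite3 module as a stand-in; sqlite3 has no NOLOCK hint, so the sketch only shows the logic check, and the Account table and its columns are hypothetical names chosen for illustration. Each update succeeds only if the row still carries the UpdateCount value that was read earlier.

```python
import sqlite3


def read_account(conn, account_id):
    """Read a row without holding a lock; return (balance, update_count)."""
    return conn.execute(
        "SELECT Balance, UpdateCount FROM Account WHERE AccountID = ?",
        (account_id,),
    ).fetchone()


def update_balance(conn, account_id, new_balance, seen_update_count):
    """Apply the update only if nobody changed the row since it was read."""
    cursor = conn.execute(
        "UPDATE Account"
        " SET Balance = ?, UpdateCount = UpdateCount + 1"
        " WHERE AccountID = ? AND UpdateCount = ?",
        (new_balance, account_id, seen_update_count),
    )
    conn.commit()
    # rowcount is 0 when the UpdateCount check failed (a stale read).
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account ("
    " AccountID INTEGER PRIMARY KEY,"
    " Balance INTEGER NOT NULL,"
    " UpdateCount INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO Account (AccountID, Balance) VALUES (1, 100)")
conn.commit()

# Two callers read the same row version.
balance_a, count_a = read_account(conn, 1)
balance_b, count_b = read_account(conn, 1)

first_ok = update_balance(conn, 1, balance_a + 50, count_a)   # row unchanged since read
second_ok = update_balance(conn, 1, balance_b - 30, count_b)  # row changed since read

final_balance, final_count = read_account(conn, 1)
print(first_ok, second_ok, final_balance, final_count)
```

A caller whose update is rejected would typically re-read the row and decide whether to retry, which is the "logic check" that replaces the shared lock.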
25.
Normalization Conclusion
1) Proper normalization of the data model can save companies working on big data or high production transaction databases tens of millions in hardware and maintenance expenses over the life of a company.
2) Performance in a production environment will always be more reliable and significantly faster using relational algebra.
3) No one claiming to be a professional database architect can make that claim without being proficient in relational algebra.
4) Even with data warehousing, some data should always be normalized for maximum performance and flexibility. A good example is the date information, which can be used to slice and dice denormalized data marts rapidly and efficiently.
5) The database can make or break almost all projects. Proper database design, locking schema and efficient database code are always essential.
26.
Securing the Data Layer
Once past the network security layer, which is often far more geared to protecting against outside intrusion, hackers often experience little to no real impediment to gaining access to and control of the database servers. Protection should be applied at all layers with equal and extreme diligence. Aside from the common data protection deterrents, I have listed how to properly add security that, I believe, will give even the NSA ulcers if they should try to hack it.
Please note, just as there is a large gap between technology available and technology applied, there is also a large gap between known best practices and the practices actually applied. In most environments, big and small, developers, IT personnel and sometimes even executives want, and usually get, a back door entry into the production databases.
27.
Securing the Data Layer (continued)
Recommendations:
1) Change the default port setting to an obscure port. Strangely, in my entire career I have never come to a company and found any of its servers running on anything but the standard ports, which is completely unnecessary. All ports other than the random one chosen for the DB server should be closed, and the DB server port should be restricted to the DB application.
2) Deny data reader and data writer to all logins. Do not allow any login to the DB servers to have access to anything but executing stored procedures. No ad hoc querying or dynamic SQL allowed! Direct access to the data circumvents all business rules and gives users direct access to your data. That is very bad practice and poor security.
3) Use a multi-Unicode 30+ character password for the database server system administrator account.
4) Deny all access by local administrators to the database layer.
5) Use a multi-Unicode 30+ character password for the NT Administrator account.
6) Disable all local administrators.
7) Use at least 3 multi-Unicode characters in all of the stored procedure names (i.e., characters from other languages). According to Wikipedia, Unicode currently contains 136,755 distinct characters. Many are not allowed within SQL, but still the difference in combinations just in a 5 character name is staggering!
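One possible reading of the "multi-Unicode 30+ character password" recommendation is sketched below in Python, using only the standard library. The choice of scripts, the code point ranges and the function names are assumptions made for this example. Whether a particular database server or operating system accepts these characters in a password should be verified for the product in use.

```python
import secrets

# Hypothetical choice of scripts; each entry is an inclusive code point range.
SCRIPT_RANGES = {
    "latin": (0x0041, 0x005A),     # A-Z
    "greek": (0x03B1, 0x03C9),     # lowercase Greek letters
    "cyrillic": (0x0430, 0x044F),  # lowercase Cyrillic letters
    "hiragana": (0x3041, 0x3096),  # Japanese hiragana
}


def script_of(ch):
    """Return the name of the configured script containing ch, or None."""
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= ord(ch) <= high:
            return name
    return None


def generate_password(length=32):
    """Build a random password that uses every configured script at least once."""
    if length < len(SCRIPT_RANGES):
        raise ValueError("length too short to cover every script")
    alphabet = [
        chr(cp)
        for low, high in SCRIPT_RANGES.values()
        for cp in range(low, high + 1)
    ]
    # One guaranteed character per script, then fill from the full alphabet.
    chars = [
        chr(low + secrets.randbelow(high - low + 1))
        for low, high in SCRIPT_RANGES.values()
    ]
    chars += [secrets.choice(alphabet) for _ in range(length - len(chars))]
    # Shuffle with a CSPRNG so the guaranteed characters are not always first.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)


password = generate_password(32)
print(len(password), sorted({script_of(ch) for ch in password}))
```

The secrets module is used rather than random because it draws from the operating system's cryptographically secure source.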
28.
Securing the Data Layer (continued)
8) Create an application for administrators that will open a port and enable their OS administrator logins for a limited amount of time. That process needs to keep an audit trail that includes IP address and machine MAC address. MAC addresses should be pre-authorized.
9) Alternate between logins (at least three) every minute, changing the password of each every minute to a new, but calculable, password. Retain each password for three minutes to allow overlapping. Again, passwords should be long and multi-Unicode.
10) Production data is never to be shared with employees, no matter what title they have or how much they complain. If they need information, a report can be developed for them that properly follows Sarbanes-Oxley (SOX) requirements and is well vetted and approved.
If these standards are followed, the production data will be secured. Programmers and IT personnel may complain, but they are not being paid large salaries to do easy work. Their work is first and foremost to protect the key company data and the integrity and privacy of client data, and to make sure the company’s products and services are highly available and dependable.
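The rotating, calculable passwords in point 9 could be derived along the following lines. This Python sketch is one possible scheme, not the author's implementation: it assumes a shared secret and derives each password with HMAC-SHA256 over the login name and the current minute. The login names, the secret and the helper functions are hypothetical. For brevity the derived password is a hexadecimal string rather than the long multi-Unicode password the slide calls for.

```python
import hashlib
import hmac

LOGINS = ["svc_a", "svc_b", "svc_c"]  # hypothetical login names
RETENTION_MINUTES = 3                  # each password stays valid this long


def minute_index(unix_seconds):
    """Whole minutes since the Unix epoch."""
    return int(unix_seconds // 60)


def active_login(unix_seconds):
    """Login to use during this minute, cycling through LOGINS."""
    return LOGINS[minute_index(unix_seconds) % len(LOGINS)]


def derive_password(secret, login, minute):
    """Password for a login in a given minute, derived with HMAC-SHA256."""
    message = f"{login}:{minute}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_valid(secret, login, candidate, unix_seconds):
    """Accept the password of the current minute or the two minutes before it."""
    now = minute_index(unix_seconds)
    for age in range(RETENTION_MINUTES):
        expected = derive_password(secret, login, now - age)
        if hmac.compare_digest(expected, candidate):
            return True
    return False


secret = b"example-shared-secret"  # placeholder; a real secret would be random
t0 = 1_700_000_000                 # arbitrary fixed timestamp for the demo

login = active_login(t0)
password = derive_password(secret, login, minute_index(t0))

still_valid = is_valid(secret, login, password, t0 + 120)  # two minutes later
expired = is_valid(secret, login, password, t0 + 180)      # three minutes later
print(login, still_valid, expired)
```

Because both sides can compute the password from the secret and the clock, nothing needs to be transmitted when the password changes; the three-minute retention window absorbs small clock differences and in-flight connections.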
29.
Problems with Outsourcing IT
Company accountants always like to look for ways to reduce expenses. The problem is at what cost. Some costs cannot be measured in mere P & L. One such cost is the proliferation of key company information and intellectual property. The following are some of the dangers that arise when outsourcing:
1) Access to servers is granted, with administration rights, to persons selected by the outsourced company. There is no vetting, not even a list of those given access and their backgrounds, credentials, criminal history; nothing at all.
2) Much of the company data is often accessible. That includes backups, company personnel information, contracts, bids, clients, data, inventions, etc.
30.
Problems with Outsourcing IT (Continued)
3) Outsourced IT companies often themselves outsource to companies abroad. Your data is then accessible to persons unknown in India, Pakistan and China, to name a few. Those persons are completely unknown to you and beyond any legal restrictions of the United States.
4) Much has been stated by the government and the news about China and other countries hacking and stealing our key data. I propose they are not stealing it as much as we are giving it to them and, to add insult to injury, we are actually paying them to take it.
Some expenses simply make sense and are a cost of doing business in this day and age. IT is one of those key mission-critical expenses. Live with it. Never lose control of your company’s lifeblood.
31.
Data Theft – Primary Weaknesses
1) Approximately half of all data theft incidents are from employees.
2) That does not count those to whom you have given access by outsourcing your IT.
3) Most large data breaches occur from employees stealing entire backups and having access to large file data stores of documents and company critical information.
4) Almost all of the information that Julian Assange publishes via WikiLeaks comes from employee data theft, primarily stolen backup tapes.
5) In 98% of companies, less than half of the data tape backup files are encrypted, according to surveys of IT professionals.
32.
Data Theft – Primary Weaknesses (Continued)
6) Symptoms of data theft from a source with internal, company-granted access:
a) Size of the data: hackers want to be quick and are naturally worried about being caught, so they filter their data searches to find the critical documents or data rapidly. Those working within have no such time constraints and tend to be far less skilled, so searching is left to those receiving the data from the thief.
b) Breadth of the data theft: hackers are focused on whose files they want, again limited by time. Those working within tend to take all users’ data.
c) Scope of the data: once in, the hacker will look around as quickly as possible and attempt to gain information from multiple points within the network. Employees tend to take all of the one type of data they are focused on, usually data pertaining to what they are specifically working on, have specifically been granted permission to access, and have an issue with.
33.
Data Theft – Primary Weaknesses (Continued)
Profiling data theft can assist when protecting your data. Know your data and the types of interests that will want to take, change or distribute your key company information.
A high profile theft that took place last year at the DNC fits the symptoms of an internal data theft more than an external one. The fact that the FBI was refused access and simply, and unprofessionally, accepted the DNC’s word for who hacked them is not surprising, as the FBI is way behind with respect to investigating and prosecuting data theft.
Do not expect the government to protect your data anytime in the foreseeable future or, for that matter, to properly investigate or prosecute those guilty of its theft. It is on you!
34.
Changing the Way the Financial World Processes & Utilizes Information
Thank You
Samuel Berger, Chief Information Officer, (805) 701-0761, sberger@ClearFinTech.com
George Sterling Harris, Executive Vice President, (310) 295-7524, gharris@ClearFinTech.com
Copyright © 2017 CLEAR Information, Inc., a United States Class C corporation, all rights reserved.