Data Integrity in Rails

I explain how you can make the data in your application more bulletproof. The talk covers techniques you can use in your applications to reduce the chances of introducing incoherent data.

  1. Data Integrity in Rails: strategies and techniques to make your data bulletproof
  2. Proactive vs. Reactive
  3. Simple vs. Complex, In Application vs. In Database (examples: email validations, NOT NULL constraint)
  4. Proactive Techniques
  5. NOT NULL Constraint
  6. The “NoMethodError”
      @blog.body.to_html
      @blog.body returns nil: undefined method ‘to_html’ for nil:NilClass
  7. Fix in view
      @blog.body.try(:to_html)
  8. Fix in view
      @blog.body.to_s.to_html
  9. Fix in view
      unless @blog.body.nil?
        @blog.body.to_html
      end
  10. Learnings: you are likely to encounter the same problem in other places, the code gets littered with checks and guards, and these are all band-aid fixes.
  11. Fix using Rails Validation
      validates :body, presence: true, if: ->(record) { record.nil? }
      WTH?!?
  12. Validations can still be bypassed
      blog = Blog.new
      blog.body = nil
      blog.save(validate: false)
  13. Learnings: the code is unnecessarily hard to read, and validations can be bypassed, resulting in incoherent data.
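  14. Fix in Database
      change_column :blogs, :body, :text, null: false, default: ''
      Existing NULL bodies must be backfilled first or the migration can fail; change_column_null :blogs, :body, false, '' replaces existing NULLs with '' and adds the NOT NULL constraint.
  15. Learnings: no code modification is needed, and there is less complexity because you never have to deal with both nils and blank strings. You can work on the assumption that body is never nil.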
  16. The Missing Parent
  17. The “NoMethodError”
      @post.author.name
      @post.author returns nil: undefined method ‘name’ for nil:NilClass
      Deleting a parent but not its children results in this error.
  18. Fix in view
      @post.author.name if @post.author
  19. Fix in view
      @post.author.try(:name)
  20. Learnings: you are likely to encounter the same problem in other places, the code gets littered with checks and guards, and these are all band-aid fixes.
  21. Fix using ActiveRecord
      has_one :author, dependent: :destroy
  22. Fix using ActiveRecord
      has_one :author, dependent: :destroy
      Inefficient with lots of records: each child is loaded and destroyed one by one so its callbacks run.
  23. Fix using ActiveRecord
      has_one :author, dependent: :delete
      Deletes the child with a single query, but doesn't run its callbacks. (For has_one the option is :delete; :delete_all is the has_many equivalent.)
  24. Fix using ActiveRecord
      has_one :author, dependent: :restrict_with_exception
      Raises ActiveRecord::DeleteRestrictionError if you try to destroy a parent that still has children in the DB.
  25. Fix using ActiveRecord
      has_one :author, dependent: :restrict_with_error
      Adds an error to the parent, and destroy returns false, if you try to destroy a parent that still has children in the DB.
  26. These strategies can still be bypassed
      Post.find(1).delete
      delete skips callbacks, so the dependent: option never runs.
  27. Learnings: this is better than fixing locally in views, but it can still introduce bad data.
  28. Fix in Database
      add_foreign_key :authors, :posts
  29. Fix in Database
      add_foreign_key :authors, :posts
      Available in migrations since Rails 4.2.
  30. Fix in Database (equivalent SQL, MySQL syntax)
      ALTER TABLE `authors`
        ADD CONSTRAINT `authors_post_id_fk`
        FOREIGN KEY (`post_id`) REFERENCES `posts` (`id`);
  31. Fix in Database
      add_foreign_key :authors, :posts, on_delete: :cascade
      Removes all of a post's authors when the post is deleted.
  32. Fix in Database
      add_foreign_key :authors, :posts, on_delete: :restrict
      Refuses to delete a post that still has authors. Leaving out on_delete gives the database default (NO ACTION), which behaves almost the same.
  33. Ideal fix
      has_one :author, dependent: :delete
      add_foreign_key :authors, :posts, on_delete: :restrict
  34. Learnings: the ideal fix never lets anyone directly introduce orphan data, but still uses the optimized deletion when records are destroyed through ActiveRecord.
  35. Duplicate Data
  36. Uniqueness Validation
      validates :name, uniqueness: true

      Author.where(name: "Mr. Duplicate").count
      # => 2
  37. Uniqueness Validation
      author = Author.new
      author.name = "Mr. Duplicate"
      author.save(validate: false)
  38. Unique Index
      add_index :authors, :name, unique: true
  39. Unique Index (fails while duplicates exist)
      PG::Error: ERROR: could not create unique index "index_authors_on_name"
      DETAIL: Key (name)=(Mr. Duplicate) is duplicated.
  40. Ways of removing duplicate data: use SQL to arbitrarily remove duplicates, use scripts to automatically merge the content of rows, or manually merge content and remove duplicate rows.
  41. Unique Index Protects Data from Duplicates
      PG::Error: ERROR: duplicate key value violates unique constraint "index_authors_on_name"
      DETAIL: Key (name)=(Mr. Duplicate) already exists.
      This error is raised every time the ActiveRecord validation is bypassed.
  42. Unique Index Protects Data from Duplicates
      def save_with_retry_on_unique(*args)
        retry_on_exception(ActiveRecord::RecordNotUnique) do
          save(*args)
        end
      end
      Retries saving when the error is raised, so the validation can take over. retry_on_exception is an application-defined helper, not part of Rails.
  43. Unique Index Protects Data from Duplicates
      def save_with_retry_on_unique(*args)
        retry_on_exception(ActiveRecord::RecordNotUnique) do
          save(*args)
        end
      end
      Retries only once: the block is re-run a single time.
  44. One-to-One Relationships
      add_index :authors, :post_id, unique: true
      Protects from associating multiple records with the same parent.
  45. Learnings: ActiveRecord validations are not meant to guarantee data integrity; incoherent data can still be introduced. A database-level unique index makes sure data is never duplicated. The uniqueness validation is only a SELECT before the INSERT, so concurrent requests can both pass it; always handle the resulting ActiveRecord::RecordNotUnique error. Don't forget to add a unique index on one-to-one relationships.
  46. Polymorphic Associations
  47. Polymorphic Association
      class Post < ActiveRecord::Base
        has_many :comments, as: :commentable
      end

      class Comment < ActiveRecord::Base
        belongs_to :commentable, polymorphic: true
      end
      Both commentable_type and commentable_id are stored in the database.
  48. Polymorphic Association
      class Post < ActiveRecord::Base
        has_many :comments, as: :commentable
      end

      class Comment < ActiveRecord::Base
        belongs_to :commentable, polymorphic: true
      end
      There is no way to add a foreign key to a polymorphic association, because commentable_id can point at rows in different tables.
  49. Learnings: there is no standard SQL way of expressing polymorphic associations, so referential integrity is compromised when we use this ActiveRecord pattern. They are expensive to index, the data distribution usually isn't uniform, and they are harder to JOIN in SQL.
  50. Database-friendly Polymorphic Associations
      class Post < ActiveRecord::Base
        has_many :comments, class_name: 'PostComment'
      end

      class PostComment < ActiveRecord::Base
        include Commentable

        belongs_to :post
      end
  51. Learnings: adding one table for each child type maintains data integrity, because foreign keys can be added. Extract similar behaviors into Ruby modules in the application. Create a non-table-backed Ruby class for creating comments. Use the class_name option to designate which class to use when retrieving records.
  52. Learnings: this is easier to grok and operate, but harder to aggregate over all comments regardless of type, and more expensive when adding another parent type. Use specific tables if you care about data integrity. If data integrity is a non-issue, use polymorphic associations; event logging and activity feeds are good examples.
  53. Reactive Techniques
  54. Data Integrity Test Suite
      MAX_ERRORS = 50

      def test_posts_are_valid
        errors = []
        Post.find_each do |post|
          next if post.valid?

          errors << [post.id, post.errors.full_messages]

          # Stop once MAX_ERRORS failures have been collected
          break if errors.size >= MAX_ERRORS
        end
        assert_equal [], errors
      end
  55. Data Integrity Test Suite
      def test_post_bodies_are_not_nil
        assert_equal 0, Post.where(body: nil).count
      end
  56. Learnings: proactive techniques work best, but they're not always feasible if you already have bad data. Reactive integrity checks are a good alternative; run them regularly against production data to surface errors. Avoid using complex constraints.
  57. Recap: NOT NULL constraints, unique indexes, foreign keys, refactoring polymorphic associations into separate tables, reactive integrity checks.
  58. Thanks! @rizwanreza
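
The reactive checks on slides 54 and 55 can also look for the orphaned rows described on slides 16 to 34. The sketch below is plain Ruby so it runs without Rails: rows are hashes, and orphaned_ids is a hypothetical helper name, not a Rails API. In an application the same check would normally be one query, such as a LEFT OUTER JOIN from authors to posts that keeps the rows where posts.id IS NULL.

```ruby
require "set"

# Returns the ids of child rows whose foreign key points at a parent id that
# does not exist. Rows with a NULL foreign key are skipped here; a NOT NULL
# constraint is the tool for those.
def orphaned_ids(children, parents, foreign_key:)
  parent_ids = parents.map { |parent| parent[:id] }.to_set
  children
    .reject { |child| child[foreign_key].nil? || parent_ids.include?(child[foreign_key]) }
    .map { |child| child[:id] }
end

posts   = [{ id: 1 }, { id: 2 }]
authors = [
  { id: 10, post_id: 1 },
  { id: 11, post_id: 3 },  # post 3 was removed with Post#delete, skipping dependent:
  { id: 12, post_id: nil }
]

p orphaned_ids(authors, posts, foreign_key: :post_id) # prints [11]
```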
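
Slide 45's point about concurrency comes from the uniqueness validation being a read that happens before the write. A minimal plain-Ruby simulation, assuming a toy in-memory table (AuthorsTable, two_requests_race and the local RecordNotUnique class are hypothetical stand-ins, not Rails code), replays the unlucky interleaving where two requests both pass the check before either one inserts:

```ruby
# Stand-in for ActiveRecord::RecordNotUnique so the sketch runs without Rails.
class RecordNotUnique < StandardError; end

# Toy in-memory "authors" table. With unique_index: true the insert itself
# rejects duplicate names, the way a database unique index would (in a real
# database that check and the write happen atomically).
class AuthorsTable
  attr_reader :rows

  def initialize(unique_index:)
    @unique_index = unique_index
    @rows = []
  end

  # Roughly what `validates :name, uniqueness: true` does: read before writing.
  def name_taken?(name)
    @rows.include?(name)
  end

  # The INSERT: the only step that can enforce uniqueness reliably.
  def insert(name)
    if @unique_index && @rows.include?(name)
      raise RecordNotUnique, "duplicate key value violates unique constraint (name)=(#{name})"
    end
    @rows << name
  end
end

# Replays one interleaving of two concurrent requests deterministically:
# both run the uniqueness check before either one inserts.
def two_requests_race(table, name)
  checks = [table.name_taken?(name), table.name_taken?(name)]
  errors = []
  checks.each do |taken|
    next if taken # the validation would have rejected this save
    begin
      table.insert(name)
    rescue RecordNotUnique => e
      errors << e
    end
  end
  errors
end

without_index = AuthorsTable.new(unique_index: false)
two_requests_race(without_index, "Mr. Duplicate")
puts without_index.rows.count("Mr. Duplicate") # prints 2

with_index = AuthorsTable.new(unique_index: true)
errors = two_requests_race(with_index, "Mr. Duplicate")
puts with_index.rows.count("Mr. Duplicate") # prints 1
puts errors.size                            # prints 1
```

Without the index both requests insert, mirroring the count of 2 on slide 36; with it, the second insert fails the way the database error on slide 41 does.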
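
Slides 42 and 43 call retry_on_exception without showing it, and Rails does not ship a helper by that name. One possible version is sketched below, under the assumption that it should retry a single time and then re-raise. FakeAuthor and the local RecordNotUnique class are hypothetical stand-ins so the example runs without a database.

```ruby
# Stand-in for ActiveRecord::RecordNotUnique so the sketch runs without Rails.
class RecordNotUnique < StandardError; end

# Hypothetical helper (not part of Rails): runs the block and, if it raises
# exception_class, runs it again up to `retries` more times before re-raising.
def retry_on_exception(exception_class, retries: 1)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue exception_class
    retry if attempts <= retries
    raise
  end
end

# Fake model with scripted save outcomes, e.g. [:raise, false]: the first save
# hits the unique index, the second is rejected by the validation.
class FakeAuthor
  attr_reader :save_calls

  def initialize(outcomes)
    @outcomes = outcomes
    @save_calls = 0
  end

  def save(*)
    outcome = @outcomes.fetch(@save_calls)
    @save_calls += 1
    raise RecordNotUnique, "duplicate key value" if outcome == :raise
    outcome
  end

  # Same shape as the method on slides 42 and 43.
  def save_with_retry_on_unique(*args)
    retry_on_exception(RecordNotUnique) do
      save(*args)
    end
  end
end

author = FakeAuthor.new([:raise, false])
p author.save_with_retry_on_unique # prints false
p author.save_calls                # prints 2
```

In a real model, the retried save runs the uniqueness validation again, which now sees the row committed by the other request and returns false with an error on name instead of raising. One caveat worth checking in your own setup: on PostgreSQL a failed statement aborts the surrounding transaction, so this kind of retry only helps when save is not already nested inside an outer transaction.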
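
Slides 50 and 51 include a Commentable module and mention a non-table-backed class for creating comments, but neither is shown. A plain-Ruby sketch of the idea follows, where Struct classes stand in for ActiveRecord models, and Commentable#excerpt, Photo, PhotoComment and CommentBuilder are all hypothetical names chosen for illustration:

```ruby
# Shared comment behaviour, mixed into one comment class per parent type.
module Commentable
  # Hypothetical example of shared behaviour: a shortened body for listings.
  def excerpt(length = 20)
    body.length > length ? "#{body[0, length]}..." : body
  end
end

# Stand-ins for the parent models.
Post  = Struct.new(:id)
Photo = Struct.new(:id)

# One comment "table" per parent type, each with its own foreign key column,
# so a real database could put a foreign key constraint on post_id / photo_id.
PostComment  = Struct.new(:post_id, :body)  { include Commentable }
PhotoComment = Struct.new(:photo_id, :body) { include Commentable }

# Non-table-backed class that picks the right comment class for a parent.
class CommentBuilder
  COMMENT_CLASSES = { Post => PostComment, Photo => PhotoComment }.freeze

  def self.build(parent, body)
    klass = COMMENT_CLASSES.fetch(parent.class) do
      raise ArgumentError, "no comment table for #{parent.class}"
    end
    klass.new(parent.id, body)
  end
end

comment = CommentBuilder.build(Post.new(1), "Great write-up on foreign keys")
p comment.post_id  # prints 1
p comment.excerpt  # prints "Great write-up on fo..."
```

In a Rails app, Commentable would more likely be an ActiveSupport::Concern and each comment class an ActiveRecord::Base subclass with its own table and foreign key; the builder is the one place that knows which table a parent's comments live in.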
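
The loop from slide 54 can be pulled out into a small reusable helper so each integrity check only supplies the rule. This sketch assumes records are plain hashes and that each check is a block returning an array of messages; in an app the records would come from Post.find_each and the messages from post.errors.full_messages. collect_integrity_errors is a hypothetical name.

```ruby
MAX_ERRORS = 50

# Runs a check over each record and collects [id, messages] for the failures,
# stopping once max_errors have been found so a badly broken table still
# produces a readable report. The block returns an array of error messages
# (empty when the record is fine).
def collect_integrity_errors(records, max_errors: MAX_ERRORS)
  errors = []
  records.each do |record|
    messages = yield(record)
    next if messages.empty?

    errors << [record[:id], messages]
    break if errors.size >= max_errors
  end
  errors
end

posts = [
  { id: 1, body: "Hello" },
  { id: 2, body: nil },
  { id: 3, body: nil }
]
body_present = ->(post) { post[:body].nil? ? ["Body can't be nil"] : [] }

p collect_integrity_errors(posts, &body_present)
# prints [[2, ["Body can't be nil"]], [3, ["Body can't be nil"]]]
p collect_integrity_errors(posts, max_errors: 1, &body_present).size # prints 1
```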
