Database migration scripts are a notorious source of difficulty in the software delivery process. This session will discuss how we neutralized this all-too-common headache.
Our deployment framework now executes database migrations automatically with every application deploy, and the QA team performs self-service full-stack deployments in test environments. The resulting additional bandwidth has been reinvested in more frequent software releases and in higher-value tasks.
1. Database Migrations
with Gradle and Liquibase
Dan Stine
Copyright Clearance Center
www.copyright.com
Gradle Summit
June 12, 2015
2. About Me
• Software Architect
• Library & Framework Developer
• Platform Engineering Lead & Product Owner
• Gradle User Since 2011
• Enemy of Inefficiency & Needless Inconsistency
dstine at copyright.com
sw at stinemail.com
github.com/dstine
6. Database Migrations
• Database structure changes
– Tables, constraints, indexes, etc.
– Schema changes (DDL, not DML)
• Reference data
– List of countries, user types, order status, etc.
– Set of allowed values
• Database logic
– Functions, procedures, triggers
– (Very little of this)
7. Our Historical Approach
• DB migrations handled in relatively ad-hoc fashion
• Various flavors of “standard” practice
– Framework copied and modified from project to project
– Framework not always used (“small” projects)
• Development teams shared a DEV database
– Conflicts between code and database
8. Development Pain Points
• Intra-team collaboration was difficult
• Forced synchronous updates within development team
• Learn variations when switching between projects
• Project startup was costly
9. Deployment Pain Points
• Manual process
– Where are the scripts for this app?
– Which scripts should be run and how?
• Recurring difficulties
– Hours spent resolving mismatches between app and database
– Testing activities frequently delayed or even restarted
• Impossible to automate
– Too many variations
• Self-service deployment was a pipe dream
10. Standard Software Platform
• Started platform definition in 2011
– Homogeneous by default
• Tools
– Java, Spring, Tomcat, Postgres
– Git / GitHub, Gradle, Jenkins, Artifactory, Liquibase, Chef
• Process
– Standard development workflow
– Standard application shape & operational profile
11. Vision for Database Script Management
• Integrated into developer workflow
• Feeds cleanly into deployment workflow
• Developer commits scripts and the process takes over
– Just like with application code
12. A Plan For Pain Relief
• Manage scripts as first-class citizens
– Same repo as application code
– Standard location in source tree
• Standard execution engine
– No more variations
– Automatic tracking of applied migrations
• Prevent conflicts and mismatches
– Introduce developer workstation databases (LOCAL)
– Dedicated sandbox
– Commit database and associated application change together
13. A Plan For Pain Relief
• Liquibase
– Database described as code
– Execution engine & migration tracking
• Gradle
– Provide conventions
– Tasks for invoking Liquibase
– Already familiar to developers from existing build process
– Flexibility to integrate into deployment process
– Flexibility to handle emergent requirements
15. Liquibase Basics
• Provides vocabulary of database changes
– Create Table, Add PK, Add FK, Add Column, Add Index, …
– Drop Table, Drop PK, Drop FK, Drop Column, Drop Index, …
– Insert, Update, Delete, …
• Changes are grouped into changesets
– Change(s) that should be applied atomically
• Changesets are grouped into changelogs
– Files managed in version control
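As a rough illustration of how changes, changesets, and changelogs nest, a minimal changelog in the Groovy DSL might look like the sketch below. The table, column, id, and author values are hypothetical and not taken from any CCC project; schema and tablespace attributes are omitted for brevity.

```groovy
// Hypothetical changelog file; all names are illustrative only.
databaseChangeLog {
    // Each changeset is identified by author + id (+ the file it lives in).
    changeSet(author: 'jdoe', id: 'create-order-status') {
        // Structure change (DDL)
        createTable(tableName: 'order_status') {
            column(name: 'id', type: 'int') {
                constraints(primaryKey: true, nullable: false)
            }
            column(name: 'name', type: 'varchar(50)') {
                constraints(nullable: false)
            }
        }
    }
    changeSet(author: 'jdoe', id: 'seed-order-status') {
        // Reference data for the table created above
        insert(tableName: 'order_status') {
            column(name: 'id', valueNumeric: 1)
            column(name: 'name', value: 'OPEN')
        }
    }
}
```

Keeping the DDL and the reference data in separate changesets means each one is tracked and applied on its own.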
16. Liquibase Basics
• Changesets uniquely identified by [Author, ID, File]
– Liquibase tracks changeset execution in a special table
– Lock table to prevent concurrent Liquibase invocations
– Modified changesets are detected via checksums
• Supported databases
– MySQL, PostgreSQL, Oracle, SQL Server, …
• Groovy DSL
– Liquibase v2 supported only XML
– https://github.com/tlberglund/groovy-liquibase
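To make the checksum idea on this slide concrete, here is a small sketch of how a modified changeset could be detected. It is an illustration of the concept only and is not Liquibase's actual checksum algorithm; the function names and the `file::id::author` key format are assumptions made for the example.

```groovy
import java.security.MessageDigest

// Illustrative only: NOT how Liquibase itself computes checksums.
String checksum(String changesetText) {
    MessageDigest.getInstance('MD5')
        .digest(changesetText.getBytes('UTF-8'))
        .encodeHex()
        .toString()
}

// 'applied' stands in for the tracking table:
// changeset key -> checksum recorded when the changeset was first run.
boolean wasModified(Map<String, String> applied, String key, String currentText) {
    applied.containsKey(key) && applied[key] != checksum(currentText)
}

// Usage with a hypothetical changeset key
def key = 'main.groovy::create-order-status::jdoe'
def applied = [(key): checksum("createTable(tableName: 'order_status')")]
println wasModified(applied, key, "createTable(tableName: 'order_statuses')")
```

A changeset that has never been applied is not reported as modified; it is simply new and eligible to run.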
18. Liquibase @ CCC
• Learning curve
– Team needs to understand the underlying model
– Don’t edit changesets once they’ve been applied
• Our standards
– Schema name and tablespace are required
– Parameterize schema name and tablespace
createTable(
    schemaName: dbAppsSchema,
    tableName: 'myapp_version',
    tablespace: dbDataTablespace)
20. Development Workflow
• Gradle is our SCM hub
– Workstation builds
– LOCAL app servers via command line
– IDE integration
– CI and release builds on Jenkins
• Maintain Gradle-centric workflow
– Integrated database development
21. Standard Project Structure
• Single Git repo with multi-project Gradle build
myapp
    myapp-db
    myapp-rest
    myapp-service
    myapp-ui
group = com.copyright.myapp
• UI and REST service published as WARs
• DB published as JAR
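A multi-project layout like the one above could be declared roughly as follows. This is a sketch that assumes the subproject directories are named exactly as shown; the actual CCC build files are not part of the slides.

```groovy
// settings.gradle (sketch)
rootProject.name = 'myapp'
include 'myapp-db', 'myapp-rest', 'myapp-service', 'myapp-ui'

// build.gradle in the root project (sketch):
// allprojects {
//     group = 'com.copyright.myapp'
// }
```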
22. Custom Gradle Plugin
• Created custom plugin: ccc-postgres
• Standard script location
– Main source set: src/main/liquibase
– Package: com.copyright.myapp.db
• Standard versions
– Liquibase itself
– Postgres JDBC driver
23. Plugin Extension
• Custom DSL via Gradle extension
cccPostgres {
    mainChangelog = 'com/copyright/myapp/db/main.groovy'
}
• Main changelog includes other changelogs
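A main changelog that only pulls in other changelogs might look like this sketch in the Groovy DSL. The included file names are hypothetical; only the main changelog path comes from the slide.

```groovy
// com/copyright/myapp/db/main.groovy (sketch; included file names are made up)
databaseChangeLog {
    // Changelogs are processed in the order they are included
    include(file: 'com/copyright/myapp/db/tables.groovy')
    include(file: 'com/copyright/myapp/db/reference-data.groovy')
}
```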
24. Development Lifecycle Tasks
• Provided by ccc-postgres
• Easy to manage LOCAL development database
– Isolated from other developers and deployments
– To pull in new schema changes, run a task
• Built on Gradle Liquibase plugin
https://github.com/tlberglund/gradle-liquibase-plugin
26. Development Lifecycle Tasks
• Typical developer loop
– gradlew update
– gradlew tomcatRun and/or IDE
• Not just for product development teams
– Simple to run any app
– Architects, QA, Platform Engineering
27. Development Lifecycle Tasks
Task                Runs As   Description
createDatabase      postgres  Creates ccc user and database; creates data and index tablespaces
createSchema        ccc       Creates apps schema
update              ccc       Runs main changelog
dropDatabase        postgres  Drops ccc user and database
resetBaseChangelog  postgres  Truncates postgres.public.databasechangelog
• resetBaseChangelog
– Must clear all traces of Liquibase to start over
28. Plugin Configuration
• Override default library versions
cccPostgres.standardDependencies.postgresDriver
• Defaults point to LOCAL development database
– Can override property values
dbHost, dbPort, dbName
dbUsername, dbPassword
dbDataTablespace, dbIndexTablespace
dbBaseUsername, dbBasePassword
29. Standardization and Compliance
• So all our teams are authoring DB code
• But Liquibase is new to many
• And we have company standards
• Let’s automate!
30. Static Analysis
• CodeNarc
– Static analysis of Groovy code
– Allows custom rule sets
• Created a set of custom CodeNarc rules
– Analyze our Liquibase Groovy DSL changelogs
• Apply to our db projects via the Gradle codenarc plugin
– Fail build if violations are found
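One way the static analysis described above could be wired into a db project is sketched below using Gradle's standard codenarc plugin. The rule set path and the task name are hypothetical, and how the ccc-postgres plugin actually configures this is not shown in the slides.

```groovy
// build.gradle of a db project (sketch; path and task name are hypothetical)
apply plugin: 'codenarc'

codenarc {
    // Rule set file that would reference the custom Liquibase rules
    configFile = file('config/codenarc/liquibase-rules.groovy')
    // Violations fail the build rather than only being reported
    ignoreFailures = false
}

// The changelogs live outside the standard Groovy source sets,
// so an explicit CodeNarc task is pointed at them.
task codenarcLiquibase(type: CodeNarc) {
    source = fileTree('src/main/liquibase')
}
```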
31. Static Analysis – Required Attributes
• Our rule categorizes all change attributes
– Required by Liquibase
• createTable requires tableName
– Required by CCC
• createTable requires schemaName and tablespace
– Optional
• Unintended positive consequence!
– Catches typos that would otherwise go undetected until farther downstream
– E.g. constrainttName or tablspace
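The attribute categorization above can be sketched in plain Groovy, independent of CodeNarc. The class name, the catalog contents beyond the attributes named on the slide, and the message wording are assumptions for illustration; the real rule works on the parsed changelog AST.

```groovy
// Hypothetical attribute catalog; a real rule set would cover many more change types.
class AttributeCatalog {
    static final Map CHANGES = [
        createTable: [
            requiredByLiquibase: ['tableName'],
            requiredByCcc      : ['schemaName', 'tablespace'],
            optional           : ['catalogName', 'remarks']
        ]
    ]

    // Returns one message per missing required attribute and per unknown attribute.
    static List<String> violations(String change, Collection<String> attrs) {
        def spec = CHANGES[change]
        if (spec == null) {
            return ["unknown change: ${change}".toString()]
        }
        def required = spec.requiredByLiquibase + spec.requiredByCcc
        def known = required + spec.optional
        def missing = (required - attrs).collect { "missing required attribute: ${it}".toString() }
        def unknown = (attrs - known).collect { "unknown attribute: ${it}".toString() }
        missing + unknown
    }
}

// A misspelled attribute shows up as both a missing and an unknown attribute
println AttributeCatalog.violations('createTable', ['tableName', 'schemaName', 'tablspace'])
```

Because every attribute must fall into one of the three categories, a misspelling cannot slip through as a silently ignored key.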
32. Static Analysis – Required Parameterization
• Ensure that schemaName & tablespace are parameterized for future flexibility
@Override
void visitMapExpression(MapExpression mapExpression) {
    mapExpression.mapEntryExpressions
        // only entries whose key is a literal, e.g. schemaName: ...
        .findAll { it.keyExpression instanceof ConstantExpression }
        .findAll { ['schemaName', 'tablespace'].contains(it.keyExpression.value) }
        // a literal value means the attribute was hard-coded
        .findAll { it.valueExpression instanceof ConstantExpression }
        .each { addViolation(it, "${it.keyExpression.value} should not be hard-coded") }
    super.visitMapExpression(mapExpression)
}
33. Schema Spy
• Generates visual representation of database structure
– Requires running database instance
– Requires GraphViz installation
• Custom task runSchemaSpy
– By default, points at LOCAL database
34. Continuous Integration for DB Scripts
• Compile Groovy
– Catches basic syntax errors
• CodeNarc analysis
– Catches policy and DSL violations
• Integration tests
– Apply Liquibase scripts to H2 in-memory database
– Catches additional classes of error
35. Release Build
• Publish JAR
– Liquibase Groovy scripts from src/main/liquibase
• META-INF/MANIFEST.MF contains entry point
Name: ccc-postgres
MainChangelog: com/copyright/myapp/db/main.groovy
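A manifest section like the one above could be produced with Gradle's standard jar task, as in this sketch. It assumes the value is written directly in the build script; in practice the ccc-postgres plugin presumably derives it from the cccPostgres extension.

```groovy
// build.gradle of myapp-db (sketch)
jar {
    manifest {
        // Second argument is the named section, rendered as "Name: ccc-postgres"
        attributes(['MainChangelog': 'com/copyright/myapp/db/main.groovy'], 'ccc-postgres')
    }
}
```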
37. Deployment Automation
• Early efforts focused on applications themselves
– Jenkins orchestrating Chef runs
– Initial transition from prose instructions to Infrastructure as Code
• Database deployments remained manual
– Better than ad-hoc approach
– But still error prone and time-consuming
38. Automated Application Deployments
• Chef environment file
– Cookbook versions: which instructions are used
• Chef data bags
– Configuration values for each environment
– Encrypted data bags for (e.g.) database credentials
• Jenkins deploy jobs (a.k.a. “the button”)
– Parameters = environment, application version
40. Initial Delivery Pipeline (DB Deployments)
• Clone Git repo and checkout tag
• Manually configure & run Gradle task from ccc-postgres
gradlew update -PdbHost=testdb.copyright.com
    -PdbPort=5432 -PdbDatabase=ccc
    -PdbUsername=ccc -PdbPassword=******
• Many apps x many versions x multiple environments = TIME & EFFORT & ERROR
42. Target Delivery Pipeline
• Automated process should also update database
– Single Jenkins job for both apps and database scripts
• Maintain data-driven design
– Environment file lists database artifacts
– Controlled flow down the pipeline
• Gradle database deployment task
– Retrieve scripts from Artifactory
– Harvest information already in Chef data bags (URL, password)
– Execute Liquibase
44. Jenkins Deploy Job
• One job per application group, per set of deployers
– E.g. myapp.qa allows QA to deploy to environments they own
– Typically contains multiple deployables (apps, db artifacts)
– Typical deployer sets = DEV, QA, OPS
• Executes Liquibase via Gradle for database deployments
– Invokes deployDbArtifact task for each db artifact
• (Executes Chef for application deployments)
45. Gradle deployDbArtifact Task
• Parameterized via Gradle project properties
– appGroup = myapp
– artifactName = myapp-db
– artifactVersion = 2.1.12
– environment = TEST
• Downloads JAR from Artifactory
– com.copyright.myapp:myapp-db:2.1.12
– Extract MainChangelog value from manifest
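The two steps on this slide, resolving the artifact coordinates and reading the entry point from the manifest, can be sketched with the JDK's own manifest classes. The function names are hypothetical, and the `com.copyright.<appGroup>` group pattern is inferred from the single example above rather than stated in the slides.

```groovy
import java.util.jar.Manifest

// Builds Maven-style coordinates from the task parameters.
// Assumption: the group is always 'com.copyright.' + appGroup.
String coordinates(String appGroup, String artifactName, String artifactVersion) {
    "com.copyright.${appGroup}:${artifactName}:${artifactVersion}".toString()
}

// Reads the MainChangelog attribute from the 'ccc-postgres' manifest section.
// Returns null if the section or the attribute is absent.
String mainChangelog(InputStream manifestStream) {
    def section = new Manifest(manifestStream).getAttributes('ccc-postgres')
    section?.getValue('MainChangelog')
}

println coordinates('myapp', 'myapp-db', '2.1.12')
```

For a downloaded JAR, the same lookup can start from `new java.util.jar.JarFile(file).manifest` instead of a raw stream.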
46. Gradle deployDbArtifact Task
• Retrieves DB URL from Chef data bag item for TEST
"myapp.db.url": "jdbc:postgresql://testdb:5432/ccc"
• Retrieves password from encrypted Chef data bag
– myapp.db.password
• Executes Liquibase
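The URL lookup can be sketched as follows, assuming the data bag item has already been fetched and is available as JSON text. The function name and the `<appGroup>.db.url` key convention are assumptions extrapolated from the one key shown above; fetching and decrypting data bags is not covered here.

```groovy
import groovy.json.JsonSlurper

// Looks up the JDBC URL for an application group in a data bag item.
// Assumption: keys follow the pattern '<appGroup>.db.url'.
String dbUrl(String dataBagItemJson, String appGroup) {
    def item = new JsonSlurper().parseText(dataBagItemJson)
    def url = item["${appGroup}.db.url".toString()]
    if (!url) {
        throw new IllegalStateException("No DB URL configured for ${appGroup}")
    }
    url
}

println dbUrl('{"myapp.db.url": "jdbc:postgresql://testdb:5432/ccc"}', 'myapp')
```

Failing fast on a missing key keeps a misconfigured environment from reaching the Liquibase step.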
47. Data Bag Access
• Built on top of Chef Java bindings from jclouds
• No support for encrypted data bags
• Java Cryptography Extensions and the following libs:
compile 'org.apache.jclouds.api:chef:1.7.2'
compile 'org.apache.jclouds.provider:enterprisechef:1.7.2'
compile 'commons-codec:commons-codec:1.9'
52. Additional Scenarios
• Framework originally designed to handle migrations for the schema owned by each application
• Achieved additional ROI by managing additional database deployment types with low effort
53. Roles and Permissions
• An application that manages user roles and permissions (RP) for all other applications
– Has rp-db project to manage its schema, of course
– But every consuming app (e.g. myapp) needs to manage the particular roles and permissions known to it
– Reference data that lives in tables owned by another app
• myapp now has multiple db projects
– myapp-db to manage its schema
– myapp-rp-db to manage its RP reference data
– Both are deployed with new versions of myapp
54. Roles and Permissions
• Minor addition of conditional logic
if (artifactName.endsWith('-rp-db')) {
    // e.g. myapp-rp-db
    // deploy to RP database
} else {
    // e.g. myapp-db
    // deploy to application's own database
}
• Easy to implement because … Gradle & Groovy
• Conceptual integrity of framework is maintained
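The conditional on this slide can be expressed as a small function, sketched here. The function name and the returned labels are hypothetical; the real task would go on to resolve connection details for the chosen database.

```groovy
// Returns a logical name for the database an artifact should be deployed to.
// 'rp' and the use of appGroup as a label are illustrative values only.
String targetDatabase(String appGroup, String artifactName) {
    artifactName.endsWith('-rp-db') ? 'rp' : appGroup
}

println targetDatabase('myapp', 'myapp-rp-db')
println targetDatabase('myapp', 'myapp-db')
```

Keeping the decision in one place means a new database project type only adds a branch here, and the rest of the deployment task stays unchanged.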
56. Observations
• Power of convention and consistency
– Once first schemas were automated, dominoes toppled quickly
• Power of flexible tools and building blocks
– Handle legacy complexities, special cases, acquisitions, strategy changes, evolving business conditions
– New database project types fell easily into place
57. Observations
• Know your tools
– Knowledge (how) has to propagate through the organization
– Ideally along with the underlying model (why)
• Schema changes no longer restrained by process
– “If it hurts, do it more often”
– “If it’s easy, do it more often”
– Reduced technical debt
58. Dirty Work …
• Database development and deployment processes are often considered to be unexciting
• But sometimes you need to roll up your sleeves and do the dirty work to realize a vision
• And relational databases are still the bedrock of most of today’s information systems
59. Dirty Work … Can Be Exciting!
• Efficient processes
• Reliable and extensible automation
• CONTINUOUS DELIVERY
60. Full Stack Automated Self-Service Deployments
• Reduced workload of Operations team
• Safely empowered individual product teams
• Significantly reduced the DEV-to-TEST time delay
• Reinvested the recouped bandwidth
– More reliable & frequent software releases
– Additional high-value initiatives
Execute via shell or batch script, copy/paste, etc.
Prohibitive time / effort / cost to automate
Exceptions are possible but should be rare
Just a sampling of tools, there are others
No longer need to build our own migrations framework
Liquibase, Flyway, etc.
Changes also called “refactorings” – Refactoring Databases book by Ambler and Sadalage
Example grouping – splitting a column, or a higher level refactoring
Analogy: don’t modify Git commits once they’ve been “published”
As someone who works in a central role, the ability to check out any project and run it via the Tomcat plugin is simply brilliant
BUT I need the database, too!
Simplified example – most projects have additional subprojects for layering purposes and so on
Much farther downstream
Cannot be generated at build time without spinning up Postgres, but there is no way to run embedded Postgres
Data-driven!
DBAs would copy/paste from text file
Taking advantage of ability to write arbitrary logic in Groovy
Same-day turnaround for test environments
Reap additional dividends on our carefully generalized design
Same in all environments – especially for database scripts!
Balance the internal and external pressure against design considerations – just like any software development effort
Don’t edit changesets! And not just basic usage – form a mental model
We’ve all heard “if it hurts, do it more often”. Once you’ve done that, the next step is “it’s easy, so do it more often!”
Single-page web apps, distributed systems
To an ever-growing number of people, teams, companies, and industries
QA team can control the environments they own in the first place!