Step-by-step Legacy Database Migration with Rails

Your engineering team has decided it is time to migrate over to a new PostgreSQL database. This process will require lots of careful planning and timing. Data migration sounds like a daunting task, but it may be easier than you think. In this blog post we will walk through the step-by-step process to copy data to a new database structure using Rails.

Importance of Upserting data during development
The migration process could take several months to complete. When a rake task is run, legacy production data becomes development data. Running the rake tasks locally allows the development team to address issues before running the task in production. In the days it takes to fine tune the rake tasks, the production and development data get farther and farther out of sync. Upserting data in the rake task solves this issue. Upsert means to insert rows into a database table if they do not already exist, or update them if they do (Wiktionary definition). The migration design explained in this blog post allows legacy and newly created data to be upserted and kept in sync.

Project Overview
Let's take a look at the diagram below and outline what we need to accomplish.

(1) Gain access to the legacy_ice_creams table via a vpn.

(2) Create two migration tables legacy_keys_ice_creams and legacy_ice_creams. The legacy_keys_ice_creams table is the join table between the legacy_ice_creams and ice_creams tables. We will discuss why this join table is needed later in this post.

(3) Write rake task to iterate through the legacy_ice_creams table and populate the legacy_keys_ice_creams and ice_creams tables.

(4) Gain access to the legacy_users table via vpn.

(5) Create migration for LegacyKeysUser and User models. The legacy_keys_users table is the join table between the legacy_users and users tables.

(6) Write rake task to iterate through the legacy_users table and populate the legacy_keys_users and users tables.

Vocabulary:
rake task, vpn, legacy data, legacy database, legacy table, join table, map/mapping, foreign key, upsert, database management application (e.g. SequelPro)

Gems to add to existing Rails project:
rake-progressbar (it’s an oldie but goodie)

Part I

1. Set up a VPN in order to have access to the legacy database.
(If you have access to the legacy database, skip to step 2.) You’ve got two choices here. You can either ask your DevOps teammate to set up this connection (direct database access) for you or you’ll need to put on your DevOps hat. This blog post assumes you already have the proper user permissions for the legacy database.

Having access to the legacy database in the Rails console is a requirement. Having a database management application like SequelPro, Navicat or pgAdmin will be extremely helpful. These provide a graphical user interface that allows you to see the legacy database, tables and record count. If you are accessing production data be careful! This interface is like a china shop -- look, but do not touch. You don't want to break anything.

2. Make an entry point to access the legacy data.
Create a new model named MysqlLegacy. (I find it helpful to choose a name that refers to the kind of legacy database. In this post we are using MySQL, but these steps apply to any Rails-supported database.) If you have a Rails 4.x app, the class will inherit from ActiveRecord::Base. If you have a newer Rails 5.x app, inherit from ApplicationRecord.

Rails 4.x: app/models/mysql_legacy.rb

class MysqlLegacy < ActiveRecord::Base  
  establish_connection configurations['legacy'][Rails.env]
  self.abstract_class = true
end  

Rails 5.x: app/models/mysql_legacy.rb

class MysqlLegacy < ApplicationRecord  
  establish_connection configurations['legacy'][Rails.env]
end  

3. Establish connection to INDIVIDUAL legacy database tables.
Create a model called LegacyIceCream. It will not require a migration. It is merely a bridge to connect the production data to the new app. This model will inherit from the MysqlLegacy class which you created in step 2.

class LegacyIceCream < MysqlLegacy  
  self.table_name = "IceCream"
end  
  • self - the MysqlLegacy database
  • IceCream - name of the legacy table to access, written exactly as it is in the legacy database

Once this model is created, you should be able to access the legacy data in the Rails console. (Remember: This could be production data so you want to be careful.)

4. Count of the LegacyIceCream records in the console
To make sure that you have successfully established a connection in the Rails console:
> LegacyIceCream.count

The number of records should match the number in the legacy database table which you can find by using a database management application. Inspect the table and see what attributes it has. In our example, the legacy_ice_creams has a primary key of IceCreamID and a FlavorName.

5. Create the LegacyKeysIceCream and IceCream models

rails g model legacy_keys_ice_cream legacy_id:integer legacy_id new_id:integer  
rails g model ice_cream flavor:string  

6. Run the migrations.
Check the migrations in the db directory. If they look ok, run:

rake db:migrate  

7. Write the IceCream and LegacyKeysIceCream data migration rake task.

# app/lib/tasks/upsert_ice_cream.rb

desc "Upsert ice cream records from legacy db"  
  task upsert_ice_cream: do
    bar = RakeProgressbar.new(LegacyIceCream.count)

    LegacyIceCream.all.each do |legacy_ice_cream|
      bar.inc

      lkic = LegacyKeysIceCream.find_or_initialize_by(legacy_id: legacy_ice_cream.IceCreamID)

      ice_cream = IceCream.find_or_initialize_by(id: lkic.new_id)
      ice_cream.flavor = legacy_ice_cream.FlavorName
      ice_cream.save!

      lkic.new_id = ice_cream.id
      lkic.save!
    end
    bar.finished
  end
end  

Purpose of the Join Table
The simplest explanation is, the legacy_keys_ice_creams join table serves as a "family tree" that logs which new object was created from which legacy object.

a) This join table maps ice_cream.id to its corresponding legacy_ice_cream.IceCreamID. If you have an ice_cream record and you want to know which LegacyIceCream it was created from, you need a join table to tell you. The join table's job is to keep track of the legacy and new object's "birth records". The relationship works in the opposite direction too. If you have a LegacyIceCream object and you want to know which new IceCream object was created, the legacy_keys_ice_creams table holds that information.

b) During the development process the legacy data will continue to be updated. Because of this, there needs to be a way to upsert IceCream records that are already sitting in the new database. The rake task that we create uses the legacy_keys_ice_creams join table to find them, then update any IceCream record whose LegacyIceCream data has changed. This is a big deal! It means your legacy and new data will stay in sync and data will not be out of date.

Let’s dissect the rake task code
(a) Iterate through the legacy_ice_creams

(b) Next, .find or .initialize a LegacyKeysIceCream object. If the LegacyKeysIceCream record already exists, .find it by searching for the legacy_ice_cream.IceCreamID. If the legacy_key_ice_cream DOES NOT already exist, .initialize a new record.

(c) Then, .find or .initialize an IceCream object. If the IceCream record already exists, .find it by searching for the legacy_keys_ice_cream.new_id. If the IceCream object DOES NOT already exist, .initialize a new record.

8. Run the rake task
In the console, run: rake upsert_ice_cream
Once the rake task is complete, check the IceCream.count in the console to ensure the ice_creams table is now populated.

Part II

9. Establish connection to INDIVIDUAL legacy database tables.
In Part I we set up the IceCream related dependencies. Now we will repeat those steps with the LegacyUser. Again, it will not require a migration. Remember, this is merely a bridge to connect the production data to the new app.

class LegacyUser < MysqlLegacy  
  self.table_name = "User"
end  
  • self - the MysqlLegacy database
  • users - name of the legacy table to access, written exactly as it is in the legacy database

10. Count of the LegacyUser records in the console
To make sure that you have successfully established a connection in the Rails console:
> LegacyUser.count

The number of records should match the number in the legacy database table which you can find by using a database management application. Inspect the table and see what attributes it has. In our example, the legacy_user attributes are a primary key of UserID, FirstName, LastName and IceCreamID.

11. Create LegacyKeysUser and User models

rails g model legacy_keys_user legacy_id:integer new_id:integer  
rails g model user first_name:string last_name:string ice_cream_id:integer  

12. Run the migrations.
Check the migrations in the db directory. If they look ok, run:

rake db:migrate  

13. Write the LegacyKeysUser and User data migration rake task.

# app/lib/tasks/upsert_users.rb

desc "Upsert user records from legacy db"  
  task upsert_users: [:upsert_ice_cream] do
    bar = RakeProgressbar.new(LegacyUser.count)

    LegacyUser.all.each do |legacy_user|
      bar.inc

      lku = LegacyKeysUser.find_or_initialize_by(legacy_id: legacy_user.UserID)

      legacy_ice_cream = LegacyIceCream.find_by(IceCreamID: legacy_user.IceCreamID)

      legacy_keys_ice_cream = LegacyKeysIceCream.find_by(legacy_id: legacy_ice_cream.IceCreamID)

      ice_cream = IceCream.find_by(id: legacy_keys_ice_cream.new_id)

      user = User.find_or_initialize_by(id: lku.new_id)
      user.first_name = legacy_user.FirstName
      user.last_name = legacy_user.LastName
      user.ice_cream_id = ice_cream.id
      user.save!

      lku.new_id = user.id
      lku.save
    end
    bar.finished
  end
end  

Do you remember why we created the join table? Take a minute to explain to yourself or nearest duck the importance of the legacy_keys_ice_cream table. For a hint, review the above explanation.

The Ice Cream Dependency
In our example each User record stores a favorite ice cream flavor id. The users table needs ice_creams to exist before creating any users. In the beginning of the task [:upsert_ice_cream] is added so that it will execute before running the upsert_users task.

Let’s dissect the rake task code
(a) Iterate through each legacy_user

(b) Next, .find or .initialize a LegacyKeysUser object. If the LegacyKeysUser record already exists, .find it by searching for the legacy_user.UserID. If the legacy_key_user DOES NOT already exist, .initialize a new record.

(c) Next, .find or .initialize a user object. If the User record already exists, .find it by searching for the legacy_keys_user.new_id. If the user DOES NOT already exist, .initialize a new record.

14. Run the rake task
In the console, run: rake upsert_users
Once the rake task is complete, check the User.count in the console to ensure the users table is now populated.

Conclusion
Migrating a database is a tough and often messy process. Hopefully this blog post is easy to follow and allows you to accomplish a pain-free migration. If you or your team would rather hire Vulk to upgrade your databases, feel free to contact us.

References:
ApplicationRecord in Rails 5, https://blog.bigbinary.com/2015/12/28/application-record-in-rails-5.html
Upsert, https://en.wiktionary.org/wiki/upsert


Bonus
Here is sample code to traverse the join table and find data you're looking for:

To find a LegacyIceCream record when you have ice_cream data:

#find the ice_cream object
ice_cream = IceCream.find(3)    

#find the join table record based on the ice_cream object
legacy_keys_ice_cream = LegacyKeysIceCream.find_by(new_id: ice_cream.id)    

#find the legacy_ice_cream object through the keys table
LegacyIceCream.find_by(IceCreamID: legacy_keys_ice_cream.legacy_id)  

=> #<LegacyIceCream id: 31, FlavorName: "Rocky Road">

To find an IceCream record when you have legacy_ice_cream data:

#find the legacy_ice_cream object
legacy_ice_cream = LegacyIceCream.find_by(IceCreamID: 20)

#find the join table record based on the legacy_ice_cream object
legacy_keys_ice_cream = LegacyKeysIceCream.find_by(legacy_id: legacy_ice_cream.IceCreamID)

#find the ice_cream object through the keys table
ice_cream = IceCream.find_by(id: legacy_keys_ice_cream.new_id)

=> #<IceCream id: 1, flavor: "Chocolate">