
Rails Coding: How I Seed

Last Updated On: 2025-09-01 04:31:52 -0400

I honestly don’t know how to use database seeding in Rails. I’ve tried a few times over the years, but it never worked out for me and I went my own way (as a lot of old school engineers do). This blog post covers “How I Seed”: my process for building data-centric Rails apps. I’ll start by laying out my assumptions.

Disclaimer

After writing the above paragraph, I did a wee bit of research and found that Rails seeding approaches have evolved to be somewhat similar to what I describe below. While the specifics vary, there absolutely is convergent evolution at work. I think I’m a bit lazier and more resource constrained than a lot of Rails projects, so my approach optimizes for my labor issues (I’m way too often a project of one resource).

My Assumptions

I start with a few assumptions:

  1. Entering data manually into a system sucks monkey chunks.
  2. You can’t build a good HTML user interface without a robust set of data.
  3. You sometimes can’t even know if you have a good product concept without a robust set of data to play with in a system. Seeing things on screen is, today, as magical as it was 3 decades ago when I started.
  4. Every system needs to have good data both locally for development and server side for an initial bootstrap.
  5. Any seeding system needs to be idempotent to avoid duplicate records; this is particularly true for system level records like supporting tables (think things like tables that input into select lists).

Note: Idempotent is, generally, an infrastructure-as-code concept which roughly translates to “run this script as often as you like and this resource (server / database / network connection / firewall) still only gets created one time”. For example, let’s say that we have a colors table with 3 colors in it: Red, Green and Blue. An idempotent creation routine could be run a dozen times and there would still only ever be 3 colors in the table.
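To make the colors example concrete, here is a minimal sketch in plain Ruby. The seed_colors method and the array standing in for the colors table are hypothetical; a real app would check the database instead.

```ruby
# Hypothetical stand-in: an Array plays the role of the colors table.
def seed_colors(table, names = %w[Red Green Blue])
  names.each do |name|
    # Only insert when the color is not already there; this check is
    # what makes the routine safe to run repeatedly.
    table << name unless table.include?(name)
  end
  table
end

colors = []
12.times { seed_colors(colors) } # run the "seed" a dozen times
puts colors.size                 # still only three colors
```

The point is the check before the insert: without it, a dozen runs would leave 36 rows.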

My Approach

I use a robust seeding approach tied to Rake tasks which let me bootstrap either the entire system or any part of it. Here are a few examples from an existing Rails application I’m currently running in production:

# master seeding command; runs everything start to finish
bundle exec rake seed:init 

-or-

# just run one table - the one for seeding task frequencies
bundle exec rake task_frequency:init --trace

The seed.rake Code

Here’s what a seed.rake file looks like:

namespace :seed do
  # bundle exec rake seed:init --trace
  task :init => :environment do
    #
    # Core setup data
    #
    Rake::Task["user:init"].invoke
    Rake::Task["relationship_type:init"].invoke
    Rake::Task["team:init"].invoke
    Rake::Task["team_member:init"].invoke
    Rake::Task["secure_note_type:init"].invoke
    if Rails.env.development?
      Rake::Task["secure_note:init"].invoke
    end
    Rake::Task["task_state:init"].invoke    
    Rake::Task["task_frequency:init"].invoke    
    Rake::Task["job_type:init"].invoke
    Rake::Task["job:init"].invoke
    Rake::Task["task:init"].invoke
    Rake::Task["loan_type:init"].invoke
    Rake::Task["loan:init"].invoke
  end
  
  # be rake seed:metrics --trace
  task :metrics => :environment do
    # Todo -- this should technically be dynamic and get a full list of ActiveRecord 
    # classes from the system but **laziness**
    klasses = [User, RelationshipType, Team, TeamMember]
    klasses.each do |klass|
      puts "#{klass.name} -- #{klass.count}"
    end
  end
end
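The Todo in seed:metrics mentions building the class list dynamically. In a Rails app, one way to do that (assuming every model inherits from ApplicationRecord) is to call Rails.application.eager_load! and then use ApplicationRecord.descendants in place of the hard-coded array. The sketch below shows the same idea in plain Ruby so it runs without Rails; BaseRecord, its registry, the stubbed count and metrics_lines are all hypothetical stand-ins, not code from my app.

```ruby
# Hypothetical stand-in for ApplicationRecord: it remembers every subclass.
class BaseRecord
  @registry = []

  class << self
    attr_reader :registry

    # Ruby calls this hook each time a class inherits from BaseRecord
    # (or from one of its subclasses).
    def inherited(subclass)
      super
      BaseRecord.registry << subclass
    end

    # Stub; a real model would ask the database for its row count.
    def count
      0
    end
  end
end

class User < BaseRecord; end
class Team < BaseRecord; end

# Builds the same "Name -- count" lines that seed:metrics prints.
def metrics_lines(klasses)
  klasses.sort_by(&:name).map { |klass| "#{klass.name} -- #{klass.count}" }
end

puts metrics_lines(BaseRecord.registry)
```

In a real app you would probably also want to skip abstract classes (ActiveRecord exposes abstract_class? for that), which is part of why the hard-coded list is the lazy but safe option.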

The task_frequency.rake Code

Here’s what a task_frequency.rake file looks like:

namespace :task_frequency do

  # be rake task_frequency:init --trace
  task :init => :environment do
    Rake::Task["task_frequency:seed"].invoke
    Rake::Task["task_frequency:metrics"].invoke
  end

  # be rake task_frequency:metrics
  task :metrics => :environment do
    klass = "TaskFrequency"
    puts "For object: #{klass.to_s}, there are #{klass.constantize.count} objects in the database"
  end

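  # be rake task_frequency:seed --trace
  task :seed => :environment do
    types = TaskFrequency::TASK_FREQUENCY_TOKENS

    types.each do |type|
      os = OpenStruct.new(name: type)
      # find_or_create is idempotent, so re-running this never duplicates a
      # frequency; the returned status and object aren't needed here
      _status, _task_frequency = TaskFrequency.find_or_create(os)
    end
  end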
end

Things to Know

A few notes:

  1. I always, always, always put an invocation example, right down to using bundle exec, at the start of the rake task. I’ve done too much on call work where a Rake task needs to be run to fix something server side and figuring out how to run something when you’re bleary eyed and it is 4:19 am after a 12 hour day, well, you get it.
  2. Almost all my ActiveRecord classes have an idempotent find_or_create action that takes an OpenStruct in and builds an object. Class level constants define the idempotency structure and a concern for the find_or_create routine makes this ridiculously simple. Where the idempotency is more complex than the constant allows for, I simply have a local find_or_create routine which overrides the concern.
  3. A hash could be used instead of an OpenStruct; personally I prefer an OpenStruct as it makes the underlying metaprogramming a bit simpler.
  4. For advanced users, you can customize the underlying Rails generators to create these Rake tasks automatically in parallel with model creation. This is very, very useful.

Here’s an example of the class constants from point #2:

IDENTITY_RELATIONSHIP = :any # could also be :all
IDENTITY_COLUMNS = [:name]

What this means is:

  1. When the IDENTITY_RELATIONSHIP is set to :any, the record is considered to already exist if any one of the fields in the IDENTITY_COLUMNS constant matches an existing record.
  2. When the IDENTITY_RELATIONSHIP is set to :all, every one of the fields in the IDENTITY_COLUMNS constant must match before the record is considered to already exist.
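Here is a rough sketch of how a shared find_or_create driven by those two constants could work. This is not my actual concern: the FindOrCreate module, the in-memory records array (standing in for the database table) and the :found / :created status values are assumptions made so the example runs in plain Ruby without Rails. It also assumes that :any means "any identity column matches" and :all means "every identity column matches".

```ruby
require "ostruct"

# Hypothetical stand-in for the concern; an Array replaces the database table.
module FindOrCreate
  def records
    @records ||= []
  end

  # Returns [status, record]; status is :found or :created (assumed values).
  def find_or_create(os)
    columns = const_get(:IDENTITY_COLUMNS)
    # :all requires every identity column to match, :any requires just one
    matcher = const_get(:IDENTITY_RELATIONSHIP) == :all ? :all? : :any?

    existing = records.find do |record|
      columns.public_send(matcher) do |col|
        !os[col].nil? && record[col] == os[col]
      end
    end
    return [:found, existing] if existing

    record = os.to_h
    records << record
    [:created, record]
  end
end

class TaskFrequency
  extend FindOrCreate

  IDENTITY_RELATIONSHIP = :any # could also be :all
  IDENTITY_COLUMNS = [:name]
end

3.times { TaskFrequency.find_or_create(OpenStruct.new(name: "daily")) }
status, record = TaskFrequency.find_or_create(OpenStruct.new(name: "weekly"))
puts "#{status}: #{record[:name]}"
puts TaskFrequency.records.size
```

In a real model the lookup would be a database query rather than an array scan, and a model with more complex identity rules would define its own find_or_create instead of relying on the shared one.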