The problem with Migrations

May 5th, 2008 at 2:48 pm by Joel in DB Migrations

Back in February in sunny Orlando, I gave a presentation on why and how one should use CakePHP Migrations. I think the presentation went down well, but when it came to the Q&A at the end of the session, a few issues were raised that - quite honestly - I hadn’t actually thought about. It’s not that I didn’t know about them; I had just never run into them badly enough for them to be a problem for me. But since I began using Migrations in a team of developers, I have come to understand some issues that could potentially be a deal breaker for some teams and users.

The main issue relates to using a version control system such as Subversion to manage your application’s source code and migration files, and it only really applies when using migrations as part of a team. Basically, when more than one member of the team creates a migration file and checks it into SVN at around the same time, you could end up with an SVN conflict and two migration files that have the same name but contain different migration code. Of course, you won’t see this problem until you run an SVN update, but when you do, and you try running those migrations, not all of them will be run. If they are, they could be run in completely the wrong order.

The other issue is that of file naming. You may come across times when migration files have been given the same name. Combine this issue with the first one above, and we have the potential for all-out World War Three!

Let me give you an example…

We have a team of top-notch CakePHP developers: Harry, Ron and Hermione (excuse the sad Harry Potter conventions for a second). They all use CakePHP Migrations as part of the spangly new Blog application that they are building. They have already baked the application, so they have their skel set up and ready to use, and they have checked the code into Subversion.

Now Harry starts off by creating a migration file called “001_create_users.yml” so he can create the users table. He modifies the file as he sees fit, creating a few columns etc. He then runs that migration and the users table is created perfectly. So no problems so far, and all works great.

However… Ron, being the half-wit he can sometimes be, has gone ahead and created another migration file so he can start work on the posts table. So he creates the “001_create_posts.yml” migration file, edits it and runs it. He now has the posts table created, and we still have no problems - yet!

So along skips the team leader, Hermione, who wants to check out what Harry and Ron have done so far and make sure it all works. So Ron and Harry go ahead and commit the code they have created so far, along with their migration files, and then Hermione updates her working copy. Again, all is fine.

But now we come across problems, because Hermione wants to run the migrations. Dun, dun, derrr!

She now sees two migration files:

  • 001_create_users.yml
  • 001_create_posts.yml

She goes ahead and runs these migrations, which work, but because we have two files with a version number of 001, only the first one is run, and the second is ignored.
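To make the clash concrete, here is a minimal sketch in Python (not the CakePHP Migrations shell itself) that scans a list of migration filenames and reports any version number claimed by more than one file. The filename pattern and the function name `find_version_collisions` are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

# Assumed naming pattern: "<version>_<name>.yml", e.g. "001_create_users.yml".
MIGRATION_RE = re.compile(r"^(\d+)_(\w+)\.yml$")


def find_version_collisions(filenames):
    """Return {version: [files]} for every version used by more than one file."""
    by_version = defaultdict(list)
    for name in filenames:
        match = MIGRATION_RE.match(name)
        if match:
            by_version[match.group(1)].append(name)
    # Keep only the versions that more than one file is fighting over.
    return {v: sorted(files) for v, files in by_version.items() if len(files) > 1}


# Hermione's working copy after the SVN update.
files = ["001_create_users.yml", "001_create_posts.yml"]
print(find_version_collisions(files))
```

A check like this could run before any migration is applied, so the team sees the clash instead of silently losing a table.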

And there, my friends, is the crux of the problem.

So a few weeks ago, I sat down and started thinking about how I could get around these issues. But then all of a sudden, Rails came along and solved it for me.

I still have a great interest in Ruby on Rails. After all, Cake Migrations was inspired greatly by Rails’ implementation of the same system. I therefore keep a close eye on the Rails changesets. A few weeks ago, I saw quite a large change to the way Migrations are handled in Rails. It seems quite a few people have come across the same problem I just described so eloquently. ;)

So they introduced timestamped migrations. Now why didn’t I think of that?

So instead of using incremental version numbers for each migration file (e.g. 001 or 002), migration files in Rails now use a UTC timestamp indicating the time that the file was created. So now they look something like 12345678_create_users.yml.
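As a rough sketch of the idea in Python: Rails builds the version from the current UTC time in YYYYMMDDHHMMSS form, and the snippet below does the same. The function name `timestamped_name` and the `.yml` extension are assumptions for illustration only; this is not code from the CakePHP Migrations shell.

```python
from datetime import datetime, timezone


def timestamped_name(description, now=None):
    """Build a migration filename prefixed with a UTC timestamp."""
    # Default to the current moment; callers can pass a fixed time for testing.
    now = now or datetime.now(timezone.utc)
    # Normalise to UTC so every developer's clock maps onto the same scale.
    version = now.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{version}_{description}.yml"


# A file created at 14:48:00 UTC on 5 May 2008.
created = datetime(2008, 5, 5, 14, 48, 0, tzinfo=timezone.utc)
print(timestamped_name("create_users", created))  # 20080505144800_create_users.yml
```

Because the prefix has a fixed width, sorting the filenames alphabetically also sorts them chronologically.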

This slap-in-the-face simple change means that migration file names are effectively unique, so version conflicts become very unlikely. It also means that all migration files can be run in the correct order. That is, as long as each member of the team has the correct time set!

This change also introduces a nice little extra called Interleaved Migrations, meaning that any missed migrations can still be caught and run at any time. We will also be able to migrate down and ignore any migrations that have not yet been run. After all, we don’t want to drop a table that has not yet been created.
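Here is a minimal sketch of how interleaving could work, assuming the runner records the set of versions it has already applied rather than a single "current version" number. The function names and the in-memory lists are hypothetical; a real runner would keep the applied versions in the database.

```python
def pending_migrations(available, applied):
    """Versions that exist on disk but have not been run yet, oldest first."""
    return sorted(set(available) - set(applied))


def rollback_candidates(available, applied, target):
    """Applied versions newer than target, newest first.

    Versions that were never run are skipped, so nothing is undone twice
    and no table is dropped before it has been created.
    """
    ran = set(applied) & set(available)
    return sorted((v for v in ran if v > target), reverse=True)


# Fixed-width timestamp strings compare chronologically.
available = ["20080501090000", "20080502100000", "20080503110000"]
applied = ["20080501090000", "20080503110000"]  # the middle one was missed

print(pending_migrations(available, applied))                      # ['20080502100000']
print(rollback_candidates(available, applied, "20080501090000"))   # ['20080503110000']
```

The key design choice is tracking every applied version individually; a single counter cannot tell that an older migration arrived late.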

So as long as I haven’t confused you too much, you now know what is coming in the next version of CakePHP Migrations. You can also expect to see support for native PHP arrays in migration files, as well as YAML. This will mean that you can use your Cake Schema migration files with CakePHP Migrations.

I already started work on this before I left for my holidays, and will continue when I return. Expect a release by the end of May. So hopefully, these changes will make Migrations a non-blocker for all you guys working on Cake apps as part of a team. Hey, but if you think they won’t, please let me know. I appreciate any comments and questions any of you may have.

 

15 Comments

Using the UTC timestamp doesn’t completely fix the problem with the version numbers. You could in theory still have 2 migration files that start with the same timestamp. But this won’t occur as often as it does with the incremental version numbers that are currently being used.

Looking forward to what is in the next version.

No, it doesn’t completely fix it, but the chances of more than one person using the exact same timestamp are very, very small. So I would think it is safe to say that timestamping pretty much solves the problem.

I’m not sure if that will be an issue in practice, but if you use a timestamp as the version number then the order of the migrations can be rather random. For example: if we both start with version X, and I create my migration one minute earlier than you create yours, then my migration is executed before your migration. If the order in which the migrations are executed matters (i.e. your migration must be executed before my migration), then you have a conflict…

@daniel: Then I think there is no way around that, as that would be an issue no matter how migrations are handled. But it is very easily fixed by simply changing the timestamp, so that the migration runs after the other one, in the correct order.

RainChen says...

I’m curious how to do some initialization in YML files.
For example, in my Rails project, I can initialize a counter cache like this:

# 005_add_area_places_count.rb
class AreaPlacesCount < ActiveRecord::Migration
  def self.up
    add_column :areas, :places_count, :integer, :default => 0

    # init counter cache
    Area.reset_column_information
    Area.find(:all).each do |area|
      area.update_attribute :places_count, area.places.length
    end
  end

  def self.down
    remove_column :areas, :places_count
  end
end

@RainChen Sorry, but that is not supported in the CakePHP Migrations shell. Rails uses Ruby code for its migration files, so you can write anything in there, but CakePHP Migrations uses YML, so you are somewhat limited.

However, as mentioned, the next release will allow you to write migration files using all PHP code and arrays. So you could do anything you want in there.

RainChen says...

@Joel
I’ve been looking for some kind of Rails-style migrations for Cake for a long time.
You’re doing a great job.
So, my next question is: when will the next release come out?

Well, I get back from vacation this weekend, so I hope to get it done next week. So if I had to guess, you should see something by the end of May.

I’m curious. Let’s say there are 2 developers, one in England and one in the US, working on the same project. Joe from England creates a migration file at 2:00pm his local time. Let’s say its name is 345678_create_user.yml. Now Jim from the US creates a migration at 12:00pm his local time, and it gets the name 123456_create_user.yml.

Joe created his first, at 2pm local time, and Jim created his second, at 12pm local time, yet Jim’s has the smaller timestamp. Does the timestamp account for the timezone difference?

That won’t be a problem as timestamps will be UTC or GMT. So as long as you use the Migration shell script to generate the file, all will be fine.
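A small Python sketch of that scenario, assuming Joe is on UTC+1 (British summer time) and Jim is on UTC-4 (US Eastern summer time). The offsets, the date and the helper name `utc_version` are assumptions chosen only to illustrate the point.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1))   # assumed offset for Joe in England
EDT = timezone(timedelta(hours=-4))  # assumed offset for Jim in the US


def utc_version(local_time):
    """Convert a timezone-aware local time into a UTC version string."""
    return local_time.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


joe = datetime(2008, 5, 5, 14, 0, tzinfo=BST)  # 2:00pm in England
jim = datetime(2008, 5, 5, 12, 0, tzinfo=EDT)  # 12:00pm in the US, three hours later

joe_version = utc_version(joe)  # 13:00 UTC
jim_version = utc_version(jim)  # 16:00 UTC
print(joe_version, jim_version)
print(joe_version < jim_version)  # True: Joe's migration sorts first
```

Stamped with local wall-clock time, 12:00 would sort before 14:00 and put Jim first; converting to UTC restores the real order.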

@Joel: Yeah, sure, you probably can’t avoid such conflicts with migrations. But with traditional version numbers it is obvious there is a conflict, whereas with timestamped version numbers you no longer see directly that there may be a conflict. But as I said in my first comment, it’s possible that this is just a theoretical issue ;-)

Time will tell…

Actually I see (saw) two problems with CakePHP Migrations:

- It uses PEAR
- It is console only

I solved both problems. I wrote a CakeMigration component based on yours that still supports multiple DB types but without the need to install ADODB or PEAR. It is also a non-shell component, so it can be run by any application (for example, as part of the general update process).

I thought about publishing it on the bakery but maybe we can join forces?

Ruby on Rails 2.1 is using the UTC timestamp methodology. Two people creating their migrations at exactly the same second is a pretty unlikely race scenario. Given that a multi-developer community as large as RoR’s has embraced this methodology, I feel the edge cases are rare enough for it to be acceptable. There is no more risk inherent in this method than in CakePHP’s built-in Schema tool.
