One of the biggest hurdles I had to get past in order to use Git efficiently was getting my head around the differences between SVN’s externals and Git’s submodules. Since our migration from SVN to Git, I’ve seen other folks on our team working through the same hiccups I had initially. I thought the following notes might be useful to other folks making the leap as well.
At Crowd Favorite we write modular code (for example: lots of lean, targeted WordPress plugins instead of a few complex plugins) for a number of reasons:
- it enables code reuse (DRY)
- smaller bits of code are easier to maintain, update, debug, etc.
- it’s easier to test smaller sets of features/functions
To support this approach we previously made extensive use of SVN externals in our projects. Often, active development would be happening on a specific project and in one or more of its externals at the same time (to support a new feature, etc.).
With SVN externals, the included externals are automatically updated to the latest version on every update (unless you pass --ignore-externals). If you follow a good trunk/branches/tags model within your externals, you can get away with this without too much trouble; you primarily point the external to the latest stable tag, but switch the pointer to trunk when active development is needed.
Like I said, it’s something you can get away with. That doesn’t mean there aren’t some challenges there. Primarily, you can run into multiple people updating or needing to work on an external at the same time (and SVN doesn’t support a branch-driven development model well). You might also be pulling in changes on each “svn up” that you don’t want.
With Git’s submodules, you can still bring another codebase into your project, but the mechanics of it are a bit different.
The submodule will be the entire Git repo
With SVN you can make your external point to a subdirectory of a project (this is how you’d choose trunk vs tags/1.0.2, etc.). With Git the submodule will always be the entire project. This means you’ll want to keep your code repositories lean and mean: you don’t want deep URL paths or a bunch of historical design documents and other things in there that would be considered cruft when the repo is used as a library for another project.
Because of this, we’re often maintaining two Git repos for each project. One has just the code, README, CHANGELOG, etc. while the other includes design docs, mockups, etc.
Submodules require extra steps when cloning
With a standard SVN checkout, all of your externals get populated and are ready to go right away. With Git submodules, an additional step is required after a git clone. From inside the newly cloned repo (at the top level), you have to initialize and update the submodules:
git submodule update --init --recursive
This step will get all of your submodules set up, pointing to the proper refs, etc.
UPDATE: or clone with --recursive (thanks, Shawn).
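To see the difference for yourself, here is a rough sketch that builds two throwaway local repos and clones the parent both ways. Everything named here is made up for illustration: the `lib` and `parent` repos, the `vendor/lib` path and the demo identity. The `GIT_ALLOW_PROTOCOL=file` line is only there so that local paths can stand in for real remotes; you wouldn't need it against a hosted repo.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
# Throwaway identity, plus permission to use local paths as submodule URLs
# (recent Git versions refuse the file transport for submodules by default).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

# "lib" stands in for a shared library, "parent" for a project that uses it.
git init -q lib
echo "library code" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "Initial library commit"
git init -q parent
git -C parent submodule --quiet add "$work/lib" vendor/lib
git -C parent commit -q -m "Add lib as a submodule"

# A plain clone creates the submodule directory but leaves it empty.
git clone -q parent plain
before=$(ls -A plain/vendor/lib | wc -l)

# Initializing and updating fills it in at the recorded commit.
git -C plain submodule --quiet update --init --recursive
after=$(ls -A plain/vendor/lib | wc -l)

# Cloning with --recursive does both steps at once.
git clone -q --recursive parent full

echo "entries before init: $before, after init: $after"
```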
Submodules require extra steps when committing
After initialization, your submodule will initially be in a “detached head” state. This means that even though the submodule is pointing to the correct ref/code revision, it isn’t set up to update from or commit to a specific head (branch). If you’re not used to this (or are used to how SVN externals work), it’s easy to accidentally start editing code in the submodule while you’re on a detached head. Recovering from this isn’t particularly hard (I’ll do a follow-up post with details on this); it’s just another step in learning how Git wants you to work with submodules.
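A quick way to check which state a submodule is in, sketched against throwaway local repos. The repo names, the `vendor/lib` path and the `my-fix` branch name are all hypothetical, and creating a fresh branch is just one way to get onto a branch; checking out an existing one works too.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
# Throwaway identity; GIT_ALLOW_PROTOCOL lets local paths act as remotes.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

git init -q lib
echo "library code" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "Initial library commit"
git init -q parent
git -C parent submodule --quiet add "$work/lib" vendor/lib
git -C parent commit -q -m "Add lib as a submodule"

git clone -q --recursive parent checkout
cd checkout/vendor/lib

# On a detached HEAD there is no symbolic ref, so the fallback is used.
state=$(git symbolic-ref -q --short HEAD || echo detached)
echo "submodule HEAD before: $state"

# Get onto a branch before editing anything in the submodule.
git checkout -q -b my-fix
branch=$(git symbolic-ref -q --short HEAD || echo detached)
echo "submodule HEAD after: $branch"
```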
When I was doing active development on a project that had SVN externals, I’d often end up making changes to the externals as well as the current project. When I did this, I’d need to commit to the external separately from the parent project. Git submodules work in a somewhat similar fashion in that you need to commit the changes to the submodule first; but there are also some additional steps involved.
Like SVN externals, you need to cd into the submodule to commit any changes you’ve made. When you’ve committed your changes to the submodule and have it in the state you want for its inclusion in the parent project, you then need to cd back up to the parent project and commit the “change” of the state of the submodule. Make sense? Basically, you commit to the Git submodule separately just like you used to commit to your SVN external. The additional step is checking out a tag, branch or specific revision that you want the submodule to “stick” at; then committing that change to the parent repository.
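The two-commit dance looks roughly like this. It is a sketch against throwaway local repos, with hypothetical repo names, path (`vendor/lib`) and branch name (`my-fix`). In a real project you would also push the submodule commit before pushing the parent, so that other people can fetch the commit the parent points at.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
# Throwaway identity; GIT_ALLOW_PROTOCOL lets local paths act as remotes.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

git init -q lib
echo "library code" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "Initial library commit"
git init -q parent
git -C parent submodule --quiet add "$work/lib" vendor/lib
git -C parent commit -q -m "Add lib as a submodule"
git clone -q --recursive parent dev

# Step 1: commit inside the submodule, on a branch.
cd dev/vendor/lib
git checkout -q -b my-fix
echo "a fix" >> lib.txt
git commit -q -am "Fix in the library"
sub_commit=$(git rev-parse HEAD)

# Step 2: back in the parent, commit the submodule's new state.
cd ../..
git add vendor/lib
git commit -q -m "Point vendor/lib at the fixed library commit"

# The parent now records exactly the commit made in step 1.
recorded=$(git rev-parse HEAD:vendor/lib)
echo "submodule commit: $sub_commit"
echo "recorded in parent: $recorded"
```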
Submodules don’t update on their own
Recapping the previous point: submodules are stuck to the revision they are set up on (the parent records a specific commit, even if you got there by checking out a tag or branch). You have to explicitly update them, then commit the “change” of having the submodule point to a different revision to the parent repository. This is useful for situations where multiple people are updating a project that is being used as a submodule by multiple parent projects. Changes to the submodule won’t automatically be pulled into a parent project that doesn’t expect them (they have to be explicitly pulled in, committed, etc.).
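Here is a small sketch of that behavior using throwaway local repos (the repo names and the `vendor/lib` path are made up). New work lands in the library, a normal submodule update in the parent leaves the pointer where it was, and moving it takes an explicit fetch, checkout and commit.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
# Throwaway identity; GIT_ALLOW_PROTOCOL lets local paths act as remotes.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

git init -q lib
echo "library code" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "Initial library commit"
git init -q parent
git -C parent submodule --quiet add "$work/lib" vendor/lib
git -C parent commit -q -m "Add lib as a submodule"
git clone -q --recursive parent site
pinned=$(git -C site/vendor/lib rev-parse HEAD)

# Someone else commits new work to the library.
echo "new feature" >> lib/lib.txt
git -C lib commit -q -am "Add a feature"
upstream=$(git -C lib rev-parse HEAD)

# A submodule update only re-syncs to what the parent records: no change.
git -C site submodule --quiet update --recursive
still=$(git -C site/vendor/lib rev-parse HEAD)

# Moving the pointer is explicit: fetch, check out, commit in the parent.
git -C site/vendor/lib fetch -q origin
git -C site/vendor/lib checkout -q "$upstream"
git -C site add vendor/lib
git -C site commit -q -m "Update vendor/lib to the latest library commit"
moved=$(git -C site rev-parse HEAD:vendor/lib)
echo "pinned: $pinned"
echo "after explicit update: $moved"
```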
Once you’re out of active development, I consider it a best practice to make sure your submodules are pointed at a tag for the purposes of tagging the parent project. That way when bugs or issues are found they are logged against the proper version of the package in question. When questions about a submodule version arise (does it include feature X?), you can answer them pretty easily by checking the CHANGELOG.
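Pinning to a tag before tagging the parent might look like the sketch below. It uses throwaway local repos again; the `vendor/lib` path and the parent's `2.0.0` tag are hypothetical, and `1.0.2` is borrowed from the example earlier in this post.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
# Throwaway identity; GIT_ALLOW_PROTOCOL lets local paths act as remotes.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_ALLOW_PROTOCOL=file

# A library with a tagged release followed by newer, untagged work.
git init -q lib
echo "library code" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "Release-ready library"
git -C lib tag -a 1.0.2 -m "Release 1.0.2"
echo "work in progress" >> lib/lib.txt
git -C lib commit -q -am "Ongoing development"

# The submodule starts out at the library's latest commit.
git init -q parent
git -C parent submodule --quiet add "$work/lib" vendor/lib

# Pin it to the stable tag, record that in the parent, then tag the parent.
git -C parent/vendor/lib checkout -q 1.0.2
git -C parent add vendor/lib
git -C parent commit -q -m "Pin vendor/lib to 1.0.2"
git -C parent tag -a 2.0.0 -m "Parent release built on lib 1.0.2"

# Answering "which library version shipped in 2.0.0?" is now one command.
pinned=$(git -C parent/vendor/lib describe --tags --exact-match)
recorded=$(git -C parent rev-parse 2.0.0:vendor/lib)
tagged=$(git -C lib rev-parse "1.0.2^{commit}")
echo "parent 2.0.0 uses lib $pinned"
```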
Hopefully this helps you get your head around the differences between SVN externals and Git submodules. Overall I like the implementation of Git submodules better now that I understand what they are trying to do. We’ve found that this development approach complements the core extensibility features of WordPress very nicely. In fact, many of our plugins interoperate by implementing hooks and filters in the same manner WordPress core does.
Does building modular code (WordPress or for other PHP projects) sound like something you’d like to be doing more of? We’re hiring. As previously noted here, our careers page is a little different, and we believe in a healthy work-life balance.