Git


3
Apr 12

When to do git merging or rebasing

This one pops up fairly often, and can indeed be quite confusing – when to use merge versus rebase in git? Here’s the answer.

The story so far …

In merge, you combine two divergent branches back into one. There’s also a special kind of merge called fast-forward, done when a branch being merged is just a continuation of the branch you’re merging into – so the new commits are just pasted on top of the target branch (ie. it is “fast-forwarded”).

In rebase, you change your branch (that’s being rebased) so it looks like it was branched off a new base, not the original one. This involves rewriting the commits, so you’ll end up with different commit IDs.
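To make the difference concrete, here is a small sketch that builds a throwaway repository, fast-forwards one branch and rebases another. The branch names (topic-ff, feature), the file names and the demo identity are made up for illustration; only the git commands themselves matter.

```shell
# Scratch demo: fast-forward merge vs. rebase. Everything lives in a temp dir.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/master     # use the branch name from this post
git config user.name "Demo User"            # placeholder identity for the scratch repo
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
git branch feature                          # feature starts at the base commit

# topic-ff is a pure continuation of master, so merging it fast-forwards.
git checkout -q -b topic-ff
echo topic > topic.txt
git add topic.txt
git commit -q -m "work on topic-ff"
git checkout -q master
git merge -q --ff-only topic-ff             # no merge commit is created

# feature has now diverged from master; rebasing replays it on the new base.
git checkout -q feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "work on feature"
before=$(git rev-parse HEAD)
git rebase -q master
after=$(git rev-parse HEAD)

echo "feature before rebase: $before"
echo "feature after rebase:  $after"        # a different ID: the commit was rewritten
```

The two printed IDs differ even though the change itself is the same, because the rebased commit has a new parent.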

… and now the conclusion

So, when to use either? The rules are surprisingly simple:

Never rebase public branches

By “public branches” here I mean branches that other people might have checked out. Rebasing rewrites history, and anyone with branches based on the history you just unmade will be sad, angry or worse. That’s one reason you can’t just push a rebased branch to GitHub (unless you force it and sacrifice a kitten). So just say no.

Rebasing private branches is perfectly fine, and in fact often done when squashing or rearranging commits, cleaning up a branch before going public with it, or just updating a long-running feature branch (go easy on the last one, though).

Never merge upstream master into your master

By “merge” here I mean a recursive (non-fast-forward) merge, ie. the one that leaves tracks. And the rule is not only for master, but for any upstream tracked branch you may want to merge back into one day.

As a rule of thumb, your fork’s master should always be in sync with upstream master (ie. only do fast-forward merges/pulls), only having the commits that are in upstream already. Also, you should never have to merge master (local or remote) into a feature branch. If you need to update an existing feature branch, rebase it rather than merging into it.
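As a rough sketch, the two habits could be wrapped in a pair of shell functions. The function names are invented here, and the remote name "upstream" is an assumption (a common convention for the original repository, not something git requires).

```shell
# Assumes a remote named "upstream" points at the original repository.

# Bring local master up to date, refusing anything but a fast-forward.
sync_master() {
    git checkout -q master &&
    git fetch -q upstream &&
    git merge -q --ff-only upstream/master
}

# Update a private feature branch by replaying it on top of master.
update_feature() {
    git checkout -q "$1" &&
    git rebase -q master
}
```

Typical use would be `sync_master && update_feature my-feature`. If master has picked up commits that are not in upstream, the `--ff-only` merge fails instead of quietly creating a merge commit, which is the point.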

Updating public feature branches

What to do if your public branch needs to be updated from upstream? You can’t rebase it since it’s public, and you can’t merge into it since you’ll want to merge it back some day.

Turns out it’s really simple: create a fresh branch off of upstream, merge your branch into it, and continue working on the new one.
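In command form, that might look like the sketch below. The function name and the "upstream" remote name are assumptions; the old and new branch names are passed in as arguments.

```shell
# Assumes a remote named "upstream"; $1 is the existing public branch,
# $2 is the name of the fresh branch. The old branch is left untouched.
refresh_public_branch() {
    git fetch -q upstream &&
    git checkout -q --no-track -b "$2" upstream/master &&   # fresh branch off upstream
    git merge -q --no-edit "$1"                             # bring the old work across
}
```

For example, `refresh_public_branch my-feature my-feature-v2` leaves you on my-feature-v2 with both the upstream changes and your existing work. `--no-track` is used here so the new branch does not treat upstream master as its push/pull target; that is a choice of this sketch.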

(This was originally a comment in the Hacker News discussion of the How To GitHub: A Complete Guide to Forking, Branching, Squashing and Pulls article. The article itself and the ensuing discussions are full of useful advice – if you haven’t already, go read them!)


1
Sep 11

On generic coding style

.. or, much ado about whitespace

Every programming language has at least one standard or preferred code style – most have several, and they’re an endless inspiration for bikeshedding.

I’m about to do just that.

Instead of talking about a specific language or community, I’ll look at some general guidelines that are applicable to every programming language. Of course, there are exceptions, but there has to be a good reason to make the exception.

These are just things I picked up along the way and found useful in general. I’ll try to explain why I think each of the guidelines is important, but YMMV – code style is largely a subjective issue, and I’m not here to preach that my way is in any way superior for you. It just works best for me.

Indentation: spaces versus tabs

Spaces and tabs make an ugly mix. In some languages (like Python), mixing them can easily lead to hard-to-find errors. In others, the effect is purely visual, but still ugly.

People have various tab length settings in their editors. Yes, I agree we should all standardize on tab width of 4 (or 8, or 2, or ..), but at this point I don’t think we’ll ever do that. So if you share (as in show or view) code with anyone not following the same style, it’s going to be a mess.

Spaces or tabs? Even when you’re mainly using tabs, there are still places where a space or two will be needed, so you’re effectively going to mix them. So don’t: use only spaces.

Okay, that was easy. I also intentionally ignored the fact that some tools require mixing them – make, for example.
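If you want to check an existing file for stray tabs, a minimal sketch could look like this (the function name is made up):

```shell
# Print the lines of the given files whose indentation contains a tab.
find_tab_indent() {
    tab=$(printf '\t')
    grep -n "^ *$tab" "$@"
}
```

It prints matching lines with their line numbers and returns non-zero when nothing matches. Don't point it at a Makefile, where the tabs are required.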

Line endings: Unix or Windows?

If you’re only working in and for a Windows environment, and you’re not going to need to share code with developers on other operating systems, you can get by with Windows-style line endings (CR-LF). In fact, it’s probably going to be easier than the alternative.

Otherwise, use Unix line endings (LF) everywhere. It’s the default on both Linux and OS X, and if your (web app) code is going to be served by some cloud server, it’s probably going to be Linux or some other variant of Unix as well.

Whatever you do, don’t ever mix the two. If you have a hybrid team, some people on Windows and some on OS X or Linux, standardize now.
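For spotting and fixing stragglers by hand, something along these lines may help. Both function names are invented for this sketch.

```shell
# List the files (among those given) that contain at least one CR-LF ending.
has_crlf() {
    cr=$(printf '\r')
    grep -l "$cr\$" "$@"
}

# Write an LF-only copy of a file to stdout. Note that this drops every
# carriage return, including any that are not part of a line ending.
to_lf() {
    tr -d '\r' < "$1"
}
```

Git can also normalize line endings for you through its core.autocrlf setting and .gitattributes, which is probably the better route for a whole team.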

Trailing whitespace

Trailing whitespace on each line can be trouble, because otherwise identical lines can actually be different. This can wreak havoc with your version control, since it’s easy to add or remove trailing whitespace without changing the actual content of the line, and the version control will happily treat that as a code change.

Fortunately, most editors have the option of automatically trimming the white space when saving a file. No reason not to use it.
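If your editor can't do it, a one-line filter does the same job. The function name below is made up; the sed expression uses the POSIX [[:blank:]] class (spaces and tabs).

```shell
# Print a file with trailing spaces and tabs removed from every line.
strip_trailing() {
    sed 's/[[:blank:]]*$//' "$1"
}
```

On the version control side, `git diff --check` reports whitespace errors such as trailing whitespace in your pending changes.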

Line width

This is a tough one. I try to have the content fit in 80 columns. It’s easier to read, as it avoids wrapping (easily mistaken for multiple lines) and horizontal scrolling (downright cumbersome).

This is easier for some languages than for others. I’ve pretty much given up on doing it in HTML. It’s just too verbose for 80 columns to be enough.

So, I try to follow the 80-column rule unless working around it would make the code uglier than if I just let it go beyond col 80. In those cases, I prefer wrapping instead of horizontal scroll.
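To see where a file goes over the limit, a small awk-based sketch is enough. The function name and the MAX_COLS variable are assumptions of this example.

```shell
# Report lines longer than a limit (80 by default) as "file:line: length".
long_lines() {
    awk -v max="${MAX_COLS:-80}" \
        'length($0) > max { print FILENAME ":" FNR ": " length($0) }' "$@"
}
```

Keep in mind that a tab counts as a single character here, and whether multi-byte characters count as one or several depends on the awk implementation and locale.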

Terminating newline

A file should end with a single newline. That means, no empty lines after the content, but also not having the last line of the code dangling without an ending newline.

This makes it easier on the editors, nicer when using cat or patch, and makes git not complain. Good enough for me.
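A quick way to check a file for this, as a sketch (the function name is made up). It relies on the shell stripping trailing newlines from command substitution output.

```shell
# Succeed only if the file ends with exactly one newline.
ends_with_single_newline() {
    [ -s "$1" ] || return 1                   # empty file: nothing to check
    [ -z "$(tail -c 1 "$1")" ] || return 1    # last byte must be a newline
    [ -n "$(tail -c 2 "$1")" ]                # ...and not preceded by another one
}
```

It fails both for a dangling last line and for files that end in one or more empty lines.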

File encoding

With everyone supporting it these days (and it being a default on a lot of systems), there’s just no reason not to use UTF-8 encoding. I try to stick to the ASCII part whenever possible, but when I do need it, UTF-8 is the only sane choice.

YMMV

I’m pretty opinionated with regards to these things, as I’m sure you are as well, dear reader. So please do share your preferred settings for the above in the comments (we’ll agree to disagree, thus avoiding any potential flamewars).

Also, if you have an interesting example where some of these MUST be broken (eg. in Makefile), please do share.


18
Jan 11

Using Git for web app deployment

Until recently, I’ve been manually deploying my web apps on the server, as well as manually testing whether everything seems to be okay. Unfortunately, that kind of setup basically ensures I’ll run into (or create) trouble sooner or later.

So when working on my last project (Encode), I thought about how to automate the deployment procedure, as well as integrate it with the unit tests I’ve written. The idea is that by automating it, I’d be less anxious about updating the production code, and as a result, I’d iterate faster. Since I use Git as source control for all my projects, I decided to set up the deployment using git.

My main repository is hosted on my VPS (I love GitHub and use it for my open source stuff, but it’s both easier and cheaper for me to host my private repos on my own server). I have a few clones on my laptops, and one clone on the production server.

The production server repo’s working directory actually contains the code that runs Encode. But it’s not tracking master branch, as that would mean I couldn’t keep my config settings in the repository (or, alternatively, I’d have to store those settings in master in my main repository, which I don’t want to do, as I’ve got separate configs for staging and production environments).

Instead, the production repo has a “production” local branch containing a few extra patches for configuration, that gets rebased on top of current master, every time I update it. Rebasing, if it goes wrong, will leave your working directory in a mess – this is where the unit tests come in handy.

You see, before rebasing the production repo, I check everything in a separate repository (also on production server, so the environment is the same). In this repository, I pull master branch from main repository, production branch from production repository, and then rebase production on top of master, and run the unit tests in the production branch.

If all that succeeds, I’m fairly confident I can go ahead and update the production code. So, I do the same pull and rebase on production. I tag the new HEAD with current time and date, and restart the services I need to restart.
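The whole procedure could be sketched roughly as follows. This is not my actual script: the paths, the remote names ("origin" for the main repo, "prod" for the production repo), the test command, the restart command and the "release-" tag prefix are all placeholders.

```shell
# Hypothetical sketch of the two-step deployment; every name is a placeholder.
STAGING_REPO=${STAGING_REPO:-/srv/app-staging}   # scratch repo on the production server
PROD_REPO=${PROD_REPO:-/srv/app}                 # repo whose working directory runs the app
RUN_TESTS=${RUN_TESTS:-./run_tests.sh}           # assumed test entry point
RESTART=${RESTART:-true}                         # assumed restart command ("true" does nothing)

deploy() {
    # Step 1: rehearse the rebase in the scratch repo and run the tests.
    (
        cd "$STAGING_REPO" || exit 1
        git fetch -q origin || exit 1            # master from the main repo
        git fetch -q prod || exit 1              # production branch from the live repo
        git checkout -q -B production prod/production || exit 1
        git rebase -q origin/master || { git rebase --abort; exit 1; }
        $RUN_TESTS || exit 1
    ) || return 1

    # Step 2: the rehearsal passed, so repeat it on the live repo.
    (
        cd "$PROD_REPO" || exit 1
        git checkout -q production || exit 1
        git fetch -q origin || exit 1
        git rebase -q origin/master || { git rebase --abort; exit 1; }
        git tag "release-$(date +%Y%m%d-%H%M%S)" || exit 1
        $RESTART
    )
}
```

If the rehearsal rebase or the tests fail, `deploy` returns early and the live repository is never touched.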

What does this setup give me? Confidence that if I screwed up somewhere (either the code doesn’t work or production can’t be rebased), it’ll be caught in the test. Tagging gives me easy rollback to any “previous release” if at any point later I figure out something did go wrong after all (Edit: easy, but not painless, as I can’t just automatically “un-rebase”: after a new rebase the previously tagged commit is no longer part of the production branch, so rolling back means resetting the branch to that tag – thanks to the readers for pointing this out). And having it automated and feeling safe, I can do quick code changes (eg. critical bugfixes) without worrying I’ll break everything in a hurry.

One thing I could do in the future is maybe use a framework such as Fabric or Capistrano instead of my own custom shell scripts. But what I like about this setup is that the deployment script is on the server, so I can use it from anywhere – I don’t have to have my development machine nearby to do the updates.