[Development] gerrit : using branches (was: pointers to use it cleverly/efficiently?)

Welbourne Edward edward.welbourne at theqtcompany.com
Mon Apr 4 17:04:56 CEST 2016


I mentioned:
>> merge-base - if you have a strict tree, this isn't a merge, so it's
>> where one was branched off the other, but nothing about the merge-base
>> or any of its ancestors contains any hint as to which branch it was on
>> when it was committed.

René J.V. Bertin replied:
> In short, there's something like a table for each branch that tells
> what commits "belong" to that branch, but there's no way to obtain the
> branch from a given commit?

There is no table.  Each commit is an object in a datastore, named by
its sha1.  The commit object contains information about the commit -
IIRC, the sha1 names of:
  * a "tree" object representing that commit's checked-out tree
  * its "parent" commits (usually one; more for a merge, none for a
    root commit)
along with the times and user info of authorship and commit; and the
commit message.  The commit object is a simple piece of text packaging
this information; the sha1 that names the commit is the sha1 of that
text, prefixed by a short header giving the object's type and size.
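
To see this for yourself, here is a minimal sketch that builds a
throwaway repository and prints a commit object.  The "Demo" identity,
the file name and the commit messages are made up for illustration;
only the git commands themselves matter.

```shell
#!/bin/sh
# Sketch: look inside a commit object in a scratch repository.
set -e
# Hypothetical identity, so the commits work without any user config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > file.txt
git add file.txt
git commit -q -m "first"
echo more >> file.txt
git commit -q -a -m "second"

# The commit names a tree and a parent, then author, committer, message.
commit_text=$(git cat-file -p HEAD)
printf '%s\n' "$commit_text"

# The object's type is "commit"; re-hashing its text as a commit object
# (hash-object adds the type-and-size header) gives back its own name.
obj_type=$(git cat-file -t HEAD)
rehashed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
head_id=$(git rev-parse HEAD)
echo "type: $obj_type"
echo "name: $head_id"
echo "hash: $rehashed"
```

Nothing in the printed text mentions a branch; it is just the tree, the
parent and the authorship details followed by the message.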

The .git/refs/heads directory contains files whose names are branches
and whose contents are sha1 IDs of commits.  When git looks at history,
it reads a branch's head file to get the current tip's sha1; it reads
the object with that sha1 to discover what the commit is; if it needs to
know history, it looks at the parent sha1(s) and finds the commit
objects named there and in the objects it thus ends up opening as it
traverses the directed graph of ancestry.  There is no database, no
table; just a filesystem [*], in which objects name other objects,
forming a directed acyclic graph of mutual references.

[*] technically, the object store ends up being a virtual filesystem, as
not all of the blobs are typically held on disk as separate files; many
of them get compressed together in a "pack file"; but this is just the
implementation of a virtual file system.  Pedagogically, git thinks it
just saves blobs as files to disk under .git/objects/: in practice, it's
usually more efficient than that.
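
The lookup described above (read the branch's head file, open the
commit, follow its parents) can be sketched by hand as below.  This
assumes a freshly created repository using git's default loose-file
ref storage, where the branch really is a one-line file; in an older
or busier repository the ref may have been packed away, so git
rev-parse is the reliable way to ask.  The branch name "dev" and the
commit messages are invented for the example.

```shell
#!/bin/sh
# Sketch: follow a branch file to its tip, then walk first parents.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/dev      # name the initial branch "dev"
git commit -q --allow-empty -m one
git commit -q --allow-empty -m two
git commit -q --allow-empty -m three

# The branch is a file holding the tip's id; git agrees with it.
tip_from_file=$(cat .git/refs/heads/dev)
tip_from_git=$(git rev-parse refs/heads/dev)

# Walk the ancestry ourselves: read each commit's header, take its
# first "parent" line, stop when a commit has none.
id=$tip_from_file
count=0
while [ -n "$id" ]; do
    count=$((count + 1))
    echo "$id"
    id=$(git cat-file -p "$id" | sed -n -e '/^$/q' -e 's/^parent //p' | head -n 1)
done
echo "walked $count commits"
```

This only follows first parents, which is enough for a linear history;
git itself follows every parent when it traverses the graph.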

In particular, although the refs/heads/ directory names some objects as
tips of branches, *nothing* in the git object store knows *anything*
about branches.  It only knows about commits, trees and files.  (Kinda.
It actually also knows about notes, signed tags and some other fun
meta-data - but nothing about branches.)  If you look at a commit
object, that object has no knowledge of being on any branch; it only
knows who its parents are, what tree object it describes and the user
info and times of its authorship and creation (these may be separate,
especially after cherry-picking or rebasing).  The nearest a commit gets
to knowing it's on any branch at all is the fact that it hasn't been
garbage-collected yet, so it must be an ancestor of some commit named by
some branch or (under refs/tags/) tag.  A commit doesn't even know what
other commits have it as a parent.
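
A small sketch of the same point: git can tell you which branches
currently reach a commit, but it works that out by walking back from
the branch tips, not by asking the commit.  The branch names (dev,
topic-a, topic-b) are hypothetical.

```shell
#!/bin/sh
# Sketch: several branches reach one commit; the commit names none.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/dev
git commit -q --allow-empty -m base
base=$(git rev-parse HEAD)
git branch topic-a
git branch topic-b
git commit -q --allow-empty -m "later on dev"

# Computed from the refs: every branch whose tip has $base as ancestor.
containing=$(git branch --contains "$base" --format='%(refname:short)' \
             | sort | tr '\n' ' ')
echo "branches reaching $base: $containing"

# The commit object itself mentions none of those names.
mentions=$(git cat-file -p "$base" | grep -c -E 'dev|topic' || true)
echo "lines in the commit naming a branch: $mentions"
```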

(When merging, the default commit message mentions the branches being
merged; so you could plausibly get some heuristics out of that.  All the
same, I can checkout -b a temp-branch from each branch, merge these,
then merge --ff-only each of the original branches to the resulting
merge point; the commit message shall name my temp-branches, yet the
result is clearly the result of merging the two branches I now have
pointing at that merge-point.  In any case, after a merge, I can git
commit --amend to change its commit message from the default.)
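
The temp-branch trick can be sketched as follows, in a scratch
repository with invented branch names (left, right, tmp-left,
tmp-right).  The exact wording of the default merge message depends on
your git version and configuration, so treat the printed subject as
illustrative.

```shell
#!/bin/sh
# Sketch: the merge message names temp branches, not the real ones.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/left
git commit -q --allow-empty -m base
git branch right
git commit -q --allow-empty -m "left work"
git checkout -q right
git commit -q --allow-empty -m "right work"

# Merge via temporary names ...
git checkout -q -b tmp-left left
git branch tmp-right right
git merge -q --no-edit tmp-right

# ... then fast-forward both real branches to the merge point.
git checkout -q left
git merge -q --ff-only tmp-left
git checkout -q right
git merge -q --ff-only tmp-left

left_tip=$(git rev-parse left)
right_tip=$(git rev-parse right)
subject=$(git log -1 --format=%s left)
echo "left and right both at: $left_tip"
echo "merge subject: $subject"
```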

>> suspects (5.5, 5.6, 5.7, dev) and see which one has the closest
>> ancestor as git merge-base; or I pipe git shortlog 5.6...$branch | wc
>> -l and

> That looks like something not really trivial to capture in a script;

Indeed: git actually doesn't believe it's a meaningful question to ask.
I can move a branch name around arbitrarily, pointing it at the sha1 of
any commit in my object store, without making *any* changes to the
object store; only the file under .git/refs/heads/ changes.  You are
better off looking at your reflogs for information about what branched
off from where.
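
As a rough illustration, the sketch below repoints a branch and checks
that the object store is untouched, while the branch's reflog records
the move.  It assumes reflogs are enabled, which is the default for an
ordinary (non-bare) repository; the branch name "wander" is made up.

```shell
#!/bin/sh
# Sketch: moving a branch changes a ref (and its reflog), not objects.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/dev
git commit -q --allow-empty -m one
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m two

git branch wander                          # starts at "two"
objects_before=$(git count-objects | cut -d' ' -f1)
git branch -f wander "$first"              # now points at "one"
objects_after=$(git count-objects | cut -d' ' -f1)
wander_tip=$(git rev-parse wander)
echo "loose objects before/after: $objects_before/$objects_after"

# The reflog is local bookkeeping about where the name has pointed.
git reflog show wander
reflog_lines=$(git reflog show wander | wc -l | tr -d ' ')
```

The reflog lives only in your own clone and expires after a while, so
it is a record of what you did, not something others can consult.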

A branch is just a name that I'm temporarily giving to one commit while
preparing another, to which I'll soon move that name.  I sometimes
inadvertently make a bunch of commits on my local 5.6 (which I normally
keep shadowing a pristine origin/5.6); I could branch off a side-branch
while I'm doing that; sooner or later, I'll notice my mistake and do the

$ git branch local-changes
$ git reset --hard origin/5.6
$ git checkout local-changes

that gets my 5.6 back to where it's meant to be and gives a name to the
changes I'm working on.  If I forked off my side-branch from this set of
changes, several commits clear of where my 5.6 parted company with
origin/5.6 but before the tip at which I renamed the local development
to local-changes, do you think I branched my side-branch off from 5.6
(which might never again include the commit at which the side-branch set
off) or from local-changes (which didn't exist when I created the
side-branch)?  From git's point of view the only thing that it's
interesting to say is that there's a merge-base prior to which
local-changes and the side-branch share common history; and this
merge-base is more recent than their common merge-base with origin/5.6.
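
Here is that scenario replayed in a scratch repository, as a sketch.
There is no real remote: a hand-made refs/remotes/origin/5.6 stands in
for the fetched branch, and the branch name "side" and the commit
messages are invented.

```shell
#!/bin/sh
# Sketch: side-branch forked from commits later renamed local-changes.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/5.6
git commit -q --allow-empty -m "upstream tip"
git update-ref refs/remotes/origin/5.6 HEAD   # stand-in for origin/5.6

# Inadvertent local work on 5.6, with a side-branch forked part-way.
git commit -q --allow-empty -m "local 1"
git commit -q --allow-empty -m "local 2"
git branch side
fork=$(git rev-parse HEAD)
git commit -q --allow-empty -m "local 3"

# The clean-up from the text.
git branch local-changes
git reset -q --hard origin/5.6
git checkout -q local-changes

git checkout -q side
git commit -q --allow-empty -m "side work"

mb_side=$(git merge-base local-changes side)
mb_origin=$(git merge-base local-changes origin/5.6)
echo "local-changes / side share history up to:        $mb_side"
echo "local-changes / origin/5.6 share history up to:  $mb_origin"
```

The first merge-base is the commit where "side" was created; the second
is an ancestor of it.  Neither answer says which branch name "side" was
forked from.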

> you'd need to do `git merge-base $topic $branch` for all (remote)
> branches, and then check the returned commits against `git rev-list
> $topic` to see which comes first ... which might not even be correct
> under certain border cases.

I am quite sure that any heuristic for guessing inter-branch ancestry
relations from the ancestry digraph of the object store shall go wrong
in plenty of ways.  You might want to use git merge-base --fork-point
rather than the plain merge-base as raw material for those heuristics;
but it'll still be error-prone.
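
For what it's worth, the plain merge-base heuristic discussed above
might be sketched like this: for each candidate branch, find its
merge-base with the topic and count how many commits the topic has
made since; the candidate with the fewest wins.  The history and the
names (5.5, 5.6, dev, topic) are invented, and this is exactly the
kind of guess that goes wrong once branches are moved, rebased or
merged.  Using --fork-point instead would also consult the candidate's
reflog, which only exists in the clone where the work was done.

```shell
#!/bin/sh
# Sketch: guess which candidate branch a topic is "closest" to.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid
c() { git commit -q --allow-empty -m "$1"; }   # helper: one empty commit

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/5.5
c "5.5 base"
git branch 5.6
c "5.5 fix"
git checkout -q 5.6
c "5.6 work"
git branch dev
c "5.6 more"
git checkout -q -b topic        # topic starts at the tip of 5.6
c "topic 1"
c "topic 2"
git checkout -q dev
c "dev work"

best=
best_dist=
for b in 5.5 5.6 dev; do
    mb=$(git merge-base topic "$b")
    dist=$(git rev-list --count "$mb..topic")   # topic commits since mb
    echo "$b: merge-base $mb, $dist commits behind topic"
    if [ -z "$best_dist" ] || [ "$dist" -lt "$best_dist" ]; then
        best=$b
        best_dist=$dist
    fi
done
echo "closest candidate: $best ($best_dist commits)"
```

In this tidy history the guess is 5.6, which happens to be right; ties
and moved branches are where it stops being trustworthy.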

> Playing with it I get the impression that KDevelop does have a
> function to figure out parenthood; at least I now understand why it
> shows one of my 2 topic branches as a child of the other (both were
> created off the same commit).

If you care about "forked off from" relationships among branches, I
encourage you to use a naming scheme for your branches that lets you
keep track of this.

	Eddy.


