Debugging a Git clone issue

2022-11-12 on adnano.co

In January of 2022, an update to git.sr.ht added the ability to create new Git repositories by cloning an existing repository[1]. The feature takes a URL to clone from, initializes the repository, and completes the clone in the background. The feature was implemented with go-git, a Git implementation in Go.

[Image: screenshot of the clone web interface]

After a while, we started to get reports of users being unable to clone repositories which had been created using the new feature. Cloning these repositories with git clone would lead to strange errors. Oddly enough, these errors only appeared when cloning over HTTPS, not SSH.

$ git clone https://git.sr.ht/~user/goguma
Cloning into 'goguma'...
fatal: expected 'packfile'

Our first thought was that the cloned repositories produced by go-git were somehow invalid. We reported the issue on the go-git issue tracker, but the maintainers were not responsive. Meanwhile, we continued investigating internally.

As we investigated the problem, we noticed that clones would start working again after a few days. We attributed this to git gc, which runs periodically for all Git repositories on git.sr.ht, and confirmed it by running git gc manually on an affected repository, which fixed the issue.

Let's debug the issue. We'll start by cloning the repository again but with verbose output.

$ git clone -v https://git.sr.ht/~user/goguma
Cloning into 'goguma'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (452 bytes)
fatal: expected 'packfile'

Not very helpful. Let's increase the verbosity.

$ git clone -vv https://git.sr.ht/~user/goguma
Cloning into 'goguma'...
POST git-upload-pack (175 bytes)
want 49de801ae8ac0865e4fef50a311ba44b36a52250 (HEAD)
want 49de801ae8ac0865e4fef50a311ba44b36a52250 (refs/heads/master)
want 9768a49c170142b888c8980944303c2ba794a826 (refs/tags/v0.1.0)
want 7303c46eb27ac22b5de34fb8d867d82d7d06121f (refs/tags/v0.2.0)
want e7e6a1bf11431a37f45ff9cb1abd90bec9124b74 (refs/tags/v0.3.0)
want aa9980534db4bd25e2b78d360f7170e21ca01c21 (refs/tags/v0.4.0)
want 1638a79dcc58127a08f3d81732169b536f6f5546 (refs/tags/v0.4.1)
POST git-upload-pack (452 bytes)
fatal: expected 'packfile'

This is a little more helpful. We can also inspect the Git packets with GIT_TRACE_PACKET.

$ env GIT_TRACE_PACKET=1 git clone -v https://git.sr.ht/~user/goguma
Cloning into 'goguma'...
	packet:          git< version 2
	...
POST git-upload-pack (175 bytes)
	packet:          git> 0002
	packet:        clone< 49de801ae8ac0865e4fef50a311ba44b36a52250 HEAD symref-target:refs/heads/master
	...
POST git-upload-pack (467 bytes)
	packet:          git> 0002
	packet:        clone< 0002
fatal: expected 'packfile'

Compare this to the output for a successful clone:

$ env GIT_TRACE_PACKET=1 git clone -v https://git.sr.ht/~user/goguma
Cloning into 'goguma'...
	packet:          git< version 2
	...
POST git-upload-pack (175 bytes)
	packet:          git> 0002
	packet:        clone< 49de801ae8ac0865e4fef50a311ba44b36a52250 HEAD symref-target:refs/heads/master
	...
POST git-upload-pack (gzip 1117 to 597 bytes)
	packet:        clone< packfile
	packet:     sideband< PACK ...
	packet:     sideband< 0000
	packet:          git> 0002
	packet:        clone< 0002

Notice how the failed clone is completely missing the packfile packet. That's why the clone fails with fatal: expected 'packfile'. The odd thing is that the git-upload-pack endpoint returns a status code of 200 and there are no errors in the logs. It is failing silently.
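The four-digit values in the trace are pkt-line headers. In Git's wire protocol, each packet starts with four hex digits giving its total length, header included, and a few small values are reserved as control packets. The sketch below decodes such a header as an aid for reading the trace. The helper name pkt_kind is made up for this illustration and is not part of Git.

```shell
#!/bin/sh
# pkt_kind HEADER
# Hypothetical helper (not a Git command): classifies a 4-hex-digit
# pkt-line header as it appears in GIT_TRACE_PACKET output.
pkt_kind() {
    case "$1" in
        0000) echo "flush-pkt" ;;
        0001) echo "delim-pkt" ;;
        0002) echo "response-end-pkt" ;;
        # Any other value is the packet length, including the 4 header bytes.
        *)    echo "data-pkt ($(( 0x$1 - 4 )) payload bytes)" ;;
    esac
}

pkt_kind 0002   # response-end-pkt
pkt_kind 000d   # data-pkt (9 payload bytes), e.g. "packfile\n"
```

Read this way, the failed trace shows only 0002 packets after the second POST: the response ends without ever starting a packfile section.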

In an attempt to reproduce the issue, I wrote a simple script which would clone a repository with go-git, serve the cloned repository with nginx and git-http-backend, and then clone it again with git clone. Surprisingly, the clone succeeded! I was unable to reproduce the issue this way.

I thought there must be something else at play. I obtained two tarballs of an affected repository from production, one taken before git gc had run and one after. I extracted the tarballs and inspected the repository, expecting to find something wrong. Except, nothing was obviously wrong.

I decided to try to reproduce the issue again. I spun up an Alpine Linux image in qemu and installed meta.sr.ht and git.sr.ht from packages. I edited the nginx configuration so that I could connect over HTTP. I forwarded port 80 from the guest to the host. I then created a user, logged in, and cloned a repository from the web interface. This time, I was able to reproduce the issue.

$ git clone http://git.sr.ht.local/~user/goguma
fatal: expected 'packfile'

Now I needed to determine the cause. I compared the nginx configuration to the one I had used previously. Eventually, I narrowed it down to one line: the fcgiwrap socket path.

fastcgi_pass unix:/run/fcgiwrap/fcgiwrap.sock;

In my previous attempt to reproduce the issue, I had created an fcgiwrap socket manually instead of relying on the socket created by OpenRC. Let's take a look at the fcgiwrap init script used by OpenRC at /etc/init.d/fcgiwrap:

$ cat /etc/init.d/fcgiwrap
#!/sbin/openrc-run

name="fcgiwrap"
description="fcgiwrap cgi daemon"

command="/usr/bin/fcgiwrap"
command_background="yes"
user="fcgiwrap"
group="www-data"
: ${socket:=unix:/run/fcgiwrap/fcgiwrap.sock}

...

As you can see, OpenRC will execute fcgiwrap as the user fcgiwrap and the group www-data. nginx is also in the www-data group, so it will have access to the fcgiwrap socket.

Perhaps the issue has to do with permissions. The Git repositories are stored in the /var/lib/git directory, which is owned by the user git. Let's run fcgiwrap as the user git and see what happens.

$ su git -c 'fcgiwrap -f -s unix:/tmp/fcgiwrap.sock' &
$ chgrp www-data /tmp/fcgiwrap.sock
$ chmod g+w /tmp/fcgiwrap.sock

The nginx configuration needs to be edited so that fastcgi_pass points at our new fcgiwrap socket, unix:/tmp/fcgiwrap.sock. Now we can try cloning again.

$ git clone http://git.sr.ht.local/~user/goguma
Cloning into 'goguma'...
remote: Enumerating objects: 4077, done.
remote: Total 4077 (delta 0), reused 0 (delta 0), pack-reused 4077
Receiving objects: 100% (4077/4077), 630.40 KiB | 2.49 MiB/s, done.
Resolving deltas: 100% (2970/2970), done.

This works! But why is this an issue in the first place? /var/lib/git should be accessible to other users. Let's take a look at the problematic repository.

$ cd /var/lib/git/~user/goguma
$ stat -c '%a %n' **/*
644 config
644 git-daemon-export-ok
644 HEAD
755 objects
755 objects/info
755 objects/pack
644 objects/pack/pack-1c673b53da2f0bfe8a3399cee03e82b17247a69a.idx
600 objects/pack/pack-1c673b53da2f0bfe8a3399cee03e82b17247a69a.pack
755 refs
755 refs/heads
644 refs/heads/master
755 refs/remotes
755 refs/remotes/origin
...

Compare this to the output after git gc.

$ git gc
$ stat -c '%a %n' **/*
644 config
644 git-daemon-export-ok
644 HEAD
755 info
644 info/refs
755 objects
755 objects/info
444 objects/info/commit-graph
644 objects/info/packs
755 objects/pack
444 objects/pack/pack-9f22cdfa7bd58ed88636b390b65937e0f7090e3f.bitmap
444 objects/pack/pack-9f22cdfa7bd58ed88636b390b65937e0f7090e3f.idx
444 objects/pack/pack-9f22cdfa7bd58ed88636b390b65937e0f7090e3f.pack
644 packed-refs
755 refs
755 refs/heads
755 refs/remotes
755 refs/tags

Notice how the permissions on the .pack file in objects/pack change from 600 to 444. To test whether the 600 permissions are the source of the clone errors, we can make the packfile of a freshly cloned repository world-readable and see if the errors disappear.

$ chmod 0644 objects/pack/*.pack

This fixes the issue! We have now identified the cause. go-git creates packfiles with mode 600, readable only by their owner, so git-http-backend cannot read them when fcgiwrap runs it as a different user. This also explains why the issue could not be reproduced previously: in that setup fcgiwrap was running as the same user that owned the Git repository files.
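The permission bits make this concrete. Below is a rough sketch of the read check, ignoring root, ACLs, and directory permissions: a process is treated as owner, group, or other, and only the read bit of that one class counts. The function name can_read and its class argument are invented for this illustration.

```shell
#!/bin/sh
# can_read MODE CLASS
# Hypothetical helper: succeeds if a process that falls into CLASS
# (owner, group or other) may read a file with the octal MODE.
can_read() {
    case "$2" in
        owner) bit=0400 ;;
        group) bit=0040 ;;
        other) bit=0004 ;;
        *)     return 2 ;;
    esac
    # The leading 0 makes the shell parse MODE as an octal constant.
    [ $(( 0$1 & $bit )) -ne 0 ]
}

# fcgiwrap is not the owner (git). With mode 600, neither the group
# nor the other class has a read bit, so it cannot read the packfile.
can_read 600 owner && echo "600: readable by the owner"
can_read 600 other || echo "600: not readable by other users"
can_read 444 other && echo "444: readable by other users"
```

This matches what we saw: the 600 packfile is readable only by git, while the 444 packfile written by git gc is readable by everyone.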

To fix the issue, a patch for go-git is needed to create packfiles with the proper permissions. git-upload-pack should also be patched so that it errors out when this happens instead of failing silently. A clear error message from git-upload-pack would have made debugging this issue much easier.
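In the meantime, affected repositories can be found by looking for packfiles that lack the read bit for other users. This is a minimal sketch, not something git.sr.ht ships, and the function name is hypothetical.

```shell
#!/bin/sh
# find_unreadable_packs ROOT
# Hypothetical helper: lists packfiles under ROOT whose mode lacks
# the read bit for "other" (for example mode 600).
find_unreadable_packs() {
    find "$1" -type f -name '*.pack' ! -perm -004
}
```

Running find_unreadable_packs /var/lib/git (the repository root used in this post) prints one path per line. Each listed repository can then be repaired with git gc or chmod, as shown earlier.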


  1. Note that cloning repositories within the same git.sr.ht instance was already implemented. This was a generalization of that feature to allow cloning external repositories as well.