In my last post, I tried to install Mercurial, and it didn’t go well.

Solemn music plays over this black and white flashback:

$ nix-env -iA nixpkgs.mercurial
installing 'mercurial-5.6'
these paths will be fetched (3.45 MiB download, 14.95 MiB unpacked):
  /nix/store/7x5s4k57cw0a8nldypmm1y5f763k01kl-mercurial-5.6
copying path '/nix/store/7x5s4k57cw0a8nldypmm1y5f763k01kl-mercurial-5.6' from 'https://cache.nixos.org'...
Assertion failed: (size_ < capacity_), function push_back, file src/libexpr/attr-set.hh, line 54.
[1]    1968 abort      nix-env -iA nixpkgs.mercurial

So what if I… fixed it? What better way to learn “how Nix works” than to actually dive into the implementation? I figure I’ve got to learn something by doing this, even if I never get further than “how to read the definition of the mercurial derivation.”

First up: let’s see if it’s already fixed in the latest Nix master. To do that, I’ll need to figure out how to build Nix from source. Wouldn’t it be funny if I just couldn’t? Like if the Nix install instructions were just configure; make; pray?1

Let’s see. I already cloned the Nix repo a few days ago. I fetch and fast-forward it.

It has a README.md, but it just links to the Nix “hacking guide.” Sounds promising.

To build Nix for the current operating system/architecture use

$ nix-build

or if you have a flake-enabled nix:

$ nix build

I have no idea what a “flake-enabled nix” is, and it is not explained. I guess… let’s find out?

$ nix build
[19.2 MiB DL]

It seems to be stuck downloading things? Or now it’s building? No idea. For a few seconds that little prompt was animating. Now it’s just stuck. Oh, there it goes:

$ nix build
[0/24 built, 1/75/214 copied (128.9/1148.7 MiB), 40.9/205.2 MiB DL] fetching pcre-8.44 from https://cache.nixos.org

I wait a second. And then:

$ nix build
builder for '/nix/store/fcl2xslcvp9vinc910zfvirjgmdbcffb-nix-2.4pre19700101_52b6e0f.drv' failed with exit code 77; last 10 log lines:
  configure flags: --prefix=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f --bindir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/bin --sbindir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/sbin --includedir=/nix/store/byflz5mq5iafhxw26ak98wghvycn5zr7-nix-2.4pre19700101_52b6e0f-dev/include --oldincludedir=/nix/store/byflz5mq5iafhxw26ak98wghvycn5zr7-nix-2.4pre19700101_52b6e0f-dev/include --mandir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/share/man --infodir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/share/info --docdir=/nix/store/kcd4rkg6y1fi9xrdfirpk1nxqk91gi69-nix-2.4pre19700101_52b6e0f-doc/share/doc/nix --libdir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/lib --libexecdir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/libexec --localedir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/share/locale --sysconfdir=/etc
  checking for a sed that does not truncate output... /nix/store/mcz107ij67iidr86mm74sh98q193y7mq-gnused-4.8/bin/sed
  checking build system type... x86_64-apple-darwin20.3.0
  checking host system type... x86_64-apple-darwin20.3.0
  checking for the canonical Nix system name... x86_64-darwin
  checking for gcc... clang
  checking whether the C compiler works... no
  configure: error: in `/private/tmp/nix-build-nix-2.4pre19700101_52b6e0f.drv-0/source':
  configure: error: C compiler cannot create executables
  See `config.log' for more details
[0 built (1 failed), 214 copied (1148.6 MiB), 224.5 MiB DL]
error: build of '/nix/store/fcl2xslcvp9vinc910zfvirjgmdbcffb-nix-2.4pre19700101_52b6e0f.drv' failed

Hmmmmm. “C compiler cannot create executables.” That sounds… bad. Nix, didn’t you BYO C compiler? Why would you pick one that doesn’t work? The whole point of Nix is to make reproducible builds, isn’t it?

This is not a good start. This feels rather embarrassing for Nix.

It points me at config.log, but of course there is no such file. I suspect that it means /private/tmp/nix-build-nix-2.4pre19700101_52b6e0f.drv-0/source/config.log, but of course that whole directory has been deleted.

I sigh loudly.

$ nix build --help
Usage: nix build <FLAGS>... <INSTALLABLES>...

Summary: build a derivation or fetch a store path.

Flags:
      --arg <NAME> <EXPR>       argument to be passed to Nix functions
      --argstr <NAME> <STRING>  string-valued argument to be passed to Nix functions
      --dry-run                 show what this command would do without doing it
  -f, --file <FILE>             evaluate FILE rather than the default
  -I, --include <PATH>          add a path to the list of locations used to look up <...> file names
      --no-link                 do not create a symlink to the build result
  -o, --out-link <PATH>         path of the symlink to the build result

Examples:

  To build and run GNU Hello from NixOS 17.03:
  $ nix build -f channel:nixos-17.03 hello; ./result/bin/hello

  To build the build.x86_64-linux attribute from release.nix:
  $ nix build -f release.nix build.x86_64-linux

Note: this program is EXPERIMENTAL and subject to change.

There’s nothing here about “keep the build directory around on failure please.”

So maybe I don’t have a “flake-enabled nix?” Maybe that’s why it failed? I would certainly expect a very different looking failure if that were the case. But sure. Let’s try nix-build, because at least it has a --keep-failed:

$ nix-build --keep-failed
these derivations will be built:
  /nix/store/fcl2xslcvp9vinc910zfvirjgmdbcffb-nix-2.4pre19700101_52b6e0f.drv
building '/nix/store/fcl2xslcvp9vinc910zfvirjgmdbcffb-nix-2.4pre19700101_52b6e0f.drv'...
unpacking sources
unpacking source archive /nix/store/ifh6yng48bq3rd8hkib71gvv1prnd313-source
source root is source
patching sources
autoreconfPhase
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force
autoreconf: configure.ac: tracing
autoreconf: configure.ac: not using Libtool
autoreconf: running: /nix/store/x9zbnxr523rnhgrmwq1qsnx56sh9mmhj-autoconf-2.69/bin/autoconf --force
autoreconf: running: /nix/store/x9zbnxr523rnhgrmwq1qsnx56sh9mmhj-autoconf-2.69/bin/autoheader --force
autoreconf: configure.ac: not using Automake
autoreconf: Leaving directory `.'
configuring
configure flags: --prefix=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f --bindir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/bin --sbindir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/sbin --includedir=/nix/store/byflz5mq5iafhxw26ak98wghvycn5zr7-nix-2.4pre19700101_52b6e0f-dev/include --oldincludedir=/nix/store/byflz5mq5iafhxw26ak98wghvycn5zr7-nix-2.4pre19700101_52b6e0f-dev/include --mandir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/share/man --infodir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/share/info --docdir=/nix/store/kcd4rkg6y1fi9xrdfirpk1nxqk91gi69-nix-2.4pre19700101_52b6e0f-doc/share/doc/nix --libdir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/lib --libexecdir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/libexec --localedir=/nix/store/kkr83n49yici75b3h01aqmmr6wklph0d-nix-2.4pre19700101_52b6e0f/share/locale --sysconfdir=/etc
checking for a sed that does not truncate output... /nix/store/mcz107ij67iidr86mm74sh98q193y7mq-gnused-4.8/bin/sed
checking build system type... x86_64-apple-darwin20.3.0
checking host system type... x86_64-apple-darwin20.3.0
checking for the canonical Nix system name... x86_64-darwin
checking for gcc... clang
checking whether the C compiler works... no
configure: error: in `/private/tmp/nix-build-nix-2.4pre19700101_52b6e0f.drv-0/source':
configure: error: C compiler cannot create executables
See `config.log' for more details
note: keeping build directory '/private/tmp/nix-build-nix-2.4pre19700101_52b6e0f.drv-0'
builder for '/nix/store/fcl2xslcvp9vinc910zfvirjgmdbcffb-nix-2.4pre19700101_52b6e0f.drv' failed with exit code 77
error: build of '/nix/store/fcl2xslcvp9vinc910zfvirjgmdbcffb-nix-2.4pre19700101_52b6e0f.drv' failed

Alllright. Well, at least I have the config.log now. But there’s not a whole lot more information in it.

configure:3066: checking whether the C compiler works
configure:3088: clang    conftest.c  >&5
ld: file not found: /usr/lib/system/libcache.dylib for architecture x86_64
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
configure:3092: $? = 1
configure:3130: result: no
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "nix"
| #define PACKAGE_TARNAME "nix"
| #define PACKAGE_VERSION "2.4pre19700101_52b6e0f"
| #define PACKAGE_STRING "nix 2.4pre19700101_52b6e0f"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define SYSTEM "x86_64-darwin"
| /* end confdefs.h.  */
|
| int
| main ()
| {
|
|   ;
|   return 0;
| }
configure:3135: error: in `/private/tmp/nix-build-nix-2.4pre19700101_52b6e0f.drv-0/source':
configure:3137: error: C compiler cannot create executables
See `config.log' for more details

So it looks like it’s building a trivial little executable with clang, which fails with an error from ld.

Sigh. Let’s see what we can make of this.

$ echo 'int main () { return 0; }' >test.c

$ clang test.c

$ ./a.out

$ echo $?
0

Sure seems fine to me.

But obviously that’s my system clang – not clang-7 or whatever the Nix version of clang is. Let’s see if I can make that work:

$ nix-shell -p clang
(downloading a million packages)

[nix-shell:~/scratch]$ clang test.c

[nix-shell:~/scratch]$ ./a.out

[nix-shell:~/scratch]$ echo $?
0

Hrmmmm. Ooooookay. So I have no idea what Nix is doing wrong here.

Let’s see: I remember that nix-build tries to build whatever the default.nix file is when you run it without arguments. Let’s see what that is?

$ cat default.nix
(import (fetchTarball https://github.com/edolstra/flake-compat/archive/master.tar.gz) {
  src = ./.;
}).defaultNix

Hmmmmm. Very weird. We’re downloading something called flake-compat from GitHub?

I pull up that repo and see that it’s just a single file, default.nix:

$ curl -s https://raw.githubusercontent.com/edolstra/flake-compat/master/default.nix | head
# Compatibility function to allow flakes to be used by
# non-flake-enabled Nix versions. Given a source tree containing a
# 'flake.nix' and 'flake.lock' file, it fetches the flake inputs and
# calls the flake's 'outputs' function. It then returns an attrset
# containing 'defaultNix' (to be used in 'default.nix'), 'shellNix'
# (to be used in 'shell.nix').

{ src, system ? builtins.currentSystem or "unknown-system" }:

let

So maybe I need to know what “flakes” are. I cannot find any reference to them in the man pages. I ⌘F through the Nixpkgs manual to see if this is explained there and I just haven’t gotten to it yet. Nothin'.

I finally give in and google “nix flakes.”

The first result is the NixOS wiki entry on flakes. It’s very interesting.

Nix Flakes are an upcoming feature of the Nix package manager.

Okay. I have no idea when this was written, or if it is still upcoming, or how often the NixOS Wiki is updated.

Flakes allow you to specify your code’s dependencies (e.g. remote Git repositories) in a declarative way, simply by listing them inside a flake.nix file:

{
  inputs = {
    home-manager.url = "github:nix-community/home-manager";
  };
}

Each dependency gets then pinned, that is: its commit hash gets automatically stored into a file - named flake.lock - making it easy to, say, upgrade it:

$ nix flake update --update-input home-manager

(if you’re familiar with modern packages managers like cargo or npm, then the overall mechanism shouldn’t surprise you - Nix works in a similar way, although without a centralized repository.)

Flakes replace the nix-channels command and things like ad-hoc invocations of builtins.fetchgit - no more worrying about keeping your channels in sync, no more worrying about forgetting about a dependency deep down in your tree: everything’s at hand right inside flake.lock.

Neat! Okay. I think this answers the question I had about “how do I tell if I have a flake-enabled Nix:” I would presumably have a nix flake subcommand. And I do not. So.

I don’t understand how this replaces the “nix-channels command.” I assume that means the nix-channel executable? And I assume they only mean for some small use case? I don’t know. I don’t see how this would prevent me from needing to nix-channel --update to prime myself for those sweet, sweet upgrades. But I resolve not to worry about it for now.

Wouldn’t it be funny if you needed to use some cutting edge Nix feature in order to build the latest Nix? I don’t think that’s what’s going on: the nature of the failure makes me think this has nothing to do with whether or not I have flakes.

I just wish I had some idea what it does have to do with.

Man, at the beginning of this I thought it would be really funny if I had trouble building Nix with Nix. And now… it seems less funny. It seems more tragic.

I had initially assumed that this was some sort of macOS codesigning or SIP nonsense thing. But the nature of the failure makes me think not.

Maybe I should read the rest of the “hacking guide,” and it will explain this.

On a whim I try:

$ nix-build -A defaultPackage.x86_64-darwin

Just in case it’s inferring the wrong platform? Nope. Didn’t work. Same error. Kind of a shot in the dark anyway.

Well, the last thing the hacking guide offers is nix-shell. Let’s give it a shot.

$ nix-shell
(installing things)

[nix-shell:~/src/nix]$ ./bootstrap.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force
autoreconf: configure.ac: tracing
autoreconf: configure.ac: not using Libtool
autoreconf: running: /nix/store/x9zbnxr523rnhgrmwq1qsnx56sh9mmhj-autoconf-2.69/bin/autoconf --force
autoreconf: running: /nix/store/x9zbnxr523rnhgrmwq1qsnx56sh9mmhj-autoconf-2.69/bin/autoheader --force
autoreconf: configure.ac: not using Automake
autoreconf: Leaving directory `.'

[nix-shell:~/src/nix]$ ./configure $configureFlags --prefix=$(pwd)/outputs/out
checking for a sed that does not truncate output... /nix/store/mcz107ij67iidr86mm74sh98q193y7mq-gnused-4.8/bin/sed
checking build system type... x86_64-apple-darwin20.3.0
checking host system type... x86_64-apple-darwin20.3.0
checking for the canonical Nix system name... x86_64-darwin
checking for gcc... clang
checking whether the C compiler works... no
configure: error: in `/Users/ian/src/nix':
configure: error: C compiler cannot create executables
See `config.log' for more details

I mean, it would have been a lot more upsetting to me if that had worked. At least it’s consistent.

[nix-shell:~/src/nix]$ which clang
/nix/store/3mrwg8hplg82w3c4q2c2whkjq4wfly25-clang-wrapper-7.1.0/bin/clang

Alright. And just to make sure that it’s not the configure script…

[nix-shell:~/src/nix]$ clang ~/scratch/test.c
ld: file not found: /usr/lib/system/libcache.dylib for architecture x86_64
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)

So whatever this clang-wrapper nonsense is… doesn’t actually work. We know clang works! So what is that package? And why is it looking in /usr/lib/system, instead of some path in /nix/store? Isn’t the whole point of Nix to fix this problem?

I am at this point rather frustrated, but simultaneously a little bemused. I’m trying to fix a bug in a tool that’s supposed to give me reproducible builds by isolating the build environment. And I can’t build the tool itself because its build environment is apparently not isolated. It’s funny, no?

It’s a little bit funny. And a little bit frustrating.

So let’s see if we can’t figure this out.

$ nix-env -qa clang-wrapper --description
clang-wrapper-10.0.1  A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-11.0.1  A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-5.0.2   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-6.0.1   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-7.1.0   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-7.1.0   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-7.1.0   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-7.1.0   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-7.1.0   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-7.1.0   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-7.1.0   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-8.0.1   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)
clang-wrapper-9.0.1   A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)

Come on.

Alright, well, things are not… good right now. I really didn’t expect that it would take me this long just to build Nix, and it is now very late. I will have to return to this tomorrow, and see if I can get it working then.

Tomorrow and tomorrow and tomorrow:

$ nix-shell -p clang-wrapper
error: undefined variable 'clang-wrapper' at (string):1:94
(use '--show-trace' to show detailed location information)

$ nix-shell -p clang-wrapper-7.1.0
error: undefined variable 'clang-wrapper-7' at (string):1:94
(use '--show-trace' to show detailed location information)

Wha? Why doesn’t this package work with nix-shell?

$ nix-env -qa clang-wrapper-7.1.0 --json

30 seconds later…

{
  "nixpkgs.cc": {
    "name": "clang-wrapper-7.1.0",
    "pname": "clang-wrapper",
    "version": "7.1.0",
    "system": "x86_64-darwin",
    "meta": {
      "available": true,
      "broken": false, /* ARE YOU SURE */
      "description": "A c, c++, objective-c, and objective-c++ frontend for the llvm compiler (wrapper script)",
      "homepage": "https://llvm.org/",
      "insecure": false,
      "license": {
        "fullName": "University of Illinois/NCSA Open Source License",
        "shortName": "ncsa",
        "spdxId": "NCSA",
        "url": "https://spdx.org/licenses/NCSA.html"
      },
      "name": "clang-7.1.0",
      "outputsToInstall": [
        "out"
      ],
      "platforms": [ /* ... */ ],
      "position": "/nix/store/v43dzqk60bmd4rpksq5ix9gibcj18as9-nixpkgs-21.05pre273941.a2b0ea6865b/nixpkgs/pkgs/build-support/cc-wrapper/default.nix:497",
      "priority": 10,
      "unfree": false,
      "unsupported": false
    }
  },
  "nixpkgs.clang_7": {
    "name": "clang-wrapper-7.1.0",
    /* ... */
  },
  "nixpkgs.llvmPackages.libstdcxxClang": {
    "name": "clang-wrapper-7.1.0",
    /* ... */
  },
  "nixpkgs.llvmPackages.lldClang": {
    "name": "clang-wrapper-7.1.0",
    /* ... */
  },
  "nixpkgs.llvmPackages.lldClangNoCompilerRt": {
    "name": "clang-wrapper-7.1.0",
    /* ... */
  },
  "nixpkgs.llvmPackages.lldClangNoLibc": {
    "name": "clang-wrapper-7.1.0",
    /* ... */
  },
  "nixpkgs.llvmPackages.lldClangNoLibcxx": {
    "name": "clang-wrapper-7.1.0", 
    /* ... */
  }
}

Okay. So I suspect from this that the Nix derivation function takes nixpkgs.cc as an input, not clang, and that’s why we’re getting this weird wrapper script. Let’s see:

$ nix-shell -p cc

[nix-shell:~/src/nix]$ clang ~/scratch/test.c

[nix-shell:~/src/nix]$ echo $?
0

Nope! Wrong again. I wonder if -p cc does not mean the same thing as nixpkgs.cc.

I check the man page:

--packages / -p packages...
   Set up an environment in which the specified packages are present. The command line arguments are interpreted as attribute names
   inside the Nix Packages collection. Thus, nix-shell -p libjpeg openjdk will start a shell in which the packages denoted by the
   attribute names libjpeg and openjdk are present.

Okay. It does. So: what is going on here? This doesn’t seem to be a fundamental problem with Nix’s cc itself. Hmmmm, but maybe…?

[nix-shell:~/src/nix]$ which clang
/nix/store/8i0yaj3r82kqcdbr453wzzijnszxg4gx-clang-wrapper-7.1.0/bin/clang

That’s not the same clang-wrapper. The hash is different. So either I picked the wrong clang-wrapper and it’s not coming from cc, or something else is afoot.
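Eyeballing two 32-character hashes gets old fast. Here is a minimal sketch of that comparison, assuming the usual /nix/store/<hash>-<name> layout. The store_hash helper is hypothetical, not part of Nix; the two paths are the ones `which clang` printed in the two shells above.

```shell
# Hypothetical helper: print the hash component of a /nix/store path.
# Assumes the layout /nix/store/<hash>-<name>/...
store_hash() {
  path=${1#/nix/store/}          # drop the store prefix
  printf '%s\n' "${path%%-*}"    # keep everything before the first '-'
}

# The clang from the Nix repo's nix-shell (fails to link)...
broken=/nix/store/3mrwg8hplg82w3c4q2c2whkjq4wfly25-clang-wrapper-7.1.0/bin/clang
# ...and the clang from `nix-shell -p cc` (links fine).
working=/nix/store/8i0yaj3r82kqcdbr453wzzijnszxg4gx-clang-wrapper-7.1.0/bin/clang

if [ "$(store_hash "$broken")" = "$(store_hash "$working")" ]; then
  echo "same store path"
else
  echo "different store paths"
fi
```

Same name, same version, different hash: two distinct store paths that both happen to be called clang-wrapper-7.1.0.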

I’m definitely not going to check every single clang-wrapper-7.1.0. And I suspect at this point that there is actually a problem with “Nix flakes.” Like it’s somehow pinning my cc to a broken version, or something?

Let’s see:

$ cat flake.lock
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "lastModified": 1614309161,
        "narHash": "sha256-93kRxDPyEW9QIpxU71kCaV1r+hgOgP6/aVgC7vvO8IU=",
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "0e499fde7af3c28d63e9b13636716b86c3162b93",
        "type": "github"
      },
      "original": {
        "id": "nixpkgs",
        "ref": "nixos-20.09-small",
        "type": "indirect"
      }
    },
    "root": {
      "inputs": {
        "nixpkgs": "nixpkgs"
      }
    }
  },
  "root": "root",
  "version": 7
}

Alright. That gives me a rev, which I think is all I need to check this out.
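Copying the rev out by hand works fine, but for the record it can be scripted. This is a rough sketch that assumes the pretty-printed, one-key-per-line layout shown above; locked_revs is a hypothetical helper, and a real JSON parser would be more robust than sed. The sample file below is a trimmed copy of the lock file, written to a temporary path so the example stands alone.

```shell
# Hypothetical helper: print every "rev" value found in a flake.lock file.
# Assumes one key per line, as in the pretty-printed file above.
locked_revs() {
  sed -n 's/^ *"rev": "\([0-9a-f]*\)".*$/\1/p' "$1"
}

# Trimmed copy of the "locked" section, written to a temporary file.
lock=$(mktemp)
cat >"$lock" <<'EOF'
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "0e499fde7af3c28d63e9b13636716b86c3162b93",
        "type": "github"
      }
    }
  }
}
EOF

locked_revs "$lock"
```

With only one input in the lock file this prints a single rev; a flake with several inputs would print one line per input.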

$ cd ~/src/nixpkgs

$ git worktree add ../old-nixpkgs 0e499fde7af3c28d63e9b13636716b86c3162b93
Preparing worktree (detached HEAD 0e499fde7af)
Updating files: 100% (22713/22713), done.
HEAD is now at 0e499fde7af Merge pull request #114354 from aanderse/fix/flightgear

$ nix-shell --file ~/src/old-nixpkgs -p cc
error: unrecognised flag '--file'
Try 'nix-shell --help' for more information.

Exasperated sigh. nix-shell is not in the random subset of commands that uses ~/.nix-defexpr and has the --file flag. But -p, and I am quoting here, said it expects “attribute names inside the Nix Packages collection.” Well, what is the “Nix Packages collection,” then? I realize that I have no idea how nix-shell -p works.

man nix-shell eventually explains:

The -p flag looks up Nixpkgs in the Nix search path. You can override it by passing -I or setting NIX_PATH. For example, the following gives you a shell containing the Pan package from a specific revision of Nixpkgs:

$ nix-shell -p pan -I nixpkgs=https://github.com/NixOS/nixpkgs-channels/archive/8a3eea054838b55aca962c3fbde9c83c102b8bf2.tar.gz

[nix-shell:~]$ pan --version
Pan 0.139

Sigh okay that’s fancy but I kind of already have my own thing going on here with git worktree.

It’s very weird that the man page says it looks up “Nixpkgs.” Doesn’t it look up literally the string "nixpkgs"? I take great exception with capitalizing that there. Is it case-insensitive? That’s very confusing.

But okay:

$ nix-shell -p cc -I nixpkgs=$HOME/src/old-nixpkgs
(downloading things, as always...)

[nix-shell:~/src/nix]$ which cc
/nix/store/3mrwg8hplg82w3c4q2c2whkjq4wfly25-clang-wrapper-7.1.0/bin/cc

OKAY. Moment of truth:

[nix-shell:~/src/nixpkgs]$ cc ~/scratch/test.c
ld: file not found: /usr/lib/system/libcache.dylib for architecture x86_64
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)

Alright. Now we’re cooking.

Well, okay. So this is hilarious, right? The “flakes” feature exists to pin us to specific revisions of “channels” to ensure that we get a deterministic result independently of when users have last updated their Nixpkgs. And Nix uses this to pin itself to a broken C compiler.

So.

Cool.

Obviously I have many ways forward here. I know that there is a working clang on the latest Nixpkgs – or rather, on whatever the unstable Nixpkgs channel happened to be the last time I pulled. Which was probably a day or so ago.

But the right thing to do here is to report this as a bug. I expect that there are between zero and zero Nix maintainers who actually develop on macOS, and while this will probably be caught and fixed the next time they try to cut a release, that could be years from now.

But first: time has passed. I first encountered this bug on a Saturday night. It is now Monday night. It’s possible someone already fixed it. I should pull and see.

$ git fetch
remote: Enumerating objects: 74, done.
remote: Counting objects: 100% (74/74), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 74 (delta 49), reused 59 (delta 45), pack-reused 0
Unpacking objects: 100% (74/74), done.
From github.com:NixOS/nix
   52b6e0f83..1c0e3e453  master             -> origin/master
   3cdd46421..2a19287b8  2.3-maintenance    -> origin/2.3-maintenance
 * [new branch]          ca/sign-drvoutputs -> origin/ca/sign-drvoutputs

$ git ff
Updating 52b6e0f83..1c0e3e453
Fast-forward
 doc/manual/src/command-ref/opt-common.md | 9 ---------
 src/libcmd/command.hh                    | 2 ++
 src/libcmd/installables.cc               | 6 ++++++
 src/nix-build/nix-build.cc               | 1 +
 4 files changed, 9 insertions(+), 9 deletions(-)

Nope. No change to the flake.lock. Ugh. That means I have to talk to actual people and risk looking dumb on The Internet. Social anxiety sets in, and I feel a familiar doubt surface: what if I’m the problem here? What if it’s something weird about my computer? What if there’s already a report of this and I end up filing a duplicate that gets closed immediately? I search every term I can think of.

What if Linux people make fun of me for using a Mac?

Good thing I’m so brave that none of these thoughts have ever prevented me from reporting a bug to an open source project before.

I bite the bullet:

https://github.com/NixOS/nix/issues/4621

In the course of writing that report I learned that the nix-shell example in the man page:

$ nix-shell -p pan -I nixpkgs=https://github.com/NixOS/nixpkgs-channels/archive/8a3eea054838b55aca962c3fbde9c83c102b8bf2.tar.gz

Doesn’t work for arbitrary revisions, only for particular blessed ones that have an archive entry. But GitHub seems to expose arbitrary revisions through the API:

$ nix-shell -p cc -I nixpkgs=https://api.github.com/repos/NixOS/nixpkgs/tarball/0e499fde7af3c28d63e9b13636716b86c3162b93

And that works great. Neat.
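Put together, the pattern is: take a Nixpkgs commit hash, build the API tarball URL from it, and hand that to -I. Here is a small sketch, where nixpkgs_tarball_url is a hypothetical helper and the URL shape is just the one from the command above. It only prints the nix-shell command rather than running it.

```shell
# Hypothetical helper: build the GitHub API tarball URL for a Nixpkgs revision.
# The URL shape is copied from the command above.
nixpkgs_tarball_url() {
  printf 'https://api.github.com/repos/NixOS/nixpkgs/tarball/%s\n' "$1"
}

# The rev that flake.lock pins.
rev=0e499fde7af3c28d63e9b13636716b86c3162b93
url=$(nixpkgs_tarball_url "$rev")

# Print the command instead of running it, so this is safe to paste anywhere.
echo "nix-shell -p cc -I nixpkgs=$url"
```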

Anyway we did the thing, sort of.

As you might have seen in the issue I filed, I also tried just updating the flake to a recent Nixpkgs and trying again, and at first that appeared to work – it certainly got through configure, and managed to compile everything – but it eventually failed during linking:

  LD     /nix/store/2sdjhgn1aywr03sgdqw37h6g4lgsx3iz-nix-2.4pre19700101_dirty/lib/libnixmain.dylib
ld: file not found: /System/Library/Frameworks/Security.framework/Versions/A/Security for architecture x86_64
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [mk/lib.mk:100: /nix/store/2sdjhgn1aywr03sgdqw37h6g4lgsx3iz-nix-2.4pre19700101_dirty/lib/libnixmain.dylib] Error 1
builder for '/nix/store/sn9lynk5yznkh7sxijgfi6gzvakhr0dn-nix-2.4pre19700101_dirty.drv' failed with exit code 2
error: build of '/nix/store/sn9lynk5yznkh7sxijgfi6gzvakhr0dn-nix-2.4pre19700101_dirty.drv' failed

That… hmm. I don’t know what that is or what that is about. Is that something weird with my computer? I don’t know. That feels… less like an obvious problem with Nix and more like a possible problem with my weird ancient computer.

I mean, yes, it is still the case that Nix is supposed to make builds work independent of my system state and save me from having to figure out how my system dylibs are supposed to whatever. But I am sympathetic that, you know, it’s macOS. It’s not exactly designed to run software that I didn’t pay money for.

I think I’ll have to wait for someone with more Nix knowledge to chime in on the issue before I can hope to fix this bug. Or rather… to check whether the bug has already been fixed on latest master.

Hmm.

Something occurs to me.

I had assumed that the mercurial derivation was only broken on macOS, because I am on macOS, and I expect macOS to be a second class citizen in the Nix ecosystem.

But… maybe it’s not? If this problem exists on Linux as well, then I could compile Nix on my remote NixOS box and try to fix the issue there. So… let’s check it?

claudius $ nix-env -iA nixpkgs.mercurial
installing 'mercurial-5.6'
these paths will be fetched (15.30 MiB download, 79.47 MiB unpacked):
  /nix/store/1amg8fs88bj0ac06hbs1fbqf22c9rak5-readline-6.3p08
  /nix/store/cdz5vbnfp9vq84ir414cgnvzq63wp9m6-gdbm-1.19
  /nix/store/f7jzmxq9bpbxsg69cszx56mw14n115n5-bash-4.4-p23
  /nix/store/r8iamjpyzwy154g0dvr397gcls2crm3w-mercurial-5.6
  /nix/store/yl69v76azrz4daiqksrhb8nnmdiqdjg9-python3-3.8.8
copying path '/nix/store/f7jzmxq9bpbxsg69cszx56mw14n115n5-bash-4.4-p23' from 'https://cache.nixos.org'...
copying path '/nix/store/cdz5vbnfp9vq84ir414cgnvzq63wp9m6-gdbm-1.19' from 'https://cache.nixos.org'...
copying path '/nix/store/1amg8fs88bj0ac06hbs1fbqf22c9rak5-readline-6.3p08' from 'https://cache.nixos.org'...
copying path '/nix/store/yl69v76azrz4daiqksrhb8nnmdiqdjg9-python3-3.8.8' from 'https://cache.nixos.org'...
copying path '/nix/store/r8iamjpyzwy154g0dvr397gcls2crm3w-mercurial-5.6' from 'https://cache.nixos.org'...
nix-env: src/libexpr/attr-set.hh:54: void nix::Bindings::push_back(const nix::Attr&): Assertion `size_ < capacity_' failed.
[1]    19549 abort (core dumped)  nix-env -iA nixpkgs.mercurial

Okay! So even claudius can’t install Mercurial. It’s not just a macOS issue! Excellent. That means we can debug it.

Assuming… well, you know. Surely. Surely we can compile Nix here, of all places. Right? Right??

I am going to say this and desperately hope that it does not become an ironic epitaph to this blog series: this has to work. If we can’t build Nix on (latest stable!) NixOS, we may as well go home.

So let’s see.

claudius $ nix-build

And it’s building it! So far so good.

The build has been running for about an hour now without failure; I’ll come back tomorrow to see if it finishes.

Oh! There it goes. Okay. Build succeeded!

And let’s see if the bug still exists on master:

claudius $ result/bin/nix-build -A nixpkgs.mercurial -o mercurial
error: attribute 'nixpkgs' in selection path 'nixpkgs.mercurial' not found

Er, right. nix-build works completely differently from nix-env. Of course.

claudius $ result/bin/nix-build '<nixpkgs>' -A mercurial -o mercurial
/nix/store/r8iamjpyzwy154g0dvr397gcls2crm3w-mercurial-5.6

And it worked! That’s reassuring, at least.
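Judging only from the two invocations above, the difference seems to be this: nix-env -iA wants one dotted path whose first component names the channel, while nix-build wants the expression ('<nixpkgs>') and the attribute (-A mercurial) as separate arguments. Here is a sketch of that translation. The nix_build_args helper is hypothetical, and it assumes the channel name is also a valid search-path name, which happens to be true here but is not guaranteed.

```shell
# Hypothetical helper: turn a nix-env style path like "nixpkgs.mercurial"
# into the argument shape nix-build accepted above.
# Assumes the first dot-separated component is the channel name, and that
# the same name resolves on the Nix search path.
nix_build_args() {
  channel=${1%%.*}   # text before the first '.'
  attr=${1#*.}       # text after the first '.'
  printf "'<%s>' -A %s\n" "$channel" "$attr"
}

nix_build_args nixpkgs.mercurial
nix_build_args nixpkgs.llvmPackages.lldClang
```

Nested attributes keep their dots: only the leading channel component moves into the angle brackets.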

$ mercurial/bin/hg --version
Mercurial Distributed SCM (version 5.6)
(see https://mercurial-scm.org for more information)

Copyright (C) 2005-2020 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Okay. Whatever this bug was, it has already been fixed in master. I do not need to dust off my C++ helmet after all.

Which I really did expect. It would be super weird if a package slipped into Nixpkgs that could not be built by hand of man nor beast. I assume the maintainer of the Mercurial package is just using some bleeding edge version of Nix, and thus didn’t notice this problem. I check the package meta and see that the nixpkgs.mercurial maintainer is edolstra – the inventor of Nix – so yeah their running a freshly honed and bloody Nix seems pretty likely.

This makes me wonder about the CI situation here! I would expect, you know, that Nix would test that derivations can be built with the latest stable Nix before merging changes into Nixpkgs. I guess that that’s not the case? Troubling.2

Since the bug has since been fixed, I wonder if I can find the commit that does so and at least learn a little bit from all this.

On a whim I search the repo for push_back on the off-chance that someone included the exact assertion error in their commit message. I find this commit which references the same error message:

https://github.com/NixOS/nix/commit/8dbd57a6a5fe497bde9e647a3249c1ce0ea121ab

But that doesn’t seem to have anything to do with this – just an error when passing arguments via the command line. It’s the same assertion, but it’s coming from what I imagine is a pretty common code path: adding an attribute to a set.

Hmm. I search for assertion and find no likely culprits. I suspect this was fixed as part of a larger change and not a specific targeted patch for this one issue.

Well, I am very curious what the bug is now, even though it seems that I don’t have to fix it. If I can understand why the Mercurial derivation is broken, I could at least fix the derivation in Nixpkgs. Right?

git bisect is my first thought here. It’s a great tool when you’re working with an unfamiliar codebase, and you don’t have any intuition for where a bug might hide or when it might have been introduced.

Rebuilding every revision is going to take a really long time, but that’s not really a problem: I just need to remember how to get an auto-bisect working right. I usually just bisect by hand and feel guilty about all the time I’m wasting at work, because my past experiences have taught me that auto-bisecting is always more frustrating than it’s worth – coming up with a single script that runs the test you want at any point in time is… often hard.

But let’s figure it out! This can be the thing we learn today.

Er, by “today” I mean “in this blog post that I have been writing for the past six days.”

I read the man page for git-bisect and learn that you can use custom terms instead of bad and good. Neat! Usually I am bisecting to find the commit that introduced a bug, but now I’m trying to find a commit that fixes a bug, and using good/bad to mean buggy/fixed would be very confusing to me. I am delighted to learn that I can use git bisect terms if I need a reminder of the terms I decided to use. Very cute.

I learn that I need to exit 125 to skip a commit – which I’ll need to do if I encounter a commit that I cannot build. And I need to exit 0 to indicate an “old” commit – aka to indicate that the assertion failure is there. And I need to exit nonzero (anything from 1 to 127, other than 125) to indicate a “new” commit, aka one where the assertion failure has been fixed.

So a little bit of weird inversion, but nothing crazy. Could just be ! result/bin/nix-build, but I opt for something a little more explicit:

claudius $ cat ~/scratch/nix-detective
#!/usr/bin/env bash

set -euo pipefail

nix-build || exit 125

if result/bin/nix-build '<nixpkgs>' -A mercurial -o mercurial; then
  # the assertion failure has been fixed
  exit 1
else
  # the assertion failure is still here
  exit 0
fi
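
To keep the inversion straight in my head, here is a minimal sketch of how I understand git bisect run to read a script's exit status. The classify function is a made-up helper for illustration, not something git provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper: maps an exit code to what `git bisect run`
# does with it, as I read the git-bisect man page.
classify() {
  local code=$1
  if [ "$code" -eq 0 ]; then
    echo "old"     # "good" by default; here: the assertion failure is still there
  elif [ "$code" -eq 125 ]; then
    echo "skip"    # this commit cannot be tested (e.g. it does not build)
  elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
    echo "new"     # "bad" by default; here: the assertion failure is fixed
  else
    echo "abort"   # any other code stops the whole bisect
  fi
}

for code in 0 1 125 127 128; do
  printf '%3d -> %s\n' "$code" "$(classify "$code")"
done
```

The last case is the one to watch out for: a script that dies with a code above 127 ends the bisect instead of marking the commit.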

Let’s see if this works? It seems like there is almost no chance that I got this right on my first try. Let’s look up the git tag… just 2.3.10. Okay. Let’s go.

claudius $ git bisect start --term-old assertion-failed --term-new mercurial-is-fine

claudius $ git bisect mercurial-is-fine

claudius $ git bisect assertion-failed 2.3.10
Bisecting: a merge base must be tested
[b774845af7a645b44bff69cf9f655c47fe4b9fb2] Set release date

What’s that? I don’t think I’ve ever seen that message before. I google it, and find this helpful StackOverflow answer.

This will happen if the given good and bad revision are not direct descendants of each other.

Hmm. Is master not a descendant of 2.3.10? That’s kind of surprising. But not really – I would expect this to happen if you cut a new stable release with some backported patches or something. Everything is fine. Let’s run the script.
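
If I wanted to check that theory directly, I believe git merge-base --is-ancestor is the tool for it. Here's a sketch against a throwaway repository, so the names (trunk, release, v1) are all made up; in the Nix checkout the equivalent question would be whether 2.3.10 is an ancestor of master.

```shell
#!/usr/bin/env bash

# Build a tiny throwaway repo where a release tag has diverged from trunk.
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Wrapper so commits work even without a configured git identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

g commit -q --allow-empty -m "shared history"
git branch -M trunk
base=$(git rev-parse HEAD)
g commit -q --allow-empty -m "new work on trunk"

git checkout -q -b release "$base"
g commit -q --allow-empty -m "backported fix"
git tag v1

# Exit status 0 means "v1 is an ancestor of trunk"; 1 means it is not.
if git merge-base --is-ancestor v1 trunk; then
  relation="ancestor"
else
  relation="diverged"
fi
echo "v1 vs trunk: $relation"

# The merge base is the commit that bisect insists on testing first.
merge_base=$(git merge-base v1 trunk)
echo "merge base: $merge_base"
```

In this toy repo the tag sits on a side branch, so the check reports a divergence and the merge base is the shared commit, which is the same shape I'm guessing the 2.3.10 tag has relative to master.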

I’m so nervous.

claudius $ git bisect run ~/scratch/nix-detective
running /home/ian/scratch/nix-detective
error: getting status of '/home/ian/src/nix/default.nix': No such file or directory
warning: the merge base between 1c0e3e453d41b869e4ac7e25dc1c00c349a7c411 and [8803753666023882515404177b08f3f8bdad52a0] must be skipped.
So we cannot be sure the first mercurial-is-fine commit is between b774845af7a645b44bff69cf9f655c47fe4b9fb2 and 1c0e3e453d41b869e4ac7e25dc1c00c349a7c411.
We continue anyway.
Bisecting: 1611 revisions left to test after this (roughly 11 steps)

Huh. Weird. But true! Inspecting that commit shows there is no default.nix. I guess it’s so old that it… no; it’s only from September 2019. What year is it again? I have no idea.

Anyway, the script is running. I’ll come back in a few hours and see how it does. I briefly think about what my life was like before I started using tmux for everything all the time. Man. I should write a blog post about that.

I hope I don’t run out of disk space doing this. It sure seems like it’s downloading a lot of versions of different packages as it goes. I guess I’m going through a different flake rev every time I change commits. Should’ve had my script collect garbage in between runs.

Can I trigger a GC while this bisect is running? I think that it should work just fine. But I am nervous to do so. With only 4.4 jeebies left on my VPS, I’m not sure I’ll have much choice soon. Do you think Nix registers GC roots before it starts building a package? Probably, right? Let’s find out.

claudius $ nix-collect-garbage -d

Well it didn’t fail immediately. And yeah, it seems like it’s carrying on just fine. Good! I expected that to be okay but, you know, I wasn’t totally sure. But it occurs to me now that on a multi-user system it would be completely insane if garbage collection could break builds for other users, so of course it sets up the GC roots ahead of time.

Sigh. I come back some six hours later to find:

Bisecting: 497 revisions left to test after this (roughly 9 steps)
[5dafde28dbec379678da6d033cdc5c48856babf5] BinaryCacheStore: Add index-debug-info option
running /home/ian/scratch/nix-detective
error: getting status of '/home/ian/src/nix/default.nix': No such file or directory
There are only 'skip'ped commits left to test.
The first mercurial-is-fine commit could be any of:
497 INDIVIDUAL REVISIONS LISTED
We cannot bisect more!
bisect run cannot continue any more

Boo. I guess I should have paid more attention to that failure before.

And then… change the script to conditionally build… hrm. Hrm.

Startup idea: a single ./build executable shell command in the root of every single repository so that you can write bisect scripts that work regardless of how often you change your build invocation. Yes, of course I know that that’s the least of my problems shut up. Maybe a high level wrapper around bisect that like… interactively learns different ways to build/test revisions across history that progressively “learns” how the invocations change over time. Sort of like… sort of like running a normal interactive bisect. Except that in the happy case you can just leave it sitting for a few hours. This is nothing.

Let’s see. There’s actually a section in the manual that tells me how to build Nix. I assumed that this only applied to some ancient outdated version, since the Wiki taught me I could just run nix-build bare and that worked just fine. But it seems that this command applies in these old revisions that lack a default.nix:

$ nix-build release.nix -A build.x86_64-linux

I test this and it seems okay, although I don’t wait to see if the build actually completes. It leaves me with this slightly more complicated script:

claudius $ cat ~/scratch/nix-detective
#!/usr/bin/env bash

set -euo pipefail

if [[ -e default.nix ]]; then
  nix-build || exit 125
else
  nix-build release.nix -A build.x86_64-linux || exit 125
fi

if result/bin/nix-build '<nixpkgs>' -A mercurial -o mercurial; then
  # the assertion failure has been fixed
  exit 1
else
  # the assertion failure is still here
  exit 0
fi

No, I’m not gonna scope || over that entire if block to reduce code duplication. Do you know how || works? It’s – look it up. If you can figure out how to google that. It’s not the way you want it to work. It like disables the effect of set -e on the left-hand side of its argument in a super weird confusing way I don’t know the semantics I just know I’ve been burned before.
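
For what it's worth, the rule as I understand it from the Bash manual: set -e is ignored for every command in a && or || list except the last one, and that includes everything inside a function or block on the left-hand side. A minimal sketch, where build is a stand-in function and not a real build:

```shell
#!/usr/bin/env bash
set -e

reached_end=no
rescued=no

# Stand-in for a multi-step build whose first step fails.
build() {
  false              # on its own line this would abort the script under set -e
  reached_end=yes    # but on the left of || execution carries on to here
}

# set -e is suspended inside build, so the failing `false` is ignored and
# the function returns the status of its last command (0).
# The right-hand side never runs, and the failure is silently lost.
build || rescued=yes

echo "reached_end=$reached_end rescued=$rescued"
```

In my script each branch of the if runs a single command, so the scoped version would probably behave. It's multi-step blocks where this bites, and I would rather not have to think about which case I'm in.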

Anyway, I start over:

claudius $ git bisect reset
Previous HEAD position was b774845af Set release date
Switched to branch 'master'
Your branch is up to date with 'origin/master'.

claudius $ git bisect start --term-old assertion-failed --term-new mercurial-is-fine

claudius $ git bisect mercurial-is-fine

claudius $ git bisect assertion-failed 2.3.10
Bisecting: a merge base must be tested
[b774845af7a645b44bff69cf9f655c47fe4b9fb2] Set release date

claudius $ git bisect run ~/scratch/nix-detective
running /home/ian/scratch/nix-detective

And we shall return… in another six hours. Fortunately I remember to collect garbage before walking away from my computer to enjoy the last rays of sunshine on this beautiful spring day.

The merge base b774845af7a645b44bff69cf9f655c47fe4b9fb2 is mercurial-is-fine.
This means the first 'assertion-failed' commit is between b774845af7a645b44bff69cf9f655c47fe4b9fb2 and [8803753666023882515404177b08f3f8bdad52a0].
bisect run failed:
'bisect_state mercurial-is-fine' exited with error code 3

Okay no; that one failed pretty fast. Seems that the problem does not occur in the “merge base,” so the bug is (according to my bisect) never actually present in the master branch. It was introduced at some point after the 2.3.10 tag diverged from master.

So… that’s good, right? I mean, that really narrows the search space. It also means that there is a bug that has not been fixed yet. So we do get to debug it. Nice!

So let’s try to bisect that. It shouldn’t be too hard, and we’re back into familiar “find a bug” bisect territory, so we can just use the normal “good” and “bad” terminology.

And what… what am I doing here? What is my life right now? What has this series become? I’m way too deep into this to bail right now, so I push those thoughts deep down inside of me and get bisecting. Did you notice the wordplay there? Did you see it? Do you appreciate the things I do to spice up this already thrilling account of me fumbling my way through basic git commands?

$ git bisect reset start run good bad detective --help --where-is-the-forest --who-is-the-tree

Okay focus:

claudius $ git bisect start

claudius $ git bisect bad 2.3.10

claudius $ git bisect good 2.3
Bisecting: 74 revisions left to test after this (roughly 6 steps)
[7afd8321edbf94d19caa76b668133ae3d0e58eb3] libstore/ssh: Improve error message on failing `execvp`

claudius $ git bisect run ~/scratch/nix-detective
running /home/ian/scratch/nix-detective

And that worked!

He wrote, optimistically, after about ten seconds of nothing happening. We’ll be back… in another six hours.

Wow, time really flies on the internet huh.

1d5cb6ad4839a50a96c27c94f19adcb97b6391af is the first bad commit
...
bisect run success

Whoops.

$ git bisect log
git bisect start
# bad: [8803753666023882515404177b08f3f8bdad52a0] Merge pull request #4374 from NixOS/2.3-absolute-url-in-binary-caches
git bisect bad 8803753666023882515404177b08f3f8bdad52a0
# good: [22d4ea7a989d26b86fc27706dfea0abd2fb52c52] Tweak release notes
git bisect good 22d4ea7a989d26b86fc27706dfea0abd2fb52c52
# bad: [7afd8321edbf94d19caa76b668133ae3d0e58eb3] libstore/ssh: Improve error message on failing `execvp`
git bisect bad 7afd8321edbf94d19caa76b668133ae3d0e58eb3
# bad: [10bf5340ca35269153aca67ecd35f5419d0a08bc] Fix sandbox fallback settings
git bisect bad 10bf5340ca35269153aca67ecd35f5419d0a08bc
# bad: [65953789bcd73f098486b0a385b4e661c0ccda19] Remove world-writability from per-user directories
git bisect bad 65953789bcd73f098486b0a385b4e661c0ccda19
# bad: [f3ce4453a61fff960551322c1743f979f8c07e68] Don't catch exceptions by value
git bisect bad f3ce4453a61fff960551322c1743f979f8c07e68
# bad: [3c5788d09444b48c5ad82e8677d4ac5a58b94a3a] Fix typos in the Nix Manual.
git bisect bad 3c5788d09444b48c5ad82e8677d4ac5a58b94a3a
# bad: [1b78bbb4144c6ad4ef15f7a10fd9d479e06df5da] nix search: Don't quietly ignore errors
git bisect bad 1b78bbb4144c6ad4ef15f7a10fd9d479e06df5da
# bad: [1d5cb6ad4839a50a96c27c94f19adcb97b6391af] getSourceExpr(): Handle channels
git bisect bad 1d5cb6ad4839a50a96c27c94f19adcb97b6391af
# first bad commit: [1d5cb6ad4839a50a96c27c94f19adcb97b6391af] getSourceExpr(): Handle channels

I inverted what I was looking for… but I forgot to invert the script, so I got a nonsense answer: the very first commit made in the 2.3 branch, which appears to have nothing to do with anything. Man, that was really dumb.

Alright, well, pretend you didn’t see that. I’ll see you in another six hours!

Six hours later:

Oh I hate this so much. It finished, in the exact opposite way:

claudius $ git bisect log
git bisect start
# bad: [8803753666023882515404177b08f3f8bdad52a0] Merge pull request #4374 from NixOS/2.3-absolute-url-in-binary-caches
git bisect bad 8803753666023882515404177b08f3f8bdad52a0
# good: [22d4ea7a989d26b86fc27706dfea0abd2fb52c52] Tweak release notes
git bisect good 22d4ea7a989d26b86fc27706dfea0abd2fb52c52
# good: [7afd8321edbf94d19caa76b668133ae3d0e58eb3] libstore/ssh: Improve error message on failing `execvp`
git bisect good 7afd8321edbf94d19caa76b668133ae3d0e58eb3
# good: [2fad345ae1b316b9f08c31565374bc22e2fd37ec] Bump version
git bisect good 2fad345ae1b316b9f08c31565374bc22e2fd37ec
# good: [f09b375837e8139b4b06efeb6517265370b128c4] Prevent a deadlock when user namespace setup fails
git bisect good f09b375837e8139b4b06efeb6517265370b128c4
# good: [c67264d218f6595e8fd3f59dcf76c20350198b90] Bump version
git bisect good c67264d218f6595e8fd3f59dcf76c20350198b90
# good: [62f01d7ed3d5e6fb0dc1ed05be45e4e793bbf839] Bump version
git bisect good 62f01d7ed3d5e6fb0dc1ed05be45e4e793bbf839
# good: [0f359049157337a91b524b07d1ef122f75404f7b] Add support for \u escape in fromJSON
git bisect good 0f359049157337a91b524b07d1ef122f75404f7b
# good: [6de15f722daa709ee0ea5a32cc6b9b203d8bac4c] Allow HTTP binary cache to request absolute uris
git bisect good 6de15f722daa709ee0ea5a32cc6b9b203d8bac4c
# first bad commit: [8803753666023882515404177b08f3f8bdad52a0] Merge pull request #4374 from NixOS/2.3-absolute-url-in-binary-caches

It never found a bad commit.

Which means one of two things: the test that I’m doing to build mercurial isn’t working – I suspect that it isn’t actually building mercurial, in the case that I already have mercurial in my store. I assumed there was a problem just like evaluating the derivation, based on the error message, but maybe not.

The second thing is much more terrifying: that this bug doesn’t actually exist at 2.3.10. And that’s – surely not, right? But I never tested it! I just assumed that the tag 2.3.10 meant the Nix release 2.3.10 – and that version definitely has the bug. But was that not a good assumption?

I’ll have to build that revision by hand and test it manually.

I confirm that collecting garbage after removing the symlink does the thing that I want it to:

claudius $ rm -rf mercurial

claudius $ nix-collect-garbage
finding garbage collector roots...
removing stale link from '/nix/var/nix/gcroots/auto/xcj9j0ibwmjhw421sz2zz5g332gvczw4' to '/home/ian/src/nix/mercurial'
deleting garbage...
deleting '/nix/store/r8iamjpyzwy154g0dvr397gcls2crm3w-mercurial-5.6'
deleting '/nix/store/yl69v76azrz4daiqksrhb8nnmdiqdjg9-python3-3.8.8'
deleting '/nix/store/9gc6b7bazn23m0g7xcg9zv4j36kqraa7-mercurial-5.6.drv'
deleting '/nix/store/xys2byp05w12ixiswi69znw2yrfnfkd0-mercurial-5.6.tar.gz.drv'
deleting '/nix/store/1amg8fs88bj0ac06hbs1fbqf22c9rak5-readline-6.3p08'
deleting '/nix/store/cdz5vbnfp9vq84ir414cgnvzq63wp9m6-gdbm-1.19'
deleting '/nix/store/trash'
deleting unused links...
note: currently hard linking saves -0.00 MiB
6 store paths deleted, 77.14 MiB freed

So that in case it’s actually the first thing, I won’t be misled. See you in a bit.

Okay.

Bad news.

2.3.10 can build mercurial just fine.

What the hell is going on here?

I do a sanity check, and find that my stable nix-build can also build mercurial now. I scroll up, and see that… I didn’t just dream that this was an issue.

And I haven’t run nix-channel --update.

So what changed such that this works now?

I check my ~/.nix-defexpr/channels/manifest.nix and see that I’m on Nixpkgs f5f6dc053b1 – a commit from 2021-03-05 (it is now the 12th). So it hasn’t like… updated my channel in the background or something. That would have been crazy, but it was the first thing I thought of, because I do have NixOS auto-updating… but that’s a completely different channel on a completely different user.

But something about something has changed such that I can now build this package that I couldn’t build before.

Hmm.

Let’s do a sanity check: it’s still broken on my laptop, right?

$ nix-env -iA nixpkgs.mercurial
installing 'mercurial-5.6'
these paths will be fetched (3.45 MiB download, 14.95 MiB unpacked):
  /nix/store/ipymk6q0mdpvisyfmr3fz12s7yl2dcgq-mercurial-5.6
copying path '/nix/store/ipymk6q0mdpvisyfmr3fz12s7yl2dcgq-mercurial-5.6' from 'https://cache.nixos.org'...
Assertion failed: (size_ < capacity_), function push_back, file src/libexpr/attr-set.hh, line 54.
[1]    48643 abort      nix-env -iA nixpkgs.mercurial

Well thank goodness.

Which means…

Ugh. Which means I’m an idiot. Also on my laptop:

$ nix-build '<nixpkgs>' -A mercurial -o mercurial
/nix/store/ipymk6q0mdpvisyfmr3fz12s7yl2dcgq-mercurial-5.6

Heavy sigh. Well, that was a really dumb assumption. That’s on me.

So none of the bisects that I’ve done have meant… anything.

Reset everything to the initial state. Unwind the last, like, 20 pages of text you’ve read. We don’t even know if this bug exists on master or not, because I was checking the wrong thing this whole time. I feel very silly.

I am very curious though… installing and building aren’t very different, right? Like, aren’t we basically just building stuff and then making some symlinks? The error is coming not from the complex evaluation of a Nix expression, but just from the symlinking part? I don’t know. We’ll find out, I guess.

Aaaand:

claudius $ result/bin/nix-env -iA nixpkgs.mercurial -p ~/scratch/profile
installing 'mercurial-5.6'
nix-env: src/libexpr/attr-set.hh:54: void nix::Bindings::push_back(const nix::Attr&): Assertion `size_ < capacity_' failed.
[1]    32002 abort (core dumped)  result/bin/nix-env -iA nixpkgs.mercurial -p ~/scratch/profile

Yep. Okay. Gah. Stupid.

claudius $ git checkout master

claudius $ nix-build

Well, I’ve wasted a lot of claudius’s time here. The poor lad has just been compiling Nix in the background for the past like 24 hours, for no reason. I hope that this serves as a good lesson in, you know, not being an idiot. It was certainly humbling. I truly have made an ass out of you and me.

But one wrong turn will not keep us from our destination. Nor… three wrong turns. Maybe they were all left turns, and now we’re headed in the… right direction.

claudius $ result/bin/nix-env -iA nixpkgs.mercurial -p ~/scratch/profile
installing 'mercurial-5.6'
these 4 paths will be fetched (14.94 MiB download, 78.20 MiB unpacked):
  /nix/store/1amg8fs88bj0ac06hbs1fbqf22c9rak5-readline-6.3p08
  /nix/store/cdz5vbnfp9vq84ir414cgnvzq63wp9m6-gdbm-1.19
  /nix/store/r8iamjpyzwy154g0dvr397gcls2crm3w-mercurial-5.6
  /nix/store/yl69v76azrz4daiqksrhb8nnmdiqdjg9-python3-3.8.8
copying path '/nix/store/cdz5vbnfp9vq84ir414cgnvzq63wp9m6-gdbm-1.19' from 'https://cache.nixos.org'...
downloading 'https://cache.nixos.org/nar/1nh2lym6nmhm8d0hf0hlyrczxzac6p5fyk42z2wx6q6jsz1g7n44.nar.xz'...
copying path '/nix/store/1amg8fs88bj0ac06hbs1fbqf22c9rak5-readline-6.3p08' from 'https://cache.nixos.org'...
downloading 'https://cache.nixos.org/nar/1dfaz2awrdv01cqics7j3i0hyymhzx9y7gn3fs17pnm8aalbp2gx.nar.xz'...
copying path '/nix/store/yl69v76azrz4daiqksrhb8nnmdiqdjg9-python3-3.8.8' from 'https://cache.nixos.org'...
downloading 'https://cache.nixos.org/nar/0jdkfshgiqfz1r51w7710pyg2js61k0iijpcz873fz6j9fywvkxz.nar.xz'...
copying path '/nix/store/r8iamjpyzwy154g0dvr397gcls2crm3w-mercurial-5.6' from 'https://cache.nixos.org'...
downloading 'https://cache.nixos.org/nar/1cxpijnp5m0fvcrylxmd1k9p8m5q58sdg5zcp36dfr8lp37f9n9w.nar.xz'...
building '/nix/store/87c047xnxhjibf8h4wi8j4pxvxp48vn2-user-environment.drv'...
created 3 symlinks in user environment

Okay. So it is fixed in master. As, you know, as I expected. Now let’s do the correct bisect this time, so we can backport the fix to 2.3.10.

I very carefully restore the original script I had run to make sure that I get the old/new terminology right…

old = exit 0 = install-failed
new = exit 1 = install-succeeded

And just to review:

claudius $ cat ~/scratch/nix-detective
#!/usr/bin/env bash

set -euo pipefail

nix-env -p ~/scratch/profile --uninstall mercurial || true
nix-collect-garbage

if [[ -e default.nix ]]; then
  nix-build || exit 125
else
  nix-build release.nix -A build.x86_64-linux || exit 125
fi

if result/bin/nix-env -p ~/scratch/profile -iA nixpkgs.mercurial; then
  # install succeeded
  exit 1
else
  # install failed
  exit 0
fi

Wish me luck.

claudius $ git bisect start --term-old install-failed --term-new install-succeeded

claudius $ git bisect install-succeeded master

claudius $ git bisect install-failed 2.3.10
Bisecting: a merge base must be tested
[b774845af7a645b44bff69cf9f655c47fe4b9fb2] Set release date

claudius $ git bisect run ~/scratch/nix-detective

Alright. See you soon.

Oh hey! In the meanwhile I am notified that someone opened a PR that addresses the issue I posted above about not being able to build Nix on macOS. Neat! Although doing these long-running bisects is really much nicer on the remote machine, due to, you know, laptops wanting to take lots of naps. So that doesn’t really affect me yet. But I’m happy to see some movement there!

Anyway. Our bisect finished, and it looks like it actually succeeded:

d27eb0ef573b4739967119448779da4a8b2a2cbf is the first install-succeeded commit
commit d27eb0ef573b4739967119448779da4a8b2a2cbf
Author: David McFarland
Date:   Wed Dec 30 16:20:03 2020 -0400

    Fix insufficent attribute capacity in user profile

 src/nix-env/user-env.cc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
bisect run success

bisect run success indeed. Look at that beautiful actual bisect:

claudius $ git bisect log
git bisect start '--term-old' 'install-failed' '--term-new' 'install-succeeded'
# install-succeeded: [1c0e3e453d41b869e4ac7e25dc1c00c349a7c411] Merge pull request #4601 from lovesegfault/fix-4598
git bisect install-succeeded 1c0e3e453d41b869e4ac7e25dc1c00c349a7c411
# install-failed: [8803753666023882515404177b08f3f8bdad52a0] Merge pull request #4374 from NixOS/2.3-absolute-url-in-binary-caches
git bisect install-failed 8803753666023882515404177b08f3f8bdad52a0
# install-failed: [b774845af7a645b44bff69cf9f655c47fe4b9fb2] Set release date
git bisect install-failed b774845af7a645b44bff69cf9f655c47fe4b9fb2
# install-failed: [09fc06daab280735dd2ec94276f00a9c5bffd9b2] nix flake init: Use git add --force
git bisect install-failed 09fc06daab280735dd2ec94276f00a9c5bffd9b2
# install-failed: [9ee3122ec71ae43f5cd8bf0a9282777ba17342c5] Remove redundant import
git bisect install-failed 9ee3122ec71ae43f5cd8bf0a9282777ba17342c5
# install-failed: [1973669e868f4414b666d0fbd34f1a7a87322ae9] Merge pull request #4271 from wiltaylor/IgnoreReferenceSwitch
git bisect install-failed 1973669e868f4414b666d0fbd34f1a7a87322ae9
# skip: [17beae299d5e6bb511c453d0b9d0d7ef906b3d14] Support binary unit prefixes in command line arguments
git bisect skip 17beae299d5e6bb511c453d0b9d0d7ef906b3d14
# install-failed: [4d458394991f3086c3c9c306d000e6c0058c4fa7] Fix the detection of already built drv outputs
git bisect install-failed 4d458394991f3086c3c9c306d000e6c0058c4fa7
# install-succeeded: [0eb22db3116585821096b7b81295d4bbf5550343] Fix macOS build
git bisect install-succeeded 0eb22db3116585821096b7b81295d4bbf5550343
# install-succeeded: [d27eb0ef573b4739967119448779da4a8b2a2cbf] Fix insufficent attribute capacity in user profile
git bisect install-succeeded d27eb0ef573b4739967119448779da4a8b2a2cbf
# install-failed: [c14ed3f8b2cbddb335227d2ff5188896e76b713f] Add 'nix store' NAR-related manpages
git bisect install-failed c14ed3f8b2cbddb335227d2ff5188896e76b713f
# install-failed: [5178211e963fa111f84c4881b22cc506d5254fde] Add 'nix' manpage
git bisect install-failed 5178211e963fa111f84c4881b22cc506d5254fde
# install-failed: [8927cba62f5afb33b01016d5c4f7f8b7d0adde3c] Merge pull request #4366 from NixOS/readInvalidDerivation-on-remote-caches
git bisect install-failed 8927cba62f5afb33b01016d5c4f7f8b7d0adde3c
# install-failed: [5ef7e63ac61efcab020e64bca39ffdc1716718ed] Merge pull request #4399 from sevan/patch-1
git bisect install-failed 5ef7e63ac61efcab020e64bca39ffdc1716718ed
# install-failed: [093de16223b8b93d803e4cd1cc1d3945cb3dfeb1] README: fix link to hacking guide
git bisect install-failed 093de16223b8b93d803e4cd1cc1d3945cb3dfeb1
# install-failed: [abbf9df7b1c4219d5a6d3234d9149204208be7de] Merge pull request #4407 from cole-h/fix-hacking-link
git bisect install-failed abbf9df7b1c4219d5a6d3234d9149204208be7de
# first install-succeeded commit: [d27eb0ef573b4739967119448779da4a8b2a2cbf] Fix insufficent attribute capacity in user profile

Great. So from looking at the patch – it’s pretty small – my first thought is that basically mercurial has too many outputs. There is – or there was – a hardcoded maximum in here, and I guess that the mercurial derivation exceeds that maximum.

--- a/src/nix-env/user-env.cc
+++ b/src/nix-env/user-env.cc
@@ -53,10 +53,12 @@ bool createUserEnv(EvalState & state, DrvInfos & elems,
            output paths, and optionally the derivation path, as well
            as the meta attributes. */
         Path drvPath = keepDerivations ? i.queryDrvPath() : "";
+        DrvInfo::Outputs outputs = i.queryOutputs(true);
+        StringSet metaNames = i.queryMetaNames();

         Value & v(*state.allocValue());
         manifest.listElems()[n++] = &v;
-        state.mkAttrs(v, 16);
+        state.mkAttrs(v, 7 + outputs.size());

         mkString(*state.allocAttr(v, state.sType), "derivation");
         mkString(*state.allocAttr(v, state.sName), i.queryName());
@@ -68,7 +70,6 @@ bool createUserEnv(EvalState & state, DrvInfos & elems,
             mkString(*state.allocAttr(v, state.sDrvPath), i.queryDrvPath());

         // Copy each output meant for installation.
-        DrvInfo::Outputs outputs = i.queryOutputs(true);
         Value & vOutputs = *state.allocAttr(v, state.sOutputs);
         state.mkList(vOutputs, outputs.size());
         unsigned int m = 0;
@@ -88,8 +89,7 @@ bool createUserEnv(EvalState & state, DrvInfos & elems,

         // Copy the meta attributes.
         Value & vMeta = *state.allocAttr(v, state.sMeta);
-        state.mkAttrs(vMeta, 16);
-        StringSet metaNames = i.queryMetaNames();
+        state.mkAttrs(vMeta, metaNames.size());
         for (auto & j : metaNames) {
             Value * v = i.queryMeta(j);
             if (!v) continue;

But when I look at the mercurial package, I can see… well, that it’s constructed with python3Packages.buildPythonApplication. I trace that definition to pkgs/top-level/python-packages.nix:

buildPythonApplication = makeOverridablePythonPackage ( makeOverridable (callPackage ../development/interpreters/python/mk-python-derivation.nix {
    namePrefix = "";        # Python applications should not have any prefix
    toPythonModule = x: x;  # Application does not provide modules.
  }));

Alright, you know what, let’s just ask nix repl.

nix-repl> (import <nixpkgs> {}).mercurial.outputs
[ "out" ]

So, okay. My theory doesn’t hold up.

Let’s just look at the PR for this commit. I bet it describes the issue.

https://github.com/NixOS/nix/pull/4411

Aha. Too many outputs might have triggered an assertion failure, but in this case we’re hitting the “too many meta attributes,” which also has a hardcoded capacity. I imagine once upon a time there was some fixed set of meta attributes… or something… I dunno. It’s just a bug.

nix-repl> builtins.length (builtins.attrNames (import <nixpkgs> {}).mercurial.meta)
17
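
To convince myself that 17 really is one too many, here's a toy model of the failure in shell. It is only a sketch: the capacity of 16 comes from the old state.mkAttrs(vMeta, 16) call in the patch above, the 17 comes from the repl, and push_back here is a made-up shell function that imitates the assertion, not Nix's actual C++.

```shell
#!/usr/bin/env bash

capacity=16   # the old hardcoded mkAttrs(vMeta, 16)
size=0

# Made-up imitation of Bindings::push_back: refuse to grow past capacity.
push_back() {
  if [ "$size" -ge "$capacity" ]; then
    echo "Assertion failed: (size_ < capacity_)"
    return 1
  fi
  size=$((size + 1))
}

meta_attr_count=17   # what nix repl reported for mercurial.meta
result="ok"
i=1
while [ "$i" -le "$meta_attr_count" ]; do
  if ! push_back; then
    result="failed on attribute $i"
    break
  fi
  i=$((i + 1))
done
echo "$result"
```

Sixteen attributes fit and the seventeenth trips the check, which matches the patch replacing the constant with metaNames.size().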

Anyway, this was merged into master about 3.5 months ago. I was hoping that by pulling up the PR I would see the associated test case, but… nope. Not a thing.

I like test cases. I wonder how I could write one to demonstrate this bug.

But first off: let’s see if I can backport this patch on my laptop, as that would be a lot more convenient. The macOS build probably wasn’t broken when 2.3.10 was cut, right?

$ git checkout 2.3.10
Note: switching to '2.3.10'.

$ nix-build release.nix -A build.x86_64-darwin
...
configure: error: in `/private/tmp/nix-build-nix-tarball-2.3.10pre7057_8803753.drv-0/source':
configure: error: C compiler cannot create executables
See `config.log' for more details
build time elapsed:  0m1.078s 0m1.279s 0m8.135s 0m4.238s
builder for '/nix/store/nvbvw12ymdqvqbld2fhc7nzb1arpw6hq-nix-tarball-2.3.10pre7057_8803753.drv' failed with exit code 77
cannot build derivation '/nix/store/h61w59q5xf3gzqsg9s006vzbfqapfdjb-nix-2.3.10pre7057_8803753.drv': 1 dependencies couldn't be built
error: build of '/nix/store/h61w59q5xf3gzqsg9s006vzbfqapfdjb-nix-2.3.10pre7057_8803753.drv' failed

Sigh. ‘Kay.

Alright. I look around for tests. There is a directory called tests/. It looks like this:

$ ls tests
add.sh                      binary-cache.sh             brotli.sh
build-dry.sh                build-hook.nix              build-remote.sh
case-hack.sh                case.nar                    check-refs.nix
check-refs.sh               check-reqs.nix              check-reqs.sh
check.nix                   check.sh                    common.sh.in
config.nix                  dependencies.builder0.sh    dependencies.builder1.sh
dependencies.builder2.sh    dependencies.nix            dependencies.sh
dump-db.sh                  export-graph.nix            export-graph.sh
export.sh                   fetchGit.sh                 fetchMercurial.sh
fetchurl.sh                 filter-source.nix           filter-source.sh
fixed.builder1.sh           fixed.builder2.sh           fixed.nix
fixed.sh                    function-trace.sh*          gc-auto.sh
gc-concurrent.builder.sh    gc-concurrent.nix           gc-concurrent.sh
gc-concurrent2.builder.sh   gc-runtime.nix              gc-runtime.sh
gc.sh                       hash-check.nix              hash.sh
import-derivation.nix       import-derivation.sh        init.sh
install-darwin.sh*          lang/                       lang.sh
linux-sandbox.sh            local.mk                    logging.sh
misc.sh                     multiple-outputs.nix        multiple-outputs.sh
nar-access.nix              nar-access.sh               nix-build.sh
nix-channel.sh              nix-copy-closure.nix        nix-copy-ssh.sh
nix-profile.sh              nix-shell.sh                optimise-store.sh
parallel.builder.sh         parallel.nix                parallel.sh
pass-as-file.sh             placeholders.sh             plugins/
plugins.sh                  post-hook.sh                pure-eval.nix
pure-eval.sh                push-to-store.sh*           referrers.sh
remote-builds.nix           remote-store.sh             repair.sh
restricted.nix              restricted.sh               run.nix
run.sh                      search.nix                  search.sh
secure-drv-outputs.nix      secure-drv-outputs.sh       setuid.nix
shell.nix                   shell.shebang.rb            shell.shebang.sh*
signing.sh                  simple.builder.sh           simple.nix
simple.sh                   structured-attrs.nix        structured-attrs.sh
tarball.sh                  timeout.nix                 timeout.sh
user-envs.builder.sh        user-envs.nix               user-envs.sh

Jinkies.

Let’s see… user-envs.sh seems to contain the tests for nix-env, and uses this simple file as Nixpkgs to install:

$ cat user-envs.nix
# Some dummy arguments...
{ foo ? "foo"
}:

with import ./config.nix;

assert foo == "foo";

let

  makeDrv = name: progName: (mkDerivation {
    inherit name progName system;
    builder = ./user-envs.builder.sh;
  } // {
    meta = {
      description = "A silly test package";
    };
  });

in

  [
    (makeDrv "foo-1.0" "foo")
    (makeDrv "foo-2.0pre1" "foo")
    (makeDrv "bar-0.1" "bar")
    (makeDrv "foo-2.0" "foo")
    (makeDrv "bar-0.1.1" "bar")
    (makeDrv "foo-0.1" "foo" // { meta.priority = 10; })
  ]

And here’s an example of a test case that uses it, from user-envs.sh:

# Installing "*" should install one foo and one bar.
nix-env -e '*'
nix-env -i '*'
test "$(nix-env -q '*' | wc -l)" -eq 2
nix-env -q '*' | grep -q foo-2.0
nix-env -q '*' | grep -q bar-0.1.1

Alright. Very straightforward.

I just have to say that this is exactly the sort of thing that Mercurial’s “unified tests” were made for. Basically instead of writing the above, you’d write something like this:

Installing "*" should install one foo and one bar.

  $ nix-env -e '*' >/dev/null
  $ nix-env -i '*' >/dev/null
  $ nix-env -q '*'
  foo-2.0
  bar-0.1.1

And when your tests fail, you get much better output – just the diff between expected and actual. At my last job all tests were written this way – not just tests for command-line tools – and I am extremely spoiled by it. It’s the best way to test that I have ever encountered. Being able to just “observe” properties of your code and witness variations from the expected behavior as normal diffs is just… it’s so life-changingly good. I could wax poetic for hours.
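
To make the "failures show up as diffs" idea concrete, here is a minimal sketch in plain shell. It is not how Mercurial's test runner or cram is implemented; check_output and list_installed are hypothetical helpers, and list_installed stands in for nix-env -q '*' so the sketch runs without Nix.

```shell
#!/bin/sh
# Toy "expected output" test. list_installed is a hypothetical stand-in
# for `nix-env -q '*'`, so this runs without Nix installed.
list_installed() {
  printf '%s\n' foo-2.0 bar-0.1.1
}

# check_output EXPECTED COMMAND [ARGS...]
# Runs COMMAND and compares its stdout with EXPECTED. On a mismatch it
# prints a unified diff (expected vs. actual) and returns 1.
check_output() {
  expected=$1
  shift
  tmp=$(mktemp -d) || return 2
  printf '%s\n' "$expected" > "$tmp/expected"
  "$@" > "$tmp/actual"
  if diff -u "$tmp/expected" "$tmp/actual"; then
    rc=0
  else
    rc=1
  fi
  rm -rf "$tmp"
  return "$rc"
}

# The output matches line for line, so diff prints nothing.
check_output 'foo-2.0
bar-0.1.1' list_installed && echo PASS
```

If the expectation were only foo-2.0, the same call would print a diff containing the line +bar-0.1.1 and return a non-zero status, so the failure report is exactly the part of the output that changed.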

Anyway obviously we are not going to change Nix’s testing infrastructure because I happen to like another way of doing things. And honestly the only generic implementation I’ve used of this idea – cram – was kind of a nightmare in practice in a lot of ways that were entirely my fault. So I don’t realistically know… anything. I’m just rambling now.

The point of all of this is: I don’t see any regression tests here. The suite seems to cover only basic functionality; Nix doesn’t seem to be a project that proves the validity of each bug fix with a failing-to-passing test case. (As I had already guessed from the patch above, which contained no tests.)

But that doesn’t mean I’m not going to write one for myself.

Man, how do I just run tests? The command appears to be make installcheck, but that don’t work:

claudius $ make installcheck
  GEN    Makefile.config
/bin/sh: ./config.status: No such file or directory
  GEN    tests/common.sh
/bin/sh: ./config.status: No such file or directory
make: *** [mk/templates.mk:17: tests/common.sh] Error 127

Hmm. Doesn’t work from a nix-shell, either. I guess I need to go through the whole rigmarole in my nix-shell. Not sure why it’s not caching things or whatever from my nix-build… I feel like it should? I don’t know. I don’t know what nix-build release.nix does anyway.

But alright. I’m in a nix-shell. I ran all of the individual build steps. I make installcheck and…

[nix-shell:~/src/nix]$ make installcheck
running test tests/init.sh... [PASS]
running test tests/hash.sh... [PASS]
...many similar lines omitted....
running test tests/function-trace.sh... [PASS]
All tests succeeded

Well that’s a relief. I’m not sure how to just run the user-envs test… so I tweak the tests/local.mk file and comment out everything else. Probably fine, right?

[nix-shell:~/src/nix]$ make installcheck
running test tests/user-envs.sh... [PASS]
All tests succeeded

Excellent. I add a couple of “bad” derivations to tests/user-envs.nix, such that it now contains:

[
  (makeDrv "foo-1.0" "foo")
  (makeDrv "foo-2.0pre1" "foo")
  (makeDrv "bar-0.1" "bar")
  (makeDrv "foo-2.0" "foo")
  (makeDrv "bar-0.1.1" "bar")
  (makeDrv "foo-0.1" "foo" // { meta.priority = 10; })
  (makeDrv "many-outputs-0.1" "many-outputs" // {
    outputs = [
      "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
      "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
    ];
  })
  (makeDrv "many-meta-fields-0.1" "many-meta-fields" // {
    meta.a = 1; meta.b = 1; meta.c = 1; meta.d = 1; meta.e = 1;
    meta.f = 1; meta.g = 1; meta.h = 1; meta.i = 1; meta.j = 1;
    meta.k = 1; meta.l = 1; meta.m = 1; meta.n = 1; meta.o = 1;
    meta.p = 1; meta.q = 1; meta.r = 1; meta.s = 1; meta.t = 1;
    meta.u = 1; meta.v = 1; meta.w = 1; meta.x = 1; meta.y = 1;
    meta.z = 1;
  })
]

Interesting to see that attrs // { foo.bar = 1 } syntax (I stole it from the existing // { meta.priority = 10; } expression). Is this something that’s only allowed in an update operation? No; that would be crazy. It’s probably implicitly creating an empty set? I don’t remember anything in the manual about this. Let’s see:

$ nix repl
Welcome to Nix version 2.3.10. Type :? for help.

nix-repl> { foo = 1 }
error: syntax error, unexpected '}', expecting ';', at (string):1:10

nix-repl> { foo = 1; }
{ foo = 1; }

nix-repl> { foo.bar = 1; }
{ foo = { ... }; }

nix-repl> :p { foo.bar = 1; }
{ foo = { bar = 1; }; }

nix-repl> :p { foo.bar = 1; foo.baz = 2; }
{ foo = { bar = 1; baz = 2; }; }

nix-repl> :p { foo = { x = 1; }; foo.bar = 2; }
{ foo = { bar = 2; x = 1; }; }

Okay cool. Works the way I’d expect.

Far more interesting is that we’re using // when the left-hand side is a derivation. So… derivations are just sets? Is that… a thing? Or is // overloaded?

Let’s see.

nix-repl> (import <nixpkgs> {}).mercurial
«derivation /nix/store/jdn30bqb7k2ycjbw0653ryzfr38gvwkd-mercurial-5.6.drv»

nix-repl> :p (import <nixpkgs> {}).mercurial
«derivation /nix/store/jdn30bqb7k2ycjbw0653ryzfr38gvwkd-mercurial-5.6.drv»

I had sort of used this derivation as a set before, when I checked the number of outputs, but I didn’t really think about the fact that I was doing something weird, that I was dealing with this strange opaque thing.

nix-repl> (import <nixpkgs> {}).mercurial // { name = "capricious"; }
«derivation /nix/store/jdn30bqb7k2ycjbw0653ryzfr38gvwkd-mercurial-5.6.drv»

Hmm.

nix-repl> ((import <nixpkgs> {}).mercurial // { name = "capricious"; }).name
"capricious"

nix-repl> ((import <nixpkgs> {}).mercurial // { name = "capricious"; }).pname
"mercurial"

nix-repl> (import <nixpkgs> {}).mercurial // { name = "capricious"; pname = "capricious"; version = "6.0"; }
«derivation /nix/store/jdn30bqb7k2ycjbw0653ryzfr38gvwkd-mercurial-5.6.drv»

Okay; I have no idea. Gotta come back to that later.

Let’s run tests.

[nix-shell:~/src/nix]$ make installcheck
running test tests/user-envs.sh... [FAIL]
    + '[' -z '' ']'
    + clearStore
    + echo 'clearing store...'
    clearing store...
    + chmod -R +w /run/user/1000/nix-test/store
    + rm -rf /run/user/1000/nix-test/store
    + mkdir /run/user/1000/nix-test/store
    + rm -rf /run/user/1000/nix-test/var/nix
    + mkdir /run/user/1000/nix-test/var/nix
    + nix-store --init
    + clearProfiles
    + profiles=/run/user/1000/nix-test/var/nix/profiles
    + rm -rf /run/user/1000/nix-test/var/nix/profiles
    + clearProfiles
    + profiles=/run/user/1000/nix-test/var/nix/profiles
    + rm -rf /run/user/1000/nix-test/var/nix/profiles
    ++ nix-env -p /run/user/1000/nix-test/var/nix/profiles/test -q '*'
    ++ wc -l
    + test 0 -eq 0
    + mkdir -p /run/user/1000/nix-test/test-home
    + nix-env --switch-profile /run/user/1000/nix-test/var/nix/profiles/test
    ++ wc -l
    ++ nix-env -f ./user-envs.nix -qa '*'
    + test 8 -eq 6
1 out of 1 tests failed
make: *** [mk/tests.mk:12: installcheck] Error 1

Certainly not as nice as a diff, but the output makes it pretty clear what’s going on. I update that 6 to an 8 and re-run…

[nix-shell:~/src/nix]$ make installcheck
running test tests/user-envs.sh... [FAIL]
    + '[' -z '' ']'
    ...tons of output elided...
    installing 'many-meta-fields-0.1'
    installing 'many-outputs-0.1'
    these derivations will be built:
      /run/user/1000/nix-test/store/cz09h8w2n7zhmydxqzrr313imzw2fbvr-many-meta-fields-0.1.drv
      /run/user/1000/nix-test/store/d07hf54kxndnsgcxk6a4iyf2am63nq66-bar-0.1.1.drv
      /run/user/1000/nix-test/store/hawd6dykvyb2i94pw03pg0fg5k7wx04y-many-outputs-0.1.drv
    building '/run/user/1000/nix-test/store/d07hf54kxndnsgcxk6a4iyf2am63nq66-bar-0.1.1.drv'...
    building '/run/user/1000/nix-test/store/cz09h8w2n7zhmydxqzrr313imzw2fbvr-many-meta-fields-0.1.drv'...
    building '/run/user/1000/nix-test/store/hawd6dykvyb2i94pw03pg0fg5k7wx04y-many-outputs-0.1.drv'...
    nix-env: src/libexpr/attr-set.hh:54: void nix::Bindings::push_back(const nix::Attr&): Assertion `size_ < capacity_' failed.
    user-envs.sh: line 163: 17503 Aborted                 (core dumped) nix-env -i '*'
1 out of 1 tests failed
make: *** [mk/tests.mk:12: installcheck] Error 1

Great! Okay. That was all it took to tickle the bug. Now let’s try to cherry-pick the fix…

[nix-shell:~/src/nix]$ git stash
Saved working directory and index state WIP on (no branch): 880375366 Merge pull request #4374 from NixOS/2.3-absolute-url-in-binary-caches

[nix-shell:~/src/nix]$ git checkout -b backport-user-env-assertion-fix
Switched to a new branch 'backport-user-env-assertion-fix'

[nix-shell:~/src/nix]$ git cherry-pick d27eb0ef5
[backport-user-env-assertion-fix 2fe57daad] Fix insufficent attribute capacity in user profile
 Author: David McFarland
 Date: Wed Dec 30 16:20:03 2020 -0400
 1 file changed, 4 insertions(+), 4 deletions(-)

Well… that was a lot easier than I expected it to be.
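
The commit title and the assertion message together suggest the failure mode: room gets reserved for a fixed number of attributes, and adding more than that trips the check. Here is a toy model of that in shell. It is a sketch, not Nix's actual code: the function names are borrowed from the assertion message, and the capacity of 16 is a made-up number for illustration.

```shell
#!/bin/sh
# Toy model of a fixed-capacity attribute list. This is NOT Nix's code:
# the function names mirror the assertion message, and the capacity of 16
# is a made-up number for illustration.
capacity=0
size=0

# Reserve room for exactly $1 attributes.
new_bindings() {
  capacity=$1
  size=0
}

# Append one attribute named $1; fail if the reserved room is used up.
push_back() {
  if [ "$size" -ge "$capacity" ]; then
    echo "Assertion 'size_ < capacity_' failed while adding '$1'" >&2
    return 1
  fi
  size=$((size + 1))
}

# Add every name given as an argument, stopping at the first failure.
add_all() {
  for name in "$@"; do
    push_back "$name" || return 1
  done
}

# A handful of attributes fits comfortably in 16 slots.
new_bindings 16
add_all type name system outPath a b && echo "6 attributes: ok"

# 26 outputs (a through z) do not.
new_bindings 16
add_all a b c d e f g h i j k l m n o p q r s t u v w x y z \
  || echo "26 attributes: overflowed after $size"
```

Reserving room based on the real number of attributes (new_bindings 26 in this model) makes the overflow go away, which is presumably the general shape of the fix.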

I recompile, reinstall, and re-run my test.

[nix-shell:~/src/nix]$ make -j $NIX_BUILD_CORES
  CXX    src/nix-env/user-env.o
  LD     src/nix/nix

[nix-shell:~/src/nix]$ make install
  LD     /home/ian/src/nix/outputs/out/bin/nix

[nix-shell:~/src/nix]$ git stash pop
On branch backport-user-env-assertion-fix
... blah blah blah ...
Dropped refs/stash@{0} (6a9a703177db41f7ca275e3adf11c82df4f8e732)

[nix-shell:~/src/nix]$ make installcheck
running test tests/user-envs.sh... [FAIL]
    + '[' -z '' ']'
    ... elided ...
    created 4 symlinks in user environment
    ++ wc -l
    ++ nix-env -q '*'
    + test 4 -eq 2
1 out of 1 tests failed
make: *** [mk/tests.mk:12: installcheck] Error 1

Excellent. That’s actually the test I quoted above – the one that runs nix-env -i '*' and checks that foo and bar get installed. But of course now foo, bar, many-outputs, and many-meta-fields all get installed. I change that 2 to a 4 and re-run:

[nix-shell:~/src/nix]$ make installcheck
running test tests/user-envs.sh... [PASS]
All tests succeeded

Alright! Nice and easy. I’ve already forked Nix on GitHub, so:

[nix-shell:~/src/nix]$ git remote rename origin upstream

[nix-shell:~/src/nix]$ git remote add origin git@github.com:ianthehenry/nix.git

[nix-shell:~/src/nix]$ git push --set-upstream origin backport-user-env-assertion-fix

And now I have to go into a web browser to finish the process: opening my first Nix pull request. I wrote zero code and it took me less than a calendar week. Wow.

Okay. Man. I don’t think I can keep a diary of any future debugging. It is… slow. Somehow writing down everything I do and everything I think is slowing me down. Weird how that works.

I bet if I were really good at emacs I could set up some kind of fancy literate blog posting mode that would make this natural and easy. But instead I’m actually copying and pasting things into this markdown file. It takes its toll.


  • How do I --keep-failed with nix build?
  • What’s the difference between nix build and nix-build?
  • What is the not terribly gross way to run nix-shell -p with a package from a custom channel (i.e. without renaming it to "nixpkgs")?
  • Can I run Nix tests after nix-build without doing a full rebuild from nix-shell?
  • Are derivations sets or not? Why do they print differently? What makes them different or special or magical?

  1. As I do an editing pass over this post before publication, I feel the need to point out that I actually did write this before I tried to do anything, and before I had any idea what was about to happen to me. ↩︎

  2. Alas. If only I had followed through on this thought… I could have saved myself a lot of time. ↩︎