Debugging full disk space in an illumos VM

A troubleshooting story about running into full disk space errors on OpenIndiana with ZFS

Experimenting with illumos

I've been researching the inner details of different Unix systems, so in essence I've been running way too many virtual machines. My latest read has been the core internals of illumos-gate, a successor to OpenSolaris from Sun.

Similar to the Linux community, developers have created different distros based on illumos. However, the illumos-gate project itself doesn't just contain a kernel; it also ships a userspace with libraries and ready-to-use command line programs, much like the FreeBSD project. The distro I chose to explore was OpenIndiana, but others exist, such as OmniOS and SmartOS.

Note: I believe I read somewhere that the term gate refers to a specific project management concept from Sun's culture.

Booting OpenIndiana and installing Rust

OpenIndiana was painless to boot in a VM. My physical host is a Linux station so I've been running VMs in QEMU using virt-manager and virsh. With a successful boot, I then wanted to set up the illumos-gate repo within the VM itself. This was less straightforward because of stale documentation but after enough searching, I resolved the issues I had.

I now wanted to write a few programs for illumos in Rust but the default install process didn't work immediately. Fortunately, I found out you can install Rust through pkgin.

Update - 08/12/2021: I sent a PR to rustup to clear up the banner confusion with illumos distros. The default curl command should work, but it might need to be piped into bash.

Filling up ZFS

Originally, I set up the VM with 50GB of disk, which I thought would be enough to host the illumos-gate project and a few Rust programs. Yet after a day of use, random processes started failing with No space left errors. Oops.

By default, OpenIndiana ships with the much-discussed ZFS, a filesystem I had to learn a lot more about before I could fix the disk issue I was running into.

To verify my disk use, I ran zpool list:

$ zpool list
NAME       SIZE      ALLOC    FREE
rpool      49.8G     49.6G       0
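
To catch this before processes start failing, the same numbers can be checked from a script. The following is only a minimal sketch: it assumes zpool list accepts -H (no header, tab-separated) and -p (exact byte values), as documented for recent illumos and OpenZFS releases. The function name pools_over and the 80 percent threshold are made up for illustration.

```shell
# Hypothetical helper: reads tab-separated "name size allocated" lines
# (sizes in bytes) on stdin and prints every pool whose usage is at or
# above the percentage given as the first argument.
pools_over() {
    awk -F'\t' -v limit="$1" '
        $2 > 0 {
            pct = int($3 * 100 / $2)   # truncate to a whole percentage
            if (pct >= limit) printf "%s %d%%\n", $1, pct
        }'
}

# Intended use on a live system (assumes the -H and -p flags noted above):
#   zpool list -Hp -o name,size,allocated | pools_over 80
```

Keeping the zpool call outside the function means the parsing can be tried against saved output without touching a real pool.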

Nice, I used it up. I figured I'd just shut down the VM, add a new virtual disk, and reboot. virsh should make that easy enough.

Adding a new disk to ZFS

I was hoping that ZFS would pick up the new disk and just do its thing, or at least ask me about it (in retrospect, I understand that I was asking for too much). Instead, the boot process dropped into a bash shell after a wall of No space left errors, so I couldn't even get back into the desktop environment. Also, for some reason, I couldn't see my own /export/home files? I was logged in as root since I couldn't log in as my user. Okay, whatever, let me see if the virtual disk I just added is even registered:

$ diskinfo
TYPE    DISK    VID    PID     SIZE        RMV    SSD
ATA     c1d0    -      -        49.8 GiB    no     no

My main disk is listed but what about the other virtual one I just added? Maybe I didn't understand what diskinfo was supposed to return? Maybe ZFS has a way of seeing the physical device? There has to be a device node file somewhere, right?

Although optimistic in the beginning, I was slowly coming to the conclusion that, for some unknown reason, my VM couldn't see the new drive and that I would have to create a new VM and go through the song and dance of setting it up again. Was this new disk invisible because of a caching issue? An OS issue, or maybe even an underlying QEMU virtualization issue?

Pro tip: you can sometimes replace "illumos" with "Solaris" in your Google searches for better answers

The solution was hidden on this old Oracle Solaris page: Virtual Disks Do Not Display on Solaris 10 Guests.

The instructions listed there worked for me:

$ devfsadm -v
$ diskinfo
TYPE    DISK    VID    PID     SIZE        RMV    SSD
ATA     c1d0    -      -         49.8 GiB   no     no
ATA     c2s0    -      -        149.8 GiB   no     no
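
With more disks attached, spotting the new entry by eye gets harder. As a rough sketch, the two diskinfo listings above could be saved to files and compared on the DISK column. The function name new_disks and the file names are hypothetical, and the sketch assumes the column layout shown in this post.

```shell
# Hypothetical helper: prints disk names that appear in the second
# diskinfo listing but not in the first.
# $1: file with diskinfo output captured before running devfsadm
# $2: file with diskinfo output captured afterwards
new_disks() {
    awk 'NR == FNR { if (FNR > 1) seen[$2] = 1; next }   # first file: remember DISK names, skip header
         FNR > 1 && !($2 in seen) { print $2 }' "$1" "$2"  # second file: print unseen names
}

# Intended use (file names are placeholders):
#   diskinfo > before.txt; devfsadm -v; diskinfo > after.txt
#   new_disks before.txt after.txt
```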

Looks like devfsadm(1M) by default refreshes the list of device nodes found in /dev and /devices. Adding it to ZFS was easy enough:

$ zpool add rpool c2s0
$ reboot

And voila! OpenIndiana fully loaded for me on reboot. No more wall of errors.
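
Since adding a device changes the pool layout and can be hard to undo, a more cautious variant of the zpool add step would preview the change first. The sketch below relies on the -n option of zpool add, which prints the configuration that would result without modifying the pool. The wrapper name add_disk_checked and the confirmation prompt are hypothetical additions, not something zpool provides.

```shell
# Hypothetical wrapper: preview a pool change with `zpool add -n`,
# then apply it only after an explicit "y" on stdin.
add_disk_checked() {
    pool=$1
    disk=$2
    # Dry run: show the resulting layout; stop if zpool rejects it.
    zpool add -n "$pool" "$disk" || return 1
    printf 'Apply this layout to %s? [y/N] ' "$pool"
    read -r answer
    [ "$answer" = "y" ] || return 1
    # Real change: the device becomes part of the pool.
    zpool add "$pool" "$disk"
}

# Intended use with the names from this post:
#   add_disk_checked rpool c2s0
```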

Conclusion

As I mentioned earlier, a nice thing about the illumos project is that it ships not just the kernel but also the core userspace tooling. So to see how devfsadm(1M) works, I can read the source code directly in usr/src/cmd/devfsadm. Looks like there's a hidden flag to toggle verbosity levels!
