ZFS on Linux with LUKS encrypted disks

WARNING: if you do this wrong or don’t understand the concepts, you risk losing your data. Be sure you know your way around Linux and what you’re getting into before attempting this!

To me, encryption of data at rest is just as important as encryption of data in transit. You never know if someone is going to break into your house and steal your computer. With so much personal information like financial data and pictures stored on the computer, it could be a major mess to recover from theft of your computer. (Of course, always keep an off-site backup for the really important stuff!)

I chose to migrate from the Solaris-based OpenIndiana to Ubuntu. I had grown to love ZFS on OpenIndiana and didn’t want to lose its features. Luckily, ZFS on Linux is now ready for prime time! Unfortunately, like all other third-party implementations of ZFS, it is a few versions behind the official Oracle ZFS and does not support native encryption in the filesystem.

The Solution

My solution was to use the Linux Unified Key Setup (LUKS) to encrypt the raw devices underneath ZFS. LUKS is relatively new to the disk encryption space but is considered mature. The setup goes like this:

ZFS Filesystem     <- top
|
LUKS Encryption
|
Raw disks          <- bottom

Modified from ServerFault.

With this setup, the mapper devices that are created via the LUKS unlocking/opening process are simply presented as block devices that can then be leveraged by ZFS. The only gotcha is you can’t have your zpools mount on boot – which makes sense because if you’re using encryption you want to be able to unlock them first before they are usable. So on boot, you unlock the disks first, then import your zpools. I’ve provided a script below to make it easier.
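
For reference, the manual version of that sequence looks roughly like this (a sketch using the disk, mapper, and pool names from the examples later in this post; depending on your ZFS on Linux version, you may need to point zpool import at /dev/mapper with -d):

# unlock each LUKS container, then import the pool from the resulting mapper devices
cryptsetup luksOpen /dev/sdb sdb-enc
cryptsetup luksOpen /dev/sdc sdc-enc
sudo zpool import -d /dev/mapper tank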

Setup

Start with a base, fully up to date Ubuntu 14.04 LTS release, then install a few extra packages:

sudo apt-get install cryptsetup nfs-common nfs-kernel-server samba iscsitarget-dkms iscsitarget

To get ZFS on Linux installed, you need to add and install from a PPA:

sudo add-apt-repository ppa:zfs-native/stable
sudo apt-get update
sudo apt-get install ubuntu-zfs

Next, select the algorithm and key size to use in your setup. The larger the key size, the longer you will be safe from attack before needing to re-encrypt with a larger key, so use the largest key your performance budget allows. If you have a more modern CPU that supports AES-NI, there should be next to no CPU hit even with a 512-bit key. Also, you should use a cipher in the XTS block mode, as it has been reported there may be vulnerabilities in CBC. From the cryptsetup man page: “The available combinations of ciphers, modes, hashes and key sizes depend on kernel support. See /proc/crypto for a list of available options. You might need to load additional kernel crypto modules in order to get more options.”
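
To check whether your CPU advertises AES-NI, and to see which ciphers, modes, and hashes your running kernel already provides, two quick checks (standard on any Linux system, nothing specific to this setup) are:

# rough check: prints "aes" if the CPU feature flag for AES-NI is present
grep -m1 -o aes /proc/cpuinfo
# list the cipher/mode/hash names the kernel currently knows about
grep name /proc/crypto | sort -u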

To get a rough idea of performance for your CPU, you can run the `cryptsetup benchmark` command:

dan@storage:~$ cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       197993 iterations per second
PBKDF2-sha256     140786 iterations per second
PBKDF2-sha512      90145 iterations per second
PBKDF2-ripemd160  215578 iterations per second
PBKDF2-whirlpool  115992 iterations per second
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b   141.2 MiB/s   155.1 MiB/s
 serpent-cbc   128b    73.1 MiB/s   181.5 MiB/s
 twofish-cbc   128b   157.0 MiB/s   199.1 MiB/s
     aes-cbc   256b   111.5 MiB/s   121.4 MiB/s
 serpent-cbc   256b    74.4 MiB/s   182.4 MiB/s
 twofish-cbc   256b   158.7 MiB/s   198.6 MiB/s
     aes-xts   256b   151.4 MiB/s   153.5 MiB/s
 serpent-xts   256b   167.0 MiB/s   169.0 MiB/s
 twofish-xts   256b   181.9 MiB/s   181.2 MiB/s
     aes-xts   512b   115.7 MiB/s   118.1 MiB/s
 serpent-xts   512b   167.0 MiB/s   170.2 MiB/s
 twofish-xts   512b   182.2 MiB/s   180.4 MiB/s

My hardware is a bit slow (about three years old and without AES-NI), but I don’t need more than roughly 100 MB/s anyway: that saturates my gigabit Ethernet connection, and my primary use for this system is network-attached storage.

Next, encrypt your disks. Note that you don’t need to create partitions on the raw disks, since you’re effectively going to create ‘partitions’ via ZFS anyway. The example below formats the raw devices. Make sure you have the right disks selected and that there is no data you need on them, as everything on them will be rendered unrecoverable!
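
If you want to double-check which devices you are about to format, listing them by size, model, and serial number first is cheap insurance (these are standard util-linux/udev tools; adjust the device names to your system):

# match block devices to physical disks by size, model and serial number
lsblk -o NAME,SIZE,TYPE,MODEL,SERIAL
# the by-id symlinks encode model and serial, so they are easy to cross-check
ls -l /dev/disk/by-id/ | grep -v part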

EDIT: I’ve modified the commands to include --iter-time and --use-random for greater security at the suggestion of Valentin in the comments below. A value of 10000 means PBKDF2 will spend 10 seconds processing the passphrase each time you unlock the disk, slowing your mount but also slowing a brute force attempt against your key. I’ve also changed the cipher to aes-xts-plain64 per his suggestion. Thanks Valentin!

cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 --iter-time 10000 --use-random -y /dev/sdb
cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 --iter-time 10000 --use-random -y /dev/sdc

When running this command you’ll be prompted for a passphrase. Choose a very long and strong passphrase; its strength largely determines how vulnerable your disks will be. Also, it is a little counterintuitive from a security standpoint, but for ease of use, and so that the script provided below works smoothly, I suggest using the same passphrase for all the disks.
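
Note that LUKS supports multiple key slots, so if you later want to add a second passphrase, rotate one, or see which slots are in use, you can do so without re-encrypting (standard cryptsetup key-slot commands, shown against /dev/sdb as an example):

# add another passphrase (prompts for an existing one first)
cryptsetup luksAddKey /dev/sdb
# replace an existing passphrase with a new one
cryptsetup luksChangeKey /dev/sdb
# show the LUKS header, including which key slots are occupied
cryptsetup luksDump /dev/sdb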

Then, unlock them:

cryptsetup luksOpen /dev/sdb sdb-enc
cryptsetup luksOpen /dev/sdc sdc-enc

The final argument can be whatever you want to use to reference the encrypted volume. It will be placed in /dev/mapper and is what you will reference when creating your zpool.
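
You can verify that the containers opened correctly before building the pool (cryptsetup status is part of the stock cryptsetup package):

# the unlocked volumes appear as device-mapper nodes
ls -l /dev/mapper/
# show the cipher, key size and underlying device for one container
cryptsetup status sdb-enc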

Take note that if you have one of the newer 4K format drives, you will likely gain better performance from using an ashift value of 12 in order to have ZFS align with the sectors on the disk. You won’t hurt the drives by not aligning them – just your performance might not be as great, especially if you’re using raidz1 or raidz2. See this post for methods to determine if you have 4K drives: https://wiki.archlinux.org/index.php/Advanced_Format#How_to_determine_if_HDD_employ_a_4k_sector
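
As a quick local check (in addition to the methods in the ArchWiki link above), the kernel reports each drive’s logical and physical sector sizes; a physical sector size of 4096 means you want ashift=12:

# logical vs. physical sector size per drive (4096 physical = Advanced Format / 4K)
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdb /dev/sdc
# or read sysfs directly for a single drive
cat /sys/block/sdb/queue/physical_block_size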

This command will create a mirror using our encrypted volumes, with ashift set appropriately for 4K drives. If you still have 512-byte-sector drives, just remove the -o ashift=12 option.

sudo zpool create -o ashift=12 tank mirror sdb-enc sdc-enc

And that’s it! ZFS is now riding on LUKS encrypted disks and runs just like Oracle ZFS. Then you can do things like enable deduplication, compression, SMB or NFS via ZFS like you normally would.
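
For example, those features are all enabled with zfs set; the dataset name tank/media below is just a placeholder, and note that deduplication is very memory-hungry, so only turn it on if you know you need it:

# create a dataset and enable lz4 compression on it
sudo zfs create tank/media
sudo zfs set compression=lz4 tank/media
# share the dataset over NFS and SMB using ZFS's built-in sharing properties
sudo zfs set sharenfs=on tank/media
sudo zfs set sharesmb=on tank/media
# optional: deduplication (uses a lot of RAM)
sudo zfs set dedup=on tank/media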

Gotchas – solved with a script

In my testing, I crashed my system in every way and state I could think of to make sure my pools came back after reboot without problems. I’ve confirmed that if the power goes out, the pools will come back. ZFS isn’t fragile, and a scrub after an unclean shutdown does wonders for peace of mind, but it is a little disconcerting to forcefully close the LUKS containers on shutdown without having unmounted the zpools first. There is always the possibility of data loss if the pool dies in the middle of writing data. So I created a script to make rebooting the system easier and safer, as well as to help unlock LUKS and remount ZFS on boot via an included rc script.

Note that the script is generic. You need to modify the first few lines to include references to your zpools and LUKS disks and mapper names. If you use other services that leverage the pools such as iSCSI or ZFS native SMB/NFS, you might need to modify the script to add commands to stop those services before unmounting the zpools, otherwise they will be busy.

Boot process:

  1. Gets all zpool statuses and exports any active pools, which handles the case of an unclean shutdown where the zpools were not exported first
  2. Closes all open LUKS containers
  3. Opens all LUKS containers specified at the top of the script
  4. Imports all zpools specified at the top of the script
  5. Shares out any ZFS shares

Shutdown process:

  1. Unshares any ZFS shares
  2. Exports active zpools
  3. Closes all LUKS containers

It can be run with the parameters of status, mount, unmount, reboot, and shutdown.
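
If you would rather not use the full script, the mount path boils down to something like the sketch below; the disk list, mapper naming, and pool name are placeholders to adapt, and the real script in the repository adds status output, error checking, and the unmount/shutdown paths:

#!/bin/bash
# minimal unlock-and-import sketch; not the actual mountVolumes.sh
DISKS="/dev/sdb /dev/sdc"   # raw LUKS-formatted devices
POOL="tank"                 # zpool to import once the disks are unlocked

for disk in $DISKS; do
    name="$(basename "$disk")-enc"
    # skip containers that are already open, otherwise prompt for the passphrase
    [ -e "/dev/mapper/$name" ] || cryptsetup luksOpen "$disk" "$name"
done

# import the pool from the unlocked mapper devices, then share any ZFS shares
zpool import -d /dev/mapper "$POOL"
zfs share -a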

To cleanly shut down automatically on reboot or shutdown, copy the ‘storage’ script from the repository into /etc/init.d, then run this command to set the script up to stop automatically at each runlevel while not running on startup:

update-rc.d storage stop 99 0 1 2 3 4 5 6 .

Then when you startup, simply run `service storage start` and enter your passphrase to unlock the disks.

The script is available on my BitBucket here. If you have suggestions, submit a pull request!

References

http://serverfault.com/questions/586973/zfs-raid-and-luks-encryption-in-linux
http://www.heath-bar.com/blog/?p=203
http://linhost.info/2012/05/configure-ubuntu-to-serve-as-an-iscsi-target/
http://zfsonlinux.org/

46 thoughts on “ZFS on Linux with LUKS encrypted disks”

  1. Thanks for a remarkable post. As I need encryption I was going to go for mdadm RAID, since I couldn’t find a decent post on LUKS under ZFS.

    Quick question though, can you kindly explain where the 2 scripts should be placed and do I just add it to regular startup scripts?

    Many Thanks

    1. You’re welcome, glad it was useful! I put mountVolumes.sh in /root because that was convenient. You could alternatively put it in /etc or any other location. If you put it somewhere other than /root, you’ll have to modify the ‘storage’ script to point to the new location.

      I’d suggest you put the ‘storage’ script in /etc/init.d, then run this command to automatically run the ‘stop’ command and unmount everything when shutting down the system:
      update-rc.d storage stop 99 0 1 2 3 4 5 6 .

      Note, you shouldn’t set ‘storage’ to start on boot automatically because the boot process will hang until you get to the console of the server to enter the passphrase to unlock the disks. I handle this by letting the OS come up completely, then running “sudo service storage start” which will kick off the script, allow me to enter the passphrase, and then mount all the volumes.

      Hope that helps,
      Dan

  2. Amazing, thanks for your swift response.
    When I run “sudo update-rc.d storage stop 99 0 1 2 3 4 5 6” I get the following error
    user@user:/etc/init.d$ sudo update-rc.d storage stop 99 0 1 2 3 4 5 6
    update-rc.d: warning: start runlevel arguments (none) do not match storage Default-Start values (2 3 4 5)
    update-rc.d: warning: stop runlevel arguments (0 1 2 3 4 5 6) do not match storage Default-Stop values (none)
    update-rc.d: error: start|stop arguments not terminated by “.”
    usage: update-rc.d [-n] [-f] remove
    update-rc.d [-n] defaults [NN | SS KK]
    update-rc.d [-n] start|stop NN runlvl [runlvl] […] .
    update-rc.d [-n] disable|enable [S|2|3|4|5]
    -n: not really
    -f: force
    The disable|enable API is not stable and might change in the future.

    What can I do to resolve this? (I am using Ubuntu 14.04)

    1. I think you’re just missing a trailing space and period at the end of that command. Check the post above again – if you add the space and period at the end it should work.

  3. Yes, no error now.

    How can I test if it works? I can run sudo service storage, but is there any output to see whether it exports the pool before luksClose?

    Many thanks.

    1. If anything fails, the script should exit at the failure, print an error message, and not proceed any further. While you’re testing, you should manually check with ‘zpool status’ just to verify.

  4. sorry, just noticed there were some errors.

    user@user-main:/etc/init.d$ sudo update-rc.d storage stop 99 0 1 2 3 4 5 6 .
    update-rc.d: warning: start runlevel arguments (none) do not match storage Default-Start values (2 3 4 5)
    update-rc.d: warning: stop runlevel arguments (0 1 2 3 4 5 6) do not match storage Default-Stop values (none)
    Adding system startup for /etc/init.d/storage …
    /etc/rc0.d/K99storage -> ../init.d/storage
    /etc/rc1.d/K99storage -> ../init.d/storage
    /etc/rc2.d/K99storage -> ../init.d/storage
    /etc/rc3.d/K99storage -> ../init.d/storage
    /etc/rc4.d/K99storage -> ../init.d/storage
    /etc/rc5.d/K99storage -> ../init.d/storage
    /etc/rc6.d/K99storage -> ../init.d/storage

  5. I attached a USB drive, and on the next reboot one of the /dev/sdX devices couldn’t be mounted because its name was taken by the USB drive, so I rebuilt my ZFS setup using UUIDs instead.

    When I run sudo service storage stop, the PC shuts down instantly. When I do a normal restart I see nothing in the log file to say whether it exported and closed LUKS; is this normal behaviour?

    When I start the storage server it takes about 10 minutes to import the pool. Is it possible that on shutdown it does not export the pool?

    many thanks

  6. The problem was I was using stop instead of restart; with restart it looks fine. However, in the logs I only see
    Checking pool status:
    vpool: ONLINE
    Exporting pools… Checking pool status:
    vpool: unknown – not imported
    It doesn’t log anything about closing LUKS.
    It also seems that restarting the machine does not call this script; I have to run it directly.

    1. The update-rc.d command you ran earlier is what sets the script to be called when the machine is shutting down and restarting. It’s been a while since I wrote this script so I don’t recall exactly what is logged where. If you want to modify the script and either post your changes or submit a pull request on BitBucket, I’ll gladly incorporate them! I don’t have time right now to work on it myself though, sorry.

  7. You forgot to add “--iter-time”.

    Key stretching improves your security and allows you to use a shorter password while getting the same or better protection.

  8. Also adding “--use-random” will be great. The default is “--use-urandom”, which is not secure.

    --use-random

    --use-urandom
    For luksFormat these options define which kernel random number
    generator will be used to create the master key (which is a
    long-term key).

    See NOTES ON RANDOM NUMBER GENERATORS for more information. Use
    cryptsetup --help to show the compiled-in default random number
    generator.

    WARNING: In a low-entropy situation (e.g. in an embedded sys‐
    tem), both selections are problematic. Using /dev/urandom can
    lead to weak keys. Using /dev/random can block a long time,
    potentially forever, if not enough entropy can be harvested by
    the kernel.

  9. Please note that many other programs use weak key stretching by default.

    GnuPG has a limit of 65011712 for “--s2k-count”, so you can’t enter a number greater than that without modifying the source code.

    I have written a simple Python script to solve this problem:

    https://github.com/vstoykovbg/slowkdf

    It works with any application or system that is insecure because of too-weak or non-existent key stretching, and it can be used to create secure passwords for online services like e-mail or online banking (you only need to remember a simple password, a salt, and the number of iterations).

    For example, you remember the password “correct secure horse”, salt “sea salt”, number of iterations “10” and it gives you the password:

    Digest in hex format: 1424f5b8c2a5b9f2ca297e4253f55eea115ad48a18314189fb659f9d81ceba0060505b5e9a01ae6852c2a928be46e75a20f4d1be379ce2b7b85d63870fd32934e086be8f175edacb5fb7f744763429016f2fbf7065452b1121112ad5b3deaa2ba005f577cd75277f7a44f60c55c389e01c593a1b2577ab0547b1c42df794db2f

    Digest in base64 format: FCT1uMKlufLKKX5CU/Ve6hFa1IoYMUGJ+2WfnYHOugBgUFtemgGuaFLCqSi+RudaIPTRvjec4re4XWOHD9MpNOCGvo8XXtrLX7f3RHY0KQFvL79wZUUrESERKtWz3qoroAX1d811J396RPYMVcOJ4BxZOhsld6sFR7HELfeU2y8=

    You also need to remember which digest you use for your password and how many characters (because usually you have limitation on this). And of course, you need to keep the script and remember to use it.

    For example, if you need 40 characters, you get the first 40 characters from the base64 digest:

    FCT1uMKlufLKKX5CU/Ve6hFa1IoYMUGJ+2WfnYHO

    It is much more secure than “correct secure horse” and “correct secure horse sea salt 40”.

  10. The default password hashing algorithm is ripemd160.

    Maybe using SHA-512 would be more secure (ripemd160 is 160-bit, SHA-512 is 512-bit).

    Also aes-xts-plain64 is better than aes-xts-plain:

    XTS-PLAIN is perfectly fine when encrypting hard drives that are smaller than 2TB’s. However, using XTS-PLAIN on hard drives larger than 2TB’s will dramatically reduce your security because of repeating initialization vectors. Since XTS-PLAIN64 is backwards compatible with XTS-PLAIN, I usually just choose XTS-PLAIN64. However, if you are using a drive smaller than 2TB’s then there is no reason to use XTS-PLAIN64 over XTS-PLAIN. Also, make sure to correctly double your key size when using the XTS mode of operation to achieve your desired end-result. For example, if you wanted to use AES256 in the XTS mode of operation then you would want to specify your key size to be 512-bits, likewise specifying a key size of 256-bits would result in AES128 being used in the XTS mode of operation. This is because when you use XTS you need to double the key size to be as effective as the other modes of operation.

    KaosuX

    Source: http://ubuntuforums.org/showthread.php?t=2132044

    Example:

    sudo cryptsetup -v --cipher aes-xts-plain64 --key-size 512 --hash sha512 --iter-time 30000 --use-random luksFormat /dev/mydevice

  11. Sweet guide, can’t wait until I get to test this. One question though: if I’m going to create a zpool with two raidz1 vdevs, like this:
    pool ONLINE 0 0 0
    raidz1-0 ONLINE 0 0 0
    sde ONLINE 0 0 0
    sdf ONLINE 0 0 0
    sdg ONLINE 0 0 0
    raidz1-1 ONLINE 0 0 0
    sdh ONLINE 0 0 0
    sdi ONLINE 0 0 0
    sdj ONLINE 0 0 0
    Am I still able to use your script? Would ZFS “know” which sd* goes to which raidz1, or would I need to do something different with your script?

    1. Yup! ZFS stores a UUID for the pool with each disk that is part of it. The disks can be swapped around and get different sd* names each reboot, and as long as they are unlocked by LUKS ahead of time, ZFS will scan all available devices looking for the UUID of the pool being imported. In the case of the script though, after the LUKS unlock, the device names ZFS uses will be sd*-enc.
      Good luck!

  12. I just wanted to say thank you for such a helpful write-up. This has really helped me set up my own encrypted NAS.

    I did have one small beginner’s question. I noticed in your mountVolumes.sh script you call shutdown -h now in the shutdown case and reboot in the reboot case. Can you explain why that’s necessary?

    I’m very new to SysV, but wouldn’t those bits of code only be called in the event that the system is shutting down or rebooting? So why tell the system to shut down or reboot when that’s what it’s already doing? What am I missing?

    1. Hi! Glad my writeup was useful!

      Good question! If you followed my post directly and used update-rc.d to set storage to stop on all runlevels, you are correct: calling ‘shutdown’ and ‘reboot’ in the script is redundant. I put those in the script so you can run ‘sh mountVolumes.sh shutdown’ directly and all the same unsharing, unmounting, and exporting happens, and then the system is also shut down. So they are not strictly needed, but I included them for ease of use.

      1. Thanks for explaining that. I really appreciate it. I’ve been trying to adapt the script for CentOS 7 and chkconfig, so it helps to understand exactly why all of the pieces are there. I can’t seem to get it to run on shutdown or reboot. I’m thinking I might have to try to rewrite the “storage” part as a systemd service…should be fun. 🙂

        1. For CentOS 7, I got it working by just doing:

          chkconfig --add storage
          chkconfig --level 0123456 storage off

          Hope that helps!

  13. Thank you!! This was extremely helpful.

    I have one question. I notice that this script depends on device locations, which can change if I add another hard drive to my machine down the road. Is there a way this script could use drive IDs or something that won’t ever change?

    1. Glad it was useful. I’m not sure about using some kind of drive ID that won’t change. I’m not familiar with what a permanent drive ID would look like or how you would query it. If one exists though, you could:
      1. Specify the drive IDs that are part of your pools in the script
      2. Iterate over all the drive IDs and get the /dev/sd* name
      3. Plug the /dev/sd* name into the LUKS commands

      Let me know if you figure out a way to do it – sounds useful!

  14. Hi Dan:

    I followed the process as you explain it, but I get the following error when I run service storage start:

    root@pve:~# systemctl status storage.service
    ● storage.service – LSB: Thomasomalley storage mounting
    Loaded: loaded (/etc/init.d/storage)
    Active: failed (Result: exit-code) since Tue 2016-06-21 21:31:43 AST; 5s ago
    Process: 3344 ExecStart=/etc/init.d/storage start (code=exited, status=127)

    Jun 21 21:31:43 pve storage[3344]: /bin/bash: list our zpools to be mounted, one per line, no delimiter: No such file or directory
    Jun 21 21:31:43 pve systemd[1]: storage.service: control process exited, code=exited status=127
    Jun 21 21:31:43 pve systemd[1]: Failed to start LSB: Thomasomalley storage mounting.
    Jun 21 21:31:43 pve systemd[1]: Unit storage.service entered failed state.
    root@pve:~#

    Do you have any solution in mind for this? Thanks a lot, and again, great write-up.

    1. Hi! I haven’t fully tested the script with systemd yet, so I don’t know how to make it work with systemctl. I’d suggest you just run the script with the arguments directly, such as ./storage start

  15. Thanks for your fast response!!!

    I had to modify some lines in mountVolumes.sh due to the copy/paste, but now I’m good. Here is my current status:

    root@pve:~# systemctl status storage.service
    ● storage.service – LSB: Thomasomalley storage mounting
    Loaded: loaded (/etc/init.d/storage)
    Active: active (exited) since Tue 2016-06-21 22:18:50 AST; 26s ago
    Process: 2869 ExecStart=/etc/init.d/storage start (code=exited, status=0/SUCCESS)

    Jun 21 22:18:47 pve storage[2869]: Checking pool status:
    Jun 21 22:18:47 pve storage[2869]: S_zPool: unknown – not imported
    Jun 21 22:18:47 pve storage[2869]: Making sure all LUKS disks are closed…
    Jun 21 22:18:47 pve storage[2869]: Done.
    Jun 21 22:18:47 pve storage[2869]: Opening /dev/sdb to sdb-enc
    Jun 21 22:18:49 pve storage[2869]: Problem opening /dev/sdb!

    When I run service storage start, I am not asked for the passphrase. Is there a flag for that? As per the output, the script runs the openAllLUKS function without asking me for the passphrase, and when it tries to open the device, the key is not correct.

    1. Interesting. My guess would be that systemctl runs the script in the background. But if that were true it would probably hang waiting for input… so I’m not sure. It might just be easiest to run the script directly rather than from init.d. I really only set it up that way to make it a tiny bit easier to use and so I didn’t have to remember where the script was 🙂

      1. Confirmed, running mountVolumes.sh directly works. I also had to remove the shutdown elif, because during the reboot process that branch matched and shut down my server. Now I have tested reboot and power off, and my disks and zpool come back up with the script successfully.

        Thanks so much for your support and time on this. Now to test a Proxmox container on the zpool with reboot and power off. Take care!!!

  16. Dan,
    I believe I was the first to comment on this post; you sure saved the day for many people out there.
    I just upgraded to Ubuntu 16.04 and was wondering if you could make changes in the script for systemd, or advise what needs to be changed.
    (Ubuntu 16.04 supports ZFS out of the box: all you need is apt install zfs, no need to add a PPA…)

    1. Hey, I spent a little time on this in the past when I upgraded to 16.04, but I didn’t get very far. Right now I take the lazy way and just run the script manually. I’ll try to get back to it when I have time to learn how to set up the script for systemctl. If you get it modified successfully and want to share, I’d gladly incorporate your changes!

      1. OK, I will try to make it work for systemd.
        I noticed recently that with one drive disconnected the script gave an error and exited. I think it should show the error but continue to mount and import the pool; my ZFS setup allows for up to two drives to fail. If you can make that amendment it would be great.

  17. Way late to this post but still super handy!

    Also, for low-power or embedded devices with entropy issues, check out the package “haveged”; it helps!

  18. Just some points i can improve.

    You can boot Grub2 bootloader having it inside a ZFS over a LUKs over a LUKs over a LUKs … if you install it with grub2-install modules parameter and edit /etc/default/grub.cfg file.

    With commands on Grub2 you can also mount a Ext4 over a MBR disk over a File over a NTFS partition over a ZFS over a LUKs over a LUKS … etc. So putting that commands on grub.cfg you can read the kernel and initramfs from there.

    If you tweak the initrafs with that mount commands, you can boot a Linux with rootfs on whatever you want, like a ZFS over a LUKs over a LUKS over a LUKS.

    Please use at least Grub2 2.02~beta3, since Grub2 2.02~beta2 and prior have a BUG not allowing to mount LUKs on pre-boot.

    Yes, i am talking about pre-boot time. Grub2 is great for such… you can have main bootloader (the Grub2 i comment) inside a ZFS pool that resides on top of a LUKs container that also resides on top of a LUKs container, etc.

    And also can have (and boot from) a Stripped, Mirrored, etc, set as this: ZFS uses any number and any kind of “containers” (files, partitions, disks, LUKs, etc)… putting Grub2 inside that ZFS (if installed with correct modules) can pre-boot and show the menu.

    Let me explain with a sample boot time steps: PowerON, BIOS (yes 32bit system only) runs, check for boot disk, load first sector (MBR, GPT, etc), run that code (aka, Grub2), then Grub2 loads the pre-boot modules (ZFS, LUKS, NTFS, Ext*, etc), since ZFS uses multiple disks and all them are with LUKs… it starts from low level till reach ZFS level, then load grub.cfg and precess it, showing the menu.

    That low to top level can be: Ask for LUKs over HDD0, Ask for LUKs over LUKS over HDD0, Ask for LUKs over LUKs over LUKS over HDD0, Ask for LUKs over HDD1, Ask for LUKs over LUKS over HDD1, Ask for LUKs over LUKs over LUKS over HDD1, Ask for LUKs over HDD2, Ask for LUKs over LUKS over HDD2, Ask for LUKs over LUKs over LUKS over HDD2, … and so on… till all “containers” needed for the ZFS are accessible (warning: Grub2 version must be 2.02~beta3 at least for multi-LUKS work)… also at pre-boot it can use files as containers (if LOOP module is installed for pre-boot)… and so on… when all are accesible, ZFS is mounted (if ZFS module is installed for pre-boot)… so at end it reaches a point where grub.cfg can be readed.

    I know installing Grub2 that way (by using normal command line tools of Ubuntu and such things is tedious, editing a lot of files, etc)… i never ever install Grub2 that way.

    I prefer to have a chain of two bootloaders, one under my total control (first one to boot on) and one for each system under the total control of that system, isolated… i just loose one second on each boot and i gain i must not touch bootloader if any of the multiple OS touch the bootloader (mine only does a chain load to the other bootloader).

    That way i manual install Grub2 with grub2-install command, only that command… then i create grub.cfg by hand with the simpliest menu you can imagine, one entry per OS… doing so i can control where i have Grub2 stuff… inisde a striped set of disks, on a USB disk/stick, on both and use any of them at any time, etc… but the best part is that such Grub2 main bootloader can be inside whatever levels i want of: ZFS (with stripping/mirror/mixed), LUKS, LVM, etc. i mean not just one level, also more levels.

    My complex boot test was (where i have grub.cfg): ZFS pool that uses a striped set of 8x mirrors, each mirror of 3x containers, each of thoose containers being (files, partitions, disks, LUKs, LVM partitions, mixed typed, etc, a lot of tests i had done).

    Just a simple sample of grub.cfg be on a ZFS pool that uses a mirror of 2 sets:

    Set0 (stripping):
    LUKs over LVM over LUKs over HDD0 over Sata0 (Sata-III)
    LUKs over LUKS over LUKs over HDD1 over Pata0_Master (IDE)
    LUKs over LUKs over HDD2 over SAS0 (SAS)

    Set1 (stripping):
    LUKs over LUKs over LUKs over LUKS over LUKs over HDD3 over USB0 (USB 2.0)
    LUKs over LUKs over HDD5 over USB1 (USB 3.0)
    LUKs over Loop File over LUKs over USB2 (USB 3.1 Gen2 Type C)

    You can make it as complex as wanted.

    For security reasons:
    Never ever use AES (i helped on code programing of a break that works fast for AES-128 to AES-8192, it does not get the Key, but get the data on clear, plain decrypted at fly (after 5seconds for AES-128 and one hour for AES-8192, on supercomputing) without the need for the key; i mean key, not password, passphrase, etc… it can also be used to recover from damaged master key), use TwoFish and Serpent (at least by now i do not know they are broken)… and also do not use Whirpool as hash, use SHA-512 at least, SHA-256 is also broken (a friend of me show me a run test)… that, AES and Whirpool breakers will not be public till military agancies do not migrate, too risky by now (we recieved a big warning to reveal that code, our lives in risk).
    Never ever use only one layer of LUKs… allways put a LUKs over a LUKs over a LUKS, … and so on, each with different parameters.

    The iter-time parameter of LUKs does not represent “time” as said on wiki/docs… it represent iterations… so better use a value between 10K and 50K… 10000 causes a delay of near 30 second (at pre-boot) and a few seconds (near 5) on normal mount when OS is loaded.
    Security risk: use a fixed value, like 10000 (that exact one), better use a random number and better if it is not integer divisible.

    I tend to use for iter values over 100K and sometimes (laptops) over 500K and near 1M… warn about doing so… at boot-time after typing the passphrase you can need to wait up to 3000 seconds (near 50 minutes, yes close to one hour) and since i use stripes (multi-disk) and 3 to 8 levels of LUKs (variable on each stripe) i can need to enter about 15 of thoose, so 15*50=750minutes just to give acces to grub.cfg file and show the menu, that is 12 and a half hours… i use that for Laptops internal boot.

    I am paranoid, i am a hacker, etc… and so, i do not want a border agent be able to put his hands on my private data (phone number, eMail, etc). For the rest of more private i use more security level (not LUKs, etc… my own XOR method that mathematically warranty no possible decryption, bassically the KEY is equal in Length to data Length… so i use a key of 500GiB = 4294967296000 bits, not 256 bits, not 8192 bts, etc).

    Remember boot time be more than a half-time? I can boot that in a faster way, having another bootloader on external media, secured in a military grade room, so when i am at home, i can boot it in minutes, but when i am travelling, it requieres some hours, rootfs has less levels, but uses keyfiles (stored where bootloader grub.cfg is and on my secured disk at military grade room).

    That way to gain access to rootfs you need to read such keyfiles that when i am traveling are only inside a partition that needs more than half a day to be mounted if no miss typing passphrases, but at home only a few minutes is needed. Of course… i do not use my Laptop in middle of the travel.

    If you need to use a Linux on travel, allways use (not yours one) a Live Linux… and not to mention, never ever carry togheter your personal info and the system that can use it… only use internal disk to hide one or some OS in a big list of dummy OS, Live OS, etc… and your data on external media.

    What happens if a PC / Laptop is steal or breaks… nothing, there is no data i need, only the OS i use to access my data, all data is on multiply copies (more than seven) on external media on military grade secured rooms, and never ever leave that place.

    Paranoid way of think, i know… but can serve as a sample on how complex anyone can put the OS bootloader Grub2… actually i do not know of any other bootloader able to be so deeply hided/encrypted and be able to boot from there.

    For criticism: Some part of Grub2 is not encrypted, where such modules are stored for pre-boot time, etc., that is a Risk if someone put hands on it and you did not be aware of that (evil maid attack), they can replace it with a keylogger, etc… so allways use external bootloaders, not the internal one… so put as long as you can then internal one and do not worry about boot time from internal, you will not be using it, but someone (thief, border agent, etc) will must use it.

    Well… i actually use a more deep hide… use Ext4 free zone sector X to Y as a LOOP on pre-boot (with a own maded Module for Grub2)… and actually working on a non ordered sector list but having some BUGs yet! so sectors for that loop can be spreaded all along the hdd, also using sectors of multiple partitions, etc… for example mixing some free zones of some Ext4 partitons, etc.,… and disordered.

    All that has a disadvantage… as much more secured… as much more slower times (also Read and Write, not only mounting).

    So a simple a good way would be: ZFS on top of a set of stripes/mirros/mixed of a chain of three or more LUKs per disk.

  19. Hey all,

    when you’re done with this setup and the disks are not yet mounted via mountVolumes.sh, are they shown as unallocated space?

    I have the problem that I used this setup for a while. When I logged in last time, the mountVolumes.sh script failed with the error “… sdb1 is not a valid luks device”. Running “zfs status” returns “no pools available”. I am quite new to Linux and should not have used this quite complex setup without understanding it completely, and I am afraid that my data is now lost.

    Thanks in advance

    1. Hi! Sorry to hear you’re having problems. Yes, in the configuration described in this post, disks that are not yet mounted by mountVolumes.sh/LUKS would show as unallocated space.

      One easy thing to check, which has happened to me before, is whether your disk changed identifiers. For example, if it used to be sdb1 it might now be sdc1. Sometimes they change due to driver changes or if the SATA/USB port used changes.

      Hope you get it working!

  20. Two years after the last post, I hope I can get an answer here…

    I have just one annoying problem, and it is possible that I am misunderstanding the meaning of exporting and importing pools.

    Every time I want to import an exported pool, I get the message

    “a pool with that name already exists”. It happens in the script and it happens if I try it manually. Where is the error here?

    1. Hi! That sounds to me like the pool is still online and imported. I’d guess that zpool status shows it is still online. Is that correct? Maybe with some kind of error?

      1. Hi Dan,
        thanks for replying.

        You’re right! Meanwhile I have found out that the pool is still online after exporting. But I never get an error after exporting the pool (neither via the script nor manually). And if I enter “zpool import” (without the name of the pool), it shows me the pool, so I think the pool is exported correctly. It is very strange, because the ZFS handbook says an exported pool should be invisible to the system. But I still see it, and I can see the test files and test directories in the pool.

        1. I wonder if you perhaps created the test files and test directories on the local filesystem once when the pool was not imported. I’ve done that accidentally before and filled up my local disk copying files that I thought were going to the pool.

          So for example, say your pool name is “media”. When your zpool is not imported, /media still exists as an empty directory on your machine, BUT you can create files there all you want. Then when you “zpool import media”, the contents of your pool will override what is in /media and all you will see is the contents of the pool. When you “zpool export media”, you’ll see the original contents of the directory from the regular filesystem.

          A couple of ways to troubleshoot this: after you “zpool export”, try closing your LUKS container. If cryptsetup says the device is still in use, then it might still be mounted. Another thing you can do is import your pool and create a new set of test files, perhaps one with the date and time as the name. Then export the pool and see if you still see that new file.
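
          In commands, that second check is roughly (using “media” as the pool name from the example above):

          # with the pool imported, drop a uniquely named marker file into it
          zpool import media
          touch /media/marker-$(date +%Y%m%d-%H%M%S)
          # export the pool; if the marker is still listed afterwards, the pool
          # is still (or again) mounted at /media
          zpool export media
          ls /media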

          Hope that makes sense!

          1. Hi Dan, thanks for replying again.

            I tested it like you said, but it is still confusing:

            I export the pool: “zpool export media”.
            I test with “zpool status media” and it says “cannot open ‘media’ – no such pool”. Should be OK.
            After 10-15 seconds I test again with “zpool status media”, and now I see the pool again with status “online”. It looks like the pool gets imported again within those seconds.
            I see the same behavior with the other test:
            I export the pool, and if I close the LUKS containers quickly, it is OK. But if I wait 10-15 seconds after the export and then try to close the LUKS containers, it fails because they are in use, and the pool is online again.

            It is just crazy.

            Possibly important: I have Debian 10 installed on a LUKS-encrypted SSD (passphrase) and then Proxmox on it. And I have 6 HDDs which are LUKS-encrypted too, but with keyfiles instead of passphrases. luksOpen and luksClose are working fine.

          2. Very interesting. There must be some utility that is monitoring and auto-importing all available zpools; I can’t think of what it would be off the top of my head. A quick Google search points here, which may or may not be helpful: https://github.com/openzfs/zfs/issues/330

            That would be my next step – looking for something that is auto importing pools.
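
            On a systemd-based system, the ZFS units themselves are the usual place to look, so something along these lines should show what is installed and running (unit names can vary a bit between distributions and package versions):

            # list every ZFS-related unit systemd knows about, and its current state
            systemctl list-units --all | grep -i zfs
            # then inspect the import and mount services individually
            systemctl status zfs-import-cache.service zfs-import-scan.service zfs-mount.service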

  21. Hi Dan,
    meanwhile I’ve tried to find out which service or setting is responsible for the automatic re-import.

    The entries in “/etc/default/zfs” don’t seem to be responsible, because the documentation in it says that the option “ZFS_MOUNT='no'” will not prevent systemd from launching zfs-mount.service. I tested the options, and the behavior is still there. I also stopped and disabled zfs-mount.service, but the behavior remains.

    I will keep trying to find the cause and will write here again.

    But a question for you: this HowTo is from 2014 and the script is very elegant, but did you ever test the same setup on a “fresh” Debian 10 system? I wonder if you would have the same problems.

    1. Hi Wolfsrabe. I just tried it on a fresh Debian 10 VM. Added two small virtual disks and followed my tutorial in this post. Then I tried various combinations of rebooting, luksOpen, luksClose etc and I can’t duplicate what you experienced.

      Is your pool named “media”? It could be something weird with that. On Ubuntu I’ve had other issues with my pool being named media, because /media is a system directory by default and other processes expect it to be there. I did try that on my clean Debian 10 VM and didn’t have any issues, after I deleted the directory before importing the pool the first time.
