Linux system rollback using btrfs snapshots

Procedures to:

Two Linux systems are involved:

  1. TARGET: The system on which the risky procedure is performed
  2. RESCUE: The system used to roll back the TARGET system (usually booted from a USB thumb drive or DVD)
    • Examples:

The following subvolumes are used on the TARGET system
(per Fedora standard btrfs install; see /etc/fstab):

   root00     mounted at /
   home00     mounted at /home
   machines   nested in root00 at /var/lib/machines

Testing

   Follow the steps in "Logic overview", using the following as the "risky procedure":

      Create dummy file in "root00" subvolume:
         $ cd /
         $ ls -la
         $ sudo touch RISKY-PROCEDURE
         $ ls -la

      Create dummy file in "home00" subvolume:
         $ cd ~
         $ ls -la
         $ touch RISKY-PROCEDURE
         $ ls -la

   Assume the TARGET system is now unstable and complete the
   "Rollback TARGET to OK subvolumes" procedure.
   Then check that the two RISKY-PROCEDURE files are gone:

      Check "root00" subvolume:
         $ cd /
         $ ls -la

      Check "home00" subvolume:
         $ cd ~
         $ ls -la
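
   The check above can also be scripted instead of reading the "ls -la"
   listings by eye. This is a minimal sketch, not part of the original
   procedure; the function name check_markers is hypothetical.

```shell
#!/bin/sh
# check_markers DIR...: report whether DIR/RISKY-PROCEDURE still exists.
# Returns 0 if every marker is gone, 1 if any marker is still present.
# (check_markers is a hypothetical helper name, not a standard tool.)
check_markers() {
    rc=0
    for dir in "$@"; do
        # Strip a trailing slash so "/" yields "/RISKY-PROCEDURE"
        path="${dir%/}/RISKY-PROCEDURE"
        if [ -e "$path" ]; then
            echo "STILL PRESENT: $path"
            rc=1
        else
            echo "gone: $path"
        fi
    done
    return "$rc"
}
```

   After the rollback, run it on the TARGET system as:
      $ check_markers / "$HOME"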

Logic overview

   Boot into runlevel 1 and create TARGET system snapshots

   Re-boot TARGET system

   Run the "risky procedure" on the TARGET system

   Boot the modified TARGET system

   if the TARGET system does not boot or is unstable
      if a new kernel was installed
         try re-booting the TARGET system with the 1st-previous kernel;
         if the TARGET system boots OK and is stable
            do;
               remove the newly installed faulty kernel with DNF;
               Delete TARGET system snapshots (not needed);
            end;
         else Rollback TARGET to OK subvolumes;
      else Rollback TARGET to OK subvolumes;
   else Delete TARGET system snapshots (not needed);
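
   The decision logic above can be restated as a small shell function.
   This is only an illustrative sketch: the name next_action and its
   yes/no arguments are assumptions, and the function only prints which
   procedure to follow. It does not change the system.

```shell
#!/bin/sh
# next_action BOOT_OK NEW_KERNEL PREV_KERNEL_OK   (each "yes" or "no")
#   BOOT_OK         the modified TARGET system boots and is stable
#   NEW_KERNEL      the risky procedure installed a new kernel
#   PREV_KERNEL_OK  TARGET boots and is stable with the 1st-previous kernel
# (next_action is a hypothetical helper name.)
next_action() {
    boot_ok=$1
    new_kernel=$2
    prev_kernel_ok=$3
    if [ "$boot_ok" = yes ]; then
        echo "delete snapshots"
    elif [ "$new_kernel" = yes ] && [ "$prev_kernel_ok" = yes ]; then
        echo "remove new kernel with DNF, then delete snapshots"
    else
        echo "roll back"
    fi
}
```

   For example, "next_action no yes no" describes a system that fails
   even with the previous kernel, which leads to the rollback procedure.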

Boot into runlevel 1 and create TARGET system snapshots

   Runlevel 1 (maintenance mode) requires logging in as root.
   To enable this, you must set a password for root:
      $ sudo passwd root

   You must have access to the GRUB2 boot menu to specify runlevel 1.
   The menu is hidden by default. To show it:
      $ sudo grub2-editenv - unset menu_auto_hide

   Reboot the TARGET system

   When the GRUB2 boot menu appears, highlight the top entry and
   press "e" (no quotes)
      The edit screen should appear
      Use the arrow keys to move down to the line starting with "linux"
      Move right to "rhgb quiet" at the end of the line
      Replace "rhgb quiet" with "1" (no quotes)
      Press Ctrl+x to boot with the edited line

   The TARGET system should boot into runlevel 1 (maintenance mode)

   Enter the root password when prompted

   Create read-only snapshots of the current "good" subvolumes:
      (yymmdda = today's date followed by suffix a)
      # btrfs subv snapshot -r /home /home/home_yymmdda
      # btrfs subv snapshot -r / /root_yymmdda
      # btrfs subv snapshot -r /var/lib/machines /machines_yymmdda
        ("machines" required since snapshots do not recurse
         into nested subvolumes)
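
   The yymmdda suffix and the three snapshot commands can be generated
   rather than typed. This is a minimal sketch under assumptions: the
   function names snapshot_suffix and snapshot_cmds are hypothetical,
   and the commands are only printed so they can be reviewed before
   being run as root.

```shell
#!/bin/sh
# snapshot_suffix: print today's date as yymmdd followed by the suffix "a".
# (Hypothetical helper; use b, c, ... by hand for later snapshots on the same day.)
snapshot_suffix() {
    printf '%sa\n' "$(date +%y%m%d)"
}

# snapshot_cmds SUFFIX: print (do not run) the three read-only snapshot commands.
snapshot_cmds() {
    suffix=$1
    echo "btrfs subv snapshot -r /home /home/home_$suffix"
    echo "btrfs subv snapshot -r / /root_$suffix"
    echo "btrfs subv snapshot -r /var/lib/machines /machines_$suffix"
}
```

   To review the commands for today's date:
      # snapshot_cmds "$(snapshot_suffix)"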

   List all subvolumes:
      # btrfs subv list /

   # shutdown -r now
     (reboot the TARGET system and run the "risky procedure")
      
Rollback TARGET to OK subvolumes

   Boot the RESCUE system (usually from a USB thumb drive or DVD)

   Mount the hard drive containing the TARGET system:
      $ sudo mount /dev/sd?3 /mnt
      $ cd /mnt
      List the contents and the subvolumes to confirm the mount:
         $ ls -la
         $ sudo btrfs subv list /mnt
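
   If it is unclear which partition stands behind /dev/sd?3, the btrfs
   partitions can be picked out of the lsblk listing. This is a sketch
   only; the function name filter_btrfs is hypothetical, and it assumes
   the two-column "NAME FSTYPE" output of "lsblk -rno NAME,FSTYPE".

```shell
#!/bin/sh
# filter_btrfs: read "NAME FSTYPE" pairs on stdin (one per line) and
# print /dev/NAME for every row whose filesystem type is btrfs.
# (filter_btrfs is a hypothetical helper name.)
filter_btrfs() {
    awk '$2 == "btrfs" { print "/dev/" $1 }'
}
```

   On the RESCUE system it would be used as:
      $ lsblk -rno NAME,FSTYPE | filter_btrfs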

   Delete the machines subvolume (will recreate from snapshot)
      $ sudo btrfs subv delete /mnt/root00/var/lib/machines

   Rename the current "bad" subvolumes:
      $ sudo mv /mnt/home00 /mnt/home00bad
      $ sudo mv /mnt/root00 /mnt/root00bad

   Snapshot the "good" read-only snapshots to their "correct" names. Because
   -r is omitted, the new subvolumes are read-write:
      Note: the "machines" subvolume is restored to a temporary location.
            Restoring to /var/lib/machines would create the "machines" directory
            under /var/lib/ on the RESCUE system instead of the TARGET system.
            This will be corrected after rebooting the rolled back TARGET system.
      $ sudo btrfs subv snapshot /mnt/home00bad/home_yymmdda /mnt/home00
      $ sudo btrfs subv snapshot /mnt/root00bad/root_yymmdda /mnt/root00
      $ sudo btrfs subv snapshot /mnt/root00bad/machines_yymmdda /mnt/root00/machines_yymmdda
      $ sudo btrfs subv list /mnt
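
   Because the order of the delete, rename and snapshot steps matters,
   it can help to print the whole sequence for a given mount point and
   suffix before typing it. This is a sketch under assumptions: the
   function name rollback_cmds is hypothetical, and it only prints the
   commands shown above; each one still has to be run with sudo.

```shell
#!/bin/sh
# rollback_cmds MNT SUFFIX: print (do not run) the rollback commands in order:
# delete the nested machines subvolume, rename the "bad" subvolumes, then
# create read-write snapshots of the "good" ones under the correct names.
# (rollback_cmds is a hypothetical helper name.)
rollback_cmds() {
    mnt=$1
    suffix=$2
    echo "btrfs subv delete $mnt/root00/var/lib/machines"
    echo "mv $mnt/home00 $mnt/home00bad"
    echo "mv $mnt/root00 $mnt/root00bad"
    echo "btrfs subv snapshot $mnt/home00bad/home_$suffix $mnt/home00"
    echo "btrfs subv snapshot $mnt/root00bad/root_$suffix $mnt/root00"
    echo "btrfs subv snapshot $mnt/root00bad/machines_$suffix $mnt/root00/machines_$suffix"
}
```

   To review the sequence, with yymmdda replaced by the real suffix:
      $ rollback_cmds /mnt yymmdda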

   Optional: place flags in the rolled back subvolumes
      $ cd /mnt/home00
      $ sudo touch this-is-home_yymmdda
      $ cd /mnt/root00
      $ sudo touch this-is-root_yymmdda
      $ cd /mnt

   Delete embedded subvolumes so the "bad" subvolumes can be deleted: 
      $ sudo btrfs subv delete /mnt/home00bad/home_yymmdda
      $ sudo btrfs subv delete /mnt/root00bad/root_yymmdda
      $ sudo btrfs subv delete /mnt/root00bad/machines_yymmdda

   Delete "bad" subvolumes: 
      $ sudo btrfs subv delete /mnt/home00bad
      $ sudo btrfs subv delete /mnt/root00bad
      $ sudo btrfs subv list /mnt

   Cleanup:
      $ cd
        (leave /mnt so that it can be unmounted)
      $ sudo umount /mnt

   Reboot the rolled back TARGET system (remove RESCUE boot media)

      No provision is made to roll back /boot:
         if a new kernel was installed by the "risky procedure"
            select the 1st previous kernel on the GRUB2 menu
            remove the newly installed kernel via DNF

      Move the "machines" subvolume to its correct location:
         Get rid of the systemd-created subvolume and/or its directory (both may return "not found")
            # btrfs subv list /
            # btrfs subv delete /var/lib/machines
            # rm -r /var/lib/machines
         Move the snapshot subvolume and make it read-write:
            # btrfs subv snapshot /machines_yymmdda /var/lib/machines
         Delete the snapshot from its temp location
            # btrfs subv delete /machines_yymmdda
            # btrfs subv list /

Delete TARGET system snapshots

   # btrfs subv list /
   # btrfs subv delete /home/home_yymmdda
   # btrfs subv delete /root_yymmdda
   # btrfs subv delete /machines_yymmdda
   # btrfs subv list /
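
   Before deleting, the snapshots that carry a given suffix can be
   picked out of the subvolume listing. This is a minimal sketch: the
   function name list_snapshots is hypothetical, and it assumes that
   each line of "btrfs subv list" ends with "path <subvolume path>"
   and that the paths contain no spaces.

```shell
#!/bin/sh
# list_snapshots SUFFIX: read `btrfs subv list` output on stdin and print
# the subvolume paths (last field of each line) that end in _SUFFIX.
# (list_snapshots is a hypothetical helper name.)
list_snapshots() {
    awk -v sfx="_$1" '{
        p = $NF
        # Compare the tail of the path with the wanted suffix
        if (substr(p, length(p) - length(sfx) + 1) == sfx) print p
    }'
}
```

   With yymmdda replaced by the real suffix:
      # btrfs subv list / | list_snapshots yymmdda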