Linux system rollback using btrfs snapshots
Procedures to:
- Snapshot the home and root subvolumes before any "risky" procedure ("dnf upgrade", for example).
- Roll back the home and root subvolumes to their OK state (before the risky procedure) should the procedure
  make the system unbootable or unstable.
- No provision is made to roll back the /boot partition. If a new kernel was installed by the "risky procedure",
  the GRUB2 boot menu can be used to select the previous kernel. The newly installed faulty kernel
  can then be removed later via DNF.
Two Linux systems are involved:
- TARGET: the system on which the risky procedure is performed
- RESCUE: the system used to roll back the TARGET system (usually booted from a USB thumb drive or DVD)
The following subvolumes are used on the TARGET system
(per the Fedora standard btrfs install; see /etc/fstab):
- root00 - mounted at /
  - /var/lib/machines - nested inside root00
    (created by systemd for use with containers and VMs)
- home00 - mounted at /home
Notes:
- The procedures were developed and tested on Fedora 35. This release used the subvolume names
  "root00" and "home00" to differentiate the subvolumes from their related directories (/ and /home)
  (a good idea!). Fedora 37, unfortunately, changed the subvolume names to "root" and "home".
  In all the commands below, wherever "root00" and "home00" are used, you must substitute
  whatever names your TARGET system uses for the root (/) and home (/home) subvolumes.
  (Use "sudo btrfs subv list /" to see these names.)
- Use the df command on the TARGET system to determine the partition containing the directories
  / (root00) and /home (home00) (usually /dev/sd?3).
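The partition and subvolume names can also be looked up with a few shell functions. This is a minimal sketch, assuming util-linux "findmnt" and btrfs-progs are installed; "mount_device", "mount_subvol" and "top_level_subvols" are hypothetical helper names, not part of either package. "findmnt -v" drops the "[/subvolume]" suffix findmnt normally appends to btrfs sources, and "top_level_subvols" filters "btrfs subvolume list" output (lines of the form "ID 256 gen 1234 top level 5 path root00") down to subvolumes sitting directly under the top-level subvolume (ID 5).

```shell
#!/bin/sh
# Hypothetical helpers for locating the TARGET partition and subvolume names.
# Assumes util-linux (findmnt) and btrfs-progs are installed.

# Device holding a path, without the "[/subvol]" suffix (-v = --nofsroot).
mount_device() {
    findmnt -n -v -o SOURCE --target "$1"
}

# Subvolume mounted at a path, e.g. "/root00" (Fedora 35) or "/root" (Fedora 37).
mount_subvol() {
    findmnt -n -o FSROOT --target "$1"
}

# Read "btrfs subvolume list" output on stdin and print only the paths of
# subvolumes whose parent is the top-level subvolume (ID 5).
top_level_subvols() {
    awk '$1 == "ID" && $5 == "top" && $6 == "level" && $7 == "5" {
        sub(/^ID .* top level 5 path /, "")
        print
    }'
}
```

After sourcing these functions, "mount_device /" prints the partition to use in place of /dev/sd?3 below, and "sudo btrfs subvolume list / | top_level_subvols" prints names such as root00 and home00 (the nested var/lib/machines subvolume is filtered out because its parent is root00, not ID 5).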
Testing
Follow the steps in "Logic overview", using the following as the "risky procedure":
Create a dummy file in the "root00" subvolume:
$ cd /
$ ls -la
$ sudo touch RISKY-PROCEDURE
$ ls -la
Create a dummy file in the "home00" subvolume:
$ cd ~
$ ls -la
$ touch RISKY-PROCEDURE
$ ls -la
Assume the TARGET system is now unstable and complete the steps in "Rollback TARGET to OK subvolumes".
Then check that the two RISKY-PROCEDURE files are GONE:
Check the "root00" subvolume:
$ cd /
$ ls -la
Check the "home00" subvolume:
$ cd ~
$ ls -la
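The check can also be scripted. A minimal sketch; "markers_gone" is a hypothetical helper name. It prints any marker file that still exists and returns a non-zero status in that case, so it can be used directly in an "if" or with "&&".

```shell
#!/bin/sh
# Hypothetical helper: succeed only if none of the given marker files exist.
markers_gone() {
    rc=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "still present: $f"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, after the rollback: markers_gone /RISKY-PROCEDURE "$HOME/RISKY-PROCEDURE" && echo "rollback OK"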
Logic overview
Boot into runlevel 1 and create TARGET system snapshots
Reboot the TARGET system
Run the "risky procedure" on the TARGET system
Boot the modified TARGET system
if the TARGET system does not boot or is unstable
    if a new kernel was installed
        try rebooting the TARGET system with the previous kernel;
        if the TARGET system boots OK and is stable
        do;
            remove the newly installed faulty kernel with DNF;
            Delete TARGET system snapshots (not needed);
        end;
        else Rollback TARGET to OK subvolumes;
    else Rollback TARGET to OK subvolumes;
else Delete TARGET system snapshots (not needed);
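The decision flow above can be restated as a small shell function, which makes the nesting explicit. This is only an illustration: "next_step" is a hypothetical name, and the yes/no answers are supplied by hand after observing the TARGET system.

```shell
#!/bin/sh
# Hypothetical encoding of the "Logic overview" decision flow.
# Arguments, each "yes" or "no":
#   $1  the modified TARGET system boots and is stable
#   $2  the risky procedure installed a new kernel
#   $3  the TARGET system boots and is stable with the previous kernel
next_step() {
    if [ "$1" = yes ]; then
        echo "Delete TARGET system snapshots"
    elif [ "$2" = yes ] && [ "$3" = yes ]; then
        echo "Remove the new kernel with DNF, then delete TARGET system snapshots"
    else
        echo "Rollback TARGET to OK subvolumes"
    fi
}
```

Note that the third answer only matters when a new kernel was installed; without one, an unstable system always leads to a rollback.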
Boot into runlevel 1 and create TARGET system snapshots
Runlevel 1 (maintenance mode) requires logging in as root.
To enable this, you must set a password for the root user:
$ sudo passwd root
You must have access to the GRUB2 boot menu to specify runlevel 1.
The menu is hidden by default. To show it:
$ sudo grub2-editenv - unset menu_auto_hide
Reboot the TARGET system
When the GRUB2 boot menu appears, press "e" (no quotes) with the top entry highlighted
The edit screen should appear
Use the arrow keys to move down to the line starting with "linux"
Move right to "rhgb quiet" at the end of the line
Replace "rhgb quiet" with "1" (no quotes)
Press Ctrl+X to boot with the edited line
The TARGET system should boot into runlevel 1 (maintenance mode)
Enter the root password when prompted
Create read-only snapshots of the current "good" subvolumes:
(yymmdda = today's date followed by suffix a)
# btrfs subv snapshot -r /home /home/home_yymmdda
# btrfs subv snapshot -r / /root_yymmdda
# btrfs subv snapshot -r /var/lib/machines /machines_yymmdda
(The "machines" snapshot is required because a snapshot does not recurse
into nested subvolumes.)
List all subvolumes:
# btrfs subv list /
# shutdown -r now
(reboot the TARGET system and run the "risky procedure")
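The snapshot step can be wrapped in a small script. A minimal sketch, assuming it is run as root in runlevel 1 with the Fedora 35 layout above; "snap_suffix", "run" and "make_snapshots" are hypothetical names, and the DRY_RUN=1 switch (also an assumption of this sketch) prints the commands instead of executing them. Each snapshot must succeed before the next is attempted.

```shell
#!/bin/sh
# Hypothetical wrapper for the snapshot commands above. Run as root.
# DRY_RUN=1 prints each command instead of running it.

# Suffix "yymmdd" plus a letter (default "a"); use "b", "c", ... for later runs the same day.
snap_suffix() {
    printf '%s%s\n' "$(date +%y%m%d)" "${1:-a}"
}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the three read-only snapshots, stopping at the first failure.
make_snapshots() {
    s=$(snap_suffix "$1")
    run btrfs subvolume snapshot -r /home "/home/home_$s" || return 1
    run btrfs subvolume snapshot -r / "/root_$s" || return 1
    run btrfs subvolume snapshot -r /var/lib/machines "/machines_$s" || return 1
    run btrfs subvolume list /
}
```

After sourcing the functions in the runlevel 1 root shell, "DRY_RUN=1 make_snapshots" previews the commands and "make_snapshots && shutdown -r now" performs them and reboots.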
Rollback TARGET to OK subvolumes
Boot the RESCUE system (usually from a USB thumb drive or DVD)
Mount the partition containing the TARGET system (the one found with df above):
$ sudo mount /dev/sd?3 /mnt
$ cd /mnt
List the mounted file system and its subvolumes:
$ ls -la
$ sudo btrfs subv list /mnt
Delete the nested "machines" subvolume (it will be recreated from its snapshot):
$ sudo btrfs subv delete /mnt/root00/var/lib/machines
Rename the current "bad" subvolumes:
$ sudo mv /mnt/home00 /mnt/home00bad
$ sudo mv /mnt/root00 /mnt/root00bad
Snapshot the read-only "good" snapshots back to the original subvolume names; snapshots
created without -r are read-write, so this also makes them writable:
Note: the "machines" snapshot is restored to a temporary location at the top of root00.
Restoring it to /var/lib/machines here would create the "machines" subvolume
under /var/lib/ on the RESCUE system instead of the TARGET system.
This is corrected after rebooting the rolled-back TARGET system.
$ sudo btrfs subv snapshot /mnt/home00bad/home_yymmdda /mnt/home00
$ sudo btrfs subv snapshot /mnt/root00bad/root_yymmdda /mnt/root00
$ sudo btrfs subv snapshot /mnt/root00bad/machines_yymmdda /mnt/root00/machines_yymmdda
$ sudo btrfs subv list /mnt
Optional: place marker files in the rolled-back subvolumes:
$ cd /mnt/home00
$ sudo touch this-is-home_yymmdda
$ cd /mnt/root00
$ sudo touch this-is-root_yymmdda
$ cd /mnt
Delete the snapshots nested in the "bad" subvolumes so the "bad" subvolumes themselves can be deleted:
$ sudo btrfs subv delete /mnt/home00bad/home_yymmdda
$ sudo btrfs subv delete /mnt/root00bad/root_yymmdda
$ sudo btrfs subv delete /mnt/root00bad/machines_yymmdda
Delete "bad" subvolumes:
$ sudo btrfs subv delete /mnt/home00bad
$ sudo btrfs subv delete /mnt/root00bad
$ sudo btrfs subv list /mnt
Cleanup:
$ cd        (move out of /mnt so it can be unmounted)
$ sudo umount /mnt
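The RESCUE-side steps above can be collected into one function. A minimal sketch, assuming it is run as root on the RESCUE system, that mounting the partition without options exposes root00 and home00 directly under /mnt (as in the listings above), and that the optional marker-file step is skipped; "rollback_target", "run" and the DRY_RUN and MNT variables are hypothetical. Every step stops the function on failure, because carrying on after, say, a failed snapshot would go on to delete the only good copy.

```shell
#!/bin/sh
# Hypothetical wrapper for the RESCUE-side rollback. Run as root.
# Usage: rollback_target DEVICE SUFFIX [ROOT_NAME] [HOME_NAME]
# DRY_RUN=1 prints each command instead of running it; MNT overrides /mnt.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

rollback_target() {
    dev=$1 s=$2 root=${3:-root00} home=${4:-home00} mnt=${MNT:-/mnt}
    [ -n "$dev" ] && [ -n "$s" ] || return 1
    run mount "$dev" "$mnt" || return 1
    # Nested subvolume; it is recreated later from machines_$s.
    run btrfs subvolume delete "$mnt/$root/var/lib/machines" || return 1
    # Rename the "bad" subvolumes.
    run mv "$mnt/$home" "$mnt/${home}bad" || return 1
    run mv "$mnt/$root" "$mnt/${root}bad" || return 1
    # Writable snapshots of the read-only "good" snapshots, under the original names.
    run btrfs subvolume snapshot "$mnt/${home}bad/home_$s" "$mnt/$home" || return 1
    run btrfs subvolume snapshot "$mnt/${root}bad/root_$s" "$mnt/$root" || return 1
    run btrfs subvolume snapshot "$mnt/${root}bad/machines_$s" "$mnt/$root/machines_$s" || return 1
    # Nested snapshots first, then the "bad" subvolumes themselves.
    run btrfs subvolume delete "$mnt/${home}bad/home_$s" || return 1
    run btrfs subvolume delete "$mnt/${root}bad/root_$s" || return 1
    run btrfs subvolume delete "$mnt/${root}bad/machines_$s" || return 1
    run btrfs subvolume delete "$mnt/${home}bad" "$mnt/${root}bad" || return 1
    run btrfs subvolume list "$mnt"
    run umount "$mnt"
}
```

For example, "DRY_RUN=1 rollback_target /dev/sda3 yymmdda" (with the real partition and suffix filled in) previews all thirteen commands before anything is changed.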
Reboot the rolled-back TARGET system (remove the RESCUE boot media)
No provision is made to roll back the /boot partition:
    if a new kernel was installed by the "risky procedure",
        select the previous kernel on the GRUB2 menu;
        remove the newly installed kernel via DNF
Move the "machines" subvolume to its correct location:
Remove the subvolume and/or directory that systemd may have created at /var/lib/machines
(either command may report "not found"; that is expected):
# btrfs subv list /
# btrfs subv delete /var/lib/machines
# rm -r /var/lib/machines
Snapshot the temporary copy into place (the new snapshot is read-write):
# btrfs subv snapshot /machines_yymmdda /var/lib/machines
Delete the snapshot from its temporary location:
# btrfs subv delete /machines_yymmdda
# btrfs subv list /
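The same steps as a sketch, assuming they are run as root on the rebooted TARGET system; "restore_machines" and "run" are hypothetical names, and DRY_RUN=1 prints the commands instead of running them. As noted above, the first two commands may fail harmlessly, so their status is ignored, but the new snapshot must succeed before the temporary copy is deleted.

```shell
#!/bin/sh
# Hypothetical wrapper for moving the "machines" snapshot into place. Run as root.
# Usage: restore_machines SUFFIX   (the yymmdda used when snapshotting)

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restore_machines() {
    s=$1
    [ -n "$s" ] || return 1
    # Either of these may report "not found"; that is expected, so failures are ignored.
    run btrfs subvolume delete /var/lib/machines
    run rm -r /var/lib/machines
    # Writable snapshot of the temporary copy at the correct location.
    run btrfs subvolume snapshot "/machines_$s" /var/lib/machines || return 1
    # Only delete the temporary copy once the snapshot command has succeeded.
    run btrfs subvolume delete "/machines_$s" || return 1
    run btrfs subvolume list /
}
```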
Delete TARGET system snapshots
# btrfs subv list /
# btrfs subv delete /home/home_yymmdda
# btrfs subv delete /root_yymmdda
# btrfs subv delete /machines_yymmdda
# btrfs subv list /
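A matching cleanup sketch, assuming it is run as root on the TARGET system; "delete_snapshots" and "run" are hypothetical names and DRY_RUN=1 previews the commands. "btrfs subvolume delete" accepts several paths in one call, so the three deletions above collapse into a single command.

```shell
#!/bin/sh
# Hypothetical wrapper for deleting the snapshots once they are not needed. Run as root.
# Usage: delete_snapshots SUFFIX   (the yymmdda used when snapshotting)

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

delete_snapshots() {
    s=$1
    # Refuse an empty suffix rather than guess which snapshots to delete.
    [ -n "$s" ] || return 1
    run btrfs subvolume delete "/home/home_$s" "/root_$s" "/machines_$s" || return 1
    run btrfs subvolume list /
}
```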