Wiki version 140 of 202, Branislav Katreniak, 04/20/2018 12:40 PM


Rescue System

A minimal Linux system that is loaded before the main system. It is used to:

- select a kernel in /boot on the main rootfs based on priority and verify its checksums
- re-flash the rootfs from a given URL

https://git.digitalstrom.org/bsp/rescue-system (parts of this wiki have moved to the in-project documentation)

Presentation rescue_gdansk_2018.pdf

Switch DSS to devel feed

Enable SSH access in the web configurator, then:

ssh dssadmin@dss-ip.local
su
opkg update
opkg install rescue-utils
sed  -n 's/.*\(rescue-.*\).*/\1/p' /proc/cmdline
# if no match is found, run: rescue-install.sh && reboot
reflash_rootfs -t yocto-devel

Install

The rescue system should be installed by default on the devel/testing feeds; whether it is active can be checked from the kernel cmdline.

Booted from rescue system and what is its version?

The rescue-system appends its version to the cmdline of the second-stage kernel:

$ sed  -n 's/.*\(rescue-.*\).*/\1/p' /proc/cmdline
rescue-system=2017.K-test01
$ awk -v RS=" " -v FS="=" '/rescue-system/ {print $2}' /proc/cmdline
2017.K-test01
# for i in $(cat /proc/cmdline); do echo "$i"; done
...
rescue-system=2017.K-test01

Alternative:

TODO: adapt to this use case

/usr/lib/kernel/install.d/90-loaderentry.install

if ! [[ ${BOOT_OPTIONS[*]} ]]; then
    read -r -d '' -a line < /proc/cmdline
    for i in "${line[@]}"; do
        [[ "${i#initrd=*}" != "$i" ]] && continue
        BOOT_OPTIONS+=("$i")
    done
fi
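Adapted to the use case above (finding the rescue-system version), the same word-splitting loop could look like the following POSIX sh sketch. The function name rescue_version is hypothetical; it takes an optional path so it can be tried against a saved copy of the cmdline.

```sh
# Hypothetical helper: print the rescue-system version from a kernel cmdline.
# Returns 1 if the system was not booted via the rescue system.
rescue_version() {
    cmdline_file=${1:-/proc/cmdline}
    # Unquoted expansion word-splits the cmdline, like the bash loop above
    for arg in $(cat "$cmdline_file"); do
        case $arg in
            rescue-system=*)
                echo "${arg#rescue-system=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

Usage: rescue_version || echo "not booted via rescue-system"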

To install the rescue system explicitly:

opkg install rescue-install

Reflash of Rootfs

WARNING

You can easily destroy your device by following these steps. This has not yet been tested widely and is intended to be done in a laboratory environment where you have access to the boot loader and can restore the device by other means. Make sure you back up all your settings, metering data, etc. In case of problems, ask for help on the IRC channel. Note that this will definitely VOID YOUR WARRANTY.

Reflash rootfs

# opkg install rescue-utils
# reflash_rootfs -h
Usage:
  reflash_rootfs [-u url] [-t feed]
Options:
  -t use a predefined url based on devel/testing/production
  -l list available feeds
  -u specify url to desired rootfs explicitly
  -h print this help
  -n no signature checks

Will create a reflash request file. This file serves as a command to the
rescue system, to erase the existing rootfs and re-install it from given url.

The request is written to /var/lib/rescue-system/reflash_request.conf; that folder serves as a mailbox for communication with the rescue-system. The file may contain:

ROOTFS_IMAGE_URL=url
ROOTFS_IMAGE_FILE=path   (use either ROOTFS_IMAGE_URL or ROOTFS_IMAGE_FILE, not both)
ROOTFS_IMAGE_BYTES=number (optional)
ROOTFS_SIGNATURE_FILE=path (optional)
ROOTFS_CHECKSUM=number (optional)
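Before handing a hand-written request to the rescue system, a quick sanity check can catch the most common mistakes. This is a hypothetical pre-flight check, not part of rescue-utils: it only enforces the "either ROOTFS_IMAGE_URL or ROOTFS_IMAGE_FILE" rule and that ROOTFS_IMAGE_BYTES, if present, is a plain number.

```sh
# Hypothetical pre-flight check for a reflash request file.
validate_reflash_request() {
    conf=${1:-/var/lib/rescue-system/reflash_request.conf}
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }

    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps set -e happy
    n_url=$(grep -c '^ROOTFS_IMAGE_URL=' "$conf" || true)
    n_file=$(grep -c '^ROOTFS_IMAGE_FILE=' "$conf" || true)
    if [ $((n_url + n_file)) -ne 1 ]; then
        echo "need exactly one of ROOTFS_IMAGE_URL / ROOTFS_IMAGE_FILE" >&2
        return 1
    fi

    bytes=$(sed -n 's/^ROOTFS_IMAGE_BYTES=//p' "$conf")
    case $bytes in
        '') ;;                                   # optional key, not present
        *[!0-9]*) echo "ROOTFS_IMAGE_BYTES is not a number" >&2; return 1 ;;
    esac
    return 0
}
```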

Switching feeds

You can easily destroy your device by following these steps. If you don't know why this is needed, don't do it. Please read the WARNING above before continuing.

# reflash_rootfs -n -t yocto-fieldtest

Production/fieldtest uses keys that are better protected than the devel/testing keys. If you switch from production/fieldtest to devel or testing, or vice versa, image verification will fail, because the rescue system only knows the public key of its own feed. Hence a production rescue system cannot install a devel rootfs that is signed with the less protected devel key. If you have root access, you can bypass the check, though.

Verify the created /var/lib/rescue-system/reflash_request.conf and remove the SIGNATURE URL line
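A minimal sketch of that edit with sed, assuming the signature entry is a line whose key starts with ROOTFS_SIGNATURE (as in the key list above); the exact key written by your reflash_rootfs version may differ, so inspect the file afterwards. The function name is hypothetical, and a .bak copy is kept instead of relying on sed -i.

```sh
# Hypothetical helper: drop ROOTFS_SIGNATURE* lines from a reflash request.
strip_signature() {
    conf=${1:-/var/lib/rescue-system/reflash_request.conf}
    # Keep the original, then write the filtered version in its place
    cp "$conf" "$conf.bak" &&
    sed '/^ROOTFS_SIGNATURE/d' "$conf.bak" > "$conf"
}
```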

Issue reflash rootfs from rescue shell

DSS RESCUE /# ifup eth0
DSS RESCUE /# reflash-rootfs /etc/rescue-system/default_rootfs_download.conf

Selecting target kernel based on priority

Inspired by the Freedesktop Bootloader Specification
http://freedesktop.org/wiki/Specifications/BootLoaderSpec/

The idea is to add or remove kernels by dropping files into, or deleting them from, a folder, without any re-configuration of the boot loader.

The kernel to be booted is selected based on kernel version: a higher version means a higher priority. The boot entry file name has to follow a certain naming scheme; the description is ignored.

BOOT_ENTRY_NAME := 'linux-' VERSION '-description' '.conf'
VERSION := x.x.x | x.x

See https://git.digitalstrom.org/bsp/rescue-system/blob/master/src/dss-boot#L251 for how the selection mechanism works
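The selection rule can be approximated in shell as below. This is a sketch, not a copy of dss-boot: it assumes a sort that supports -V (version sort), treats the '-description' part as optional (entries such as linux-4.1.19.conf further down have none), and leaves tie-breaking between equal versions unspecified. The function name is hypothetical.

```sh
# Hypothetical sketch: print the boot entry with the highest kernel version.
# Files not matching linux-VERSION[-description].conf are ignored, so a
# renamed entry such as linux-4.1.19.conf_ drops out automatically.
select_boot_entry() {
    dir=${1:-/boot/entries}
    for f in "$dir"/linux-*.conf; do
        [ -e "$f" ] || continue                 # glob matched nothing
        name=${f##*/}
        ver=$(printf '%s\n' "$name" |
              sed -n -E 's/^linux-([0-9]+\.[0-9]+(\.[0-9]+)?)(-.*)?\.conf$/\1/p')
        [ -n "$ver" ] || continue               # VERSION is x.x or x.x.x
        printf '%s %s\n' "$ver" "$name"
    done | sort -V -k1,1 | tail -n 1 | cut -d' ' -f2
}
```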

Boot entry files support the following keywords:

IMAGE_FILE[exclusive] := path to zImage/uImage relative to the /boot folder (mutually exclusive with IMAGE_VOLUME_DEVICE)
IMAGE_VOLUME_DEVICE[exclusive] := e.g. /dev/mtd1 or /dev/mmcblk0p1, if the kernel is stored in a raw partition
IMAGE_FORMAT[optional] := 'zImage' / 'uImage', if not obvious from the image file name
IMAGE_CHECKSUM[optional] := sha256sum of the image
DTB_FILE := path to the devicetree file, relative to the /boot folder
DTB_CHECKSUM[optional] := sha256sum of the devicetree file
ARGUMENTS[optional] := use this as the cmdline instead of forwarding the u-boot cmdline
ATAGS[optional] := kernel uses the legacy ATAGS mechanism instead of devicetree files

Manually install your custom built kernel

Copy zImage and devicetree file into /boot folder of the main root filesystem. Then create a boot-entry in folder /boot/entries

$ cat linux-4.4.1-custom.conf
IMAGE_FILE=zImage-4.4.1
DTB_FILE=devicetree-zImage-imx6q-dss11e6x.dtb
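If you also want the optional checksums, the entry can be generated with sha256sum. A sketch, assuming IMAGE_CHECKSUM and DTB_CHECKSUM take the bare hex digest (the sample update.conf further down uses that form for ROOTFS_CHECKSUM, but this is not confirmed for boot entries); the function name is hypothetical.

```sh
# Hypothetical helper: write <bootdir>/entries/<entry> with sha256 checksums.
# usage: write_boot_entry <bootdir> <entry-name.conf> <image> <dtb>
write_boot_entry() {
    boot=$1 entry=$2 image=$3 dtb=$4
    if [ ! -f "$boot/$image" ] || [ ! -f "$boot/$dtb" ]; then
        echo "missing image or dtb in $boot" >&2
        return 1
    fi
    # sha256sum prints "<digest>  <file>"; keep only the digest
    image_sum=$(sha256sum "$boot/$image" | cut -d' ' -f1)
    dtb_sum=$(sha256sum "$boot/$dtb" | cut -d' ' -f1)
    mkdir -p "$boot/entries"
    cat > "$boot/entries/$entry" <<EOF
IMAGE_FILE=$image
IMAGE_CHECKSUM=$image_sum
DTB_FILE=$dtb
DTB_CHECKSUM=$dtb_sum
EOF
}
```

Usage: write_boot_entry /boot linux-4.4.1-custom.conf zImage-4.4.1 devicetree-zImage-imx6q-dss11e6x.dtb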

Disable a kernel from being selected (kernel downgrade)

If the boot entry file does not match the 'linux-version-description.conf' pattern it will be ignored.

# cd /boot/entries
# mv linux-4.1.19.conf linux-4.1.19.conf_
# reboot

Later re-enable the new kernel:

# cd /boot/entries
# mv linux-4.1.19.conf_ linux-4.1.19.conf
# reboot

switch to specific kernel without reboot (kexec)

# opkg install rescue-utils
# boot-kernel -l
linux-4.1.20-panic-on-oops.conf
linux-4.1.19.conf
# boot-kernel linux-4.1.20-panic-on-oops.conf

USB upgrade with version check

The rescue-system does not understand versions; it can only flash files. Version comparison has to happen in the rootfs.

digitalstrom-upgrade
├── digitalstrom-devel-swupdate-image--dss20-20171110064801.swu
├── dss20
│   └── update.conf
└── update.conf  # not yet used

content of update.conf:

# SWUPDATE_IMAGE_FILE=digitalstrom-devel-swupdate-image--dss20-20171110023105.swu
ROOTFS_VERSION=20171110033105

SWUPDATE_IMAGE_FILE is optional so far.
Ensure that ROOTFS_VERSION is greater than the timestamp in /etc/version.

NOTE: update.conf must be in subfolder ${MACHINE}
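The comparison the rootfs has to make can be sketched as follows. Assumptions: ROOTFS_VERSION and the stamp in /etc/version are both 14-digit YYYYMMDDhhmmss timestamps (the layout of /etc/version is not documented here, so the first 14-digit number in it is used), and the shell supports 64-bit integer comparison. The function name is hypothetical.

```sh
# Hypothetical sketch: succeed if the USB update for this machine is newer
# than the running rootfs.
# usage: usb_update_is_newer <stick-mountpoint> <machine> [version-file]
usb_update_is_newer() {
    conf="$1/digitalstrom-upgrade/$2/update.conf"   # must be in ${MACHINE}/
    version_file=${3:-/etc/version}
    [ -r "$conf" ] || return 1
    new=$(sed -n 's/^ROOTFS_VERSION=\([0-9]*\).*/\1/p' "$conf")
    # Assumption: the first 14-digit number in the version file is the build stamp
    cur=$(grep -o '[0-9]\{14\}' "$version_file" | head -n 1)
    [ -n "$new" ] && [ -n "$cur" ] || return 1
    [ "$new" -gt "$cur" ]
}
```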

Forced USB rootfs reflash

NOTE: this is in transition to the swupdate image format; please check the rescue-system documentation.

Use this if your rootfs is corrupt, or if you want to reset your device to factory settings. Note that you will lose all your settings.

Create a USB stick with the following structure. This example works for dss11-sdc; make sure you use the right machine name, otherwise the update will be ignored.

├── digitalstrom-upgrade
│   ├── dss11-sdc
│   │   ├── digitalstrom-devel-rootfs-dss11-sdc.jffs2
│   │   ├── digitalstrom-devel-rootfs-dss11-sdc.jffs2.gpg
│   │   └── update.conf
│   └── forceflashandreset

All the images can be found on the update server. Search for the update.conf containing the file names of the latest images.

$ cd /mnt/usb-stick
$ mkdir -p digitalstrom-upgrade/dss11-sdc
$ touch digitalstrom-upgrade/forceflashandreset
$ cd digitalstrom-upgrade/dss11-sdc
$ wget <image-url-from-update.conf>
$ wget <gpg-url from update.conf>
$ wget <update.conf itself>

Inspect and modify the update.conf: remove the "http://" prefix and leave only the bare file names. Also replace ROOTFS_IMAGE_URL with ROOTFS_IMAGE_FILE. If you are switching feeds, e.g. from production to testing, leave out the signatures: gpg verification only works if you stay within the same feed.

sample update.conf

ROOTFS_IMAGE_FILE=digitalstrom-devel-rootfs-dss11-sdc-20161129025155.rootfs.jffs2
ROOTFS_CHECKSUM=b759cf5fb856610820faab605e02dd05a9fa58cf5eb750c21e1a29caabd20da6   # sha256sum
ROOTFS_SIGNATURE_FILE=digitalstrom-devel-rootfs-dss11-sdc-20161129025155.rootfs.jffs2.gpg
ROOTFS_VERSION=20161129025155
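The update.conf edits described above can be scripted. A sketch using sed, assuming every URL value should be reduced to its bare file name and that leaving out signatures means dropping lines whose key starts with ROOTFS_SIGNATURE; only ROOTFS_IMAGE_URL is renamed, as described above. The function name and its -n flag are hypothetical. Review the result before using the stick.

```sh
# Hypothetical helper: turn a downloaded update.conf into a USB update.conf.
# usage: make_usb_update_conf [-n] <in> <out>   (-n: leave out signatures)
make_usb_update_conf() {
    drop_sig=
    if [ "$1" = "-n" ]; then drop_sig=1; shift; fi
    # 1) rename the URL key, 2) strip "scheme://host/path/" down to the file name
    sed -e 's/^ROOTFS_IMAGE_URL=/ROOTFS_IMAGE_FILE=/' \
        -e 's#=[a-z]*://.*/#=#' \
        "$1" |
    if [ -n "$drop_sig" ]; then
        grep -v '^ROOTFS_SIGNATURE'
    else
        cat
    fi > "$2"
}
```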

As soon as flashing starts, you should see the blue LED flashing.

Double-check that the files listed in update.conf really exist in the same folder. I recommend changing into that folder and running ls <filename>.

Once the LED stops flashing blue and turns orange, remove the stick. You have to remove the stick before the system boots; this prevents an infinite reflash loop.
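The file check recommended above, plus a comparison against ROOTFS_CHECKSUM, can be run on the PC before the stick goes into the device. A sketch that only reads the keys shown in the sample update.conf (anything after the value, such as the "# sha256sum" comment, is ignored); the function name is hypothetical.

```sh
# Hypothetical check for a prepared USB update folder.
# usage: check_usb_update_dir <folder containing update.conf>
check_usb_update_dir() {
    dir=$1
    conf="$dir/update.conf"
    # Every *_FILE entry must name an existing file in the same folder
    for f in $(sed -n 's/^[A-Z_]*_FILE=\([^ #]*\).*/\1/p' "$conf"); do
        [ -f "$dir/$f" ] || { echo "missing: $f" >&2; return 1; }
    done
    want=$(sed -n 's/^ROOTFS_CHECKSUM=\([0-9a-fA-F]*\).*/\1/p' "$conf")
    [ -n "$want" ] || return 0                  # checksum is optional
    image=$(sed -n 's/^ROOTFS_IMAGE_FILE=\([^ #]*\).*/\1/p' "$conf")
    [ -n "$image" ] || { echo "no ROOTFS_IMAGE_FILE" >&2; return 1; }
    have=$(sha256sum "$dir/$image" | cut -d' ' -f1)
    [ "$have" = "$want" ] || { echo "checksum mismatch: $image" >&2; return 1; }
}
```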

Testing

The rescue system itself needs to be updated. To always have a bootable system, the production system must be bootable without the rescue-system. That means u-boot must have a default kernel to select, which is able to mount the rootfs.
This is a precaution against power failures during upgrade. This non-rescue boot path is termed "factory or fallback kernel boot".

The alternative, non rescue-system kernel works

During a rescue-system upgrade, we configure u-boot to boot using the alternative kernel. Only after a successful update do we re-enable the rescue-system. Although the alternative kernel is only used in the error case, we still need to verify that it works with the current file system; in particular, USB upgrade should be tested.

To test this, we need to disable rescue. Verify that it is disabled, either by observing the console or by grepping /proc/cmdline as described under Install above. If the system booted correctly without the rescue-system, continue by testing the basic functionality needed to allow another attempt to install the rescue-system or to provide remote support.

# opkg remove --force-depends rescue-install

This will not remove the rescue-system from the raw partitions; it only prevents it from being re-installed.

Temporarily disable the rescue system

From rootfs / Linux

We have two sorts of boot loaders, those with and without filesystem support.

With filesystem support, the boot loader checks the rescue partition for a boot.scr file; renaming that file disables rescue. Quote the new name, otherwise the shell treats everything after # as a comment:

# mount /dev/mmcblk0p3 /mnt/rescue
# (cd /mnt/rescue; mv boot.scr '#boot.scr')

The older models (dss11/dss11-1gb) do not yet have filesystem support. The rescue-system has to be enabled in the u-boot environment; the special variable bootsel is used to enable rescue:

# fw_setenv bootsel rescue  (enable)
# fw_setenv bootsel         (disable)
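
From u-boot console

If you have access to the u-boot console, you can bypass rescue directly. On models that use bootsel, stop the boot process in u-boot and type:

setenv bootsel prod
bootd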

Permanently remove rescue-system

dss11-sdc

root@dSS-sdc:~# fw_setenv bootsel   # <- removes rescue from the boot order, u-boot will not try to boot rescue
root@dSS-sdc:~# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00020000 00020000 "bootstrap" 
mtd1: 00040000 00020000 "uboot" 
mtd2: 00040000 00020000 "uboot-env" 
mtd3: 00200000 00020000 "kernel-rescue" 
mtd4: 00200000 00020000 "kernel-prod" 
mtd5: 02000000 00020000 "rootfs-rescue" 
mtd6: 0c800000 00020000 "rootfs-prod" 
mtd7: 01360000 00020000 "config" 
root@dSS-sdc:~# flash_erase /dev/mtd5 0 0
Erasing 128 Kibyte @ 1fe0000 -- 100 % complete 
root@dSS-sdc:~# flash_erase /dev/mtd3 0 0
Erasing 128 Kibyte @ 1e0000 -- 100 % complete

Analogously for dss11-1gb:

root@dSS-TT-UpdateTester-dss11-1gb:/home/dssadmin# fw_setenv bootsel
root@dSS-TT-UpdateTester-dss11-1gb:/home/dssadmin# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00400000 00080000 "kernel-rescue" 
mtd1: 00400000 00080000 "kernel-prod" 
mtd2: 02000000 00080000 "rootfs-rescue" 
mtd3: 3d000000 00080000 "rootfs-prod" 
mtd4: 00800000 00080000 "config" 
mtd5: 00080000 00010000 "spi32766.0" 
mtd6: 3b4f0000 0007e000 "dss11-1gb-rootfs" 
root@dSS-TT-UpdateTester-dss11-1gb:/home/dssadmin# flash_erase /dev/mtd2 0 0
Erasing 512 Kibyte @ 1f80000 -- 100 % complete
root@dSS-TT-UpdateTester-dss11-1gb:/home/dssadmin# flash_erase /dev/mtd0 0 0
Erasing 512 Kibyte @ 380000 -- 100 % complete
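The mtd numbers differ between machines (mtd5/mtd3 on dss11-sdc, mtd2/mtd0 on dss11-1gb), so it can be safer to look them up by partition name. A sketch that parses /proc/mtd in the format shown above; mtd_by_name is a hypothetical helper, and you should still double-check its output before calling flash_erase.

```sh
# Hypothetical helper: map an MTD partition name to its /dev/mtdN node.
# usage: mtd_by_name <name> [mtd-table]
mtd_by_name() {
    # /proc/mtd lines look like: mtd5: 02000000 00020000 "rootfs-rescue"
    awk -v want="\"$1\"" '
        $4 == want { sub(":", "", $1); print "/dev/" $1; found = 1 }
        END { exit !found }' "${2:-/proc/mtd}"
}
```

Usage, erasing only if the lookup succeeded:

dev=$(mtd_by_name rootfs-rescue) && flash_erase "$dev" 0 0
dev=$(mtd_by_name kernel-rescue) && flash_erase "$dev" 0 0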

For u-boot with filesystem support (dss11e/dss20):

# mount /dev/mmcblk0p3 /mnt/rescue
# (cd /mnt/rescue; rm -rf *)

Default environment

Modifying the u-boot environment means erasing it, then rewriting it. In the better case (dss11-sdc), we have two environments, and we always write the new one before deleting the old one. In the case of dss11-1gb, however, we have to erase before writing the new one, so a power failure at the wrong moment can cost us the environment.

fw_setenv is supposed to have the same default environment compiled in as u-boot does (tbd).
open points:
- default u-boot env selects the factory kernel by default, mechanism to re-install rescue if not selected
- mac address stored in u-boot environment, #12048

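To reset to the default environment from the u-boot console (the reset only becomes permanent after saveenv):

env default -a
saveenv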

http://stackoverflow.com/questions/20621937/how-do-i-clear-environment-variables-previously-saved-with-u-boot

Default rootfs links

The rescue system has a fallback link compiled in. In case the rootfs is completely broken, it will download a new rootfs from a known location.
To exercise this code path you either need to destroy the rootfs or stop in the rescue-system and trigger the fallback manually

Stop in rescue system:

DSS RESCUE /# ifup eth0
DSS RESCUE /# reflash-dss-rootfs /etc/rescue-system/default_rootfs_download.conf

Alternatively you can try to destroy the rootfs.

From the rescue-system:

dss11e/dss20:

DSS RESCUE /# eval $(grep ROOTFS_DEVICE /etc/rescue-system/machine.conf); echo $ROOTFS_DEVICE
/dev/mmcblk0p3
DSS RESCUE /# dd if=/dev/zero of=$ROOTFS_DEVICE bs=512 count=1024

dss11-1gb/dss11-sdc:

DSS RESCUE /# eval $(grep ROOTFS_DEVICE /etc/rescue-system/machine.conf); echo $ROOTFS_DEVICE
/dev/mtd3
DSS RESCUE /# flash_erase $ROOTFS_DEVICE 0 0

From the rootfs

emmc-based(dss11e/dss20):

# ROOT=$(sed  -n 's/.*root=\([^ ]*\) .*/\1/p' /proc/cmdline); echo $ROOT
# dd if=/dev/zero of=$ROOT bs=512 count=1024
# sync   <- important; will produce lots of errors since the filesystem is still mounted, remove power at some point
# reboot

nand-based(sdc/1gb):

# ROOT=$(sed  -n 's#.*root=/dev/mtdblock\([^ ]*\) .*#/dev/mtd\1#p' /proc/cmdline); echo $ROOT
/dev/mtd6
# flash_erase $ROOT 0 1000  <- the block count can be varied, 0 means erase all
# reboot
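Both sed expressions can be folded into one helper that walks the cmdline word by word; this also avoids depending on a space after the root= value, which the sed patterns require. A sketch; the function name is hypothetical, and a UBI root such as ubi0:... is printed unchanged, since neither case above covers it.

```sh
# Hypothetical helper: print the device to wipe for the current root=.
root_device_from_cmdline() {
    for arg in $(cat "${1:-/proc/cmdline}"); do
        case $arg in
            # NAND: erase the mtd char device behind the mtdblock root
            root=/dev/mtdblock*) echo "/dev/mtd${arg#root=/dev/mtdblock}"; return 0 ;;
            # eMMC (and anything else): use the value as-is
            root=*) echo "${arg#root=}"; return 0 ;;
        esac
    done
    return 1
}
```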

TODO (ubiupdatevol to save ubi counters)

Links

Repository:
https://git.digitalstrom.org/bsp/rescue-system

FAQ

I flashed oecore-devel and it keeps reflashing infinitely

oecore-devel is discontinued and no longer built; it contains a very old rescue-system. The automatic switch tries to install a compressed rootfs, which fails. Please interrupt the boot process in the rescue shell and create a custom reflash request.

see #14566

RESCUE_SYSTEM# cat > /tmp/reflash_request.conf <<EOF
ROOTFS_IMAGE_URL=http://update.aizo.com/feeds/digitalstrom-testing-eglibc/images/dss11-1gb-t1/digitalstrom-testing-rootfs-s2016.Y-test02-dss11-1gb-t1.ubifs
EOF
RESCUE_SYSTEM # ifup eth0
RESCUE_SYSTEM # reflash-rootfs -v /tmp/reflash_request.conf

dss20 non-functional due to xz decompression bug

For some reason, streaming an xz-compressed image stalls at a certain percentage. See #15988.

Please interrupt the boot process in the rescue shell and create a custom reflash request.

RESCUE_SYSTEM# cat > /tmp/reflash_request.conf <<EOF
ROOTFS_IMAGE_URL=http://update.aizo.com/feeds/digitalstrom-devel/images/dss20/digitalstrom-devel-rootfs-dss20.ext3
EOF
RESCUE_SYSTEM # ifup eth0
RESCUE_SYSTEM # reflash-rootfs -v /tmp/reflash_request.conf

Rescue fails to boot main system

######################################
Press a key to enter the rescue shell.
UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "dss11-1gb-t1-rootfs", R/O mode
UBIFS (ubi0:0): LEB size: 516096 bytes (504 KiB), min./max. I/O unit sizes: 4096 bytes/4096 bytes
UBIFS (ubi0:0): FS size: 979550208 bytes (934 MiB, 1898 LEBs), journal size 10452992 bytes (9 MiB, 21 LEBs)
UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0), UUID D80204DE-0B69-4806-B0D1-C085D924F22F, small LPT model
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 409
Status reported to rootfs: ERR_OK
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" stops
Attempting to boot configuration linux-4.1.19-fallback.conf
Can not read /etc/rescue-system/security.conf
Cannot open /proc/atags: No such file or directory
UBIFS (ubi0:0): un-mount UBI device 0
umount: can't umount /: Invalid argument
kexec: Starting new kernel
Bye!

This is caused by a broken kexec in some versions of rescue-system.

To recover, do the following:
  1. stop boot process in u-boot
  2. type setenv bootsel prod
  3. type bootd

This will bypass rescue and boot the main system directly. Normally, the rootfs already contains a fixed version of rescue-system that then is installed automatically.

Inspect content of initramfs

dd if=rescue-devel-initramfs-dss20.cpio.gz.u-boot bs=64 skip=1 | zcat | cpio -it
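The dd command skips the 64-byte legacy u-boot image header that mkimage prepends. A sketch that first checks the header magic (0x27051956), so that a file without a header is not silently truncated; the function name is hypothetical.

```sh
# Hypothetical helper: write the payload of a legacy u-boot image to stdout.
strip_uimage_header() {
    # First 4 bytes as hex, e.g. "27051956" for a legacy u-boot image
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "27051956" ]; then
        echo "$1: no u-boot header (magic $magic)" >&2
        return 1
    fi
    dd if="$1" bs=64 skip=1 2>/dev/null
}
```

Usage: strip_uimage_header rescue-devel-initramfs-dss20.cpio.gz.u-boot | zcat | cpio -it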

rescue_gdansk_2018.pdf (1.25 MB) Andreas Fenkart, 04/18/2018 10:22 AM