Rescue System » History » Version 156

Branislav Katreniak, 05/11/2018 01:16 PM


Rescue System

A minimal Linux system that is loaded before the main system.

- Select a kernel in /boot on the main rootfs based on priority, verify checksums
- Enable re-flashing the rootfs from a given URL

https://git.digitalstrom.org/bsp/rescue-system (part of this wiki moved to in-project documentation)

Presentation rescue_gdansk_2018.pdf

Switch dssip DSS to devel feed

# enable ssh access in web configurator
ssh dssadmin@dss-ip.local
su
opkg update
opkg install rescue-utils

sed  -n 's/.*\(rescue-.*\).*/\1/p' /proc/cmdline
# run `rescue-install.sh && reboot` if no match found

reflash_rootfs -n -t yocto-devel
reboot

Install

rescue system should be installed by default on devel/testing feeds, which can be checked from the kernel cmdline

Booted from rescue system and what is its version?

rescue-system appends its version to the cmdline of the 2nd stage kernel

$ sed  -n 's/.*\(rescue-.*\).*/\1/p' /proc/cmdline
rescue-system=2017.K-test01
$ awk -v RS=" " -v FS="=" '/rescue-system/ {print $2}' /proc/cmdline
2017.K-test01
# for i in `cat /proc/cmdline`; do echo $i; done
...
rescue-system=2017.K-test01
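
The same lookup can be wrapped in a small helper that takes the command line as an argument, which makes it easy to reuse in scripts. This is only a sketch; the function name rescue_version is made up for illustration and is not part of rescue-utils.

```shell
# Print the rescue-system version found in a kernel command line.
# Returns non-zero if the system was not booted through rescue-system.
# Hypothetical helper, not shipped with rescue-utils.
rescue_version() {
    # Unquoted on purpose: split the command line into its arguments.
    for arg in $1; do
        case "$arg" in
            rescue-system=*)
                printf '%s\n' "${arg#rescue-system=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

Typical use would be `rescue_version "$(cat /proc/cmdline)" || echo "not booted via rescue"`.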

Alternative:

TODO pls mangle to use case

/usr/lib/kernel/install.d/90-loaderentry.install

if ! [[ ${BOOT_OPTIONS[*]} ]]; then
    read -r -d '' -a line < /proc/cmdline
    for i in "${line[@]}"; do
        [[ "${i#initrd=*}" != "$i" ]] && continue
        BOOT_OPTIONS+=("$i")
    done
fi

To install it explicitly:

opkg install rescue-install

Reflash of Rootfs

WARNING

You can easily destroy your device by following these steps. This has not yet been tested widely and is intended to be done in a laboratory environment where you have access to the boot loader and can restore the device by other means. Make sure you back up all your settings, metering data, etc. In case of problems, request help on the IRC channel, because this will definitely VOID YOUR WARRANTY.

Reflash rootfs

# opkg install rescue-utils
# reflash_rootfs -h
Usage:
  reflash_rootfs [-u url] [-t feed]
Options:
  -t use a predefined url based on devel/testing/production
  -l list available feeds
  -u specify url to desired rootfs explicitly
  -h print this help
  -n no signature checks

Will create a reflash request file. This file serves as a command to the
rescue system, to erase the existing rootfs and re-install it from given url.

This will create a file /var/lib/rescue-system/reflash_request.conf. That folder serves as mailbox for communication with the rescue-system

ROOTFS_IMAGE_URL=url
ROOTFS_IMAGE_FILE=path   (use either IMAGE_URL or IMAGE_FILE)
ROOTFS_IMAGE_BYTES=number (optional)
ROOTFS_SIGNATURE_FILE=path (optional)
ROOTFS_CHECKSUM=number (optional)
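
A minimal sketch of writing such a request by hand, assuming that ROOTFS_IMAGE_URL alone is sufficient. The helper name write_reflash_request and its optional directory argument are hypothetical; reflash_rootfs remains the intended way to create the file.

```shell
# Write a reflash request containing only ROOTFS_IMAGE_URL.
# $1: url of the rootfs image
# $2: mailbox directory (defaults to /var/lib/rescue-system)
# Hypothetical helper; reflash_rootfs is the supported tool.
write_reflash_request() {
    url=$1
    dir=${2:-/var/lib/rescue-system}
    if [ -z "$url" ]; then
        echo "usage: write_reflash_request url [dir]" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    printf 'ROOTFS_IMAGE_URL=%s\n' "$url" > "$dir/reflash_request.conf"
}
```

Passing a scratch directory as the second argument allows the resulting file to be inspected before it is placed in the real mailbox.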

Switching feeds

You can easily destroy your device by following these steps. If you don't know why this is needed, don't do it. Please read the warning above before continuing.

# reflash_rootfs -n -t yocto-fieldtest

Production/fieldtest uses keys that are better protected than the devel/testing keys. If you switch from production/fieldtest to devel or testing, or the reverse, image verification will fail, because the rescue system only knows the public key of its own feed. Hence a production rescue system cannot install a devel rootfs that is signed with the less protected devel key. If you have root access you can bypass the check, though.

Verify the created /var/lib/rescue-system/reflash_request.conf and remove the SIGNATURE URL line
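
Removing the signature line can be scripted. The sketch below assumes the line to drop starts with ROOTFS_SIGNATURE (the exact key name written by reflash_rootfs is not confirmed here), and the helper name strip_signature is hypothetical.

```shell
# Remove signature-related lines from a reflash request file in place.
# Assumption: those lines start with ROOTFS_SIGNATURE.
strip_signature() {
    conf=$1
    tmp=$conf.tmp.$$
    # grep exits with 1 when no line is left over; only >1 is a real error.
    grep -v '^ROOTFS_SIGNATURE' "$conf" > "$tmp" || [ $? -eq 1 ] || {
        rm -f "$tmp"
        return 1
    }
    mv "$tmp" "$conf"
}
```

After running it, `cat` the file once more to confirm that only the expected keys remain.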

Issue reflash rootfs from rescue shell

DSS RESCUE /# ifup eth0
DSS RESCUE /# reflash-rootfs /etc/rescue-system/default_rootfs_download.conf

Selecting target kernel based on priority

Inspired by the Freedesktop Bootloader Specification
http://freedesktop.org/wiki/Specifications/BootLoaderSpec/

The idea is to add or remove files in a folder without any re-configuration of the boot loader.

The kernel to be booted is selected based on kernel version. A higher version means higher priority. The boot entry has to follow a certain naming scheme. The description is ignored.

BOOT_ENTRY_NAME := 'linux-' VERSION '-description' '.conf'
VERSION := x.x.x | x.x

See https://git.digitalstrom.org/bsp/rescue-system/blob/master/src/dss-boot#L251 for how the selection mechanism works
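
To illustrate the naming rule, here is a rough sketch of how the highest-versioned entry could be picked from a directory. It is not the code from dss-boot; the function name select_boot_entry is hypothetical, and treating a missing third version component as 0 is an assumption.

```shell
# Print the name of the highest-versioned boot entry in a directory.
# Sketch only: the real selection logic lives in dss-boot.
select_boot_entry() {
    dir=$1
    for f in "$dir"/linux-*.conf; do
        [ -e "$f" ] || continue            # glob matched nothing
        name=${f##*/}
        ver=${name#linux-}
        ver=${ver%.conf}
        ver=${ver%%-*}                     # drop the ignored description
        case "$ver" in
            ''|*[!0-9.]*) continue ;;      # not a plain x.x or x.x.x version
        esac
        # Split the version into numeric sort keys; missing parts count as 0.
        old_ifs=$IFS; IFS=.
        set -- $ver
        IFS=$old_ifs
        printf '%s %s %s %s\n' "${1:-0}" "${2:-0}" "${3:-0}" "$name"
    done | sort -k1,1n -k2,2n -k3,3n | tail -n 1 | cut -d' ' -f4-
}
```

Files that do not match the linux-VERSION-description.conf pattern, such as a renamed linux-4.1.19.conf_, never enter the comparison.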

Boot entry files support the following keywords:

IMAGE_FILE[exclusive] := path to zImage/uImage relative to /boot folder
IMAGE_VOLUME_DEVICE[exclusive] := /dev/mtd1, /dev/mmcblk0p1 if kernel is stored in a raw partition
IMAGE_FORMAT[optional] := 'zImage' / 'uImage' if not obvious from image file name
IMAGE_CHECKSUM[optional] := sha256sum of the image
DTB_FILE := path to devicetree file, relative to /boot folder
DTB_CHECKSUM[optional] := sha256sum of device tree file
ARGUMENTS[optional] := use this as cmdline instead of forwarding u-boot cmdline
ATAGS[optional] := kernel uses legacy ATAGS mechanism instead of device tree files

Manually install your custom built kernel

Copy zImage and devicetree file into /boot folder of the main root filesystem. Then create a boot-entry in folder /boot/entries

$ cat linux-4.4.1-custom.conf
IMAGE_FILE=zImage-4.4.1
DTB_FILE=devicetree-zImage-imx6q-dss11e6x.dtb
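
The optional checksum keys can be filled in at the same time. The following sketch assumes that IMAGE_CHECKSUM and DTB_CHECKSUM hold the hex digest as printed by sha256sum; the helper name make_boot_entry and the fixed "custom" description are made up for illustration.

```shell
# Create /boot/entries/linux-VERSION-custom.conf including checksums.
# $1: boot folder, $2: kernel version, $3: image file, $4: dtb file
# (both file names relative to the boot folder).
# Assumption: checksums are plain sha256sum hex digests.
make_boot_entry() {
    boot=$1; version=$2; image=$3; dtb=$4
    [ -f "$boot/$image" ] && [ -f "$boot/$dtb" ] || return 1
    mkdir -p "$boot/entries" || return 1
    image_sum=$(sha256sum < "$boot/$image" | cut -d' ' -f1)
    dtb_sum=$(sha256sum < "$boot/$dtb" | cut -d' ' -f1)
    {
        printf 'IMAGE_FILE=%s\n' "$image"
        printf 'IMAGE_CHECKSUM=%s\n' "$image_sum"
        printf 'DTB_FILE=%s\n' "$dtb"
        printf 'DTB_CHECKSUM=%s\n' "$dtb_sum"
    } > "$boot/entries/linux-$version-custom.conf"
}
```

For the example above this would be called as `make_boot_entry /boot 4.4.1 zImage-4.4.1 devicetree-zImage-imx6q-dss11e6x.dtb`.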

Disable a kernel from being selected (kernel downgrade)

If the boot entry file does not match the 'linux-version-description.conf' pattern it will be ignored.

# cd /boot/entries
# mv linux-4.1.19.conf linux-4.1.19.conf_
# reboot

Later re-enable the new kernel:

# cd /boot/entries
# mv linux-4.1.19.conf_ linux-4.1.19.conf
# reboot

switch to specific kernel without reboot (kexec)

# opkg install rescue-utils
# boot-kernel -l
linux-4.1.20-panic-on-oops.conf
linux-4.1.19.conf
# boot-kernel linux-4.1.20-panic-on-oops.conf

USB upgrade with version check

The latest usb-upgrade can be found here:
http://update.aizo.com/feeds/digitalstrom-devel/usb-upgrade/

  1. Unzip and copy the directory 'digitalstrom-upgrade' including all files into the root of a USB memory stick. The USB memory stick must have 1GB free disk space.
  2. Plug the USB memory stick into the USB connector of the running digitalSTROM-Server.
  3. The upgrade process will start automatically after a short moment. While updating, the dSS LED will blink blue, but might remain yellow or blue for several seconds.
  4. The system upgrade process is finished once the LED remains green for more than two minutes.
  5. After updating, please check manually for available dSM firmware updates from the configurator's 'system / system update' menu. Install the dSM update if indicated.

Rescue-system does not understand versions. It can only flash files, but version comparison has to happen in the rootfs. Hence the system will boot into the existing rootfs first and issue a reboot from there.

USB stick can remain inserted. Rescue-system will only install the update if rootfs granted the upgrade.

digitalstrom-upgrade/
├── dss20
│   ├── digitalstrom-devel-swupdate-image--dss20-20180425104635.swu
│   ├── rescue
│   │   ├── rescue-devel-update--dss20-20180425104548.swu
│   │   └── update.conf
│   └── update.conf
└── dssip
    ├── digitalstrom-devel-swupdate-image--dssip-20180425105014.swu
    ├── rescue
    │   ├── rescue-devel-update--dssip-20180425104933.swu
    │   └── update.conf
    └── update.conf

content of update.conf:

SWUPDATE_IMAGE_FILE=digitalstrom-devel-swupdate-image--dss20-20180425104635.swu
VERSION=20180425104635

The upgrade is only granted if VERSION is greater than the timestamp in /etc/version.
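
The comparison itself is a plain numeric check of two timestamps. A minimal sketch, assuming the caller has already extracted the installed timestamp from /etc/version (the format of that file is not described here); the function name upgrade_granted is hypothetical.

```shell
# Decide whether update.conf describes a newer image than the installed one.
# $1: path to update.conf, $2: installed timestamp (digits only)
# Returns 0 if newer, 1 if not newer, 2 if either value is not numeric.
upgrade_granted() {
    conf=$1
    installed=$2
    new=$(sed -n 's/^VERSION=//p' "$conf" | head -n 1)
    case "$new" in ''|*[!0-9]*) return 2 ;; esac
    case "$installed" in ''|*[!0-9]*) return 2 ;; esac
    [ "$new" -gt "$installed" ]
}
```

With VERSION=20180425104635 in update.conf, an installed timestamp of 20180101000000 grants the upgrade, while an equal or later timestamp does not.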

Forced USB rootfs reflash

details: rescue-system documentation

Use in case your rootfs is corrupt, or you want to reset your device to factory settings. Mind that you will lose all your settings

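Follow the same steps as in the previous section, but add an extra empty file named forceflashandreset to the digitalstrom-upgrade directory. If you insert the stick into a running system, it reboots immediately; you can also insert the stick first and then power on.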

After flashing, the system will not continue booting into the new rootfs until the usb stick is removed. This is a consequence of not doing version comparison. It would reflash the same version over and over again in a busy loop, if the stick was not removed.

NOTE:
Remove the stick once the LED stops flashing blue and turns yellow/light green. On systems without an LED, you need to guess. Typically flashing takes 2 min from power-on; 5 min should be safe.

# cd /tmp
# wget http://update.aizo.com/feeds/digitalstrom-devel/usb-upgrade/digitalSTROM-devel--upgrade.zip
# cd /media/usb-stick
# unzip /tmp/digitalSTROM-devel--upgrade.zip
# touch digitalstrom-upgrade/forceflashandreset
digitalstrom-upgrade/
├── dss20
│   ├── digitalstrom-devel-swupdate-image--dss20-20180425104635.swu
│   ├── rescue
│   │   ├── rescue-devel-update--dss20-20180425104548.swu
│   │   └── update.conf
│   └── update.conf
├── dssip
│   ├── digitalstrom-devel-swupdate-image--dssip-20180425105014.swu
│   ├── rescue
│   │   ├── rescue-devel-update--dssip-20180425104933.swu
│   │   └── update.conf
│   └── update.conf
└── forceflashandreset

Testing

The rescue system itself needs to be updated. To always have a bootable system, the production system must be bootable without rescue-system. That means the u-boot must have a default kernel to select which is able to mount the rootfs.
This is a precaution against power failures during upgrade. This non-rescue boot path is termed "factory or fallback kernel boot"

The alternative, non rescue-system kernel works

During rescue-system upgrade, we configure u-boot to boot using the alternative kernel. Only after a successful update we re-enable the rescue-system. Although the alternative kernel is only used in the error case, we still need to verify it works with the current file system, especially usb upgrade should be tested.

To test this, we need to disable rescue. Verify that it is disabled either by observing the console or by grepping /proc/cmdline as described in the Install section above. If the system booted correctly without rescue-system, continue testing the basic functionality needed to allow another attempt to install rescue-system or to provide remote support.

# opkg remove --force-depends rescue-install

This will not remove the rescue-system from the raw partitions; it only prevents it from being re-installed.

Temporarily disable the rescue system

From rootfs / Linux

We have two sorts of boot loaders, those with and without filesystem support.

With filesystem support, the boot loader checks the rescue partition for a boot.scr file. If we rename that file, rescue is disabled.

# mount /dev/mmcblk0p3 /mnt/rescue
# (cd /mnt/rescue; mv boot.scr '#boot.scr')

The older models (dss11/d11-1gb) do not yet have filesystem support. The rescue-system has to be enabled in the u-boot environment. The special variable bootsel is used to enable rescue

# fw_setenv bootsel rescue  (enable)
# fw_setenv bootsel         (disable)

From u-boot console

If you have access to the u-boot console, you can bypass it directly with the setenv bootsel prod and bootd commands listed in the FAQ.

Permanently remove rescue-system

dss11-sdc

root@SS-sdc: # fw_setenv bootsel   # <- removes rescue from the boot order, u-boot will not try to boot rescue
root@SS-sdc: # cat /proc/mtd 
dev:    size   erasesize  name
mtd0: 00020000 00020000 "bootstrap" 
mtd1: 00040000 00020000 "uboot" 
mtd2: 00040000 00020000 "uboot-env" 
mtd3: 00200000 00020000 "kernel-rescue" 
mtd4: 00200000 00020000 "kernel-prod" 
mtd5: 02000000 00020000 "rootfs-rescue" 
mtd6: 0c800000 00020000 "rootfs-prod" 
mtd7: 01360000 00020000 "config" 
root@dSS-sdc:~# flash_erase /dev/mtd5 0 0
Erasing 128 Kibyte @ 1fe0000 -- 100 % complete 
root@dSS-sdc:~# flash_erase /dev/mtd3 0 0
Erasing 128 Kibyte @ 1e0000 -- 100 % complete

analogous for dss11-1gb

root@dSS-TT-UpdateTester-dss11-1gb:/home/dssadmin# fw_setenv bootsel
root@dSS-TT-UpdateTester-dss11-1gb:/home/dssadmin# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00400000 00080000 "kernel-rescue" 
mtd1: 00400000 00080000 "kernel-prod" 
mtd2: 02000000 00080000 "rootfs-rescue" 
mtd3: 3d000000 00080000 "rootfs-prod" 
mtd4: 00800000 00080000 "config" 
mtd5: 00080000 00010000 "spi32766.0" 
mtd6: 3b4f0000 0007e000 "dss11-1gb-rootfs" 
root@dSS-TT-UpdateTester-dss11-1gb:/home/dssadmin# flash_erase /dev/mtd2 0 0
Erasing 512 Kibyte @ 1f80000 -- 100 % complete
root@dSS-TT-UpdateTester-dss11-1gb:/home/dssadmin# flash_erase /dev/mtd0 0 0
Erasing 512 Kibyte @ 380000 -- 100 % complete
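
Because the partition numbers differ between models (mtd3/mtd5 on dss11-sdc, mtd0/mtd2 on dss11-1gb), looking up the device by partition name reduces the risk of erasing the wrong one. This is a sketch; the helper name mtd_by_name is hypothetical and it expects the content of /proc/mtd on standard input.

```shell
# Print the /dev/mtdN device for a partition name, reading /proc/mtd
# formatted text from stdin. Returns non-zero if the name is not found.
# Hypothetical helper, shown for illustration only.
mtd_by_name() {
    awk -v want="\"$1\"" '
        $NF == want {
            sub(/:$/, "", $1)      # "mtd3:" -> "mtd3"
            print "/dev/" $1
            found = 1
        }
        END { exit !found }
    '
}
```

For example, `mtd_by_name rootfs-rescue < /proc/mtd` prints the device to pass to flash_erase; check the output before erasing anything.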

For u-boot with filesystem support (dss11e/dss20),

# mount /dev/mmcblk0p3 /mnt/rescue
# (cd /mnt/rescue; rm -rf *)

Default environment

Modifying the u-boot environment means erasing, then rewriting it. In the better case (dss11-sdc) we have two environments and always write the new one before deleting the old one. But in the case of dss11-1gb we have to erase before writing the new one. This means that a power failure at the wrong moment can cost us the environment.

fw_setenv is supposed to have the same default environment compiled in, as the u-boot does.... tbd.
open points:
- default u-boot env selects the factory kernel by default, mechanism to re-install rescue if not selected
- mac address stored in u-boot environment, #12048

env default -a

http://stackoverflow.com/questions/20621937/how-do-i-clear-environment-variables-previously-saved-with-u-boot

Default rootfs links

The rescue system has a fallback link compiled in. In case the rootfs is completely broken, it will download a new rootfs from a known location.
To exercise this code path you either need to destroy the rootfs or stop in the rescue-system and trigger the fallback manually

Stop in rescue system:

DSS RESCUE /# ifup eth0
DSS RESCUE /# reflash-dss-rootfs /etc/rescue-system/default_rootfs_download.conf

Alternatively you can try to destroy the rootfs.

From the rescue-system:

dss11e/dss20:

DSS RESCUE /# eval $(grep ROOTFS_DEVICE /etc/rescue-system/machine.conf); echo $ROOTFS_DEVICE
/dev/mmcblk0p3
DSS RESCUE /# dd if=/dev/zero of=$ROOTFS_DEVICE bs=512 count=1024

dss11-1gb/dss11-sdc:

DSS RESCUE /# eval $(grep ROOTFS_DEVICE /etc/rescue-system/machine.conf); echo $ROOTFS_DEVICE
/dev/mtd3
DSS RESCUE /# flash_erase $ROOTFS_DEVICE 0 0

From the rootfs

emmc-based(dss11e/dss20):

# ROOT=$(sed  -n 's/.*root=\([^ ]*\) .*/\1/p' /proc/cmdline); echo $ROOT
# dd if=/dev/zero of=$ROOT bs=512 count=1024
# sync   <- important, will produce lots of errors since the filesystem is mounted, remove power at some point
# reboot

nand-based(sdc/1gb):

# ROOT=$(sed  -n 's#.*root=/dev/mtdblock\([^ ]*\) .*#/dev/mtd\1#p' /proc/cmdline); echo $ROOT
/dev/mtd6
# flash_erase $ROOT 0 1000  <- we can play with this number, 0 means erase all
# reboot
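
The sed expressions above only match when root= is followed by another argument. A slightly more tolerant sketch that handles both the eMMC and the NAND case is shown below; the function name root_device is hypothetical, and mapping /dev/mtdblockN to /dev/mtdN simply mirrors the NAND example.

```shell
# Print the device behind root= from a kernel command line.
# /dev/mtdblockN is mapped to the character device /dev/mtdN,
# which is what flash_erase expects. Hypothetical helper.
root_device() {
    # Unquoted on purpose: split the command line into its arguments.
    for arg in $1; do
        case "$arg" in
            root=/dev/mtdblock*)
                printf '/dev/mtd%s\n' "${arg#root=/dev/mtdblock}"
                return 0
                ;;
            root=*)
                printf '%s\n' "${arg#root=}"
                return 0
                ;;
        esac
    done
    return 1
}
```

It would be used as `ROOT=$(root_device "$(cat /proc/cmdline)"); echo $ROOT`, followed by the dd or flash_erase step for the respective platform.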

TODO (ubiupdatevol to save ubi counters)

Links

Repository:
https://git.digitalstrom.org/bsp/rescue-system

FAQ

I flashed oecore-devel and it keeps reflashing infinitely

oecore-devel is discontinued and no longer built; it contains a very old rescue-system. The automatic switch tries to install a compressed rootfs, which fails. Please interrupt the boot process in the rescue shell and create a custom reflash request.

see #14566

RESCUE_SYSTEM# cat > /tmp/reflash_request.conf <<EOF
ROOTFS_IMAGE_URL=http://update.aizo.com/feeds/digitalstrom-testing-eglibc/images/dss11-1gb-t1/digitalstrom-testing-rootfs-s2016.Y-test02-dss11-1gb-t1.ubifs
EOF
RESCUE_SYSTEM # ifup eth0
RESCUE_SYSTEM # reflash-rootfs -v /tmp/reflash_request.conf

RESCUE_SYSTEM# echo ROOTFS_IMAGE_URL=http://update.aizo.com/feeds/digitalstrom-devel/images/dss11-1gb-t1/digitalstrom-devel-rootfs-dss11-1gb-t1.ubifs > /tmp/reflash_request.conf
RESCUE_SYSTEM # ifup eth0
RESCUE_SYSTEM # reflash-rootfs -v /tmp/reflash_request.conf

dss20 dysfunctional due to xz decompression bug

For some reason, streaming an xz-compressed image stalls at a certain percentage. See #15988.

Please interrupt the boot process in the rescue shell and create a custom reflash request.

RESCUE_SYSTEM# cat > /tmp/reflash_request.conf <<EOF
ROOTFS_IMAGE_URL=http://update.aizo.com/feeds/digitalstrom-devel/images/dss20/digitalstrom-devel-rootfs-dss20.ext3
EOF
RESCUE_SYSTEM # ifup eth0
RESCUE_SYSTEM # reflash-rootfs -v /tmp/reflash_request.conf

Rescue fails to boot main system

######################################
Press a key to enter the rescue shell.
UBIFS (ubi0:0): UBIFS: mounted UBI device 0, volume 0, name "dss11-1gb-t1-rootfs", R/O mode
UBIFS (ubi0:0): LEB size: 516096 bytes (504 KiB), min./max. I/O unit sizes: 4096 bytes/4096 bytes
UBIFS (ubi0:0): FS size: 979550208 bytes (934 MiB, 1898 LEBs), journal size 10452992 bytes (9 MiB, 21 LEBs)
UBIFS (ubi0:0): reserved for root: 0 bytes (0 KiB)
UBIFS (ubi0:0): media format: w4/r0 (latest is w4/r0), UUID D80204DE-0B69-4806-B0D1-C085D924F22F, small LPT model
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 409
Status reported to rootfs: ERR_OK
UBIFS (ubi0:0): background thread "ubifs_bgt0_0" stops
Attempting to boot configuration linux-4.1.19-fallback.conf
Can not read /etc/rescue-system/security.conf
Cannot open /proc/atags: No such file or directory
UBIFS (ubi0:0): un-mount UBI device 0
umount: can't umount /: Invalid argument
kexec: Starting new kernel
Bye!

This is caused by a broken kexec version in some versions of rescue-system.

To recover do the following steps
  1. stop boot process in u-boot
  2. type setenv bootsel prod
  3. type bootd

This will bypass rescue and boot the main system directly. Normally, the rootfs already contains a fixed version of rescue-system that then is installed automatically.

Inspect content of initramfs

dd if=rescue-devel-initramfs-dss20.cpio.gz.u-boot bs=64 skip=1 | zcat | cpio -it

rescue_gdansk_2018.pdf (1.25 MB) Andreas Fenkart, 04/18/2018 10:22 AM