Rescue System

A minimal Linux system that is loaded before the main system.

- Select a kernel in /boot on the main rootfs based on priority, verify checksums
- Enable re-flashing the rootfs from a given URL

https://git.digitalstrom.org/bsp/rescue-system (part of this wiki moved to in-project documentation)

Presentation rescue_gdansk_2018.pdf

Install

The rescue system should be installed by default on the devel/testing feeds. Whether it is active can be checked from the kernel cmdline.

Booted from rescue system and what is its version?

rescue-system appends its version to the cmdline of the 2nd stage kernel,
hence you can easily check whether rescue-system is installed and working like this:

# for i in `cat /proc/cmdline`; do echo $i; done
...
rescue-system=2017.K-test01
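
The same check can be scripted. The following is a minimal sketch; the function name rescue_version is hypothetical and not part of rescue-utils. It prints the version token if present and returns non-zero if the system was not booted via rescue-system.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with rescue-utils.
# Prints the value of the rescue-system= token from a kernel command line.
# $1: file to read (defaults to /proc/cmdline)
rescue_version() {
    # Split the command line into one token per line, keep the first match.
    version=$(tr ' ' '\n' < "${1:-/proc/cmdline}" \
        | sed -n 's/^rescue-system=//p' | head -n 1)
    # Non-zero return status when the token is missing.
    [ -n "$version" ] && echo "$version"
}
```

Usage: rescue_version || echo "rescue-system not active"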

Reflash of Rootfs

WARNING

You can easily destroy your device by following these steps. This has not yet been tested widely and is intended to be done in a laboratory environment where you have access to the boot loader and can restore the device by other means. Make sure you back up all your settings, metering data, etc. This will VOID YOUR WARRANTY.

Reflash rootfs

# opkg update && opkg install rescue-utils
# reflash_rootfs -h
Usage:
  reflash_rootfs [-u url] [-t feed]
Options:
  -t use a predefined url based on devel/testing/production
  -l list available feeds
  -u specify url to desired rootfs explicitly
  -h print this help
  -n no signature checks

Will create a reflash request file. This file serves as a command to the
rescue system, to erase the existing rootfs and re-install it from given url.

See rescue-system for details

Switching feeds

You can easily destroy your device by following these steps. If you don't know why this is needed, don't do it. Please read the Warning before continuing.

Enable ssh access in web configurator

# ssh dssadmin@dssip.local
# su
# opkg update && opkg install rescue-utils
# reflash_rootfs -l
# reflash_rootfs -n -t yocto-fieldtest
# reboot

Note: Older production releases for dssip didn't write the rescue-system to the partition. Update to the latest production release first.

Issue reflash rootfs from rescue shell

DSS RESCUE /# ifup -a
DSS RESCUE /# install-update -r

USB update

details are described in rescue-system documentation

USB update with version check

Latest usb-upgrade for devel can be found here
fieldtest/production is not yet building rootfs updates

  1. Unpack the latest zip file into the top level folder of a USB stick. The USB memory stick must have 1GB free disk space.
  2. Plug the USB memory stick into the USB connector of the running digitalSTROM-Server.
  3. The upgrade process will start automatically after a short moment. While updating, the dSS LED will blink blue, but might remain yellow or blue for several seconds.
  4. The system upgrade process is finished once the LED remains green for more than two minutes.
  5. After updating, please check manually for available dSM firmware updates from the configurator's 'system / system update' menu. Install the dSM update if indicated.

# cd /tmp
# wget http://update.aizo.com/feeds/digitalstrom-devel/usb-update/digitalSTROM-devel--upgrade.zip
# cd /media/usb-stick
# unzip /tmp/digitalSTROM-devel--upgrade.zip
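
Step 1 asks for 1GB of free space on the stick. A minimal sketch of a check before unpacking follows; has_free_kb is a hypothetical helper, and the mount point /media/usb-stick is taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the filesystem holding $1 has at least
# $2 KiB available.
has_free_kb() {
    # df -P guarantees one line per filesystem; column 4 is "Available".
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}
```

Usage: has_free_kb /media/usb-stick 1048576 || echo "less than 1GB free on the stick"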

Forced USB rootfs reflash or Factory Reset via USB

Same as above but since versions are not checked, the image is re-installed upon each power-on. Remove the stick
once the LED is steady blue.

Use this in case your rootfs is corrupt, or you want to reset your device to factory settings. Mind that you will lose all your settings.

# cd /tmp
# wget http://update.aizo.com/feeds/digitalstrom-devel/usb-update/digitalSTROM-devel--upgrade.zip
# cd /media/usb-stick
# unzip /tmp/digitalSTROM-devel--upgrade.zip
# touch digitalstrom-upgrade/forceflashandreset

Mind the extra forceflashandreset file; otherwise the structure is the same as with version check. If you insert the stick while the system is running, it will immediately reboot. You can also insert the stick first and then apply power; use the latter if your rootfs is broken.
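
Because the only difference between the two stick layouts is the marker file, it is worth checking which mode a stick is in before plugging it in. A minimal sketch follows; stick_mode and its output labels "forced" and "version-checked" are hypothetical, only the digitalstrom-upgrade directory and the forceflashandreset file name come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: report how a prepared stick will be treated.
# $1: mount point of the USB stick
stick_mode() {
    upgrade_dir="$1/digitalstrom-upgrade"
    # No upgrade directory at the top level: not an upgrade stick.
    [ -d "$upgrade_dir" ] || return 1
    if [ -e "$upgrade_dir/forceflashandreset" ]; then
        echo "forced"
    else
        echo "version-checked"
    fi
}
```

Usage: stick_mode /media/usb-stick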

After flashing, the system will not continue booting into the new rootfs until the usb stick is removed. This is a consequence of not doing version comparison. It would reflash the same version over and over again in a busy loop, if the stick was not removed.

NOTE:
Remove the stick once the LED stops flashing blue and turns steady. On systems without an LED, you need to guess. Typically flashing takes 2 min from power-on; 5 min should be safe.

Priority based kernel selection

The target kernel is selected based on priority. This is used for kernel upgrades while still using opkg package based rolling updates. When using full rootfs upgrades, the feature is not needed. It is still useful for easily deploying an instrumented kernel or for regression testing of a new kernel.

See boot-kernel

Install latest rescue-system

Installation of rescue-system is currently being switched to swupdate. at91 is only supported on devel/testing; imx/dssip is supported on all release types.

# swupdate_wrapper.sh -u http://update.aizo.com/feeds/digitalstrom-rescue-testing/images/dss20/rescue-testing-update-dss20.swu

Disable rescue-system

The rescue system itself needs to be updated. If power fails at the worst moment, rescue-system might be corrupted. There are checksums for kernel/initramfs to implement a roll-over mechanism. If u-boot detects a corruption, it will roll over to a fallback kernel that is able to mount the rootfs, from where rescue-system can be fixed.

On the devel feed, rescue-system selects a kernel randomly from the list of available kernels. That ensures
that on devel the fallback kernel is also tested periodically. But it does not ensure that the u-boot logic implementing roll-over is working. This has to be tested manually at the moment.

at91 / u-boot environment

dss11-1gb does not use double buffering for the u-boot environment. Hence we should not write the environment frequently. Instead we rely on the CRC checksum in the uImage. If the checksum is corrupted, we roll over to the fallback
boot.

Hence it is sufficient to delete either rescue initrd or kernel:

root@dSS-sdc:~# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00020000 00020000 "bootstrap" 
mtd1: 00040000 00020000 "uboot" 
mtd2: 00040000 00020000 "uboot-env" 
mtd3: 00200000 00020000 "kernel-rescue" 
mtd4: 00200000 00020000 "kernel-prod" 
mtd5: 02000000 00020000 "rootfs-rescue" 
mtd6: 0c800000 00020000 "rootfs-prod" 
mtd7: 01360000 00020000 "config" 
root@dSS-sdc:~# flash_erase /dev/mtd5 0 0
Erasing 128 Kibyte @ 1fe0000 -- 100 % complete 
root@dSS-sdc:~# flash_erase /dev/mtd3 0 0
Erasing 128 Kibyte @ 1e0000 --

It's sufficient to erase either of the rescue- partitions
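
The mtd numbering can differ between devices, so it is safer to look a partition up by name than to hard-code /dev/mtd3 or /dev/mtd5. A minimal sketch follows; mtd_by_name is a hypothetical helper that parses the /proc/mtd format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the device node of the mtd partition named $1.
# $2: mtd table to parse (defaults to /proc/mtd)
mtd_by_name() {
    # Column 4 holds the quoted partition name, column 1 is "mtdN:".
    awk -v name="\"$1\"" \
        '$4 == name { sub(":", "", $1); print "/dev/" $1 }' \
        "${2:-/proc/mtd}"
}
```

Usage: flash_erase "$(mtd_by_name kernel-rescue)" 0 0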

Then reboot and verify that rescue-system is no longer enabled, using the cmdline check described under Install.

imx / u-boot with filesystem support

No changes to the environment are required, since u-boot can read macro files from filesystems.

dss20:
# mount /dev/mmcblk0p2 /mnt/rescue
# (cd /mnt/rescue; mv boot.scr '#boot.scr')
dss11e:
# mount /dev/mmcblk0p3 /mnt/rescue
# (cd /mnt/rescue; mv boot.scr '#boot.scr')
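
The target file name starts with '#', which the shell treats as the start of a comment unless it is quoted. A minimal sketch that wraps the rename follows; both function names are hypothetical, and restoring by renaming back is an assumption that has not been verified on a device.

```shell
#!/bin/sh
# Hypothetical helpers. $1: mount point of the rescue partition.

# Hide boot.scr from u-boot by renaming it.
disable_rescue_boot() {
    # Inside double quotes the '#' is literal, not a comment.
    [ -f "$1/boot.scr" ] && mv "$1/boot.scr" "$1/#boot.scr"
}

# Assumption: renaming the file back re-enables the rescue boot.
restore_rescue_boot() {
    [ -f "$1/#boot.scr" ] && mv "$1/#boot.scr" "$1/boot.scr"
}
```

Usage: disable_rescue_boot /mnt/rescue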

You can get creative by destroying files mentioned in boot.scr

There should be a boot.scr file on the main partition, that implements the fallback boot, without rescue-

You need to reboot to make sure /proc/cmdline does not show rescue-system anymore

dssip / android image with checksum

Uninstall rescue-system

Testing default environment

Modifying the u-boot environment means erasing, then rewriting it. In the better case (dss11-sdc) we have two environments and always write the new one before deleting the old one. But in the case of dss11-1gb we have to erase before writing the new one. This means that a power failure might cost us our environment.

Depending on paranoia levels random things can happen.

- fw_setenv does not have the same default environment compiled in, as the u-boot does.... tbd.
- ensure default u-boot env still able to boot the fallback kernel by default
- mac address stored in u-boot environment, #12048, lost if env is lost

env default -a

http://stackoverflow.com/questions/20621937/how-do-i-clear-environment-variables-previously-saved-with-u-boot

Install a rootfs dump onto your local dss20

There is not enough space on the dss20 to download the dump first and then flash it, so the image needs to be streamed.

on your host:

$ nc -l 6000 < rootfs.img

then on your rescue:

DSS RESCUE /# nc 192.168.11.184 6000 > /dev/mmcblk0p3
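
nc gives no indication whether the stream arrived complete. A minimal sketch of a check follows: compare the checksum of the image on the host with the checksum of the same number of bytes read back from the partition. prefix_sha256 is a hypothetical helper, and it assumes sha256sum and head -c are available in the rescue shell.

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-256 of the first $2 bytes of $1.
# Reading only a prefix matters because the partition is usually larger
# than the image written into it.
prefix_sha256() {
    head -c "$2" "$1" | sha256sum | cut -d ' ' -f 1
}
```

Usage: on the host, note the size with wc -c < rootfs.img and run prefix_sha256 rootfs.img SIZE; on the rescue system, run prefix_sha256 /dev/mmcblk0p3 SIZE with the same SIZE and compare the two sums.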

FAQ

I flashed oecore-devel and it keeps reflashing infinitely

oecore-devel is discontinued and no longer built; it contains a very old rescue-system. The automatic switch tries to install a compressed rootfs, which fails. Please interrupt the boot process in the rescue shell and create a custom reflash request.

see #14566

RESCUE_SYSTEM# cat > /tmp/reflash_request.conf <<EOF
ROOTFS_IMAGE_URL=http://update.aizo.com/feeds/digitalstrom-testing-eglibc/images/dss11-1gb-t1/digitalstrom-testing-rootfs-s2016.Y-test02-dss11-1gb-t1.ubifs
EOF
RESCUE_SYSTEM # ifup eth0
RESCUE_SYSTEM # reflash-rootfs -v /tmp/reflash_request.conf
RESCUE_SYSTEM# echo ROOTFS_IMAGE_URL=http://update.aizo.com/feeds/digitalstrom-devel/images/dss11-1gb-t1/digitalstrom-devel-rootfs-dss11-1gb-t1.ubifs > /tmp/reflash_request.conf
RESCUE_SYSTEM # ifup eth0
RESCUE_SYSTEM # reflash-rootfs -v /tmp/reflash_request.conf
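
The request file used here is a single ROOTFS_IMAGE_URL= line, and the path written must match the path passed to reflash-rootfs. A minimal sketch of a helper that keeps both in one place follows; write_reflash_request is a hypothetical name, and only the ROOTFS_IMAGE_URL key shown above is written.

```shell
#!/bin/sh
# Hypothetical helper: write a reflash request for the given rootfs URL.
# $1: rootfs image URL
# $2: request file path (defaults to /tmp/reflash_request.conf)
write_reflash_request() {
    printf 'ROOTFS_IMAGE_URL=%s\n' "$1" > "${2:-/tmp/reflash_request.conf}"
}
```

Usage: write_reflash_request URL && ifup eth0 && reflash-rootfs -v /tmp/reflash_request.conf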

Inspect content of initramfs

dd if=rescue-devel-initramfs-dss20.cpio.gz.u-boot bs=64 skip=1 | zcat | cpio -it

Inspect content of swu files

# cpio  -it < rescue-devel-update--dss11-sdc-20180611085714.swu 
sw-description
sw-description.sig
postinstall.sh
uImage-kernel--dss11-sdc.bin
rescue-devel-initramfs-dss11-sdc.cpio.gz.u-boot

Extract the meta-information only. If the file name is omitted, all files are extracted.

# cpio -i sw-description < rescue-devel-update--dss11-sdc-20180611085714.swu

Regressions / known issues

Swupdate on dss11-1gb runs out of RAM during installation (#24160)

swupdate_wrapper uses /tmp (RAM disk) to pre-fetch and also unpack artifacts before installation.
The workaround is to pre-fetch the swu file into the root folder and then install from the filesystem, which prevents pre-fetching into RAM.

observed symptoms are:

swupdate only emits, then quits:

Licensed under GPLv2. See source distribution for detailed copyright notices.

or installation fails when creating the snippets file with the updated version:

[TRACE] : SWUPDATE running :  [LUAstackDump] : (1) [string] [string "..."]:29: failed to open file: /var/tmp/rescue-system/snippets/rescue-system.version

workaround:

export TMPDIR=/
swupdate_wrapper.sh -u http://update.aizo.com/feeds/digitalstrom-rescue-testing/images/dss11-1gb-t1/rescue-testing-update-dss11-1gb-t1.swu

rescue-system install "Rescue system update failed!"

[INFO ] : SWUPDATE running :  Schedule rescue-system-20180817013043 for removal
wc: rescue-system-20180821013014: No such file or directory
wc: /meta/checksums: No such file or directory

regression existed on devel, in the week of 2018-08-20

change was reverted:
https://git.digitalstrom.org/dss-oe/dss-oe/merge_requests/1083/diffs

apply manually to upgrade again:

# vi /usr/share/lua/5.2/swupdate_handlers/rescuefs_handler.lua

remove this section:

        -- perform postinstallation sha256sum healthcheck
        -- status - did command succeed           
        -- term - did command exit or was it terminated
        -- exit - exit status or signal responsible for termination
        status, term, ret = os.execute("test $(wc -l " .. pkg .. "/meta/checksums) -ge 1")
        if term ~= "exit" then                                  
                notify("Health check terminated with signal: " .. tostring(exit))
                return false
        elseif exit ~= 0 then
                notify("meta/checksums file is empty, corrupted image")  
                return false               
        end

dss20 rootfs update failed

A handler on the rootfs has a bug. You need to update the handler before the update works again,
see https://redmine.digitalstrom.org/issues/24742

opkg update
opkg install swupdate

rescue_gdansk_2018.pdf (1.25 MB) Andreas Fenkart, 04/18/2018 10:22 AM