Dorian 6cfb8082c5 fix: xorriso append_partition for real USB boot + grub-mkstandalone
Root cause of USB boot failure: our xorriso used -e boot/grub/efi.img
to embed the EFI image inside the ISO. This works for CD-ROM and QEMU
but NOT for USB on real UEFI hardware.

Fix: use the Will Haley / Debian live-build approach:
- -append_partition 2 (GPT type EFI) appends efi.img AFTER ISO data
- -e --interval:appended_partition_2:all:: references the appended partition
- --mbr-force-bootable forces MBR active flag
- grub-mkstandalone with embedded bootstrap config (searches for grub.cfg)
- grub.cfg placed in both /boot/grub/ AND /EFI/BOOT/ on ISO
- grub.cfg uses search --label ARCHIPELAGO to find the ISO root

This is the exact approach used by StartOS, TAILS, and every production
custom Debian live ISO that boots from USB.

Also: iso-debug, iso-branding skills + reference docs, dev-start.sh
option 0 for branding dev, improved dev-branding.sh and test-iso-qemu.sh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:34:29 +00:00

176 lines
6.9 KiB
Markdown

---
name: iso-debug
description: Diagnose and fix Archipelago ISO boot failures. Covers hybrid MBR/GPT, UEFI/BIOS boot chains, live-boot initramfs, GRUB/ISOLINUX configuration, xorriso packaging, and USB boot compatibility. Use when ISO doesn't boot, installer doesn't start, kernel panics, or USB isn't recognized by BIOS/UEFI.
allowed-tools: Bash, Read, Grep, Glob, Agent, Edit
---
# ISO Boot Debugging — Archipelago Custom Base
Systematic diagnosis of ISO boot failures for the Archipelago debootstrap-based installer.
## Architecture
The ISO boot chain has 5 stages. Failure at any stage has distinct symptoms:
| Stage | Component | Symptom if broken |
|-------|-----------|-------------------|
| 1. BIOS/UEFI recognition | Hybrid MBR + GPT | USB not in boot menu at all |
| 2. Bootloader | ISOLINUX (BIOS) or GRUB EFI (UEFI) | Black screen after selecting USB |
| 3. Kernel + initramfs | vmlinuz + initrd.img with live-boot | Kernel panic or initramfs shell |
| 4. Root filesystem | live-boot mounts filesystem.squashfs | "No root device" or blank screen |
| 5. Installer | systemd service + auto-install.sh | Boots to shell but no installer prompt |
## Stage 1: USB Not Recognized
**Most common cause**: Wrong MBR code in the ISO hybrid boot sector.
### Diagnosis
```bash
# Compare first 16 bytes of working vs broken ISO
xxd -l 16 working.iso
xxd -l 16 broken.iso
# Check for valid boot signature at offset 510
xxd -s 510 -l 2 broken.iso
# Must show: 55aa
```
### Known MBR codes
- `4552` — Debian Live MBR (extracted from Debian Live ISO). **Works on all tested hardware.**
- `33ed` — ISOLINUX package generic isohdpfx.bin. **Does NOT work on some UEFI hardware.**
### Fix
The project ships the proven MBR at `image-recipe/branding/isohdpfx.bin` (432 bytes, starts with `4552`).
Build script uses it via: `-isohybrid-mbr "$SCRIPT_DIR/branding/isohdpfx.bin"`
### xorriso flags that matter
- `-isohybrid-mbr <file>` — Embeds MBR code for USB hybrid boot
- `-isohybrid-gpt-basdat` — Adds GPT partition entry for EFI (REQUIRED for UEFI USB boot)
- `-partition_offset 16` — Reserves space for GPT table (REQUIRED — without this some UEFI firmware won't see the USB)
- `-eltorito-alt-boot -e boot/grub/efi.img -no-emul-boot` — EFI boot catalog entry
### Balena Etcher
Writes raw ISO to USB — no special formatting. If the ISO boots in QEMU but not on hardware, the MBR code is the issue, not Etcher.
## Stage 2: Bootloader Failure
### BIOS path: ISOLINUX
Required files in ISO: `isolinux/isolinux.bin`, `isolinux/ldlinux.c32`, `isolinux/boot.cat`
Config: `isolinux/isolinux.cfg`
### UEFI path: GRUB
Required files: `EFI/BOOT/BOOTX64.EFI`, `boot/grub/efi.img`, `boot/grub/grub.cfg`
The EFI image is a FAT32 filesystem containing the GRUB binary, built with:
```bash
grub-mkimage -O x86_64-efi -o BOOTX64.EFI -p /boot/grub \
part_gpt part_msdos fat iso9660 udf normal boot linux search \
search_fs_uuid search_fs_file search_label configfile echo cat \
ls test true loopback gfxterm gfxmenu font png all_video video \
video_bochs video_cirrus efi_gop efi_uga
```
**Critical**: `all_video`, `efi_gop`, `efi_uga` needed for display on real hardware.
### Diagnosis
```bash
# Mount ISO and verify files
sudo mount -o loop,ro broken.iso /mnt
ls -la /mnt/isolinux/
ls -la /mnt/EFI/BOOT/
cat /mnt/boot/grub/grub.cfg
cat /mnt/isolinux/isolinux.cfg
sudo umount /mnt
```
## Stage 3: Kernel / Initramfs
### live-boot
The initramfs must contain live-boot hooks. Without them, the kernel boots but can't find root.
**Kernel params required**: `boot=live components`
- `boot=live` — triggers live-boot's initramfs scripts
- `components` — tells live-boot to scan live/ for squashfs files
### Verify initramfs has live-boot
```bash
TMPDIR=$(mktemp -d)
unmkinitramfs /path/to/initrd.img $TMPDIR
# live-boot installs scripts/live as a FILE (not directory)
ls -la $TMPDIR/scripts/live # or $TMPDIR/main/scripts/live
file $TMPDIR/scripts/live # Should say "ASCII text"
```
### Common initramfs failures
1. **live-boot not installed**: debootstrap `--include` can't resolve its deps. Must install via `chroot apt-get` after debootstrap.
2. **Broken initramfs from container build**: `update-initramfs` needs `/proc`, `/sys`, `/dev` mounted in the chroot.
3. **scripts/live is a FILE not directory**: Verification code must use `[ -e ]` not `[ -d ]`.
## Stage 4: Root Filesystem
live-boot searches for squashfs files in `live/` on the boot media.
- Mounts boot media (USB/CDROM) at `/run/live/medium`
- Finds `live/filesystem.squashfs`
- Mounts it read-only, creates tmpfs overlay
- pivot_root into the combined root
### Diagnosis
If you get an initramfs shell prompt `(initramfs)`:
```bash
# Inside initramfs shell:
ls /run/live/medium/ # Is boot media mounted?
ls /run/live/medium/live/ # Is squashfs there?
cat /proc/cmdline # Does it have boot=live?
```
## Stage 5: Installer Not Starting
The installer auto-starts via:
1. Getty auto-login on tty1 (root, no password)
2. systemd service `archipelago-installer.service`
3. Wrapper script searches for boot media at: `/run/live/medium`, `/run/archiso`, `/cdrom`
### Diagnosis
If you get a shell but no installer prompt:
```bash
systemctl status archipelago-installer.service
cat /usr/local/bin/archipelago-start-installer
ls /run/live/medium/archipelago/auto-install.sh
```
## Quick Verification Checklist
Run against any ISO before flashing:
```bash
ISO=path/to/iso
MNT=$(mktemp -d)
sudo mount -o loop,ro $ISO $MNT
echo "=== MBR ===" && xxd -l 4 $ISO
echo "=== Boot sig ===" && xxd -s 510 -l 2 $ISO
echo "=== Files ===" && for f in live/vmlinuz live/initrd.img live/filesystem.squashfs isolinux/isolinux.bin EFI/BOOT/BOOTX64.EFI boot/grub/grub.cfg archipelago/auto-install.sh; do [ -e $MNT/$f ] && echo "OK: $f" || echo "MISSING: $f"; done
echo "=== Kernel params ===" && grep "boot=live" $MNT/boot/grub/grub.cfg && echo OK || echo MISSING
echo "=== live-boot ===" && INITRD=$(mktemp -d) && unmkinitramfs $MNT/live/initrd.img $INITRD 2>/dev/null && ([ -e $INITRD/scripts/live ] && echo "OK" || echo "MISSING")
sudo umount $MNT
```
## Key Files
| File | Purpose |
|------|---------|
| `image-recipe/build-auto-installer-iso.sh` | Main build script (~2600 lines) |
| `image-recipe/branding/isohdpfx.bin` | Proven MBR code (432 bytes) |
| `image-recipe/branding/grub-theme/` | GRUB theme (theme.txt + background.png) |
| `image-recipe/branding/plymouth-theme/` | Plymouth boot splash |
| `.gitea/workflows/build-iso-dev.yml` | CI workflow with smoke test |
| `image-recipe/test-iso-qemu.sh` | QEMU testing script |
| `image-recipe/dev-branding.sh` | Quick branding iteration (patch + repackage) |
## Infrastructure
| What | Where |
|------|-------|
| CI runner | gitea-runner.service on 192.168.1.228 |
| ISO builds | FileBrowser at http://192.168.1.228:8083 → Builds/ |
| Dev branch | dev-iso (separate CI: build-iso-dev.yml) |
| Main branch | main (CI: build-iso.yml) — DO NOT break |