The lack of developer attention does not imply that the 32-bit ARM port has ceased to make economic sense, though. Instead, it has evolved from being one of the spearheads of Linux innovation to a stable and mature platform, and while funding its upstream development may not make sense in the long term, deploying 32-bit ARM into the field today most certainly still makes economic sense when margins are razor thin and BOM costs need to be kept to an absolute minimum. This is why 32-bit ARM is still widely used in embedded systems like set-top boxes and wireless routers.
Running 32-bit Linux on 64-bit ARM systems
Ironically, at these low price points, the DRAM is actually the dominant component in terms of BOM cost, and many of these 32-bit ARM systems incorporate a cheap ARMv8 SoC that happens to be capable of running in 64-bit mode as well. The reason for running 32-bit applications nonetheless is that these generally use less of the expensive DRAM, and can be deployed directly without the need to recompile the binaries. As 32-bit applications don’t need a 64-bit kernel (which itself uses more memory due to its internal use of 64-bit pointers), the product ships with a 32-bit kernel instead.
If you’re choosing to use a 32-bit kernel for its smaller memory footprint, it’s not without risks. You’ll likely experience performance issues, unpatched vulnerabilities, and unexpected misbehaviors such as:
- 32-bit kernels generally cannot manage more than 1 GiB of physical memory without resorting to HIGHMEM bouncing, and cannot provide a full virtual address space of 4 GiB to user space, as 64-bit kernels can.
- Side channels or other flaws caused by silicon errata may exist that haven’t been mitigated in 32-bit kernels. For example, the hardening against the Spectre and Meltdown vulnerabilities was only done for 32-bit-only ARMv7 CPUs, and many ARMv8 cores running in 32-bit mode may still be vulnerable (only Cortex-A73 and A75 are handled specifically). More generally, silicon flaws in 64-bit parts that affect the 32-bit kernel are less likely to be found or documented, simply because the silicon validation teams don’t prioritize them.
- The 32-bit ARM kernel does not implement the elaborate alternatives patching framework that is used by other architectures to implement handling of silicon errata, which are particular to certain revisions of certain CPUs. Instead, on 32-bit multiplatform kernels, we simply enable all errata workarounds that may be needed by any of the cores that may ever run the image in question, potentially affecting performance unnecessarily on cores that have no need for them.
- Silicon vendors are phasing out 32-bit support in the longer term. Given an ecosystem containing a handful of operating systems and thousands of applications, support for 32-bit operating systems (which is more complex technically) is highly likely to be dropped first. For products with longer life cycles, long-term procurement contracts for components available today are usually much more costly than adjusting the BOM over time and using newer, cheaper parts.
- The 32-bit kernel does not implement kernel address space layout randomization (KASLR), and even if it did, its comparatively tiny address space leaves very little room for randomization. Other hardening features, such as rodata=full or hierarchical eXecute Never attributes, are missing as well on 32-bit, and are not likely to be implemented, either due to lack of support in the architecture, or because of the complexity of the 32-bit memory management code, which still supports all of the different architecture revisions dating back to the initial Linux port running on the Risc PC.
Keeping the 32-bit ARM kernel secure
There are cases, though, where using the 32-bit kernel is the only option, e.g., if the CPUs are in fact 32-bit only (which is the case even for some ARMv8 cores such as Cortex-A32), or when relying on an existing 32-bit only codebase running in the kernel (drivers for legacy peripherals). Note that in such cases, it still makes sense to use the most recent kernel version compatible with the hardware, since we are in fact making an effort to enable some of the existing hardening features on 32-bit ARM as well.
- THREAD_INFO_IN_TASK for v7 SMP cores
The v5.16 release of the Linux kernel implements support for THREAD_INFO_IN_TASK when running on ARMv7 SMP systems. This protects the kernel’s per-task bookkeeping (called thread_info), which otherwise lives on the far (and normally unused) end of the stack, against stack overflows, which may occur in rare (yet sometimes exploitable) cases where the control flow of the program simply ends up accumulating more state than the stack can hold. (Note that a stack overflow is not the same as a stack buffer overflow, where the overflow happens in the opposite direction.)
By moving thread_info off the stack and into the kernel heap, and by using a special SMP CPU register to keep track of its location, we can mitigate the risk of stack overflows resulting in thread_info corruption. However, it does not prevent stack overflows themselves: these may still occur, and result in corruption of other data structures that happen to be adjacent to the task stack in memory.
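The difference between the two layouts can be illustrated with a small user-space sketch. This is not the kernel's code: the 8 KiB stack size, the structure definitions, and all function names are assumptions made up for illustration. It only shows why bookkeeping that is located by masking the stack pointer sits directly in the path of an overflow, while bookkeeping reached through a separate pointer does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stack size for illustration: 8 KiB, naturally aligned. */
#define THREAD_SIZE ((uintptr_t)8192)

/* Hypothetical stand-in for the kernel's per-task bookkeeping. */
struct thread_info_sketch {
    uint32_t flags;
    uint32_t preempt_count;
};

/*
 * On-stack layout: thread_info sits at the lowest address of the stack
 * allocation, so masking any stack pointer value yields its address.
 */
static inline uintptr_t thread_info_from_sp(uintptr_t sp)
{
    return sp & ~(THREAD_SIZE - 1);
}

/*
 * The stack grows down from the top of the allocation. Once the used
 * depth exceeds the space above thread_info, the on-stack layout starts
 * overwriting the bookkeeping itself.
 */
static inline bool on_stack_layout_clobbers_thread_info(size_t depth)
{
    return depth > THREAD_SIZE - sizeof(struct thread_info_sketch);
}

/*
 * THREAD_INFO_IN_TASK-style layout: thread_info is embedded in a
 * separately allocated task structure and found through a pointer that
 * does not depend on the stack pointer (a dedicated CPU register in the
 * text above; a plain C pointer in this sketch).
 */
struct task_sketch {
    struct thread_info_sketch ti;
    void *stack;
};

static inline struct thread_info_sketch *
thread_info_from_task(struct task_sketch *task)
{
    return &task->ti;
}
```

With the assumed 8 KiB size, `thread_info_from_sp(0xC0A01F40)` yields `0xC0A00000`, the base of the stack allocation, which is exactly where a runaway stack ends up writing. In the second layout, the same overflow still runs past the end of the stack, but no longer lands on thread_info.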
- THREAD_INFO_IN_TASK for other cores
For CPUs that lack this special SMP CPU register, we also proposed an implementation of THREAD_INFO_IN_TASK that is expected to land in v5.18. Instead of a special register, it uses a global variable to keep track of the location of thread_info.
Preventing stack overflows from corrupting unrelated memory contents is the goal of VMAP_STACK, which we are enabling for 32-bit ARM as well. When VMAP_STACK is enabled, kernel mode stacks are allocated from the kernel heap as before, but mapped into a different part of the kernel’s address space, and surrounded by guard regions, which are guaranteed to be kept unpopulated. Given that accesses to such unpopulated regions will trigger an exception, the kernel’s memory management layer can step in and terminate the program as soon as a stack overflow occurs, and prevent it from causing memory corruption.
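The guard region idea can be tried out in user space. The sketch below is only an analogy, assuming a Linux or other POSIX system with anonymous mappings: it uses mmap() and mprotect() rather than anything from the kernel's own allocator, and the structure and function names are hypothetical. It reserves a stack with one inaccessible page on either side, so that running off either end faults instead of silently overwriting a neighbouring allocation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for a stack surrounded by guard pages. */
struct guarded_stack {
    unsigned char *map;  /* start of the whole mapping, guards included */
    unsigned char *base; /* lowest usable stack address */
    size_t size;         /* usable bytes */
    size_t guard;        /* bytes in each guard region (one page) */
};

/* Returns 0 on success, -1 on failure. */
static inline int guarded_stack_alloc(struct guarded_stack *gs, size_t pages)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || pages == 0)
        return -1;

    size_t page = (size_t)ps;
    size_t total = (pages + 2) * page;

    /* Reserve everything as inaccessible, then open up only the middle. */
    void *p = mmap(NULL, total, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (mprotect((unsigned char *)p + page, pages * page,
                 PROT_READ | PROT_WRITE) != 0) {
        munmap(p, total);
        return -1;
    }

    gs->map = p;
    gs->base = (unsigned char *)p + page;
    gs->size = pages * page;
    gs->guard = page;
    return 0;
}

/* True if addr lies in one of the two guard regions of this stack. */
static inline bool guarded_stack_in_guard(const struct guarded_stack *gs,
                                          const void *addr)
{
    const unsigned char *a = addr;
    bool in_map = a >= gs->map && a < gs->map + gs->size + 2 * gs->guard;
    bool in_stack = a >= gs->base && a < gs->base + gs->size;
    return in_map && !in_stack;
}

static inline void guarded_stack_free(struct guarded_stack *gs)
{
    munmap(gs->map, gs->size + 2 * gs->guard);
}
```

In this analogy, any load or store to an address for which `guarded_stack_in_guard()` returns true raises SIGSEGV, which plays the role of the exception that lets the kernel's memory management layer step in.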
- Support for IRQ stacks
Coming up with a bounded worst case on which to base the size of the kernel stack is rather hard, especially given the fact that it is shared between the program itself and any exception handling routines that may be called on its behalf, including interrupt handlers. To mitigate the risk of a pathological worst case occurring, where an interrupt fires that needs a lot of stack space right at a time when most of the stack is already being used by the program, we are also enabling IRQ_STACKS for 32-bit ARM, which will run handlers of both hard and soft interrupts from a dedicated stack, one for each CPU. By decoupling the task and interrupt contexts like this, the likelihood that a well-behaved program needs to be terminated due to stack overflow should be all but eliminated.
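The budgeting argument can be made concrete with a little arithmetic. The sizes and function names below are assumptions chosen for illustration, not values taken from the kernel: the point is only that a shared stack must hold the sum of the two worst cases, whereas separate stacks each need to hold just their own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes for illustration only. */
#define TASK_STACK_SIZE ((size_t)8192)
#define IRQ_STACK_SIZE  ((size_t)8192)

/* Shared stack: the interrupt handler runs on top of the task's usage. */
static inline bool fits_shared_stack(size_t task_depth, size_t irq_depth)
{
    return task_depth + irq_depth <= TASK_STACK_SIZE;
}

/* Dedicated IRQ stack: each context only has to fit its own stack. */
static inline bool fits_split_stacks(size_t task_depth, size_t irq_depth)
{
    return task_depth <= TASK_STACK_SIZE && irq_depth <= IRQ_STACK_SIZE;
}
```

For example, a task that is 6000 bytes deep when an interrupt needing 3000 bytes arrives overflows the shared 8 KiB stack, while both fit comfortably once the handler runs on its own stack.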
With these changes in place, kernel stack overflow protection will be available for all ARM systems supported by Linux, including ancient ones like the Risc PC or Netwinder, provided that they run a Linux distribution that is keeping up with the times.
However, relying on legacy hardware and software comes with a risk, and even though we try to help keep users of the 32-bit kernel as safe as we reasonably can, it is not the right choice for new designs that incorporate 64-bit capable hardware.