How to use RISC-V custom instructions with Ubuntu

Introduction

My previous blog talked about the importance of instruction set standardization for ecosystem stability and growth through the use of profiles. Standardization is indeed important, but since one of RISC-V’s great benefits is the ability to customize the instruction set, we should also consider how to support that ability.

This blog looks at what is needed in the software layer to support hardware custom instructions and how you can make that work with Ubuntu.

What is a custom instruction?

A custom instruction is simply an instruction that is not part of the base instruction set definition or ratified extensions. When RISC-V was created, there was an explicit desire to support innovation at the level of CPU architecture. While much innovation has been done at the microarchitecture level (for example, speculative execution and superscalar pipelines), very few ISAs allow for novelty at the architecture level itself. To support this, RISC-V reserves explicit instruction encoding space for instructions that are not part of the standard ISA or standard extensions.

Why customize?

In an embedded microcontroller, it’s easy to imagine the benefits of custom instructions. In security applications they might accelerate specific cryptographic operations; in audio they might provide custom DSP acceleration. For example, Espressif’s ESP32-P4 provides custom extensions for SIMD DSP.

In these systems the implementor usually controls both the hardware and the software, or at least the software toolchain. Linux-based systems, however, are usually a richer environment, with user-deployed applications distributed in binary form.
Linux can be used for embedded systems where the software is still tightly controlled, and there are increasing numbers of applications where the scale of deployment makes custom silicon viable.

In a world where software and hardware codesign becomes more common and companies are creating vertically integrated solutions comprising both hardware and software, customized silicon can address opportunities that may not have been possible before. The most well known example in recent years is probably Apple creating its own laptop silicon, but here are a couple of simple examples too:

- Custom data types. With machine learning evolving at a very fast pace, using custom data types for a specific application might provide significant benefits in performance or power efficiency.
- Control and data flow to an external accelerator. RISC-V CPUs are often used alongside custom accelerators, where custom instructions in the host CPU can manage the accelerator more efficiently than connecting it as a simple memory mapped peripheral.

While software compiled for custom hardware will only run on that hardware, it can still be worthwhile for the performance or power benefits.

These custom instructions are unlikely to be used by the operating system (OS), but there are situations where the OS needs to know about them. Specifically, if the instructions require additional processor state, the OS needs to know about it. Let’s explain that a little further.

What is state space?

State space is anything that persists over time: for example, registers containing data values and status flags reporting on the outcome of instructions. At the OS level, these are important because the OS needs to be able to save and restore this state across events like interrupts. It is also common that registers need to be enabled by the OS.
This can happen either at boot time or on a per-process basis at runtime. For example, floating point and vector register files (where implemented) are disabled by default, and it is only after the OS enables them that application code can make use of them. Even if an implementation didn’t require the OS to enable user access to a particular piece of state, failing to account for it in the OS is likely to cause data corruption or execution problems.

How to support custom data processing instructions

Where a custom instruction only affects data processing and does not require any additional state space, it can be managed without modifying the OS code. For example, instructions that treat data as a 4-bit datatype but use only the normal integer register file and status registers could be implemented without the OS needing to be aware of them.

Building applications that use these instructions requires either precompiled libraries that implement them, or a custom toolchain that can target them. Ubuntu’s launchpad.net build infrastructure supports custom toolchains using Personal Package Archives (PPAs). The toolchain can either be built in its own PPA, or a pre-built binary toolchain can be included as part of the application source tree. In either case, the application code is then built in turn and made available in the PPA.

By combining PPAs to manage customizations with an Ubuntu kernel and general package distribution, users can benefit both from security and maintenance patches from Canonical and from the performance gains of customized hardware.

How to support custom instructions that need state space

As discussed above, custom instructions that require state space are more complex, since they need a custom kernel to handle saving and restoring the context around interrupts or to permit access from user space to the extra state.
Therefore, you will need more than a custom toolchain and application code: you will also need to create a custom kernel. Even here, the launchpad.net infrastructure can still help. Again, the first step is making the custom toolchain available. Canonical has worked with several RISC-V partners and developed an image cookbook which walks through the steps needed to create a custom kernel package. If you’re thinking of building a custom kernel, this should be your starting point.

Once your custom kernel is ready, you can proceed as above, building the toolchain, kernel, and user packages.

The downside of a custom kernel is that it won’t be maintained by Canonical, so security updates and patches for the packages in the PPAs are something you will have to manage yourself (any standard packages from our main repositories will of course still be supported and updated by Canonical).

Best practices for portability

So far we have assumed that the software will only ever be run on hardware with the related custom instruction support. While this might be true in embedded systems, for engineers building Linux binary packages this creates software that isn’t portable. Running a binary that assumes a given custom instruction is available will cause an illegal instruction trap on hardware that doesn’t support it.

It would be more useful to write the software so that it detects at runtime whether the custom instructions are available, and then calls the appropriate code path. This can also be used for standard extensions, for example detecting whether floating point instructions are implemented. If they are, hardware floating point can be used; if not, the software can fall back to a soft floating point implementation rather than crashing or refusing to run.

The mechanism to do this within Linux is hwprobe.
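
As a rough illustration of this detect-then-dispatch pattern, here is a minimal sketch that asks hwprobe whether the standard F and D extensions are present and picks a code path accordingly. It uses Python with ctypes purely for brevity. The syscall number (258), the key RISCV_HWPROBE_KEY_IMA_EXT_0 and the bit RISCV_HWPROBE_IMA_FD are my reading of the Linux RISC-V UAPI headers and should be checked against your kernel. The function names are hypothetical. A custom extension would additionally need a kernel that reports it, and the key and bit to test would come from that kernel, so they are not shown here.

```python
import ctypes
import platform
import sys

# Assumed values, taken from the Linux RISC-V UAPI headers as I understand
# them (asm/unistd.h and asm/hwprobe.h); verify against your kernel.
NR_RISCV_HWPROBE = 258
RISCV_HWPROBE_KEY_IMA_EXT_0 = 4
RISCV_HWPROBE_IMA_FD = 1 << 0


class RiscvHwprobe(ctypes.Structure):
    """One key/value pair, laid out like the kernel's struct riscv_hwprobe."""
    _fields_ = [("key", ctypes.c_int64), ("value", ctypes.c_uint64)]


def probe_ima_ext_0():
    """Return the IMA_EXT_0 bitmask, or None if it cannot be queried here."""
    on_riscv_linux = (sys.platform.startswith("linux")
                      and platform.machine().startswith("riscv"))
    if not on_riscv_linux:
        return None  # never issue the syscall on other platforms
    libc = ctypes.CDLL(None, use_errno=True)
    pair = RiscvHwprobe(key=RISCV_HWPROBE_KEY_IMA_EXT_0, value=0)
    # Arguments: pairs, pair_count, cpusetsize, cpus, flags.
    # cpusetsize=0 with cpus=NULL asks about all online CPUs.
    rc = libc.syscall(
        ctypes.c_long(NR_RISCV_HWPROBE),
        ctypes.byref(pair),
        ctypes.c_size_t(1),
        ctypes.c_size_t(0),
        ctypes.c_void_p(None),
        ctypes.c_uint(0),
    )
    if rc != 0 or pair.key < 0:
        return None  # syscall failed, or the kernel does not know this key
    return pair.value


def run_hardware_fp():
    """Hypothetical stand-in for a code path built with F/D instructions."""
    return "hardware floating point path"


def run_software_fp():
    """Hypothetical stand-in for a soft floating point fallback."""
    return "software floating point path"


def select_code_path(ext_mask):
    """Choose an implementation; an unknown mask (None) takes the fallback."""
    if ext_mask is not None and ext_mask & RISCV_HWPROBE_IMA_FD:
        return run_hardware_fp
    return run_software_fp


if __name__ == "__main__":
    print(select_code_path(probe_ima_ext_0())())
```

Treating "could not probe" the same as "not present" is a deliberate choice in this sketch: the fallback path is always safe to run, whereas guessing that an instruction exists risks the illegal instruction trap described above.
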
The details are beyond the scope of this blog, but in short, hwprobe provides a mechanism for user-level code to query the kernel and ask which extensions are supported. In turn, the kernel learns from the boot firmware which instructions are implemented on the specific hardware it is running on.

Earlier I argued that for stateless instructions it’s not strictly necessary for the kernel to know about them. While true, it would mean that binary code using them may no longer be portable between CPU implementations that implement different extensions. Describing the extensions in the firmware, and using a kernel that can identify them and expose them through hwprobe, provides a more scalable way to support both stateful and stateless custom instructions.

Conclusion

We have discussed how Ubuntu’s launchpad.net infrastructure can be used to support custom instructions, whether they are simpler data-processing-only instructions or more complex ones involving state space. This shows how RISC-V’s promise of allowing innovation through customization works in a complex Linux environment. While chips with custom instructions are less likely to be generally available to developers than vanilla RVA23 designs, it is almost certain there will be applications where Linux + custom RISC-V provides benefits to justify the investment. These will be high volume, high performance applications, for example networking, storage management or AI inference.

Canonical works directly with silicon companies to provide optimized open source solutions. Our Silicon partner page describes our partner program in more detail, or if you’re ready to work with us, please get in touch.

Further reading

- Ubuntu image cookbook
- Espressif ESP32-P4 Technical reference manual
- RISC-V profiles documentation