MCU land, part 5: IRQ and chill on Cortex-M7
How to save power and keep track of time on Microchip S70 / E70 / V70 series chips and other 32-bit ARM microcontrollers.
In the previous installment of the series, I showcased a microcontroller-based toy built on top of an 8-bit AVR DA series chip. On the software side, the project employed a hardware timer interrupt as a way to maintain a steady audio sampling rate.
In today’s feature, I want to walk through the process of setting up a similar interrupt on a SAM E7x / S7x / V7x series MCU. I introduced these devices two posts ago, hailing them as an accessible yet high-performance alternative to a variety of Linux-based “whole computer” SoCs that are popular in the hobby trade.
Interrupts may seem like an oddly specific topic, but on powerful MCUs, they don’t merely assist with repetitive tasks; they are also essential for power management. If you’re writing code for an 8-bit chip, you might be forgiven for letting the MCU spin its wheels to pass the time — and indeed, a simple busy loop is at the heart of the _delay_ms() function familiar to all AVR fans. Yet, doing the same on a power-hungry 32-bit MCU would be an outright crime — so let’s try to devise an alternative.
The SAM RTT subsystem
The SAM series of MCUs offers a variety of general-purpose timers, but the canonical way of generating slow-paced ticks is the Real-Time Timer (RTT). This low-power peripheral has a dedicated IRQ and a corresponding entry in the processor’s interrupt vector table (IVT).
Although the IVT can be modified by hand to register a handler of your choice, if your program contains a function named RTT_Handler(), it will be automatically pulled in. For the curious, this is done with a symbol resolution trick at build time:
void RTT_Handler(void) __attribute__ ((weak, alias("Dummy_Handler")));
In other words, this name — permanently referenced in the IVT structure — always resolves to something. It’s either your code, or absent that, a dummy function that hangs the MCU.
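The same mechanism can be tried out on a desktop machine. The sketch below is a hypothetical, host-side miniature of a startup file; it assumes GCC or Clang targeting an ELF platform such as Linux. The two-entry table, dispatch(), and the call-counting Dummy_Handler() are made up for illustration, and the real fallback hangs the MCU instead of counting.

```c
#include <stdint.h>

/* Counts calls to the fallback; the real Dummy_Handler() loops forever. */
uint32_t dummy_calls;

void Dummy_Handler(void) { dummy_calls++; }

/* Weak aliases: each name resolves to Dummy_Handler() unless another
   source file supplies a regular (strong) function of the same name. */
void RTT_Handler(void) __attribute__ ((weak, alias("Dummy_Handler")));
void TC0_Handler(void) __attribute__ ((weak, alias("Dummy_Handler")));

/* Hypothetical two-entry stand-in for the interrupt vector table. */
typedef void (*handler_t)(void);
const handler_t vector_table[] = { RTT_Handler, TC0_Handler };

/* Hypothetical helper that mimics the hardware jumping through the table. */
void dispatch(unsigned int slot) { vector_table[slot](); }
```

Since neither handler is overridden here, calling dispatch(0) or dispatch(1) lands in Dummy_Handler(). Adding a second source file with its own RTT_Handler() would silently redirect slot 0 without touching the table.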
Similarly to timers on the AVR Dx MCUs, the RTT requires the programmer to manually acknowledge the interrupt from within the IRQ routine; otherwise, the event will keep firing constantly and nothing else will get done. It suffices to read the RTT_SR register and discard the value, like so:
void RTT_Handler() {
  RTT->RTT_SR; /* Clear the interrupt */
  /* Blink LEDs or take care of other business... */
}
The RTT_SR register is declared as “volatile”, so the compiler will not optimize out the access or complain about this seemingly pointless statement.
With the interrupt handler in place, the next step is to configure the RTT; this is done via the RTT_MR register. Bit #17 (RTTINCIEN) disables (0) or enables (1) the interrupt. The sixteen least significant bits (RTPRES), together with a “1” in bit #18 (RTTRST), change the prescaler (divisor) for the 32.768 kHz clock signal driving the RTT — but this can only be done with the interrupt turned off.
In essence, the initial setup of a 1 Hz interrupt boils down to this:
RTT->RTT_MR = (1<<18) | 32768; /* Bit 18: Load new prescaler */
RTT->RTT_MR = (1<<17) | 32768; /* Bit 17: Enable the interrupt */
Any subsequent prescaler changes would require temporarily clearing the RTTINCIEN bit first.
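A run-time prescaler change could be wrapped in a small helper along these lines. This is only a sketch of the sequence described above: rtt_set_prescaler() is a hypothetical name, the bit macros are defined locally rather than taken from the vendor headers, and RTT points at a stand-in struct in RAM so that the logic can be exercised off-target.

```c
#include <stdint.h>

/* Host-side stand-in for the RTT register block. On the MCU, the vendor
   headers define RTT as a pointer to the real peripheral instead. */
typedef struct {
    volatile uint32_t RTT_MR;  /* Mode register */
} MockRtt;

MockRtt mock_rtt;
#define RTT (&mock_rtt)

#define MR_RTPRES_MASK 0xFFFFu     /* Bits 0-15: prescaler value */
#define MR_RTTINCIEN   (1u << 17)  /* Bit 17: interrupt enable */
#define MR_RTTRST      (1u << 18)  /* Bit 18: load new prescaler */

/* Hypothetical helper: change the tick divisor after the initial setup. */
void rtt_set_prescaler(uint32_t rtpres) {
    rtpres &= MR_RTPRES_MASK;

    /* 1. Turn the interrupt off, leaving the other mode bits alone. */
    RTT->RTT_MR &= ~MR_RTTINCIEN;

    /* 2. Write the new divisor together with RTTRST. */
    RTT->RTT_MR = MR_RTTRST | rtpres;

    /* 3. Turn the interrupt back on with the new divisor in place. */
    RTT->RTT_MR = MR_RTTINCIEN | rtpres;
}
```

For instance, rtt_set_prescaler(32) would switch the tick from 1 Hz to 1024 Hz. Like the two-line setup above, the helper overwrites every other bit in RTT_MR, so it would need adjusting if other RTT features were in use.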
The Cortex-M NVIC
The next stop is the Nested Vectored Interrupt Controller (NVIC), a piece of hardware common to all Cortex-M chips. The NVIC is a rather sophisticated subsystem capable of prioritizing exceptions and preempting their execution when a higher-priority event is received. That said, all we need from it right now is to enable the passthrough of interrupts from the RTT. This is done with a single line of code:
NVIC_EnableIRQ(RTT_IRQn);
For the curious, the function expands to the following register write:
NVIC->ISER[irq_no >> 5] = 1 << (irq_no & 0b11111);
ISER stands for the “interrupt set enable register”; because NVIC supports a lot of interrupt vectors, the bitmap spans multiple 32-bit values, hence the ugly addressing scheme.
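To make the addressing scheme less opaque, here is the same arithmetic split into two stand-alone functions that run on any machine. The function names are mine, and the IRQ numbers in the usage note are arbitrary examples rather than assignments of any real peripheral.

```c
#include <stdint.h>

/* Which 32-bit word of the ISER array holds the enable bit for this IRQ. */
uint32_t iser_index(uint32_t irq_no) { return irq_no >> 5; }

/* The single-bit mask for this IRQ within that word. */
uint32_t iser_mask(uint32_t irq_no) { return 1u << (irq_no & 31u); }
```

IRQ 3 maps to word 0 with mask 0x8, while IRQ 35 maps to word 1 with the same mask. Put differently, the upper bits of the IRQ number select the word, and the lower five bits select the position within it.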
The “wfi” opcode
At this point, the RTT interrupt should be up and running — and as hinted earlier, it will be useful not just for event timing, but also for power management.
The simplest battery-saving trick is the __WFI() call, equivalent to asm("wfi"). This processor opcode disconnects the core CPU clock until the arrival of the next interrupt. At that point, you can decide whether it’s time to do some work, or to call __WFI() once more.
To illustrate, a basic power-saving sleep() routine could be designed the following way:
volatile uint32_t rtt_ticks;

void RTT_Handler() {
  RTT->RTT_SR; /* Clear the interrupt */
  rtt_ticks++;
}

void sleep(uint32_t seconds) {
  uint32_t start_ticks = rtt_ticks;
  /* The first tick may arrive early, so wait for one extra. */
  while (seconds >= rtt_ticks - start_ticks) __WFI();
}
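One detail worth spelling out is that the subtraction in the loop condition keeps working when the tick counter overflows, because unsigned arithmetic in C wraps modulo 2^32. The stand-alone function below isolates that step; ticks_elapsed() is a hypothetical name used for illustration only.

```c
#include <stdint.h>

/* Ticks elapsed between two readings of a free-running 32-bit counter.
   The unsigned subtraction wraps modulo 2^32, so the result stays correct
   even if the counter rolled over between the two readings. */
uint32_t ticks_elapsed(uint32_t now, uint32_t start) {
    return now - start;
}
```

If the counter went from 0xFFFFFFFE to 2 in the meantime, the function still reports 4 ticks. At 1 Hz, a rollover is more than a century away, but at 1024 Hz it would happen roughly every 48.5 days, so the property is not purely academic.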
If the ATSAMS70J21B chip is running at 300 MHz, this trick alone should cut its idle power consumption from 65 to 20 mA. Further savings could be realized by temporarily lowering system bus speed (MCK) or turning off unneeded peripherals while in WFI. For example, switching the bus to a 12 MHz clock (PMC->PMC_MCKR CSS field set to 1) drops the sleep mode current to 5 mA.
In practice, the 1 Hz tick resolution is too coarse for some program timing needs. This can be remedied by changing the RTPRES value in the RTT_MR register; say, setting RTPRES to 32 will result in a 1024 Hz tick. The highest attainable RTT frequency is 10922 Hz (RTPRES=3). For more, one could turn to general-purpose timer / counter modules (TC0 through TC3) — with the caveat that they run off the system bus clock (MCK) and are affected by any frequency adjustments made along the way.
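The prescaler arithmetic can be captured in a pair of small helpers. This is a sketch that assumes the nominal 32.768 kHz clock and the divisor range mentioned above; both function names are hypothetical.

```c
#include <stdint.h>

#define RTT_SLOW_CLOCK_HZ 32768u
#define RTT_MIN_RTPRES    3u      /* Smallest divisor mentioned above */
#define RTT_MAX_RTPRES    65535u  /* Largest value that fits in 16 bits */

/* Hypothetical helper: pick the divisor closest to a requested tick rate,
   clamped to the supported range. */
uint32_t rtt_rtpres_for_hz(uint32_t tick_hz) {
    if (tick_hz == 0) return RTT_MAX_RTPRES;

    /* Integer division, rounded to the nearest whole divisor. */
    uint32_t rtpres = (RTT_SLOW_CLOCK_HZ + tick_hz / 2) / tick_hz;

    if (rtpres < RTT_MIN_RTPRES) rtpres = RTT_MIN_RTPRES;
    if (rtpres > RTT_MAX_RTPRES) rtpres = RTT_MAX_RTPRES;
    return rtpres;
}

/* Tick rate produced by a given divisor, rounded down to whole hertz.
   The divisor must be nonzero; values from rtt_rtpres_for_hz() always are. */
uint32_t rtt_hz_for_rtpres(uint32_t rtpres) {
    return RTT_SLOW_CLOCK_HZ / rtpres;
}
```

Asking for 1024 Hz yields a divisor of 32, and asking for 20 kHz gets clamped to a divisor of 3, which works out to 10922 Hz. Calling the second function on the result is a quick way to check how far the achievable rate is from the requested one.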
The alternate route: SysTick
It should be noted that the RTT will be accurate only if the chip is connected to a 32.768 kHz crystal oscillator. Without an external crystal, an internal “32 kHz” RC oscillator is used, but the oscillator isn’t precision-trimmed, and it has a frequency of 32 kHz in name only. Deviations in excess of +/- 25% are not a rare sight.
To deal with this issue, it’s possible to leverage the SysTick subsystem, which is a part of the Cortex core (in contrast to the RTT peripheral and the TC0-TC3 timers, which are Microchip-specific). In principle, the subsystem functions in a similar way:
void SysTick_Handler() {
  /* Do stuff... */
}

/* Set prescaler (24 bit). This is 5 Hz at default CPU speed. */
SysTick->LOAD = 1200000;

/* Internal clock source, interrupt enable, tick enable. */
SysTick->CTRL = 0b111;
The difference is that SysTick runs off the same clock source as the CPU (divided by 2 on the SAM S70 die). On Microchip dies, the main RC oscillator is precision-trimmed and should stay within 5% of the specified frequency.
On the flip side, the subsystem’s timings can be thrown off by CPU speed adjustments. On some other Cortex-M chips, the SysTick signal is also lost in the WFI sleep state — although this is not an issue with this particular MCU.
The only other difference of note is that unlike the RTT interrupt, the SysTick interrupt doesn’t have to be enabled in the NVIC (the interrupt enable bit in SysTick->CTRL is all it takes), and you don’t need to acknowledge anything from the IRQ handler itself.
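Picking the SysTick reload value for a given tick rate is simple arithmetic, but the 24-bit limit is easy to trip over at higher clock speeds. The helper below is a hypothetical sketch. It assumes that you know the frequency of the clock feeding the counter (6 MHz is the figure implied by the 5 Hz example above), and it relies on the fact that the counter runs from the reload value down to zero, so one period lasts reload + 1 cycles.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0xFFFFFFu  /* The reload register is 24 bits wide */

/* Hypothetical helper: compute the reload value for a desired tick rate.
   Returns false if the rate cannot be reached with a 24-bit counter. */
bool systick_reload_for_hz(uint32_t systick_clock_hz, uint32_t tick_hz,
                           uint32_t *reload) {
    if (tick_hz == 0 || tick_hz > systick_clock_hz) return false;

    /* Clock cycles per tick; at least 1 because of the check above. */
    uint32_t cycles = systick_clock_hz / tick_hz;

    if (cycles - 1 > SYSTICK_MAX_RELOAD) return false;

    *reload = cycles - 1;
    return true;
}
```

For a 6 MHz input and a 5 Hz tick, the result is 1199999; the round figure of 1200000 used earlier stretches the period by a single cycle, which rarely matters. With a 150 MHz input, a 1 Hz tick is out of reach and the function returns false, while 1 kHz works fine.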
Postscript: where do all these macros and functions come from?
At this point, some readers might be scratching their heads: we’re supposed to be programming a bare-metal MCU, yet there are all these register names, macros, and unfamiliar functions cropping up all over the place.
All of this originates with the toolchain. The definitions of registers and their bit fields come from vendor- or community-provided .h files; in the world of Microchip / Atmel products, these are known as Device Family Packs (DFPs). For Cortex-based MCUs, the information is further augmented by a library known as the Common Microcontroller Software Interface Standard (CMSIS).
It is important to note that these specifications do very little and don’t add bloat to your programs. For example, this is the entirety of what happens behind the scenes when you type in “PIOA->PIO_OER = 1 << 3” to enable output on PA3:
#define __O volatile

typedef struct {
  /* ... */
  __O uint32_t PIO_OER; /* (PIO Offset: 0x10) ... */
  /* ... */
} Pio;

#define PIOA ((Pio *)0x400E0E00U)
In effect, you’re writing directly to memory; you’re just spared the need to memorize the address of the register.
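The pattern is easy to try on a desktop computer, as long as the struct points at ordinary RAM instead of a hardware address. The following is a simplified sketch: MockPio lists only the first few registers of the block, mock_pioa and enable_output() are hypothetical names, and the layout is meant to mirror the 0x10 offset quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* First few registers of the PIO block, laid out so that PIO_OER lands
   at offset 0x10. The real header declares the complete register list. */
typedef struct {
    volatile uint32_t PIO_PER;  /* Offset 0x00 */
    volatile uint32_t PIO_PDR;  /* Offset 0x04 */
    volatile uint32_t PIO_PSR;  /* Offset 0x08 */
    uint32_t Reserved1;         /* Offset 0x0C */
    volatile uint32_t PIO_OER;  /* Offset 0x10 */
} MockPio;

/* On the MCU, PIOA is a fixed address cast to a struct pointer. That
   address is not mapped on a desktop OS, so the sketch uses a variable. */
MockPio mock_pioa;
#define PIOA (&mock_pioa)

/* Same statement as in the text, wrapped in a function. */
void enable_output(unsigned int pin) { PIOA->PIO_OER = 1u << pin; }
```

Evaluating offsetof(MockPio, PIO_OER) gives 0x10, which confirms that the compiler turns the member access into a plain store at base address plus 0x10. No library code is involved at any point.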
The programming environment usually also provides basic library functions such as sprintf() or memcpy(); these originate with small-footprint libraries such as avr-libc (for 8-bit AVR chips) or picolibc / newlib-nano (for ARM). The libraries are designed so that only the code you’re actually using gets pulled into your project; in other words, the convenience comes at practically no cost.
For AVR and SAM MCUs, you might also come across the Advanced Software Framework (ASF). This is a set of higher-level functions for configuring MCU peripherals. I’m unconvinced that the ASF is particularly useful, as it doesn’t grant you a reprieve from understanding the hardware. Nevertheless, the ASF is a great source of implementation insights whenever the spec gets too obtuse.
Check out the next article in the series: DMA on Cortex-M7. To review the entire series of articles on digital and analog electronics, visit this page.
For readers wondering if the information in this series applies to other SAM series MCUs from Microchip, the answer is generally "yes". Cortex-M4 chips (SAM E5x, D5x, G5x, and 4x) are particularly close and differ chiefly in performance, price, and the physical ordering of pins.
The applicability to Cortex-M chips from other manufacturers is a more complicated story. The cores are the same, so certain aspects of the tutorial - e.g., SysTick and NVIC - should translate directly. On the flip side, on-die peripherals, such as the GPIO controller or the clock subsystem, tend to be vendor-specific, and will use somewhat different keywords or semantics.
As a trivial example, on a SAM chip, this is how you output "1" on PA5 (assuming the pin is already configured as an output and enabled for direct writes via PIO_OWER):
PIOA->PIO_ODSR = (1 << 5);
The same on an STM32 chip may look like this:
GPIOA->ODR = (1 << 5);
The differences can go beyond labels. For example, the clock generation and distribution architecture is different on STM32. That said, the same general concepts apply: there are register-selectable clock sources, prescalers, and programmable PLLs. In other words, the knowledge transfers easily, but implementation details do not.