MCU land, part 7: life in full color

Driving a full-color SEPS525 OLED screen and recreating a classic puzzle game on an 8-bit AVR microcontroller.

Feb 24, 2023

In one of the earlier articles, I talked about my quest to recreate a a variety of classic computer games on barebone 8-bit microcontrollers paired with retro displays. The projects were in no way groundbreaking, but they proved to be a great opportunity to teach my kids about game design — and to have some harmless fun.

Today, I’d like to wrap up the saga by adding the final missing chapter: an AVR128DA28-based game that pushes 8-bit tech closer to its limits by driving a full-color 160×128 OLED module (NHD-1.69-160128UGC3). This project proved to be a good opportunity to experiment with optimizations in a setting where a sane person would just reach for a 32-bit MCU. But above all, it was an excuse to reimplement one of my favorite classics: Sokoban.

If you’re not familiar with this puzzle game, you play a a warehouse worker who can push — but not pull - crates on a square game grid. Only one crate can be moved at once. The objective is to get all the crates to their designated locations, but if you’re not careful, it’s easy to end up with a deadlock. You can give it a go online.

The hardware for my version — dubbed Sir Box-a-Lot — is fairly unremarkable; there’s a couple of buttons, a speaker, and a four-contact board-mount “joystick” connected to the MCU in a rather textbook manner. The OLED module (or its SYNCOAM SEPS525 controller) is hooked up via the parallel “8080” interface, a scheme already discussed in an earlier post. So, let’s jump straight to the gotchas on the software side!

Making room for RGB data

The OLED module at the heart of this project has a resolution of 160×128 RGB pixels; the color of each pixel is determined by three bytes of data, corresponding to the intensity of three additive primaries (red, green, and blue). A framebuffer for this display nominally requires 160 × 128 × 3 = 60 kB. There are 16 kB of SRAM available on the AVR128DA28 MCU, so we are coming up quite a bit short. External memory, such as 23LC1024, could be the answer — but shuffling pixel data back and forth over a serial bus would add latency.

The prehistory of computing offers some lessons: in the 1980s and 1990s, engineers worked around such constraints in two clever ways. The first method relied on a reduced-size, indexed palette — akin to GIF images, each of which is constrained to 256 out of 16,777,216 possible RGB values. That said, in this case, I’d need to cut deep: 16 colors would be the safest choice that still leaves some room for other variables and stack.

The other popular trick, still alive today, is chroma subsampling: for example, on ZX Spectrum computers, a high-resolution monochrome bitmap (256×192 pixels) had a low-resolution attribute overlay that defined foreground and background colors for every 8×8 square. This scheme was efficient but restrictive: blocky artifacts cropped up whenever two differently-colored objects had to be painted in close proximity.

In the end, I decided to take a different route. I still divided the screen into a grid of 6×6 pixel cells, but instead of containing arbitrary pixel data, each of the cells can be occupied by one of up to 256 full-color sprites pulled from program memory. To make this more than a glorified text mode, each sprite cell is accompanied by an optional +/- 5 pixel rendering offset for smooth animations:

static struct sprite {
  u8 type:5,                     /* SPRITE_TYPE_*                */
     rgb:3;                      /* Text color (3bpp)            */
  u8 num;                        /* Sprite lookup number         */
  s8 x_off:4,                    /* X offset from grid position  */
     y_off:4;                    /* Y offset from grid position  */
} sprite_cell[26 * 21];

The “type” field also allows the use of monochrome character bitmaps in addition to RGB sprites, reusing the 6x6 font I developed for one of my earlier games; in that case, the “rgb” field is used to select one of eight supported text colors.

The savings are pretty dramatic: the total size of the array is 1.6 kB. The approach is more complicated than a pixel-level framebuffer, but it has some perks: for example, the grid can double as a level map, further reducing the game’s memory footprint. Game world objects on the screen are represented as discrete entries, easily distinguishable and trivial to move around.

Updating the display

Although the scheme is memory-efficient, it brings about another problem: some time is needed to translate the array into RGB values for the OLED module, and even more is needed to push the data out. All in all, I was looking at a refresh rate of around 10-15 fps — enough for Sokoban to be playable, but too low to achieve a polished look.

Luckily, in Sokoban and most other programs of this sort, only a small portion of the screen changes between two frames; large deltas occur only when loading new levels or exiting to a menu — and even then, most pixels remain black (or whatever is your favorite background color).

Consequently, as an optimization, the code keeps track of modified (“dirty”) cells, automatically flagging them from within sprite manipulation and text handling routines. For example, the following routine is used to place a new RGB sprite on the grid:

static void place_rgb(u8 cell_x, u8 cell_y, u8 sprite_no, s8 x_off, s8 y_off) {
  struct sprite *s = get_cell(cell_x, cell_y);
  mark_clean(cell_x, cell_y);
  s->type  = SPRITE_TYPE_RGB;
  s->num   = sprite_no;
  s->x_off = x_off;
  s->y_off = y_off;
  mark_dirty(cell_x, cell_y);
}

These dirty cells are then automatically picked up by a 10 kHz interrupt routine, then incrementally shipped off to the OLED and marked as clean. The process is essentially transparent to the rest of the code: any changes to the sprite grid just magically appear on the screen.

Implementing undo

Many Sokoban moves can lead to deadlocks, so modern clones of the game usually have an “undo” button. This user-friendly feature allows the player to backtrack without having to restart the level over and over again.

The lazy way of implementing undo is to take a snapshot of game state, and then restore it down the line. Alas, for Sokoban, the state is the entire map; with 1.6 kB per snapshot, even with compression, I wouldn’t be able to keep a whole lot of movement history.

A more nuanced solution is to keep a journal of individual state changes, and then play them back in reverse. This, too, is somewhat complicated for Sokoban. Multiple things can happen in a single step: in addition to player movement, a crate may be pushed to a new location. When that happens, the crate may become “solved”, swallowing a target marker and changing its own color; it may become “unsolved”, spawning a marker in its wake; or it might glide across multiple target markers clustered next to each other. Players traversing across markers trigger similarly messy state changes, too.

In the end, I didn’t want to implement complex journaling, so I settled on another trick. In Sokoban, the effects of a move span at most three cells in a straight line extending from the player’s starting position. My undo implementation saves and restores these three cells, plus some gameplay stats. This buys me 128 undo levels for just 1 kB of RAM:

static struct undo_state {
  u8 cx:6, cy:6,              /* Cell segment starting coordinates */
     vertical:1,              /* Cell segment orientation          */
  ...
  u8 cells[3];                /* Cell values prior to movement     */
} undo_stack[UNDO_DEPTH];

Sourcing and storing levels

Sokoban level design is a form of art I can’t imitate, but luckily, there are thousands of community-designed designs available on the internet.

After some playtesting, I initially settled on Sokoban Jr 1 and Sokoban Jr 2 level sets hosted by Borgar Thorsteinsson as a part of his GPL-licensed, web-based implementation of the game. Borgar’s website doesn’t provide attribution for these sets, but based on some other internet hits, I suspect they might have been created by Laura Wheeler. If that’s the case, she did a darn good job: it’s a varied set well-suited for kids, and virtually all the levels are small enough to fit on Sir Box-a-Lot’s game grid. There’s also enough of them (100+) to keep my kids busy for a while.

I was concerned about being able to cram all these levels into the MCU’s non-volatile program memory, but that proved to be a non-issue. I used a very simple and readable packing approach to encode two map cells per byte: the first four bits (i.e., the first hex digit) correspond to the item in the first cell; the next four describe the next cell. Much tighter packing is possible but unnecessary; the current levels take up about 25 kB, leaving room for several hundred more.

A more serious constraint is EEPROM, used to store high scores. For every level, I’m recording three types of achievements, taking up four bytes total. The AVR128DA28 chip has 512 bytes of EEPROM, so I am nearly maxing it out with just 100 levels in tow:

struct level_rec {
  u8  solved_by:4,           /* Player who solved this level first */
      best_pushes_by:4,      /* Player with the lowest push count  */
      best_moves_by:4;       /* Player with the lowest move count  */
  u16 best_pushes:10,        /* Lowest push count                  */
      best_moves:10;         /* Lowest move count                  */
};

(For v2, I decided to add 200 more levels. One option was to leverage the ability of the MCU to self-program its non-volatile program memory — but having proved my point, I took the easy way out with a small standalone serial EEPROM chip.)

And now, the moment you’ve all been waiting for!

Here’s a quick video showing Sir Box-a-Lot v1 in action on a prototyping board; I was recording myself, with the game mounted in a vise to minimize unintended movements, so please accept my apologies for the uninspired takes.

Was it worth it? A pragmatic person would probably say “no”. I invested several hours of work to deal with the limitations of an 8-bit platform when it’d cost me $2 more to use a 32-bit Cortex-M4 MCU with ten times as much built-in SRAM (e.g., ATSAMG55J19A).

At the same time, I did not have to do anything weird or fundamentally hard; the code is pretty simple, maintainable, and its RGB OLED handling primitives can be reused for more essential projects down the line. Plus, there’s some satisfaction in reviving an 8-bit classic on a modern-day 8-bit MCU.

The game turned out to be a huge hit, so a bit later, I ended up creating a custom PCB and put together dedicated page for those who want to replicate the design.

If you liked this article, please subscribe! Unlike most other social media, Substack is not a walled garden and not an addictive doomscrolling experience. It’s just a way to stay in touch with the writers you like.

For yet another gaming project, click here. To check out earlier MCU- and electronics-related articles on this blog, please visit this page.

1 Comment

lcamtuf

Feb 25, 2023·edited Jun 5Author

Final stats:

* Program memory usage: 45,070 bytes (34% full)

* SRAM usage, excluding stack: 3,568 bytes (22% full)

* EEPROM usage: 396 bytes (77% full)

* 10 cell delta attainable "frame rate": 1,000 fps

* Full screen redraw attainable frame rate: 18 fps

In other words, although we're pushing the chip closer to its limits, there's still a good amount of slack.

Expand full comment

lcamtuf’s thing