Booting Ethernut 3

Figures: Original Turtelizer (RS-232) and Turtelizer 2 (USB).

This document gives detailed information about the different ways to boot Ethernut 3.

Readers should be familiar with the Ethernut 3 JTAG Interface. There are several programming tools available. The examples here use the Turtelizer and Turtelizer 2 adapters.

Nut/OS Version 4.2 or later is required, because earlier versions do not provide the required linker scripts.

Application Memory

An application may run in one of three general memory environments:

1. Loaded directly into and running in RAM.
2. Loaded from Flash into RAM and running in RAM.
3. Running in Flash, using RAM for data only.

Specialized applications may use a mixture of the listed memory layouts.

Applications loaded into and running in RAM

This is the optimal procedure during development, because it is fast, doesn't wear out flash memory and simplifies on-chip debugging.

However, as RAM is volatile memory, which loses its contents when the power supply is removed, the application needs to be uploaded somehow. There are two ways to do this:

Using a JTAG adapter
Using a boot loader

The first method requires additional hardware and special software. The original Turtelizer programming adapter, which connects to the desktop computer via RS-232, was designed for this purpose. It was later replaced by the Turtelizer 2, which uses USB instead and is included in the Ethernut 3 Starter Kit, available from www.egnite.de.

No additional hardware is required for the second method, using a boot loader. This is software that has previously been programmed into the non-volatile flash memory. It starts running each time the Ethernut board is powered up or reset, loads the application code from a specific source into RAM and jumps to its start entry, typically located at memory address 0x00000000.
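The last step of such a boot loader can be sketched in C as follows. This is only an illustration that can be tried on a desktop machine: the function name boot_copy_image and its parameters are hypothetical and not part of Nut/OS, and the final jump is shown as a comment only, because it makes sense on the target alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy an application image, already received from
 * some source, to the RAM area it is linked to run in.
 * Returns 0 on success or -1 if the image does not fit into RAM.
 */
int boot_copy_image(uint8_t *ram, size_t ram_size,
                    const uint8_t *image, size_t image_size)
{
    assert(ram != NULL && image != NULL);

    if (image_size > ram_size) {
        return -1;
    }
    memcpy(ram, image, image_size);

    /*
     * On the target, the boot loader would now branch to the start
     * entry of the application, for example:
     *
     *     ((void (*)(void))0x00000000)();
     */
    return 0;
}
```

A real boot loader would additionally verify the image, for example with a checksum, before jumping to it.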

Boot loaders may receive the application code from almost any interface or non-volatile memory. For Ethernut 3 the following sources are available:

RS-232 interface
The code is loaded from a device attached to the RS-232 interface, typically a PC, by using a specific protocol, typically X-, Y- or Z-Modem. Special software, such as a terminal emulator running on a PC, is required to feed the boot loader.
Ethernet interface
Any protocol may be used. TFTP was designed for this purpose. A TFTP client is simple and needs only a few kBytes. However, a TFTP server is required, to which the client can connect and request the code image. Alternatively the boot loader may run any TCP/IP server, like HTTP. In this case only a web browser is required to upload the application code.
MultiMedia or SD Card
Loading applications from a memory card may not be trivial and may require more code than any other method described here. However, it is very flexible and easy to handle, because the card can be written with any PC, using a low cost card reader.
Flash memory
This might look a bit weird at first, because if the application is available in flash already, it may not need a boot loader at all. However, in such configurations the boot loader is additionally able to replace the application image in flash memory. These firmware updates may be loaded through any other interface described above.
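To give an idea why a TFTP client is so small, the following sketch builds the read request packet that a client sends to the server (UDP port 69) to ask for a file. The packet layout follows RFC 1350. The function name tftp_build_rrq is hypothetical and this is not the code of any existing Ethernut boot loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TFTP_OP_RRQ 1   /* read request opcode, RFC 1350 */

/*
 * Build a TFTP read request for `filename` in binary ("octet") mode:
 *
 *   2 bytes opcode | filename | 0 | mode | 0
 *
 * Returns the packet length or 0 if the buffer is too small.
 */
size_t tftp_build_rrq(uint8_t *buf, size_t buflen, const char *filename)
{
    static const char mode[] = "octet";
    size_t name_len;
    size_t total;

    assert(buf != NULL && filename != NULL);

    name_len = strlen(filename) + 1;        /* include the terminating 0 */
    total = 2 + name_len + sizeof(mode);    /* sizeof includes the 0, too */
    if (total > buflen) {
        return 0;
    }
    buf[0] = 0;                             /* opcode, high byte first */
    buf[1] = TFTP_OP_RRQ;
    memcpy(buf + 2, filename, name_len);
    memcpy(buf + 2 + name_len, mode, sizeof(mode));
    return total;
}
```

The server answers with numbered data packets of up to 512 bytes each, which the client acknowledges one by one. A data packet shorter than 512 bytes marks the end of the image.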

Applications loaded from Flash into RAM

Actually this is similar to the boot loader, which loads the application from flash. However, in this case the transfer takes place in the application code itself, during runtime initialization.

If firmware updates are required, the application itself must provide the code to program its new image into flash memory.

Flash memory programming is quite slow, which makes this method less appropriate for program development. Furthermore, the ARM CPU supports two hardware breakpoint registers only. Adding more during on-chip debugging requires re-programming of the related flash memory sectors. Not all debuggers support this.

Applications running in Flash

While Ethernut 3 applications running in RAM are limited to 256 kBytes (program and data space), larger applications may be executed directly from flash memory. This offers up to 4 MBytes of program space and the whole RAM is available for variable data. However, when running in flash, applications execute at reduced speed for two reasons.

Flash memory is slow. Four additional CPU wait cycles are added when accessing this memory type.
The external memory interface of the ARM7 CPU is 16-bit only. Thus, two memory accesses are required for each CPU instruction.

Execution speed may be reduced by a factor of 8. Note that the ARM7TDMI used on Ethernut 3 can run in a special 16-bit mode, called Thumb mode. It uses a reduced instruction set and requires only a single 16-bit instruction fetch. However, this mode isn't yet available in the Nut/OS environment.

Mixing memory layouts

To summarize: Layout #1 is preferred during development, because turnaround cycles (changing code, compiling, linking and testing) are short and the code runs at maximum speed. Layout #2 is available for small to medium sized applications, which will automatically run after power up at maximum speed. Layout #3 allows very large applications to run, but execution speed is significantly reduced.

These are general layouts and mixtures are possible. For example, a large application like an Internet Radio with extended graphic display functions, Web interface etc. may run most of the time in flash memory, but run some performance critical parts like MP3 decoding in RAM. However, so far no linker scripts and startup files exist that support this. If you do not know what this is all about, then the next chapter will be even more interesting for you.

Linking and Runtime Initialization

The compiler mainly translates C code to machine instructions and doesn't care much about memory layouts. However, it prepares the process by splitting code and data into several segments.

Actually, modern compilers allow the developer to specify any number of different code and data segments, but most applications work fine with the three basic types:

.text contains the program code. It may be located in RAM or flash.
.data contains initialized data. May be initially located in flash, but must be loaded into RAM before the application starts.
.bss contains uninitialized global data in RAM, which is cleared to zero before the application starts.
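The following fragment illustrates which kind of C object typically ends up in which segment. The variable and function names are made up for this example, and the placement of read-only data depends on the compiler and linker script. With GCC it usually goes to a section named .rodata, which is commonly placed next to .text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Initialized and writable: the value 1000 is part of the image (.data). */
uint32_t tick_reload = 1000;

/* No initializer: placed in .bss, not part of the image, zero at startup. */
uint8_t rx_buffer[64];

/* Read-only data: typically emitted to .rodata, located next to .text. */
const char banner[] = "Ethernut";

/*
 * The machine code of this function goes to .text. It checks what the
 * application may rely on when it is started.
 */
void check_startup_state(void)
{
    size_t i;

    assert(tick_reload == 1000);        /* initial value was loaded */
    for (i = 0; i < sizeof(rx_buffer); i++) {
        assert(rx_buffer[i] == 0);      /* .bss was cleared */
    }
    assert(banner[0] == 'E');           /* constant is readable in place */
}
```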

Looking at these three segment types, it should soon become clear that they need to be handled differently for our three memory layouts, which were presented in the previous chapter.

When the application is directly loaded into RAM, all three segments are as well and no specific copying is required. However, as it is faster to clear a RAM area to zero than to transfer a block of zeros into it, the .bss segment is not part of the program image. Instead, it will be cleared by the so-called runtime initialization.

Storing the application in flash memory means that the .text and .data segments are stored there. The runtime initialization will either copy both or just the .data segment into the RAM area, depending on whether the code should run in RAM or flash.

What exactly is the runtime initialization? Nothing mysterious really, just a piece of code that is placed in front of the application code. Though it could be written at least partly in C, it is usually done entirely in assembly language. Because it is used in all applications, it makes a lot of sense to keep it as compact and fast as possible. Furthermore, a lot of low-level work is required, which would make C code non-portable and quite cryptic anyway. Besides copying segments, another task of the runtime initialization is the early initialization of basic hardware, such as PLL clocks and memory remapping.
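Although the real code is written in assembly language, the segment handling itself can be sketched in a few lines of C. In this sketch the segment boundaries are passed as parameters, so that it can be tried on a desktop machine. On the target they would be provided by symbols defined in the linker script. The function name crt_init_sections and its parameter names are hypothetical and are not taken from the Nut/OS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch of the segment handling done before main() runs.
 *
 * data_load   where the initial .data values are stored (flash)
 * data_start  where .data is expected at runtime (RAM)
 * bss_start   start of the .bss area (RAM)
 */
void crt_init_sections(uint8_t *data_start, const uint8_t *data_load,
                       size_t data_size, uint8_t *bss_start, size_t bss_size)
{
    assert(data_start != NULL && data_load != NULL && bss_start != NULL);

    /* .data: copy the initial values from the load to the run address. */
    memcpy(data_start, data_load, data_size);

    /* .bss: nothing is stored in the image, the area is simply cleared. */
    memset(bss_start, 0, bss_size);
}
```

For an application that is stored in flash but runs in RAM, the .text segment would be copied in the same way as .data. For an application loaded directly into RAM, only the clearing of .bss remains.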

At the time of this writing, Nut/OS offers three major runtime initializations for the Ethernut 3 Board.

crtat91_ram.S is used for applications that are directly loaded into RAM, either by using a boot loader or by using the JTAG interface.
crtat91_boot.S allows the application to be stored in flash, but run in RAM.
crtat91_rom.S supports application code running in flash memory.

The source code of all runtime initialization files is located in the Nut/OS source tree in subdirectory nut/arch/arm/init/. Even if you are not familiar with ARM assembly programming, you will find valuable things inside, such as the ARM remapping table, which can be easily modified in order to change the memory address of the Ethernet Controller, modify wait cycles or add your own memory mapped peripherals. Note that in order to make use of any changes, you need to rebuild both Nut/OS and your application.

As explained above, the compiler breaks the application into segments and the runtime initialization copies and/or clears certain memory segments. The last missing link is ... the linker. When running make to build a Nut/OS application, the compiler is called first for each source code file, also known as a module. Then the linker is called to put it all together: the compiled modules of the application and the Nut/OS libraries. While doing this, the linker collects the different segments and calculates their sizes and start addresses.

The Ethernut 3 Memory Map shows the start addresses and sizes of RAM and flash memory and of course the linker needs this information as well. It further needs to know, which segment should go into which memory type, flash or RAM. This information is provided by the linker script.

You probably guessed it, there are three major linker scripts for Ethernut 3.

at91_ram.ld for applications loaded into and running in RAM.
at91_boot.ld for applications, which are stored in flash, but run in RAM.
at91_rom.ld for applications running in flash memory.

The Nut/OS Configurator is used to select the proper linker script. The associated runtime initialization will be selected automatically.

Then rebuild the build tree and your application, in this order.

The Ethernut 3 linker scripts are located in subdirectory nut/arch/arm/ldscripts.

Flash Loading

Programming the flash memory chip on Ethernut 3 is required if we want to upload applications that are automatically started at power-up. The same is true for boot loaders, which are, in fact, special applications.

As it will turn out, this part is the most difficult, particularly if you are used to devices with internal flash memory like the AVR or the AT91SAM families.

The first problem is that the flash memory chip doesn't provide any serial interface like JTAG or SPI for reprogramming. Instead the memory bus is used to modify its contents. Thus, some software must run on the Ethernut CPU to access the flash memory chip.
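To illustrate what modifying the contents via the memory bus means, the following sketch simulates a NOR flash chip on a desktop machine and programs a single word into it. The command addresses and values follow the unlock sequence that is common among parallel flash chips. They are assumptions made for this example and must be checked against the data sheet of the chip that is actually mounted. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_WORDS 0x1000u     /* size of the simulated chip in words */

/* Simulated chip: erased cells read 0xFFFF, programming only clears bits. */
struct sim_flash {
    uint16_t cell[FLASH_WORDS];
    int state;                  /* progress within the command sequence */
};

void sim_flash_init(struct sim_flash *f)
{
    uint32_t i;

    for (i = 0; i < FLASH_WORDS; i++) {
        f->cell[i] = 0xFFFF;
    }
    f->state = 0;
}

/* One write cycle on the memory bus, as seen by the chip. */
void sim_bus_write(struct sim_flash *f, uint32_t addr, uint16_t value)
{
    assert(addr < FLASH_WORDS);

    switch (f->state) {
    case 0:
        f->state = (addr == 0x555 && value == 0xAA) ? 1 : 0;
        break;
    case 1:
        f->state = (addr == 0x2AA && value == 0x55) ? 2 : 0;
        break;
    case 2:
        f->state = (addr == 0x555 && value == 0xA0) ? 3 : 0;
        break;
    default:
        f->cell[addr] &= value; /* bits can be cleared, but never set */
        f->state = 0;
        break;
    }
}

/* What the software running on the CPU does to program one word. */
int flash_program_word(struct sim_flash *f, uint32_t addr, uint16_t value)
{
    sim_bus_write(f, 0x555, 0xAA);  /* assumed unlock cycle 1 */
    sim_bus_write(f, 0x2AA, 0x55);  /* assumed unlock cycle 2 */
    sim_bus_write(f, 0x555, 0xA0);  /* assumed program command */
    sim_bus_write(f, addr, value);  /* the data itself */

    return f->cell[addr] == value ? 0 : -1;     /* verify by reading back */
}
```

A plain write without the command sequence leaves the simulated chip unchanged, which is why a simple memory copy is not sufficient. A real chip additionally needs some time to complete each operation and requires sectors to be erased before they can be programmed again.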

We can think of three solutions, at least. All of them use the JTAG interface.

Loading an application via JTAG into RAM, which contains the flash code image to be transferred to flash memory. A file system like UROM will simplify this task.
Loading a so-called flasher via JTAG into RAM, which receives the flash code image from another interface, including the JTAG COMM Channel.
Using the JTAG interface to execute simple instruction sequences to indirectly access the flash memory chip.

Using the UROM File System

This simple solution was demonstrated previously by the XSVF Executor. This tool, a simple Nut/OS application, is used to load an XSVF image into the Ethernut's on-board CPLD. The XSVF file is stored in the UROM file system, which is part of the Executor's binary program image. The whole binary is loaded via JTAG into the internal RAM of the AT91R40008 CPU and started. Instead of transferring the UROM file to the CPLD, we could burn it into the flash chip.

You may also take a look at the HTTP Server Sample, which is included in the Nut/OS distribution. It shows how to create and use the UROM file system.
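The general idea of carrying files inside the program image can be sketched as follows. This is neither the data layout nor the API of the UROM file system. The structure, the table and the function rom_find are hypothetical, and the four image bytes are placeholders only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry of a file embedded in the program image. */
struct rom_entry {
    const char *name;
    const unsigned char *data;
    size_t size;
};

/* Placeholder contents; a real image would be generated from a file. */
static const unsigned char image_bin[] = { 0xEA, 0x00, 0x00, 0x06 };

static const struct rom_entry rom_table[] = {
    { "image.bin", image_bin, sizeof(image_bin) },
};

/* Look up an embedded file by name. Returns NULL if it does not exist. */
const struct rom_entry *rom_find(const char *name)
{
    size_t i;

    assert(name != NULL);

    for (i = 0; i < sizeof(rom_table) / sizeof(rom_table[0]); i++) {
        if (strcmp(rom_table[i].name, name) == 0) {
            return &rom_table[i];
        }
    }
    return NULL;
}
```

An application loaded into RAM could look up the image in this way and then write its contents to the flash memory chip word by word.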

Using a Flasher

Large applications, however, will not fit in the internal RAM. As an alternative, we can use the PHAT File System to load the image from MMC or SD Card.

And, of course, we can use other interfaces as well to receive the image to be flashed. The most interesting is the JTAG interface itself.

Enjoy,
Castrop-Rauxel, March the 7th, 2008