Embedded Systems

Coursework 2 Part 1: Real time systems

This coursework is based on a music synthesiser. You need to write the embedded software to make it work. Several real time tasks will need to be executed concurrently, such as detecting key presses and generating the output waveform.

The first part of the lab notes will lead you through the implementation of some of the core features of the synthesiser, including:

  • Scanning the key matrix to find out which keys are pressed
  • Using an interrupt to generate a sawtooth wave
  • Using threads to allow the key scan and display tasks to be decoupled and executed concurrently

The inputs to the synthesiser are 12 keys (1 octave), 4 control knobs and a joystick. The outputs are 2 channels of audio and an OLED display. There is a serial port for communication with a host via USB, and a CAN bus that allows multiple modules to be stacked together to make a larger keyboard.

The keyboard is controlled using an ST NUCLEO-L432KC microcontroller module that contains an STM32L432KCU6U processor, which has an Arm Cortex-M4 core.

Features of the synth module

1. Load the starter code onto the keyboard

The development flow for the project is based on Platformio, which is an IDE customisation for Visual Studio Code. Platformio can target many different embedded platforms using different frameworks and libraries. We will use the STM32duino framework, which provides an Arduino-like environment that makes it easy to access microcontroller hardware features. The overall development stack looks like this:

Synthesiser development stack

Get started by installing Platformio and forking the starter code

  1. Install Visual Studio Code, if you don’t have it already, and add Platformio from the website, or by searching for it in the VS Code extensions marketplace
  2. Fork the starter code from GitHub. You can use the GitHub extension for VS Code, use git from the command line or any other client, or download the zipped project files from GitHub. Open the project folder in VS Code
  3. Switch to the Platformio Home tab with the 🏠 button on the bottom toolbar and select the libraries view. Search for the U8g2 display driver library, select the latest version and add it to the project.
  4. (Windows only) Install the STLINK driver. A copy is available in Teams if you want to avoid creating an account.
  5. Connect the microcontroller module on the synth to your computer with a USB cable.
  6. Compile the code and flash it to the MCU using the → button on the bottom toolbar
  7. The ‘Hello World’ message should appear on the OLED display.
  8. Open the serial monitor using the 🔌 button on the toolbar. Press the reset button on the synth (SW19) or the MCU module (B1) and you will see the ‘Hello World’ message on the terminal.

Hello World

Tip

The STM32duino framework is useful because you can use familiar functions to access hardware and get access to useful libraries. However, it does not include libraries for all the hardware modules and you may need to edit the framework source code to unlock advanced features, such as DMA and the full DAC resolution.

STM32duino is built on top of STM32Cube, which is the manufacturer’s hardware abstraction layer (HAL) for STM32 microcontrollers. You can access the HAL in STM32duino just by including the relevant header files, but a few things require edits to the STM32duino files.

You can also build a project with STM32Cube from scratch. You can compile STM32Cube projects in Platformio, but you will probably need to start by generating initialisation code using STM32CubeMX, which is a GUI-based tool. You can also use STM32CubeIDE, which is the manufacturer’s Eclipse-based IDE, as an alternative to VS Code and Platformio.

The libraries used in these lab instructions are based on STM32duino, so they won’t work in a pure STM32Cube project. You will need to locate or create ports for libraries, for example by defining callbacks for U8g2 to access I2C and GPIO hardware.

2. Read inputs

The keys and knobs on the keyboard module are connected to a key matrix, which allows many keys to be read with a small number of microcontroller pins.

Synthesiser key matrix

| [RA2,RA1,RA0] | C0 | C1 | C2 | C3 |
|---|---|---|---|---|
| 0 | Key C | Key C♯ | Key D | Key D♯ |
| 1 | Key E | Key F | Key F♯ | Key G |
| 2 | Key G♯ | Key A | Key A♯ | Key B |
| 3 | Knob 3 A | Knob 3 B | Knob 2 A | Knob 2 B |
| 4 | Knob 1 A | Knob 1 B | Knob 0 A | Knob 0 B |
| 5 | Knob 2 S | Knob 3 S | Joystick S | West Detect |
| 6 | Knob 0 S | Knob 1 S | Unused | East Detect |
| 7 | Unused | Unused | Unused | Unused |

  1. Read a single row of the switch matrix

    1. Write a function that will read the inputs from the four columns of the switch matrix (C0, C1, C2, C3) and return the four bits as a C++ bitset, which is a fixed-sized vector of Booleans.

      #include <bitset>
      …
      std::bitset<4> readCols(){
        std::bitset<4> result;
        result[0] = …
        return result;
      }

      Add lines to the start of your function to set each row select address (RA0, RA1, RA2) low and the row select enable (REN) high. This will drive R0 low and allow you to read Row 0, notes C–D♯. Use the function digitalWrite() to set the outputs and digitalRead() to read the inputs.

    2. Modify the main loop (the loop() function) to call the readCols() function and print the result on the OLED display at coordinates (2,20).

      std::bitset<4> inputs = readCols();
      u8g2.setCursor(2,20);
      u8g2.print(inputs.to_ulong(),HEX); 

      You will need to replace the existing statement u8g2.print(count++);, which prints the iteration count.

    3. Upload and test your code. Pressing each of the four left-most keys of the keyboard should change the number that is displayed on the screen. The keys read as logic 0 when they are pressed so if you press all four of the keys the number will change to 0.

  2. Read all the keys

    1. Write a function that will select a given row of the switch matrix by setting the value of each row select address pin. Disable (set low) the row select enable before changing the row select address pins, then enable it again at the end of the function. This prevents glitches while the address pins are changing.

      void setRow(uint8_t rowIdx){
      …
      }

      Remove the lines that control the row select addresses and row select enable from the readCols() function.

    2. In the loop() function, create a bit vector that will store the state of each element in the key matrix:

      std::bitset<32> inputs;

      Place a for loop around your call to readCols(). This will be the key scanning loop and it should loop over the row numbers 0 to 2. For each row, it should set the row select address, then read the columns and copy the results into the inputs bitset.

      inputs has 32 elements because the key matrix has 32 elements, but we’ll only read elements 0 to 11 for now because that range covers the 12 music keys.

      Due to parasitic capacitance, the switch matrix columns take some time to rise from logic 0 to logic 1 after the row select changes. Add a small delay inside your loop between the calls to setRow() and readCols():

      delayMicroseconds(3);
    3. Upload the code and you should now see a 3-digit hexadecimal number representing the state of all 12 keys. Check that each key press is detected.
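The row and column bookkeeping in the steps above can be checked on a PC before it runs on the synth. The sketch below is a host-side model under stated assumptions: `fakeReadCols()` is a hypothetical stand-in for `readCols()` that fakes the hardware from a set of pressed keys, and `rowSelectBits()` is a hypothetical helper showing which address pins `setRow()` would drive. Keys read as 0 when pressed, and row `r`, column `c` is stored at index `4*r + c`, so the 12 music keys occupy elements 0 to 11.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper: the address bits setRow() would write.
// result[0] -> RA0, result[1] -> RA1, result[2] -> RA2.
std::bitset<3> rowSelectBits(uint8_t rowIdx) {
  assert(rowIdx < 8);  // the key matrix has 8 rows (0..7)
  return std::bitset<3>(rowIdx);
}

// Hypothetical stand-in for readCols(): on the synth this would be
// digitalRead() on C0_PIN..C3_PIN after setRow() and a short delay.
// Keys are active low: 1 = released, 0 = pressed.
std::bitset<4> fakeReadCols(uint8_t rowIdx, const std::bitset<12>& pressed) {
  std::bitset<4> cols;
  for (int c = 0; c < 4; c++) {
    cols[c] = !pressed.test(rowIdx * 4 + c);
  }
  return cols;
}

// Key scanning loop over rows 0..2, packing each row into inputs[4*row + col].
std::bitset<32> scanKeys(const std::bitset<12>& pressed) {
  std::bitset<32> inputs;
  for (uint8_t row = 0; row < 3; row++) {
    // On hardware: setRow(row); delayMicroseconds(3); cols = readCols();
    std::bitset<4> cols = fakeReadCols(row, pressed);
    for (int c = 0; c < 4; c++) {
      inputs[row * 4 + c] = cols.test(c);
    }
  }
  return inputs;
}
```

With no keys pressed, `scanKeys()` returns 0xFFF; pressing only key D (row 0, column 2) clears bit 2, so the display would show FFB.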

Tip

The complete assignment of microcontroller pins in the synth module is as follows:

| Starter code name | STM32duino name | MCU pin | Function |
|---|---|---|---|
| RA0_PIN | D3 | PB0 | Row select address bit 0 |
| RA1_PIN | D6 | PB1 | Row select address bit 1 |
| RA2_PIN | D12 | PB4 | Row select address bit 2 |
| REN_PIN | A5 | PA6 | Row select enable |
| C0_PIN | A2 | PA3 | Key matrix column 0 |
| C1_PIN | D9 | PA8 | Key matrix column 1 |
| C2_PIN | A6 | PA7 | Key matrix column 2 |
| C3_PIN | D1 | PA9 | Key matrix column 3 |
| OUT_PIN | D11 | PB5 | Multiplexed output for display enable and handshaking signals |
| OUTL_PIN | A4 | PA5 | Analogue audio output left |
| OUTR_PIN | A3 | PA4 | Analogue audio output right |
| JOYX_PIN | A0 | PA1 | Analogue joystick input X |
| JOYY_PIN | A1 | PA0 | Analogue joystick input Y |
| | D4 | PB7 | Display I2C SDA |
| | D5 | PB6 | Display I2C SCL |
| | D10 | PA11 | CAN bus RXD |
| | D2 | PA12 | CAN bus TXD |
| LED_BUILTIN | LED_BUILTIN | PB3 | LED LD3 |
| | D0 | PA10 | Knob change interrupt (V2.x keyboard only) |

3. Generate Sound

The next basic function of the keyboard is to generate sound. We will begin by generating a sawtooth wave with a frequency according to the key that is pressed.

Most digital systems for generating and processing signals use a constant sample rate: we will use a sample rate $f_\mathrm{s}$ of 22kHz. A fixed sample rate means that we can change the frequency of the note by changing the number of samples that make up one period of the waveform.

Illustration of generating sawtooth waves with a constant sample rate

Therefore, we need to convert each note frequency into a step size for a phase accumulator. Over time, the phase accumulator will count up until it overflows and starts again. Each overflow of the phase accumulator represents one period of the output waveform. Increasing the step size causes the phase accumulator to overflow after fewer sample periods and therefore the frequency is higher.

We will use a 32-bit unsigned phase accumulator because that is the word size of the CPU. That means it will overflow with a modulus of $2^{32}$ and the step size $S$ required to achieve a certain frequency $f$ is given by:

$$S=\frac{2^{32}f}{f_\mathrm{s}}$$
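As a worked example of the formula, consider the note A at $f = 440$Hz with $f_\mathrm{s} = 22$kHz:

$$S=\frac{2^{32}\times440}{22000}=\frac{1889785610240}{22000}\approx85899345.92$$

Rounding to the nearest integer gives $S = 85899346$. Converting back with $f = S f_\mathrm{s}/2^{32}$ shows how small the rounding error is:

$$f=\frac{85899346\times22000}{2^{32}}\approx440.0000004\,\mathrm{Hz}$$

Since $f_\mathrm{s}/f = 50$, the accumulator overflows roughly once every 50 samples for this note.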

  1. Define an array of the phase step sizes required for each of the 12 notes of your keyboard. Since these values will be constants, use a const array initialiser of the form:

    const uint32_t stepSizes [] = { … };

    You could also use constexpr to evaluate the step sizes in your code during compilation.

    Configure your keyboard to use equal temperament, which means that the difference in frequency between adjacent notes is a factor of $\sqrt[12]{2}$. Therefore, a span of 12 keys results in a doubling of frequency, which is one octave. Base your tuning on a frequency of 440Hz for the note A, which is the 10th key from the left of your keyboard and element 9 of your notes array.

    keyboard

  2. Add code to your main loop that will check the state of each key in inputs and look up the corresponding step size in the stepSizes array if the key is pressed. Store the result in a global variable:

    volatile uint32_t currentStepSize;

    This variable will be accessed by more than one concurrent task, so it is declared with the keyword volatile. This instructs the compiler to access the variable in memory each time it appears in the source code. Otherwise, the compiled code may keep a copy of the variable in a CPU register and miss updates made by other tasks.

    You will only be able to play one note at a time at first, so if multiple keys are pressed just use the step size from the last key to be checked. If no keys are pressed then the step size should be set to zero.

    Add information to the OLED display to show which note is selected.

    Keyboard with key press detection

  3. Write a new function that will update the phase accumulator and set the analogue output voltage at each sample interval:

    void sampleISR() {
    …
    }

    It will be an interrupt service routine, which means that it cannot have arguments or a return value. The function will be triggered by an interrupt 22,000 times per second. It will add currentStepSize to the phase accumulator to generate the output waveform. Define the phase accumulator as a static local variable, so that its value will be stored between successive calls of sampleISR():

    static uint32_t phaseAcc = 0;
    phaseAcc += currentStepSize;

    The conversion from the phase accumulator to a sawtooth wave output voltage is quite simple because the value of a sawtooth function is directly proportional to the phase. Right-shift the phase accumulator by 24 bits (divide by $2^{24}$) to keep its top 8 bits, then subtract $2^7$ to scale the range to $-2^7\leq V_\text{out}\leq2^7-1$:

    int32_t Vout = (phaseAcc >> 24) - 128;

    The Arduino analogWrite() function has a range of 0-255: 0 produces 0V and 255 produces 3.3V. Therefore, you need to add 128 so that the median voltage (DC offset) is 1.65V.

    analogWrite(OUTR_PIN, Vout + 128);

    You may wonder why 128 is subtracted, then added again. In future, you will need to multiply and add signals, for example to implement a volume control or polyphony. That will be easier when samples have an offset of zero because the offset will be unaffected by mathematical operations. Meanwhile, the phase accumulator itself cannot have a zero offset because that would require a signed integer and the overflow of signed integers results in undefined behaviour in C and C++.

    Different waveform functions will require more maths to convert phase into output voltage. For example, a sine wave would require the calculation of a sin function. Whatever the waveform, it’s best to define the function to have a midpoint of zero and then add the DC offset in the final step.
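The step-size table and the sawtooth conversion from the steps above can be sketched together in portable C++ and checked on a PC. This is a minimal sketch under these assumptions: equal temperament with A = 440Hz at index 9, $f_\mathrm{s} = 22$kHz, and the hypothetical helper names `stepSizeFor()`, `sawtoothSample()` and `countCycles()`. Because standard `std::pow` is not guaranteed to be usable in constant expressions, `stepSizeFor()` multiplies or divides by $\sqrt[12]{2}$ one semitone at a time; loops in a constexpr function need C++14 or later.

```cpp
#include <cstdint>

constexpr double kSampleRate = 22000.0;           // f_s in Hz
constexpr double kSemitone = 1.0594630943592953;  // 2^(1/12)

// Phase step S = 2^32 * f / f_s for key noteIdx (0 = C, 9 = A = 440Hz).
constexpr uint32_t stepSizeFor(int noteIdx) {
  double f = 440.0;
  for (int k = noteIdx; k < 9; k++) f /= kSemitone;  // keys below A
  for (int k = 9; k < noteIdx; k++) f *= kSemitone;  // keys above A
  return static_cast<uint32_t>(4294967296.0 * f / kSampleRate + 0.5);  // round
}

// Evaluated by the compiler, so no floating-point maths runs on the MCU.
constexpr uint32_t stepSizes[12] = {
  stepSizeFor(0), stepSizeFor(1), stepSizeFor(2),  stepSizeFor(3),
  stepSizeFor(4), stepSizeFor(5), stepSizeFor(6),  stepSizeFor(7),
  stepSizeFor(8), stepSizeFor(9), stepSizeFor(10), stepSizeFor(11)
};

// The arithmetic of sampleISR() without the hardware call: top 8 bits of
// the phase, shifted to the zero-centred range -128..127.
int32_t sawtoothSample(uint32_t phaseAcc) {
  return static_cast<int32_t>(phaseAcc >> 24) - 128;
}

// Count phase accumulator overflows (waveform periods) over numSamples samples.
int countCycles(uint32_t stepSize, int numSamples) {
  uint32_t phaseAcc = 0;
  int cycles = 0;
  for (int n = 0; n < numSamples; n++) {
    uint32_t next = phaseAcc + stepSize;  // unsigned overflow wraps modulo 2^32
    if (next < phaseAcc) cycles++;        // wrapped: one period completed
    phaseAcc = next;
  }
  return cycles;
}
```

On the synth, `sampleISR()` would add `currentStepSize` to its static `phaseAcc` and pass `sawtoothSample(phaseAcc) + 128` to `analogWrite()`. One second of samples (22000) at the A step size gives exactly 440 overflows, matching the intended pitch.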

Tip

Numerically Controlled Oscillator

It may seem unnecessary to use a 32-bit phase accumulator if only 8 bits are needed to create the waveform. The technique is known as a numerically controlled oscillator (NCO), and it gives a more accurate frequency than would be possible with an 8-bit accumulator. Truncating the 32-bit accumulator to the 8-bit output means that each individual cycle of the waveform may have an inaccurate period, but that error is averaged out over multiple cycles. The result is phase jitter, which is less noticeable than a continuous frequency error.

If you try to generate very high tones you will hear aliased frequency components arising from the periodicity of the jitter, particularly for discontinuous waveforms like the sawtooth. This is a limitation of the 22kHz sample rate.
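The phase jitter described in this tip can be observed by recording how many samples each period of the waveform lasts. In the minimal sketch below, `periodLengths()` is a hypothetical helper, and the step size 51076057 is assumed to be the rounded value for C (about 261.63Hz) at $f_\mathrm{s} = 22$kHz. Since $2^{32}/51076057 \approx 84.09$, no whole number of samples matches the period, so the oscillator mixes 84-sample and 85-sample periods whose average gives the correct pitch.

```cpp
#include <cstdint>
#include <vector>

// Length, in samples, of each complete waveform period produced by a
// 32-bit phase accumulator with the given step size.
std::vector<int> periodLengths(uint32_t stepSize, int numSamples) {
  std::vector<int> lengths;
  uint32_t phaseAcc = 0;
  int samplesInPeriod = 0;
  for (int n = 0; n < numSamples; n++) {
    uint32_t next = phaseAcc + stepSize;  // wraps modulo 2^32
    samplesInPeriod++;
    if (next < phaseAcc) {                // overflow marks the end of a period
      lengths.push_back(samplesInPeriod);
      samplesInPeriod = 0;
    }
    phaseAcc = next;
  }
  return lengths;
}
```

Over one second (22000 samples) most periods last 84 samples and roughly one in eleven lasts 85. An 8-bit accumulator could only produce one fixed period length per step size, giving a constant pitch error instead.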

  4. A timer is needed to trigger the interrupt that will call sampleISR(). Create a timer in the setup function using the STM32duino library class HardwareTimer:

    TIM_TypeDef *Instance = TIM1;
    HardwareTimer *sampleTimer = new HardwareTimer(Instance);

    The timer is configured by setting the period, attaching the ISR and starting the timer, also in the setup function:

    sampleTimer->setOverflow(22000, HERTZ_FORMAT);
    sampleTimer->attachInterrupt(sampleISR);
    sampleTimer->resume();

    See the documentation for the timer library for more information.

  5. Test your code. You should hear a note from the speaker when you press each key. You could test that the notes are correct with a guitar tuner app if you like.

Caution

Do not use headphones until you have tested the loudness with the headphones away from your ears.

  6. Even though the code works, there is a possible synchronisation bug. The currentStepSize variable could be read in sampleISR() when it has been partially modified in the main loop.

    The first improvement is to reduce the number of accesses to currentStepSize in the main loop to a single store operation. You were asked to check each key and update currentStepSize if the key is pressed. The code can be improved by using a local variable for the step size until all the keys have been checked. Then, when the final value is known, the local variable can be copied to currentStepSize so that the global variable is only accessed once.

    Next, we can force the write to currentStepSize in the main loop to be an atomic operation using a built-in compiler function. The variable is a 32-bit integer, so any write is likely to be atomic anyway because it can be completed in a single CPU operation. However, using an atomic store function guarantees it and shows anyone who maintains the code in future that the operation is intended to be atomic:

    __atomic_store_n(&currentStepSize, localCurrentStepSize, __ATOMIC_RELAXED);

    This function (actually a compiler built-in rather than a library function) stores localCurrentStepSize in currentStepSize as an atomic operation. The parameter __ATOMIC_RELAXED indicates that we need an atomic store, but we’re not concerned about the ordering of other memory accesses that don’t use the two variables in question. Refer to the documentation for more information about this parameter.

    A complementary call to __atomic_load_n() in sampleISR() is not functionally necessary because the variable is read in an ISR that cannot be interrupted by the main loop. However, including the atomic access makes the programmer’s intent clear and guards against errors if a higher-priority interrupt is introduced in future.

    uint32_t localCurrentStepSize = __atomic_load_n(&currentStepSize, __ATOMIC_RELAXED);

    See the compiler documentation for more information about built-in atomic operations.
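The two-step pattern above (build the value locally, then publish it with one atomic store) can be sketched as follows. This is a minimal host-compilable sketch for GCC or Clang, which both provide the `__atomic` built-ins. The helper names `publishStepSize()` and `readStepSize()` are hypothetical, and so is the `pressedMask` argument, which is assumed to have bit i set when key i is pressed (that is, already inverted from the active-low inputs).

```cpp
#include <cstdint>

// Shared between the key-scanning code and sampleISR().
volatile uint32_t currentStepSize = 0;

// Writer (key scanning): work on a local copy, then do a single atomic store.
void publishStepSize(const uint32_t stepSizes[12], uint16_t pressedMask) {
  uint32_t localCurrentStepSize = 0;        // silence if no key is pressed
  for (int i = 0; i < 12; i++) {
    if (pressedMask & (1u << i)) {
      localCurrentStepSize = stepSizes[i];  // the last pressed key wins
    }
  }
  __atomic_store_n(&currentStepSize, localCurrentStepSize, __ATOMIC_RELAXED);
}

// Reader (sampleISR): take one atomic snapshot of the shared value.
uint32_t readStepSize() {
  return __atomic_load_n(&currentStepSize, __ATOMIC_RELAXED);
}
```

Only the final value is ever visible to the reader, so the ISR never sees an intermediate step size from a key that is checked but later overridden.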

4. Split key scanning and display update tasks with threading

Currently, the keys are read once every execution of the main loop. The main loop is also used to update the display, which is not ideal because it forces these tasks to have the same initiation interval. We will separate these two processes into different tasks by creating a thread to run the key scanning task.

  1. Create a global struct that will store system state that is used in more than one thread:

    struct {
    std::bitset<32> inputs;  
    } sysState;

    For now, the struct only contains the input bitset. The only other global variable is currentStepSize, but keep that apart from sysState because it is accessed by an ISR and the synchronisation method will be different.

  2. Move all your code for scanning the keyboard into a single function:

    void scanKeysTask(void * pvParameters) {
    …
    }

    The function should do the following:

    • Loop through the rows of the key matrix
    • Read the columns of the matrix and store the result in sysState.inputs
    • Look up the phase step size for the key that is pressed and update currentStepSize

    Test your code by calling scanKeysTask() in the main loop. The parameter pvParameters will be used by the thread initialiser — just set it to NULL in your call. Everything should work as before.

  3. Install the 'STM32duino FreeRTOS' library with the Platformio library manager. Include its header file at the start of your source file:

    #include <STM32FreeRTOS.h>

    Add this function call at the end of the setup function to start the RTOS scheduler:

    vTaskStartScheduler();

    Now make scanKeysTask() an independent thread. Convert it to an infinite loop by wrapping the contents of the function in a while loop:

    while (1) {
    …
    }

    Add the following code into your setup() function to initialise and run the thread:

    TaskHandle_t scanKeysHandle = NULL;
    xTaskCreate(
    scanKeysTask,		/* Function that implements the task */
    "scanKeys",		/* Text name for the task */
    64,      		/* Stack size in words, not bytes */
    NULL,			/* Parameter passed into the task */
    1,			/* Task priority */
    &scanKeysHandle );	/* Pointer to store the task handle */

    See the API reference for more information about this function call. We have used a stack size of 64 words (256 bytes) for the thread. The stack needs to be large enough to store all the local variables of the functions called in the thread.

    Remove the call to scanKeysTask() from the main loop.

Important

Your code will not run if you include STM32duino FreeRTOS as a library dependency but you don't start the scheduler. Initialise everything else before starting the scheduler with vTaskStartScheduler().

  4. The thread will need to execute at a constant rate, which sets how often the keys are scanned. We can use the RTOS function vTaskDelayUntil() to do this: it blocks execution until a fixed number of ticks has passed since the task's previous wake time, so the initiation interval does not depend on how long the loop body takes.

    Declare two local variables in scanKeysTask(), before the loop:

    const TickType_t xFrequency = 50/portTICK_PERIOD_MS;
    TickType_t xLastWakeTime = xTaskGetTickCount();

    xFrequency will be the initiation interval of the task. It is given in units of RTOS scheduler ticks and we can use the constant portTICK_PERIOD_MS to convert a time in milliseconds to scheduler ticks. Here we have set the initiation interval to 50ms.

    xLastWakeTime will store the time (tick count) of the last initiation. We initialise it with the API call xTaskGetTickCount() to get the current time.

    Now you can add the blocking call to vTaskDelayUntil() at the start of your infinite loop:

    vTaskDelayUntil( &xLastWakeTime, xFrequency );

    This function call blocks execution of the thread until xFrequency ticks have passed since xLastWakeTime. As an RTOS function, it places the thread into the blocked (waiting) state and allows the CPU to do other tasks until it is time to run the loop again. When the required time has passed, xLastWakeTime is updated by the RTOS ready for the next iteration. See the API reference for more information about this function call.

  5. Test your code. It should behave as before. You may have noticed another potential synchronisation bug with the sysState struct. sysState cannot be treated as a simple atomic variable because interacting with it may take multiple memory accesses, for example in the member functions of bitset. We will solve the problem in the next section using a mutex.

  6. The main loop is usually left empty in FreeRTOS systems. Create another thread to run the display update task (name the function displayUpdateTask()) with a 100ms initiation interval. Remove the original, polling-based rate control implemented with if (millis() > next) {…} or while (millis() < next); and replace it with an infinite loop and a call to vTaskDelayUntil().

    The display thread has the longer initiation interval (100ms rather than 50ms), so under rate-monotonic priority assignment it gets the lower priority: set the priority of the display update thread to 1 and the key scanning thread to 2. Use a stack size of 256 words for displayUpdateTask(). Your loop() function should now be empty.

    The starter code toggles the LED on every 100ms loop — you should keep this statement in your display update thread to satisfy a coursework specification. Since the display update thread will probably have the longest initiation interval in your system, if it meets its deadline, then every other task in the system will also meet its deadline (assuming rate-monotonic scheduling). Therefore, the LED is a useful indicator that the code meets its real time requirements. As you add functionality, look out for the LED flash rate slowing, especially under heavy workload such as polyphony with the maximum number of simultaneous notes.
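The reason for using vTaskDelayUntil() rather than a plain delay can be illustrated with a host-side simulation. This is not FreeRTOS code: `wakeTimes()` is a hypothetical model in which each loop iteration takes `execMs` milliseconds of CPU time and is never preempted. Waking at "previous wake time + period" keeps the initiation interval fixed, whereas delaying for a full period after the work finishes (as the relative delay vTaskDelay() would) lets the execution time accumulate as drift.

```cpp
#include <vector>

// Simulated start times (ms) of the first n iterations of a periodic task.
// absolute = true models vTaskDelayUntil(): next wake = previous wake + period.
// absolute = false models a relative delay: next wake = finish time + period.
// Assumes execMs < periodMs and that the task is never preempted.
std::vector<int> wakeTimes(int periodMs, int execMs, int n, bool absolute) {
  std::vector<int> times;
  int wake = 0;
  for (int i = 0; i < n; i++) {
    times.push_back(wake);
    int finish = wake + execMs;            // the task body runs for execMs
    wake = absolute ? wake + periodMs      // fixed initiation interval
                    : finish + periodMs;   // interval stretches to period + execMs
  }
  return times;
}
```

With a 50ms period and 5ms of work per iteration, the absolute schedule starts at 0, 50, 100 and 150ms, while the relative one starts at 0, 55, 110 and 165ms and keeps slipping.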

Tip

How much stack?

The stack stores the arguments, local variables and return pointers for functions that are called. Each thread has its own stack. The total stack required depends on the worst-case combination of function calls. Recursive functions are a bad idea when the stack size is fixed because the worst-case stack requirement depends on the data and it can be hard to determine.

There are two methods to determine the amount of stack to allocate to a thread:

  1. Examine the compiler output to find the stack footprint of every function. Then, add together the combinations of functions that could all be in progress at the same time inside one thread. The use of libraries makes this process more difficult because they might have their own chains of function calls that are hard to inspect. The Inspect view in Platformio can be used to explore memory usage in your project.
  2. Find it at runtime. The FreeRTOS function uxTaskGetStackHighWaterMark() returns the minimum amount of free stack space (in words) that a thread has had since it started running, so its peak usage is the allocated stack size minus this value. You can allocate a large stack at first and then reduce it once the code is working. You need to ensure that all the code has been exercised before you report the stack high water mark.
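The runtime method works because the kernel can fill each task's stack with a known value when the task is created and later count how much of that pattern is still untouched; FreeRTOS uses the fill byte 0xA5 for this when high-water-mark reporting or stack checking is enabled. The host-side sketch below models the idea only. `paintStack()`, `useStack()` and `highWaterMarkWords()` are hypothetical names, and the model assumes a full-descending stack (as on Arm Cortex-M), so the untouched words are at the low-address end.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const uint32_t kFillWord = 0xA5A5A5A5u;  // 0xA5 in every byte

// Fill the whole stack with the known pattern, as done at task creation.
std::vector<uint32_t> paintStack(std::size_t words) {
  return std::vector<uint32_t>(words, kFillWord);
}

// Simulate a call chain 'depth' words deep: the stack grows downwards from
// the highest index, so it overwrites the top 'depth' entries.
void useStack(std::vector<uint32_t>& stack, std::size_t depth) {
  for (std::size_t i = 0; i < depth && i < stack.size(); i++) {
    stack[stack.size() - 1 - i] = static_cast<uint32_t>(i);  // any non-fill value
  }
}

// Count untouched words from the low end: the minimum free stack ever seen.
std::size_t highWaterMarkWords(const std::vector<uint32_t>& stack) {
  std::size_t untouched = 0;
  while (untouched < stack.size() && stack[untouched] == kFillWord) untouched++;
  return untouched;
}
```

A 64-word stack whose deepest call chain used 20 words reports 44, and later, shallower calls cannot raise that figure again. This is why every code path must be exercised before the value is trusted.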

If a thread in your system runs out of stack the RTOS will enter an error state in an infinite loop. The LED on the microcontroller module will flash in bursts of 4 flashes.

You can reduce the stack requirement by placing local variables in dynamically allocated memory with new, malloc() or higher-level methods of creating dynamic memory objects. Dynamic memory comes from a single pool (the heap), so it is more flexible than the per-thread allocation of stack memory. Avoid allocating dynamic memory outside of the initialisation code. In some industries dynamic allocation is banned due to its runtime uncertainty. Raw new and malloc are discouraged in general programming due to the possibility of memory leaks, but that is not a concern if memory is allocated only during initialisation and never deallocated.