Improve performance with frame buffer and DMA #23

Open
bjorndm opened this issue Jun 18, 2021 · 28 comments

Comments

@bjorndm

bjorndm commented Jun 18, 2021

I enjoy using this library, but drawing performance could be improved. Instead of drawing directly to the display, I suggest drawing to a frame buffer in memory and then sending the frame buffer to the display using DMA. Here is an example of how this could work:

https://esp32.com/viewtopic.php?t=20108

@nopnop2002
Owner

nopnop2002 commented Jun 18, 2021

Thank you for your comment.

This example uses a lot of memory to display JPEG and PNG.

To use the framebuffer, I need to consume less memory for JPEG and PNG display.

@bjorndm
Author

bjorndm commented Jun 18, 2021

Yes, looking at it in more detail, it is true that a full-screen framebuffer takes too much DRAM.

However, it would also be possible to use small DMA buffers, for example of 32x32 pixels (4k DRAM) to speed up font drawing and other graphic operations. These small buffers could also be used as "sprites" or "tiles". Smaller fonts could then also be loaded as such DMA-accessible sprites.
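
To put rough numbers on the memory argument, here is a small sketch that computes buffer sizes. The panel sizes used in the note below (240x240 and 240x320) and the helper name `fb_bytes` are assumptions for illustration and are not taken from the library.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a w x h pixel buffer.
 * RGB565 uses 2 bytes per pixel; an 18-bit mode that sends one byte
 * per channel uses 3 bytes per pixel. */
size_t fb_bytes(size_t w, size_t h, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return w * h * bytes_per_pixel;
}
```

With these assumptions, `fb_bytes(240, 240, 2)` is 115200 bytes and `fb_bytes(240, 320, 2)` is 153600 bytes for a full screen, while a 32x32 tile is 2048 bytes at 2 bytes per pixel and 4096 bytes at 4 bytes per pixel.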

Perhaps there are other non-DMA techniques to improve performance as well?
EDIT: like this?
https://ioprog.com/2020/04/11/performance-improvement-for-stm32f030-st7789-graphics-library/

@nopnop2002
Owner

nopnop2002 commented Jun 18, 2021

> Perhaps there are other non-DMA techniques to improve performance as well?

Yes.

I know that register operations run about 5 times faster than gpio_set_level.

In the case of ESP32, it is divided into registers from GPIO00 to GPIO31 and registers from GPIO32 to GPIO39.
In the case of ESP32-S2, it is divided into registers from GPIO00 to GPIO31 and registers from GPIO32 to GPIO53.

However, it may not have much impact on overall performance.
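
Because the pins are split across two 32-bit registers, a direct register write first has to pick the right register and bit. A minimal, hardware-independent sketch of that mapping follows; `gpio_bit_t` and `gpio_bit_for_pin` are hypothetical names, and which register fields correspond to bank 1 on a given chip should be checked in that chip's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Which 32-bit output register a pin lives in, and its bit within it. */
typedef struct {
    int bank;       /* 0: GPIO0..GPIO31, 1: GPIO32 and up */
    uint32_t mask;  /* single bit to write to the set/clear register */
} gpio_bit_t;

gpio_bit_t gpio_bit_for_pin(int pin)
{
    assert(pin >= 0 && pin < 64);
    gpio_bit_t b;
    b.bank = pin / 32;
    b.mask = UINT32_C(1) << (pin % 32);  /* unsigned shift, safe for bit 31 */
    return b;
}
```

For bank 0 the mask would be written to `GPIO.out_w1ts` or `GPIO.out_w1tc`, as in the test program in this comment; pins in bank 1 need the second register pair instead.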


Try this. The diff values are FreeRTOS tick counts for 1,000,000 set/clear pairs:

I (13816) MAIN: diff(gpio_set_level)=75
I (13936) MAIN: diff(register)=12
I (14216) MAIN: diff(func)=28
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

#include "driver/gpio.h"
#include "esp_log.h"

// Direct writes to the "write 1 to set" / "write 1 to clear" output registers (GPIO0 to GPIO31 only).
#define _gpio_set_level(pin) (GPIO.out_w1ts = (1UL << (pin)))
#define _gpio_clear_level(pin) (GPIO.out_w1tc = (1UL << (pin)))

void func_gpio_set_level(int GPIO_PIN) {
        GPIO.out_w1ts = (1 << GPIO_PIN);
}

void func_gpio_clear_level(int GPIO_PIN) {
        GPIO.out_w1tc = (1 << GPIO_PIN);
}


#define GPIO_PIN 2

#define TAG "MAIN"

void app_main()
{
        gpio_pad_select_gpio( GPIO_PIN );
        gpio_set_direction( GPIO_PIN, GPIO_MODE_OUTPUT );
        gpio_set_level( GPIO_PIN, 0 );

        gpio_set_level( GPIO_PIN, 1 );
        vTaskDelay(100);
        gpio_set_level( GPIO_PIN, 0 );
        vTaskDelay(100);

        GPIO.out_w1ts = (1 << GPIO_PIN);
        vTaskDelay(100);
        GPIO.out_w1tc = (1 << GPIO_PIN);
        vTaskDelay(100);

        _gpio_set_level( GPIO_PIN );
        vTaskDelay(100);
        _gpio_clear_level( GPIO_PIN );
        vTaskDelay(100);

        TickType_t start;
        TickType_t end;
        TickType_t diff;
        start = xTaskGetTickCount();
        for(long i=0;i<1000000;i++) {
                gpio_set_level( GPIO_PIN, 1 );
                gpio_set_level( GPIO_PIN, 0 );
        }
        end = xTaskGetTickCount();
        diff = end - start;
        ESP_LOGI(TAG,"diff(gpio_set_level)=%lu", (unsigned long)diff);

        start = xTaskGetTickCount();
        for(long i=0;i<1000000;i++) {
                _gpio_set_level( GPIO_PIN );
                _gpio_clear_level( GPIO_PIN );
        }
        end = xTaskGetTickCount();
        diff = end - start;
        ESP_LOGI(TAG,"diff(register)=%lu", (unsigned long)diff);

        start = xTaskGetTickCount();
        for(long i=0;i<1000000;i++) {
                func_gpio_set_level( GPIO_PIN );
                func_gpio_clear_level( GPIO_PIN );
        }
        end = xTaskGetTickCount();
        diff = end - start;
        ESP_LOGI(TAG,"diff(func)=%lu", (unsigned long)diff);
}
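
The diff values are tick counts, so they need a tick rate to become times. The log timestamps above are consistent with 10 ms per tick (13936 - 13816 = 120 ms for the 12-tick loop), i.e. a 100 Hz tick, which is assumed here. A small sketch of the conversion, with `ns_per_write` as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Average time per GPIO write in nanoseconds.
 * ticks:      value printed as diff(...)
 * tick_hz:    FreeRTOS tick rate (100 is assumed in the note below)
 * iterations: loop count; each iteration does one set and one clear */
uint32_t ns_per_write(uint32_t ticks, uint32_t tick_hz, uint32_t iterations)
{
    assert(tick_hz > 0 && iterations > 0);
    uint64_t total_ns = (uint64_t)ticks * 1000000000ULL / tick_hz;
    return (uint32_t)(total_ns / ((uint64_t)iterations * 2));
}
```

Under that assumption the three results work out to about 375 ns per write for `gpio_set_level`, 60 ns for the register macro, and 140 ns for the function wrapper. The resolution is coarse, since one tick is 10 ms.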

@bjorndm
Author

bjorndm commented Jun 19, 2021

OK, I will try it out, and if I see a performance improvement, I'll try to apply it to this esp-idf-st7789 project.

@randyfan

Hi, was just wondering if using register operations resulted in a noticeable performance improvement?

This library is awesome, but the only thing holding me back from using it for a project is the refresh rate. I'm trying to get text to refresh without a wasted frame where a black rectangle is drawn over it.

@nopnop2002
Owner

> I'm trying to get text to refresh without a wasted frame where a black rectangle is drawn over it.

I don't know what kind of drawing you want.

@randyfan

randyfan commented Aug 1, 2022

Thanks for the reply. When I use lcdDrawString() with dev->_font_fill enabled, the rectangle drawing method makes it a partial refresh https://github.com/nopnop2002/esp-idf-st7789/blob/master/main/st7789.c#L763, which is cool; however, I can see the frame where the rectangle is drawn over the string. Is there any method that goes straight from one string to another string?

Also, I noticed that if I uncomment and use https://github.com/nopnop2002/esp-idf-st7789/blob/master/main/st7789.c#L784 instead of the rectangle drawing method, the rectangle disappears but the refresh becomes noticeably sequential (string characters update from left to right).

Edit: Perhaps I should have posted here instead: #20. Basically, I want to see if there's a faster approach than using lcdDrawPixel() and lcdDrawFillRect() for partial refreshes.

@nopnop2002
Owner

nopnop2002 commented Aug 1, 2022

@randyfan

> Is there any method that goes straight from one string to another string?

lcdFillScreen(dev, BLACK);
strcpy((char *)ascii, "ABC");
lcdDrawString(dev, fx, xpos, ypos, ascii, WHITE); // Display ABC
vTaskDelay(1000);
lcdDrawString(dev, fx, xpos, ypos, ascii, BLACK); // Erase ABC
strcpy((char *)ascii, "abc");
lcdDrawString(dev, fx, xpos, ypos, ascii, WHITE); // Display abc at same position
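
Another way to go straight from one string to the next is to write both foreground and background pixels of each character cell in the same pass, so that no separate erase step is visible. The following is a minimal sketch under stated assumptions: it uses a hypothetical 8x8 one-bit-per-pixel glyph layout rather than the library's own font files, and `glyph_to_tile` is not part of the library.

```c
#include <assert.h>
#include <stdint.h>

#define GLYPH_W 8
#define GLYPH_H 8

/* Expand a 1-bit-per-pixel 8x8 glyph into an RGB565 tile.
 * Set bits become fg and clear bits become bg, so whatever was in the
 * cell before is overwritten in the same pass.
 * rows[y] holds one row; the most significant bit is the leftmost pixel. */
void glyph_to_tile(const uint8_t rows[GLYPH_H], uint16_t fg, uint16_t bg,
                   uint16_t tile[GLYPH_W * GLYPH_H])
{
    assert(rows != 0 && tile != 0);
    for (int y = 0; y < GLYPH_H; y++) {
        for (int x = 0; x < GLYPH_W; x++) {
            int bit = (rows[y] >> (7 - x)) & 1;
            tile[y * GLYPH_W + x] = bit ? fg : bg;
        }
    }
}
```

The finished tile would then be sent to the panel as one block write covering the character cell; that step depends on the driver and is not shown.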

@DaveDavenport

I made a framebuffer version (on the esp32s3 I had enough memory to do this) that uses large SPI transfers to redraw the screen in one go (the documentation indicates it should use DMA to do this; at the moment I still do it blocking). With this I tested redraws at up to 15 fps and did not notice 'glitches' (there will be some, but they are rare) on the ESP.
I tested it with internal memory and SPIRAM. I also changed it to use 18-bit (666) colors instead of 16-bit (565).

If there is interest I can upload this code; it's very hacky for now.
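
Large transfers usually have to respect a maximum size per SPI transaction, so a full framebuffer tends to be sent in chunks. A minimal sketch of that splitting follows. It is not the code described above: `send_fn` is a hypothetical callback standing in for whatever blocking or DMA-backed SPI write the driver uses, and `count_bytes` is only a stand-in transport for trying the logic on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport callback: sends len bytes, returns 0 on success. */
typedef int (*send_fn)(const uint8_t *data, size_t len, void *ctx);

/* Push a whole framebuffer in chunks of at most max_chunk bytes.
 * Returns the number of chunks sent, or -1 if the callback failed. */
int fb_flush(const uint8_t *fb, size_t fb_len, size_t max_chunk,
             send_fn send, void *ctx)
{
    assert(fb != NULL && send != NULL && max_chunk > 0);
    int chunks = 0;
    for (size_t off = 0; off < fb_len; off += max_chunk) {
        size_t n = fb_len - off;
        if (n > max_chunk) n = max_chunk;  /* last chunk may be shorter */
        if (send(fb + off, n, ctx) != 0) return -1;
        chunks++;
    }
    return chunks;
}

/* Stand-in transport for host testing: just adds up the bytes it is given. */
int count_bytes(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}
```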

@nopnop2002
Owner

@DaveDavenport

Can you change your repository to public?

@DaveDavenport

It's not on GitHub, so there is no repository to set 'public'. I could share the code (in a very rough state) if there is interest.

@DaveDavenport

DaveDavenport commented Sep 27, 2023

I quickly cloned your repo and started adding my code:
https://git.sr.ht/~qball/esp-idf-st7789.git
What is done:

  • Conversion to RGB (666) (it's 24-bit; the lower 2 bits are ignored).
  • Fix SPI mode according to datasheet.
  • (Optional) framebuffer
  • Fewer hardcoded magic numbers.

Things I need to port back:

  • Hardware Flipping of screen.
  • speed demo
  • Fast clearing of framebuffer.
  • PWM option for backlight dimming.
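
The RGB (666) conversion in the first list can be illustrated with a short sketch. This is one possible way to expand an RGB565 pixel into three bytes, not necessarily what the linked branch does, and `rgb565_to_rgb666` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Expand one RGB565 pixel into three bytes (R, G, B) for an 18-bit
 * panel mode where each channel travels in its own byte.
 * The low bits are filled by replicating the high bits, so that
 * full-scale values stay 0xFF. */
void rgb565_to_rgb666(uint16_t c, uint8_t out[3])
{
    assert(out != 0);
    uint8_t r5 = (c >> 11) & 0x1F;
    uint8_t g6 = (c >> 5) & 0x3F;
    uint8_t b5 = c & 0x1F;
    out[0] = (uint8_t)((r5 << 3) | (r5 >> 2));
    out[1] = (uint8_t)((g6 << 2) | (g6 >> 4));
    out[2] = (uint8_t)((b5 << 3) | (b5 >> 2));
}
```

Note that this costs 3 bytes per pixel instead of 2, so both the framebuffer and the SPI traffic grow by half.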

@DaveDavenport

[Image: output]
MJPEG (on a 25 MHz SPI bus).

@DaveDavenport

DaveDavenport commented Sep 27, 2023

[Image: output]

And drawing text on screen without (much) glitching, with the framebuffer and background in SPIRAM.
Backlight dimmed to 20%.

@nopnop2002
Owner

nopnop2002 commented Sep 27, 2023

Thank you. I've cloned your code.

I'll take a closer look this weekend.

> Conversion to RGB (666) (its 24bit, lower 2 bits are ignored).

This will probably cause a memory overflow on the ESP32S2/C2 when displaying JPEG and PNG.

@DaveDavenport

> Probably ESP32S2/C2 causes memory overflow when displaying JPEG and PNG

When not using the framebuffer, it should be fine.
It is on my to-do list to remove or reduce the large static buffers in the code.

@DaveDavenport

Another todo:

  • Support 320x240 screens (more than 16 bit pixels).

@DaveDavenport

DaveDavenport commented Sep 28, 2023

> Probably ESP32S2/C2 causes memory overflow when displaying JPEG and PNG

I just tested my branch on an esp32c3 and it works (with and without the framebuffer). However, I have JPEG/PNG disabled.

[Image]

@nopnop2002
Owner

nopnop2002 commented Sep 28, 2023

> Just tested my branch on an esp32c3 and this works

ESP32C3
384 KB ROM
400 KB SRAM

ESP32C2
576 KB ROM
272 KB SRAM ---> too small

ESP32S2
128 KB ROM
320 KB SRAM

@DaveDavenport

Never used the C2, and the S2 is NRND.
Anyway in my patch the framebuffer is optional.

@nopnop2002
Owner

nopnop2002 commented Sep 28, 2023

> S2 is NRND

No.
The S2 is in mass production.
Please check here.
https://products.espressif.com/#/product-selector?names=

@DaveDavenport

DaveDavenport commented Sep 28, 2023

Oh, good to know, because I really liked it (and keep some in stock).
I had some peripheral clocking setups (in combination with DVFS) that I did not manage to get working on the C3.

Mouser indicated that Espressif marked it NRND; I see now that it comes in another form factor.

@DaveDavenport

For smaller memory usage, we can probably make a smaller framebuffer where we first draw part of what we want to show in the buffer, and then push that in one go to the screen. This should help with text, if we for example push one line of text in one go.
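
A minimal sketch of such a partial buffer is shown here, assuming a 240-pixel-wide RGB565 panel and a 16-row band (roughly one line of text). The names `band_t`, `band_begin` and `band_set_pixel` are hypothetical; pushing the finished band to the screen is left to the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCREEN_W 240   /* assumed panel width */
#define BAND_H   16    /* assumed band height, e.g. one line of text */

/* A band covers screen rows y0 .. y0 + BAND_H - 1. */
typedef struct {
    int y0;
    uint16_t px[SCREEN_W * BAND_H];  /* RGB565, 7680 bytes */
} band_t;

/* Start a new band at row y0, filled with the background color. */
void band_begin(band_t *b, int y0, uint16_t bg)
{
    assert(b != NULL);
    b->y0 = y0;
    for (int i = 0; i < SCREEN_W * BAND_H; i++) b->px[i] = bg;
}

/* Set a pixel given in screen coordinates. Pixels outside the band are
 * dropped, so the same drawing code can be replayed for every band.
 * Returns 1 if the pixel landed in the band, 0 otherwise. */
int band_set_pixel(band_t *b, int x, int y, uint16_t color)
{
    assert(b != NULL);
    int ry = y - b->y0;
    if (x < 0 || x >= SCREEN_W || ry < 0 || ry >= BAND_H) return 0;
    b->px[ry * SCREEN_W + x] = color;
    return 1;
}
```

The trade-off is memory against redraw work: a smaller band needs less RAM but means replaying the drawing calls more often to cover the same area.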

@nopnop2002
Owner

JPEG and PNG display did not become faster even after changing to FrameBuffer.

This is because image analysis takes time.

Without Frame Buffer(rgb565)

I (2734) FillTest: elapsed time[ms]:1150
I (6784) ColorBarTest: elapsed time[ms]:50
I (11064) ArrowTest: elapsed time[ms]:280
I (17254) LineTest: elapsed time[ms]:2190
I (23194) CircleTest: elapsed time[ms]:1940
I (29174) RoundRectTest: elapsed time[ms]:1980
I (39554) RectAngleTest: elapsed time[ms]:6380
I (50564) TriangleTest: elapsed time[ms]:7010
I (55014) DirectionTest: elapsed time[ms]:450
I (60094) HorizontalTest: elapsed time[ms]:1070
I (65164) VerticalTest: elapsed time[ms]:1070
I (69354) FillRectTest: elapsed time[ms]:190
I (73614) ColorTest: elapsed time[ms]:260
I (78684) CodeTest: elapsed time[ms]:1070
I (84374) CodeTest: elapsed time[ms]:1690
I (95534) BMPTest: elapsed time[ms]:7160
I (102084) JPEGTest: elapsed time[ms]:2550
I (108934) PNGTest: elapsed time[ms]:2850
I (113154) QRTest: elapsed time[ms]:220

With Frame buffer(rgb565)

I (2735) FillTest: elapsed time[ms]:1150
I (6805) ColorBarTest: elapsed time[ms]:70
I (10865) ArrowTest: elapsed time[ms]:60
I (14915) LineTest: elapsed time[ms]:50
I (18975) CircleTest: elapsed time[ms]:60
I (23025) RoundRectTest: elapsed time[ms]:50
I (27095) DirectionTest: elapsed time[ms]:70
I (31165) HorizontalTest: elapsed time[ms]:70
I (35235) VerticalTest: elapsed time[ms]:70
I (39295) FillRectTest: elapsed time[ms]:60
I (43355) ColorTest: elapsed time[ms]:60
I (47455) CodeTest: elapsed time[ms]:100
I (51545) CodeTest: elapsed time[ms]:90
I (62605) BMPTest: elapsed time[ms]:7060
I (69155) JPEGTest: elapsed time[ms]:2550
I (75995) PNGTest: elapsed time[ms]:2840
I (80115) QRTest: elapsed time[ms]:120

@DaveDavenport

DaveDavenport commented Sep 28, 2023

That is to be expected (the big bunny video was another jpeg decoder on non-esp hardware where the drawing was the bottleneck).
For me the visible drawing of text was the main reason to update. It looked odd and I could not update all text fast enough.

@nopnop2002
Owner

nopnop2002 commented Sep 28, 2023

If your main purpose is to display text, it's well worth using FrameBuffer.

If your main purpose is to display images, there is no value in using FrameBuffer.

I'll publish it after some more testing.

Thank you.

@DaveDavenport

DaveDavenport commented Sep 28, 2023

> If your main purpose is to display images, there is no value in using FrameBuffer.

I think this depends on the situation. For me, the image updating in one go, instead of seeing it build up while the decoder runs, looks better.
There are some use-cases where perceived speed (compared to actual speed) can make a difference.
In a small internet radio this helped to give a better experience.
[Image]

I have some ideas to improve things more (keep track of the exposed region to redraw only the needed parts; in the above radio I now have a status bar on top that updates more often), but I am not sure if or when I will have time.
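
Tracking the exposed region can be as simple as keeping one bounding box of everything drawn since the last flush. A minimal sketch of that idea, with hypothetical names `rect_t` and `dirty_add`:

```c
#include <assert.h>
#include <stddef.h>

/* Axis-aligned region that needs to be resent; empty when w or h is 0. */
typedef struct { int x, y, w, h; } rect_t;

/* Grow *dirty so that it also covers r (bounding-box union). */
void dirty_add(rect_t *dirty, rect_t r)
{
    assert(dirty != NULL);
    if (r.w <= 0 || r.h <= 0) return;        /* nothing to add */
    if (dirty->w <= 0 || dirty->h <= 0) {    /* first region since flush */
        *dirty = r;
        return;
    }
    int x0 = dirty->x < r.x ? dirty->x : r.x;
    int y0 = dirty->y < r.y ? dirty->y : r.y;
    int dx1 = dirty->x + dirty->w, rx1 = r.x + r.w;
    int dy1 = dirty->y + dirty->h, ry1 = r.y + r.h;
    int x1 = dx1 > rx1 ? dx1 : rx1;
    int y1 = dy1 > ry1 ? dy1 : ry1;
    dirty->x = x0;
    dirty->y = y0;
    dirty->w = x1 - x0;
    dirty->h = y1 - y0;
}
```

After a flush the rectangle is reset to empty. A single bounding box is cheap but can over-send when two small updates are far apart; a list of rectangles would avoid that at the cost of more bookkeeping.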

Note: I am getting some oddness in the BMP test if I loop over it repeatedly. free() complains that a block has already been freed, so I might have mixed something up. It's all a bit of a rush job, done in the few minutes I have here and there.

@DaveDavenport

Thanks again for your library, it's been very useful.
