Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always type erase drivers #2572

Closed
MabezDev opened this issue Nov 20, 2024 · 27 comments · Fixed by #2907
Closed

Always type erase drivers #2572

MabezDev opened this issue Nov 20, 2024 · 27 comments · Fixed by #2907
Assignees
Milestone

Comments

@MabezDev
Copy link
Member

For our drivers we have the option of opting into the new_typed method to create a driver without type erasure, i.e Spi<SPI2, Blocking>.

I'm wondering what use cases this opens up (if any) and whether we should always type erase.

I believe having the concrete types helps at least with DMA channels on PDMA devices, as we can (at compile time) check a user isn't passing an invalid DMA channel for the peripheral. We still have a runtime check for the case where they pass AnyDmaChannel, but there is merit to a compile time check.

One downside is that our compiler allocated "inference slot" (the last generic param can be inferred by rustc automatically, but any others can't) it taken by the instance type. This means if we ever want the compiler to infer a generic parameter we can't.

@MabezDev MabezDev added RFC Request for comment 1.0-blocker labels Nov 20, 2024
@bugadani
Copy link
Contributor

bugadani commented Nov 20, 2024

I believe having the concrete types helps at least with DMA channels on PDMA devices, as we can (at compile time) check a user isn't passing an invalid DMA channel for the peripheral. We still have a runtime check for the case where they pass AnyDmaChannel, but there is merit to a compile time check.

I'd like to point out that there is only a single place where this makes any difference:

  • On ESP32-S2, checking for DMA_SPI2 vs DMA_SPI3.
  • In all other cases, DMA channels are either
    • cross-compatible (ESP32 and SPI DMA, UDMA)
    • or there is only a single peripheral (ESP32-S2 I2S)
    • or a single shared DMA controller (ESP32-S2 UDMA, CryptoDMA).

@bugadani
Copy link
Contributor

One place where optional type retention makes sense is GPIOs, where the extra instance dispatch is probably comparable to the IO speed. Are we okay with treating GPIOs as an exception here?

@bugadani
Copy link
Contributor

bugadani commented Nov 20, 2024

Thirdly, we can opt back into type checking DMA channels if we approach constructors differently. Instead of Spi::new().with_dma() we can provide SpiDma::new() that checks, then erases. There are advantages to this approach, namely that we don't forget out the pin setters from the API, which are currently not available on SpiDma.

This would make it harder to implement SpiDma -> Spi conversions if we ever get to supporting releasing just the DMA channel. I'm not sure whether this is a valuable enough capability, though.

@MabezDev
Copy link
Member Author

One place where optional type retention makes sense is GPIOs, where the extra instance dispatch is probably comparable to the IO speed. Are we okay with treating GPIOs as an exception here?

I would be okay with that. It would be nice to maybe do a small benchmark on this to see the difference.

@MabezDev
Copy link
Member Author

Following on from the discussion here: #2573 (comment):

I was looking at our drivers, and future chips, namely the P4. Fortunately I don't think we'll have any issues with our current driver set, with the possible exception of USB, the P4 has one full speed usb and one high speed usb, and the driver is currently not instanced. It's not clear to me how much the two will share, so maybe it's not an issue in this case.

I think this leads us towards type erase by default though, because it only takes a new chip to ruin this. We could work around this later down the line with cfgs etc but I'd rather avoid the issue entirely.

@bugadani
Copy link
Contributor

bugadani commented Nov 20, 2024

I would be okay with that. It would be nice to maybe do a small benchmark on this to see the difference.

My LA isn't the best, but on an ESP32:

Typed

#![no_std]
#![no_main]

use esp_backtrace as _;
use esp_hal::{
    delay::Delay,
    gpio::{GpioPin, Input, Level, Output, Pin, Pull},
    prelude::*,
};

#[entry]
fn main() -> ! {
    let peripherals = esp_hal::init(esp_hal::Config::default());

    // Set LED GPIOs as an output:
    let led = Output::new_typed(peripherals.GPIO2, Level::Low);

    toggle_pins(led)
}

fn toggle_pins(mut pin: Output<GpioPin<2>>) -> ! {
    loop {
        pin.toggle();
    }
}

image

Erased

#![no_std]
#![no_main]

use esp_backtrace as _;
use esp_hal::{
    delay::Delay,
    gpio::{Input, Level, Output, Pin, Pull},
    prelude::*,
};

#[entry]
fn main() -> ! {
    let peripherals = esp_hal::init(esp_hal::Config::default());

    // Set LED GPIOs as an output:
    let led = Output::new(peripherals.GPIO2, Level::Low);

    toggle_pins(led)
}

fn toggle_pins(mut pin: Output) -> ! {
    loop {
        pin.toggle();
    }
}

image

There seems to be a ~3x difference in this one experiment.

@MabezDev
Copy link
Member Author

So I'm happy with GPIO being the exception. I'm now just wondering what happens if we want to make this exception again in the future, or maybe there are some other drivers that exist now that might benefit from it (RMT maybe? latency is quite important there).

@bugadani
Copy link
Contributor

GPIO is an exception in another matter: we currently must use enums to represent the AnyPin, because not everything can be neatly indexed. With other peripherals this isn't the case - the Any peripheral can be a pointer to the Instnance (or the register block if critical) (or two pointers with State), which may be easier propagated by the compiler.

@Dominaezzz
Copy link
Collaborator

Just wanted to sprinkle my 2 cents.
Before going the route of always erasing the instance type, you should explore adding instance specific features to see how well it plays out long term.

For example SPI2 can do things that SPI3 can't, how do you want to rectify this discrepancy without type safety.

I2S1 can do things that I2S0 can't on the base ESP32, albeit not very well documented.

@MabezDev
Copy link
Member Author

Before going the route of always erasing the instance type, you should explore adding instance specific features to see how well it plays out long term.
For example SPI2 can do things that SPI3 can't, how do you want to rectify this discrepancy without type safety.
I2S1 can do things that I2S0 can't on the base ESP32, albeit not very well documented.

Whilst these are good points, I think they are a bit orthogonal to this issue, as we'll have to solve this regardless because erasure is already implemented. The short answer is a runtime check, but depending on the functionality disparity we may be able to keep some type safety checks. For instance if a driver doesn't support some large'ish mode, like slave mode we can probably type check that on the constructor. If some config/submode on a specific driver isn't supported it will probably be a runtime error of some kind. I think these need to be assessed on a case-by-case basis, there won't be a silver bullet.

@bugadani
Copy link
Contributor

Wildly different hardware features may also be best implemented as different drivers, which aligns well with @MabezDev's typecheck-in-constructor point.

@MabezDev
Copy link
Member Author

GPIO is an exception in another matter: we currently must use enums to represent the AnyPin, because not everything can be neatly indexed

Is this really the case by the way? I understand that we have different banks of GPIOs, and that might make things a bit harder, but I think it should still be doable? With adequate perf from the type erased GPIO driver (doesn't have to be stellar, just not 3 times slower :D) it might be able to make the decision for us.

@bugadani
Copy link
Contributor

I guess we are already switching on the value for stuff so we could just as well switch on the pin number, but at least with an enum we don't have unreachable branches...

@tom-borcin tom-borcin added the investigation Not needed for 1.0, but don't know if we can support without breaking the driver. label Dec 18, 2024
@bjoernQ
Copy link
Contributor

bjoernQ commented Dec 19, 2024

Just out of curiosity I looked at the generated assembly from the GPIO test (for C6)
(on main, cde6169, rustc 1.84.0-nightly (86d69c705 2024-10-22) )

Erased

420001b4:	c38c                	sw	a1,0(a5)
            })
        }

        fn is_set_high(&self, _: private::Internal) -> bool {
            handle_gpio_output!(&self.0, target, {
                OutputPin::is_set_high(target, private::Internal)
420001b6:	00482783          	lw	a5,4(a6)
420001ba:	00d7f533          	and	a0,a5,a3
420001be:	87ba                	mv	a5,a4
                OutputPin::set_output_high(target, high, private::Internal)
420001c0:	d975                	beqz	a0,420001b4 <_ZN3foo11toggle_pins17haabe4394cfcb062aE+0x34>
420001c2:	87b2                	mv	a5,a2
420001c4:	bfc5                	j	420001b4 <_ZN3foo11toggle_pins17haabe4394cfcb062aE+0x34>

Non-erased

420001d0:	c314                	sw	a3,0(a4)
        intrinsics::volatile_load(src)
420001d2:	4158                	lw	a4,4(a0)
        self.gpio_bank(private::Internal).read_output() & self.mask() != 0
420001d4:	00477793          	and	a5,a4,4
420001d8:	8732                	mv	a4,a2
420001da:	dbfd                	beqz	a5,420001d0 <main+0x52>
420001dc:	872e                	mv	a4,a1
420001de:	bfcd                	j	420001d0 <main+0x52>

Which is surprising! And also the LA shows pretty much the same for both tests

@bjoernQ
Copy link
Contributor

bjoernQ commented Dec 19, 2024

On ESP32

Not-erased 1.67MHz

Asssembly looks almost like C6

400d0665:	0020c0        	memw
400d0668:	0c99      	s32i.n	a9, a12, 0
400d066a:	0020c0        	memw
400d066d:	08c8      	l32i.n	a12, a8, 0
        self.gpio_bank(private::Internal).read_output() & self.mask() != 0
400d066f:	10dc90        	and	a13, a12, a9
400d0672:	0bcd      	mov.n	a12, a11
        if set {
400d0674:	ed1d77        	beq	a13, a7, 400d0665 <main+0x35>
400d0677:	0acd      	mov.n	a12, a10
400d0679:	fffa06        	j	400d0665 <main+0x35>

Erased 0.63MHz

The assembly looks very confusing with lots of callx8 - the enum-dispatching set_output_high pretty much looks unoptimized.

@MabezDev
Copy link
Member Author

To confirm that this might be a Xtensa backend issue, would you be able to do the test on the least weird Xtensa chip (esp32s3)? The esp32 has some panicking code paths that might affect optimisation in general, so I just want to rule that out.

@bjoernQ
Copy link
Contributor

bjoernQ commented Dec 19, 2024

Sure ....

Erased: 0.69MHz
Not Erased: 1.54MHz

(And pretty much same assembly code)

Same thing 😮

@Dominaezzz
Copy link
Collaborator

Not erased: 0.69kHz Erased: 1.54MHz

Erased is faster???

@bjoernQ
Copy link
Contributor

bjoernQ commented Dec 19, 2024

Ooops - fixed

@bugadani
Copy link
Contributor

It should be interesting to see what happens to the C2. My current working hypothesis is that the "hole" in the GPIO range prevents optimizing the enum dispatch away.

@bjoernQ
Copy link
Contributor

bjoernQ commented Dec 20, 2024

It makes some difference in terms of optimizations in other places but the actual code which toggles the GPIO looks the same

C2

Erased 1.90 MHz

4200017c:	c314                	sw	a3,0(a4)
4200017e:	4158                	lw	a4,4(a0)
42000180:	00477793          	and	a5,a4,4
42000184:	8732                	mv	a4,a2
42000186:	dbfd                	beqz	a5,4200017c <main+0x4a>
42000188:	872e                	mv	a4,a1
4200018a:	bfcd                	j	4200017c <main+0x4a>

Not-Erased 1.90 MHz

42000168:	c38c                	sw	a1,0(a5)
4200016a:	00482783          	lw	a5,4(a6)
4200016e:	00d7f533          	and	a0,a5,a3
42000172:	87ba                	mv	a5,a4
42000174:	d975                	beqz	a0,42000168 <_ZN10non_erased11toggle_pins17h2ebd1791ed089c58E+0x34>
42000176:	87b2                	mv	a5,a2
42000178:	bfc5                	j	42000168 <_ZN10non_erased11toggle_pins17h2ebd1791ed089c58E+0x34>

@bjoernQ
Copy link
Contributor

bjoernQ commented Dec 20, 2024

Ok mystery solved (Daniel was right about inlining being the key)

ESP32 and S3 have a lot more pins so the enum-dispatching functions generate more code which makes the code-gen not inline the function and the general dispatching function obviously cannot get optimized

In the end it doesn't change the conclusion here - GPIOs benefit from being typed. In other places we don't have many variants to dispatch to, so they should get optimized fine

@bugadani
Copy link
Contributor

In the end it doesn't change the conclusion here - GPIOs benefit from being typed.

If this is just inlining, but otherwise setting/reading pins is equally fast in the two versions, I will have to disagree. We should try and refactor the driver to optimize better. It is possible we can get away with AnyPin(u8) with some internal refactors.

The time it takes to configure a pin is less important, but if actually working with the pins is the same, the performance argument to maintain the optional typed API is not a real one.

@bjoernQ
Copy link
Contributor

bjoernQ commented Dec 20, 2024

True - just for fun I sprinkled the code with some #[inline(always)] and I get it (for this limited test at least) to the same level of performance. (Actually, now erased.rs (1.56MHz) is faster than typed.rs (1.54MHz) on ESP32-S3 - but let's blame that on the test setup)

@bugadani
Copy link
Contributor

The pathological device is ESP32, where some pins are not output pins - we have a big generated match in each related function that essentially asserts if an output-related function can be called.

Removing this match, and just yolo-calling raises the achievable frequency from ~660kHz to 1MHz for me. Better (but incorrect), but still far from the 1.8 in my previous test.

@MabezDev
Copy link
Member Author

MabezDev commented Jan 6, 2025

With the refactor of GPIO in #2854, I think we're closer to erase by default, and I'm inclined to make that decision. I'm looking for any final comments as to why we shouldn't always type erase the drivers.

@MabezDev
Copy link
Member Author

MabezDev commented Jan 8, 2025

Let's erase by default. To complete this work we need to:

  • Remove the new_typed methods
  • Update the API-guideline docs to remove mentions of:
    • Instance generic params
    • new_typed

@MabezDev MabezDev removed RFC Request for comment investigation Not needed for 1.0, but don't know if we can support without breaking the driver. labels Jan 8, 2025
@MabezDev MabezDev added this to the 1.0.0-beta.0 milestone Jan 8, 2025
@bugadani bugadani self-assigned this Jan 8, 2025
@MabezDev MabezDev changed the title [RFC] The value of new_typed vs always type erasing drivers Always type erase drivers Jan 8, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in esp-rs Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants