Read serial line in chunks instead of character by character #164

wentasah · 2023-04-01T21:27:52Z

This significantly reduces CPU load caused by this driver.

Without this change, CPU load on a powerful x86 machine with a USB-connected GPS is 36%. The output of the strace command looks like this:

...
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
read(33, "$", 1)                        = 1
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
read(33, "G", 1)                        = 1
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
read(33, "P", 1)                        = 1
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997180})
read(33, "R", 1)                        = 1
...

It can be seen that characters are read from the serial line one by one. This is due to how IOBase.readline() interacts with pySerial.

In this commit, we implement the same functionality as IOBase.readline(), but in a way allowing to read the serial line characters in bigger chunks. With it, CPU load decreases to 7.5% and strace reports this:

...
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=926286000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=906510472})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 422) = 143
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=906287000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=886945652})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 279) = 143
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=886721000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=866882570})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 136) = 136
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999996000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999991549})
read(33, ",,*57\r\n", 1024)             = 7
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999827000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=985949688})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 1017) = 143
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=985746000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=966235438})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 874) = 143
...

Reading from the kernel happens most of the time in chunks of 143 bytes (for our GPS) and therefore, the system call overhead is 143× lower.

Tested on Septentrio mosaic-H GPS.

src/libnmea_navsat_driver/nodes/nmea_serial_driver.py

This significantly reduces CPU load caused by this driver. Without this change, CPU load on a powerful x86 machine with a USB-connected GPS is 36%. The output of the strace command looks like this: ... pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160}) read(33, "$", 1) = 1 pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160}) read(33, "G", 1) = 1 pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160}) read(33, "P", 1) = 1 pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997180}) read(33, "R", 1) = 1 ... It can be seen that characters are read from the serial line one by one. This is due to how IOBase.readline() interacts with pySerial. In this commit, we implement the same functionality as IOBase.readline(), but in a way allowing to read the serial line characters in bigger chunks. With it, CPU load decreases to 7.5% and strace reports this: ... pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=926286000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=906510472}) read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 422) = 143 pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=906287000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=886945652}) read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 279) = 143 pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=886721000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=866882570}) read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 136) = 136 pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999996000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999991549}) read(33, ",,*57\r\n", 1024) = 7 pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999827000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=985949688}) read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 1017) = 143 pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=985746000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=966235438}) read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 874) = 143 ... Reading from the kernel happens most of the time in chunks of 143 bytes (for our GPS) and therefore, the system call overhead is 143× lower. Tested on Septentrio mosaic-H GPS.

evenator · 2023-04-19T01:56:15Z

I'm testing this locally and seeing some pretty problematic behavior due to the interaction of the timeout (2 seconds) and the buffer length (1024 char) for my device. The call to read() will always wait for 1024 characters or the full two seconds, which can hold several complete sentences in the buffer, introducing a lot of latency (i.e. up to 2 seconds).

Would GPS.read_until(size=1024) achieve the same performance improvement while avoiding the buffering latency?

This commit copies the implementation of seria.Serial.read() and removes one line from it: "if timeout.expired():". With this change, the function provides classical POSIX semantics of read(), which returns as soon as some bytes are received. The implementation in pySerial either waits for the full buffer or timeout expiration - nothing in between. This means that with pySerial, you either pay big overherd by reading character-by-character or have increased latency due to waiting. Unfortunately, there is no way to fix this in pySerial without changing the semantics and possibly breaking some applications or introducing a fixed function with a different name. Also, it seems that pySerial is no loger maintained (last commit 16 months ago, PRs without maintainer responses). It would be much simpler not to use pySerial and use /dev/ttyXXX directly. This project does not need the features offered by pySerial (mainly abort read from another thread). The only problem of not using pySerial would be lower portability.

wentasah · 2023-04-19T15:22:22Z

Unfortunately, GPS.read_until would have the same overhead. I tried to come up with a better way of using pySerial API and there is no way how to use it without either overhead or latency.

In the new commit, I copied the implementation of read() from pySerial and fixed the problem by removing the line. With it, the both overhead and latency is low. I copy my commit message here:

This commit copies the implementation of seria.Serial.read() and removes one line from it: "if timeout.expired():". With this change, the function provides classical POSIX semantics of read(), which returns as soon as some bytes are received. The implementation in pySerial either waits for the full buffer or timeout expiration - nothing in between. This means that with pySerial, you either pay big overherd by reading character-by-character or have increased latency due to waiting.

Unfortunately, there is no way to fix this in pySerial without changing the semantics and possibly breaking some applications or introducing a fixed function with a different name. Also, it seems that pySerial is no loger maintained (last commit 16 months ago, PRs without maintainer responses).

It would be much simpler not to use pySerial and use /dev/ttyXXX directly. This project does not need the features offered by pySerial (mainly abort read from another thread). The only problem of not using pySerial would be lower portability.

What do you think? Do you want to use pySerial to stay compatible with many OSes? If you care only about Linux, I'd suggest getting rid of pySerial. I'll provide the implementation.

ggaessler · 2023-04-19T16:22:42Z

Wouldn't a good and simple solution be to just set the timeout to zero and the read size to something like the proposed 1024? That should be fine for performance and latency. That's also the way I used it for quite some years by now without any problems:
https://github.com/boschresearch/nmea_navsat_driver/blob/master/src/libnmea_navsat_driver/nodes/nmea_serial_driver.py#L177-L186
EDIT: The line separation would still be needed. I don't need it because I anyway process mixed binary and ASCII data from the receiver since I also enabled proprietary messages.

wentasah · 2023-04-19T19:37:33Z

I think that this way, you would end up busy waiting for the serial line. The CPU load would always be 100%. At least this is what I saw when digging through the source code. Is your experience different?

evenator · 2023-04-19T22:15:56Z

What about a timeout that's fast in GPS terms but slow in CPU terms? Would 10ms be an acceptable compromise?

…

On Wed, Apr 19, 2023, 3:37 PM Michal Sojka ***@***.***> wrote: I think that this way, you would end up busy waiting for the serial line. The CPU load would always be 100%. At least this is what I saw when digging through the source code. Is your experience different? — Reply to this email directly, view it on GitHub <#164 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAEKUHEPB6EASGJ4SQRALVDXCA5IPANCNFSM6AAAAAAWP3V5WI> . You are receiving this because you were assigned.Message ID: ***@***.***>

ggaessler · 2023-04-20T11:07:33Z

Sorry my changes were a long time back and I didn't check the full code again. Yes, there indeed was some additional sleep for a time short in GNSS, but long in CPU terms, by default 10 ms, just as @evenator proposed.
I've never tried setting the timeout parameter that way, but that could be the cleaner solution compared to my timeout zero plus sleep.

evenator · 2023-12-26T16:28:38Z

@wentasah Are you still working on this enhancement, or should I close the PR as abandoned?

wentasah · 2023-12-29T18:34:31Z

I'll continue working on this in upcomming weeks.

wentasah force-pushed the fast-serial-read branch 3 times, most recently from 95edd26 to 239d727 Compare April 4, 2023 13:47

wentasah marked this pull request as ready for review April 4, 2023 13:49

evenator self-assigned this Apr 7, 2023

evenator added enhancement Fix in Review A fix has been created. Awaiting original reporter's review. labels Apr 7, 2023

evenator reviewed Apr 7, 2023

View reviewed changes

src/libnmea_navsat_driver/nodes/nmea_serial_driver.py Outdated Show resolved Hide resolved

wentasah force-pushed the fast-serial-read branch from 239d727 to c463711 Compare April 7, 2023 06:13

evenator assigned wentasah and unassigned evenator Apr 19, 2023

wentasah marked this pull request as draft July 9, 2023 18:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Read serial line in chunks instead of character by character #164

Read serial line in chunks instead of character by character #164

wentasah commented Apr 1, 2023 •

edited

Loading

evenator commented Apr 19, 2023

wentasah commented Apr 19, 2023

ggaessler commented Apr 19, 2023 •

edited

Loading

wentasah commented Apr 19, 2023

evenator commented Apr 19, 2023 via email

ggaessler commented Apr 20, 2023

evenator commented Dec 26, 2023

wentasah commented Dec 29, 2023 via email

Read serial line in chunks instead of character by character #164

Are you sure you want to change the base?

Read serial line in chunks instead of character by character #164

Conversation

wentasah commented Apr 1, 2023 • edited Loading

evenator commented Apr 19, 2023

wentasah commented Apr 19, 2023

ggaessler commented Apr 19, 2023 • edited Loading

wentasah commented Apr 19, 2023

evenator commented Apr 19, 2023 via email

ggaessler commented Apr 20, 2023

evenator commented Dec 26, 2023

wentasah commented Dec 29, 2023 via email

wentasah commented Apr 1, 2023 •

edited

Loading

ggaessler commented Apr 19, 2023 •

edited

Loading