Be more careful with array 'len' fields. #205

ernestum · 2018-07-10T14:58:47Z

It happens easily and it is dangerous, that the length field of dynamic length arrays is not set.

Especially in the higher level languages like C++ or Python where the array data types already take care of the array length, users do not expect that they need to set the len field that corresponds to an dynamic length array manually.

This can result in crashes either locally or even on remote zcm nodes because if len is not initialized, it has a random value causing access to arbitrary memory sections when parsing the message.

Therefore I propose to

at least initialize all fields specifying array lengths with 0 so we at least do not have any segfaults.
Change the emitters for the higher level languages in such a way, that the len fields are overwritten with the actual lengths of the arrays. I see no use case where the len field would be set to anything but the actual length of the array.

The text was updated successfully, but these errors were encountered:

jbendes · 2018-07-11T07:17:47Z

So this one I'm not sure I agree with. It should be expected that a zcmtype (which is essentially a struct with some helper functions for encoding and decoding) would be uninitialized on construction. Everything in c++ is constructed uninitialized. If you want zero initialization, simply zero initialize your zcmtype on construction.

zcmtype_t msg; // uninitialized
zcmtype_t msg2 = {}; // zero initialized c++11 syntax

I could be swayed on zero initialization if people are gung ho about it but I feel semi-strongly that zcm shouldn't be doing zero initialization in c++ for you.

As for the length field being pulled out of the vector, this one I actually disagree with only because I've used this behavior all the time - have a long vector that you only care to send the first 5 elements of? Simply set size to 5. This one I could be convinced on as well.

Interested to hear @olsoni thoughts?

olsoni · 2018-07-11T19:52:25Z

I could get behind 0 initialization in languages that promote it, though it does lead to different behavior from one language to another. Always operating under the assumption that nothing is set within the type upon declaration / construction is certainly safe for any language.

That being said, I'm open to aiming for more language specific initialization semantics. In C++ for example, you'd always expect a new vector's .size() call to return 0 unless you specifically asked for something else. This shouldn't be too much trouble to implement within zcm-gen, but it'll have to be language by language implemented. Having the zero initialization in place for some could make it easier to neglect for those languages that don't implement it.

Regarding the length fields being redundant in some higher level languages, I've never personally abused them as @jbendes suggests but I suppose I can see a case or 2 where that could be useful. I think the main reason to leave them as separate fields within the type would be so that semantically you can always refer to your size as the same thing rather (e.g. always my_type.n) rather than the reference of that data being very language dependent (e.g. my_type.n in C, my_type.data.size() in C++, or perhaps length(my_type.data) in Julia).

ernestum · 2018-07-12T08:59:16Z

I can understand that the (initialization) semantics should be as close as possible in every language to prevent any misunderstandings and therefore can see why we should not change the initialization code for every language.

Abusing the len field is something I have very mixed feelings about. Especially when you do that in C, where do you store the actual length of the array?

Nonetheless, even if we want to support the use case of using the len fields to artificially shorten the length of the sent messages, we should not allow len fields larger than the actual array because that is what causes segfaults on local and remote ZCM nodes.

jbendes · 2018-07-12T11:30:21Z

Zcm is intended to be a lightweight protocol. It doesn't do error checking for you in terms of sanitizing your messages. If you have 4 nested arrays (or nested zcmtypes which have nested arrays), how deep would you want zcm to go when running checks? When would it run these checks? To me, this level of error checking is outside the scope of what I would want zcm to be responsible for. Checking for sane inputs could be an entire library all to itself.

As for causing segfaults on remote machines, that is completely unacceptable. Can you put together an example of when that would happen? I didn't think that was possible with the current architecture.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Be more careful with array 'len' fields. #205

Be more careful with array 'len' fields. #205

ernestum commented Jul 10, 2018

jbendes commented Jul 11, 2018

olsoni commented Jul 11, 2018 •

edited

Loading

ernestum commented Jul 12, 2018

jbendes commented Jul 12, 2018

Be more careful with array 'len' fields. #205

Be more careful with array 'len' fields. #205

Comments

ernestum commented Jul 10, 2018

jbendes commented Jul 11, 2018

olsoni commented Jul 11, 2018 • edited Loading

ernestum commented Jul 12, 2018

jbendes commented Jul 12, 2018

olsoni commented Jul 11, 2018 •

edited

Loading