Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce chance of string buffer overflow exploits #17

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

broukema
Copy link

@broukema broukema commented Oct 2, 2020

Gcc-10.2.0 gives many compile warnings of the risk of string
buffer overflows. The uncontrolled sprintf commands risk segmentation violations if long
path lengths are used, and risk security problems more generally, which is why
the warnings exist.

This commit aims to reduce the chance of buffer overflows by
using snprintf instead of sprintf and extending from just
the C preprocesor macro MAX_STRING_LEN to extra macros MAX_BUF_SIZE
and MAX_OUTFILE_SIZE, defined e.g.
as (3*MAX_STRING_LEN+40). A few unused variables are also
commented out in response to the recommendations from gcc.

This commit compiles cleanly for me with gcc-10.2.0, with
make OPTIONS=-fcommon to work around Issue #16 .

Gcc-10.2.0 gives many compile warnings of the risk of string
buffer overflows. These risk segmentation violations with long
path lengths and security problems more generally, which is why
the warnings exist.

This commit aims to reduce the chance of buffer overflows by
using `snprintf` instead of `sprintf` and extending from just
the C preprocesor macro MAX_STRING_LEN to MAX_BUF_SIZE defined
as (3*MAX_STRING_LEN+40). A few unused variables are also
commented out in response to the recommendations from gcc.

This commit compiles cleanly for me with gcc-10.2.0, with
`make OPTIONS=-fcommon`.
@darrencroton
Copy link
Owner

Thanks broukema. Let me have a look.

@broukema
Copy link
Author

broukema commented Oct 5, 2020

Darren, my name is actually "Boud" (lower case is fine in this context, of course - boud) :), not broukema. The broukema alias is only because someone else than me grabbed 'boud' on github before I did... :P

@darrencroton

code/core_save.c Outdated
@@ -57,7 +61,8 @@ void save_galaxies(int filenr, int tree)
// only open the file if it is not already open.
if( !save_fd[n] )
{
snprintf(buf, MAX_STRING_LEN - 1, "%s/%s_z%1.3f_%d", OutputDir, FileNameGalaxies, ZZ[ListOutputSnaps[n]], filenr);
snprintf(buf, MAX_BUF_SIZE - 1, "%s/%s_z%1.3f_%d", OutputDir, FileNameGalaxies, ZZ[ListOutputSnaps[n]], filenr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since buf is allocated as MAX_STRING_LEN, the second parameter should be MAX_STRING_LEN - 1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi Manodeep. You're right that buf is declared to be smaller. But I would rather fix that by changing line 22

char buf[MAX_STRING_LEN];

to

char buf[MAX_BUF_SIZE];

because there are two potentially long strings - OutputDir and FileNameGalaxies - plus a bit of extra stuff that are written to buf.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - good call

Comment on lines +24 to +25
static const struct GALAXY_OUTPUT galaxy_output_null = {0};
struct GALAXY_OUTPUT galaxy_output = galaxy_output_null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious - what's the advantage of splitting up this way?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, you're welcome to drop that, depending on how interested you are in future integration with our elaphrocentre galaxy formation pipeline, which we just posted to ArXiv:2010.03742 a few days ago. If you look at line 1042 of Marius' @mpeper patch

https://codeberg.org/boud/elaphrocentre/src/branch/N128cubed/reproduce/software/patches/marius_sage_hack.patch#L1042

which is applied before the other two patches at lines 1610--1612 of the current commit of

https://codeberg.org/boud/elaphrocentre/src/branch/N128cubed/reproduce/software/make/high-level.mk#L1610

then you'll see why we did that. For a given "final" halo at the current epoch, we add up the mass infall histories of all the predecessor haloes. Since this is a loop through "final" haloes, it's safer to zero the galaxy_output structure between iterations.

You're welcome to come over to codeberg and propose improvements to our sage patches, especially for more modularity. Or you can try to convince @mpeper to modularise and tidy his patch enough to propose a pull request here ;).

@@ -20,13 +20,15 @@ FILE *load_fd;

// External Functions //

#define MAX_BUF_SIZE (3*MAX_STRING_LEN+40)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Should we check whether MAX_BUF_SIZE is defined? Something along the lines of

#ifndef MAX_BUF_SIZE
#define MAX_BUF_SIZE ...
#endif

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but in that case there should be a consistent "universal" value is - the safe value would obviously be the biggest. My version of the pull request currently gives

$ grep --color -rn "define.*MAX_BUF_SIZE " code/*
code/core_io_tree.c:19:#define MAX_BUF_SIZE (3*MAX_STRING_LEN+20)
code/core_save.c:17:#define MAX_BUF_SIZE (2*MAX_STRING_LEN+40)
code/io/tree_binary.c:23:#define MAX_BUF_SIZE (3*MAX_STRING_LEN+40)

which is not globally consistent.

So

#ifndef MAX_BUF_SIZE
#define MAX_BUF_SIZE (3*MAX_STRING_LEN+40)
#endif

would seem the safest in all three places.

void load_tree_table_binary(int32_t filenr)
{
int i, totNHalos;
char buf[MAX_STRING_LEN];
char buf[MAX_BUF_SIZE];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps allocate to MAX_BUF_SIZE + 1 (to avoid passing MAX_BUF_SIZE - 1 to snprintf). This would be true here and elsewhere in this PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no particular preference either way, though I would tend to agree. I guess we could ask "which convention is a typical cosmologist most likely to expect?"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see that the simplest option is for me to propose an updated pull request. And that the - 1 only occurs in one place. I'm preparing an update...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the best way is to pass MAX_STRING_LEN - 1 as the second parameter to snprintf. That way calculations for MAX_BUF_SIZE will still be correct.

code/main.c Outdated
@@ -16,7 +16,8 @@

#include "io/io_save_hdf5.h"

char bufz0[1000];
#define MAX_BUFZ0_SIZE (3*MAX_STRING_LEN+20)
char bufz0[MAX_BUFZ0_SIZE]; /* 3 strings + max 19 bytes for a number */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you have the extra byte already accounted for - but might be good to change MAX_BUFZ0_SIZE to the 3*MAX_STRING_LEN + 19, and then add the + 1 in the allocation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a few (2 or 3) unused bytes is low risk. Nobody's going to hide exploitable code there. The risk is people overwriting the buffer and jumping into RAM where they're not supposed to go. Caveat: this assumes that I understand the security issue here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not worried about the wasted bytes - simply thinking about being explicit about the calculation. The extra + 1 would show that that was the space for terminating NULL byte.

@@ -58,6 +58,7 @@ double infall_recipe(int centralgal, int ngal, double Zcurr)
infallingMass =
reionization_modifier * BaryonFrac * Gal[centralgal].Mvir - (tot_stellarMass + tot_coldMass + tot_hotMass + tot_ejected + tot_BHMass + tot_ICS);
// reionization_modifier * BaryonFrac * Gal[centralgal].deltaMvir - newSatBaryons;
UNUSED(newSatBaryons); /* for the moment this is not used */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

@@ -7,7 +7,7 @@
#include "core_allvars.h"
#include "core_proto.h"


#define UNUSED(foo) (void)(foo)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should move this to core_allvars.h. Or at least undefine UNUSED at the bottom of the file

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it were shifted to core_allvars.h, then yes, I agree that undefining UNUSED at the end would make sense. On the other hand, core/model_infall.c is not going to be included in any other file, so I'm not convinced of the point of undefining it.


sprintf(fname, "%s", FileWithSnapList);
snprintf(fname, MAX_STRING_LEN, "%s", FileWithSnapList);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this comment further down - but would be good to either allocate with a MAX_STRING_LEN + 1 or pass MAX_STRING_LEN - 1 to snprintf

@@ -10,7 +10,8 @@
void read_parameter_file(char *fname)
{
FILE *fd;
char buf[MAX_STRING_LEN], buf1[MAX_STRING_LEN];
#define MAX_BUF_SIZE_FILE_LIST (3*MAX_STRING_LEN)
char buf[MAX_BUF_SIZE_FILE_LIST], buf1[MAX_STRING_LEN];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are fixing this, may be we should put buf1 in the next line

@manodeep
Copy link
Collaborator

While I said "request changes", really the only necessary change is the first one

@manodeep
Copy link
Collaborator

And many thanks for this PR and improving the code quality!

This commit implements several small fixes more or
less as proposed by Manodeep Singh at
darrencroton#17 .

These compile for me without error either with gcc-10.2.0
using `make OPTIONS=-fcommon` or with gcc-6.3.0-18+deb9u1.

I don't have a convenient regression test ready to test it, so
feel free to do a regression test before merging. The previous
version worked as implemented in the full `elaphrocentre`
galaxy formation pipeline - https://codeberg.org/boud/elaphrocentre .

Possible TODO: There's a section of `code/io/tree_hdf5.c` with
`char fname[1000]` and several `sprintf` statements. These should
are not warned about by gcc-10.2.0 and there's probably no point
being pedantic and switching them to `snprintf` statements.
This commit implements several small fixes more or
less as proposed by Manodeep Sinha at
darrencroton#17 .

These compile for me without error either with gcc-10.2.0
using `make OPTIONS=-fcommon` or with gcc-6.3.0-18+deb9u1.

I don't have a convenient regression test ready to test it, so
feel free to do a regression test before merging. The previous
version worked as implemented in the full `elaphrocentre`
galaxy formation pipeline - https://codeberg.org/boud/elaphrocentre .

Possible TODO: There's a section of `code/io/tree_hdf5.c` with
`char fname[1000]` and several `sprintf` statements. These should
are not warned about by gcc-10.2.0 and there's probably no point
being pedantic and switching them to `snprintf` statements.
@broukema
Copy link
Author

broukema commented Oct 12, 2020

@manodeep So I see that "conversation" is another non-git service that github provides on top of git and risks creating vendor lock-in, because it's not easy to shift to other git repository hosters, and is nevertheless useful for people who wish to trace what happened, when and why in order to debug or tidy up code.

Anyway, I've updated this pull request. Thanks for the review! :)

@@ -120,7 +121,7 @@ int main(int argc, char **argv)
else
fclose(fd);

sprintf(bufz0, "%s/%s_z%1.3f_%d", OutputDir, FileNameGalaxies, ZZ[ListOutputSnaps[0]], filenr);
snprintf(bufz0, MAX_BUFZ0_SIZE, "%s/%s_z%1.3f_%d", OutputDir, FileNameGalaxies, ZZ[ListOutputSnaps[0]], filenr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second parameter should be MAX_BUFZ0_SIZE - 1

@@ -111,7 +112,7 @@ int main(int argc, char **argv)
for(filenr = FirstFile; filenr <= LastFile; filenr++)
#endif
{
sprintf(bufz0, "%s/%s.%d%s", SimulationDir, TreeName, filenr, TreeExtension);
snprintf(bufz0, MAX_BUFZ0_SIZE, "%s/%s.%d%s", SimulationDir, TreeName, filenr, TreeExtension);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second parameter should be MAX_BUFZ0_SIZE - 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants