Multiple run-time errors issued when MILC is compiled with address sanitizer enabled #37

Open
maddyscientist opened this issue Jul 7, 2020 · 1 comment


When MILC is compiled with the address sanitizer enabled, multiple run-time errors are reported while running the NERSC small RHMD benchmark on 3 processes.

The first issue is in ranstuff.c and looks like a plain signed overflow: the seed is multiplied by 1749223, and the product exceeds what a 32-bit int can represent.

LAYOUT = Hypercubes, options = hyper_prime,
QMP with automatic hyper_prime layout
ON EACH NODE (RANK) 18 x 18 x 18 x 12
../generic/ranstuff.c:75:27: runtime error: signed integer overflow: 4563421 * 1749223 cannot be represented in type 'int'
../generic/ranstuff.c:77:27: runtime error: signed integer overflow: -1903219036 * 1749223 cannot be represented in type 'int'
../generic/ranstuff.c:79:27: runtime error: signed integer overflow: -806615499 * 1749223 cannot be represented in type 'int'
../generic/ranstuff.c:81:27: runtime error: signed integer overflow: -2086651380 * 1749223 cannot be represented in type 'int'
../generic/ranstuff.c:83:27: runtime error: signed integer overflow: -759901939 * 1749223 cannot be represented in type 'int'
../generic/ranstuff.c:85:27: runtime error: signed integer overflow: -1405893900 * 1749223 cannot be represented in type 'int'
../generic/ranstuff.c:87:27: runtime error: signed integer overflow: -981149083 * 1749223 cannot be represented in type 'int'
../generic/ranstuff.c:89:27: runtime error: signed integer overflow: -1085755044 * 1749223 cannot be represented in type 'int'
Mallocing 109.7 MBytes per node for lattice
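
One possible way to silence this class of error is to do the seed update in unsigned 32-bit arithmetic, where wraparound is well defined. The sketch below is not taken from ranstuff.c: `lcg_step` is a hypothetical helper, and the addend 12345 is an assumption, chosen because it reproduces the second value in the log above from the first.

```c
#include <stdint.h>

/* Hypothetical helper (not from ranstuff.c): one multiply-and-add seed
 * update performed in unsigned 32-bit arithmetic. Unsigned overflow wraps
 * modulo 2^32 by definition, so UBSAN has nothing to report here. */
static uint32_t lcg_step(uint32_t seed, uint32_t multiplier, uint32_t addend)
{
    return multiplier * seed + addend;
}

/* Assumed usage with the operands from the first log line:
 *   lcg_step(4563421u, 1749223u, 12345u) yields 2391748260u,
 * which has the same two's-complement bit pattern as the signed
 * value -1903219036 shown on the second log line. */
```

If the generator only ever consumes masked bits of the seed, switching to an unsigned type like this should keep the same random stream on two's-complement machines, but that would need checking against the actual code.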

The second issue is in io_lat4.c and appears to be a related 32-bit problem: a 32-bit unsigned int is shifted by 32 bits, which is undefined behaviour in C.

mass 0.5
naik_term_epsilon 0
error_for_propagator 1e-08
rel_error_for_propagator 0
reload_parallel 18x18x18x36.chklat
forget 
../generic/io_lat4.c:1495:43: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
../generic/io_lat4.c:1496:43: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
../generic/io_lat4.c:1496:43: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
../generic/io_lat4.c:1495:43: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
../generic/io_lat4.c:1496:43: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
../generic/io_lat4.c:1495:43: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
Restored binary gauge configuration in parallel from file 18x18x18x36.chklat
Time stamp Wed Nov  4 17:32:43 2015
Checksums 63b670e1 16bbc0f1 OK
Time to reload gauge configuration = 6.549597e-02
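
A common source of this diagnostic is a bit rotation written as `(x << r) | (x >> (32 - r))`, which shifts by 32 when `r` is 0. Whether that is what lines 1495 and 1496 of io_lat4.c actually do is an assumption, since the source is not shown here. Under that assumption, a minimal sketch of a rotation that stays defined for every rotation count (`rotl32` is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical helper (not from io_lat4.c): rotate a 32-bit value left
 * by r bits. Both shift counts are masked into the range 0..31, so the
 * r == 0 case no longer produces a shift by 32. */
static uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31u;                                   /* reduce count modulo 32 */
    return (x << r) | (x >> ((32u - r) & 31u)); /* r == 0 gives x | x == x */
}
```

For `r` between 1 and 31 this gives the same result as the unmasked expression, so a checksum built on it should be unchanged wherever the original was well defined.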

maddyscientist commented Jul 7, 2020

I should add that enabling compilation with address sanitizer (ASAN) and undefined behaviour sanitizer (UBSAN) takes only a trivial change to the Makefile (both are supported by clang and modern gcc):

CDEBUG += -fsanitize=address,undefined
LDFLAGS += -fsanitize=address,undefined
