Critical - crash in multithreaded environment, when using nedrealloc (yes, again) #15
Comments
previous versions from 2013 crashing too |
The last time I put together some form of release was for v1.10 beta 3 in 2012. I agree that nedmalloc definitely needs a regularly executed stress test suite, and in fact I have recently purchased a server for a Jenkins CI which you can see at https://ci.nedprod.com/. As it happens, I was fired from BlackBerry on Monday, so I suddenly have some free time. I'll look into figuring out some form of automated solution to the many breakages which have slipped into nedmalloc over the years by accident. Thanks for reporting the bug Geri. You're a trooper. Niall |
thank you for creating and supporting this wonderful software. i suggest creating a stress test based on my multithreaded testcases, like the current one and those i posted before. they are simple enough, and it's easy to debug them. |
hi. were you able to reproduce the crash? |
It'll be a few days yet. I'm currently mentoring gsoc and I need to finish two work items to enable the student to proceed as he is waiting on me. |
i did some test: |
It is on my radar. Integrating the new items from http://boostafio.uservoice.com/forums/218980-boost-afio-feature-request before GSoC ends in ten days has proved harder than expected. |
i replaced nedrealloc with calls to nedfree and nedmalloc in my code until the fix is done, no need to hurry |
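For readers following along, the workaround described above can be sketched roughly as below. This is a minimal illustration, not the reporter's actual code: `realloc_via_malloc` is a hypothetical helper name, the standard `malloc`/`free` stand in for `nedmalloc`/`nedfree`, and the caller is assumed to know the old block size.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the allocator under test (assumption: swap in
   nedmalloc/nedfree to mirror the setup discussed in the thread). */
#define malloc_vpool malloc
#define free_vpool   free

/* Hypothetical helper: resize a block without ever calling realloc.
   The caller passes the old size, since this sketch does not query
   the allocator for it. */
static void *realloc_via_malloc(void *old, size_t old_size, size_t new_size)
{
    void *fresh;

    if (new_size == 0) {          /* treat size 0 as a plain free */
        free_vpool(old);
        return NULL;
    }
    fresh = malloc_vpool(new_size);
    if (fresh == NULL)
        return NULL;              /* old block stays valid on failure */
    if (old != NULL) {
        /* copy only what fits in both the old and the new block */
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free_vpool(old);
    }
    return fresh;
}
```

The trade-off is an extra copy on every resize, which is acceptable as a stopgap while the realloc path is being investigated.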
I should be able to look into this now. Can you put the test case above, which is too mangled to make much sense, into a gist so I can get it demangled? Thanks. |
https://gist.github.com/Gerilgfx/6953861 i hope it was this one. |
Bad news: I can't replicate this on my Ubuntu 12.04 x64 machine with an i7-3770K CPU. I tried: GCC v4.6.4. It could be a timing issue where your CPU finds a race mine doesn't. Or it could be a bug in GCC 4.7.1 which has since been fixed. There is some order-sensitive code in the threadcache; a slight reordering from what is specified in the code would introduce exactly this kind of race. In theory a compiler shouldn't do such a reorder, but maybe there was a bug in GCC v4.7.1. Niall |
Also, try setting THREADCACHEMAX to 0. That will help me determine if it's dlmalloc or the thread cache which is at fault. |
with: test 3 begins... |
changing: test[i]=realloc_vpool(test[i], iteracio); to: if(test[i]) free_vpool(test[i]); works. i think the bug is precisely in your realloc implementation. |
this works too. |
changed to:
result: test 3 begins... |
That's very useful - it'll either be dlmalloc or my changes to dlmalloc. I'll try rewalking the code path. |
pastelink.me/dl/15d838#sthash.Yt123CLX.dpuf here is a binary compiled on my computer from this source. |
I'm off to the GSoC mentors summit in California tomorrow, then onto Seattle returning in about a week. Thanks for the binaries, and I'll look into them when I get back. |
Crap, sorry pastelink.me/dl/15d838#sthash.Yt123CLX.dpuf deletes its files after 7 days. I'll be at home for the next week though, definitely can run it and see what happens. |
okay, leave me a message and i will reupload |
Message to here, or do you want me to PM you or something? |
just leave a message here. it notifies me by email. |
btw if you use skype or IM like that, i can pick you up there too. |
any step forward? |
If you remember (see thread above) I was waiting on some precompiled binaries from you as I was unable to replicate the problem here. I needed to rule out compiler/platform differences. Note that currently everything I own is in a container being shipped from Canada to Ireland, and so any ability to run anything will be delayed until the container arrives in February. In particular, right now my access to Linux is very restricted, but I may be able to borrow time on someone's server. |
i sent it to your mail account back then. it seems you haven't received it. i will recompile them and upload them somewhere. |
http://www.sendspace.com/file/wae5fj (click on "Click here to start download from sendspace") |
I have the file, I'll see if I can arrange access to a Linux box. Thanks Geri. |
okay, i am curious to see if it crashes or not. |
any success? |
Well I only got a hard line to the internet this past week, and therefore a stable SSH connection. But I admit I forgot, thanks for the reminder. On the machine I have access to, a dual core Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz running Ubuntu 12.04 LTS x64, I ran nedmalloctester three times and saw nothing but success. Given that you compiled it using your compiler, it's probably a CPU timing difference. They're very tricky to track down, especially as the usual race condition testing tools don't work so well with nedmalloc. Sorry to be so unhelpful. Niall |
i will have a chance to test it on a via cpu too in a few weeks, but that one only has one core and is 32 bit. |
Generally timing bugs appear most on faster clocked CPUs with as many cores as possible. If you can lay your hands on a 32 core 3.8GHz CPU, that would be the most useful! Probably better is to mark up nedmalloc with the metadata for a race condition solver. That's a lot of work though, and for me nedmalloc is nearing EOL. Niall |
i have downclocked my computer to ~3.3 ghz (previously i used it on ~3.4 ghz) |
try replacing the for loop with this one, it crashes even faster for me: for(int iteracio=1;iteracio<90;iteracio+=1){ |
i tested two dlmalloc versions:
both work fine. i found a nedmalloc from 2010, and that crashes too. |
setting affinity masks:
the crash goes away. |
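The affinity code from this comment did not survive in the thread, so the following is only a guess at the kind of experiment meant: pinning the test to a single CPU on Linux so its threads can no longer run truly in parallel. `pin_to_single_cpu` is a hypothetical helper name, and the sketch assumes glibc's `sched_getaffinity`/`sched_setaffinity`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>

/* Hypothetical helper: restrict the calling thread (and any threads it
   creates afterwards, which inherit the mask) to the first CPU it is
   currently allowed to run on. Returns 0 on success, -1 on failure. */
static int pin_to_single_cpu(void)
{
    cpu_set_t allowed, one;
    int cpu;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one);
        }
    }
    return -1;                 /* no permitted CPU found */
}
```

Calling this at the top of `main()`, before any worker threads are created, would serialize the threads onto one core. If the crash disappears under that condition, it points toward a race that needs real parallelism, which fits the timing theory discussed earlier in the thread.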
I have exactly the same issue. In a real-world multithreaded app (1 main thread, 3 workers) on a 2-core processor, it randomly crashes with the same error. Can it be a MALLOC_GLOBAL_LOCK issue? |
I'll be honest, nedmalloc is pretty much EOL for me, system allocators are nowadays plenty fast enough. https://github.com/jemalloc/jemalloc is likely an excellent substitute for nedmalloc. |
Critical - crash in multithreaded environment, when using nedrealloc (yes, again)
The crash appears when nedrealloc is called on multiple threads, reallocating small (or null) memory areas to larger buffers again and again. The crash mostly occurs before the test reaches the first percent. If the algorithm is able to reach that point, the software mostly survives. To reproduce the crash, it helps to have other processes working too, for example playing an HD YouTube video in the foreground.
Crash type: memory corruption
Version affected: newest (older versions not yet tested)
compiler flag:
g++ nedmalloctester3.c -o nedmalloctester -O3 -s -lpthread -m64
compiler version:
g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.7/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.7 --enable-ssp --disable-libssp --disable-libitm --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --program-suffix=-4.7 --enable-linux-futex --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.7.1 20120723 [gcc-4_7-branch revision 189773](SUSE Linux)
system version:
uname -r -a
Linux a1 3.4.6-2.10-desktop #1 SMP PREEMPT Thu Jul 26 09:36:26 UTC 2012 (641c197) x86_64 x86_64 x86_64 GNU/Linux
output:
g++ nedmalloctester3.c -o nedmalloctester -O3 -s -lpthread
./nedmalloctester
test 3 begins...
nedmalloc: nedprealloc() called with a block not created by nedmalloc!
Aborted
./nedmalloctester
test 3 begins...
0 percent finished
^C
./nedmalloctester
test 3 begins...
0 percent finished
^C
./nedmalloctester
test 3 begins...
0 percent finished
^C
./nedmalloctester
test 3 begins...
nedmalloc: nedprealloc() called with a block not created by nedmalloc!
Aborted
testcase:
// g++ nedmalloctester3.c -o nedmalloctester3 -O3 -s -pthread
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#define USE_LOCKS 1
#define USE_DL_PREFIX 1
#define NDEBUG
#define NO_NED_NAMESPACE
#include "nedmalloc/nedmalloc_2013_apr/ori/nedmalloc.h"
#include "nedmalloc/nedmalloc_2013_apr/ori/nedmalloc.c"
#define malloc_vpool nedmalloc
#define free_vpool nedfree
#define realloc_vpool nedrealloc
/*#define malloc_vpool malloc
#define free_vpool free
#define realloc_vpool realloc*/
#define TESTMEMMAX 1024*1024*2
void ** test=NULL;
int div_w=8; // block size to be sure that we touching pointers allocated from different thread ID-s
void malt(int thread){
for(int iteracio=1;iteracio<80;iteracio+=4){
for(int i=0;i<TESTMEMMAX;i++){
if(((i/div_w)%10)!=thread) continue; // 10 thread
// printf("%d\n", i);
test[i]=realloc_vpool(test[i], iteracio);
memset(test[i], 1, iteracio);
}
}
}
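// thread entry points: each one works on the slots belonging to its own thread index
void *malt2(void * threadid){malt(1); return NULL;}
void *malt3(void * threadid){malt(2); return NULL;}
void *malt4(void * threadid){malt(3); return NULL;}
void *malt5(void * threadid){malt(4); return NULL;}
void *malt6(void * threadid){malt(5); return NULL;}
void *malt7(void * threadid){malt(6); return NULL;}
void *malt8(void * threadid){malt(7); return NULL;}
void *malt9(void * threadid){malt(8); return NULL;}
void *malt10(void * threadid){malt(9); return NULL;}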
void MallocStabTest3(){
printf("test 3 begins...\n");
}
int main(){
MallocStabTest3();
}
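As posted here, `MallocStabTest3()` only prints its banner; the part that allocates the `test` table and starts the threads is not shown. The following is a rough, self-contained sketch of what such a driver might look like, not the reporter's original code. It uses the system `realloc` instead of `nedrealloc`, a smaller table than `TESTMEMMAX`, and hypothetical names (`worker`, `run_stress`, `slots_tbl`).

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS   (1024 * 64)  /* assumption: smaller than TESTMEMMAX to keep the sketch quick */
#define THREADS 10
#define DIV_W   8            /* same block width as div_w in the testcase */

static void **slots_tbl;     /* plays the role of the global `test` table */

/* Each worker repeatedly grows only the slots assigned to its index. */
static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    int size, i;

    for (size = 1; size < 80; size += 4) {
        for (i = 0; i < SLOTS; i++) {
            void *p;
            if ((i / DIV_W) % THREADS != id)
                continue;
            p = realloc(slots_tbl[i], (size_t)size);
            if (p == NULL)
                return (void *)1;         /* report allocation failure */
            memset(p, 1, (size_t)size);
            slots_tbl[i] = p;
        }
    }
    return NULL;
}

/* Starts THREADS workers, waits for them, frees everything.
   Returns 0 if every worker finished without an allocation failure. */
static int run_stress(void)
{
    pthread_t tid[THREADS];
    int started = 0, failed = 0, i;

    slots_tbl = (void **)calloc(SLOTS, sizeof(void *));
    if (slots_tbl == NULL)
        return -1;
    for (i = 0; i < THREADS; i++) {
        if (pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++) {
        void *res = NULL;
        pthread_join(tid[i], &res);
        if (res != NULL)
            failed = 1;
    }
    for (i = 0; i < SLOTS; i++)
        free(slots_tbl[i]);
    free(slots_tbl);
    return failed ? -1 : 0;
}
```

To exercise nedmalloc with this sketch, the `realloc` and `free` calls would be swapped for `nedrealloc` and `nedfree`, matching the `realloc_vpool`/`free_vpool` macros in the testcase. Build with `-pthread`.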