Discussion:
[asterisk-dev] Deadlock in pthread_exit due to lazy binding with libgcc
Yousf Ateya
2015-07-15 13:41:43 UTC
Dear all,

I have started to see a strange deadlock on some Asterisk nodes. On every
call, when pbx_thread calls pthread_exit, the calling thread gets stuck
inside pthread_exit.

After a while, tens of thousands of threads have the same backtrace. After
some searching, I found that this happens because of lazy binding, which is
the linker's default.

Related question on Stack Overflow:
http://stackoverflow.com/questions/11954527/dlopen-malloc-deadlock

I recompiled Asterisk using:
export LDFLAGS=-Wl,-z,now
./configure && make && make install

and the deadlock has not happened again, so the cause appears to be lazy
binding with libgcc.

Should we enable this option by default, or add it to menuselect?

I am using Asterisk 13.4 compiled on 64-bit Ubuntu 14.04 with gcc 4.8.2,
but this probably applies to other operating systems and compilers as well.
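
One way to confirm that a rebuilt binary really uses eager binding is to
look at its dynamic section. The sketch below is illustrative only: the
helper names has_bind_now and report_binding are hypothetical, and it
assumes that readelf from binutils is installed. GNU ld normally records
-z now as a BIND_NOW entry or flag, and often also as NOW in FLAGS_1.

```shell
# Hypothetical helper: reads `readelf -d` output on stdin and succeeds
# if the dynamic section requests eager (non-lazy) binding.
has_bind_now() {
    # Match a BIND_NOW tag/flag, or NOW listed on the FLAGS_1 line.
    grep -Eq 'BIND_NOW|\(FLAGS_1\).*NOW'
}

# Hypothetical wrapper: report the binding mode of one ELF file.
report_binding() {
    if readelf -d "$1" 2>/dev/null | has_bind_now; then
        echo "$1: eager binding (-z now)"
    else
        echo "$1: lazy binding, or readelf could not read the file"
    fi
}
```

For example, report_binding /usr/sbin/asterisk (the install path is an
assumption; adjust it to your prefix). For a quick experiment without
rebuilding, the glibc loader also honours the LD_BIND_NOW environment
variable, although that was not tested in this thread.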


============
Deadlock backtrace
============

Thread 6138 (Thread 0x2acfc684b700 (LWP 12259)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00002ace66628672 in _L_lock_953 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00002ace666284da in __GI___pthread_mutex_lock (mutex=0x2ace64b1b968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:114
#3 0x00002ace6490c34e in _dl_open (file=0x2ace66630ccf "libgcc_s.so.1", mode=-2147483647, caller_dlopen=0x2ace6662ea43 <pthread_cancel_init+35>, nsid=-2, argc=1, argv=0x7fff1f751498, env=0x1907320) at dl-open.c:613
#4 0x00002ace670c30f2 in do_dlopen (ptr=ptr@entry=0x2acfc684ad80) at dl-libc.c:87
#5 0x00002ace64907ff4 in _dl_catch_error (objname=0x2acfc684ad60, errstring=0x2acfc684ad70, mallocedp=0x2acfc684ad50, operate=0x2ace670c30b0 <do_dlopen>, args=0x2acfc684ad80) at dl-error.c:187
#6 0x00002ace670c31b2 in dlerror_run (args=0x2acfc684ad80, operate=0x2ace670c30b0 <do_dlopen>) at dl-libc.c:46
#7 __GI___libc_dlopen_mode (name=<optimized out>, mode=<optimized out>) at dl-libc.c:163
#8 0x00002ace6662ea43 in pthread_cancel_init () at ../nptl/sysdeps/pthread/unwind-forcedunwind.c:52
#9 0x00002ace6662ec0c in _Unwind_ForcedUnwind (exc=0x2acfc684bd70, stop=stop@entry=0x2ace6662cbc0 <unwind_stop>, stop_argument=0x2acfc684ae60) at ../nptl/sysdeps/pthread/unwind-forcedunwind.c:129
#10 0x00002ace6662cd40 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:129
#11 0x00002ace66627535 in __do_cancel () at pthreadP.h:280
#12 __pthread_exit (value=value@entry=0x0) at pthread_exit.c:29
#13 0x000000000057a523 in pbx_thread (data=data@entry=0x2acf0c171588) at pbx.c:6773
#14 0x00000000005d734a in dummy_start (data=<optimized out>) at utils.c:1237
#15 0x00002ace66626182 in start_thread (arg=0x2acfc684b700) at pthread_create.c:312
#16 0x00002ace6708747d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
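
To check how many threads share this stack, a saved gdb dump can be
counted mechanically. This is only a sketch: the helper name
count_stuck_threads is hypothetical, and it assumes the output of
"thread apply all bt" was saved to a file. It counts thread blocks that
contain both an __lll_lock_wait frame and a pthread_cancel_init frame,
as in the backtrace above.

```shell
# Hypothetical helper: count threads in a saved gdb backtrace dump that are
# waiting on a lock while inside pthread_cancel_init.
count_stuck_threads() {
    awk '
        # A "Thread N (...)" line starts a new block: tally the previous one.
        /^Thread [0-9]+ /     { if (waiting && unwinding) n++; waiting = 0; unwinding = 0 }
        /__lll_lock_wait/     { waiting = 1 }
        /pthread_cancel_init/ { unwinding = 1 }
        # Tally the last block and print the total (0 if nothing matched).
        END                   { if (waiting && unwinding) n++; print n + 0 }
    ' "$1"
}
```

A dump could be produced with something like
gdb -p <pid> -batch -ex "thread apply all bt" > bt.txt
followed by count_stuck_threads bt.txt; the file name bt.txt is an
assumption.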
--
Yousf Ateya,
StarkBits
www.starkbits.com
--
This e-mail message is intended only for the use of the intended recipient(s).
The information contained therein may be confidential or privileged,
and its disclosure or reproduction is strictly prohibited.
If you are not the intended recipient, please return it immediately to its sender
at the above address and destroy it.
Mark Michelson
2015-07-17 16:04:50 UTC
Thanks for this report. Based solely on the man page for ld(1), it
sounds like load-time binding would, at most, cause module loading to
take longer. Are there any other potential issues to making this change?

Mark Michelson
Yousf Ateya
2015-07-19 19:39:04 UTC
Here is the difference in load time (on an Intel i5 machine):

Default (lazy binding): 1.422 seconds
Non-lazy binding (-z now): 1.852 seconds
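
For reference, the cost of the change can be worked out from the two
figures above. The helper name load_time_delta is hypothetical; the
inputs are the timings reported in this message.

```shell
# Hypothetical helper: absolute and relative slowdown between two timings.
# $1 = load time with lazy binding, $2 = load time with -z now (seconds).
load_time_delta() {
    awk -v lazy="$1" -v now="$2" \
        'BEGIN { d = now - lazy; printf "%.3f s slower (%.1f%%)\n", d, d / lazy * 100 }'
}

load_time_delta 1.422 1.852   # prints: 0.430 s slower (30.2%)
```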
--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --
asterisk-dev mailing list
http://lists.digium.com/mailman/listinfo/asterisk-dev
Sedat Karahancı
2015-07-19 20:28:36 UTC
Sedat Karahancı
2015-07-19 20:29:28 UTC
Matthew Jordan
2015-07-19 20:48:35 UTC
On Sun, Jul 19, 2015 at 3:28 PM, Sedat Karahancı <***@taxim.cab> wrote:

Hi everybody,
We are looking for freelancers for a push-to-talk project like Zello and
Voxer.
We have not yet decided whether to use Asterisk or something else, and we
are looking for a freelancer or consultant to support us on this project.
Best,
Sedat

Hijacking a conversation is not a good way to hire consultants or get help
for your project.

The asterisk-dev list is for discussion of the Asterisk source code. For
commercial related discussions, please use the asterisk-biz list.

http://lists.digium.com/mailman/listinfo/asterisk-biz
--
Matthew Jordan
Digium, Inc. | Director of Technology
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org