Salahuddin Ahmed
2018-07-12 22:01:29 UTC
Hello,
Recently I have experienced a deadlock situation in Asterisk Version
13.20.0. According to our gdb analysis there have 3 suspicious thread(3, 11
and 44). Thread 44 push a task to taskpool, thread 11 execute that task and
the thread 3 is a timer event. In thread 11 during a response going to
transmit a group lock has been acquired. and this thread was locked on this
point. In Thread 3 while retransmission timer triggered same group lock
again acquired and try to retransmit response. In this situation asterisk
was blocked.
Let me explain according to GDB log,
Thread-03:
#4 0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
Thread-11:
#6 0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=***@entry=0x7eff8bdd4568,
tdata=***@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
(gdb) p tsx->transport_flag
$2 = 1(TSX_HAS_PENDING_TRANSPORT)
Here Thread-03 is execute first and try to retransmit a message before
acquire a tsx group lock in Thread-11, because in thread 3
tsx->transport_flag |= TSX_HAS_PENDING_TRANSPORT has been set, Now if we
read this flag from thread 11 in frame 04 then we have found this flag was
set by someone before acquiring tsx group lock. That means during this
acquiring lock in Thread3, Thread 11 trying to send data on transport. So
this might be the case of deadlock, somehow timer thread 3 execute before
thread-11.
Here I enclose Thread-3 and Thread-11 gdb backtrace.
Could anyone please help me on this?
Thread-11:
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f00f62bf494 in _L_lock_952 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f00f62bf2f6 in __GI___pthread_mutex_lock (mutex=0x7eff41197c58)
at ../nptl/pthread_mutex_lock.c:114
#3 0x00007f00b97e49a5 in pj_mutex_lock (mutex=0x7eff41197c58) at
../src/pj/os_core_unix.c:1265
#4 0x00007f00b97eace1 in grp_lock_acquire (p=0x7eff41197b48) at
../src/pj/lock.c:290
#5 0x00007f00b97ead22 in pj_grp_lock_acquire (grp_lock=<optimized out>) at
../src/pj/lock.c:478
#6 0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=***@entry=0x7eff8bdd4568,
tdata=***@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
#7 0x00007f00ba6de952 in pjsip_dlg_send_response (dlg=<optimized out>,
tsx=0x7eff8bdd4568, tdata=***@entry=0x7eff892ae668) at
../src/pjsip/sip_dialog.c:1531
#8 0x00007f00bab1c09a in pjsip_inv_send_msg (inv=0x7eff88bdf9b8,
tdata=***@entry=0x7eff892ae668) at ../src/pjsip-ua/sip_inv.c:3231
#9 0x00007f0092874b4c in ast_sip_session_send_response
(session=***@entry=0x7eff89367700, tdata=0x7eff892ae668) at
res_pjsip_session.c:983
#10 0x00007f002a431841 in answer (data=0x7eff89367700) at chan_pjsip.c:591
#11 0x00007f0094b7a490 in sync_task (data=0x7efdbcf8b810) at
res_pjsip.c:4268
#12 0x00000000005d52ce in ast_taskprocessor_execute (tps=***@entry=0x267f9b0)
at taskprocessor.c:971
#13 0x00000000005dcc80 in execute_tasks (data=0x267f9b0) at
threadpool.c:1322
#14 0x00000000005d52ce in ast_taskprocessor_execute (tps=0x23e0580) at
taskprocessor.c:971
#15 0x00000000005dd558 in threadpool_execute (pool=<optimized out>) at
threadpool.c:351
#16 worker_active (worker=<optimized out>) at threadpool.c:1105
#17 worker_start (arg=0x7f0080001680) at threadpool.c:1024
#18 0x00000000005e6dfa in dummy_start (data=<optimized out>) at utils.c:1238
#19 0x00007f00f62bd064 in start_thread (arg=0x7eff67c9b700) at
pthread_create.c:309
#20 0x00007f00f55a562d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread-3:
#0 sanitize_tdata (tdata=<optimized out>) at
res_pjsip/pjsip_message_filter.c:211
#1 filter_on_tx_message (tdata=0x7eff892ae668) at
res_pjsip/pjsip_message_filter.c:233
#2 0x00007f00ba6c5b0e in endpt_on_tx_msg (endpt=<optimized out>,
tdata=0x7eff892ae668) at ../src/pjsip/sip_endpoint.c:1088
#3 0x00007f00ba6cc5c0 in pjsip_transport_send (tr=0x2677718,
tdata=***@entry=0x7eff892ae668, addr=***@entry=0x7eff8bdd4644,
addr_len=16, token=***@entry=0x7eff8bdd4568,
cb=***@entry=0x7f00ba6d91b0 <transport_callback>) at
../src/pjsip/sip_transport.c:833
#4 0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
#5 0x00007f00ba6da8da in tsx_on_state_proceeding_uas (tsx=0x7eff8bdd4568,
event=<optimized out>) at ../src/pjsip/sip_transaction.c:2882
#6 0x00007f00ba6d9b86 in tsx_timer_callback (theap=<optimized out>,
entry=0x7eff8bdd46e8) at ../src/pjsip/sip_transaction.c:1189
#7 0x00007f00b97f6757 in pj_timer_heap_poll (ht=0x23e53a0,
next_delay=***@entry=0x7f0090649e50) at ../src/pj/timer.c:644
#8 0x00007f00ba6c65c4 in pjsip_endpt_handle_events2 (endpt=0x23e50b8,
max_timeout=***@entry=0x7f0090649ea0, p_count=***@entry=0x0) at
../src/pjsip/sip_endpoint.c:715
#9 0x00007f00ba6c66d7 in pjsip_endpt_handle_events (endpt=<optimized out>,
max_timeout=***@entry=0x7f0090649ea0) at
../src/pjsip/sip_endpoint.c:776
#10 0x00007f0094b7a118 in monitor_thread_exec (endpt=<optimized out>) at
res_pjsip.c:4345
#11 0x00007f00b97e4ca0 in thread_main (param=0x23e3c68) at
../src/pj/os_core_unix.c:541
#12 0x00007f00f62bd064 in start_thread (arg=0x7f009064a700) at
pthread_create.c:309
#13 0x00007f00f55a562d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thanks,
Salah Ahmed
Recently I have experienced a deadlock situation in Asterisk Version
13.20.0. According to our gdb analysis there have 3 suspicious thread(3, 11
and 44). Thread 44 push a task to taskpool, thread 11 execute that task and
the thread 3 is a timer event. In thread 11 during a response going to
transmit a group lock has been acquired. and this thread was locked on this
point. In Thread 3 while retransmission timer triggered same group lock
again acquired and try to retransmit response. In this situation asterisk
was blocked.
Let me explain according to GDB log,
Thread-03:
#4 0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
Thread-11:
#6 0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=***@entry=0x7eff8bdd4568,
tdata=***@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
(gdb) p tsx->transport_flag
$2 = 1(TSX_HAS_PENDING_TRANSPORT)
Here Thread-03 is execute first and try to retransmit a message before
acquire a tsx group lock in Thread-11, because in thread 3
tsx->transport_flag |= TSX_HAS_PENDING_TRANSPORT has been set, Now if we
read this flag from thread 11 in frame 04 then we have found this flag was
set by someone before acquiring tsx group lock. That means during this
acquiring lock in Thread3, Thread 11 trying to send data on transport. So
this might be the case of deadlock, somehow timer thread 3 execute before
thread-11.
Here I enclose Thread-3 and Thread-11 gdb backtrace.
Could anyone please help me on this?
Thread-11:
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f00f62bf494 in _L_lock_952 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f00f62bf2f6 in __GI___pthread_mutex_lock (mutex=0x7eff41197c58)
at ../nptl/pthread_mutex_lock.c:114
#3 0x00007f00b97e49a5 in pj_mutex_lock (mutex=0x7eff41197c58) at
../src/pj/os_core_unix.c:1265
#4 0x00007f00b97eace1 in grp_lock_acquire (p=0x7eff41197b48) at
../src/pj/lock.c:290
#5 0x00007f00b97ead22 in pj_grp_lock_acquire (grp_lock=<optimized out>) at
../src/pj/lock.c:478
#6 0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=***@entry=0x7eff8bdd4568,
tdata=***@entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
#7 0x00007f00ba6de952 in pjsip_dlg_send_response (dlg=<optimized out>,
tsx=0x7eff8bdd4568, tdata=***@entry=0x7eff892ae668) at
../src/pjsip/sip_dialog.c:1531
#8 0x00007f00bab1c09a in pjsip_inv_send_msg (inv=0x7eff88bdf9b8,
tdata=***@entry=0x7eff892ae668) at ../src/pjsip-ua/sip_inv.c:3231
#9 0x00007f0092874b4c in ast_sip_session_send_response
(session=***@entry=0x7eff89367700, tdata=0x7eff892ae668) at
res_pjsip_session.c:983
#10 0x00007f002a431841 in answer (data=0x7eff89367700) at chan_pjsip.c:591
#11 0x00007f0094b7a490 in sync_task (data=0x7efdbcf8b810) at
res_pjsip.c:4268
#12 0x00000000005d52ce in ast_taskprocessor_execute (tps=***@entry=0x267f9b0)
at taskprocessor.c:971
#13 0x00000000005dcc80 in execute_tasks (data=0x267f9b0) at
threadpool.c:1322
#14 0x00000000005d52ce in ast_taskprocessor_execute (tps=0x23e0580) at
taskprocessor.c:971
#15 0x00000000005dd558 in threadpool_execute (pool=<optimized out>) at
threadpool.c:351
#16 worker_active (worker=<optimized out>) at threadpool.c:1105
#17 worker_start (arg=0x7f0080001680) at threadpool.c:1024
#18 0x00000000005e6dfa in dummy_start (data=<optimized out>) at utils.c:1238
#19 0x00007f00f62bd064 in start_thread (arg=0x7eff67c9b700) at
pthread_create.c:309
#20 0x00007f00f55a562d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread-3:
#0 sanitize_tdata (tdata=<optimized out>) at
res_pjsip/pjsip_message_filter.c:211
#1 filter_on_tx_message (tdata=0x7eff892ae668) at
res_pjsip/pjsip_message_filter.c:233
#2 0x00007f00ba6c5b0e in endpt_on_tx_msg (endpt=<optimized out>,
tdata=0x7eff892ae668) at ../src/pjsip/sip_endpoint.c:1088
#3 0x00007f00ba6cc5c0 in pjsip_transport_send (tr=0x2677718,
tdata=***@entry=0x7eff892ae668, addr=***@entry=0x7eff8bdd4644,
addr_len=16, token=***@entry=0x7eff8bdd4568,
cb=***@entry=0x7f00ba6d91b0 <transport_callback>) at
../src/pjsip/sip_transport.c:833
#4 0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
#5 0x00007f00ba6da8da in tsx_on_state_proceeding_uas (tsx=0x7eff8bdd4568,
event=<optimized out>) at ../src/pjsip/sip_transaction.c:2882
#6 0x00007f00ba6d9b86 in tsx_timer_callback (theap=<optimized out>,
entry=0x7eff8bdd46e8) at ../src/pjsip/sip_transaction.c:1189
#7 0x00007f00b97f6757 in pj_timer_heap_poll (ht=0x23e53a0,
next_delay=***@entry=0x7f0090649e50) at ../src/pj/timer.c:644
#8 0x00007f00ba6c65c4 in pjsip_endpt_handle_events2 (endpt=0x23e50b8,
max_timeout=***@entry=0x7f0090649ea0, p_count=***@entry=0x0) at
../src/pjsip/sip_endpoint.c:715
#9 0x00007f00ba6c66d7 in pjsip_endpt_handle_events (endpt=<optimized out>,
max_timeout=***@entry=0x7f0090649ea0) at
../src/pjsip/sip_endpoint.c:776
#10 0x00007f0094b7a118 in monitor_thread_exec (endpt=<optimized out>) at
res_pjsip.c:4345
#11 0x00007f00b97e4ca0 in thread_main (param=0x23e3c68) at
../src/pj/os_core_unix.c:541
#12 0x00007f00f62bd064 in start_thread (arg=0x7f009064a700) at
pthread_create.c:309
#13 0x00007f00f55a562d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thanks,
Salah Ahmed