
LuaSandbox's profiler is not thread-safe
Closed, Resolved
Public

Description

HHVM package version: 3.1+20140630-1+wm1

Host: deployment-mediawiki01
ProcessID: 966
ThreadID: 7f8d457ff700
ThreadPID: 21324
Name: unknown program
Type: Segmentation fault
Runtime: hhvm
Version: heads/wikimedia-0-g4370ff7993b6e308aed94c5b9408fe20db93eea6
DebuggerCount: 0

  0 virtual thunk to boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::program_options::invalid_option_value> >::rethrow() const at /usr/bin/hhvm:0
  1 lua_sethook at /usr/lib/x86_64-linux-gnu/liblua5.1.so.0:0
  2 HPHP::Extension::moduleInfo(HPHP::Array&) at /usr/lib/hphp/extensions/20131007/luasandbox.so:0
  3 timer_sigev_thread at /build/buildd/eglibc-2.19/rt/../nptl/sysdeps/unix/sysv/linux/timer_routines.c:66
  4 start_thread at /build/buildd/eglibc-2.19/nptl/pthread_create.c:312
  5 clone at /build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Version: master
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=68407
https://bugzilla.wikimedia.org/show_bug.cgi?id=70177

Details

Reference
bz68413

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:41 AM
bzimport added a project: Scribunto.
bzimport set Reference to bz68413.

Still seeing these in beta with new builds:

hhvm 3.1+20140723-1+wmf1
hhvm-dev 3.1+20140723-1+wmf1
hhvm-fss 1.1-2
hhvm-luasandbox 2.0-2
hhvm-wikidiff2 1.3-2

Host: deployment-mediawiki02
ProcessID: 3498
ThreadID: 7f0de33ff700
ThreadPID: 4312
Name: unknown program
Type: Segmentation fault
Runtime: hhvm
Version: heads/wikimedia-0-g8b842db4e2db664a9b4d543047ae154a6dd59de6
DebuggerCount: 0

  0 virtual thunk to boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::program_options::invalid_option_value> >::rethrow() const at /usr/bin/hhvm:0
  1 lua_sethook at /usr/lib/x86_64-linux-gnu/liblua5.1.so.0:0
  2 HPHP::Extension::moduleInfo(HPHP::Array&) at /usr/lib/hphp/extensions/20140702/luasandbox.so:0
  3 timer_sigev_thread at /build/buildd/eglibc-2.19/rt/../nptl/sysdeps/unix/sysv/linux/timer_routines.c:66
  4 start_thread at /build/buildd/eglibc-2.19/nptl/pthread_create.c:312
  5 clone at /build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Change 148754 had a related patch set uploaded by Ori.livneh:
beta cluster: use luastandalone

https://gerrit.wikimedia.org/r/148754

Change 148754 merged by jenkins-bot:
beta cluster: use luastandalone

https://gerrit.wikimedia.org/r/148754

The patch is only a workaround: it switches beta to luastandalone mode until this crash can be examined and patched in hhvm/luasandbox.

Change 149029 had a related patch set uploaded by BryanDavis:
beta: Re-enable luasandbox

https://gerrit.wikimedia.org/r/149029

Change 149029 merged by jenkins-bot:
beta: Re-enable luasandbox

https://gerrit.wikimedia.org/r/149029

Updated beta servers to hhvm-luasandbox 2.0-3 build, changed config back to luasandbox and restarted hhvm fcgi container. Still seeing crashes:

Host: deployment-mediawiki01
ProcessID: 27248
ThreadID: 7fad89bff700
ThreadPID: 27750
Name: unknown program
Type: Segmentation fault
Runtime: hhvm
Version: heads/wikimedia-0-g8b842db4e2db664a9b4d543047ae154a6dd59de6
DebuggerCount: 0

  0 virtual thunk to boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::program_options::invalid_option_value> >::rethrow() const at /usr/bin/hhvm:0
  1 lua_sethook at /usr/lib/x86_64-linux-gnu/liblua5.1-c++.so.0:0
  2 HPHP::Extension::moduleInfo(HPHP::Array&) at /usr/lib/hphp/extensions/20140702/luasandbox.so:0
  3 timer_sigev_thread at /build/buildd/eglibc-2.19/rt/../nptl/sysdeps/unix/sysv/linux/timer_routines.c:66
  4 start_thread at /build/buildd/eglibc-2.19/nptl/pthread_create.c:312
  5 clone at /build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113

$ hhvm --version
HipHop VM 3.3.0-dev (rel)
Compiler: heads/wikimedia-0-g8b842db4e2db664a9b4d543047ae154a6dd59de6
Repo schema: ce469da81c1d8ec23f3a4aa889afadad8df5a759

$ dpkg -l|grep ^ii|awk '{printf "%-20s %s\n", $2, $3}'|grep hhvm
hhvm 3.1+20140723-1+wmf1
hhvm-dev 3.1+20140723-1+wmf1
hhvm-fss 1.1-2
hhvm-luasandbox 2.0-3
hhvm-wikidiff2 1.3-2

Change 149211 had a related patch set uploaded by Ori.livneh:
Disable LuaSandbox's profiling feature, to isolate bug 68413

https://gerrit.wikimedia.org/r/149211

Change 149211 merged by jenkins-bot:
Disable LuaSandbox's profiling feature, to isolate bug 68413

https://gerrit.wikimedia.org/r/149211

Let's make bug 70177 about the normal & emergency timers, and make this one about thread-safety issues in the profiling feature.

While I'm here:

  • timer_getoverrun can fail if the supplied timer ID is invalid, in which case it returns -1. We should check for that rather than indiscriminately add the return value to overrun_count and profiler_signal_count, since adding -1 silently decrements both counters.
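
The check suggested above might look roughly like the following C sketch. Only timer_getoverrun, overrun_count and profiler_signal_count come from this report; the struct, the getoverrun_failures counter and both function names are hypothetical, and the accounting LuaSandbox actually does may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical container for the two counters named in this report. */
typedef struct {
    long overrun_count;
    long profiler_signal_count;
    long getoverrun_failures; /* illustrative only: counts -1 results */
} overrun_stats;

/* Fold one timer_getoverrun() result into the counters.
 * A negative result signals an error and leaves both counters untouched. */
int record_overrun(overrun_stats *stats, int overrun)
{
    assert(stats != NULL);
    if (overrun < 0) {
        stats->getoverrun_failures++;
        return -1;
    }
    stats->overrun_count += overrun;
    stats->profiler_signal_count += overrun;
    return 0;
}

/* Query the timer and record the result, propagating failure. */
int sample_overrun(timer_t timer_id, overrun_stats *stats)
{
    return record_overrun(stats, timer_getoverrun(timer_id));
}
```

Keeping the arithmetic separate from the system call means the error path can be exercised without creating a real timer.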

@ori: In https://gerrit.wikimedia.org/r/149211 you disabled the profiling to isolate this bug, then the bug was fixed, but we never turned the profiling back on.

I was about to point someone at this feature at en:Wikipedia:Lua/Requests when I noticed it wasn't showing up anymore. Is there any particular reason not to re-enable it now?