Tech Info 107: HELIOS services may not start after a server crash

HELIOS Tech Info #107

Thu, 1 Mar 2007

We received reports about HELIOS services failing to start after a server crash.
Investigation shows that the HELIOS processes had problems cleaning up their shared memory during startup. The system log typically contains entries like:
Feb 28 10:14:00 server afpsrv[578]: AquireSharedMutex: mutex held by non-existent process 27911
The reason is that the crash prevented the HELIOS processes from releasing their locks, so initializing the locktable fails on the next start.
As a result, processes either crash while the HELIOS services are starting or run in an endless loop.
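To check whether a server is affected, the system log can be searched for the message shown above. The following is only a sketch: the function name stale_mutex_pids is made up for this example, and the log file has to be passed in because its location depends on the system.

```shell
#!/bin/sh
# Sketch: print the process IDs named in
# "mutex held by non-existent process" log entries.
# $1 is the log file to search; its path is system-specific
# and must be supplied by the caller.
stale_mutex_pids() {
    sed -n 's/.*mutex held by non-existent process \([0-9][0-9]*\).*/\1/p' "$1" | sort -u
}
```

For example, stale_mutex_pids /var/log/system.log would list the affected process IDs, assuming the system log is stored at that path. No output means no such entry was found in that file.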
To solve this problem, run "stop-helios now" and verify with the ps command that no HELIOS process is running anymore. Then remove the locktable file with "rm HELIOSDIR/var/run/locktable" and start the HELIOS services again with "start-helios".
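The steps above can be combined in a small shell function. This is a minimal sketch, not an official HELIOS tool. It assumes that HELIOSDIR is /usr/local/helios unless set otherwise, that stop-helios and start-helios are located in HELIOSDIR/bin, and that running HELIOS processes can be recognized in the ps output by a command line containing the HELIOSDIR path. The function names helios_procs and recover_locktable are made up for this example.

```shell
#!/bin/sh
# Sketch only. Assumption: default installation directory.
HELIOSDIR="${HELIOSDIR:-/usr/local/helios}"

# List processes whose command line contains "$HELIOSDIR/".
# Assumption: HELIOS processes are started with their full path.
# Returns 0 if any are found, 1 if none, 2 or more on errors.
helios_procs() {
    ps_out=$(ps -e -o pid= -o args=) || return 2
    printf '%s\n' "$ps_out" | grep -F "$HELIOSDIR/"
}

# Stop HELIOS, make sure nothing is left running, remove the
# locktable file, and start HELIOS again.
recover_locktable() {
    # Assumption: the start/stop scripts live in $HELIOSDIR/bin.
    "$HELIOSDIR/bin/stop-helios" now || return 1

    rc=0
    helios_procs >&2 || rc=$?
    case $rc in
        0) echo "HELIOS processes still running; locktable left in place" >&2
           return 1 ;;
        1) ;;  # nothing found, safe to continue
        *) echo "could not list processes; locktable left in place" >&2
           return 1 ;;
    esac

    rm -f "$HELIOSDIR/var/run/locktable" || return 1
    "$HELIOSDIR/bin/start-helios"
}
```

Calling recover_locktable as root performs the whole sequence. The function deliberately leaves the locktable file untouched when it finds a remaining process or cannot read the process list, since the file should only be removed once all HELIOS processes are gone.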

Note:

Mac OS X currently does not call stop-helios when the server is rebooted, so the problem described above can also occur after a regular server reboot.