locking problems

Jonathan Roy (roy@atlantic.net)
Thu, 29 May 1997 13:26:42 -0400

Message-Id: <3.0.1.32.19970529132642.006e7c50@mail.atlantic.net>
Date: Thu, 29 May 1997 13:26:42 -0400
To: skimo@breughel.ufsia.ac.be
From: Jonathan Roy <roy@atlantic.net>
Subject: locking problems


  Hey guys, have you ever looked closely at the locking mechanism used by
fastcgi? We haven't been able to find the previous bug we were debugging,
because this new problem keeps happening. :/ We set USE_LOCKING and such
since Solaris can't have multiple simultaneous accept() calls. The fastcgi
processes sometimes hang. On the latest crash, I loaded each process in
the debugger to see where it was. The dumps are below. Notice that 2 of
them are in accept() at once, and the rest are correctly blocked in
AcquireLock(). The AcquireLock() itself seems right.
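
  For reference, the kind of serialization USE_LOCKING is meant to provide
can be sketched as below. This is a minimal sketch and not the actual
fcgiapp.c code: set_lock(), locked_accept() and the separate lock_fd are
hypothetical names, and the only thing taken from the dumps is that
AcquireLock() ends up in fcntl(). The idea is that a process takes an
exclusive fcntl() lock, calls accept(), and releases the lock afterwards,
so at most one process should be inside accept() at a time.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply a whole-file advisory lock operation to
 * lock_fd.  Pass F_WRLCK to acquire (blocks until granted) or F_UNLCK
 * to release.  Returns 0 on success, -1 on error.
 */
static int set_lock(int lock_fd, short type)
{
    struct flock fl;
    int rc;

    assert(lock_fd >= 0);
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    /* F_SETLKW can be interrupted by a signal; retry in that case. */
    do {
        rc = fcntl(lock_fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/*
 * Hypothetical wrapper: accept() on listen_fd while holding the lock
 * on lock_fd.  Returns the connected fd, or -1 with errno set.
 */
static int locked_accept(int listen_fd, int lock_fd)
{
    int conn, saved_errno;

    if (set_lock(lock_fd, F_WRLCK) == -1)
        return -1;

    do {
        conn = accept(listen_fd, NULL, NULL);
    } while (conn == -1 && errno == EINTR);

    /* Release the lock without clobbering accept()'s errno. */
    saved_errno = errno;
    set_lock(lock_fd, F_UNLCK);
    errno = saved_errno;
    return conn;
}
```

  Note that a scheme like this only protects accept() calls that actually
go through the wrapper. Any accept() made outside of it is not serialized.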

  Hmm. Here's a question, in the FCGI.xs file there is:

static int 
FCGI_Accept(void)
{
    if(!acceptCalled) {
        /*
         * First call to FCGI_Accept.  Is application running
         * as FastCGI or as CGI?
         */
        isCGI = FCGX_IsCGI();
        acceptCalled = TRUE;
    } else if(isCGI) {

  In FCGX_IsCGI(void) from fcgiapp.c we have:

    /*
     * Perform an accept() on the file descriptor.  If this is not a
     * listener socket we will get an error.  Typically this will be
     * ENOTSOCK but it differs from platform to platform.
     */
    fcgiSocket = accept(FCGI_LISTENSOCK_FILENO,
                        (struct sockaddr *) &fcgiSa.un,
                        &fcgiClilen);

  Notice, however, that there is no locking done around this accept() call.
None of the locked-up scripts were on this line at the time I did the
dumps, but maybe one process was still in accept() here when one of the
other processes got the lock and called accept(), and only then did this
one finally return. Perhaps when you call Solaris accept() multiple times
the first one works and the rest hang... I don't know. ;) And, of course,
how 4 separate processes could get to accept() in FCGX_Accept itself is a
mystery, since they all have to call AcquireLock() first. And I'm told
Solaris should work for multiple accept()s anyway. :/

  Anyways, should there be an AcquireLock/ReleaseLock around the FCGX_IsCGI
accept call? We might try a SysV message queue approach and see how that
works.
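
  Another option, instead of locking the FCGX_IsCGI probe, might be to
avoid calling accept() there at all. The sketch below is an assumption,
not something taken from fcgiapp.c: fd_is_unconnected_socket() is a
hypothetical helper that uses getpeername(), which fails with ENOTCONN on
a socket that has no peer (such as a listener) and with ENOTSOCK on
something that isn't a socket. It cannot tell a listening socket from one
that is merely unconnected, and I haven't checked how it behaves on
Solaris.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of fcgiapp.c.
 * Returns 1 if fd is a socket with no connected peer (which is what a
 * listening socket looks like), 0 otherwise.  Never blocks and never
 * consumes a pending connection.
 */
static int fd_is_unconnected_socket(int fd)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof peer;

    if (getpeername(fd, (struct sockaddr *) &peer, &len) == 0)
        return 0;               /* connected socket: has a peer */

    /* ENOTCONN: a socket without a peer.  Anything else (ENOTSOCK,
     * EBADF, ...) means this is not a usable listener. */
    return errno == ENOTCONN;
}
```

  With something like this, the first-call check could be
fd_is_unconnected_socket(FCGI_LISTENSOCK_FILENO), leaving every real
accept() behind AcquireLock().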

-Jonathan

(gdb) where
#0  0xef5f5ff8 in fcntl ()
#1  0xef79abb0 in s_fcntl ()
#2  0xef533ebc in AcquireLock () at fcgiapp.c:2012
#3  0xef534178 in FCGX_Accept (in=0xef545818, out=0xeffffc04, err=0xeffffc00,
    envp=0xeffffbfc) at fcgiapp.c:2206
29947

(gdb) where
#0  0xef5f6280 in _getmsg ()
#1  0xef798d94 in __accept ()
#2  0xef798cc4 in _accept ()
#3  0xef5341c8 in FCGX_Accept (in=0xef545818, out=0xeffffc04, err=0xeffffc00, 
    envp=0xeffffbfc) at fcgiapp.c:2224
#4  0xef5346e0 in FCGI_Accept () at FCGI.xs:103
29948

(gdb) where
#0  0xef5f6280 in _getmsg ()
#1  0xef7984b8 in _s_is_ok ()
#2  0xef799118 in __accept ()
#3  0xef798cc4 in _accept ()
#4  0xef5341c8 in FCGX_Accept (in=0xef545818, out=0xeffffc0c, err=0xeffffc08, 
    envp=0xeffffc04) at fcgiapp.c:2224
#5  0xef5346e0 in FCGI_Accept () at FCGI.xs:103
29949

(gdb) where
#0  0xef5f5ff8 in fcntl ()
#1  0xef79abb0 in s_fcntl ()
#2  0xef533ebc in AcquireLock () at fcgiapp.c:2012
#3  0xef534178 in FCGX_Accept (in=0xef545818, out=0xeffffc0c, err=0xeffffc08, 
    envp=0xeffffc04) at fcgiapp.c:2206
#4  0xef5346e0 in FCGI_Accept () at FCGI.xs:103
29951

(gdb) where
#0  0xef5f5ff8 in fcntl ()
#1  0xef79abb0 in s_fcntl ()
#2  0xef533ebc in AcquireLock () at fcgiapp.c:2012
#3  0xef534178 in FCGX_Accept (in=0xef545818, out=0xeffffc0c, err=0xeffffc08, 
    envp=0xeffffc04) at fcgiapp.c:2206
#4  0xef5346e0 in FCGI_Accept () at FCGI.xs:103
29952


--
Jonathan Roy - roy@idle.com -- Idle Communications, Inc.
Mail rhoefer@cdmag.com to advertise with the Games Domain! 
http://www.gamesdomain.com/ or http://www.gamesdomain.co.uk/