Re: FastCGI / Apache problems

Mark Brown (mbrown@OpenMarket.com)
Mon, 26 Aug 1996 12:18:34 -0400

Message-Id: <199608261618.MAA18557@breckenridge.openmarket.com>
To: fastcgi-developers@OpenMarket.com
In-Reply-To: <199608261453.KAA06901@bill-graham.nfic.com> 


Bob Ramstad says:

    is ANYONE using FastCGI in a production environment?

I believe we have a number of customers using the OM-Secure WebServer
in production environments, particularly OM-Axcess customers.
OM-Transact is starting to exploit FastCGI, beginning with a
patch to the current (2.2) release; I don't know whether our
production commerce service is currently running this patch.

mod_fastcgi and Apache 1.1.1 have a number of known problems, but
most of what you report doesn't match them, so you may have found
some new bugs.  I'm certainly interested in getting to the bottom
of them.

    one of our sites was recently converted to use FastCGI with Apache
    1.1.1 as the server.  this site gets around 300,000 hits per week.
    for reference, our server averages around 3 requests per second
    over a complete day.  we have seen a lot of problems --- server
    errors, extreme delays when visiting the site --- which never
    occurred with a straight CGI program.  of course, when things are
    working well, performance is greatly improved.

    which pound the site repeatedly and we can definitely reproduce
    all of these problems.  some "evidence" which might help:

    * compilation on SunOS of libfcgi.a wasn't at all clean, many
    warnings.

I just did a build on SunOS 4.1.4 and also noticed lots of warnings
for missing function declarations.  On my system, I was unable to
locate a SunOS header file declaring getpeername or fclose, to pick two
random offenders.  (Can you find them on your system?)  These functions
are well standardized, so I doubt that these warnings are
symptomatic of a real problem.  But I could be wrong.
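For illustration, the missing declarations can be supplied by hand. Under pre-ANSI C rules, an undeclared function is assumed to return `int`. Both `fclose` and `getpeername` actually return `int`, which is one reason the warnings are probably harmless.

This is a minimal sketch. It assumes SunOS 4.x can be recognized by the compiler defining `sun` but not `__svr4__`. The header name `fcgi_sunos_compat.h` is hypothetical.

```c
/* fcgi_sunos_compat.h (hypothetical): prototypes that the SunOS 4.1.x
 * headers appear to lack. On other systems the block is skipped and
 * the system headers' own declarations are used. */
#ifndef FCGI_SUNOS_COMPAT_H
#define FCGI_SUNOS_COMPAT_H

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

#if defined(sun) && !defined(__svr4__)   /* assumed test for SunOS 4.x */
/* Both functions really return int, matching the implicit-int default,
 * so calls compiled without these declarations still behave correctly. */
extern int fclose(FILE *stream);
extern int getpeername(int s, struct sockaddr *name, int *namelen);
#endif

#endif /* FCGI_SUNOS_COMPAT_H */
```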

The kit builds cleanly on Solaris 2.5.

    * quite often we'll get a "FastCGI: /foo/bar/baz terminated due to
    signal" error.

Unfortunately mod_fastcgi doesn't extract the signal number.
Does the application give a core dump?
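The signal number is available in the status that `waitpid()` reports for the dead process, so the log message could include it. This minimal sketch uses the POSIX `WIFSIGNALED`, `WTERMSIG`, `WIFEXITED` and `WEXITSTATUS` macros.

The names `describe_exit`, `reap_app` and the `DESCRIBE_EXIT_DEMO` switch are hypothetical, not mod_fastcgi code. `WCOREDUMP` is a common extension rather than POSIX, hence the `#ifdef`. Where it exists, it answers the core dump question directly.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Format a waitpid() status for the error log. Returns the number of
 * the signal that killed the process, or 0 if it exited normally. */
int describe_exit(const char *app, int status, char *buf, size_t len)
{
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        const char *core = "";
#ifdef WCOREDUMP                        /* common extension, not POSIX */
        if (WCOREDUMP(status))
            core = " (core dumped)";
#endif
        snprintf(buf, len, "FastCGI: %s terminated due to signal %d%s",
                 app, sig, core);
        return sig;
    }
    snprintf(buf, len, "FastCGI: %s exited with status %d",
             app, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
    return 0;
}

/* Wait for one application process, retrying if interrupted, and
 * describe how it ended. Returns describe_exit()'s result, or -1. */
int reap_app(pid_t pid, const char *app, char *buf, size_t len)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r == -1)
        return -1;
    return describe_exit(app, status, buf, len);
}

#ifdef DESCRIBE_EXIT_DEMO
int main(void)
{
    char msg[128];
    pid_t pid = fork();

    if (pid == 0) {             /* child: stand-in for a crashing application */
        raise(SIGTERM);
        _exit(0);
    }
    if (pid > 0 && reap_app(pid, "/foo/bar/baz", msg, sizeof msg) >= 0)
        puts(msg);
    return 0;
}
#endif
```

Compiled with `-DDESCRIBE_EXIT_DEMO`, the demo prints a line naming the signal number that killed the child.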

    * occasionally we'll get a "mod_fastcgi.o: 1365 assert failed (len >
    0)" message.

This should certainly give you a core dump.  A stack trace with
variables would be a good start at understanding what's going on.
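Whether a core file actually appears depends on the process's `RLIMIT_CORE` limit, which is often zero by default. The file is also typically written to the process's current directory, which must be writable.

This minimal sketch assumes you can add a call early in startup; `enable_core_dumps` is a hypothetical name. It raises the soft limit to the hard limit with the standard `getrlimit`/`setrlimit` calls.

```c
#include <sys/resource.h>

/* Raise this process's core file size limit to the maximum allowed.
 * An unprivileged process may raise its soft limit up to the hard
 * limit. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl) == 0 ? 0 : -1;
}
```

Resource limits are inherited across `fork` and `exec`. Raising the limit in the shell that starts the server (`ulimit -c` in sh, `limit coredumpsize` in csh) should therefore have a similar effect without code changes. That covers both the applications and the Apache process that reported the failed assertion.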

    * the system seems to be running a strange number of processes ---
    more processes than specified in srm.conf.

mod_fastcgi forks off a process manager process for each AppClass.
It should fork off a single process manager process.  That's problem
5 in the list below.
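The intended behavior is one process manager, however many AppClass directives there are. It can be sketched as a fork guarded by a saved process ID.

This is a minimal sketch, not the mod_fastcgi code, and all names in it are hypothetical: `pm_pid`, `start_process_manager`, `stop_process_manager` and `placeholder_manager`. It assumes every call comes from the Apache parent process, so all callers see the same saved PID.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t pm_pid = -1;      /* PID of the one process manager, or -1 */

/* Stand-in main loop; a real manager would start and monitor the
 * FastCGI application processes for every AppClass. */
void placeholder_manager(void)
{
    for (;;)
        pause();
}

/* Fork the process manager on the first call only; later calls (for
 * example, one per AppClass) return the existing PID instead of forking
 * again. Returns the manager's PID, or -1 if fork() fails. */
pid_t start_process_manager(void (*pm_main)(void))
{
    pid_t pid;

    if (pm_pid > 0)
        return pm_pid;
    pid = fork();
    if (pid == 0) {            /* child: run the manager; it should not return */
        pm_main();
        _exit(1);
    }
    if (pid > 0)
        pm_pid = pid;
    return pid;
}

/* Terminate and reap the manager, e.g. when the server restarts. */
void stop_process_manager(void)
{
    if (pm_pid > 0) {
        kill(pm_pid, SIGTERM);
        waitpid(pm_pid, NULL, 0);
        pm_pid = -1;
    }
}
```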

mod_fastcgi certainly needs work.  Here is my list of
known problems:

  1) Module can busy-wait in FastCgiDoWork; work-around is to
     set a select timeout of approximately 1 second.

  2) Module violates Apache buff abstraction and therefore doesn't work
     with SSL.

  3) Module fails to chmod the Unix domain listening sockets it creates,
     so protections are set according to the current umask.

  4) Module creates Unix domain listening sockets in /tmp.
     This *at least* requires documentation so sys admins can avoid
     deleting them; better would be to give control over where the
     listening sockets get created.

  5) Module forks too many process manager processes.  When Apache
     parent runs as root, these process manager processes end up
     running as root too, a mistake.

  6) Response header parser requires a space following CGI
     headers (e.g. "Location:/x/y/z" fails but "Location: /x/y/z"
     succeeds).
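The work-around for problem 1 presumably amounts to giving `select()` a nonzero timeout. A loop waiting for input then sleeps in the kernel between checks instead of spinning.

This is a minimal sketch of that pattern, not the actual `FastCgiDoWork` code, which isn't shown here. The helpers `wait_for_input` and `read_with_timeout` are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_sec seconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_for_input(int fd, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    do {
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;   /* reset each pass: select() may modify tv */
        tv.tv_usec = 0;
        n = select(fd + 1, &readfds, NULL, NULL, &tv);
    } while (n == -1 && errno == EINTR);

    if (n > 0)
        return FD_ISSET(fd, &readfds) ? 1 : 0;
    return n;                      /* 0 on timeout, -1 on error */
}

/* Read up to len bytes, waiting at most timeout_sec seconds first.
 * Returns the byte count (0 at end of file), -1 on error, -2 on timeout. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_sec)
{
    int r = wait_for_input(fd, timeout_sec);

    if (r == 0)
        return -2;
    if (r < 0)
        return -1;
    return read(fd, buf, len);
}
```

In a work loop, call `wait_for_input(fd, 1)` and go back around on a 0 result. That gives the roughly one-second wakeup the work-around describes.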
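Problem 6 comes down to treating whitespace after the colon as optional, so "Location:/x/y/z" and "Location: /x/y/z" parse the same way. This is a minimal sketch of a tolerant split; `split_header` is a hypothetical helper, not the mod_fastcgi parser.

```c
#include <ctype.h>
#include <string.h>

/* Split a CGI response header line such as "Location: /x/y/z" into
 * name and value, modifying the line in place. Whitespace after the
 * colon is optional, so "Location:/x/y/z" is accepted too; trailing
 * CR/LF and blanks are trimmed from the value.
 * Returns 0 on success, -1 if there is no colon or the name is empty. */
int split_header(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':');
    char *end;

    if (colon == NULL || colon == line)
        return -1;
    *colon = '\0';
    *name = line;

    colon++;
    while (*colon == ' ' || *colon == '\t')   /* skip optional whitespace */
        colon++;
    *value = colon;

    end = colon + strlen(colon);
    while (end > colon && isspace((unsigned char) end[-1]))
        *--end = '\0';
    return 0;
}
```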

I have coded up fixes to problems 1-3, and am working on 5 now.
When I finish that and get it tested, there will be a new mod_fastcgi.
I'll just document 4 and 6 for now.

    --mark