FastCGI process manager for the Apache web server.
What is the process manager?
The process manager is an
executable program that is responsible for controlling the execution of
the FastCGI applications. It does so by starting the applications,
restarting them if they terminate for any reason, and deciding when a
FastCGI application should no longer run, as in the case of dynamically
started FastCGI applications.
Why is it needed?
The FastCGI protocol depends on persistent applications that run for as
long as the web server does and handle incoming HTTP requests. In most
cases these persistent applications must be started before the web server
forwards any requests to them (dynamically started FastCGI applications
are the exception). Should an application die during processing, it
should be restarted immediately so that it can handle the next request.
It is the job of the process manager to maintain this control over the
execution of the FastCGI applications.
What happens when the web server starts?
When the web server reads its configuration files and encounters a
directive that a certain module may be interested in, it calls the
corresponding function in that module to process the directive. In the
FastCGI module, mod_fastcgi, these directives (AppClass and
ExternalAppClass) specify which FastCGI applications the process manager
should start and restart in order to handle incoming requests. After the
web server has finished reading its configuration files, a process
manager process is launched, which in turn starts all preconfigured
FastCGI applications so that they can begin accepting requests. The
process manager then enters an infinite loop, which it leaves only when
the web server notifies it of termination. At that point the process
manager terminates all child processes (the FastCGI applications), frees
any resources it still holds, such as memory and file descriptors, and
then exits.
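The lifecycle described above (start everything, watch the children, clean
up on shutdown) can be sketched in Python. This is only an illustration
under stated assumptions, not the mod_fastcgi implementation: the class
name, the restart counter, the polling interval and the injectable spawn
function are all hypothetical, and the real process manager is notified by
the web server rather than by a flag.

```python
import subprocess
import time


class ProcessManager:
    """Toy process manager: start apps, restart them on exit, stop on request."""

    def __init__(self, commands, spawn=subprocess.Popen):
        # commands: one argv list per preconfigured application (hypothetical
        # stand-in for what the AppClass directives would supply).
        self.commands = commands
        self.spawn = spawn              # injectable so the sketch can be tested
        self.children = {}              # index into commands -> process object
        self.restarts = 0
        self.shutdown_requested = False

    def start_all(self):
        for index, argv in enumerate(self.commands):
            self.children[index] = self.spawn(argv)

    def poll_once(self):
        # Restart every child that has exited since the last check.
        for index, child in list(self.children.items()):
            if child.poll() is not None:
                self.children[index] = self.spawn(self.commands[index])
                self.restarts += 1

    def shutdown(self):
        # Terminate every child and reap it so nothing is left behind.
        for child in self.children.values():
            if child.poll() is None:
                child.terminate()
            child.wait()
        self.children.clear()

    def run(self, poll_interval=0.5):
        # Main loop: runs until something sets shutdown_requested.
        self.start_all()
        while not self.shutdown_requested:
            self.poll_once()
            time.sleep(poll_interval)
        self.shutdown()
```

Polling is used here only to keep the sketch short; a manager written in C
would more likely wait for child-exit notifications from the operating
system.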
How does the process manager interact with dynamically started FastCGI
applications?
Some of the process manager functionality has been changed to accommodate
dynamically created FastCGI processes, i.e. processes that do not need to
be configured using the AppClass directive but are started on their first
invocation, just like CGI. Since the process manager has no prior
information about the name of the FastCGI application executable that
needs to be created, it has to obtain that information from somewhere.
Only the web server knows the name of the FastCGI executable, which it
takes from the HTTP request line, so it writes that information into a
file and signals the process manager to create the given FastCGI
application. It is important to note that locking is needed to keep
multiple web server processes from writing into the file at the same
time, which would cause unpredictable behavior.
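A minimal sketch of the locked file hand-off, assuming a Unix-like system
and advisory locks taken with flock(). The one-name-per-line file layout
and both function names are hypothetical, and the step that signals the
process manager is left out.

```python
import fcntl
import os


def append_start_request(path, executable):
    """Web server side: append one start request under an exclusive lock."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)   # blocks while another process holds the lock
        try:
            handle.write(executable + "\n")
            handle.flush()
            os.fsync(handle.fileno())
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def drain_start_requests(path):
    """Process manager side: read and clear all pending requests under the same lock."""
    with open(path, "a+") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            handle.seek(0)
            names = [line for line in handle.read().splitlines() if line]
            handle.truncate(0)               # the requests have been consumed
            return names
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Because the reader takes the same lock as the writers, a request can never
be half-written at the moment the file is read and truncated.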
Once the process manager obtains the name of the FastCGI executable from
the file, it creates the application and treats it just like any other
preconfigured FastCGI application. An important exception is that
dynamically started applications will NOT be restarted, either because
they may be faulty and we do not want them to run, or because the process
manager has decided that they should be terminated and so should not
attempt to restart them (see below). The process manager creates only
one copy of the given FastCGI application, as it is not aware of how
"popular" this application is or will be. Therefore, it needs to perform
some data analysis to decide whether another copy of the application
should be started, whether the application has too many instances running
and one of them should be terminated, or whether all of them should be
terminated. The data to be analyzed comes from the web server processes
via the same file mechanism.
Since there are many web server processes and only one process manager,
the data processing is done within the process manager process. This
also means that the process manager implements a policy by deciding which
FastCGI applications should have another instance created and which
should have instances terminated. The decisions on termination of the
FastCGI applications are made using the following heuristics (top-level
design):
-
for the last time period over which the data analysis was done (its length
is specified via the -updateInterval option to FCGIConfig), calculate, for
each dynamically started FastCGI application, how much of that time the
application spent handling requests (this information comes from the web
server processes, which know the start and end times of each request to
the FastCGI application).
-
since the last data analysis period the web server might have seen a spike
of incoming requests, or a dead interval after a number of periods of great
activity. As such, the decision on whether to terminate a FastCGI
application should not be based solely on the time interval elapsed since
the last data analysis was performed. Therefore, an exponential decay was
introduced to smooth out possible spikes in web server load.
The formula for calculating the "new popularity" is:
-
result = (1-gain)*old_value + gain*new_value; old_value = new_value;
-
where old_value is the load of the FastCGI application calculated
during the previous data analysis periods, new_value is the currently
calculated load factor, and gain is a value in the range 0..1 specified
via the -gainValue option. As one can see, a gain close to 1 puts more
emphasis on new data (which is useful if you have a very large
updateInterval or if you would like your FastCGI processes to be
responsive, so that if many requests come in, many copies of the FastCGI
application are started immediately, and if none come in, many copies are
terminated). A gain closer to 0 puts more emphasis on old data,
smoothing out the starting and termination of the FastCGI processes.
-
once the new "normalized" load factor is calculated using the above formula,
it is compared with the parameters specified via the -singleThreshhold and
-multiThreshhold options to the FCGIConfig directive. If the given
FastCGI application has multiple instances running and the "normalized"
load factor has fallen below -multiThreshhold, a single instance of that
application is marked for termination. If the application has only
a single instance running and the "normalized" load factor has fallen below
-singleThreshhold, then that only instance is also marked for termination.
Since we would like at least a single copy of the application to keep
running (even if it is not very popular), the singleThreshhold parameter
should be much lower than multiThreshhold.
-
the process manager also does not want to kill off all FastCGI processes
during long periods of inactivity (say, at night), since the cost
of starting a process may be high. For that case, the -minProcesses
parameter tells the process manager NOT to mark any more processes for
termination once this minimum is reached.
-
the actual termination of the dynamically started FastCGI processes
takes place later on and involves some synchronization and locking
to keep an application that is about to be terminated from receiving and
processing a request. The actual termination is governed by the
-killInterval and -processSlack options to the FCGIConfig directive.
In the normal case, the process manager performs its killing policy
(terminating the FastCGI applications that were marked as victims
during the data analysis stage) every n seconds, where n is the number
given to the -killInterval option. However, there may be times when the
web server processes are very busy servicing a lot of FastCGI requests.
The process manager enforces a ceiling on the total number of dynamically
started processes that can be running at any one time. So, if the web
server asks to start a number of FastCGI processes, but starting them
would exceed the ceiling, the process manager needs to carry out the
killing policy immediately, i.e.
-
if (#of running processes + processSlack > #max processes) do_killing_policy_now();
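The heuristics in the list above can be sketched in Python. This is a
minimal sketch under stated assumptions, not the mod_fastcgi code: all
function and parameter names are hypothetical, the load is assumed to be
the fraction of the update interval spent handling requests, the smoothed
result is assumed to be what is carried into the next period as old_value
(the usual form of exponential smoothing), and the -minProcesses floor is
assumed to count all dynamically started processes together.

```python
def load_factor(busy_seconds, update_interval):
    """Share of the analysis period the application spent handling requests."""
    if update_interval <= 0:
        raise ValueError("update_interval must be positive")
    return busy_seconds / update_interval


def smooth_load(old_value, new_value, gain):
    """Exponentially smoothed load; gain in 0..1 weights the newest sample."""
    if not 0.0 <= gain <= 1.0:
        raise ValueError("gain must be between 0 and 1")
    return (1.0 - gain) * old_value + gain * new_value


def should_mark_victim(load, instances, total_running,
                       single_threshold, multi_threshold, min_processes):
    """Decide whether one instance of an application is marked for termination."""
    if total_running <= min_processes:
        return False                     # floor reached: mark nothing more
    if instances > 1:
        return load < multi_threshold    # several copies: use the higher bar
    if instances == 1:
        return load < single_threshold   # last copy: use the much lower bar
    return False


def must_kill_now(running, process_slack, max_processes):
    """Ceiling check: carry out the killing policy early when near the limit."""
    return running + process_slack > max_processes
```

As a worked example, with a gain of 0.5, a previous load of 0.2 and a
newly measured load of 0.8, smooth_load returns 0.5: the spike moves the
estimate only halfway. With a gain of 1 the same call returns 0.8, and
with a gain of 0 it returns 0.2.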
The starting of dynamically started FastCGI processes is also governed by
a few options to the FCGIConfig directive. After the web server receives
a request for a FastCGI application, it first determines whether that
application has already been started. If it has not, the web server
sends a request to the process manager, asking it to start the given
FastCGI application. After that, the web server attempts to connect to
the FastCGI application via the connect() system call. The attempt may
fail, since the process manager needs time to process the data and start
the corresponding FastCGI application. To allow for that contingency,
the -startDelay option specifies how long the web server keeps trying
to connect to the application. If the connection is still unsuccessful,
the web server issues another request to start the FastCGI application
(this is done in case instances of the FastCGI application are running
but all of them may be busy servicing requests, or a killing policy is
being carried out, or something else is going on). The web server keeps
repeating its attempts until the number of seconds specified via the
-appConnTimeout parameter has elapsed. Reaching that timeout is usually
an indicator of either a bad configuration (low values for killInterval
together with very high load, a low value for multiThreshhold, etc.) or a
bad FastCGI script that the process manager is unable to start.
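The retry behaviour on the web server side can be sketched as follows.
This is an illustration under assumptions, not the module's code:
try_connect and request_start are hypothetical callables standing in for
the connect() attempt and the request sent to the process manager, and the
poll interval between connection attempts is invented for the sketch.

```python
import time


def connect_with_retry(try_connect, request_start, start_delay, app_conn_timeout,
                       clock=time.monotonic, sleep=time.sleep, poll=0.05):
    """Return a connection, or raise TimeoutError after app_conn_timeout seconds.

    try_connect() returns a connection object, or None if the attempt failed.
    request_start() asks the process manager to start an instance.
    """
    deadline = clock() + app_conn_timeout
    connection = try_connect()           # the application may already be running
    while connection is None:
        if clock() >= deadline:
            raise TimeoutError("FastCGI application did not accept a connection")
        request_start()                  # ask for a (further) instance
        retry_until = min(clock() + start_delay, deadline)
        # Keep trying to connect for up to start_delay seconds.
        while connection is None and clock() < retry_until:
            sleep(poll)
            connection = try_connect()
    return connection
```

The clock and sleep parameters exist only so that the loop can be
exercised without real waiting; in normal use the defaults apply.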
What does the future hold for the process manager?
Future enhancements to the process manager include complete separation
of the process manager functionality from the web server core and a
web-server-independent mechanism for informing the process manager of
important events, such as start and termination of the web server,
configuration file parsing, etc. This would also aid in porting the
process manager functionality to other operating systems. Finally, an
overall redesign may be needed to optimize the management of the FastCGI
applications, possibly using OS-dependent facilities.
$Stanley Gambarin <stanleyg@cs.bu.edu>
1997/09/27 12:23:23 PM PST$