"OS error: interrupted system call" while communicating with app

Helmut Oertel (oertelh@cs.tu-berlin.de)
Sun, 10 Aug 1997 18:05:02 +0200 (MET DST)

Date: Sun, 10 Aug 1997 18:05:02 +0200 (MET DST)
From: Helmut Oertel <oertelh@cs.tu-berlin.de>
To: fastcgi-developers@OpenMarket.com
Subject: "OS error: interrupted system call" while communicating with app 
Message-Id: <Pine.SOL.3.95.970810180443.5382A-100000@dub.cs.tu-berlin.de>


We are using FastCGI on an Apache Web Server (Version 1.21, 
FCGI Module Version 1.43, FCGI DeveloperKit Version 1.51). At the
moment we have approx. 300 000 pageviews and more than 600 000
FCGI-requests a day. 

Under high load, especially when the FCGI-daemons temporarily cannot
answer requests as fast as they are coming in, we are experiencing
communication problems between the FCGI module/application manager and
daemons. In the error log this shows up as follows:

[Sun Aug 10 00:05:58 1997] access to
[...]all-s-t.fcg failed for [...], reason: mod_fastcgi: OS error
'Interrupted system call' while communicating with app
[Sun Aug 10 00:06:18 1997] mod_fastcgi: AppClass
/query-fireball-s-t.fcg pid 2309 terminated by calling exit with
status = 73.

This happens _repeatedly_ (status 73 = pipe error).

At the moment we handle this situation by sending a
SIGHUP to the server, which is very unsatisfactory, especially under
high load, because it takes quite a while before the server comes up
to speed again. We tried to use Module version 2b2 but it proved to be
very unstable. The patch mentioned some time ago (alarm(0)) did not do
any good either.

Apart from that we are experiencing problems with parentless FCGI
daemons on restart of the server (at the moment we are handling this
using pid-files).

Most important:
1.) What is the reason for the communication problem as described
above? Is there a solution? If not, why not? Isn't OpenMarket
interested in the use of FastCGI for high performance web sites (this
problem seems to turn up on OpenMarket Webservers as well)?

2.) Is there a solution for the problem of parentless FCGI daemons?

3.) When will there be a stable version 2 of the FCGI Module?

4.) I have seen various patches and enhancements for the FCGI modules
and the developer kit on this mailing list - why aren't they
incorporated into the official releases on http://www.fastcgi.com? Why
is the Mail Archive at the same site still not up to date?

5.) Who is doing the (official) development of the FastCGI stuff at
OpenMarket? There should be a coordinated effort to remove
those bugs in the FCGI module. Perhaps some of the subscribers of this
mailing-list (including myself) would take part in this effort.

Helmut Oertel

      (O O)
Helmut Oertel				| Technical University Berlin
E-Mail: Helmut@Oertel.com		| FB 13 (CS) - FLP KIT - FR6-10
W3: http://www.cs.tu-berlin.de/~oertelh | Franklinstr. 28/29         _______ _
Tel. (FR 6537, TUB): +49.30.314-25980   | D-10589 Berlin, Germany   (_   _  | |
Tel. (private):      +49.30.304 33 52   |                             | | | | |
FAX  (private):      +49.30.305 90 89   |                             `-' `---'

---------- Forwarded message ----------
Date: Thu, 24 Jul 1997 14:23:39 -0400
From: Sonya Rikhtverchik <rikhtver@OpenMarket.com>
To: fastcgi-developers@OpenMarket.com
Subject: Fastcgi "OS error: interrupted system call" still happens

Message-Id: <199707241814.OAA27372@rio.atlantic.net>
Reply-To: chip@pobox.com
In-Reply-To: <199707241424.KAA00337@u4-138.openmarket.com> from "Sonya 
Rikhtverchik" at Jul 24, 97 10:24:11 am
Content-Type: text
From: Chip Salzenberg <chip@rio.atlantic.net>
Subject: Re: Fastcgi "OS error: interrupted system call" still happens
To: rikhtver@OpenMarket.com (Sonya Rikhtverchik)
Date: Thu, 24 Jul 1997 14:14:12 -0400 (EDT)

According to Sonya Rikhtverchik:
> I'm using Fastcgi to support the page bn.newspage.com, but about once
> a day we will see this error start on the server and it doesn't go
> away until you kick it.

Please forward this reply:

In the mod_fastcgi.c there is an alarm() system call, then a system
call that needs to time out (I forget what).  After that system call
there should be an alarm(0) call to cancel the alarm.  Try that.
-- 
Chip Salzenberg         - a.k.a. -           <chip@pobox.com>
 (as character touches fingertips while hypnotizing a girl)
      "Here is the church; here is the steeple;
       now open the door and go - to - sleeple."  // MST3K

------- End of Forwarded Message