[FASTCGI] Threaded C fcgiapp implementation problems and questions
Jonathan Gray
jgray at streamy.com
Wed Apr 22 16:40:24 EDT 2009
If I disable keep-alives in lighttpd, I no longer see lots of connections
in lighttpd status in the read state. That seems to have been tied to
keep-alives.
How does that work with fastcgi and fcgiapp? Is there anything I need to
do to be able to take advantage of keep-alives? The build-up of extra
connections in weird states leads me to believe that the keep-alives are
not working.
On Wed, April 22, 2009 12:54 pm, Jonathan Gray wrote:
> Hello,
>
>
> I have a multithreaded, C FastCGI script using the fcgiapp library
> running on top of lighttpd.
>
> I'm having a recurring problem on my production environment that crops up
> after a few days straight of load around 20-40 concurrent connections.
>
> This script is implementing something called COMET
> http://en.wikipedia.org/wiki/Comet_(programming)
>
>
> It's basically using AJAX/XHR requests to simulate pushing to the client.
> The user opens an AJAX request to the script and the server keeps it
> loading until a message comes in from the server (it connects to a central
> server which sends messages to clients), or until we time it out. On
> the wikipedia page, this is described as Ajax with long polling /
> XMLHttpRequest long polling.
>
>
> This has been working for a very long time but recently as load has been
> increasing we started to see a weird behavior.
>
> All of a sudden, lighttpd/mod_fastcgi will start to reject all new
> connections. The log shows this error:
>
> 2009-04-01 12:21:33: (mod_fastcgi.c.3005) got proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 25
> 2009-04-01 12:21:33: (mod_fastcgi.c.2494) unexpected end-of-file (perhaps
> the fastcgi process died): pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0
>
>
>
> The process is not dead; there are 24 other connections that are
> currently being properly handled. When these requests come in, the script
> does not see them at all (i.e. FCGX_Accept_r does not return).
>
> After all the existing connections have dropped, it will then continue
> normal operation and start to accept new connections:
>
> 2009-04-01 12:22:00: (mod_fastcgi.c.1515) released proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 2
> 2009-04-01 12:22:01: (mod_fastcgi.c.1515) released proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 1
> 2009-04-01 12:22:03: (mod_fastcgi.c.1515) released proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 0
>
>
> and then
>
> 2009-04-01 12:22:03: (mod_fastcgi.c.3005) got proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 1
>
>
> The same PID (the process never crashed) then does start to see new
> connections and things go for another few days without problems, then the
> same thing happens again.
>
>
> The design of my application differs from the example threaded
> application because I do not keep a thread per connection, rather I use
> queues, timers, hash tables, etc to track the state of sessions and their
> FCGX_Request.
>
>
> Since I can't just use a FCGX_Request per thread, as done in the example,
> I pre-instantiate a large array of FCGX_Requests of size
> MAX_ALLOC_REQUESTS. I then loop through this array, sliding down one
> index each time. This array is sufficiently large that I do not get
> anywhere close to reusing a request that was not FCGX_Finish_r'd already.
> (This is set to 25,000 right now; in benchmarking I'm trying to get over
> 10k. I am nowhere near this in production, where the bug happens.)
>
>
> Is this a sane approach? Could I be messing something up by allocating
> so many and calling FCGX_InitRequest on each?
>
> for(i=0;i<MAX_ALLOC_REQUESTS;i++) FCGX_InitRequest(&reqs[i],0,0);
>
>
> I am locking around the accept, so the accepting of connections is
> single-threaded:
>
>
> pthread_mutex_lock(&accept_mutex);
> rc = FCGX_Accept_r(&reqs[curreq]);
> nextreq = (curreq + 1) % MAX_ALLOC_REQUESTS;
> pthread_mutex_unlock(&accept_mutex);
>
>
> I have two potential threads that can close the connection. In all
> cases, the closing of the connection follows the form:
>
> FCGX_FPrintF(request->out,"...");
> FCGX_Finish_r(request);
>
>
>
> That request is still part of reqs[] and will be passed to FCGX_Accept_r
> again, much later.
>
> Again, is this right? I read that FCGX_Finish_r is thread-safe, so I'm
> not locking around that. There are potentially two threads running FPrintF
> and Finish_r simultaneously (but ALWAYS on different FCGX_Requests).
>
>
> I have increased all kinds of system limits like file/socket descriptor
> limits and memory limits. I have seen the requests "loop" through the big
> array of pre-allocated ones, and they are reused without a problem.
>
> One thing I'm also not sure about is how keep-alives and pipelining might
> interact with what I'm doing. On the lighttpd status page, I have
> sometimes noticed that connections sit in a 'read' state for some time
> after they are out of handle-req and the script has called Finish_r on
> them. The only pointer I've got w.r.t. that is that lighttpd might be
> trying to read another request from the client?
>
>
> I'd really appreciate any kind of help. I'm a bit stuck and in any case
> could use some best practices advice.
>
> Thanks.
>
>
> Jonathan Gray
> _______________________________________________
> FastCGI-developers mailing list
> FastCGI-developers at mailman.fastcgi.com
> http://mailman.pins.net/mailman/listinfo.cgi/fastcgi-developers
>
>
>
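The quoted message relies on the reqs[] array being large enough that a slot is never handed to FCGX_Accept_r before it has been finished. One way to make that guarantee explicit is to track which slots are still in flight. What follows is only a minimal sketch under assumptions: slot_pool, pool_init, pool_claim and pool_release are hypothetical names, POOL_SIZE is a tiny stand-in for MAX_ALLOC_REQUESTS, and none of this is part of fcgiapp.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4 /* stand-in for MAX_ALLOC_REQUESTS; small so wraparound is visible */

/* Hypothetical bookkeeping kept alongside the reqs[] array. */
struct slot_pool {
    pthread_mutex_t lock;
    int in_use[POOL_SIZE]; /* 1 from claim until release */
    size_t next;           /* index where the next search starts */
};

void pool_init(struct slot_pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    for (size_t i = 0; i < POOL_SIZE; i++)
        p->in_use[i] = 0;
    p->next = 0;
}

/* Claim the next free slot in ring order; return -1 if all are in flight. */
int pool_claim(struct slot_pool *p)
{
    int found = -1;
    pthread_mutex_lock(&p->lock);
    for (size_t n = 0; n < POOL_SIZE; n++) {
        size_t i = (p->next + n) % POOL_SIZE;
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            p->next = (i + 1) % POOL_SIZE;
            found = (int)i;
            break;
        }
    }
    pthread_mutex_unlock(&p->lock);
    return found;
}

/* Mark a slot free again; intended to run after the request is finished. */
void pool_release(struct slot_pool *p, int i)
{
    pthread_mutex_lock(&p->lock);
    assert(i >= 0 && i < POOL_SIZE && p->in_use[i]);
    p->in_use[i] = 0;
    pthread_mutex_unlock(&p->lock);
}
```

In the accept loop this would presumably mean calling pool_claim() to pick the index passed to FCGX_Accept_r, and pool_release() right after FCGX_Finish_r on that request. A return of -1 would then show up as an explicit "pool exhausted" condition instead of a silently reused request.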