[Click] [PATCH] usermode Router::run_selects() and POLLERR|POLLHUP

Mon Dec 8 02:37:09 EST 2003

Eddie, here's another little tweak to run_selects()....

poll() returns not just POLLIN|POLLOUT, but POLLERR|POLLHUP too,
plus some detail bits which are never asserted without the above bits,
and are more system dependant.

Current code delivers events to read/write element handlers only on
POLLIN|POLLOUT, so one of the ERR|HUP events will cause continuous
firing of poll().

I'd originally noticed this problem when playing with connection-oriented
elements, where a transport reset/shutdown event is possible.
Recently misapplied a click configuration which mentioned a NIC that wasn't
there, and got tight loop on poll(), continually delivering POLLERR on the
FromDevice fd, but never calling FromDevice's registered read handler.

Applied the attached patch, which I had been exploring for the earlier case,
and the 100% cpu usage vanished, allowing the same click config to serve
one-NIC or two-NIC configurations without change.

Calling handlers on these events should cause read()/write(), potentially
clearing the error status.  This could often simply be by read/write reporting
the error, thus the underlying service, be it driver/raw_socket_iface/pcap,
can consider the event delivered.
In other cases it may be necessary for the event handlers to take special
action, but in most cases this would only be in response to an error reported
by usual operations (like read()), so adding no code to the error-less path.

Adding this patch will allow elements to handle exception events, but whether
they handle them correctly is varies by element and/or platform.
Without it no exception events can be handled, always leading to tight loop.

I had considered passing an extra parameter to the elements' handlers, giving
them the poll->revent bits, but it's not necessary for the two contexts in 
which I've needed this, and I doubt it ever is:
The spirit of poll() is to convey state change, not to pass state information,
so there should be nothing present in revents which a subsequent read() or
write() or recvfrom() or ... would not also reveal.
Conversely, there should be no event indication by poll() which is always
ignored, because only a file-op on that descriptor (be it read/write/close/..)
can change the state, preventing the condition from firing the next poll()
immediately.

Yep, the above is pedantic.
No, the attached patch doesn't follow it, it assumes IN|OUT|ERR|HUP are all
the primary events.  Not sure what the standards say, and how relevent they
are, but on Linux at least it seems POLLPRI is another primary event, usually
asserted with IN|OUT, but in some cases (eg 2.4 wan support) asserted alone.

Similar changes could be made to the select() logic in the !HAVE_POLL_H case,
listenign for exception events there too, and calling read/write handlers,
but I don't have a poll-less platform to test it on.


Thanx for some great work.

^..^ peter swain
(oo) swine at pobox.com
-------------- next part --------------
sd diff -u -- ./click/lib/router.cc

--- /tmp/sd15580.x	Sat Dec  6 16:12:36 2003
+++ ./click/lib/router.cc	Sat Dec  6 16:11:18 2003
@@ -1512,7 +1512,7 @@
     perror("poll");
   else if (n > 0) {
     for (struct pollfd *p = _pollfds.begin(); p < _pollfds.end(); p++)
-      if (p->revents & (POLLIN | POLLOUT)) {
+      if (p->revents) {
 	int pi = p - _pollfds.begin();
 
 	// Beware: calling 'selected()' might call remove_select(), causing
@@ -1520,9 +1520,11 @@
 	// out.
 
 	int fd = p->fd;
-	int read_elt = (p->revents & POLLIN ? _read_poll_elements[pi] : -1);
-	int write_elt = (p->revents & POLLOUT ? _write_poll_elements[pi] : -1);
-
+	int read_elt = (p->revents & (POLLIN|POLLERR|POLLHUP)
+		? _read_poll_elements[pi] : -1);
+	int write_elt = (p->revents & (POLLOUT|POLLERR|POLLHUP)
+		? _write_poll_elements[pi] : -1);
+	
 	if (read_elt >= 0)
 	  _elements[read_elt]->selected(fd);
 	if (write_elt >= 0 && write_elt != read_elt)