[Click] nsclick scheduling broken?!

Björn Lichtblau lichtbla at informatik.hu-berlin.de
Wed Oct 12 11:49:16 EDT 2011


Hi,

I just started playing around with click in combination with ns-3 and 
found two problems with scheduling and the simclick interface, using the 
latest git version.
I first suspected that this is an ns-3 problem (as ns-3 <-> click is 
quite new), but after checking this, and after a colleague reported 
similar problems with his ns-2 simulations, it looks like a click problem.
Maybe this problem has to do with the commits between
d0ee17ac365976472efe944a3f91cec8de20f7f0 (Timers: Be more careful about 
system time going backwards.)
and
2a9f63d0187e4739583160004bfdfd5a612ea77b (Timewarping: Fix it.)
There were a lot of changes regarding timers / scheduling.

Problem A:
===
RatedSource(\<AAAAAAAAAAAAAAAA>, RATE 2, LIMIT 6) ->
         Queue() ->
     EtherEncap(0xBBBB, 00:00:00:00:00:00, ff:ff:ff:ff:ff:ff) ->
     WifiEncap(0x00, 0:0:0:0:0:0) ->
     RadiotapEncap() ->
     Print("txing ", MAXLENGTH 50) ->
     Discard;
     //ToSimDevice(DEVNAME eth0);
===

In an ns-3 simulation RatedSource triggers only once and then never 
again. I dug into the click code and found that the problem is:
- when the first packet is created and sent/printed the element 
reschedules and simclick_sim_command(... SIMCLICK_SCHEDULE...) is 
properly called
- at the scheduled time the simulator calls simclick_click_run which 
ends in routerthread.cc/driver()
- the scheduled RatedSource timer makes a RatedSource task ready, but 
run_timers() happens after run_tasks(), so this task is never executed 
unless the simulator calls simclick_click_run again for some other 
reason (which is not the case in this simple example).
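
The ordering issue in the steps above can be reproduced with a toy model. This is only a sketch: ToyDriver and its members are hypothetical names and are not click's RouterThread; the model assumes one due timer whose callback marks one task ready.

```cpp
// Toy model only: ToyDriver is hypothetical and is not click's RouterThread.
struct ToyDriver {
    bool timer_due = true;    // one timer is due when the simulator calls in
    bool task_ready = false;  // the timer callback marks this task ready
    int tasks_run = 0;        // how many times the task actually ran

    void run_tasks() {
        if (task_ready) { task_ready = false; ++tasks_run; }
    }
    void run_timers() {
        if (timer_due) { timer_due = false; task_ready = true; }
    }

    // One simulator-triggered pass: tasks first, timers second, then return.
    void run_once() { run_tasks(); run_timers(); }

    // Variant: only return once no task is ready any more.
    void run_until_idle() {
        do { run_tasks(); run_timers(); } while (task_ready);
    }
};

// Task executions after a single pass (tasks, then timers, then exit).
int tasks_run_after_single_pass() {
    ToyDriver d;
    d.run_once();
    return d.tasks_run;
}

// Task executions when the driver loops until no task is ready.
int tasks_run_after_looping_pass() {
    ToyDriver d;
    d.run_until_idle();
    return d.tasks_run;
}
```

In this model the single pass leaves the task ready but unexecuted, while the looping variant runs it once before returning to the simulator.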

An easy fix for this on the simulator side was to just call 
simclick_click_run twice on each requested schedule. On the click side 
this change does it:

@@ -667,7 +693,9 @@ RouterThread::driver()
  #if CLICK_NS || BSD_NETISRSCHED
         // Everyone except the NS driver stays in driver() until the driver is
         // stopped.
-       break;
+       if(task_begin()->_next == this) break; // only exit if there is really no task ready now
+       //break;
  #endif

This is the condition run_tasks() also uses to determine if it can quit 
its loop. However, I'm not sure about BSD_NETISRSCHED or whether 
HAVE_TASK_HEAP needs to be obeyed here too, and this could produce 
problems with other elements?!


Problem B:
===
Script(label start, print "************** hello ***********", wait 1, 
goto start);
===
This script produces something like the "opposite" problem. When 
simclick_click_run is called for the first time after initializing the 
router (which happens at time 0 in the simulation: 0 secs, 0 usecs) 
click has a timer with expiry at 0, so click calls the simulation to 
schedule for that moment.

#if CLICK_NS
         // If there's another timer, tell the simulator to make us
         // run when it's due to go off.
         if (Timestamp next_expiry = timer_set().timer_expiry_steady()) {
             struct timeval nexttime = next_expiry.timeval();
             click_chatter("timer_set().timer_expiry_steady() = [%ld.%06ld]", nexttime.tv_sec, nexttime.tv_usec);
             simclick_sim_command(_master->simnode(), SIMCLICK_SCHEDULE, &nexttime);
         }
#endif

The simulation then immediately calls simclick_click_run again at time 0, 
however the timer sitting there is not executed. The check
if (th->expiry_s <= _timer_check) { (timerset.cc, in run_timers())
never applies, because expiry and current time are both 0. So click keeps 
scheduling this timer, which is never executed, again and again, producing 
an endless loop that stays at simulation time 0.

I think the problem is either
a) "normal userlevel click" expects the time to somehow move forward 
here, which won't happen in a sim environment
or
b) that click expects the time to never be (0 secs, 0 usecs) at all?

One dirty fix on the simulator side is to reschedule click a little bit 
later in those situations. It seems that this is really only needed at 
the start of a simulation, hinting at b), but I don't think this is a 
clean solution.
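
For illustration, the simulator-side workaround could look roughly like the sketch below. The helper adjust_schedule_time and its parameters are hypothetical and not part of ns-3 or click; times are assumed to be simulation microseconds, and the sketch bumps every request that is not in the future, not only those at time 0.

```cpp
#include <cstdint>

// Hypothetical helper, not part of ns-3 or click.
// All times are simulation time in microseconds.
// If click asks to be run at a time that is not in the future, push the
// request slightly forward so that simulation time can advance.
std::int64_t adjust_schedule_time(std::int64_t requested_us,
                                  std::int64_t now_us,
                                  std::int64_t min_delay_us = 1) {
    if (requested_us <= now_us)
        return now_us + min_delay_us;  // e.g. a request for "now" at time 0
    return requested_us;               // future requests pass through unchanged
}
```

The simulator glue would apply this to the time received with SIMCLICK_SCHEDULE before creating its own event. It only hides the symptom and does not explain why the timer at time 0 is not run.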

Btw, going back to the click-2.0 release fixes problem B, but not A, even 
if I apply my current fixes, which confuses me even more ;)

Regards, Björn



