[PL #2829] RE: [PL #2937] Re: rmdir freezing system
Marc E. Fiuczynski via RT
devel at planet-lab.org
Fri Nov 5 13:17:10 EST 2004
Email Recipients (see http://www.planet-lab.org/Support)
Requestor: frankeh at watson.ibm.com, mlhuang at cs.princeton.edu
Ticket Ccs: mef at cs.princeton.edu, mlhuang at cs.princeton.edu, nagar at watson.ibm.com, sekharan at us.ibm.com
==================================================
What is far more critical is a fix to the memory controller. That is something we need ASAP.
Marc
> -----Original Message-----
> From: devel-community-bounces at planet-lab.org
> [mailto:devel-community-bounces at planet-lab.org]On Behalf Of Marc E.
> Fiuczynski via RT
> Sent: Friday, November 05, 2004 12:47 PM
> To: frankeh at watson.ibm.com; mlhuang at CS.Princeton.EDU
> Subject: RE: [PL #2829] RE: [PL #2937] Re: rmdir freezing system
>
> We can survive without this fix for now. In fact, we are already
> deploying our first release candidate.
>
> > -----Original Message-----
> > From: frankeh at watson.ibm.com via RT [mailto:devel at planet-lab.org]
> > Sent: Friday, November 05, 2004 12:36 PM
> > To: mlhuang at CS.Princeton.EDU
> > Subject: Re: [PL #2829] RE: [PL #2937] Re: rmdir freezing system
> >
> > Marc E. Fiuczynski via RT wrote:
> > >
> > >>>However, I am still seeing a freeze for our
> > >>>"/rcfs/taskclass/system" class, which is our
> > >>>default class. When I rmdir this one, I get an
> > >>>"infinite" stack trace dumped to the console.
> > >>
> > >>OK, that must be related to a different problem.
> > >
> > >
> > > Unfortunately, it is nontrivial for me to attach a serial console to
> > > the box, so I cannot get you a stack trace for it today. Instead, I
> > > tried debugging the same problem by running the same kernel as a guest
> > > on the qemu pc emulator. It appears that rq_get_next_queue() in
> > > rq_get_next_task() returns a queue data structure whose array
> > > (queue->array) is NULL. This causes all hell to break loose and the
> > > system goes into the telltale infinite stack dump.
> >
> > OK, that is already more than we can ask for ...
> > Put a BUG_ON(queue->array == NULL) and stop the kernel right
> > then and there when it happens.
> >
> > I have an idea what is happening; actually, I am certain I know.
> > It has to do with our optimizations on class dequeueing.
> > The local class runqueue is still enqueued even though it no longer
> > has any task running.
> >
> > I am working with Haoqiang today on the share accuracy and weight
> > adjustment problem. I should be able to get to this particular problem
> > tomorrow evening and fix it. Can you survive until then?
> >
> > >
> > > So maybe it is possible for the queue data structure to hang around
> > > for a while and not be properly released.
> > > I'll try to track this down further by using gdb to walk around
> > > in the kernel running on the qemu pc emulator.
> > >
> > >
> > >>This is definitely a fix that needs to go in.
> > >
> > >
> > > Glad that the fix for the other bug definitely needs to go into your code.
> > >
> > >
> > >>I have not upgraded our x206 with your YUM information, since
> > >>I can break this with your kernel (but not ours).
> > >
> > >
> > > While you are waiting for me, please do the full upgrade. This way you
> > > can play with the exact problems I see, and thereby also harden your
> > > own code for other distro releases. ;)
> > >
> > > Cheers,
> > > Marc
> > >
> > >
> > >
> > >
> >
>
>
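For reference, the check suggested in the quoted reply (a BUG_ON on queue->array at the point where the next queue is picked) can be sketched in user space roughly as below. This is only an illustration under stated assumptions: the struct layouts, the BUG_ON stand-in macro, and the helper names queue_is_stale() and get_array_checked() are hypothetical and are not taken from the actual scheduler code discussed in this thread. Only the condition queue->array == NULL comes from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real queue structures are not shown in
 * this thread, so only the 'array' member mentioned there is modeled. */
struct prio_array {
    int nr_active;               /* assumed field, for illustration only */
};

struct class_queue {
    struct prio_array *array;    /* NULL is the bad state seen under qemu */
};

/* User-space stand-in for the kernel's BUG_ON(): report where the
 * condition fired and stop immediately instead of continuing. */
#define BUG_ON(cond)                                              \
    do {                                                          \
        if (cond) {                                               \
            fprintf(stderr, "BUG at %s:%d: %s\n",                 \
                    __FILE__, __LINE__, #cond);                   \
            abort();                                              \
        }                                                         \
    } while (0)

/* Non-fatal predicate: returns 1 if the queue has no array attached. */
static int queue_is_stale(const struct class_queue *queue)
{
    return queue->array == NULL;
}

/* Fatal variant: stop right where the bad queue is first seen, so the
 * failure points at the cause rather than at a later crash. */
static struct prio_array *get_array_checked(struct class_queue *queue)
{
    BUG_ON(queue->array == NULL);
    return queue->array;
}
```

The point of stopping at the check is that the first report names the bad queue directly, rather than leaving it to be inferred from an endless stack dump. In a real kernel the macro would be the kernel's own BUG_ON, and the check would sit wherever the selected queue is first dereferenced.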