[maemo-developers] Autobuilder apt-get problems => failing builds

From: Jeff Moe moe at blagblagblag.org
Date: Fri Jan 1 17:00:35 EET 2010
On Friday 01 January 2010 11:37:24 you wrote:
> 2010/1/1 Jeff Moe <moe at blagblagblag.org>:
> > On Friday 01 January 2010 11:12:47 Ed Bartosh wrote:
> >> 2010/1/1 Jeremiah Foster <jeremiah at jeremiahfoster.com>:
> >> > On Jan 1, 2010, at 15:00, Ed Bartosh wrote:
> >> >> 2010/1/1 Andrew Flegg <andrew at bleb.org>:
> >> >>> Hi,
> >> >>>
> >> >>> Attempting to upload a new version of vim to the Fremantle
> >> >>> auto-builder, I get the following failure:
> >> >>>
> >> >>>
> >> >>> https://garage.maemo.org/builder/fremantle/vim_7.2-0maemo6/armel.root.log.FAILED.txt
> >> >>
> >> >> Should be fixed now:
> >> >> https://garage.maemo.org/builder/fremantle/vim_7.2-0maemo6/
> >> >> Thanks for pointing this out.
> >> >>
> >> >> Somebody's changed sbdmock configuration on the build host. I don't
> >> >> know why, because it was working just fine before.
> >> >>
> >> >> This reminds me of aphorisms like 'Too many cooks spoil the broth' or
> >> >> 'Many commanders sink the ship'. I like the Russian one better: 'Seven
> >> >> babysitters have a child without an eye'. Autobuilder reminds me of that
> >> >> child sometimes.
> >> >
> >> > Funny, I was thinking the exact opposite. If more people had access,
> >> > then more than one person could fix it.
> >>
> >> I doubt that. What I can see is that more people can break it.
> >> If you need proof - give everyone root access to that box :)
> >
> > Starting with me.  :)
> >
> > Though I must say things seem down an awful lot and we just sit around
> > waiting for someone to fix it. How does Fedora do it? I imagine they have
> > a number of people with access.
> 
> OK, let's look at this particular case. Autobuilder was broken for
> about 15 hours. There were about 40 packages uploaded and failed
> during that time. Before that change builder was working fine.
> 
> I really doubt that it would be better to have dozens of people to
> break this than 1-2 to fix it. Even if those who can break it could fix
> it.

Well, you'd have a better case if things worked more reliably.

> Anyway it was not working for a long time and people became confused.
> And now I'm restarting all those 40 builds manually.

I think we should look to Fedora since they have a similar arrangement: 
a "community" distribution with a corporate overlord.

This is how they do it:

* IRC channels for admin issues: #fedora-admin and #fedora-noc, where you can 
watch things "live".

* Standard Operating Procedure for outages:
https://fedoraproject.org/wiki/Outage_Infrastructure_SOP

* *PAGER* access, available to the public, where you can page one of 9 admins 
(a bit unbelievable, actually):
https://admin.fedoraproject.org/pager

* More people would know how the whole *.maemo.org infrastructure actually 
worked if information about it were public. The joke is that it runs on an N700. 
But people can make this joke because the actual server setup is known by 
only a few. Compare that to this dream:
https://fedoraproject.org/wiki/Category:Infrastructure_SOPs

Anyway, they are doing things far better and I don't see people griping about 
outages over there much at all.

What's the procedure for Maemo? Dive into #maemo-devel and hope someone knows 
WTF is up? Their answer is usually "wait for x-fade". Post to talk.m.o.? Hit 
reload on qaiku? Post a comment there? Add more here?
https://bugs.maemo.org/show_bug.cgi?id=5818

Surprisingly, I was told by an @nokian that that bug *was* the 
correct place to report outages (!).


Anyway, there are organizations all around the world that run servers 24/7/365 
with minimal outages that have more than 2 admins with access. That is 
obvious. The maemo infrastructure is nowhere near approaching 99% uptime (let 
alone .999s). A mere "40" submitted builds is also a loss of time and momentum 
for many developers...

-Jeff