[maemo-developers] Continued outages and lame updates

From: Jeff Moe moe at blagblagblag.org
Date: Mon Jan 4 22:22:20 EET 2010
On Monday 04 January 2010 15:36:17 Valerio Valerio wrote:
> Hi,
> 
> this email was sent to the council private list, but since there's nothing
> private here, and the email has some valuable points/suggestions, I'm
> replying here in the public mailing list.
> I did some censorship in the email in order to avoid flame wars, since
> there's some rants including names, if the author wants he can paste these
> parts here, IMO these parts are not valuable to the discussion, so should
>  be avoided :).

Well, if you send it here, I prefer you send the whole thing instead of saying 
I was ranting, flaming and including names. You probably shouldn't have sent it 
here without asking me first either  :(  It's not like I'm hard to get ahold 
of.

I mentioned GAN900 and no one else. Since it's now public, what I was 
referring to is logged here:
http://mg.pov.lt/maemo-bugs-irclog/%23maemo-bugs.2010-01-03.log.html
http://mg.pov.lt/maemo-irclog/%23maemo.2010-01-03.log.html


Here is the email in its entirety:

------------------------------------------------------------------------------------

 From: Jeff Moe <moe at blagblagblag.org>
 To: council at maemo.org
 Subj: Continued outages and lame updates
 
I don't need to tell you that the *.maemo.org infrastructure is very broken. 
Surely you have experienced frequent outages yourselves. I haven't experienced 
such poor service from any other distro that I can remember, big or small.

I am writing to ask that some sort of Standard Operating Procedure be 
developed for handling outages and server updates. I have brought up some of 
these points on the maemo-devel mailing list ( 
http://lists.maemo.org/pipermail/maemo-developers/2010-January/023329.html ):

==========================================
I think we should look to Fedora since they have a similar arrangement: 
"community" distribution with corporate overlord.

This is how they do it:

* IRC channel of admin issues: #fedora-admin and #fedora-noc where you can 
watch things "live".

* Standard Operating Procedure for outages:
https://fedoraproject.org/wiki/Outage_Infrastructure_SOP

* PAGER access, available to the public, where you can page one of 9 admins 
(a bit unbelievable, actually):
https://admin.fedoraproject.org/pager

* More people would know how the whole *.maemo.org infrastructure actually 
works if information about it were public. The joke is that it runs on an N700. 
But people can make this joke because the actual server setup is known by 
only a few. Compare that to this dream:
https://fedoraproject.org/wiki/Category:Infrastructure_SOPs

Anyway, they are doing things far better and I don't see people griping about 
outages over there much at all.

What's the procedure for Maemo? Dive into #maemo-devel and hope someone knows 
WTF is up? Their answer is usually "wait for x-fade". Post to talk.m.o.? Hit 
reload on qaiku? Post a comment there? Add more here?
https://bugs.maemo.org/show_bug.cgi?id=5818

Surprisingly I was told by an @nokian that reporting to that bug was the 
correct place to report outages (!).


Anyway, there are organizations all around the world that run servers 24/7/365 
with minimal outages that have more than 2 admins with access. That is 
obvious. The maemo infrastructure is nowhere near approaching 99% uptime (let 
alone .999s). A mere "40" submitted builds is also a loss of time and momentum 
of many developers...
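
To put rough numbers on those uptime targets, here is a minimal sketch in 
Python. The helper name allowed_downtime_hours is hypothetical, the year is 
assumed to be 365 days, and ".999" is read as 99.9% availability:

```python
# Assumption: a non-leap year of 365 days.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in the period at a given availability (0 to 1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    # Compare the two targets mentioned above.
    for target in (0.99, 0.999):
        hours = allowed_downtime_hours(target)
        print(f"{target:.1%} uptime allows about {hours:.1f} hours down per year")
```

Even the 99% target leaves room for only a few days of total outage per year, 
so it is a fairly modest bar for a build and repository infrastructure.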

==========================================

After further discussion (not in the thread) with Nigel Jones from Fedora, I 
learned that there are 18 people who can fix Fedora buildserver issues. They use 
"git" and "puppet" to track configuration changes. None of this is magic or unique. 
These are standard system administration practices which should be followed by 
Maemo.

Also, I have suggested using mirrors of content, which is what every other 
distro that I know of, big or small, community or corporate, does. For instance, 
see:

http://www.debian.org/mirror/list
http://www.novell.com/products/opensuse/downloads/ftp/int_mirrors.html
http://mirrors.fedoraproject.org/publiclist
http://api.mandriva.com/mirrors/list.php
http://www.ubuntu.com/getubuntu/downloadmirrors 
http://www.gentoo.org/main/en/mirrors2.xml
Even tiny distros like puppy linux have mirrors: 
http://www.puppylinux.com/download/

Here is a thread I started on maemo-devel about mirrors:
http://lists.maemo.org/pipermail/maemo-developers/2010-January/023363.html

I think the sysadmins should provide more info about what they are doing, what 
is planned, when the outages are coming, etc. This is in line with common 
practices. For instance, they could:

* Send emails to maemo-devel (or whatever list), advising of forthcoming 
outages.

* Send updates when they are aware of an outage (e.g. "the builder is known 
down, we're working on it")

* Send updates when work has been completed ("the new garage server is up")

Right now there are occasional "tweets" to qaiku, but this falls far short. If 
you need examples of this done correctly, let me know and I'll send you some.

STFU and just sit and wait is not a good answer, but I've been hearing that 
for a month now (including from a former Community Council member, 
generalantilles [GAN900], who even threatened to boot me from IRC!). I have 
wasted much of the last month just waiting for things to work correctly. These 
failures are killing the momentum, initiative, and goodwill of your developer 
community. Not only that, it is dividing them....

-Jeff
http://wiki.maemo.org/User:Jebba
http://maemo.org/profile/view/jebba/

P.S. http://maemo.org/community/council/ should probably have the 
council at maemo.org address on it and a list of current members.