
Add all public lists to Gmane: provide .mbox lists archives
Closed, ResolvedPublic

Description

A number of our public mailing lists are not on Gmane ([[m:Mailing_lists/Overview/Gmane]]). Gmane is very useful for searching archives, thread visualization, etc.; nobody among our list admins should have anything against it if posting from Gmane itself is not allowed and email addresses are encrypted. If they have problems (very unlikely) they can request removal of the list from Gmane.
I would need the ok of someone from the WMF (Cary?) or a sysadmin so that Gmane owner can add them (I've asked him and he's ok with it).

To import archives we need the .mbox archive, which is located at lists.wikimedia.org/mailman/private/<listname>.mbox/<listname>.mbox and is usually non private (try foundation-l – warning, 128 MB or such), but is private for all those lists.
If you don't want to make them non-private, please put them in some temporary place where Gmane owner can download them and mail me the list of URLs (I'll make the requests with the web form).
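For preparing the list of URLs, the path pattern above can be expanded per list. This is only a sketch: the function name `mbox_url` is made up for illustration, the `https` scheme is assumed, and whether a given URL is actually downloadable without logging in is not checked here.

```python
from urllib.parse import quote

# Host and path pattern as described in this report; the scheme is an assumption.
BASE = "https://lists.wikimedia.org/mailman/private"


def mbox_url(listname):
    """Return the expected location of the complete mbox archive for a list."""
    name = quote(listname, safe="")
    return f"{BASE}/{name}.mbox/{name}.mbox"


if __name__ == "__main__":
    for listname in ["foundation-l", "wikitech-l"]:
        print(mbox_url(listname))
```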


Version: unspecified
Severity: enhancement

Details

Reference
bz25105

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:14 PM
bzimport set Reference to bz25105.
bzimport added a subscriber: Unknown Object (MLST).

Mailman automatically makes files in mbox format available. They are split by month and compressed, e.g.

http://lists.wikimedia.org/pipermail/mediawiki-l/2010-December.txt.gz

Can't you just use those?

Gmane owner wants the complete mbox archives.
Merging the gzipped texts would be a huge waste of time and I'm not sure the results would be good, because those files are often weird.

Merging the gzipped texts is something easily scripted up and it could
be done without corrupting the final mbox. If you want this done,
I suggest you do that instead of trying to get something that hasn't
been provided for 6 months.

Closing as FIXED.
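
A rough sketch of what such a merge script could look like, assuming the monthly files have already been downloaded and follow the `YYYY-Month.txt.gz` naming seen in the example URL above. The function names are hypothetical, and this only concatenates the decompressed files in chronological order; it does not verify that the result matches the original mbox, and the monthly text files may differ from it in details.

```python
import gzip
import re
from pathlib import Path

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Assumed file name pattern, e.g. 2010-December.txt.gz
NAME_RE = re.compile(r"^(\d{4})-([A-Za-z]+)\.txt\.gz$")


def archive_sort_key(path):
    """Return (year, month index) so archives sort chronologically."""
    match = NAME_RE.match(Path(path).name)
    if match is None or match.group(2) not in MONTHS:
        raise ValueError(f"unexpected archive name: {path}")
    return int(match.group(1)), MONTHS.index(match.group(2))


def merge_monthly_archives(paths, out_path):
    """Concatenate gzipped monthly archives into one file, oldest first."""
    ordered = sorted(paths, key=archive_sort_key)
    with open(out_path, "wb") as out:
        for path in ordered:
            with gzip.open(path, "rb") as src:
                data = src.read()
            out.write(data)
            # Make sure the next file starts on a fresh line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return [Path(p).name for p in ordered]
```

It could be run as, for example, `merge_monthly_archives(sorted(Path("mediawiki-l").glob("*.txt.gz")), "mediawiki-l.mbox")`, where the directory name is only a placeholder.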

(In reply to comment #3)

Merging the gzipped texts is something easily scripted up and it could
be done without corrupting the final mbox. If you want this done,
I suggest you do that instead of trying to get something that hasn't
been provided for 6 months.

Closing as FIXED.

Re-opening.

This doesn't seem to be fixed at all. No idea where that resolution came from.

As far as I'm aware, Wikimedia lists are internally unsearchable (the subject of another bug). That is, resources such as Gmane are essential to make the mailing lists useful to people who are unable to search their own personal collections of the list. If resolving this bug would make lists searchable (both going backward and forward), it would be worth the minimal sysadmin effort (symlinking these to a public directory for a few minutes). Gmane has functionality that's critically important, so accommodating them seems reasonable. (And it seems rather unreasonable to expect anyone to download and re-upload a lot of individual files, esp. when there's a complete record available.)

This needs further discussion and consideration.

Since the mbox files are available to subscribers to the lists, Daniel Zahn asks:

So you want the mbox files just one single time to import them as
described on http://gmane.org/import.php but not on a regular basis?

Since subscribing the lists to Gmane means having a user/login for a
mailing list user, couldn't you (or Lars, the Gmane guy) just use that one to
login at the authentication screen in order to get the mbox file ?

I've asked him to follow up here, but it seems like this would be the thing to do.

i seem to always get the "Private Archives Authentication" screen when trying to directly access mbox files with those URLs, e.g.:

https://lists.wikimedia.org/mailman/private/wikitech-l.mbox/wikitech-l.mbox
https://lists.wikimedia.org/mailman/private/foundation-l.mbox/foundation-l.mbox

That means, _also_ on those that are "private = 0" in our config and/or you have listed as being on Gmane already.

After logging in with my user/pass for that specific mailing list i do get the mbox files.

Did i get you right though, that you say you can (or could) access _some_ mbox files without any login?
Or do you say you are always logging in with some user/pass but you still just get mbox files for _some_ of them?

As to adding lists to Gmane, they are just subscribed via (http://gmane.org/subscribe.php) right?
I agree the "encryption/obfuscating mail addresses" option and the "Unidirectional (no posting allowed through Gmane)" options should definitely be switched on.

So you want the mbox files just one single time to import them as described on http://gmane.org/import.php but not on a regular basis?

Since subscribing the lists to Gmane means having a user/login for a mailing list user, couldn't you (or Lars, the Gmane guy) just use that one to login at the authentication screen in order to get the mbox file ?

(In reply to comment #7)

That means, _also_ on those that are "private = 0" in our config and/or you
have listed as being on Gmane already.

After logging in with my user/pass for that specific mailing list i do get the
mbox files.

It was a year ago but if I remember correctly I wasn't able to download the mbox of some lists I was subscribed to.

Did i get you right though, that you say you can (or could) access _some_ mbox
files without any login?

Yes, I'm sure that I just downloaded the foundation-l one with wget without any authentication (I also wondered what the /private meant, then) and that some months later I noticed I couldn't any longer.

Or do you say you are always logging in with some user/pass but you still just
get mbox files for _some_ of them?

As to adding lists to Gmane, they are just subscribed via
(http://gmane.org/subscribe.php) right?

Yes. I can do the paperwork.

I agree the "encryption/obfuscating mail addresses" option and the
"Unidirectional (no posting allowed through Gmane)" options should definitely
be switched on.

Several of our lists (e.g. wikitech-l) actually allow subscribers to post via Gmane, but it's so complex and buggy that it's not worth the effort anyway.

So you want the mbox files just one single time to import them as described on
http://gmane.org/import.php but not on a regular basis?

Since subscribing the lists to Gmane means having a user/login for a mailing
list user, couldn't you (or Lars, the Gmane guy) just use that one to login at
the authentication screen in order to get the mbox file ?

Yes, a single time would be enough, but he wants them to be on some public webserver he can just wget; moreover, if they're not in the standard (auto-updated) directory, good coordination would be needed so that he can import the mbox and then start receiving the new messages without losing anything in between.
Subscribing to some dozens of mailing lists and then logging in via wget (if possible) to get the mbox seems way more complex than needed and, as said above, doesn't seem to work reliably anyway; if those mboxes can't be made permanently public as they've been (at least some of them) for years, perhaps the better thing to do would be to make them public temporarily when he's ready for the import...

Regarding "if those mboxes can't be made permanently public as they've been (at least some of them) for years" i am still unsure about the answer and would like to somehow bring this up in a larger discussion. There could have been good reasons for the changes you noticed over time ("months later I noticed I couldn't any longer"), but i just can't tell.

(In reply to comment #9)

Regarding "if those mboxes can't be made permanently public as they've been (at
least some of them) for years" i am still unsure about the answer and would
like to somehow bring this up in a larger discussion.

Did you get this discussed? It seems like something you could bring up at the semi-weekly Ops meeting.

I've requested the addition to Gmane of those mailing lists I listed plus some more; it may take some time now. I think there are still some more lists to be added which were not on my list; I'll check later.

Created attachment 9615
Excerpts from the Gmane subscription results replies

Should be done now: all lists have been created (120 have been added to Gmane) and all archives should have been imported in the last few days (I checked only a couple of them). I notified all the lists and list owners.
