Automatic archiving
Closed, Declined (Public)

Description

Author: brian

Description:
On talk pages, as well as an [edit] link, there should be
an [archive] link that will take you to a list of existing
archives for that talk page that you can add that section
to. This should significantly reduce the amount of time
spent refactoring talk pages.


Version: unspecified
Severity: enhancement

Details

Reference
bz1843

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 8:18 PM
bzimport set Reference to bz1843.
bzimport added a subscriber: Unknown Object (MLST).

davidkernow wrote:

I very much second this idea - thanks Brian!

How would you define archives in a machine-understandable way?

brian wrote:

(In reply to comment #2)

How would you define archives in a machine-understandable way?

I don't understand this.

As the computer, I don't necessarily know what "existing archives for that talk
page" are. Can you explain how one would go about looking for them in a regular,
mechanistic way, given a talk page?

How would they work? What would they look like? How are they organized?

brian wrote:

This is not my area of expertise; however, others might
have ideas. I was thinking that the users could define
the archives, or the computer could create a new page
to be an archive, since it seems like most talk pages
don't have archives.

davidkernow wrote:

First thoughts: When the software detects that a talk page exceeds a certain
size (say 48kb) it automatically creates a new archive webpage and, working from
== Heading == to == Heading ==, determines how many posts it would need to move
from the top of the page to the new archive page such that the talk page drops
below (say) 16kb in size.

Example:

Talk:Blah is detected to be 50kb. No archives previously created.
Software creates Talk:Blah/Archive 1
Software determines first to fourth post by == Heading == to be (say) 30kb,
   but first to fifth post to be 43kb. 50kb less 30kb not < 16kb, but 50kb less
   43kb < 16kb.
Software therefore moves first to fifth posts from Talk:Blah to
   Talk:Blah/Archive 1
Talk:Blah now 7kb, Talk:Blah/Archive 1 is 43kb.
The next time Talk:Blah > 48kb, software detects Talk:Blah/Archive 1 already
   created, so next new archive would be /Archive 2.
...etc.
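
The size-threshold scheme in the example above could be sketched roughly as
follows. This is only an illustration in Python, not MediaWiki code. The
function names (split_sections, plan_archive, next_archive_title), the two
constants, and the assumption that every post starts at a level-2
== Heading == line are all hypothetical.

```python
import re

HIGH_WATER = 48 * 1024  # assumed size (bytes) that triggers archiving
LOW_WATER = 16 * 1024   # assumed size the talk page should drop below

# A level-2 heading line such as "== Heading ==" (not "=== Sub ===").
HEADING = re.compile(r"^==[^=].*?==[ \t]*$", re.MULTILINE)


def split_sections(text):
    """Split wikitext into (lead, sections); each section starts at a heading."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts:
        return text, []
    bounds = starts + [len(text)]
    sections = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    return text[:starts[0]], sections


def plan_archive(text, high=HIGH_WATER, low=LOW_WATER):
    """Return (remaining_text, archived_text) following the threshold rule."""
    size = len(text.encode("utf-8"))
    if size <= high:
        return text, ""  # page is still small enough; nothing to do
    lead, sections = split_sections(text)
    moved = []
    # Move the oldest (topmost) posts until the page drops below the low mark.
    while sections and size >= low:
        oldest = sections.pop(0)
        moved.append(oldest)
        size -= len(oldest.encode("utf-8"))
    return lead + "".join(sections), "".join(moved)


def next_archive_title(talk_title, existing_titles):
    """Pick the first unused 'Talk:Blah/Archive N' style name."""
    n = 1
    while f"{talk_title}/Archive {n}" in existing_titles:
        n += 1
    return f"{talk_title}/Archive {n}"
```

With the figures from the example (four posts totalling 30kb, a fifth of 13kb,
a sixth of 7kb), plan_archive moves the first five posts and leaves 7kb behind.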

The archive names could be more sophisticated, e.g. the software detects
earliest and most recent months included in (say) signatures and then names
archive using those months to give some idea of timespan covered (e.g. /Archive
1 - Jun 2005 to Oct 2005).
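
A rough sketch of the date-based naming idea, again in Python and purely
illustrative. It assumes signatures carry timestamps shaped like
"14:05, 3 June 2005 (UTC)" with English month names; the function name
archive_label is hypothetical.

```python
import re
from datetime import datetime

# Assumed signature timestamp shape, e.g. "14:05, 3 June 2005 (UTC)".
STAMP = re.compile(r"\d{2}:\d{2}, (\d{1,2}) ([A-Z][a-z]+) (\d{4}) \(UTC\)")


def archive_label(number, archived_text):
    """Build a label like 'Archive 1 - Jun 2005 to Oct 2005' from signature dates."""
    dates = []
    for day, month, year in STAMP.findall(archived_text):
        try:
            dates.append(datetime.strptime(f"{day} {month} {year}", "%d %B %Y"))
        except ValueError:
            continue  # not a real month name or date; skip it
    if not dates:
        return f"Archive {number}"  # no usable timestamps: plain numbered name
    first, last = min(dates), max(dates)
    fmt = "%b %Y"
    if (first.year, first.month) == (last.year, last.month):
        return f"Archive {number} - {first.strftime(fmt)}"
    return f"Archive {number} - {first.strftime(fmt)} to {last.strftime(fmt)}"
```

Falling back to a plain "Archive N" when no timestamp is found keeps the name
predictable for pages with unsigned posts.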

Yes, this would probably mean the archiving process would need to be taken out
of the hands of (standard) users so that the archive-naming process etc would be
kept consistent by the software (and therefore remain useable by the software).

...Well, something like that. I'd offer to code it myself, except I don't know
mainstream languages. I certainly think something is possible and would benefit
Wikimedia if implemented, if only to standardise the creation and handling of
archive material.

brian wrote:

If you were to do that, 32kb might be better, since I
think that is the threshold for displaying a warning on
the edit page that some browsers can't handle it.

However, I just thought of an even better idea: Instead
of arbitrarily grouping topics together into "archive"
pages (are they really archives if the topics are still
active?) or under specific talk pages (some sections
are posted multiple times; this creates other
problems), store each topic separately and group them
into pages for viewing.

In most cases, it should be sufficient to view the
contents page and then either view one topic or add a
new one.

However, the page may have to be refactored, in which
case it might be good to group certain topics together.

There should be an easy way of finding discussions you
participated in that have not been resolved.

For those who are just looking for questions to answer,
there should be an easy way of doing just that.

davidkernow wrote:

I suggested limits such as 16kb and 48kb to give the software some bandwidth to
work in, otherwise with a particularly active talk page it might start trying to
create archive pages too quickly in succession. Then again, that might not be
much of a problem.

I see your thinking re storing each topic (which I take means 'thread under a ==
Heading ==') separately but wonder if this might overly multiply the number of
pages and/or parameters to be handled by a wiki... This would also need someone
in the know to comment.

Re using dates and deciding whether or not topics are still active: Yes, this
would be heuristic, but, in addition to using posts' dates (not in signatures, I
realise, but the date recorded when submitted or last edited) a sufficiently
sophisticated routine should be able to retain topics that, although begun
sometime previously, have seen recent addition or editing. It wouldn't be
foolproof, but I reckon the trade-off between the software overzealously or
underwhelmingly archiving material should probably be acceptable. Again, this
really needs a developer to evaluate and comment on.

Thanks for your interest - I hope something along the lines we're discussing can
be implemented.

rowan.collins wrote:

Hm, I'm not sure I like the idea of the software *automatically* archiving
discussions; as with so many other things, I think we should think more in terms
of tools that *assist* a user in doing this. My main reason for this is that,
although they sometimes seem that way, current discussion pages are *not*
threaded forums, they are a free-form page, and the sections may be arranged in
all sorts of ways, and subject to rearranging, refactoring, splitting, merging,
etc. There's also no straight-forward way - within the current setup - for the
software to determine when a section was last edited; it could be computed from
the history of the page as a whole, but that would involve analysing a lot of
diffs...

A much simpler, and more flexible, feature would be a way of selecting (tick-box
style) one or more sections of a page, and telling the software to append them
to a given page. So, a user would view Talk:Blah and click an "archive" tab (or
perhaps a more general label?); they would then tick the boxes for the
discussions which seem to have "concluded", and supply a pagename (such as
Talk:Blah/Archive1) and those sections would be appended to that page. If the
page didn't exist, it could of course be automatically created first.
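
The tick-box variant might look something like the sketch below. It is a
minimal Python illustration, not an actual MediaWiki feature: archive_selected
is a hypothetical name, pages are modelled as plain strings, and a section is
assumed to run from one level-2 == Heading == to the next.

```python
import re

# A level-2 heading line such as "== Heading ==" (not "=== Sub ===").
HEADING = re.compile(r"^==[^=].*?==[ \t]*$", re.MULTILINE)


def archive_selected(talk_text, archive_text, selected_titles):
    """Move sections whose heading title is in selected_titles to the archive.

    archive_text may be "" to model an archive page that does not exist yet.
    Returns (new_talk_text, new_archive_text).
    """
    starts = [m.start() for m in HEADING.finditer(talk_text)]
    if not starts:
        return talk_text, archive_text
    bounds = starts + [len(talk_text)]
    kept = [talk_text[:starts[0]]]  # text before the first heading stays put
    moved = []
    for a, b in zip(bounds, bounds[1:]):
        section = talk_text[a:b]
        title = section.splitlines()[0].strip().strip("=").strip()
        if title in selected_titles:
            moved.append(section)
        else:
            kept.append(section)
    if not moved:
        return talk_text, archive_text
    # Append to the archive, making sure the old content ends with a newline.
    if archive_text and not archive_text.endswith("\n"):
        archive_text += "\n"
    return "".join(kept), archive_text + "".join(moved)
```

Because the user picks the sections, the software never has to guess whether a
discussion has "concluded".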

Meanwhile, there's a lot of discussion on the wikitech-l mailing list right now
about the more fundamental problems of discussion pages. See
http://www.mediawiki.org/wiki/Communication for details of how to access it.

davidkernow wrote:

Thanks for your comments, Rowan; your idea of adding an "archive" tab followed
by checklist is much better and probably far easier to code. The routine could
also add a link to the (newly-created) archive page at the top of the talk page,
perhaps with a TOC-style list of the discussions (i.e. headings) in the
archive - but, unlike a TOC, with a default state of hidden (so that clicking
on a 'show' link beside it would display the list).

I now second Rowan's idea - and if any developers are (still) following this
thread, would appreciate some idea if it plus my TOC-style idea stand a chance
of being incorporated.

Thanks also, Rowan, for the pointer to the wikitech-l discussion. I certainly
believe adding something like your "archive" tab along with the opportunity to
standardise how and where archive page names are created and placed would improve
the maintenance of talk pages considerably. Would you say I need to sign onto
the list and add this comment?

brian wrote:

We seem to like our current "free-form" talk pages; yet
we don't use them - we use mailing lists that are much
more difficult to work with than the talk pages, even
when using a newsreader.

robchur wrote:

Talk pages are for discussing particular articles. The mailing list is for
discussing particular issues on the wikis.

brian wrote:

Why can't talk pages be used for discussing "issues" on
the wikis as well (they are already used for discussing
some of them)?

robchur wrote:

Some of them are. Some of the Wikimedia wikis have specific pages and talk pages
devoted to discussing policies, ideas, guidelines, best practices, etc. -
examples springing to mind include the Village Pump on the English Wikipedia;
the Water Cooler on Wikinews, etc.

At the development end of things, it's not up to us to dictate how the
individual projects use the functionality; that's their job.

Automatic archiving would require a daemon, which is not very good. I like the
variant with the button (or a tab) for archiving, although a separate subpage
for every talk page is pretty good too (I imagine the main talk page to be the
table of contents here).

brian wrote:

(In reply to comment #14)

Some of them are. Some of the Wikimedia wikis have
specific pages and talk pages devoted to discussing
policies, ideas, guidelines, best practices, etc. -
examples springing to mind include the Village Pump on
the English Wikipedia; the Water Cooler on Wikinews,
etc. At the development end of things, it's not up to
us to dictate how the individual projects use the
functionality; that's their job.

What is to stop us from using freeform talk pages too?

Of course it's not up to us to dictate how individual
projects use their pages.

brian wrote:

(In reply to comment #15)

Automatic archiving would require a daemon, which is
not very good. I like the variant with the button (or a
tab) for archiving, although a separate subpage for
every talk page is pretty good too (I imagine the main
talk page to be the table of contents here).

Why would automatic archiving require a daemon?

There has to be a process that will run occasionally (or all the time) in the
background which will determine whether a page needs archiving or not. And
daemons aren't really good for web applications.

brian wrote:

The page only needs to be checked:

  • once, when the automatic process is introduced,
  • once, when the automatic process is updated, or
  • when the page itself is updated.

davidkernow wrote:

I suggest we turn from thinking of -automatic- processes (which would place
further demands on software and its speed) to something like the 'archive' tab
idea above. Talk pages where each thread involves votes (e.g. VfD-type pages)
could also place [archive] links beside the [edit] links so visitors could
easily archive votes that have been completed. Yes, perhaps more open to abuse,
but as easily revertable as usual.

Are any developers (still) reading this and care to comment?

brian wrote:

This idea was discussed in comments 9 and 10, and if we
insist on manual processes, it sounds good.

However, it would be nice to have "archive" links on
*all* sections in *all* talk pages, so I can decide
that a section seems to have "concluded" and archive it
*immediately*.

ssanbeg wrote:

There should be ways of simplifying the archive without using a daemon.

One example, if we had the labeled section transclusion feature described in
[[wikisource:project:Labeled section transclusion]] and bug #5881, then you
could mark closed discussions like:

<section begin=closed>

some topic

this is resolved
<section end=closed>

So that you can easily find the closed discussions between the markers.

Then, you could archive by creating a new page like [[talk:page/archive 1]] with
the contents:
{{subst:talk:page|include=closed}}

Which would substitute only the closed discussions into the archive.
Conversely, you could replace the contents of talk:page with
{{subst:talk:page|exclude=closed}}

Which would grab the contents of the current page, minus the closed discussions.

Although that was written for something completely different, it works pretty
well there.
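
To make the include/exclude behaviour concrete, here is a small Python sketch
of what the two substitutions would produce. It only imitates the idea on plain
strings and is not the extension's implementation. The function names are
hypothetical, and the marker syntax is taken as written above (a trailing "/"
in the tag is also tolerated).

```python
import re


def _marker(kind, label):
    # Matches e.g. <section begin=closed>; a trailing "/" is tolerated.
    return rf"<section\s+{kind}={re.escape(label)}\s*/?>"


def _span(label):
    # One marked span: begin marker, content (captured), end marker.
    return re.compile(
        _marker("begin", label) + r"(.*?)" + _marker("end", label), re.DOTALL
    )


def include_sections(text, label):
    """Return only the text between begin/end markers for label, concatenated."""
    return "".join(m.group(1) for m in _span(label).finditer(text))


def exclude_sections(text, label):
    """Return text with every marked span (markers included) removed."""
    return _span(label).sub("", text)
```

Here include_sections(page, "closed") would give the archive content, and
exclude_sections(page, "closed") the trimmed talk page.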

Other alternatives would be to mark the begin/end of closed discussions with a
template, and have a bot check the size, and move closed discussions to a new
archive; or write a custom extension with markers for the begin/end of each
closed discussion; where the extension would check the page size/number of
closed discussions, etc, and add a job to the job queue to archive the page if
needed. Of course, that assumes they would allow making visible edits from the
job queue.

brian wrote:

We scan through every page every time it's saved anyway, to check for banned links;
how hard would it be to check for whatever marker we used for closed discussions and
move them to the archive immediately, when the page is saved?
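
A check-on-save approach could be sketched as below. This is a hedged Python
illustration only: save_page is a hypothetical stand-in for whatever hook runs
at save time, the page store is a plain dict, the archive name is fixed for
simplicity, and the closed-discussion marker is assumed to be the
<section begin=closed> ... <section end=closed> pair suggested earlier.

```python
import re

# Assumed marker pair for closed discussions; a trailing "/" is tolerated.
CLOSED = re.compile(
    r"<section\s+begin=closed\s*/?>(.*?)<section\s+end=closed\s*/?>", re.DOTALL
)


def save_page(pages, title, new_text, archive_suffix="/Archive 1"):
    """Store new_text under title, moving closed-marked spans to an archive page.

    pages is a plain dict standing in for the wiki's page store.
    """
    closed = [m.group(1).strip("\n") for m in CLOSED.finditer(new_text)]
    if closed:
        archive_title = title + archive_suffix
        existing = pages.get(archive_title, "")
        if existing and not existing.endswith("\n"):
            existing += "\n"
        # Append the closed discussions to the archive (created if missing).
        pages[archive_title] = existing + "\n".join(closed) + "\n"
        # Strip the marked spans from the text that is actually saved.
        new_text = CLOSED.sub("", new_text)
    pages[title] = new_text
    return pages
```

The scan happens once per save, so no background process is needed; the cost
is one extra regular-expression pass over the submitted text.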

ayg wrote:

This will presumably be automatic when LiquidThreads is finally finished, so I
doubt anyone will waste their time putting in the effort only for it to become
completely obsolete in a year or whatever. And yes, this would require a fair
bit of effort, which could be better invested elsewhere. Bots work fine for now.

brian wrote:

(In reply to comment #24)

This will presumably be automatic when LiquidThreads is finally finished, so I
doubt anyone will waste their time putting in the effort only for it to become
completely obsolete in a year or whatever. And yes, this would require a fair
bit of effort, which could be better invested elsewhere. Bots work fine for now.

(bug 1234, [[Meta:LiquidThreads]])

*** Bug 23814 has been marked as a duplicate of this bug. ***