
Undeletion of particular revisions.
Closed, ResolvedPublic

Description

Author: tietew-mediazilla

Description:
I created a patch for undeletion of particular revisions.
Special:Undelete shows a checkbox per revision,
and undeletes only the checked revisions.

We sysops of ja.Wikipedia need "deletion of particular revisions" very much.
But Special:Import has not been released yet.
I believe this patch is a simple and reliable way to resolve this problem.


Version: 1.3.x
Severity: enhancement

Details

Reference
bz507

Event Timeline

bzimport raised the priority of this task to High. Nov 21 2014, 7:07 PM
bzimport set Reference to bz507.
bzimport added a subscriber: Unknown Object (MLST).

tietew-mediazilla wrote:

Patch for undeletion of particular revisions.

attachment SpecialUndelete.diff ignored as obsolete

A couple notes on the patch:

  • There's no validation on the timestamps, and they're not escaped. This could allow SQL injection attacks.
  • The timestamp conversion functions aren't used so it probably won't work on PostgreSQL.

Looks otherwise generally OK; can you update to current CVS and repost please?
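
The two review notes above can be illustrated with a minimal sketch. It is written in Python with `sqlite3` rather than in MediaWiki's PHP, and it assumes that timestamps are 14-digit `YYYYMMDDHHMMSS` strings and that the archive table has `ar_title`, `ar_timestamp` and `ar_text` columns; the helper names `valid_timestamps` and `select_archived` are hypothetical and are not taken from the patch.

```python
import re
import sqlite3

# Assumption: a timestamp is exactly 14 ASCII digits (YYYYMMDDHHMMSS).
TIMESTAMP_RE = re.compile(r"[0-9]{14}")


def valid_timestamps(raw_values):
    """Keep only the submitted values that look like 14-digit timestamps."""
    return [value for value in raw_values if TIMESTAMP_RE.fullmatch(value)]


def select_archived(conn, title, raw_timestamps):
    """Fetch archived rows for the checked timestamps using bound parameters."""
    timestamps = valid_timestamps(raw_timestamps)
    if not timestamps:
        return []
    # Only "?" placeholders are interpolated into the SQL string; the values
    # themselves are passed separately, so they cannot alter the query.
    placeholders = ",".join("?" for _ in timestamps)
    sql = (
        "SELECT ar_timestamp, ar_text FROM archive "
        f"WHERE ar_title = ? AND ar_timestamp IN ({placeholders}) "
        "ORDER BY ar_timestamp"
    )
    return conn.execute(sql, [title, *timestamps]).fetchall()
```

Validation and parameter binding are independent defenses here: either one alone would stop a crafted timestamp from reaching the query as SQL, and using both keeps the check close to the form handling.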

Suiwiki wrote:

Yeah! If we can use this, things on ja.wp will settle down a lot.
This will make the ja.wp people happy!!

(In reply to comment #3)

Yeah! If we can use this, things on ja.wp will settle down a lot.
This will make the ja.wp people happy!!

Testing some more... ひらがな

Suiwiki wrote:

This is a test post.

Sorry for the comment spam. ;) Testing again: カタカナ [firefox]

Suiwiki wrote:

Posting a comment from Mac OS X Safari. I wonder if this is readable?

tietew-mediazilla wrote:

Patch for undeletion of particular revisions. (HEAD)

A new patch for CVS HEAD with timestamp validation and SQL sanitizing.
In addition, this patch will restore all revisions
when no revisions are checked for backward compatibility.
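
A minimal sketch of the backward-compatible selection rule described above, in Python rather than PHP. The function name `revisions_to_restore` and its arguments are hypothetical and only illustrate the rule; they are not taken from the patch.

```python
def revisions_to_restore(archived_timestamps, checked_timestamps):
    """Return the archived timestamps that should be undeleted.

    If no checkbox was ticked, fall back to the old behaviour and
    restore every archived revision.
    """
    checked = set(checked_timestamps)
    if not checked:
        return list(archived_timestamps)
    # Preserve the archive order and ignore values that are not archived.
    return [ts for ts in archived_timestamps if ts in checked]
```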

The timestamp conversion functions are not used yet,
because SpecialUndelete.php seems not to use them.

I'll make a patch for REL1_3 again later.

attachment SpecialUndelete.php.diff ignored as obsolete

tietew-mediazilla wrote:

No notification mails...

tietew-mediazilla wrote:

Patch for undeletion of particular revisions. (HEAD)

Update: fixed a small bug.

attachment SpecialUndelete.php.diff ignored as obsolete

tietew-mediazilla wrote:

Patch for undeletion of particular revisions. (REL1_3)

for REL1_3

attachment SpecialUndelete.php.diff ignored as obsolete

Looks generally functional, but there's a serious problem: it doesn't handle
compressed revisions ($wgCompressRevisions) properly.

If the article is not currently present (so a new 'cur' entry has to be
created), and you don't restore the last revision, the raw old_text of the most
recent restored revision is inserted into cur_text. If that revision was
compressed, we see binary gibberish instead of the expected text.

With the old code this wasn't a problem since the most recent revision to be
restored would always have come from cur in the first place and was thus
uncompressed.

It'll be necessary to use Article::getRevisionText() on the entry to be placed
into cur, breaking up the INSERT...SELECT into two queries.
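
The fix described above can be sketched as two steps: read the newest restored row, then expand its text before writing it to cur. The sketch is in Python rather than PHP. It assumes that rows flagged `gzip` hold a raw deflate stream and that the text is UTF-8; `get_revision_text` is a hypothetical stand-in for Article::getRevisionText(), not its actual implementation.

```python
import zlib


def get_revision_text(raw_text: bytes, flags: str) -> str:
    """Return readable text, inflating the row first if it is flagged 'gzip'."""
    if "gzip" in flags.split(","):
        # Assumption: the stored blob is a raw deflate stream (no header).
        raw_text = zlib.decompress(raw_text, -zlib.MAX_WBITS)
    return raw_text.decode("utf-8")


def build_cur_text(restored_rows):
    """Pick the newest restored row and return its expanded text.

    restored_rows is a list of (timestamp, raw_text, flags) tuples, i.e. the
    result of the first query (a SELECT). The second query would then insert
    the returned string, instead of using a single INSERT...SELECT that
    copies the raw bytes.
    """
    newest = max(restored_rows, key=lambda row: row[0])
    return get_revision_text(newest[1], newest[2])
```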

tietew-mediazilla wrote:

Patch for undeletion of particular revisions. (HEAD)

Update: now compatible with $wgCompressRevisions.

I verified that this correctly undeletes revisions with ar_flags='gzip'.

If this patch is OK, I will create a new patch for REL1_3.

Attached:

Committed patch to CVS HEAD; needs testing on PostgreSQL.

tietew-mediazilla wrote:

Patch for undeletion of particular revisions. (REL1_3)

Backport to REL1_3.

Attached:

leercontainer-bugzilla wrote:

If this feature is on by default I fear it may result in widespread history
"corruption", and histories not compatible with GFDL. Ideally all revisions
should be undeleted or at least listed, but those you'd want to omit could be
hidden or the content deleted. At the very least some log should show that
revisions were not undeleted (and which ones).

nao-yuki wrote:

(In reply to comment #16)

Ideally all revisions should be undeleted or at least listed, but
those you'd want to omit could be hidden or the content deleted.

I think hiding particular versions may cause a problem: diffs could still
show content that has copyright or other problems.

wiki_tomos wrote:

(In reply to comment #16)

If this feature is on by default I fear it may result in widespread history
"corruption", and histories not compatible with GFDL.

This feature is certainly not a panacea. For certain pages, this feature should
not be applied. But it solves quite a few problems. The problem with GFDL
compliance could be solved simply by following a certain procedure like this:

  1. undelete the whole thing
  2. revert to the last version before the copyvio is inserted.
  3. delete.
  4. undelete the latest version, and the versions before the copyvio is inserted.

By doing this, we can legitimately omit the history info. in between.

Of course, if history info. can be preserved while the main text is hidden, that
would be more useful. But it is quite helpful as it is now.
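
Step 4 of the procedure above amounts to a selection over the deleted revisions. A minimal Python sketch follows, assuming that revisions are identified by sortable timestamp strings and that the latest revision is the clean revert made in step 2; the function name `revisions_to_undelete` and the `first_copyvio` argument are hypothetical.

```python
def revisions_to_undelete(timestamps, first_copyvio):
    """Select the revisions to restore in step 4.

    timestamps: timestamps of all deleted revisions of the page.
    first_copyvio: timestamp of the revision that introduced the copyvio.
    Assumption: the latest revision is the clean revert from step 2.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return []
    # Everything older than the first infringing revision is safe to keep.
    keep = [ts for ts in ordered if ts < first_copyvio]
    latest = ordered[-1]
    if latest >= first_copyvio:
        # The revert carries clean text, so it is restored as well.
        keep.append(latest)
    return keep
```

The revisions left in the archive are exactly the ones between the first infringing edit and the revert.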

nao-yuki wrote:

I think the GFDL compliance problem with history information may be solved by
deciding on a suitable usage policy.

(In reply to comment #18)

The problem with GFDL compliance could be solved simply by following a
certain procedure like this:

  1. undelete the whole thing
  2. revert to the last version before the copyvio is inserted.
  3. delete.
  4. undelete the latest version, and the versions before the copyvio is inserted.

I propose the following usage policy:
If the target article was previously reverted to the last version before the
problem occurred, only the revisions from the "problem-occurred version" to the
"reversion to the last version before the problem occurred" must not be undeleted.
If the target article was not reverted to the last version before the problem
occurred, the "problem-occurred version" and everything after it must not be undeleted.
The policy above is pretty simple, but I think it is important.

bugzilla_wikipedia_org.to.jamesd wrote:

Some requirements for this general type of feature:

  1. Legal compliance by not continuing to distribute specific revisions
  2. Legal record keeping by never permanently deleting any revision (so a situation which does result in subsequent legal action doesn't end up with all records either party needs no longer available from an independent source)
  3. Showing an accurate history which includes any problematic items in that history but without the copyright infringement or other problem.
  4. Easy and complete reversibility of any action

This feature set is best achieved by a per-revision hide flag which can be
turned on or off and which allows anyone with the right account setting to see
the hidden articles in their proper context (sysop flag seen by any person with
sysop in their user rights, for example). Also avoids problems with the various
compression features, either gzip or diff-based, since nothing is actually being
deleted or restored.
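
A minimal Python sketch of the per-revision hide flag proposed above. The `Revision` class, the `hidden` field and the `"sysop"` right string are illustrative assumptions, not an existing MediaWiki schema or API.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: int
    text: str
    hidden: bool = False  # can be switched on or off; nothing is deleted


def visible_text(revision, user_rights):
    """Return the text, or None if the revision is hidden from this user."""
    if revision.hidden and "sysop" not in user_rights:
        return None
    return revision.text


def history_for(revisions, user_rights):
    """List every revision, so the history stays complete and in order.

    Hidden entries still appear in the list, but their text is withheld
    from users who lack the required right.
    """
    return [(rev.rev_id, visible_text(rev, user_rights)) for rev in revisions]
```

Because hiding only flips a flag, setting `hidden` back to `False` fully reverses the action, and stored text is never rewritten or moved between tables.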

Doesn't seem like a good idea to ship this in 1.4, since it's not really the
right approach to the problem.

A basic issue with the suggestion in comment #20 is that we
distribute database dumps. If the reason we're deleting individual
revisions is because we can't legally distribute them, then not
separating them creates a problem.

wiki_tomos wrote:

Regarding Jamesday's comment:

I agree that recordkeeping is sometimes important, but it can be easily done
using XML export. It is not a reason to delay the introduction of this feature.

Some articles dealing with controversial subjects (sexual, religious, etc.)
receive what seems to be intentional copy-and-paste of copyrighted material
just so that the articles get deleted. Occasionally, this happens to the Main Page
and to articles with a very long history. It is very hard to choose between
deleting them altogether and bearing legal risks. I know this is not news to
Jamesday, but Japanese ISP liability law makes people like Wikipedia admins
liable for not deleting obvious infringement that they know about.

wiki_tomos wrote:

I checked with some admins and looked at the lists of versions to be deleted.

The requests date back as far as a year, and there are over 1,000 versions
to be deleted on Japanese Wikipedia. Not everything could be taken care of
by this feature, but this (or XML Import) would help a lot.

nao-yuki wrote:

(In reply to comment #20)

This feature set is best achieved by a per-revision hide flag which can be
turned on or off and which allows anyone with the right account setting to see
the hidden articles in their proper context (sysop flag seen by any person with
sysop in their user rights, for example). Also avoids problems with the various
compression features, either gzip or diff-based, since nothing is actually being
deleted or restored.

I think this feature has a problem: diffs would still show content that
contains various problems.

The patch restores particular revisions from the archive table to the old and cur
tables, so the other revisions remain in the archive table. If somebody (a lawyer
or ISP) asks for the deleted content, sysops can restore it in order to respond to
the request. (It would be best if a restore made in response to a request could be
marked as temporary, so that only the temporarily restored revisions can be re-deleted.)

I also think the following feature would be helpful: each entry in the archive table
has a "Protected" flag that marks the revision so that it cannot be permanently
deleted from the archive table.

Jamesday suggested to me on IRC to encrypt individual versions instead of
deleting them. As far as I understand (and that's not saying much) this would
make things easier, because no info is actually lost, and no records need to be
deleted, so there's no mess-up in the IDs. Just the content of a single field in
the database is updated (and possibly something like "encrypted" could be added
to the log string in the version entry, but that's not necessary). I like that
idea very much, because it is simple, transparent and easy to undo. Also, it
would untie this bug from #603, as far as I can see.

If the software had access to the secret key, too, it could make the encrypted
version visible for admins, etc. But that's not really necessary. The most
important thing is that texts that have been put on Wikipedia in violation
of copyrights are no longer publicly accessible.

This was applied on 1.4+ some time ago. Resolving as fixed.