
rss feed item should contain a guid element
Closed, ResolvedPublic

Assigned To
None
Authored By
bzimport
Sep 16 2006, 10:17 AM
Referenced Files
F3213: feed.diff
Nov 21 2014, 9:24 PM
F3211: rss-guid.patch
Nov 21 2014, 9:24 PM
F3212: mw-add-rss-guid.diff
Nov 21 2014, 9:24 PM

Description

Author: toots

Description:
Hi all!

I could not find this bug by searching the DB, so I am filing it here. I hope it is not just noise.

Package: mediawiki1.7
Version: 1.7.1-1
Severity: normal

I noticed that when using the "recent changes" mediawiki RSS feed with
liferea, it keeps showing duplicate entries.

According to the liferea documentation[1], this appears to be a problem
with mediawiki[2].

It would be nice if this could be fixed.

References:

[1] file:///usr/share/liferea/doc/html/faq_en.html

"Q: Why do feed items keep being displayed as new? A: This is usually
due to a bad feed which associated a particular ID to multiple items.
You should check your feed against a feed validator such as
feedvalidator.org. If the validator does not report any error, please
submit a bug report including the URL of the problem feed to the Liferea
bugtracker.

Note: If you experience this problem with a planet feed the reason might
be that the planet feed does not provide unique item ids for one or all
of its source feeds. If this is the case Liferea has no chance to match
identical items."

[2]
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fmeta.wikimedia.org%2Fw%2Findex.php%3Ftitle%3DSpecial%3ANewpages%26feed%3Drss

"line 67, column 203: item should contain a guid element (50 occurrences)"

-- System Information:

Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-3-k7
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)


Version: 1.16.x
Severity: normal
URL: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=383130

Details

Reference
bz7346


Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:24 PM
bzimport set Reference to bz7346.
bzimport added a subscriber: Unknown Object (MLST).

The spec explicitly allows this. Please contact liferea authors and inform them that this is legit. :)

dhazelton wrote:

Resolving this bug as "invalid" is not correct. Yes, the <guid> tag is not required, but then, only the <title>, <link> and <description> tags are listed as required by the specification.

"This “Global Unique Identifier” allows you to republish or update specific
items without duplicating these items in an aggregator. If you change an
item without using the <guid> element, then the aggregator has no way of
determining that the new item is replacing an old item. In that case, the
aggregator will retain the old item and the new item, forcing the user to read
it twice. If the <guid> element exists (and is the same as a previous item’s
<guid>) then the aggregator can (at the users option) replace the old item
with the new one. If the user has not read the item yet, then all they will see
is the updated item. If they have read the old item already, then they can
optionally read the update or ignore it." -- http://www.feederreader.com/TechnicalGuides/RSS_Basic.html

That is referenced from the current spec, which is apparently housed here: http://cyber.law.harvard.edu/rss/rss.html

In other words... Without <guid> an "aggregator" has no way to determine whether an individual <item> is new or if it has been seen before. So while not required by the spec, not including it actually is a bug. No major feed that I have been able to find - and that claims to be RSS 2 - omits the <guid> tag.
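To illustrate the point about aggregators, here is a minimal Python sketch of how a feed reader might decide whether an item is new. It is not Liferea's actual matching logic; the helper names (`item_key`, `new_items`, `make_feed`) and the example.org URL are hypothetical, and the fallback of comparing title, link, and description is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET


def item_key(item):
    """Return the identity this sketch uses for an RSS <item>."""
    guid = item.findtext("guid")
    if guid and guid.strip():
        return ("guid", guid.strip())
    # No guid: fall back to the visible content, so any edit looks like a new item.
    return (
        "content",
        item.findtext("title"),
        item.findtext("link"),
        item.findtext("description"),
    )


def new_items(feed_xml, seen):
    """Return titles of items not yet in `seen`, recording their keys as a side effect."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item.findtext("title"))
    return fresh


def make_feed(description, guid=None):
    """Build a one-item RSS 2.0 feed for the demonstration (hypothetical data)."""
    guid_xml = f"<guid>{guid}</guid>" if guid else ""
    return (
        "<rss version='2.0'><channel><title>t</title>"
        "<item><title>Main Page</title>"
        "<link>http://example.org/w/Main_Page</link>"
        f"<description>{description}</description>{guid_xml}</item>"
        "</channel></rss>"
    )
```

Fetching `make_feed("first")` and then `make_feed("edited")` with the same `seen` set reports the item as new both times. When both versions carry the same guid, the second fetch reports nothing new.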

And I thought it somehow had to do with how I manage my wiki.
All those bytes (bug 17058) and no guid!

If not implementing for Special:RecentChanges&feed=rss, then at least
implement for Special:NewPages&feed=rss, whose URLs are more robust.

Created attachment 6828
Patch to add <guid> element to RSS items.

Small patch to solve issue. I tested the result with <URI:http://www.feedvalidator.org/>. Note that for debugging I had to clear objectcache, otherwise the output remained the same :-).

Attached:

dhazelton wrote:

add a <guid> element to RSS feeds

Created this when I noticed that just using the URL still had duplicate items appearing. With the time of the edit added to the URL and the guid flagged as not being a permalink this works extremely well. A much better way to manage things would be to provide a perma-link URL for the feed code - but failing that this should work well.

Attached:

I don't quite understand that. First, adding the time of the edit should make no difference at all, since it is redundant to the revision ID already contained in the URL. Second, by my reading of the specification, isPermaLink (with a guid of solely the URL) should be true (or omitted) as the diff link is unique and stable.

In any case, in RSSFeed::outItem $item->getDate() does not seem to be guaranteed to exist, so if it is to be added to the guid, it should be checked for properly.
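The two points above (a guid consisting solely of the URL with isPermaLink true, and a date that may be absent) can be sketched in Python as follows. This is an illustration only, not the code of RSSFeed::outItem; `build_item` and its parameters are hypothetical names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_item(title, url, description, date=None):
    """Build an RSS <item> whose guid is the item URL, marked as a permalink."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = url
    # isPermaLink defaults to true when omitted; it is written out here for clarity.
    guid = ET.SubElement(item, "guid", isPermaLink="true")
    guid.text = url
    ET.SubElement(item, "description").text = description
    # The date is optional, so only emit <pubDate> when one was supplied.
    if date is not None:
        ET.SubElement(item, "pubDate").text = format_datetime(date)
    return item


if __name__ == "__main__":
    diff_url = "http://example.org/w/index.php?title=Foo&diff=2&oldid=1"
    when = datetime(2006, 9, 16, 10, 17, tzinfo=timezone.utc)
    print(ET.tostring(build_item("Foo", diff_url, "edit", when), encoding="unicode"))
```

The guid never depends on the date, so an item without a date still gets a stable identifier.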

dhazelton wrote:

Neither did I when it started happening. But apparently the URL is constructed with two IDs so the unique diff can be pulled up. If a change was made after that, the link changed. However, I didn't try it under a newer version of the code-base.

And I didn't know that ->getDate() wasn't guaranteed to exist - since it seems to always exist in my install.

In any case... As I said it would be better to use a link to that specific revision without setting it as a diff as the guid. Because then it is guaranteed to not change. And it's just hit me that relying on the specific date is stupid, so I withdraw my proposed patch. If the URL is changing (or was - I've since updated to a much more recent code-base) then the duplication would be seen regardless. (And I have been seeing it)

So... I'm going to work on a more in-depth fix that will change the GUID to a URL that is that specific revision without the diff contents. I should have a patch for that by Monday.

I don't see how the guid pointing to the revision itself would be an improvement. Keep in mind that the purpose of the feed is to point to the changes, i.e. the diffs, not to the revisions. If a guid pointed to the revision, it could collide with other feeds that, for example, list new pages.

Regarding RSSFeed::getDate(), I'm just deducing from the if-clause three lines above that it is not guaranteed to exist. Maybe someone likes to overhaul the entire process as many functions and structures (abstract base class with rather rigid structure, selectors that silently escape to XML, etc.) look very hackish.

The RSS feeds for article history are also affected by this bug (at least in Liferea).
I am subscribed to a few RSS feeds for article history and am sorry to say that it doesn't work that well.
Here is a screenshot which clearly shows duplicated entries (please note that some of them are not duplicated!):
http://img403.imageshack.us/img403/518/zrzutekranuliferea.png

The Feedvalidator shows the same: "item should contain a guid element":
http://feedvalidator.org/check.cgi?url=http://en.wikipedia.org/w/index.php?title=1Q84&feed=rss&action=history

I hope this will be fixed sometime in the future, as it is pretty annoying...
Thanks,
Tomasz

buzz wrote:

patch to add guid and permalink support to feeds

Although it is technically OK not to have a guid, it is an annoyance. And a quote from the RSS spec says:

"In all cases, it's recommended that you provide the guid, and if possible make it a permalink. This enables aggregators to not repeat items, even if there have been editing changes."

And in fact this is the main problem I am having: I have an extension that uses the RSS feed system, and if I make a change to an item, it will be repeated because the RSS software cannot tell that it is an old item.

I have made a patch that not only allows adding a guid, but also allows you to set it as a permalink for RSS (this is not needed for Atom). It also makes the Atom feed use the new guid, which by default is set to the URL but can be changed with a setUniqueId call on the item. Can we please get this sorted as soon as possible?
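The design described above can be sketched in Python roughly as follows. This is not the attached patch (which is PHP); the class and method names are hypothetical, and treating the default URL-based id as a non-permalink is an assumption of this sketch.

```python
from xml.sax.saxutils import escape


class FeedItem:
    """Feed item whose unique id defaults to its URL but can be overridden."""

    def __init__(self, title, url):
        self.title = title
        self.url = url
        self._unique_id = None
        self._is_permalink = False  # assumption: not a permalink unless stated

    def set_unique_id(self, unique_id, is_permalink=False):
        """Override the id; is_permalink only affects the RSS output."""
        self._unique_id = unique_id
        self._is_permalink = is_permalink

    def unique_id(self):
        # Fall back to the item URL when no explicit id was set.
        return self._unique_id if self._unique_id is not None else self.url

    def rss_guid(self):
        # RSS treats a guid as a permalink unless isPermaLink="false" is given.
        attr = "" if self._is_permalink else ' isPermaLink="false"'
        return f"<guid{attr}>{escape(self.unique_id())}</guid>"

    def atom_id(self):
        # Atom has no permalink flag; the same id is emitted as <id>.
        return f"<id>{escape(self.unique_id())}</id>"
```

An extension that republishes an edited item would call `set_unique_id` with the same value each time, so the RSS and Atom output both carry a stable identifier.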

attachment guid.diff ignored as obsolete

buzz wrote:

patch to add guid and permalink support to feeds

Feed.php hadn't been updated in over a year. I make my patch and then someone cleans the file up! Here is a new patch that applies to the latest svn.

Perhaps someone can have a look at this before Feed.php is again changed?

The patch also includes a couple of minor cleanups/fixes to the file.

attachment feed.diff ignored as obsolete

buzz wrote:

patch to add guid and permalink support to feeds

Oops. There were a couple of indentation mistakes in that one, and I made the ordering more logical.

attachment feed.diff ignored as obsolete

buzz wrote:

6951: patch to add guid and permalink support to feeds

Changed some parameter names in the new setUniqueId function.
Removed cosmetic changes from the patch to make it more readable.

Attached:

ayg wrote:

Last patch committed as r61090 after discussion in MediaWiki-General.