
Media viewer fails to give credit to all people in specific circumstances
Open, LowPublic

Description

The Media viewer fails to give credit to all people listed under author in situations where:

  1. There is a {{Creator:Foo}} template used for one of the authors, but
  2. Other authors are listed below said tag.

For example, Mathew_B._Brady_-_William_Sidney_Mount.jpg, if viewed in media viewer, will only credit Mathew Brady, leaving out credit to myself.

Not every creator has a template, and it's not uncommon for a templated and untemplated creator to both be responsible for a work, even outside of restoration work.


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:10 AM
bzimport added a project: MediaViewer.
bzimport set Reference to bz66606.
bzimport added a subscriber: Unknown Object (MLST).

This is true. I'm not sure how to best solve this, that is up to the multimedia team.

I would however advise you start using a template for your own creditline, that makes it easier to retroactively make it compatible with future generations of machine readable data.

When the author field has an author template, we discard all input outside it. We could just keep it instead; I don't think that could cause problems.

That said, it won't look particularly good (something like "Mathew Brady Restoration: Adam Cuerden"). That's the limitation of how much meaning can be extracted from templates currently; Wikidata will provide nicer ways.
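
To make the two behaviours concrete, here is a minimal Python sketch. It is not the CommonsMetadata code (which is PHP); the markup is a hypothetical, heavily simplified author field, and the only thing assumed about it is the hCard class names `vcard` and `fn`. The function names are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified author field. Real {{Creator}} output is
# far more complex; only the "vcard"/"fn" hCard class names are assumed here.
AUTHOR_FIELD = (
    '<div class="vcard"><span class="fn">Mathew Brady</span>'
    '<span class="note">(creator box details)</span></div>\n'
    'Restoration: Adam Cuerden'
)


class AuthorFieldParser(HTMLParser):
    """Collects hCard names and the text that sits outside any hCard.

    Assumes every start tag has a matching end tag (no <br>, <img>, ...).
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # (inside_vcard, inside_fn) for each open element
        self.names = []  # hCard names, in document order
        self.parts = []  # hCard names plus text outside any hCard

    def _state(self):
        return self.stack[-1] if self.stack else (False, False)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        in_vcard, in_fn = self._state()
        in_vcard = in_vcard or "vcard" in classes
        in_fn = in_vcard and (in_fn or "fn" in classes)
        self.stack.append((in_vcard, in_fn))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        in_vcard, in_fn = self._state()
        if in_fn:
            self.names.append(text)
            self.parts.append(text)
        elif not in_vcard:
            self.parts.append(text)


def parse_author_field(html):
    parser = AuthorFieldParser()
    parser.feed(html)
    parser.close()  # flush any trailing text
    return parser


def credit_hcard_only(html):
    """If an hCard is present, use its name and drop everything else."""
    parser = parse_author_field(html)
    return parser.names[0] if parser.names else " ".join(parser.parts)


def credit_keep_rest(html):
    """Alternative: keep the hCard name and the text around it."""
    return " ".join(parse_author_field(html).parts)


print(credit_hcard_only(AUTHOR_FIELD))  # Mathew Brady
print(credit_keep_rest(AUTHOR_FIELD))   # Mathew Brady Restoration: Adam Cuerden
```

The second output reproduces the not-very-pretty credit line mentioned above; the sketch only shows that nothing is lost, not that the result reads well.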

Gilles triaged this task as Medium priority. Nov 24 2014, 2:43 PM
Gilles subscribed.
Tgr set Security to None.

Mass-removing the Multimedia tag from MediaViewer tasks, as this is now being worked on by the Reading department, not Editing's Multimedia team.

Jdlrobson changed the task status from Open to Stalled. Feb 3 2016, 8:26 PM
Jdlrobson subscribed.

It is not clear how to proceed with this task right now without structured metadata.

Retaining text outside the hcard would be a fairly simple change in CommonsMetadata but I am not sure it would be beneficial all the time.

Well, we agree it's bad to leave out one of the credits, right?

@AdamCuerden, yes, but I think tgr's point is that he fears that improving a small set of problem files (0.1% or something) could create a much bigger group of new problem files with significantly worse, unreadable, or incomprehensible author information than we have right now. That would be hard to assess without reparsing all the file pages, which takes a long time, and it would take equally long to undo.

Jdlrobson lowered the priority of this task from Medium to Low. Nov 24 2016, 7:03 PM

I agree that we don't want to potentially hurt the credit on most files for a tiny use case - and I agree that we should try to fix this once structured data makes it easier for us to credit creators. Leaving stalled, but we definitely want to fix this as soon as is practicable.

simon04 subscribed.

According to the above comments, a solution is not so easy, and thus this is not a good first bug…

In T68606#3218037, MarkTraceur wrote:

I agree that we should try to fix this once structured data makes it easier for us to credit creators.

Is there a task tracking structured data for author credits (which this task should have as a subtask)?

Is there a task tracking structured data for author credits (which this task should have as a subtask)?

Not sure about tasks (T585 would have been relevant but it was declined; T68108 is probably the relevant epic but it has no related subtask) but there is a Commons project page.

In T68606#683872, @Tgr wrote:

That said, it won't look particularly good (something like "Mathew Brady Restoration: Adam Cuerden"). That's the limitation of how much meaning can be extracted from templates currently; Wikidata will provide nicer ways.

Can you replace newline with " | "? ("Mathew Brady | Restoration: Adam Cuerden")
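
For illustration, the separator idea amounts to something like the following Python sketch. The function name and the plain-text input are assumptions; the real author field is HTML and would need to be reduced to text first.

```python
def join_credit_lines(author_text, separator=" | "):
    """Join the non-empty lines of an author field with a visible separator."""
    lines = (line.strip() for line in author_text.splitlines())
    return separator.join(line for line in lines if line)


print(join_credit_lines("Mathew Brady\nRestoration: Adam Cuerden"))
# Mathew Brady | Restoration: Adam Cuerden
```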

The code to change is TemplateParser::parseFieldArtist, if someone wants to take a shot at this. IMO the energy would be better spent working on storing authors as structured data, or at a minimum on templates which output them in a sane machine-readable way.

Aklapper changed the task status from Stalled to Open. Nov 1 2020, 10:14 PM

Structured metadata is in the making nowadays, hence resetting task status. (If this depends on a specific task, please make that task a subtask.)

Seriously, how will structured data be any more structured than "The stuff in the Author field of the Information Template. Include it all or be violating copyright"? Or are you going to hand-redo a few thousand files to suit this new data?

Look, it's been 8 years. Make a fix, and if it causes problems, make another, but in those 8 years countless sites have used my work, not needed the big version, so just used the Media Viewer copy. And I want my work used, but I really would like credit, and, honestly, have a legal right to it. The legal concept of moral rights kind of has to overwhelm "it might not look nice in some cases", because I'm honestly infinitely patient for Wikipedia's bullcrap to sort itself out, but not everyone's going to be, and this affects literally every situation where a more-famous author with a Creator template has a co-author that doesn't.

@AdamCuerden: Please read and follow https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette if you would like to be active here. This is an issue tracker and not a random forum. Thanks for your understanding.

I've just discovered this is more broken than I thought: It's not even an edge case: It can't handle more than one {{Creator}} template. So in literally any case where an image has more than one creator, it will fail to credit at least one person.

@AdamCuerden: Please read and follow https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette if you would like to be active here. This is an issue tracker and not a random forum. Thanks for your understanding.

I've been patiently waiting 8 years for a solution. You have to expect some upset when Wikipedia has been promising to fix a problem that long, while sites are using my work without credit ''entirely because of the poor coding''.

And today I learned it can't be fixed: If there's more than one Creator template, it only uses the first one as far as I can tell. It doesn't even attempt to compound them with an "and". Even if I go through and edit every one of my files, I'm still going to be denied credit.

Oh, and it's putting Wikipedia at legal risk, because I may not be willing to sue Wikipedia, but all those cases where CC-by files are used that have more than one creator? They might be.

I should probably give an example, to show the problem is widespread.

https://en.wikipedia.org/wiki/Wilkinson_Call#/media/File:Wilkinson_Call_-_Brady-Handy.jpg

Two creators, Media Viewer strips one from the list.

I should probably give an example, to show the problem is widespread.

https://en.wikipedia.org/wiki/Wilkinson_Call#/media/File:Wilkinson_Call_-_Brady-Handy.jpg

Two creators, Media Viewer strips one from the list.

@AdamCuerden, seek me out on my talk page on enwiki. (I find Phabricator harder to keep track of) I can't promise anything here, but I can try to create a gadget that hacks the rights attribution in there and advocate to get that enabled for everyone. (which, given the legal issues, may not be impossible)

Low priority is just wrong, this should be medium to high. (and the WMF should order a paid programmer to work on it frankly) On a technical level it's unimportant (it won't crash the site) but it causes a failure to comply with the license, that's a big deal for Wikimedia.

Though one question remains which I asked in 2020 and wasn't answered: can you replace newlines with pipe symbols?

https://en.wikipedia.org/wiki/Battle_of_Spotsylvania_Court_House#/media/File:Battle_of_Spottsylvania_by_Thure_de_Thulstrup.jpg

Little followup. Here we see Media Viewer turning a CC-by image that requires crediting me into a Public Domain image that does not credit me.

https://en.wikipedia.org/wiki/Battle_of_Spotsylvania_Court_House#/media/File:Battle_of_Spottsylvania_by_Thure_de_Thulstrup.jpg

Little followup. Here we see Media Viewer turning a CC-by image that requires crediting me into a Public Domain image that does not credit me.

That’s because that attribution template isn’t a license template. At least not technically.
Edit. Actually, this is probably because PD is the most permissive license which thus gets precedence. This is a side effect of Commons authors not distinguishing between multi licensing and derivatives (nesting).

@AdamCuerden, seek me out on my talk page on enwiki. (I find Phabricator harder to keep track of) I can't promise anything here, but I can try to create a gadget that hacks the rights attribution in there and advocate to get that enabled for everyone. (which, given the legal issues, may not be impossible)

I’ll happily review any patch provided.

https://en.wikipedia.org/wiki/Battle_of_Spotsylvania_Court_House#/media/File:Battle_of_Spottsylvania_by_Thure_de_Thulstrup.jpg

Little followup. Here we see Media Viewer turning a CC-by image that requires crediting me into a Public Domain image that does not credit me.

That’s because that attribution template isn’t a license template. At least not technically.

Just to clarify, that isn't correct, policy wise, it is a valid and usable template. And technically it has the metadata to be picked up by CommonsMetadata, the issue is T89692, where CommonsMetadata appears to be prioritizing the least restrictive terms.

https://en.wikipedia.org/wiki/Battle_of_Spotsylvania_Court_House#/media/File:Battle_of_Spottsylvania_by_Thure_de_Thulstrup.jpg

Little followup. Here we see Media Viewer turning a CC-by image that requires crediting me into a Public Domain image that does not credit me.

That’s because that attribution template isn’t a license template. At least not technically.
Edit. Actually, this is probably because PD is the most permissive license which thus gets precedence. This is a side effect of Commons authors not distinguishing between multi licensing and derivatives (nesting).

How am I supposed to do this?

Just to clarify, that isn't correct, policy wise, it is a valid and usable template. And technically it has the metadata to be picked up by CommonsMetadata, the issue is T89692, where CommonsMetadata appears to be prioritizing the least restrictive terms.

Technically the image is marked up as being PD and attribution-only. When a file has multiple licenses, it is valid to pick one and show only that, and it makes sense to use the least restrictive one for that as it gives the user the most options. As noted in the other task, some markup mechanism would have to be invented for differentiating between the real end-user-relevant license and other license-ish things; the proposal for that did not generate any community interest.

Note that this is a distinct technical issue from source parsing which was discussed earlier in this task (T68606#6399376). There is always only one author field, but the code assumes if that field contains hcard metadata then everything else can be discarded. That actually works out well for Battle of Spottsylvania by Thure de Thulstrup.jpg where the text outside the hcard template is not related to authorship.

The Source/Photographer field is a third distinct (but similar) technical issue, where the table is marked up as multilingual content, so TemplateParser::parseContents extracts the part in the appropriate language from it and ignores content outside the multilingual area. That could be fixed, but it would be discarded on the client side anyway, as the markup is too complicated (a table and a list) and cannot be meaningfully displayed in a UI that's meant for plain text. As I said before, I don't think trying to extract meaning from an HTML soup is a sustainable approach; license/credit information should be moved to metadata. The relevant discussion would be Commons:Structured_data/Modeling I think.

In T68606#7625182, @Tgr wrote:

Just to clarify, that isn't correct, policy wise, it is a valid and usable template. And technically it has the metadata to be picked up by CommonsMetadata, the issue is T89692, where CommonsMetadata appears to be prioritizing the least restrictive terms.

Technically the image is marked up as being PD and attribution-only. When a file has multiple licenses, it is valid to pick one and show only that, and it makes sense to use the least restrictive one for that as it gives the user the most options.

But as you've seen this isn't true. Just pick the least permissive license to be safe. It's better to tell a re-user they must attribute when they actually don't rather than the other way around.
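
The two selection policies being argued over can be sketched in a few lines of Python. The license identifiers and the restrictiveness ranking below are hypothetical and only serve the example; they are not the values CommonsMetadata uses.

```python
# Hypothetical ranking: a higher number means more obligations for the re-user.
RESTRICTIVENESS = {"pd": 0, "cc0": 0, "attribution": 1, "cc-by": 1, "cc-by-sa": 2}

# Licenses missing from the table are treated as the most restrictive.
UNKNOWN_RANK = max(RESTRICTIVENESS.values()) + 1


def pick_license(licenses, prefer="least"):
    """Pick one license to display from all licenses found on a file page."""
    if not licenses:
        return None
    chooser = min if prefer == "least" else max
    return chooser(
        licenses,
        key=lambda name: RESTRICTIVENESS.get(name.lower(), UNKNOWN_RANK),
    )


found = ["PD", "Attribution"]
print(pick_license(found))                 # PD
print(pick_license(found, prefer="most"))  # Attribution
```

With "least", a restoration that carries both a PD tag for the original and an attribution tag for the restorer is shown as PD; with "most", the attribution requirement is what the re-user sees.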

As noted in the other task, some markup mechanism would have to be invented for differentiating between the real end-user-relevant license and other license-ish things; the proposal for that did not generate any community interest.

There's already https://commons.wikimedia.org/wiki/Template:Licensed-PD and https://commons.wikimedia.org/wiki/Template:Licensed-PD-Art. But there's always room for improvement. (edit: I didn't realize you just linked a 2014 discussion, very much pre-SD)

I don't think trying to extract meaning from an HTML soup is a sustainable approach; license/credit information should be moved to metadata. The relevant discussion would be Commons:Structured_data/Modeling I think.

I agree that extracting meaning from HTML soup is a bad idea, which is why I served the soup. I disagree about discussing this just on Commons, other projects host files too and they can't be expected to participate in discussions on Commons, besides, they have no structured data.

Even with structured data, SD was introduced without the ability to transclude data into wikitext. That should have been a core feature but it wasn't there. I'm not sure if it is now, I thought I asked some time ago and it was, but really I'm not sure and https://commons.wikimedia.org/wiki/Commons:Structured_data doesn't seem to say. I've said enough about that, but license info duplication was a horrible idea and wikitext remains leading for license info so talking about SD is little use for this task.

The way I see it, license templates should provide a text-only alternative format. Might also help screen readers, though I'm not sure how the info should be presented. Anyway, that's a community task. If templates provide a text-only alternative, the rest is a piece of cake.

Change 787870 had a related patch set uploaded (by TheDJ; author: TheDJ):

[mediawiki/extensions/CommonsMetadata@master] Retrieve artists/authors from multiple vcards

https://gerrit.wikimedia.org/r/787870
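
As a rough illustration of the idea in the patch title (not the patch itself, which is PHP and not reproduced here), collecting the name from every hCard rather than only the first might look like this in Python. The markup is hypothetical and simplified, and the helper names are made up.

```python
from html.parser import HTMLParser


class FnCollector(HTMLParser):
    """Collects the text of every element whose class list contains "fn".

    Assumes elements nested inside an "fn" element have matching end tags.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside an "fn" element
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "fn" in classes:
            if self.depth == 0:
                self.names.append("")  # start a new name
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.names[-1] += data


def all_authors(html):
    """Return the names of all hCards in the field, in document order."""
    collector = FnCollector()
    collector.feed(html)
    collector.close()
    return [name.strip() for name in collector.names if name.strip()]


def format_authors(names):
    """Join names as "A", "A and B", or "A, B and C"."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


# Hypothetical field with two creator boxes.
FIELD = (
    '<div class="vcard"><span class="fn"><a href="#">Mathew Brady</a></span></div>'
    '<div class="vcard"><span class="fn">Adam Cuerden</span></div>'
)
print(format_authors(all_authors(FIELD)))  # Mathew Brady and Adam Cuerden
```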

That actually works out well for Battle of Spottsylvania by Thure de Thulstrup.jpg where the text outside the hcard template is not related to authorship

It is very very related to authorship and your failure to understand that is downright insulting. You're basically saying that it's WMF policy that someone can spend 60+ hours on an image restoration, and have their work be uncredited.

Maybe there isn't a legal duty to credit me, although British law might well disagree with you. But there is certainly a moral obligation to, as well as it being a dereliction of duty to your volunteers.

Seriously, if you think no work was done in it, why not switch all usages over to the unrestricted version, https://www.loc.gov/pictures/resource/pga.04038/ ? The one with rips, pockmarks, poor colour fidelity, etc.

It's been nearly a decade of broken promises this will be fixed. You don't even seem to understand the issues involved.

Solutions:

  1. We fix the API so that it no longer scrapes ONLY the first creator from the creator box (see my patch)
  2. We either (or both)
    • remove the PD licenses from the embedded works from the page, considering that Adam requests attribution, we can assume they claim copyright and thus this image is legally copyrighted till 70 years after their death, the PD status of the enclosed objects don't really matter, we can write it down in words, it doesn't need a template and the template and the categorisation it adds, is actually misleading I would argue. (even if they don't claim it, they probably still have it, I would argue even outside of the UK. IMO the artistic choices of Adam are copyrightable)
    • we switch to 'most restrictive license by default'

Also like.. just REMOVE the creator template ???? Seems like another very simple solution. Those boxes are not a requirement are they ?

Solutions:

  1. We fix the API so that it no longer scrapes ONLY the first creator from the creator box (see my patch)

Every little bit helps, but it won't be enough. FoP (Freedom of Panorama) is another issue when it comes to informing re-users.

If templates don't provide a text-only alternative I don't see how we could do much better than what MVattrib does.

  1. We either (or both)
    • remove the PD licenses from the embedded works from the page, considering that Adam requests attribution, we can assume they claim copyright and thus this image is legally copyrighted till 70 years after their death, the PD status of the enclosed objects don't really matter, we can write it down in words, it doesn't need a template and the template and the categorisation it adds, is actually misleading I would argue. (even if they don't claim it, they probably still have it, I would argue even outside of the UK. IMO the artistic choices of Adam are copyrightable)

In the UK (and Australia) no doubt. Outside the UK and Australia I'm not sure, with some simple level adjustments I get a superficially similar result on the original, but I'm working on a crappy 1024px JPEG. There's nothing available between that and the 200MB TIF and I have bad experience trying to open 200MB TIF files. So that could be a different story. Not available on Commons so no thumbnail there either.

It doesn't matter really, if the modifications are copyrightable anywhere the attribution license is valid. Compliance may not be legally required everywhere (try Somalia), but that doesn't matter. We don't know where re-users live or where they want to use the work.

The UK and Australia are the reason I dump CC0 even on basic drawings I upload.

  • we switch to 'most restrictive license by default'

That should be done at any rate. It's safer for re-users. License descriptions like these just happen and copyright is extremely complicated. If there is a more restrictive license, it's probably for a reason.

Also like.. just REMOVE the creator template ???? Seems like another very simple solution. Those boxes are not a requirement are they ?

They're kind of neat, but it's possible in this particular case to have it only on the original. But in case of joint works that won't work.

Change 787870 merged by jenkins-bot:

[mediawiki/extensions/CommonsMetadata@master] Retrieve artists/authors from multiple vcards

https://gerrit.wikimedia.org/r/787870

Sigh. It turns out that the multi-language support only takes the first English elements in a field, so that patch needs some more follow-up.

I'm going to be honest (although you probably expect that at this point): I don't quite understand what's so hard about just... coding a graceful failure state for anything that seems non-simple. If you end up with, say, https://commons.wikimedia.org/wiki/File:Edward_Duncan_-_The_Explosion_of_the_United_States_Steam_Frigate_Missouri_-_Original.tiff - where there's three creator cards and an awful lot of important text explaining the roles - it's probably going to be far easier to just direct people to the file description page than try to convert it to an attribution line. Like, it really feels like an attempt at perfection is overruling the simple solution of just... handling the trivial cases you can handle and including a "see here" link for anything not readily machine readable.
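
A graceful failure state of the kind described here could be sketched as follows in Python. Everything in it is an assumption made for illustration: the function name, the rule for what counts as "simple", and the wording of the fallback text.

```python
def credit_or_fallback(names, extra_text, description_url, max_names=2):
    """Return a credit line for simple cases, else point to the file page.

    names           -- author names that were machine-readable
    extra_text      -- any text in the author field that was not understood
    description_url -- URL of the file description page
    max_names       -- hypothetical cutoff for what counts as a simple case
    """
    is_simple = bool(names) and not extra_text.strip() and len(names) <= max_names
    if is_simple:
        return " and ".join(names)
    return f"See the file description page for credits: {description_url}"


URL = (
    "https://commons.wikimedia.org/wiki/File:Edward_Duncan_-_The_Explosion_of_"
    "the_United_States_Steam_Frigate_Missouri_-_Original.tiff"
)
print(credit_or_fallback(["Mathew Brady"], "", URL))
print(credit_or_fallback(["Mathew Brady"], "Restoration: Adam Cuerden", URL))
```

The first call prints the plain name; the second falls back to the link because the field contains text that was not machine-readable.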

It's been... what is it now, nearly ten years of trying for perfection with little to no progress? That kind of indicates that maybe the solution isn't what you're trying to do.

On the meta discussion... I'm not trying to be a bastard here. It's just a situation where I've constantly been told it's going to be fixed, and so I go away for years, leaving the WMF to work on it, promises that it'll be fixed ringing in my head. And then I realise that it hasn't worked, and, in some cases, mitigating solutions - like actively providing an attribution line in the manner I was instructed a few years back - have actively broken.

Like... without turning this into a blame game, you get that's frustrating, right? I'm constantly finding myself trusting reassurances it's going to be fixed, and then it's not. And this is sometimes linked in with WMF staff *actively dismissing my moral rights to have my work credited by the organisation I've put so much time into that I ''literally have more featured pictures than anyone else on English Wikipedia as far as anyone can tell.'' 8.8% of the entire featured picture output of English Wikipedia is my restorations alone, not even including stuff I researched. Not even including stuff I nominated. That's ''just'' the restorations. And, without wanting to have a giant ego, having someone tell you your work doesn't matter, that your credit should be ''actively stripped''... like... can I get an apology for that? Because that was honestly... a bit much. Like... "showing you hate your volunteers" level of hostile.

I don't quite understand what's so hard about just... coding a graceful failure state for anything that seems non-simple. If you end up with, say, https://commons.wikimedia.org/wiki/File:Edward_Duncan_-_The_Explosion_of_the_United_States_Steam_Frigate_Missouri_-_Original.tiff - where there's three creator cards and an awful lot of important text explaining the roles - it's probably going to be far easier to just direct people to the file description page than try to convert it to an attribution line. Like, it really feels like an attempt at perfection is overruling the simple solution of just... handling the trivial cases you can handle and including a "see here" link for anything not readily machine readable.

That is how the earliest version of MediaViewer handled the license and author fields when it didn't understand them, and it made a lot of people angry. There were accusations of violating the CC license (despite legal professionals asserting the opposite). It's hard to make everyone happy.

It's just a situation where I've constantly been told it's going to be fixed, and so I go away for years, leaving the WMF to work on it, promises that it'll be fixed ringing in my head.

FWIW the last time the WMF resourced work on non-structured-data-related aspects of MediaViewer was, to my knowledge, in 2015 (which is when the old Multimedia team got dissolved); everything since then (which admittedly hasn't been a lot) has been someone spending their free time. I hoped structured data would make this issue easy to handle, but that would depend on the Commons community making use of structured data to describe the various types of contributions to the image, and that hasn't happened.

And this is sometimes linked in with WMF staff *actively dismissing my moral rights to have my work credited by the organisation I've put so much time into that I ''literally have more featured pictures than anyone else on English Wikipedia as far as anyone can tell.''

I am sorry if that came across as being dismissive of your restoration efforts in general; I have a lot of respect for your work. But whether or not you have moral rights is a legal fact, not an expression of support. And Commons has consistently taken the position that restoration efforts don't generally result in copyright, and without copyright there is no authorship and no moral rights. (Which isn't to say there can't be individual cases where restoration requires a level of creativity that amounts to authorship, but it generally isn't the case.) That is why Commons is still hosting the National Portrait Gallery images uploaded by Dcoetzee, for example, despite the gallery's assertion that the many hours their staff spent on image restoration gives them copyright over those images. I think it would be very hypocritical to apply that standard to images restored by museums but not to ones by Wikimedia volunteers.

Which is not to say that image restorations shouldn't be credited in some way. But the Author field has a specific legal meaning. People can be (and have been) sued for reusing Commons images and not indicating the author. Mixing that up with restoration credits is a bad idea.

All this isn't really relevant to the task. As I said on the patch, in the end it's up to the Commons community how they police the use of the Author field; it doesn't make sense for software to second-guess that. It's just not very clear what "don't second-guess" should mean in the case of information that's not machine-readable and in its original format not really reusable on UIs other than the file page. I have proposed a generic way to handle this but there wasn't a lot of interest.

Specifically for hcards, maybe we should just replace them with the author name (and maybe a trailing newline) and keep the rest of the text in the Author field. I feel like there was some reason we didn't do that originally, but can't remember anymore.