
Do not parse .js/.css pages and save their parsed content in database tables
Closed, Declined · Public

Description

Do not parse .js/.css pages and save their parsed content in database tables.

It creates entries in following tables:

  • categorylinks
  • externallinks
  • imagelinks
  • iwlinks
  • pagelinks
  • templatelinks

(maybe some others as well)

There is no reason to parse .js/.css pages and thereby create false records in the database, which then influence the results of various queries such as the Wanted* special pages.

Database cleanup will be needed afterwards, so I'm adding this as a blocker of bug 16660 as well.


Version: 1.18.x
Severity: normal
See Also:

Details

Reference
bz32858

Event Timeline

bzimport raised the priority of this task to Medium. · Nov 22 2014, 12:04 AM
bzimport set Reference to bz32858.
bzimport added a subscriber: Unknown Object (MLST).

Just to point out, many script authors use the fact that links on script pages get parsed to track usage of their script - e.g. in the instructions on using the script, many authors show something like the following:

// FooBar script by [[User:Example]] ([[User:Example/foobar.js]])
importScript( 'User:Example/foobar.js' );

Personally, I agree that parsing .css/.js pages in this way is wasted effort, but it would be nice to have some built-in way to track script usage in place of abusing the current behavior.

reachouttothetruth wrote:

Better tracking would be good, yes. But another use that we should consider is that people sometimes slap deletion templates on their .css and .js pages, and it works. Without that, people would be forced to make such deletion requests on some other page.

(In reply to comment #2)

Better tracking would be good, yes. But another use that we should consider is
that people sometimes slap deletion templates on their .css and .js pages, and
it works. Without that, people would be forced to make such deletion requests
on some other page.

There is always the talk page or the administrators' noticeboard, admins have their own talk pages, etc. There are plenty of places to make such requests.


I ran a scan on the small cs wikis, and there are already hundreds of false entries in the database. I wonder how many there will be on cswiki, not to mention enwiki.

Things like tracking (besides, it does not work cross-wiki anyway) or deletion requests could and should be done some other way, without having a significant influence on the content of the mentioned database tables.

The only way such alternate tracking methods could work is if they are automatic; the only reason the current system works anywhere near as well as it does is because it requires zero additional effort to simply copy-pasting the needed JS. Add an additional step to the script installation for tracking purposes, and in the best case, most people will ignore it (and in the worst case, people won't use your script because they can't be bothered or they find another script with a simpler installation).

*** Bug 17525 has been marked as a duplicate of this bug. ***

*** Bug 32450 has been marked as a duplicate of this bug. ***

Duping several related bugs here, since I think this is the clearest case. Either everything should be parsed, or nothing.
This means r103476 should be reverted (if we decide to go for the nothing case, it would be done somewhere else).

subst: could continue working in both cases (mentioned in bug 32450).

Note that JavaScript authors can build the links in such a way that they don't get registered (e.g. '[' + '[Category').
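The split-string workaround mentioned above can be illustrated with a minimal sketch (the category and template names are invented for illustration):

```javascript
// Building wikitext-looking strings by concatenation: the page source never
// contains a literal "[[Category:" or "{{" token, so the wikitext parser
// registers no categorylinks/templatelinks entry, yet the script still has
// the complete string at runtime.
var categoryTag = '[' + '[Category:Example]]';
var templateCall = '{' + '{subst:example-template}}';
console.log(categoryTag);   // "[[Category:Example]]"
console.log(templateCall);  // "{{subst:example-template}}"
```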

As a third alternative, we could do a different kind of parsing for js pages, so that wikitext would only be parsed inside js comments (bug 10410).

lambdav wrote:

Parsing JS/CSS seems useless:

  • {{subst:}} / {{template}} -> useless
  • [[Category:]] -> useless or in comments only
  • [[link]] / [link] -> useless or in comments only

Tracking script usage does not work for gadgets. If script usage should be tracked, it should be done as a new feature of MediaWiki instead.

If some parsing should be done, it should be only for [[Category:]] and [[link]] / [link], in comments only.

{{subst:}} is definitely *not* useless on JS pages; have a look at [[Template:Deletion sorting]] as one example (I'm not arguing that this method of script installation/use should be *encouraged*, merely pointing out that it exists and has been used historically).

I do agree that script usage needs to be tracked as a MediaWiki feature; this is what I was getting at above, though I never actually said as much.

I'd be fine with normal wikitext rendering only happening in comments (and I'd be far from the only person to wholeheartedly welcome functioning links in comments again!).

I meant to link to [[Template:Deltab]] in comment 9; I didn't check closely enough before submitting. =P

lambdav wrote:

Problem occurs when 'subst:' appears in scripts adding some buttons to place models for example. See https://fr.wikibooks.org/w/index.php?title=MediaWiki:Gadget-Barre_de_luxe.js&diff=prev&oldid=343887

subst: seems to be used only on the English Wikipedia; there are other projects and languages where this feature is not used.
Instead of using a script as a template, parameters in user scripts should be used, so subst: seems useless.

For the example you give, instead of:

Put in monobook (replace X with your area of interest) :
  {{subst:Deltab|X}}
Javascript source :
  document.editform.wpTextbox1.value += '\{\{subst:deletion sorting|{{{1}}}| -- \~\~\~\~\}\}\n';

...we should have :

Put in monobook :
  DeltabInterest = 'X'; // replace X with your area of interest.
  importScript('Deltab');
Javascript source :
  document.editform.wpTextbox1.value += '\{\{subst:deletion sorting|'+DeltabInterest+'| -- \~\~\~\~\}\}\n';

lambdav wrote:

Also, it would be safer to use JavaScript parameters in user scripts rather than template parameters. Other languages use more characters, like ' é à, which can generate script errors, for example:

{{subst:Deltab|L'arbre}}

Also, subst: doesn't allow using the latest version of the script, since it substitutes the text once.

lambdav wrote:

So it would be better to stop parsing subst: in scripts and to modify the Wikipedia scripts like Deltab accordingly.

(In reply to comment #7)

Duping here several related bugs, since i think this is the most clear case.
Either everything should be parsed, or nothing.
This means r103476 should be reverted (if we decide we go for the nothing case,
it would be done somehere else).

I don't like environment-dependent special cases in the parser like r103476; I think the proper place to decide what to do with a JS/CSS page is Article/WikiPage.

subst: could continue working in both cases (mentioned in bug 32450).

Note that javascript authors can make the links in such way that they don't get
registered (eg. '[' + '[Category').

As a third alternative, we could do a different kind of parsing for js pages,
so that wikitext would only be parsed inside js comments (bug 10410).

I think a separate parser class along the lines of bug 10410 would be a nice way to go, it could implement syntax highlighting, linking and subst. But it's a bit late to develop that for 1.19. For now I am going to revert Hashar's changes and hack WikiPage somehow.

(In reply to comment #14)

I think a separate parser class along the lines of bug 10410 would be a nice
way to go, it could implement syntax highlighting, linking and subst. But it's
a bit late to develop that for 1.19. For now I am going to revert Hashar's
changes and hack WikiPage somehow.

@Tim: for WikiData, we (WMDE) plan to introduce explicit types for page content, and handlers for each type, similar to what we use for different types of media files. So, there could be a special renderer (and optionally also a special editor and a special diff engine) for text/css, application/js, etc.

Just wanted to note this here to avoid duplicate effort. I'll detail the plans on mediawiki-l soon, and we can discuss it at the SF hackathon.

As I already mentioned in CR, as an author of user scripts I depend in some way on the fact that the JavaScript code is also parsed as wikitext.
Just this week I updated one of my scripts in a not-100%-backwards-compatible way, so I had to inform the users of my script about the changes. With the "What links here" function this was no problem. Removing this means that an author of user scripts has no simple way to find out who is using his scripts. By looking at random .js user pages you will find that I'm not the only one who makes use of this feature; without looking around much I found [[User:Ale jrb/Scripts/igloo]], which asks users to include the backlink in a comment.

One of my scripts I don't want to show up in Google, so I put NOINDEX in it - this change stops that from working.

When a script causes links that shouldn't be there, you just have to put a <nowiki> inside a comment at the beginning.
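The <nowiki> trick described above would look roughly like this at the top of a user script (a sketch; the string literals and names are invented for illustration):

```javascript
/* <nowiki>
   The <nowiki> tag above stops wikitext processing for the rest of this
   page, so the literals below create no pagelinks/categorylinks rows,
   while the comment keeps the page valid JavaScript. */
var editSummary = 'Tagged with [[Template:Example]]'; // hypothetical literal
var catLine = '[[Category:Pages using FooBar]]';      // hypothetical literal
console.log(editSummary, catLine);
```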

And not parsing the script as wikitext makes it possible to circumvent Extension:SpamBlacklist and possibly other anti-spam-extensions, as I pointed out in CR.

lambdav wrote:

JavaScript must be JavaScript. How many scripts are broken because they are parsed as wikitext?

If you want wiki parsing in scripts, you should file a feature request for a special comment or tag. We should not have to modify numerous scripts to add a <nowiki> tag.

Also, tracking script users like this is not good:
1 - Users who know JavaScript syntax won't include the link comment, so not all users are listed.
2 - It doesn't work for gadgets.
3 - Tracking users seems useful only for the total usage count, not for the user names.
You could request a new MediaWiki feature to get the total script usage count, instead of using links.

Not parsing the script as wikitext does NOT make it possible to circumvent Extension:SpamBlacklist, because the links are not links.

(In reply to comment #18)

How many javascript are broken because they are
parsed as wiki ?

You only need to add backslashes in the few cases that trigger PST, to prevent the script from being broken:

  • var str = "{\{subst:template}}";
  • var str = "\~\~\~\~";

And what else?

lambdav wrote:

No, putting in backslashes is not JavaScript! Do you have a bot to make the replacement in all scripts and gadgets, on all wiki projects and in all languages?

The simplest way to have parsed scripts is to use another extension, like .jsw instead of .js (.cssw instead of .css), because such pages aren't true JavaScript (or stylesheets).

(In reply to comment #20)

No, putting backslash is not javascript !

Are you listening to yourself? Using a backslash in strings to escape particular characters so they aren't incorrectly parsed is done by just about everybody who writes JS, including those not writing it on MediaWiki installations - it's the simplest and most obvious way to prevent unwanted parsing by the JS engine running the script. Using the backslash to prevent unwanted parsing by MediaWiki is a natural and obvious extension of this built-in syntax without any downsides; every JS programmer immediately recognises what's going on when they see the backslash, even if they don't understand the reasoning for its being there. How exactly, then, is "putting backslash ... not javascript"?

(In reply to comment #18)

3 - Tracking of users seems only useful for the total usage count only, not for
the user names.

Michael M. gave an explicit counterexample to this in comment 17: 'Just this week I updated one of my scripts in a not-100-%-backwardscompatible way, so I had to inform the users of my script about the changes. With the "What links here" function this was no problem.' If the script tracking only listed the number of users with the script installed, that notification would have been impossible.

lambdav wrote:

(In reply to comment #21)

(In reply to comment #20)

No, putting backslash is not javascript !

Are you listening to yourself? Using a backslash in strings to escape
particular characters so they aren't incorrectly parsed is done by just about
everybody who writes JS, including those not writing it on MediaWiki
installations - it's the simplest and most obvious way to prevent unwanted
parsing by the JS engine running the script. Using the backslash to prevent
unwanted parsing by MediaWiki is a natural and obvious extension of this
built-in syntax without any downsides; every JS programmer immediately
recognises what's going on when they see the backslash, even if they don't
understand the reasoning for its being there. How exactly, then, is "putting
backslash ... not javascript"?

I mean that the following is valid JavaScript:

var str = "{{subst:template}}";

Why should this code need escaping at all, when JavaScript is expected?

(In reply to comment #18)

3 - Tracking of users seems only useful for the total usage count only, not for
the user names.

Michael M. gave an explicit counterexample to this in comment 17: 'Just this
week I updated one of my scripts in a not-100-%-backwardscompatible way, so I
had to inform the users of my script about the changes. With the "What links
here" function this was no problem.' If the script tracking only listed the
number of users with the script installed, that notification would have been
impossible.

Then use a dedicated MediaWiki feature instead of a special wiki+JavaScript syntax.
Wiki pages are wiki; JavaScript pages should be JavaScript.

Is there someone using {{subst: to do strange things such as version counting?

If no, I guess it's safe to stop doing PST for .css/.js.

Hmm, there are various pages telling people to add {{subst:something}} to their <skinname-or-common>.js to install a user script.

(In reply to comment #24)

Hmm there're various pages telling people to add {{subst:something}} to their
<skinname-or-common>.js to install a user script.

I think this could be replaced by a link in the script documentation page:
http://en.wikipedia.org/w/index.php?title=Special:MyPage/common.js&action=edit&withJS=MediaWiki:Example.js
where [[MediaWiki:Example.js]] would have a simple code which adds the script to the end of the user's common.js page.

This would be a script-only solution for the installation of scripts.
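A minimal sketch of what such an installer page might do (the withJS loading mechanism is as described in the comment above; the helper name and page names are hypothetical). On the classic edit form it would simply append an importScript() call to the edit box:

```javascript
// Hypothetical helper: append an importScript() line for the given script
// page to the current contents of a user's common.js text.
function appendInstallLine(commonJsText, scriptPage) {
    return commonJsText + "\nimportScript('" + scriptPage + "');";
}

// On the classic edit form this would be wired up roughly like:
//   document.editform.wpTextbox1.value =
//       appendInstallLine(document.editform.wpTextbox1.value,
//                         'User:Example/foobar.js');
```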

There is also the following for helping users to install scripts:

  • [[User:Gary King/script installer source.js]]

(In reply to comment #2)

Better tracking would be good, yes. But another use that we should consider is
that people sometimes slap deletion templates on their .css and .js pages, and
it works. Without that, people would be forced to make such deletion requests
on some other page.

Well, it doesn't work as expected: the deletion category is not shown on the script page, so users are already forced to find some admin to delete their scripts, because the template doesn't seem to work (and it also causes JS errors if it is not put /* inside a comment */).

(In reply to comment #19)

You only need to add backslashes in a few cases which triggers PST to prevent
the script being broken:

  • var str = "{\{subst:template}}";
  • var str = "\~\~\~\~";

Users who try to use JSHint [1] to validate code which uses this will get a "Bad escapement" error.

Although they can just change it to

var str = "{" + "{subst:template}}";
var str = "~~" + "~~";

to avoid the errors, having to fix these sequences of characters, which have special meaning in wiki markup, every single time is a PITA. Valid JavaScript code should not have unexpected results when inside a wiki page.

[1] The tool http://www.jshint.com/ is recommended by the developers, at [[mw:Manual:Coding_conventions/JavaScript#Performance_and_best_practices]]

(In reply to comment #26)

(In reply to comment #2)

Better tracking would be good, yes. But another use that we should consider is
that people sometimes slap deletion templates on their .css and .js pages, and
it works. Without that, people would be forced to make such deletion requests
on some other page.

Well, it doesn't work as expected: the deletion category is not shown on the
script page, so users are already forced to find some admin to delete their
scripts, because the template doesn't seem to work (and it also causes JS errors
if it is not put /* inside a comment */).

(Keeping in mind that I have almost no technical understanding of how the parsing infrastructure currently works) The simplest (and most naive) method of fixing that would be to just parse CSS/JS to find comments, and send the content of each comment on to the parser proper. Of course, that opens up one massive can of worms as to how we'd want to deal with markup that does just about anything except displaying text with simple formatting and links (what, for example, would be the preferred method of handling lists? tables? images?), and it would also break the ability to copy CSS/JS directly while viewing the relevant page (since you wouldn't get the parsed markup in comments)... But then, I *did* say it would be a naive method.

One more thing which may break when fixing this bug is the hack used on [[Template:Selfsubst/now string]] to produce an auto updating string (e.g. for version of scripts).

(In reply to comment #18)

Not parsing the script as wikitext DO NOT make possible to circumvent
Extension:SpamBlacklist because links are not links.

As I pointed out in r105664#c27321, it DOES make it possible: when you transclude a .js/.css page in a normal page, the links are links again, even if they aren't rendered as links on the .js page.

Just now a user reported an issue at the zhwiki village pump that an edit on a user JS page triggered AbuseFilter because of "buggy" template usage.

lambdav wrote:

Transcluding a .js/.css page in a normal page (which I would never do; just putting a link to the .js/.css page is better) should not be rendered like any other template, but with JavaScript syntax highlighting and without any link parsing (like viewing the .js/.css page directly).

I reverted the fix from r105664 and I'm marking this bug WONTFIX because:

  • Concerns raised on this bug report indicate a lack of consensus and the potential for disruption on deployment.
  • Precaution leads me to favour the existing behaviour over the new proposal in cases where the best behaviour is unclear.
  • A better solution exists, which would satisfy people on both sides of this debate: registering links only where they occur in comments, along the lines of bug 10410.

(In reply to comment #29)

One more thing which may break when fixing this bug is the hack used on
[[Template:Selfsubst/now string]] to produce an auto updating string (e.g. for
version of scripts).

Removing the pre-save transform was not requested here or implemented in either of the proposed fixes, so the selfsubst templates would have still worked.

lambdav wrote:

Revert to MW 1.17 instead.
MW 1.18 caused multiple problems.

What about the {{subst:}} problem?

(In reply to comment #34)

Revert to MW 1.17 instead.
MW 1.18 caused multiple problems.

I'm not aware of any difference between MW 1.17 and MW 1.18 in the way it handles CSS/JS page parsing. Please file a separate bug.

What about the {{subst:}} problem ?

You can file a separate bug for that. But the discussion here indicates that it would probably be a WONTFIX also since subst is desired by some.

lambdav wrote:

(In reply to comment #35)

(In reply to comment #34)

Revert to MW 1.17 instead.
MW 1.18 caused multiple problems.

I'm not aware of any difference between MW 1.17 and MW 1.18 in the way it
handles CSS/JS page parsing. Please file a separate bug.

What about the {{subst:}} problem ?

You can file a separate bug for that. But the discussion here indicates that it
would probably be a WONTFIX also since subst is desired by some.

See my previous comment: I already opened bug 32450, but it was marked as a duplicate of this one.

The only conclusion I see is that the problems won't be fixed, so I doubt that reporting bugs is useful here...

(In reply to LordAndrew from comment #2)
...and {{delete}} and [[Category:]] do not work on JS pages anymore, per T70757 (edit: now fixed).

Perhaps a // __NOSCAN__ comment at the top of the JS/CSS could be used to skip the parse/save step.

This is somewhat separate from skipping PST expansion of the text, which could either be a separate magic comment or parameter to EditPage.
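A detection routine for such a marker could be as simple as the following sketch (purely illustrative — __NOSCAN__ is only a proposal in this comment, not an implemented MediaWiki feature):

```javascript
// Return true if the page text opts out of link-table scanning via a
// leading "// __NOSCAN__" comment (hypothetical marker, not implemented).
// A CSS variant would presumably look for "/* __NOSCAN__ */" instead.
function hasNoScanMarker(pageText) {
    return /^\s*\/\/\s*__NOSCAN__/.test(pageText);
}

console.log(hasNoScanMarker('// __NOSCAN__\nvar x = 1;')); // true
console.log(hasNoScanMarker('var x = 1;'));                // false
```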