
Add robot policy for each namespace to the dumps
Open, Medium, Public

Description

For individual pages, the __NOINDEX__ and __INDEX__ page properties (stored in page_props.sql in lowercase, without the underscores) can be used to determine per-page overrides.

However, the baseline robot policy for each namespace should also be dumped. For each namespace, this can be determined by starting with [[mw:Manual:$wgDefaultRobotPolicy]] and then overriding it with [[mw:Manual:$wgNamespaceRobotPolicies]]. For convenience (it's not a significant storage cost since there are generally not many namespaces), it should state the policy for each namespace, even the ones that simply inherit $wgDefaultRobotPolicy.
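The lookup described above can be sketched as follows. This is only an illustration, not MediaWiki code: the function name, the namespace IDs, and the policy values are hypothetical stand-ins, with `default_policy` playing the role of $wgDefaultRobotPolicy and `namespace_overrides` the role of $wgNamespaceRobotPolicies (keyed by namespace ID).

```python
def effective_robot_policies(namespace_ids, default_policy, namespace_overrides):
    """Return the baseline robot policy for every namespace.

    Each namespace starts from the site-wide default and is replaced by
    its per-namespace override when one exists. Every namespace gets an
    explicit entry, including those that simply inherit the default.
    """
    return {
        ns_id: namespace_overrides.get(ns_id, default_policy)
        for ns_id in namespace_ids
    }


# Hypothetical example configuration (values are illustrative only).
policies = effective_robot_policies(
    namespace_ids=[0, 1, 2, 3],
    default_policy="index,follow",
    namespace_overrides={2: "noindex,nofollow", 3: "noindex,nofollow"},
)
```

Listing every namespace explicitly means a consumer of the dump never has to know the default in order to resolve a namespace's policy.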

I think it would be simplest to just use robotpolicy="noindex,nofollow" (or whatever the actual policy is) on each <namespace> element, since that's the format used in the HTML output and the configuration variables.
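A rough sketch of what the proposed attribute could look like when written and read back. The `robotpolicy` attribute does not exist in the export format today; it is the proposal itself. The helper names are hypothetical, and the `key` and `case` attributes are assumed to match the existing <namespace> element.

```python
import xml.etree.ElementTree as ET


def namespace_element(key, name, policy, case="first-letter"):
    """Build a <namespace> element carrying the proposed robotpolicy attribute."""
    elem = ET.Element(
        "namespace",
        {"key": str(key), "case": case, "robotpolicy": policy},
    )
    if name:
        # The main namespace has an empty name, so it gets no text node.
        elem.text = name
    return elem


def read_policy(xml_text):
    """Parse a serialized <namespace> element and return (key, policy)."""
    elem = ET.fromstring(xml_text)
    return int(elem.get("key")), elem.get("robotpolicy")


# Serialize a hypothetical Talk namespace, then read the policy back.
talk_xml = ET.tostring(
    namespace_element(1, "Talk", "noindex,nofollow"), encoding="unicode"
)
talk_key, talk_policy = read_policy(talk_xml)
```

Keeping the value in the same comma-separated form as the configuration variables means a consumer can pass it through unchanged.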


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=58758

Details

Reference
bz58805

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:16 AM
bzimport set Reference to bz58805.
Nemo_bis lowered the priority of this task from Medium to Low. Apr 9 2015, 7:30 AM
Nemo_bis subscribed.

> I think it would be simplest to just use robotpolicy="noindex,nofollow" (or whatever the actual policy is) on each <namespace> element, since that's the format used in the HTML output and the configuration variables.

That would be a MediaWiki bug. I think here you just really want to dump the siteinfo. Cf. https://github.com/WikiTeam/wikiteam/blob/2b78bfb795f2063e8f95beecef221e6494e57f61/dumpgenerator.py#L1783
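For illustration, a siteinfo request for namespace data can be built against the standard MediaWiki action API as below. The endpoint URL and the helper name are assumptions made for this example, and nothing here establishes that the response includes the robot policy; the sketch only constructs the URL and performs no network request.

```python
from urllib.parse import urlencode


def siteinfo_namespaces_url(api_endpoint):
    """Build an action API URL asking siteinfo for the namespace list."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces",
        "format": "json",
    }
    return api_endpoint + "?" + urlencode(params)


# Assumed endpoint, used only as an example.
url = siteinfo_namespaces_url("https://www.mediawiki.org/w/api.php")
```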

Mattflaschen-WMF set Security to None.

> I think it would be simplest to just use robotpolicy="noindex,nofollow" (or whatever the actual policy is) on each <namespace> element, since that's the format used in the HTML output and the configuration variables.

> That would be a MediaWiki bug.

What would be a MediaWiki bug? I'm asking to change the behavior of dump generation, and I've filed the bug in the component for that.

Mattflaschen-WMF raised the priority of this task from Low to Medium. Apr 15 2015, 4:02 AM

> What would be a MediaWiki bug?

The XML output, like the <namespace> tag you're asking about, is generated by MediaWiki core. https://www.mediawiki.org/wiki/Special:Export/Sandbox
I think that parsing <namespaces> in the XML is the old, deprecated way to access this information. In any case, if this is wanted, it should be done in MediaWiki's export code, and it would then apply to Special:Export and everything else.

Aklapper added subscribers: ArielGlenn, Aklapper.

@ArielGlenn: Hi, I'm resetting the task assignee due to inactivity. Please feel free to reclaim this task if you plan to work on this - it would be welcome! Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for more information - thanks!