
"Create new topic" does not work with $wgFlowContentFormat = wikitext
Closed, Resolved, Public

Description

Author: bsitu

Description:
It throws an invalid workflow exception.


See T88908: Drop support for non-Parsoid configuration.


Version: master
Severity: normal

Details

Reference
bz70148

Event Timeline

bzimport raised the priority of this task to Low (Nov 22 2014, 3:35 AM).
bzimport set Reference to bz70148.
bzimport added a subscriber: Unknown Object (MLST).

https://www.mediawiki.org/wiki/Extension:Flow#Dependencies mentions this bug, and emphasizes that the HTML format with Parsoid is what the WMF uses and tests.

We could drop $wgFlowContentFormat = wikitext if it makes life easier...

In my specific use case I was able to switch to the HTML format at little cost or effort; but it seems to me that a hard dependency on Parsoid is going to severely restrict the number of non-WMF sites that can deploy this otherwise very valuable extension.

I don't get to set priorities on dev time, but I'm pretty sure that dropping wikitext markup would be a Bad Idea™

gerritadmin wrote:

Change 169711 had a related patch set uploaded by Matthias Mullie:
(bug 70148) Fix new topic creation with wikitext storage

https://gerrit.wikimedia.org/r/169711

> In my specific use case I was able to switch to the HTML format at little cost or effort; but it seems to me that a hard dependency on Parsoid is going to severely restrict the number of non-WMF sites that can deploy this otherwise very valuable extension.

> I don't get to set priorities on dev time, but I'm pretty sure that dropping wikitext markup would be a Bad Idea™

Sure. Medium and small-sized wikis may just want a better discussion system without being willing to install Parsoid or VisualEditor.

MediaWiki is moving to a service-oriented architecture (SOA), and Parsoid will likely be one of several services wikis require for various features. The best solution is going to be working with the SOA team to ensure that making services available works for your use case.

> MediaWiki is moving to a service-oriented architecture (SOA), and Parsoid will likely be one of several services wikis require for various features. The best solution is going to be working with the SOA team to ensure that making services available works for your use case.

The effort in librarization is valuable, but it seems that just running a simple MediaWiki is going to get harder and harder...

> In my specific use case I was able to switch to the HTML format at little cost or effort; but it seems to me that a hard dependency on Parsoid is going to severely restrict the number of non-WMF sites that can deploy this otherwise very valuable extension.

> I don't get to set priorities on dev time, but I'm pretty sure that dropping wikitext markup would be a Bad Idea™

> Sure. Medium and small-sized wikis may just want a better discussion system without being willing to install Parsoid or VisualEditor.

The problem is, no one is testing the non-Parsoid configuration much. As far as I know, there is no automated browser testing for that configuration. And I don't even know of anyone regularly using wikis with the non-Parsoid configuration and reporting back (informally or formally).

It's better to have one well-tested path than two paths, one of which sorta works but has random breakage that doesn't immediately get caught. Two paths also mean maintaining extra code.

That's why we are leaning towards eliminating it (T88908: Drop support for non-Parsoid configuration).

I strongly disagree with dropping non-Parsoid support. Parsoid is incredibly costly. While I would love to switch my LQT-enabled wiki to Flow, I have no desire to install costly backend software to do so, nor would many smaller non-WMF wikis with more limited resources. If it means Flow performs less optimally, I would consider that an acceptable compromise to retain non-Parsoid functionality.

LQT was thrown under the bus, more or less, because the WMF found Flow to be a better long-term solution. I would rather the WMF take more time to ensure non-Parsoid fallback compatibility (which is advantageous to the WMF if Parsoid fails for some reason) than eliminate any failsafes and disadvantage other wikis wishing to use Flow.

> I strongly disagree with dropping non-Parsoid support. Parsoid is incredibly costly. While I would love to switch my LQT-enabled wiki to Flow, I have no desire to install costly backend software to do so, nor would many smaller non-WMF wikis with more limited resources. If it means Flow performs less optimally, I would consider that an acceptable compromise to retain non-Parsoid functionality.

Costly in what way, exactly?

  • Requires a lot of man-hours to install?
  • Difficult to maintain?
  • Requires high-end services or equipment to run?

Maybe there's a way Parsoid could be easier for you to use.

The main reason we want to remove support for non-Parsoid is that it is costly (mainly in software engineer time) to maintain two versions of key parts of the code.

In T72148#1205608, @Mattflaschen wrote:

>> I strongly disagree with dropping non-Parsoid support. Parsoid is incredibly costly. While I would love to switch my LQT-enabled wiki to Flow, I have no desire to install costly backend software to do so, nor would many smaller non-WMF wikis with more limited resources. If it means Flow performs less optimally, I would consider that an acceptable compromise to retain non-Parsoid functionality.

> Costly in what way, exactly?

>   • Requires a lot of man-hours to install?
>   • Difficult to maintain?
>   • Requires high-end services or equipment to run?

> Maybe there's a way Parsoid could be easier for you to use.

> The main reason we want to remove support for non-Parsoid is that it is costly (mainly in software engineer time) to maintain two versions of key parts of the code.

Parsoid is extremely resource-intensive, even on small setups. The WMF may be able to easily ignore this, but many smaller non-WMF setups would not be able to. I have to confess I'm greatly interested in introducing Flow to the Orain wiki farm (where I serve as a staff member), but I'd rather we not have to acquire many more resources than we currently use in order to support Flow.

I have also tested Parsoid on a localhost configuration and found that it consumed over half a gigabyte of resources just to function (on Windows; I have not yet fully evaluated this on Linux). Given this resource drain on a small scale, I'm not sanguine about using Parsoid on a wider scale.

Also, while I understand that keeping two sets of code around is indeed costly to the WMF, I do not see the harm in maintaining such code if it means the extension can be used on setups without Parsoid. To be blunt, the WMF provided MediaWiki to the world for everyone to use. I understand the WMF wants to tailor this software to fit its current needs, but it could be of great benefit to non-WMF users, and I see no good reason to throw those use cases under the bus just to make one organization happy.

Also, since Flow is intended to succeed LiquidThreads, and many LQT users on non-WMF setups may wish to migrate to Flow like my organization, I don't see the point in making that migration harder by forcing them to add backend software that consumes a ton of overhead just to use the successor to LQT.

I definitely see both sides of the argument here. I agree that, in most cases, maintaining two versions of key parts of a single piece of software is quite costly. However, as a small wiki operator myself, I feel that restricting Flow to Parsoid-only would only hurt wide adoption, because at this point Parsoid is still rather experimental (definitely at least "Beta" software).

With a tool like VisualEditor, it's obvious why Parsoid needs to be required, but I don't see why Flow by itself needs to have access to the powerful parsing tools Parsoid provides, especially given that Parsoid is still experimental and is rapidly changing. At least until Parsoid becomes stable and ready for mass release, I would say that giving sysadmins the opportunity to be able to use Flow without needing Parsoid is important, even if it does increase the cost of Flow development.

matthiasmullie claimed this task.
matthiasmullie subscribed.

This issue has been fixed for a while now. AFAIK, everything currently works with wikitext storage.

I'm changing my token to make clear that my opinion is the same as @Arcane21's.

> Parsoid is extremely resource-intensive, even on small setups. The WMF may be able to easily ignore this, but many smaller non-WMF setups would not be able to. I have to confess I'm greatly interested in introducing Flow to the Orain wiki farm (where I serve as a staff member), but I'd rather we not have to acquire many more resources than we currently use in order to support Flow.

To determine whether it's overly resource-intensive, you would have to compare it to other software that does the same thing: bidirectional conversion between editable HTML and wikitext.

The reason node.js was used for Parsoid is that investigation showed it was far more performant than the same thing would be in PHP. There's no identical implementation of the current Parsoid in PHP (since no one chose to write one), but that conclusion has almost certainly not changed.

> With a tool like VisualEditor, it's obvious why Parsoid needs to be required, but I don't see why Flow by itself needs to have access to the powerful parsing tools Parsoid provides, especially given that Parsoid is still experimental and is rapidly changing.

Though VisualEditor is not required (either for wiki operators installing Flow or for individual Flow end users), VE is a key part of the vision for Flow (that's why we use Parsoid to begin with). That means, for example, we now use VE for previewing (so no preview without Parsoid).

In T72148#1230034, @Mattflaschen wrote:

>> Parsoid is extremely resource-intensive, even on small setups. The WMF may be able to easily ignore this, but many smaller non-WMF setups would not be able to. I have to confess I'm greatly interested in introducing Flow to the Orain wiki farm (where I serve as a staff member), but I'd rather we not have to acquire many more resources than we currently use in order to support Flow.

> To determine whether it's overly resource-intensive, you would have to compare it to other software that does the same thing: bidirectional conversion between editable HTML and wikitext.

> The reason node.js was used for Parsoid is that investigation showed it was far more performant than the same thing would be in PHP. There's no identical implementation of the current Parsoid in PHP (since no one chose to write one), but that conclusion has almost certainly not changed.

>> With a tool like VisualEditor, it's obvious why Parsoid needs to be required, but I don't see why Flow by itself needs to have access to the powerful parsing tools Parsoid provides, especially given that Parsoid is still experimental and is rapidly changing.

> Though VisualEditor is not required (either for wiki operators installing Flow or for individual Flow end users), VE is a key part of the vision for Flow (that's why we use Parsoid to begin with). That means, for example, we now use VE for previewing (so no preview without Parsoid).

As to the first issue, I understand that, but we don't WANT, need, or require VE at this time, just Flow, mostly because we don't want to be shackled to Parsoid, which we discovered would not scale well to our current resource infrastructure. The WMF can easily use it because frankly they've got resources to burn and then some, but Parsoid (the service backend) is costly to maintain in terms of memory and processor cycles. It may be a magical engine that solves a lot of the WMF's problems, and it works great for VE and the WMF, which I don't dispute, but otherwise it's a resource hog. We, and I'm certain almost every smaller userbase with far fewer resources to burn, do not want to be forced to use something that only the WMF is happy with.

Forgive me if this sounds harsh, but the WMF seems to be proceeding on the somewhat arrogant assumption that if they like something, the rest of the world should easily be able to adapt and emulate it if they want the same, even though the resources needed to maintain it aren't as plentiful on setups far smaller than the WMF's.

Further, as you pointed out, Flow does not currently require it. VisualEditor needing Parsoid made sense from the get-go (given its intended design and purpose), but forcing Flow to conform to the same when it is not strictly required strikes me as little more than railroading non-WMF end users into using something they don't want simply because it makes WMF users happy.

As to the second rebuttal, it merely proves my first point: VE is the WMF's baby, and the WMF is intent on forcing everything that hooks into it to use the same standards and nothing else, despite the inconvenience this places on those who do not wish the added expense of Parsoid when the optional hook in feature does not strictly require it.

I understand the WMF wants to make their lives easier, and as a staff member on a wiki server setup myself, I'm sympathetic from an engineering standpoint. However, as an end user of MediaWiki and someone who has to help install, configure, and adjust settings on our setup to work nicely with anything we add, I see this as an added expense in resources when, as you pointed out, it's not a strict requirement. On a personal note, adding Flow could be merely adding the extension and running the update script, without a backend that has no real reason to be manacled to an extension that was not designed to strictly need it in the first place. I see no reason to make people like me have to suffer Parsoid's presence at our expense when that could be avoided.

I respectfully remind the WMF that MediaWiki is global software, and that not all wiki farms and setups are as wealthy in processor and memory resources as the WMF. Since Parsoid is not needed for Flow (which has a negligible resource footprint without Parsoid in the tests I have run so far), I see no reason to force others to use Parsoid when they don't have to.

I'm not opposed to a dual-mode standard, in which setups with Parsoid installed can benefit from it; but if Parsoid is not installed, Flow should retain the non-Parsoid mode for those who frankly have no desire or intention to ever use or enable VisualEditor.

I think some assumptions here may be wrong: it's not our VE integration that makes Flow need Parsoid.

As you may know, Flow is (currently) able to store content in either wikitext or HTML.
In production, we chose to store data in HTML so that we need fewer resources: we don't have to keep recompiling the wikitext to HTML whenever the page needs to be displayed.
Parsoid is indeed resource-intensive, but less so than the alternative of converting all wikitext to HTML for every request.

In this scenario (storing HTML), we need Parsoid to make it possible to edit (in wikitext) things that have been stored in HTML: the plain old PHP parser can't convert HTML to wikitext.
Since we recently added VE integration, Flow also needs Parsoid to switch from VE back to the wikitext editor.
The reason we *need* Parsoid (in production) is not VE, but wikitext editing. Actually, we wouldn't need Parsoid if we *only* supported VE (we could just store VE's HTML output).

If you don't want to use Parsoid, you (currently) can, but:

  • you have to store content in wikitext & convert it to HTML for every request (this needs a lot more CPU, depending on the number of requests)
  • you can't use VE (since we can't convert the HTML back to wikitext)
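For reference, the storage mode is chosen in LocalSettings.php via $wgFlowContentFormat, the setting named in this task's title. The fragment below is a minimal sketch, assuming the setting takes the string values 'html' and 'wikitext' (the two storage formats discussed in this thread); Parsoid connection settings are deliberately left out because the thread does not name them.

```php
// LocalSettings.php fragment (sketch only).
// Assumption: $wgFlowContentFormat accepts the strings 'html' and 'wikitext',
// matching the two storage formats discussed in this task.

// Option 1: store posts as HTML (the WMF production setup).
// Displaying a post needs no re-parse, but editing in wikitext
// requires Parsoid to convert the stored HTML back to wikitext.
$wgFlowContentFormat = 'html';

// Option 2: store posts as wikitext (no Parsoid).
// The PHP parser converts wikitext to HTML on every request,
// which costs more CPU, and VisualEditor cannot be used.
// $wgFlowContentFormat = 'wikitext';
```

Only one of the two assignments should be active; the second is commented out here so the fragment reflects the production default described above.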

We currently still support wikitext storage/non-Parsoid (in part because we simply have to: Jenkins isn't set up to let us connect to Parsoid yet), although it's suboptimal (see above).
We must face that it's a burden to maintain & we just can't test everything all the time. Things may be broken from time to time.

We don't need Parsoid because we have resources to burn, or because we want to push VE; both of those would be easier without Parsoid, by compromising on other things.
We choose to use Parsoid because it is the best (only) solution to be more conservative on resources (store in html) & still support editing one's post in wikitext.

AFAIK, there are no concrete plans to abandon wikitext storage/non-Parsoid support yet.
I think we can put this discussion to rest at least until we can contact Parsoid from Jenkins, which may or may not take a while.
I'm also going to close T88908: Drop support for non-Parsoid configuration.