
Support for Parsoid-based extracts
Closed, Duplicate, Public

Description

This is important because the extracts are currently generated with the old API, which doesn't guarantee valid XHTML5. That prevents third-party developers working in less forgiving markup environments from fetching the extracts to build tools on top of them, e.g. https://github.com/waldyrious/primerpedia/issues/20

I'd work on this myself but I really don't have the background to understand the code in ApiQueryExtracts.php on my own (even after taking a look at VisualEditor's ApiVisualEditor.php to check an example implementation of a request to Parsoid, as suggested by Mark Traceur).


Version: unspecified
Severity: enhancement

Details

Reference
bz65169

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:18 AM
bzimport added a project: TextExtracts.
bzimport set Reference to bz65169.
bzimport added a subscriber: Unknown Object (MLST).
KLans_WMF set Security to None.
Jdlrobson subscribed.

I believe this is resolved?
Is this what you were asking for: https://en.wikipedia.org/api/rest_v1/page/summary/Bavaria ?

IIUC, that is not exactly what this request is about: currently I'm using a request like https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&indexpageids=true&format=json&generator=random&grnnamespace=0&format=json, which returns an HTML-formatted extract. The problem is that the HTML is not guaranteed to be well-formed, hence it fails when embedded in an XHTML page (or any other context where well-formed XML is required).
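To make the failure mode concrete, here is a minimal Python sketch (standard library only) that rebuilds the query above and checks whether a returned extract would survive an XML parser. The helper names `build_extract_url` and `is_well_formed_xml` are made up for illustration, and wrapping the fragment in a `<div>` is just one assumed way of giving it a single root element; this is not how TextExtracts or any client actually validates its output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url() -> str:
    """Rebuild the api.php request quoted above (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": "",          # flag parameter; sent here with an empty value
        "indexpageids": "true",
        "format": "json",
        "generator": "random",
        "grnnamespace": "0",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the HTML fragment parses as well-formed XML.

    The fragment is wrapped in a <div> so that it has a single root
    element, as XML requires.
    """
    try:
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return False
    return True
```

With this check, a fragment such as `<p>one<br>two</p>` is rejected (the `<br>` is never closed), which is the kind of markup an HTML browser accepts but an XHTML page does not.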

The REST API, in contrast, only returns plaintext extracts at the moment. I could use them, but that's a workaround (avoiding rich formatting altogether) rather than a fix.

Is there any way to use the REST API to get an extract including the (rendered, not wikicode) basic formatting used in the actual article? For example bold, italics, and perhaps links as well.

Thanks for clarifying, @waldyrious.
We are considering switching to HTML extracts for Page previews so watch this space.
I think this task is exactly the same as what we are talking about in T113094 and its subtasks?

> I think this task is exactly the same as what we are talking about in T113094 and its subtasks?

Yes, it looks like it, although the title/description of that issue is a little too broad for me to be comfortable calling this a dupe (particularly the line "we either need to write a bunch of tests and fix up TextExtracts or build a new Parsoid based api specifically for the purpose of Hovercards" -- this bug is specifically about the latter, due to the need for XML well-formedness).

TextExtracts could provide XML well-formedness via PHP (we have a library in MobileFrontend doing exactly that). Is there any other reason you specifically want this in REST?
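As a rough illustration of what "providing well-formedness" in a post-processing step could look like, here is a small Python sketch built on the standard `html.parser` module. It is not the MobileFrontend library mentioned above and makes no claim about how that library works; the class `XmlSerializer`, the function `to_well_formed`, and the short `VOID_TAGS` list are all assumptions made for this example. It closes unclosed elements, self-closes void elements, drops stray end tags, and re-escapes text, but it does not handle every case (duplicate attributes, for instance).

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag in HTML (partial list, assumed
# sufficient for simple extracts).
VOID_TAGS = {"br", "hr", "img", "wbr"}


class XmlSerializer(HTMLParser):
    """Re-serialize lenient HTML as well-formed XML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output chunks
        self.open_tags = []  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        # Boolean attributes have no value in HTML; XML requires one.
        attr_text = "".join(
            ' {}="{}"'.format(name, escape(name if value is None else value))
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append("<{}{}/>".format(tag, attr_text))
        else:
            self.out.append("<{}{}>".format(tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: drop it
        # Close everything opened since the matching start tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</{}>".format(top))
            if top == tag:
                break

    def handle_data(self, data):
        # Character references were already decoded; re-escape &, < and >.
        self.out.append(escape(data, quote=False))


def to_well_formed(fragment: str) -> str:
    parser = XmlSerializer()
    parser.feed(fragment)
    parser.close()
    # Close anything still open at the end of the fragment.
    while parser.open_tags:
        parser.out.append("</{}>".format(parser.open_tags.pop()))
    return "".join(parser.out)


repaired = to_well_formed("<p>one<br>two <b>bold")
# Raises xml.etree.ElementTree.ParseError if the repair did not work.
ET.fromstring("<div>" + repaired + "</div>")
```

The same idea applies regardless of where it runs (PHP inside TextExtracts, or a service behind a REST endpoint): parse leniently, then serialize strictly.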

I suspect we will end up creating a REST endpoint to solve the epic. I've included well-formed HTML as a requirement. Does this feel less broad now?

I need to be able to get the formatted extracts either via the old api.php or the new REST service. If a new endpoint is created providing such extracts with well-formed HTML, yes, that would work for me.

Thanks for the feedback! We'll hopefully be prioritising this soon as part of the page previews/Hovercards roll out... so watch this space!