
Make MediaWiki more RESTful
Closed, Resolved, Public

Description

I just started thinking about this when I saw bug #41836 breeze by on IRC, but we *really* don't have *any* RESTful interfaces available. We should consider changing that.

Things that should get better, off the top of my head:

  • Pages/articles should be GET'd, like they are now. ?action=* should probably be replaced by root-level directories like /pageinfo/* and /parse/*, also as GET queries.
  • Edits should be done via POST, but they should be done *at the same URI* as the page you're editing. So "GET /wiki/Barack_Obama" and "POST /wiki/Barack_Obama" mean exactly what you'd expect.
  • The API probably needs a lot of changing, but I'm not sure where to start.
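
To make the first two bullets concrete, here is a minimal sketch of "same URI, different verb" dispatch. Everything in it (the `PAGES` store, the handler names, the status codes chosen) is a hypothetical illustration and not how MediaWiki routes requests; it only shows that the HTTP method alone can select between reading and editing a page.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]

# Hypothetical in-memory store standing in for the wiki's page table.
PAGES: Dict[str, str] = {"Barack_Obama": "Original text"}


def get_page(title: str, body: str) -> Response:
    """GET: return the current text of the page (body is ignored)."""
    if title not in PAGES:
        return 404, "No such page"
    return 200, PAGES[title]


def post_page(title: str, body: str) -> Response:
    """POST: save the submitted text as the new version of the page."""
    PAGES[title] = body
    return 200, "Saved"


# One URI prefix, two verbs: the method alone selects read vs. edit.
ROUTES: Dict[Tuple[str, str], Callable[[str, str], Response]] = {
    ("GET", "wiki"): get_page,
    ("POST", "wiki"): post_page,
}


def dispatch(method: str, path: str, body: str = "") -> Response:
    """Route a request such as ("POST", "/wiki/Barack_Obama", text)."""
    prefix, _, title = path.strip("/").partition("/")
    if not title or prefix not in {p for _, p in ROUTES}:
        return 404, "No such resource"
    handler = ROUTES.get((method.upper(), prefix))
    if handler is None:
        return 405, "Method not allowed"
    return handler(title, body)


print(dispatch("GET", "/wiki/Barack_Obama"))
print(dispatch("POST", "/wiki/Barack_Obama", "Revised text"))
print(dispatch("GET", "/wiki/Barack_Obama"))
```

A real edit would also need to carry metadata such as the edit summary and base revision; the sketch leaves that out on purpose.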

These issues could probably all be separate bugs, with this as the tracker. This is decidedly a wishlist item, too, so I'll mark the priority as low and make sure it's an enhancement.

Also, this will change a lot of how the community interacts with MediaWiki, so it may need to be an extension, or at least configurable, at first. We'll absolutely need community consensus to enable this on wikipedias, and that will almost certainly take a long time. Maybe this is a project we could do incrementally.

Any thoughts welcome!


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=14123

Details

Reference
bz41837

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 12:46 AM)
bzimport set Reference to bz41837.
bzimport added a subscriber: Unknown Object (MLST).

Note, the above is just brainstorming, so you should feel free to replace my stupid ideas with your own, brilliant ones :)

federico wrote:

Hello Mark,

it's great to see interest around new API ideas!

There's a group of people that has started collecting notes and goals over at https://www.mediawiki.org/wiki/API/API_rewrite .

Although keeping the conversation on this topic here in the tracker is a possibility, it would benefit everyone interested or involved to continue it directly on the talk page attached to that documentation. What do you think?

I added a comment to the talk page, pointing here and saying I'd accept comments on the talk page as well. I'd prefer to keep track of the process on bugzilla, but we can also mirror it to mw.org periodically.

federico wrote:

Makes sense :)

Did you have a chance to go through the notes linked from the API Rewrite page yet?

https://www.mediawiki.org/wiki/API/API_rewrite/Kickoff_meeting

I see you're interested in making better use of the HTTP verbs and in less complex URLs; you should find a lot of familiar references in that document.

Also, have you read about ROA (Resource Oriented Architecture - http://www.infoq.com/articles/roa-rest-of-rest) before?

Interesting links both!

Note that this is more than only the API, though. RESTfulness could also be applied to the inner workings, like editing and submitting other forms. Through that process we can also become a little more equipped to change things into an AJAXy interface, as opposed to keeping around GET/POST interfaces only.

(we'll still need GET/POST interfaces for non-JavaScript browsers, but AJAX is helpful for those of us on modern browsers)

(In reply to comment #0)

  • The API probably needs a lot of changing, but I'm not sure where to start.

I don't think the API (as it is, anyway) is particularly amenable to being made RESTful; for example, how do you specify 500 page titles in a RESTful API like you can do with the API's titles=A|B|C|...? See bug 38716 for further discussion on that.

As for other stuff, how does a "RESTful" setup handle posting for page edits versus preview versus diff without requiring that the browser support JavaScript to swap out the form action?

Brad, the API stuff could use a lot of iterating, for sure. I think /api/pages/A,B,C might be legal, if not ideal. I'm not sure how to do that better, but we can always fall back to old-API in the edge cases.

As for edits/previews/diffs, POST /preview/<article> and POST /diff/<article> seems sane to me. And POSTing the actual article would be as easy as POST /wiki/<article>, of course.

(In reply to comment #7)

Brad, the API stuff could use a lot of iterating, for sure. I think
/api/pages/A,B,C might be legal, if not ideal. I'm not sure how to do that
better, but we can always fall back to old-API in the edge cases.

2 APIs is unlikely to be a very good idea, unless the functionality of one is entirely a superset of the other so it is implemented as a wrapper (but then, why bother?).

As for edits/previews/diffs, POST /preview/<article> and POST /diff/<article>
seems sane to me. And POSTing the actual article would be as easy as POST
/wiki/<article>, of course.

Exactly my question. With only HTML (no JavaScript), how do you convince the user's browser to post to /preview/<article> when you click one button in the form, /diff/<article> when you click another, and so on? Or is this another situation where you're having two parallel interfaces?

I'm not sure we are really interested in a REST interface. Remember that we
will still need metadata such as summaries, previous revision...

Also, proxies may block you, for example by not recognizing the method and
returning an error. Try to explain to your ISP/company that its proxy is
blocking a RESTful interface.

As for edits/previews/diffs, POST /preview/<article> and POST /diff/<article>
seems sane to me. And POSTing the actual article would be as easy as POST
/wiki/<article>, of course.

This is a headache on server config, with no clear benefit, since posting to index.php works equally well.
I thought you wanted something like PUT /wiki/Somearticle.

(and /diff/ is idempotent, so it should really be GET)

Brad, I agree about the 2 APIs problem. But I also think we have some brilliant minds that may be able to come up with a solution!

About the different POST actions, I think we already must access different URLs for the actions, unless I misunderstand the way the edit page works. I admit to only having looked at that code once or twice, but I think it's relatively similar.

Platonides, that's a good point, and we may need to mostly stick to GET and POST at the beginning. I'm sure there are other ways to solve that problem--I agree with Brad, that two APIs may not be a great idea, but it may solve some of these problems in the interim.

To your second comment, while index.php works, you're not actually asking for index.php, and you're certainly not modifying it. That's the benefit. Maybe it's so marginal at this point that we might not want to do it :)

/diff/article, however, would require that you POST your version of the code--or did I misunderstand what Brad was asking? It's not idempotent, though, because the response to POST /diff/Testing will be different on the second request if the second request includes a different text.

As for PUT /wiki/Somearticle, we'd want to overwrite an article that was already there, barring a dirty diff, so POST seems like the way to go. IIRC "PUT" means "add it, unless it exists, then throw this away". My HTTP dialects may be a little rusty, though :)

Broadly speaking, a good API should fulfill two requirements:

  1. It ought to be meaningful, intuitive, useful, etc. to human beings (the 'soft' requirement).
  2. It ought to be performant, stable, and secure (the 'hard' requirement).

We might be able to use our current API to make progress by decoupling the two requirements. We could build a thin wrapper around the current API: each request to the new API would get munged and re-written as a request to the old API. This won't be especially fast, but it would allow us to iterate on the soft requirement and experiment with different URL schemes and the like while relying on the security and stability of the current API.

Once we're satisfied that the new API meets the soft requirement (i.e., it's intuitive, useful, etc.) we could, in a piecemeal fashion, put real implementations behind the new API.
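
As a rough illustration of the thin-wrapper idea, the sketch below rewrites two REST-style paths into query strings for the existing API. The `/pages/` and `/parse/` prefixes and the `/w/api.php` location are assumptions made for the example; `action=query`, `prop=info`, `titles=`, `action=parse` and `page=` are parameters of the current Action API. Splitting on commas before percent-decoding lets a title that itself contains a comma be sent as `%2C`, which is one possible answer to the /api/pages/A,B,C question raised earlier.

```python
from urllib.parse import unquote, urlencode

# Assumed location of the existing API entry point.
OLD_API = "/w/api.php"


def rewrite(path: str) -> str:
    """Translate a hypothetical REST-style path into an old-API URL."""
    prefix, _, rest = path.strip("/").partition("/")
    if not rest:
        raise ValueError("missing resource name")
    if prefix == "pages":
        # Split first, decode second, so "%2C" inside a title survives.
        titles = [unquote(t) for t in rest.split(",")]
        params = {
            "action": "query",
            "prop": "info",
            "titles": "|".join(titles),
            "format": "json",
        }
    elif prefix == "parse":
        params = {"action": "parse", "page": unquote(rest), "format": "json"}
    else:
        raise ValueError("no mapping for /" + prefix + "/")
    return OLD_API + "?" + urlencode(params)


print(rewrite("/pages/A,B,C"))
print(rewrite("/parse/Main_Page"))
```

Because the wrapper only builds a URL for the old entry point, the URL scheme can be changed freely while permission checks and the rest of the request handling stay in the existing API.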

(In reply to comment #11)

About the different POST actions, I think we already must access different URLs
for the actions, unless I misunderstand the way the edit page works. I admit to
only having looked at that code once or twice, but I think it's relatively
similar.

Not exactly. HTML forms have specific support for knowing which of multiple submit buttons was clicked. Each button has a different name: the Save page button is "wpSave", the Preview button is wpPreview, and the Show changes button is "wpDiff". Depending on which button you click, that button's name gets included in the POST data along with the form text, edit summary, hidden fields, and so on.

The big difference here is that this is all handled by the browser, no scripting necessary.
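
A small sketch of what that looks like on the receiving end: the server inspects the POST body and picks the action from whichever button name is present. The button names wpSave, wpPreview and wpDiff come from the comment above; the function name, the other form fields, and the sample body are illustrative assumptions, not MediaWiki's actual handling code.

```python
from urllib.parse import parse_qs

# Button name -> action, checked in this order.
BUTTONS = (("wpSave", "save"), ("wpPreview", "preview"), ("wpDiff", "diff"))


def edit_action(form_body: str) -> str:
    """Return which edit-form action a urlencoded POST body requests."""
    fields = parse_qs(form_body, keep_blank_values=True)
    for button, action in BUTTONS:
        if button in fields:
            return action
    return "unknown"


# The browser includes only the button that was clicked.
body = "wpTextbox1=Hello+world&wpSummary=typo&wpPreview=Show+preview"
print(edit_action(body))
```

All three buttons post to the same form action, which is why no script is needed to switch URLs.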

/diff/1-2/article may always mean "compare versions one and two of this article". Or just put the revisions in the query string as we do now.

No, PUT means "place this in that location". It is normal that it replaces existing content. https://tools.ietf.org/html/rfc2616#section-9.6

A problem with less-common methods is that it's harder to hook to them in the server config, though.
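
To illustrate the PUT semantics described in that RFC section, here is a minimal sketch using a plain dictionary as the resource store. The `put` helper and the store are hypothetical; the 201/200 status codes follow RFC 2616 section 9.6 (201 when a new resource is created, 200 when an existing one is modified). Repeating the same PUT leaves the store in the same state, which is what makes the method idempotent.

```python
from typing import Dict, Tuple


def put(store: Dict[str, str], uri: str, body: str) -> Tuple[int, str]:
    """Place body at uri, replacing whatever was stored there before."""
    created = uri not in store
    store[uri] = body
    return (201 if created else 200), uri


store: Dict[str, str] = {}
print(put(store, "/wiki/Somearticle", "first draft"))   # creates the resource
print(put(store, "/wiki/Somearticle", "second draft"))  # replaces it
print(put(store, "/wiki/Somearticle", "second draft"))  # same request, same state
print(store)
```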

(In reply to comment #0)

  • Pages/articles should be GET'd, like they are now. ?action=* should probably
    be replaced by root-level directories like /pageinfo/* and /parse/*, also as
    GET queries.

I'm not sure if I understand correctly, but if you mean from the index.php side of things (not the API), we've had support for that for a long time. Compare http://wikitech.wikimedia.org/history/Main_Page with http://wikitech.wikimedia.org/edit/Main_Page, etc. Very few people use the feature, though.


/me not a fan of restful stuff in general, usually seems to be a solution in search of a problem.

I did some serious reading on "REST". Unfortunately it seems that "REST" is now used to refer to two completely different things.

The original REST. This is the REST that Roy Fielding defined in his doctoral dissertation. Its goal is NOT what you see in all these per-site/per-software proprietary APIs; it is intended for long-term things that work at the scale of the entire internet. Unfortunately, even after reading and understanding it, I'm not sure I could explain it that well. Anyway, no one who says they use REST is actually using REST. Better (but perhaps not perfect) examples of REST clients would be web browsers and Atom feed readers.

The second REST. This is the REST practically everyone is really talking about when they say they use REST. While it is in a way based on the original, it is so twisted, distorted, and defiled that the only similarities between the two amount to using HTTP properly (i.e., practicing things that are good for caching, and never using GET for things that trigger an action). The remaining parts of this REST are not just different from the original; many of them actively violate the core principles of actual REST.

And don't bother with our Wikipedia article. It basically tries to fuse both types of REST into one page as if they were one and the same. The result being a page that at best contradicts itself. And it's not something all that easy to fix.

Now the original REST is for internet scale things. The goal is different than what the goal of our API is. While implementing this type of REST would be a very admirable goal, it would have basically nothing to do with our API. And it would be a different kind of project.

As for the buzzword type of REST: it has limited use. There aren't really any advantages to it. Many of its supposed advantages, such as cacheability and statelessness, are really just advantages of using HTTP intelligently and have nothing to do with the specific REST patterns and restrictions. I.e., this type of REST is about as worthless as MVC frameworks are to the good practice of MVC.

And now. I should probably point out something. Switching from our query parameters to /pageinfo/... is not REST. The real original REST does not mandate url formats. And using a root like /pageinfo/ also violates the url patterns used in the mangled REST.


So the general closing point. Put some actual critical thought into each idea you have for the api and decide whether it's a good idea based on the pros and cons specific to it. Instead of clinging to the idea because some desecrated buzzword says you should do it.

federico wrote:

Hello guys,

sorry for not getting back to this earlier (these have been busy days) :)

I'd like to clarify an important detail: the new API proposal that Siebrand and I linked to is not necessarily about rewriting the current API. (One could argue that "API rewrite", the title of the page on MW.org, doesn't really help from this perspective; if that is also the wording in the kickoff notes, then those are inaccurate, as notes taken while people were brainstorming/discussing can often be.) It's about creating a new one with a totally different design for a "high REST" service ("internet scale REST", to use Daniel's words) using the ROA approach (REST, like OOP, is a set of design criteria, while ROA is an actual architectural design). Also, since REST is protocol agnostic, we're looking into alternate transport protocols in addition to HTTP(S), with the side goal/benefit of making MediaWiki a fully programmable (not-only-web) service.

Starting afresh will let us structure the whole thing with some built-in features that are not easy (at times almost impossible) to embed in the current API, such as authentication (OAuth), performance (caching, hiphop compliance), stability (versioning), accessibility (Service Description Language, automated docs based on PHPDoc, normalized URI's, data schema, existing REST client libraries), quality (unit tests) and re-usability (we're figuring out a way to use the API classes for developing MediaWiki extensions without the mediation of FauxRequest) just to mention a few.

Those have been recognized as improvements upon the current situation during the meeting between the Foundation and Wikia; of course there might be challenges ahead, but that's why we decided to cooperate and learn from each other's experiences.

During the same meeting a possible transition proposal was mentioned ("Legacy support for a while.... maybe support and keep updating the old one while building the new one. New endpoint, not backwards-compatible.", to use the exact wording from the notes). This doesn't fall into the domain of the RFC work we've started recently. There are benefits and problems in both keeping and deprecating the current API that should be analyzed from many different perspectives (e.g. breaking old clients, updating/refactoring code, authentication via keys, maintaining both versions, quotas, performance/caching, in no specific order or grouping), and overall that process is not presented as a goal.

TL;DR: a new API proposal doesn't necessarily mean a death sentence for MW's API; an RFC, when ready, will be published and will represent just the research work done by Wikia in cooperation with the Foundation to see where this idea can lead MediaWiki as a programmable service/platform.

All your feedback is greatly appreciated (especially what you think is good/bad in the current solution, what you would like to see being added/removed/done differently), this is all information that is extremely valuable at this time as it can help us in designing a better solution.

Now onto Mark's proposal of integrating a RESTful interface in the rest of the platform: the example of article creation/diff could be simple to address if we think of the Article class as being exposed as an addressable resource, and keep in mind that REST by no means excludes the use of parameters when they're necessary. Forms could use that resource directly via the REST entry point, with no need to modify any current behavior.

And now a small request: I know that some like to keep this in the tracker, but it would be great if we could move this conversation to the mediawiki-api mailing list (http://lists.wikimedia.org/pipermail/mediawiki-api/) where other API consumers and developers could join us in this engaging conversation.

(In reply to comment #18)

Hello guys,

sorry for not getting back to this earlier (those are some busy days) :)

I'd like to clarify on an important detail: the new API proposal me and
Siebrand linked to it's not necessarily about re-writing the current API (one
would argue that "API rewrite", the title of the page on MW.org, doesn't really
help from this perspective and if that is the wording also in the kickoff
notes, then those are inaccurate as notes taken while people was
brainstorming/discussing can usually be), it's about creating a new one using a
totally different design for an "high REST service" ("internet scale REST" to
use Daniel's words) using the ROA approach (since REST, like OOP, is a design
criteria and ROA is an actual architectural design), just to say: since REST is
protocol agnostic, we're inquiring alternate transport protocols as an addition
to HTTP(S) too, with the side goal/benefit of making MediaWiki a fully
programmable (not-only-web) service.
[...]

Uhm, are you certain that you're actually talking about the original "internet scale REST" I talked about, not the modern buzzword REST?

Because the Kickoff page looks to be in complete conflict with that assertion.

The page links to references about buzzword REST and has piles of references to CRUD (PUT, DELETE, etc.), when CRUD has nothing to do with the original REST. The author of that REST even explained explicitly that PUT is not necessary; it can work perfectly fine with POST only.

federico wrote:

Daniel,

I understand your concerns and confusion. Please consider that those notes were taken by some participants in real time on Etherpad while the discussion, brainstorming included, was going on among a group of ~10 participants.

I wouldn't take that as a white paper for the proposal ;)

You can trust my word, as I'm currently the main developer of the proposal, or you can wait for the RFC to be made public once we complete the research.

If it was just about the "buzzword" you're mentioning (which BTW is called "low REST" or REST-RPC hybrid, the definition and the differences with "high REST" are perfectly depicted in "RESTful Web Services" by Leonard Richardson and Sam Ruby, pp 68-69, but there are many other interesting sources), we'd have gone for a simple wrapper around the current API.

Instead we're evaluating a real ROA (which is a real world architecture, not another marketing term) approach and protocol agnosticism which, to be perfectly executed, require us to write a new entry-point.

Brad, Ori has kindly loaned me "RESTful Web Services", and I'm about halfway through. I have to say that it looks like there are some cool new (well, "new" in the sense that IE8 is new) HTML5-y features that we can use to sidestep the issues you've brought up. Admittedly that's not a perfect solution, and it might not work out for Wikipedias (since we can't very well deploy changes to Wikipedias that would break e.g. IE6), but it's a start. I'll keep reading, and possibly play with some little changes, but of course this will be a longer process than a few patches, so bear with us :)

And while having two sets of URIs for a single API might be....less than good, we could certainly do that until things are more stable.

(In reply to comment #21)

Brad, Ori has kindly loaned me "RESTful Web Services", and I'm about halfway
through. I have to say that it looks like there are some cool new (well, "new"
in the sense that IE8 is new) HTML5-y features that we can use to sidestep the
issues you've brought up. Admittedly that's not a perfect solution, and it
might not work out for Wikipedias (since we can't very well deploy changes to
Wikipedias that would break e.g. IE6), but it's a start.

If these new features are essentially eye candy, I don't think it is a good idea to use such new technology just for eye candy.

Krinkle subscribed.

In addition to the Action API and the external RESTBase service, there's now also a REST API within MediaWiki core.

https://www.mediawiki.org/wiki/API:REST_API