Use the HTTP API in round-trip testing
Closed, Resolved, Public
Assigned To

Authored By

	ssastry
	Nov 4 2013, 10:37 PM

Description

Parsoid currently tests parsing and serialization of content. Testing of the HTTP API wrappers, along with any library issues and endpoint bugs, still relies on manual testing.

Gaps in manual testing can lead to incidents like this: https://wikitech.wikimedia.org/wiki/Incident_documentation/20131104-Parsoid

So, we need an automated test setup for our HTTP API endpoints.

Version: unspecified
Severity: normal

Details

Reference: bz56590

Event Timeline

• bzimport raised the priority of this task to High. Nov 22 2014, 2:38 AM

• bzimport added a project: Parsoid-Tests.

• bzimport set Reference to bz56590.

ssastry created this task. Nov 4 2013, 10:37 PM

Additionally, we should consider converting our round-trip tests to use the web API as well.

I would very much like to create these tests, as long as beta labs and test2wiki would be targets.

If there is anything I can do in Jenkins, let me know.

(In reply to comment #2)

I would very much like to create these tests, as long as beta labs and
test2wiki would be targets.

As a first step we are considering tests that exercise the HTTP API in a way similar to the way VE does (including selser mode):

• on each commit, on a selection of pages
• in mass round-trip testing on a variety of content (see also bug 56601)
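
A minimal sketch of what such a round-trip check through an API client could look like. The `client.wt2html` and `client.html2wt` names are placeholders for illustration, not Parsoid's actual routes, and the stand-in client only handles bold markup so the example runs without a network. A real client would issue asynchronous HTTP requests instead.

```javascript
// Hypothetical round-trip check: wikitext -> HTML -> wikitext through an
// injected client. The method names are assumptions, not Parsoid's API.
function roundTrip(client, wikitext) {
  const html = client.wt2html(wikitext);
  const result = client.html2wt(html);
  return { html: html, wikitext: result, identical: result === wikitext };
}

// Stand-in client (assumption): converts only '''bold''' so the sketch
// can run offline. It is not a wikitext parser.
const fakeClient = {
  wt2html: function (wt) {
    return '<p>' + wt.replace(/'''(.*?)'''/g, function (_, inner) {
      return '<b>' + inner + '</b>';
    }) + '</p>';
  },
  html2wt: function (html) {
    return html
      .replace(/^<p>|<\/p>$/g, '')
      .replace(/<b>(.*?)<\/b>/g, function (_, inner) {
        return "'''" + inner + "'''";
      });
  }
};

const report = roundTrip(fakeClient, "Hello '''world'''");
console.log(report.identical);
```

Injecting the client keeps the comparison logic independent of the transport, so the same check could run against a local instance or a remote one.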

In addition it would be great to test the full Parsoid + Varnish setup as it is in production to catch caching issues. This sounds like a good fit for betalabs.

For this, we could use:

• the HTTP client we'll get out of moving our rt testing to use the API, and
• browser testing using the VE to test the full stack from browser through VE extension to the Parsoid cluster.

What is the HTTP endpoint for this API? The explicit URI(s)?

Also, I submitted a browser test for a utf8 string, I'll refine this a bit more today. https://gerrit.wikimedia.org/r/#/c/93597/

(In reply to comment #5)

What is the HTTP endpoint for this API? The explicit URI(s)?

https://www.mediawiki.org/wiki/Parsoid#The_Parsoid_web_API

Divided into two tasks:

• This bug will track changing the round-trip test client to use the HTTP API, so that the tests exercise code closer to what actual clients use.
• Bug #56730 tracks the (future) development of unit tests of Parsoid's HTTP API.

Change 97733 had a related patch set uploaded by Marcoil:
Bug 56590: Use the HTTP API in round-trip testing

https://gerrit.wikimedia.org/r/97733

Problems when testing this patch:

When testing with a local Parsoid instance, it compares the two wikitext versions and gives a correct number of syntactic and semantic diffs, although the number sometimes differs from the one given by direct (no HTTP API) testing. Most of the time the difference comes down to a final '\n' when testing through HTTP. Could this be due to different selser and editMode settings?
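
One way to tell these cases apart while investigating would be to classify each mismatch before counting it. This is only a sketch: the function names are made up for illustration and are not part of roundtrip-test.js, and it does not explain where the extra '\n' comes from.

```javascript
// Strip trailing newlines only; interior whitespace stays significant.
function stripTrailingNewlines(text) {
  return text.replace(/\n+$/, '');
}

// Classify a round-trip result as identical, differing only in trailing
// newlines, or genuinely different. Hypothetical helper, for triage only.
function classifyDifference(original, roundTripped) {
  if (original === roundTripped) {
    return 'identical';
  }
  if (stripTrailingNewlines(original) === stripTrailingNewlines(roundTripped)) {
    return 'trailing-newline-only';
  }
  return 'different';
}

console.log(classifyDifference('== Heading ==\ntext\n', '== Heading ==\ntext'));
```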

When testing from my local machine, but using http://parsoid.wmflabs.org/ as the HTTP API URL, html2wt always returns the same wikitext that was fetched with TemplateRequest::one, which results in 0 diffs. Caching?

node roundtrip-test.js --parsoidURL http://parsoid.wmflabs.org/ --prefix enwiki Barack_Obama

gives this error:
ERROR: Error: request entity too large

at IncomingMessage.onData (/data/project/parsoid/js/node_modules/express/node_modules/connect/node_modules/raw-body/index.js:40:17)
at IncomingMessage.EventEmitter.emit (events.js:88:17)
at IncomingMessage._emitData (http.js:359:10)
at HTTPParser.parserOnBody [as onBody] (http.js:123:21)
at Socket.socket.ondata (http.js:1682:22)
at TCP.onread (net.js:404:27)

It doesn't pass Jenkins, see https://integration.wikimedia.org/ci/job/parsoid-roundtrip-test-check/1789/console

It seems that the roundtrip-test.js script can't connect to the parsoid HTTP API from the jenkins machine. We'll need to get a parsoid instance running on the jenkins test machine, using a random port.
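
For the random-port part, a rough sketch using only Node's built-in `http` module: asking the OS for port 0 makes it pick a free port, which can then be read back and passed to the test client. The `startOnRandomPort` helper and the trivial handler are assumptions for illustration. How the actual Parsoid server would be started this way is not shown here.

```javascript
const http = require('http');

// Start a server on an OS-assigned free port (port 0) and report the
// port that was actually chosen. Hypothetical helper, not Parsoid code.
function startOnRandomPort(handler, callback) {
  const server = http.createServer(handler);
  server.listen(0, '127.0.0.1', function () {
    callback(server, server.address().port);
  });
}

// Placeholder handler standing in for the real API server.
startOnRandomPort(function (req, res) {
  res.end('ok');
}, function (server, port) {
  console.log('listening on http://127.0.0.1:' + port + '/');
  server.close(); // a real test run would close after the tests finish
});
```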

Change 97733 abandoned by Marcoil:
Bug 56590: Use the HTTP API in round-trip testing

Reason:
This needs more work before it runs anywhere besides my local setup; a new patch will come soon.

https://gerrit.wikimedia.org/r/97733

(In reply to comment #9)
<snip>

It doesn't pass Jenkins, see

https://integration.wikimedia.org/ci/job/parsoid-roundtrip-test-check/1789/console
It seems that the roundtrip-test.js script can't connect to the parsoid HTTP
API from the jenkins machine. We'll need to get a parsoid instance running on
the jenkins test machine, using a random port.

That build ran on lanthanum.eqiad.wmnet, which does not have direct access to the internet. We can tie the job to a Jenkins slave that has internet access, though. It is 11pm right now so I will forget about it, but a bug filed against Wikimedia > Continuous integration would make sure it gets solved.

(In reply to comment #11)

(snip)
That build has been run on lanthanum.eqiad.wmnet which does not have direct
access to internet. We can tie the job to a Jenkins slave that has internet
access though. 11pm right now so I will forget about it, but a bug against
Wikimedia > Continuous integration would make sure it gets solved.

Thanks for the info, Antoine. Even though we'll be running an independent Parsoid API server when running roundtrip-test.js, that will need internet access to fetch wikitext from the MediaWiki API. I'll open that bug when the code is ready.

Change 99348 had a related patch set uploaded by Marcoil:
Bug 56590: Use the Parsoid HTTP API in round-trip testing

https://gerrit.wikimedia.org/r/99348

(In reply to comment #12)
<snip>

Thanks for the info, Antoine. Even though we'll be running an independent
Parsoid API server when running roundtrip-test.js, that will need internet
access to fetch wikitext from the MediaWiki API. I'll open that bug when the
code is ready.

Shouldn't you mock the API access? I mean, Parsoid could be fed fixtures with known article content and use those flat files instead of relying on the remote wiki. That would save the HTTP GET over the internet and ensure you are testing known values.

Of course, I don't think Parsoid supports that kind of injection (yet?).

So far, we've been doing some basic integration testing on a single page (en:Barack_Obama) in a Jenkins job after each commit. While we could fully mock up all API accesses, for large pages like en:Barack_Obama, this would require us to effectively do a capture and replay of all API accesses for those pages (since it is not just wikitext, but also templates, extensions, images, etc. -- some 100s to 1000s of API calls).

Scott does have some basic code for capturing these accesses and dumping them to a file and using that to do a replay (without relying on internet). But, that seems like extra complexity instead of relying on internet access if we really want to do full-page integration testing on a set of N (for small values of N, say 5) pages after each commit.
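
To make the capture-and-replay idea concrete, here is a small sketch. It is not Scott's code: every name in it is hypothetical, the fetch function is synchronous for brevity, and the store is a plain object that could be written out with JSON.stringify and loaded again for an offline run.

```javascript
// Build a stable key from request parameters, independent of key order.
function requestKey(params) {
  return Object.keys(params).sort().map(function (name) {
    return encodeURIComponent(name) + '=' + encodeURIComponent(String(params[name]));
  }).join('&');
}

// Wrap a fetch function so that every response is recorded in `store`.
function recording(fetchFn, store) {
  return function (params) {
    const response = fetchFn(params);
    store[requestKey(params)] = response;
    return response;
  };
}

// Serve responses from a recorded store; fail loudly on anything unseen.
function replaying(store) {
  return function (params) {
    const key = requestKey(params);
    if (!Object.prototype.hasOwnProperty.call(store, key)) {
      throw new Error('No recorded response for ' + key);
    }
    return store[key];
  };
}

// Stand-in for a live MediaWiki API call (assumption, no network used).
const liveFetch = function (params) {
  return 'wikitext of ' + params.title;
};

const store = {};
const record = recording(liveFetch, store);
record({ action: 'query', title: 'Barack_Obama' });

// Later, offline: replay from the serialized capture.
const replay = replaying(JSON.parse(JSON.stringify(store)));
console.log(replay({ title: 'Barack_Obama', action: 'query' }));
```

Even in this toy form, the capture has to cover every request a page triggers, which is the extra complexity in question.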

So, I think the real question is whether we should be doing full page testing in a Jenkins/CI job after each commit or not. If yes, it seems simpler to rely on HTTP.

All that said, we do have a mock MW API server for testing PHP extensions (bug 45440). See parsoid/js/tests/mockAPI.js. Currently, we use this for running parserTests only.

Change 102011 had a related patch set uploaded by GWicke:
Bug 56590: Use the Parsoid HTTP API in round-trip testing

https://gerrit.wikimedia.org/r/102011

Change 99348 abandoned by GWicke:
Bug 56590: Use the Parsoid HTTP API in round-trip testing

Reason:
Moved to https://gerrit.wikimedia.org/r/#/c/102011/