Create Selenium tests for "Download as PDF" feature
Closed, Declined (Public)

Description

We're not sure we can test directly for https://bugzilla.wikimedia.org/show_bug.cgi?id=45861, but PDF export is an area that breaks often and is a pain to fix after the fact, so a browser test would be good to have.


Version: unspecified
Severity: normal
Whiteboard: gci2013 https://www.mediawiki.org/wiki/Google_Code-In#Candidate_tasks
URL: https://www.google-melange.com/gci/task/view/google/gci2013/6466458871660544

Details

Reference
bz46224

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 1:23 AM)
bzimport set Reference to bz46224.

Volunteer karim_r started working on this.

Dave McNulla suggested an implementation idea: generate a PDF from a page, inspect it manually, and add it to version control. The next time the test runs, compare the newly generated PDF with the stored one and fail the test if they differ.

Related URL: https://gerrit.wikimedia.org/r/60890 (Gerrit Change I195a590844ebd1eee779cd2f3486c9e63035110d)

Related URL: https://gerrit.wikimedia.org/r/60898 (Gerrit Change Idec21c492871f4a55aaa0f1568971ccc7174cf1d)

Related URL: https://gerrit.wikimedia.org/r/61257 (Gerrit Change I4b3a96fd7eefb85a1ba402b22c8b6ad922360550)

How to test this manually:

  • go to http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
  • on the left-hand side, expand the "Print/export" section (if not already expanded)
  • click the "Download as PDF" link
  • a page with a "Download the file" link will open (eventually)
  • download the PDF file and check that it has the same data as the web page

How to test this with a script:

  • create a simple page
  • do the above steps with Selenium, but visit the simple page instead of the main page
  • since Selenium cannot inspect PDF files, find a Ruby PDF library[1] that can
  • when the file is downloaded, use the PDF library to inspect the PDF file
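
Before handing the downloaded file to a PDF library, the script could first confirm that a PDF arrived at all. The following is only a minimal sketch: `pdf_file?` is a hypothetical helper name, the download path is assumed to be known to the test, and the check relies solely on the fact that PDF files begin with the `%PDF-` marker.

```ruby
# Hypothetical helper (not part of any existing test suite): returns true
# if the file at `path` exists and starts with the PDF header "%PDF-".
def pdf_file?(path)
  return false unless File.file?(path)

  # Read only the first five bytes, in binary mode.
  File.open(path, 'rb') { |f| f.read(5) } == '%PDF-'
end
```

A check like this only tells the test that the download produced a PDF; comparing the content with the wiki page still needs a PDF library or a browser that renders the file.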

A simple page could contain:

  • nothing
  • some text
  • title
  • title and some text
  • image
  • title and image
  • image and text
  • title, text and image

...

The test should be executed for every simple page mentioned above (and maybe a few other pages).
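
The eight pages listed above are exactly the combinations of three building blocks (title, text, image), so the list of pages to test could be generated instead of written by hand. A minimal sketch, assuming the elements are represented as plain symbols; `page_variants` is a hypothetical helper name, not something from the existing tests.

```ruby
# Returns every combination of the given page elements, from the empty
# page up to a page that contains all of them.
def page_variants(elements = %i[title text image])
  (0..elements.size).flat_map { |n| elements.combination(n).to_a }
end

# Print one line per simple page the test would have to cover.
page_variants.each do |variant|
  label = variant.empty? ? 'nothing' : variant.join(', ')
  puts "simple page with: #{label}"
end
```

Adding a fourth element (for example a list) to the default array would double the number of generated pages without any other change.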

1: https://www.ruby-toolbox.com/categories/pdf_generation

Another solution, maybe a simpler one, would be to do the above steps manually, inspect the PDF files manually and save them to the test repository.

Then do the above steps with a script, but instead of using a PDF library to inspect the files, use a diff tool to check that the generated files are the same as the ones already saved in the repository.
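
The comparison step could be as small as the sketch below. It assumes the reference PDF is already saved in the repository and that the script knows where the browser stored the new download; `same_as_reference?` is a hypothetical helper name, and the comparison is a strict byte-for-byte one using `FileUtils.compare_file` from the Ruby standard library.

```ruby
require 'fileutils'

# Hypothetical helper: compares a freshly generated PDF with the reference
# copy saved in the repository, byte for byte.
def same_as_reference?(generated_path, reference_path)
  FileUtils.compare_file(generated_path, reference_path)
end
```

A byte-for-byte comparison is strict: if the PDF renderer embeds a creation date or similar metadata, two downloads of the same page would differ and the test would fail even though the visible content is unchanged. In that case the comparison would have to ignore such fields or compare extracted text instead.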

Change 98160 had a related patch set uploaded by Mayankmadan:
Added a test for downloading pdf from a random page

https://gerrit.wikimedia.org/r/98160

I have just tested it with Firefox: the downloaded PDF file is automatically opened by Firefox and rendered as an HTML page, so it can be inspected with Selenium. There is no need for a separate PDF parsing library.

This does not work the same way in Chrome, which uses a PDF plugin.

A few simple tests exist now: they check whether the PDF file downloads at all, and whether the text, title and image are the same on the wiki page and in the PDF file. We need more complex tests that check more page elements: ordered and unordered lists, links and similar elements.

Resolving as wontfix; I have no plans to work on this. Please reopen if you plan to work on it.