
Add more advanced time/date parsing functionality
Open, Medium, Public, Feature

Description

Textlib.py (https://git.wikimedia.org/blob/pywikibot%2Fcore.git/HEAD/pywikibot%2Ftextlib.py) is used to parse wikitext and output machine-readable objects. It currently contains some functionality for parsing times and dates.

See https://www.wikidata.org/wiki/User:Underlying_lk/harvest_template_old.py for more expanded functionality.


Version: core-(2.0)
Severity: enhancement

Details

Reference
bz64502

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:08 AM
bzimport set Reference to bz64502.
bzimport added a subscriber: Unknown Object (????).
jayvdb renamed this task from "Add more advanced time/date parsing functionality to textlib.py" to "Add more advanced time/date parsing functionality". Dec 3 2014, 9:15 AM
jayvdb set Security to None.
jayvdb removed a subscriber: Unknown Object (????).

I'm not convinced that textlib.py is the best spot for this. textlib parses the timestamps of edits, which occur in a site-specific timezone, have a very strict, parseable format, and occur in a year after 2001. That means the parsing can be based on the datetime datatype and doesn't need to worry about other calendar systems, etc.

On the other hand, data in wiki page templates is very messy and needs to end up as WbTime (not datetime).
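A minimal sketch of that contrast, using only the standard library. The format string is an assumption about how an edit timestamp looks on an English-language wiki and is not taken from textlib.py; `parse_edit_timestamp` and `fits_in_datetime` are hypothetical helper names. It only illustrates that datetime copes with the strict case but cannot represent every year that shows up in template data.

```python
from datetime import datetime, timezone, MINYEAR

# Assumed (illustrative) format of an edit timestamp; %B relies on the
# default C locale for English month names.
SIGNATURE_FORMAT = "%H:%M, %d %B %Y"


def parse_edit_timestamp(text):
    """Parse a strictly formatted edit timestamp; UTC is assumed here."""
    parsed = datetime.strptime(text, SIGNATURE_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def fits_in_datetime(year):
    """Return True if datetime can represent January 1 of the given year."""
    try:
        datetime(year, 1, 1)
    except ValueError:
        return False
    return True


stamp = parse_edit_timestamp("03:08, 22 November 2014")
print(stamp.isoformat())
print(MINYEAR, fits_in_datetime(2014), fits_in_datetime(-44))
```

Years before `MINYEAR` (for example a BCE birth date in an infobox) are rejected by datetime, which is one reason a separate date type is needed for template data.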

I'd split the code into three methods (which can be placed anywhere):

  1. reduce the template wikitext value to a normal text string (remove wiki markup) - probably textlib.py,
  2. convert the string to a date class using dateutil (https://labix.org/python-dateutil) - either datetime, or a class with better date support like jdcal (https://pypi.python.org/pypi/jdcal/) - probably date.py,
  3. convert the datetime to WbTime (a @classmethod).
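The three steps could be sketched roughly as follows. This is only an illustration under stated assumptions: all function names are hypothetical, the markup stripping covers just a few common constructs, `datetime.strptime` with a short list of formats stands in for dateutil to keep the example dependency-free, and `SimpleWbTime` is a made-up stand-in for pywikibot's WbTime, not the real class.

```python
import re
from dataclasses import dataclass
from datetime import datetime


def strip_markup(value):
    """Step 1: reduce a template value to plain text (rough, not exhaustive)."""
    value = re.sub(r"<!--.*?-->", "", value, flags=re.DOTALL)  # HTML comments
    value = re.sub(r"<ref[^>]*/>", "", value)  # self-closing references
    value = re.sub(r"<ref[^>]*>.*?</ref>", "", value, flags=re.DOTALL)
    # [[target|label]] -> label, [[target]] -> target
    value = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", value)
    value = value.replace("'''", "").replace("''", "")  # bold and italics
    return " ".join(value.split())


# Assumed candidate formats; a real implementation would need many more.
DATE_FORMATS = ("%d %B %Y", "%B %d, %Y", "%Y-%m-%d")


def parse_date(text):
    """Step 2: try each known format; return None if nothing matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


@dataclass
class SimpleWbTime:
    """Hypothetical stand-in for WbTime holding only year, month and day."""
    year: int
    month: int
    day: int

    @classmethod
    def from_datetime(cls, dt):
        """Step 3: build the time value from a datetime."""
        return cls(dt.year, dt.month, dt.day)


def template_value_to_time(value):
    """Chain the three steps; return None for values that cannot be parsed."""
    parsed = parse_date(strip_markup(value))
    return SimpleWbTime.from_datetime(parsed) if parsed else None


print(template_value_to_time("'''[[12 March]] [[1879]]'''<ref>source</ref>"))
```

Keeping the steps separate means each one can live in a different module and be tested on its own, and the parsing step can be swapped for dateutil or a calendar-aware library without touching the other two.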

For the Wikidata part it might be better not to do it ourselves but to outsource it; see T112140.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:23 PM