
Annotation tool that uses Wikidata concepts to annotate statements from books
Open, Low, Public, Feature

Description

Author: vladjohn2013

Description:
Annotation tool that extracts statements from books and feeds them into Wikidata

Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data, you will understand that this project is a game-changer for Wikipedia. The conversion of text into Wikidata content fields has started in Wikipedia and its sister projects and keeps going deeper, but there is still a lot to do!

Now think about this: you are at home, reading and studying for pleasure, for an assignment, or for your PhD thesis. When you study, you engage with the text, and you often annotate and take notes. What about a tool that would let you share important quotes and statements with Wikidata?

A statement in Wikidata is often a simple subject-predicate-object triple, plus a source. Many, many facts in the books you read can be represented in this structure. We can think of a way to share them.

Imagine a client-side browser plugin, script, or app that takes some highlighted text, offers you a GUI to fix up the statement and its source, and then feeds it into Wikidata.

We could unveil a brand-new world of sharing and collaborating, directly from your reading.

Possible projects:

Pundit: http://www.thepund.it/ (the team is aware of Wikidata and willing to collaborate)
Annotator: https://github.com/okfn/annotator

Mentors: Aubrey is available for mentorship, paired with a technical expert.

URL: https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Annotation_tool_that_extracts_statements_from_books_and_feed_them_on_Wikidata


Version: unspecified
Severity: enhancement

Details

Reference
bz57812

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:34 AM
bzimport set Reference to bz57812.

vladjohn2013 wrote:

This proposal has been listed at https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects and we are filing a report to gather community feedback and share updates.

I am not sure I understand what kind of data would be put into Wikidata.

I agree with Lydia that the description is confusing. The title of this bug should be "Annotation tool that uses Wikidata concepts to annotate statements from books". Thepund.it almost does that, but it uses DBpedia instead of Wikidata. In my opinion, text annotations shouldn't be stored directly in Wikidata, but in another database. IIRC, the implementation of Extension:Annotator creates another table in MediaWiki to store the annotations.

Am I understanding right that someone would highlight something like:

"John Smith was born on February 3rd, 1952" on a web site, then natural language processing would be used to suggest Wikidata statements (e.g. https://www.wikidata.org/wiki/Property:P569)? If so, it doesn't seem like the original quote/annotation needs to be stored, though the web site should be cited as a source.

(In reply to comment #3)

I agree with Lydia that the description is confusing. The title of this bug should be "Annotation tool that uses Wikidata concepts to annotate statements from books". Thepund.it almost does that, but it is using dbpedia instead of Wikidata. In my opinion text annotations shouldn't be stored directly into Wikidata, but on another database. IIRC, the implementation of Extension:Annotator creates another table on MW to store the annotations there.

Ok that makes a lot more sense and would have my support.

Thank you, Andre, for changing it, and David, for being my official interpreter. We are actually working (slowly) with the Pundit team to implement this. Pundit is a third-party application for annotating the web and, as David says, it needs only minor development to enable this kind of action. Of course, a proper MediaWiki extension on Wikisource would be more appropriate and useful, IMHO.

(In reply to comment #4)

Am I understanding right that someone would highlight something like:

"John Smith was born on February 3rd, 1952" on a web site, then natural language processing would be used to suggest Wikidata statements (e.g. https://www.wikidata.org/wiki/Property:P569)? If so, it doesn't seem like the original quote/annotation needs to be stored, though the web site should be cited as a source.

Yes, Matthew, that is what I have in mind. You take statements/citations/quotes from books and texts and turn them into Wikidata statements: 2 Wikidata items, a property, and a source.

(In reply to comment #7)

Yes, Matthew, that is what I have in mind. You take statements/citations/quotes from books and texts and you make them Wikidata statements: 2 Wikidata items, a Property and a source.

Hmm, this again sounds to me like this is supposed to be saved in Wikidata. This is something I would want to see a well-thought-out plan for, with examples.

(In reply to comment #8)

Hmm this again sounds to me like this is supposed to be saved in Wikidata. This is something I want to see a well thought out plan for with examples.

Sorry, Lydia, my English is very bad and it's difficult for me to use the proper terms. Yes, that is supposed to be saved in Wikidata. Is it a problem?

The idea is simple: I read a statement in a text, such as "Jorge Luis Borges was born in Buenos Aires". I have a tool to highlight it, and the tool parses the sentence, processes the natural language, and suggests 2 WD items (Borges and Buenos Aires) and a WD property (place of birth). I also have a source, which is the web page I read the sentence on. This statement should then go into WD: the tool would log in to WD and post it with my account. Is it clearer now?
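The proposal relies on natural language processing to turn a highlighted sentence into two items and a property. As a much cruder stand-in, the sketch below uses a hypothetical regex table that recognizes only two fixed sentence shapes and maps them to P19 (place of birth) and P569 (date of birth). The names `PATTERNS`, `Candidate`, and `suggest` are illustrative and do not come from Pundit or any existing tool; real NLP would be needed for anything beyond these shapes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern table (not real NLP): each regex matches one
# sentence shape and names the Wikidata property it suggests.
PATTERNS = [
    (re.compile(r"^(?P<subject>.+?) was born in (?P<object>.+?)\.?$"), "P19"),   # place of birth
    (re.compile(r"^(?P<subject>.+?) was born on (?P<object>.+?)\.?$"), "P569"),  # date of birth
]


@dataclass
class Candidate:
    subject_label: str  # text still to be resolved to a Q-item by the user
    property_id: str    # suggested Wikidata property
    object_label: str   # text to resolve to an item, or to parse as a value
    source_url: str     # page the sentence was highlighted on


def suggest(sentence: str, source_url: str) -> Optional[Candidate]:
    """Return a candidate statement for a highlighted sentence, or None."""
    text = sentence.strip()
    for pattern, prop in PATTERNS:
        match = pattern.match(text)
        if match:
            return Candidate(match["subject"], prop, match["object"], source_url)
    return None  # sentence shape not recognized; the user fills in the GUI by hand
```

For the Borges sentence, `suggest` yields the labels "Jorge Luis Borges" and "Buenos Aires" with P19; the labels still have to be matched to Q-items, which a human should confirm.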

(In reply to comment #8)

Hmm this again sounds to me like this is supposed to be saved in Wikidata. This is something I want to see a well thought out plan for with examples.

I don't think there's any actual proposed change to the Wikidata data model. It's just a way to get input/data.

Using my example, a user would highlight "John Smith was born on February 3rd, 1952". Wikidata would give the user choices for John Smith (the user would have to pick the right Q-item). Then it would suggest P569 (date of birth). Finally, it would generate source statements referring to the website.
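One plausible way to produce the list of choices for "John Smith" is the Wikibase API's `wbsearchentities` module, which searches items by label. The sketch below only builds the request URL and formats a JSON response into choices; it makes no network call, and the helper names `search_items_url` and `choices` are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_items_url(label: str, language: str = "en", limit: int = 7) -> str:
    """Build a wbsearchentities request URL listing candidate items for a label."""
    params = {
        "action": "wbsearchentities",
        "search": label,       # the highlighted text, e.g. "John Smith"
        "language": language,  # language of the label being searched
        "type": "item",        # Q-items only; "property" would search P-ids
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def choices(response: dict) -> List[Tuple[str, str]]:
    """Turn a wbsearchentities JSON response into (Q-id, display text) pairs."""
    result = []
    for hit in response.get("search", []):
        label = hit.get("label", hit["id"])
        description = hit.get("description")
        # The description is what lets the user tell two John Smiths apart.
        text = f"{label}: {description}" if description else label
        result.append((hit["id"], text))
    return result
```

The design leaves the final pick to the user rather than taking the top hit, matching the comment's point that the user has to choose the right Q-item.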

So it would be something like:

Q245903 P569 February 3rd, 1952 (with the value using the normal date datatype).

Source (standard Wikidata source):
   P854 http://example.com
   etc.

Then the user could confirm it's correct and post it through OAuth.
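To make the example above concrete, the statement and its source can be written in the Wikibase JSON claim format: a P569 main snak with a day-precision time value and one reference holding a P854 (reference URL) snak. This is a sketch assuming the claim would be sent as the `data` parameter of a `wbeditentity` request for Q245903; fetching the edit token, OAuth signing, and the HTTP call itself are not shown, and `date_of_birth_claim` is a hypothetical helper.

```python
import json

# Calendar model item for the proleptic Gregorian calendar.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def date_of_birth_claim(year: int, month: int, day: int, reference_url: str) -> dict:
    """Build a P569 (date of birth) statement sourced with P854 (reference URL)."""
    time_value = {
        "time": f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": 11,  # 11 = day precision
        "calendarmodel": GREGORIAN,
    }
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P569",
            "datavalue": {"type": "time", "value": time_value},
        },
        "references": [
            {
                "snaks": {
                    "P854": [
                        {
                            "snaktype": "value",
                            "property": "P854",
                            "datavalue": {"type": "string", "value": reference_url},
                        }
                    ]
                }
            }
        ],
    }


claim = date_of_birth_claim(1952, 2, 3, "http://example.com")
# Serialized payload an OAuth-authenticated client could submit for Q245903.
edit_data = json.dumps({"claims": [claim]})
```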

(In reply to comment #10)

I don't think there's any actual proposed change to the Wikidata data model. It's just a way to get input/data.

And still, if you want to highlight the source text later on, you need to store somewhere either the quotation (which may be copyrighted) or the annotation reference (start position, end position).
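The second option, storing positions instead of the quoted text, could look like the sketch below. It assumes a separate annotation store outside Wikidata, as suggested earlier in the thread; `AnnotationAnchor`, `anchor_quote`, `highlight`, and the statement identifier in the example are all hypothetical. Plain character offsets are fragile: if the page text changes, the stored span no longer points at the quote.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotationAnchor:
    """Hypothetical record kept in a separate store, not in Wikidata."""
    source_url: str
    start: int         # offset of the first highlighted character
    end: int           # offset just past the last highlighted character
    statement_id: str  # identifier of the Wikidata statement the quote supports


def anchor_quote(page_text: str, quote: str, source_url: str,
                 statement_id: str) -> Optional[AnnotationAnchor]:
    """Record where a quote occurs, keeping positions rather than the text."""
    start = page_text.find(quote)
    if start == -1:
        return None  # quote not found in this version of the page
    return AnnotationAnchor(source_url, start, start + len(quote), statement_id)


def highlight(page_text: str, anchor: AnnotationAnchor) -> str:
    """Re-create the highlighted span later from the stored offsets."""
    return page_text[anchor.start:anchor.end]
```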

I would recommend taking a look at the Pund.it server side to see how much of it could be reused: https://github.com/net7/pundit-server

User:Apsdehal is interested in working on this project for GSoC 2014. More info here: https://www.mediawiki.org/wiki/User:Apsdehal. We are in contact with the Pundit team about helping us with the work, and they will probably be the mentors.

The final proposal is here: https://www.mediawiki.org/wiki/Wikidata_annotation_tool
Please feel free to comment and provide feedback.

Just to confirm: we have two GSoC proposals aiming to work on this project, see https://www.mediawiki.org/wiki/Google_Summer_of_Code_2014

Questions:

  • Is the Wikidata community aware of these proposals? Have the students shared them at the Wikidata mailing list?
  • Are the Wikidata maintainers fine with these plans?
  • We have four mentors (!) available, including two from the Pund.it project, which is great. Still, I must ask: do you feel you have the experience required to deal with Wikidata? Even if not as an official co-mentor, it would be good to have someone following the current evaluation, and the eventual project if we have a candidate accepted.

Kindly assign this bug to me, as I am working on this under GSoC.

GSoC is over and this project was evaluated as PASSED by its mentors. However, looking at the reports, it is unclear whether this project is in fact completed or whether there are still open tasks pending. We are also missing the required wrap-up post:

https://www.mediawiki.org/wiki/Wikidata_annotation_tool/updates

Please wrap up your project properly.

Sorry for the delay. Give me a day or two and the wrap-up post will be there.
I sent a mail to the Wikidata mailing list a long time ago about the completion of the project, asking for reviews, but didn't get any replies.

The project has been completed with everything implemented in a proper way.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:13 AM