Page MenuHomePhabricator

DBQ-144 Query to create translated interlanguage links
Closed, ResolvedPublic

Description

This issue was converted from https://jira.toolserver.org/browse/DBQ-144.
Summary: Query to create translated interlanguage links
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Hoo man <hoo@online.de>


From: Stavros Giannou <afaton@yahoo.com>

Date: Fri, 01 Jul 2011 19:11:57

I want to be able to create a list of interlanguage links of Greek articles into all other available languages in csv or tab delimited format. I.e. 1st column is Greek article, 2nd column is article in English (if available), 3rd column article in French (if available), something similar to this really:

https://jira.toolserver.org/browse/DBQ-116
http://toolserver.org/~hoo/dbq/dbq-116.txt

But I'm not sure how to modify the query below.

So far I have been able to create mappings to one language only at a time, i.e:

SELECT
page.*,
langlinks.*
FROM
page
JOIN langlinks
ON langlinks.ll_from = page_id
WHERE
langlinks.ll_lang = "[language id]"


Version: unspecified
Severity: major

Details

Reference
bz59402

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 2:28 AM
bzimport set Reference to bz59402.

From: Hoo man <hoo@online.de>

Date: Tue, 16 Aug 2011 17:35:31

Sorry, that one got lost (I was on vacation...) ![][1] Do you still need that data?

The only way to get this would either be the following format:

greek_article	language_the_link_points_to	target_page_title
Μπάλορ	en	Balor

Or you give me a list of the langs you want/ need:

 
 greek_article	fr_name	en_name	de_name	and so on

[1]: https://jira.toolserver.org/images/icons/emoticons/sad.gif

From: Hoo man <hoo@online.de>

Date: Mon, 24 Oct 2011 20:20:23

Closed, no response in over two months, feel free to reopen if the data is still needed ![][1]

[1]: https://jira.toolserver.org/images/icons/emoticons/wink.gif

From: Stavros Giannou <afaton@yahoo.com>

Date: Tue, 28 Feb 2012 16:05:07

Hi,

Thank you for your feedback. I am interested in all languages containing langlinks from Greek. Do you want me to make a list with their code names?


From: Hoo man <hoo@online.de>

Date: Tue, 28 Feb 2012 18:11:51

This one contains a list of all langcodes which got interwiki links on elwiki.

SQL:

SELECT ll_lang FROM langlinks GROUP BY ll_lang;

Result:
http://toolserver.org/~hoo/dbq/dbq-144.txt


From: Stavros Giannou <afaton@yahoo.com>

Date: Tue, 28 Feb 2012 19:21:27

Many thanks, but what is the actual query to produce the translations from Greek to all other languages?


From: Hoo man <hoo@online.de>

Date: Tue, 28 Feb 2012 19:34:44

The following SQL query returns the page title (on elwiki), the lang the link goes to and the title on the target wiki for namespace 0 (articles).

SELECT page_title, ll_lang, ll_title FROM langlinks INNER JOIN page ON page_id = ll_from WHERE page_namespace = 0;

Which produces an output like:

+------------+---------+-----------------------+
| page_title | ll_lang | ll_title              |
+------------+---------+-----------------------+
| %          | da      | Procenttegn           |
| %          | en      | Percent sign          |
| %          | fr      | Pour cent             |
| %          | ja      | パーセント記号 |
| %          | ko      | %                     |
+------------+---------+-----------------------+

From: Stavros Giannou <afaton@yahoo.com>

Date: Tue, 28 Feb 2012 20:00:03

Thank you but what I need is something like this (in one row all languages including page title, each language in a separate column, if there is no langling for a specific language for a specific page_title, then that cell would be empty):

page_title (Greek) | Lang 1 translation (langlink) | Lang 2 translation (langlink) | Lang 3 translation ... etc

or

 
+------------+---------+-----------------------+--------------+
| page_title | lang 1  | Lang 2                |Lang 3        |
+------------+---------+-----------------------+--------------+
| %          |Pour cent| Procenttegn           |Percent sign  |
---------------------------------------------------------------
| Μέταλλα    |Métal    | Metalle               |Metal         |
---------------------------------------------------------------

From: Hoo man <hoo@online.de>

Date: Tue, 28 Feb 2012 21:14:18

The following query has to be altered where the /* ... */ are (a row for every language you want to have has to be added):

SELECT page_title,
deiw.ll_title as de_title
/* ... */
FROM (SELECT ll_from FROM langlinks GROUP BY ll_from) as article INNER JOIN page ON page_id = article.ll_from 
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'de') as deiw ON deiw.ll_from = article.ll_from
/* ... */
WHERE page_namespace = 0;

Well, it's by far not a perfect solution and I hardly doubt it will be fast enough to work, but the best I can think of at the moment for your specific data format.


From: Stavros Giannou <afaton@yahoo.com>

Date: Wed, 29 Feb 2012 18:20:56

Thanks, tried the query with 3 langs and it worked, then tried to use all langs as indicated in the list of langcodes you gave above, but I get errors like the one below. Is it because some of the langcodes are non-existent in Greek wiki (I may be wrong)? In that case, how do I get the existent langcodes?

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-classical_title, zh-cniw.ll_title as zh-cn_title, zh-min-naniw.ll_title as zh-m' at line 7

Code snippet

SELECT page_title,
yiiw.ll_title as yi_title,
yoiw.ll_title as yo_title,
zaiw.ll_title as za_title,
zeaiw.ll_title as zea_title,
zhiw.ll_title as zh_title,
zh-classicaliw.ll_title as zh-classical_title,
zh-cniw.ll_title as zh-cn_title,
zh-min-naniw.ll_title as zh-min-nan_title,
zh-twiw.ll_title as zh-tw_title,
zh-yueiw.ll_title as zh-yue_title,
zuiw.ll_title as zu_title
FROM (SELECT ll_from FROM langlinks GROUP BY ll_from) as article INNER JOIN page ON page_id = article.ll_from 
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'yi') as yiiw ON yiiw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'yo') as yoiw ON yoiw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'za') as zaiw ON zaiw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zea') as zeaiw ON zeaiw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zh') as zhiw ON zhiw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zh-classical') as zh-classicaliw ON zh-classicaliw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zh-cn') as zh-cniw ON zh-cniw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zh-min-nan') as zh-min-naniw ON zh-min-naniw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zh-tw') as zh-twiw ON zh-twiw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zh-yue') as zh-yueiw ON zh-yueiw.ll_from = article.ll_from
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zu') as zuiw ON zuiw.ll_from = article.ll_from
WHERE page_namespace = 0;

From: Hoo man <hoo@online.de>

Date: Wed, 29 Feb 2012 22:21:46

The problems are the '-' in the table names (names of the sub query returns). For example the following wont work:

SELECT page_title,
zh-classicaliw.ll_title as zh-classical_title
FROM (SELECT ll_from FROM langlinks GROUP BY ll_from) as article INNER JOIN page ON page_id = article.ll_from 
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zh-classical') as zh-classicaliw ON zh-classicaliw.ll_from = article.ll_from
WHERE page_namespace = 0;

But this one will:

SELECT page_title,
zhclassicaliw.ll_title as zhclassical_title
FROM (SELECT ll_from FROM langlinks GROUP BY ll_from) as article INNER JOIN page ON page_id = article.ll_from 
LEFT JOIN (SELECT ll_title, ll_from FROM langlinks WHERE ll_lang = 'zh-classical') as zhclassicaliw ON zhclassicaliw.ll_from = article.ll_from
WHERE page_namespace = 0;

I hope it's clear what I mean ![][1]

[1]: https://jira.toolserver.org/images/icons/emoticons/wink.gif

From: Stavros Giannou <afaton@yahoo.com>

Date: Wed, 29 Feb 2012 22:40:01

Ah yes, thank you very much!


From: Stavros Giannou <afaton@yahoo.com>

Date: Wed, 29 Feb 2012 23:07:00

Ha, just tried with 208 langcodes and it gives me:

#1473 - Too high level of nesting for select

MySQL 5.0.88 on Wamp
phpMyAdmin: 3.3.9

And with MySQL 5.5:

#1116 - Too many tables; MySQL can only use 61 tables in a join

Here they suggest using GROUP_CONCAT as a way round it
http://stackoverflow.com/questions/649802/how-to-pivot-a-mysql-entity-attribute-value-schema

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: hoo@online.de
CC list: hoo@online.de