
Google search with [[Google:search term]]
Closed, Declined
Public

Description

Author: dunc_harris

Description:
A google search is possible using [[Google:searchterm]]

  1. [[Google:Search term]] searches for Search_term, not search+term, i.e. the space is
replaced with an underscore.
  2. the link is dark blue when, as an external link, it should be light blue
  3. open searches should be possible, e.g. Yahoo searches, or open searches (I'm sure
they're possible somewhere)


Version: unspecified
Severity: enhancement

Details

Reference
bz707

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 6:58 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz707.
bzimport added a subscriber: Unknown Object (MLST).

rowan.collins wrote:

The [[Google:foo]] is actually a clever use of the "interwiki" system (see
http://meta.wikimedia.org/wiki/Help:Interwiki_linking and
http://meta.wikimedia.org/wiki/Interwiki_map); as you can see from the second of
those links, it operates by simply adding the search term provided to the end of
"http://www.google.com/search?q="

  1. This is an unfortunate side-effect of the way the interwiki links are
implemented: because they are created as 'normal' Title objects, they have to
conform to the rules of MediaWiki article titles: no "+", " " becomes "_", etc.
This differs from some other wiki engines, which use a simple substitution of
whatever's in the link, leaving it to the destination site to decide what
validity/conversion rules to apply. We could solve this by either: not creating
a Title object for interwiki links (might lead to complex special case code all
over the place); or special-casing Interwiki Title objects so that they don't
have to adhere to internal validity/canonicalisation constraints (would mean
rearranging Title creation code to spot interwiki prefixes before even checking
for "validity"; there may also be checks for validity outside Title.php to worry
about).

  2. It shows up as light-blue to me; it doesn't get a little ext.link icon
because it's an "interwiki link" rather than an "external link"; to be honest,
I'm not sure that's a very sensible distinction to make, but it's definitely not
showing up as an *internal* link, anyway.

  3. I'm not sure what you mean by "open searches" here; as for searching other
search engines, that's just a case of adding a new Intermap entry such as
"Yahoo:" with the appropriate URL to add the terms to; not much use unless/until
(1) is fixed, though, because with neither " " nor "+" available, you can only
search for single words.
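
As a rough illustration of the behaviour described in this comment, here is a toy Python model. It is not MediaWiki's actual code: `normalise_title`, `ILLEGAL_TITLE_CHARS` and the `INTERWIKI_MAP` dict are simplifying assumptions made up for the sketch, and only the Google URL prefix and the "$1" placeholder come from the thread.

```python
from typing import Optional

# Hypothetical, heavily simplified interwiki map: prefix -> URL with a "$1" placeholder.
INTERWIKI_MAP = {"google": "http://www.google.com/search?q=$1"}

# Assumed subset of characters rejected in titles (the thread mentions "+").
ILLEGAL_TITLE_CHARS = set("+")


def normalise_title(text: str) -> Optional[str]:
    """Toy title rules: reject illegal characters, turn spaces into underscores."""
    if any(ch in ILLEGAL_TITLE_CHARS for ch in text):
        return None  # the text is not accepted as a link target at all
    return text.strip().replace(" ", "_")


def expand_interwiki(link: str) -> Optional[str]:
    """Expand 'Prefix:target' by normalising the target as a title first."""
    prefix, _, target = link.partition(":")
    template = INTERWIKI_MAP.get(prefix.lower())
    if template is None:
        return None  # unknown prefix
    title = normalise_title(target)
    if title is None:
        return None  # target rejected by the title rules
    return template.replace("$1", title)


print(expand_interwiki("Google:Search term"))
print(expand_interwiki("Google:foo+bar"))
```

In this model the space never reaches the destination site: it has already become an underscore by the time the URL is built, which is the side-effect described in point 1.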

dunc_harris wrote:

I've found a way to get round this. You need to use nowiki tags within the double brackets to
allow a + sign, e.g.

[[Google:foo+bar]]

I still think there are issues with this though and it needs some attention paid to trying to
develop it further.

dunc_harris wrote:

Sorry, I meant [[Google:foo<nowiki>+</nowiki>bar]]

rowan.collins wrote:

Hmmm. I wouldn't have expected that to work, to be honest. And on the test
server, it doesn't - in fact, it looks like there's a bug in the development
version of the software. As you can see on
http://test.wikipedia.org/wiki/Bug707, the result is kind of ugly - those
random-looking strings are placeholders inserted by the software instead of the
"nowiki" section. This has bearings on bug 337, since we should decide how such
links *should* behave; in short, don't bet on this behaviour remaining available
forever.

neil wrote:

Patch to make interwiki links use spaces instead of underscores

Of course the best solution would be to use the literal text from the
user-supplied link, but that is a bit above my understanding of the source at the
moment.

But how about using the $mTextform variable of the title in the case of interwiki
links? That is a minor change. This is the version of the Title with spaces
instead of underscores. That way you can let the target of the interwiki
link take care of the parsing. This will be no problem if it is another
MediaWiki. And it will fix the Google example.

The only problem is if you actually *need* the underscores, because this way the
underscores are changed into spaces.

attachment interwiki.patch ignored as obsolete

Underscores are necessary for links to UseMod-based wikis with free links,
for instance. If this is going to be done, it has to be a per-interwiki option.
One might add a field to the interwiki table listing it, or perhaps some
funky alternate replacement: currently the interwiki URLs contain $1 as a
placeholder for the link with underscores; another placeholder for spaces
perhaps?
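
The "another placeholder" idea above could look roughly like the following Python sketch. The "$2" name, the `expand` helper and the example.org URL are assumptions made for illustration; the thread only establishes that "$1" currently receives the underscore form.

```python
from urllib.parse import quote


def expand(url_template: str, link_text: str) -> str:
    """Fill "$1" with the underscore form and a hypothetical "$2" with the text as typed."""
    underscored = link_text.replace(" ", "_")
    # Percent-encode both forms so the result is a valid URL; "$" itself is
    # encoded too, so the second replace cannot match inside the first value.
    return (
        url_template
        .replace("$1", quote(underscored, safe=""))
        .replace("$2", quote(link_text, safe=""))
    )


# A wiki that needs underscores keeps using "$1" ...
print(expand("http://example.org/wiki/$1", "Free link"))
# ... while a search engine entry could opt into "$2".
print(expand("http://www.google.com/search?q=$2", "search term"))
```

Each interwiki entry chooses its own behaviour simply by which placeholder its URL contains, so no extra database field is needed under this design.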

gangleri wrote:

Dear friends,
I am happy about the enthusiasm to improve [[Google:search term]]. During
the last years I searched a lot and want to point out some other aspects:

  • "search term" could be a string of high complexity with special
characters used in iso8859-1, utf-8, Unicode;

  • simple example: [[Google:Español]] translates to
http://www.google.com/search?q=Espa%F1ol and not to
http://www.google.com/search?q=Espa%C3%B1ol as used by Google;

  • the more precise a search is, the better / more useful it is; what I
want to say is that parameters need to be supported '''without'''
manipulation in a format such as [[Google:search term|parameters]], where it
should be decided whether parameters start with "?" (or "&" if some parameters
are already inserted by default);

  • Example: [[Google:Pug]] works but
[[Google:Pug&num=100&as_sitesearch=en.wikipedia.org]] is (at the moment)
translated to something useless:
http://www.google.com/search?q=Pug%26num%3D100%26as_sitesearch%3Den.wikipedia.org ;

I would spend much time on testing, and if required also on finding out how it
should work. Please do not hesitate to contact me if you think I should do
a part of the (specification) work.

Regards Reinhardt
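
The two encodings of "Español" and the escaped "&" example can be reproduced with Python's standard `urllib.parse.quote`. This is only a way to see where the bytes come from, not a description of how MediaWiki builds the link.

```python
from urllib.parse import quote

BASE = "http://www.google.com/search?q="

# "ñ" is one byte (0xF1) in Latin-1 but two bytes (0xC3 0xB1) in UTF-8.
latin1_url = BASE + quote("Español", encoding="latin-1")
utf8_url = BASE + quote("Español", encoding="utf-8")

# With no characters marked as safe, "&" and "=" are escaped as data,
# so they can no longer act as parameter separators.
escaped_url = BASE + quote("Pug&num=100&as_sitesearch=en.wikipedia.org", safe="")

print(latin1_url)
print(utf8_url)
print(escaped_url)
```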

If you need something really fancy, just copy and paste the URL -- it's not that hard, and it
keeps from bloating the syntax with hard to maintain extra variants.

gangleri wrote:

I would not call it fancy. In a project group every person has some skills. One knows how
to write articles, another knows where and how to search for material, and others may have the
painful task of wikifying hundreds of articles, reformatting them, and adding links
or URLs. And this group here knows how to develop good software.
The idea behind the maintenance list where this issue also arose is not to copy links or
to teach others how to search; it is just to provide useful links with high potential.

Regards Reinhardt

rowan.collins wrote:

(In reply to comment #8)

If you need something really fancy, just copy and paste the URL -- it's not that hard, and it
keeps from bloating the syntax with hard to maintain extra variants.

I believe Reinhardt's intention was that this could then be used with templates,
as an alternative to substituting the search term into the URL with an
as-yet-nonexistent {{URLESCAPE|arbitrary text}} type syntax. However, I would
tend to agree that creating an extra piece of syntax for this might be unwise.

I was thinking maybe we could just leave "&" unescaped, but then something like
[[Google:Bill & Ben]] would break. And now I think about it, you can build just
about all the options for a search engine into the query itself, so if we had
special treatment for [some] interwiki links, you could just construct things
like [[Google:some terms "a phrase" site:en.wikipedia.org]] and the link would
become
http://www.google.com/search?q=some%20terms%20%22a%20phrase%22%20site:en.wikipedia.org
(or "+" instead of "%20", makes no odds as far as Google is concerned), which
does the exact same search as
http://www.google.com/search?as_q=some+terms&as_epq=a+phrase&as_sitesearch=en.wikipedia.org
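
For reference, the URLs in this comment can be generated with Python's standard library. The sketch only shows how the strings are built; it makes no claim about how Google treats them. Keeping ":" unescaped via `safe=":"` is a choice made here to match the URL quoted above.

```python
from urllib.parse import quote, quote_plus, urlencode

BASE = "http://www.google.com/search?"
query = 'some terms "a phrase" site:en.wikipedia.org'

# Everything packed into the single "q" parameter, spaces as %20.
single_param = BASE + "q=" + quote(query, safe=":")

# Same query with "+" for spaces.
plus_variant = BASE + "q=" + quote_plus(query, safe=":")

# The equivalent spelled out as separate advanced-search parameters.
advanced = BASE + urlencode(
    {"as_q": "some terms", "as_epq": "a phrase", "as_sitesearch": "en.wikipedia.org"}
)

print(single_param)
print(plus_variant)
print(advanced)
```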

gangleri wrote:

(In reply to comment #10)
additional notes to:
you could just construct things
like [[Google:some terms "a phrase" site:en.wikipedia.org]] and the link
would become
http://www.google.com/search?q=some%20terms%20%22a%20phrase%22%20site:en.wikipedia.org
(or "+" instead of "%20", makes no odds as far as Google is concerned),
which does the exact same search as
http://www.google.com/search?as_q=some+terms&as_epq=a+phrase&as_sitesearch=en.wikipedia.org

I would be happy having the parameters. It will work for all English
characters. Unfortunately, some character transliteration (for non-English
characters) matching that of the major search engines is required.
Example:

[[Gerhard Schröder]] translates to
http://en.wikipedia.org/wiki/Gerhard_Schr%F6der

Please note the different character code used for "ö" in Google:
http://www.google.com/search?num=100&q=%22Gerhard+Schr%C3%B6der%22+site%3Aen.wikipedia.org
If ''we'' use the same character translation as for the URLs in
en.wikipedia the link will fail:
http://www.google.com/search?num=100&q=%22Gerhard+Schr%F6der%22+site%3Aen.wikipedia.org

Regards Reinhardt
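
One way to see why the "%F6" form fails is to decode it the way a UTF-8-expecting receiver would. The sketch below uses Python's `urllib.parse.unquote` as a stand-in for the receiving site; how Google actually decodes queries is not something this thread establishes.

```python
from urllib.parse import unquote

# UTF-8 bytes decode cleanly when the receiver assumes UTF-8.
utf8_ok = unquote("Schr%C3%B6der")

# A lone 0xF6 byte is not valid UTF-8; unquote's default error handling
# substitutes U+FFFD (the replacement character) for it.
utf8_bad = unquote("Schr%F6der")

# The same bytes are fine if the receiver assumes Latin-1 instead.
latin1_ok = unquote("Schr%F6der", encoding="latin-1")

print(repr(utf8_ok), repr(utf8_bad), repr(latin1_ok))
```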

gangleri wrote:

(In reply to comment #10)
What do you think about the following solution:

[[Google:some terms "a phrase" site:xx.yy]] xx a subdomain (as "fr") and yy a
domain (as wikipedia.org) would simply open the fallback search page when
internal search is disabled, searching for >>some terms "a phrase"<< with the
site: parameter as specified inside [[ ]].

This is still an improvement because [[Google:foo]] would always search
only for pages in the actual project.

rowan.collins wrote:

(In reply to comment #12)

[[Google:some terms "a phrase" site:xx.yy]] xx a subdomain (as "fr") and yy a
domain (as wikipedia.org) would simply open the fallback search page when
internal search is disabled, searching for >>some terms "a phrase"<< with the
site: parameter as specified inside [[ ]].

I think you may have misunderstood my earlier comments: there is no need for us
to do anything special with the "site:foo", that is something you can type into
the Search box on Google, and it will work. I was just demonstrating that we
don't need to be able to stuff things into extra parts of the Google URL, we can
just use Google's (and, I believe, most other search engines') ability to have
all the extra information specified in the search query itself.

This is still an improvement because [[Google:foo]] would always search
only for pages in the actual project.

Are you saying that [[Google:foo]] should act as though it was actually a search
for "foo site:en.wikipedia.org", and not just "foo"? I'm not sure that's
generally what people will be wanting: if they're linking to a Google search,
it's probably intended to be a search of the whole of Google. We could have an
extra InterWiki prefix, say "[[Search: ... ]]", which linked to the internal
search (and when that's down, you'd be directed to a choice of Google and
Yahoo!, as normal), but that's essentially another issue.

gangleri wrote:

Dear friends!

[[en:User:Gangleri/tests/google bugzilla:707]] provides an analysis of the
differences depending on whether [[google:foo]] is used at a [[Latin-1]] or a [[UTF-8]]
wiki, and on whether foo is only ASCII, Latin-1 or UTF-8.
a) In my opinion [[google:foo]] should give the same result regardless of the wiki.
b) There are examples of various parametrisations.
b1) One showing that "-" would be the most suitable in order to get an exact match.
b2) Another showing various parametrisations made (mainly as external links).

Please feel free to contact me if this is one of your areas of interest.

Best regards Reinhardt.

rowan.collins wrote:

As far as I can see, there are several problems here which need different kinds
of solution:

  1. characters that are illegal in MediaWiki titles, but not elsewhere (", +, &, etc)

--> *external* interwiki links (i.e. not inter-project ones, like "meta") should
only have to match the legal characters for a URL, not a title
--> of course, they'd still have to exclude '|' and ']', but if the string
wasn't interfered with too much, you could use '%7c' and '%5d' directly

  2. URLs with parameters, particularly useful for search engines

--> this is irrelevant if (1) is dealt with, because you could use
[[Foo:Bar?arg1=baz&arg2=quux]], which is no more complex than
[[Foo:Bar|arg1=baz|arg2=quux]] (and less so in that what would that second link
*display* as, if we reuse the '|' to mean something different? Compare
[[Foo:Bar?arg=baz|This = displayed text]] and [[Foo:Bar|arg=baz|Does this =
displayed text or another param?]])
--> similarly, if (3) is [also] dealt with, you could take advantage of the
ability offered by most search engines to specify all your parameters in one
query string (e.g. [[Google:word "a phrase" site:example.com allinurl:foo]] and
so on)

  3. how to translate spaces

--> as Brion says, some sites do require words to be separated by underscores,
so this needs to be choosable per prefix
--> perhaps the most flexible way would be to have a field in the database
representing which character should be substituted for a space (so things like
"Google:" would have the substitution '+' or '%20', while other wikis could
retain the substitution '_'). This leaves open the possibility for yet other
representations, such as '-' (which some sites use as it is apparently more
search-engine friendly). It's also a bit more obvious to users how to use this
than a magic "$2" or whatever.

  4. character encoding

--> the simplest solution is to always use UTF-8, which is what the majority of
the Wikimedia wikis now use internally anyway
--> however, as with spaces, there may be different sites that expect their URLs
in different character encodings. If so, I think again a char_encoding field, or
maybe just use_utf8 (where 'false' would mean to use ISO 8859-1 or -15 instead)
would be more transparent than something like "$2".

Of these, (1) is actually the most complex, as it requires changes to code other
than the Title class where the interwiki links are "brought to life" (since
links with illegal characters in are just rejected by the parser right now). (3)
and (4) make the database structure a little more complex, but not much, and (4)
in particular would make things more consistent than they are now. I'm actually
tempted to break this into 2 or 3 bugs for the different issues, because I
suspect they will have to be tackled separately.
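
Points (3) and (4) could be modelled roughly as below. This is a sketch under stated assumptions, not a proposed patch: the `InterwikiEntry` class, its `space_char` and `use_utf8` fields and the example.org URLs are hypothetical names following the fields suggested above, and leaving ":" unescaped is a choice made only for the example.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class InterwikiEntry:
    url: str                # URL containing the "$1" placeholder
    space_char: str = "_"   # hypothetical field: what a space becomes
    use_utf8: bool = True   # hypothetical field: False means ISO 8859-1


def build_url(entry: InterwikiEntry, text: str) -> str:
    """Encode each word in the entry's charset, then join with its space character."""
    encoding = "utf-8" if entry.use_utf8 else "latin-1"
    # quote() raises UnicodeEncodeError if a character does not exist in Latin-1.
    words = [quote(word, safe=":", encoding=encoding) for word in text.split(" ")]
    return entry.url.replace("$1", entry.space_char.join(words))


google = InterwikiEntry("http://www.google.com/search?q=$1", space_char="+")
wiki = InterwikiEntry("http://example.org/wiki/$1")
legacy = InterwikiEntry("http://example.org/wiki/$1", use_utf8=False)

print(build_url(google, 'word "a phrase" site:example.com'))
print(build_url(wiki, "Gerhard Schröder"))
print(build_url(legacy, "Gerhard Schröder"))
```

Point (1) is deliberately left out of the sketch, since it concerns which link text the parser accepts in the first place rather than how an accepted link is turned into a URL.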

avarab wrote:

You can do this now with templates, make one with these contents:

"[http://www.google.com/search?q={{{1}}} Google Search for {{{1}}}]"

and use {{Template|searchword}}, which won't help if you're looking for "+" though.

avarab wrote:

I think this should be marked as WONTFIX, if it's to be done at all it should be
done as an extension like:

<google>term</google>

rowan.collins wrote:

(In reply to comment #17)

I think this should be marked as WONTFIX, if it's to be done at all it should be
done as an extension like:

<google>term</google>

I disagree - interwiki links (in the broad sense, as opp. "inter-project" ones)
are simply shortcuts for linking to often-referenced external sites which have
easy to guess URLs. [I know it was originally intended specifically to link
wikis, but why discriminate? For sites like Wikipedia, that distinction is
generally irrelevant] Search engines have such URLs, and are very often
referenced. The assumption in the MediaWiki code that targets of interwiki links
will behave like MediaWiki page titles is a bad one anyway, and a search query
is just on the extreme end of the variation.

Besides which, it's a link, so marking it up as a link makes more sense than
anything else. And as soon as Google is possible, all sorts of other interwiki
prefixes can be defined similarly with basically no extra effort (unlike with an
extension, which would require recoding and installation of multiple variants).

avarab wrote:

A valid point, in fact we already use [[cache: to link to the google cache.

gangleri wrote:

(In reply to comment #16)

You can do this now with templates, make one with these contents:
"[http://www.google.com/search?q={{{1}}} Google Search for {{{1}}}]"
and use {{Template|searchword}}, which won't help if you're looking for "+" though.

This works only with ONE word. Try the template with "Ævar Arnfjörð Bjarmason"
and you will have as result http://www.google.com/search?q=Ævar Arnfjörð
Bjarmason . I assume you would like to search for
http://www.google.com/search?q=%C3%86var%2BArnfj%C3%B6r%C3%B0%2BBjarmason.
Please note that

  1. this is not http://www.google.com/search?q=Ævar+Arnfjörð+Bjarmason
  2. translation differs between [[Latin-1]] and [[UTF-8]] wikis

Regards Reinhardt

rowan.collins wrote:

*** Bug 3014 has been marked as a duplicate of this bug. ***

avarab wrote:

severity => enhancement

see bug 839, which is resolved now: {{urlencode}} can be used to encode text as
of r14273.

psychonaut wrote:

Couldn't this feature be implemented using templates rather than hard-coding the
MediaWiki source?

Yes, why not have {{google|search phrase}}?

muke wrote:

(In reply to comment #25)

Yes, why not have {{google|search phrase}}?

That's only possible now, with a template using {{urlencode}}.
It didn't use to be possible at all (cf. comment #20).

You still can't do [[Google:search term]], but you can now do
[[Google:search+term]] (which was not a possible workaround earlier,
according to comment #1) so this bug is correctly now only an
enhancement, unless somebody develops a burning need to make
interwiki links to a site where the space " " cannot be substituted
with either the underscore "_" or the plus "+".
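
The difference between the plain template from comment #16 and one that encodes its argument can be sketched in Python. Here `quote_plus` is only a stand-in for {{urlencode}}, whose exact output is not specified in this thread, and the two helper functions are hypothetical.

```python
from urllib.parse import quote_plus

# The template body from comment #16, with {{{1}}} split into two slots.
TEMPLATE = "[http://www.google.com/search?q={url_arg} Google Search for {arg}]"


def naive_template(arg: str) -> str:
    """Substitute the argument verbatim, as the plain template does."""
    return TEMPLATE.format(url_arg=arg, arg=arg)


def encoding_template(arg: str) -> str:
    """Encode the argument before placing it in the URL part."""
    return TEMPLATE.format(url_arg=quote_plus(arg), arg=arg)


def link_target(wikitext: str) -> str:
    """In [url label] syntax, the URL ends at the first space."""
    return wikitext.lstrip("[").split(" ", 1)[0]


name = "Ævar Arnfjörð Bjarmason"
print(link_target(naive_template(name)))
print(link_target(encoding_template(name)))
```

With verbatim substitution the link target is cut off after the first word, which matches the "works only with ONE word" observation in comment #20.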

robchur wrote:

We have templates, and we have {{urlencode}}. This can be implied to be fixed,
surely.

{{urlencode}} works... as long as you don't put any non-ASCII characters in the
query: compare [[Google:{{urlencode:moment magnitude}}]] with
[[Google:{{urlencode:với moment magnitude}}]].

ayg wrote:

I concur with those above who consider this adequately addressed through means like templates, without having to stretch the meaning of interwiki links.