
Lucene Search update script fails while downloading DTD
Closed, Declined (Public)

Description

Author: jeen.broekstra

Description:
I am experiencing a problem with the Lucene Search (2.1) update script. This is a major issue as it means my search index does not get updated at all.

The environment is a Linux 2.6.x system running Java 1.6.0_14-b08, MySQL 5.0.45, and MediaWiki 1.13.2.

I get the following error message:

java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1305)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
at org.wikimedia.lsearch.oai.OAIParser.parse(OAIParser.java:64)
at org.wikimedia.lsearch.oai.OAIHarvester.read(OAIHarvester.java:64)
at org.wikimedia.lsearch.oai.OAIHarvester.getRecords(OAIHarvester.java:44)
at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:191)

555 [main] WARN org.wikimedia.lsearch.oai.IncrementalUpdater - Retry later: error while processing update for wikidb : Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

When I retrieve the same DTD with wget on the same machine, however, the download succeeds without a problem.
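
The trace shows the SAX parser fetching the DTD named in the document's DOCTYPE while parsing the OAI response. W3C is known to throttle automated DTD requests, which would be consistent with wget succeeding while the Java client receives a 503, but that is an assumption and is not confirmed in this report. A minimal sketch of one possible workaround follows: give the parser an EntityResolver that returns an empty source, so that it never opens a network connection for the DTD. This is not taken from lsearchd's code; the names OfflineDtdParse, CountingHandler, and countElements are hypothetical. It also assumes the document uses no entities that are declared only in the DTD (for example &nbsp;), since an empty DTD leaves them undefined.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of lsearchd.
class OfflineDtdParse {

    /** Counts start tags and replaces every external entity with an empty source. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // An empty character stream stands in for the DTD, so the parser
            // does not contact the host named in systemId.
            return new InputSource(new StringReader(""));
        }
    }

    /** Parses the XML without validation and returns the number of elements seen. */
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        SAXParser parser = factory.newSAXParser();
        CountingHandler handler = new CountingHandler();
        // SAXParser.parse registers the handler as the entity resolver as well.
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html><body><p>hi</p></body></html>";
        System.out.println(countElements(xml));
    }
}
```

A resolver that serves a locally stored copy of the DTD instead of an empty source would keep DTD-declared entities working, at the cost of shipping the DTD files with the application.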


Version: unspecified
Severity: major
OS: Linux

Details

Reference
bz20173

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:56 PM)
bzimport set Reference to bz20173.
bzimport added a subscriber: Unknown Object (MLST).

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]

lsearchd has reached its end of life and will not be improved further, marking this WONTFIX as a result.

jeen.broekstra wrote:

So, what's the alternative then?

If you're looking for a Lucene-based search for MediaWiki, I suggest taking a look at the new CirrusSearch extension we're working on. It's backed by Elasticsearch, rather than our home-grown lsearchd.

jeen.broekstra wrote:

(In reply to comment #5)

> If you're looking for a Lucene-based search for MediaWiki, I suggest taking a
> look at the new CirrusSearch extension we're working on. It's backed by
> Elasticsearch, rather than our home-grown lsearchd.

Thanks, will do.