
enhancement of importTextFile.php
Closed, Invalid, Public

Description

Author: alessandra.bilardi

Description:
enhancement of importTextFile.php

It works like maintenance/importTextFile.php and adds two options:

--morepages

<filename> contains several wiki pages, separated by <title>Title for the new page<title>

--fileslist

<filename> contains one file path per line

If you decide to include it, I could create http://www.mediawiki.org/w/index.php?title=ImportTextFile.php&action=edit&redlink=1

thanks,

Alessandra Bilardi.


Version: unspecified
Severity: enhancement
URL: http://www.gbrowse.org/reports/importTextFile_php

attachment importTextFile.php ignored as obsolete

Details

Reference
bz15872

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:18 PM
bzimport set Reference to bz15872.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 5396
Patch against current trunk of the above

attachment importtext.diff ignored as obsolete

+ echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );
+ if( is_object( $title ) ) {

^ This sequence will cause a fatal error in the 'echo' line, so the is_object() check will never be reached for the invalid title case.

Alas this is in the original too, but now's a chance to fix it. ;)
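
A minimal sketch of the reordering suggested above: check that the title is an object before calling any method on it. FakeTitle is a hypothetical stand-in for MediaWiki's Title class, included only so the snippet runs on its own, and its validity rule is an assumption made for illustration.

```php
<?php
// Hypothetical stand-in for MediaWiki's Title class (not the real implementation).
class FakeTitle {
    private $text;

    private function __construct( $text ) {
        $this->text = $text;
    }

    // Returns null for an invalid title. The rule used here (empty text or
    // characters such as < > [ ]) is an assumption for the sketch.
    public static function newFromText( $text ) {
        $text = trim( $text );
        if ( $text === '' || preg_match( '/[<>\[\]{}|#]/', $text ) ) {
            return null;
        }
        return new self( $text );
    }

    public function getPrefixedText() {
        return $this->text;
    }
}

// Build the status message only after the is_object() check has passed,
// so an invalid title never reaches the method call.
function describeTitle( $raw ) {
    $title = FakeTitle::newFromText( $raw );
    if ( !is_object( $title ) ) {
        return "Invalid title '" . $raw . "'";
    }
    return "Using title '" . $title->getPrefixedText() . "'...";
}

echo describeTitle( 'Main Page' ), "\n";
echo describeTitle( '' ), "\n";
```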

+$separator="<title>";
...
+ $pages = explode( $separator, $text );
+ for ($i=1,$cnt=count($pages);$i<$cnt;$i+=2) {
+ $title = $pages[$i];
+ $text = $pages[$i+1];
+ insertNewArticle( $title, $text, $comment, $flags );
+ }

Couple of things I'm not sure I like about this.

First, it means that the separator cannot appear in the page text. This could be a problem if your text might be documentation -- docs about HTML or about the wiki might legitimately want to talk about <title> tags, and they'll break here. Unlike the XML import, there's no general provision for escaping; you'd have to manually escape, and then they'd be explicitly escaped in the actual imported text as well.
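
To illustrate the first concern, here is a small sketch of what explode() does when the page text itself mentions the separator. The sample input and the helper name splitOnSeparator are made up for this example.

```php
<?php
// Hypothetical helper: the same split the patch performs.
function splitOnSeparator( $text, $separator = '<title>' ) {
    return explode( $separator, $text );
}

// One page whose body legitimately talks about <title> tags.
$text = "<title>HTML basics<title>\nPut the page name in a <title> element.\n";
$pages = splitOnSeparator( $text );

// The body is cut at the embedded separator, and the remainder lands in a
// slot that the patch's loop would treat as the next page title.
var_dump( $pages );
```

With this input the split yields four pieces instead of three, so the page text is truncated and the trailing fragment is misread as a title with no text after it.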

Second, it looks like the idea is to do something like:

<title>First title<title>
First text
First text continues
<title>Second title<title>
Second text
Second text continues

The use of XML-looking tags here is a bit ugly, in that one might expect <title>...</title> (with the slash in the close tag), but that wouldn't work.

Additionally, I think you'll end up with an extra newline at the start of the page text, unless you do it like this:

<title>First title<title>First text
First text continues
<title>Second title<title>Second text
Second text continues

which looks odd.
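
The extra newline can be seen in a short sketch that runs the same explode() and loop as the patch over the first layout above. The function name collectPages is hypothetical, and the pages are collected into an array rather than inserted into a wiki.

```php
<?php
// Hypothetical helper mirroring the patch's loop; returns title => text pairs.
function collectPages( $text, $separator = '<title>' ) {
    $pieces = explode( $separator, $text );
    $result = array();
    // Piece 0 is whatever precedes the first separator; titles sit at odd indexes.
    for ( $i = 1, $cnt = count( $pieces ); $i + 1 < $cnt; $i += 2 ) {
        $result[ $pieces[$i] ] = $pieces[$i + 1];
    }
    return $result;
}

$text = "<title>First title<title>\nFirst text\n<title>Second title<title>\nSecond text\n";
$pages = collectPages( $text );

// Each page text starts with the newline that followed the closing separator.
var_dump( $pages );
```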

My personal inclination is to recommend that if you're building batches of pages to import programmatically, it'll be almost as easy and more reliable to just generate the XML import/export format.
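
As a rough sketch of that alternative, the function below writes a batch of pages as XML. It is heavily simplified and rests on assumptions: only the page, title, revision and text elements are emitted, whereas a real export also carries a namespaced root element and further metadata. The function name buildImportXml is made up for this example. The point is that XML escaping lets page text mention <title> safely.

```php
<?php
// Hypothetical helper: emit a simplified MediaWiki-style XML dump.
// Assumption: a real dump needs more than the elements shown here.
function buildImportXml( array $pages ) {
    $xml = "<mediawiki>\n";
    foreach ( $pages as $title => $text ) {
        $xml .= "  <page>\n";
        // htmlspecialchars() escapes &, <, > and quotes, so markup inside
        // the title or text cannot be confused with the dump's own tags.
        $xml .= "    <title>" . htmlspecialchars( $title, ENT_QUOTES | ENT_XML1 ) . "</title>\n";
        $xml .= "    <revision>\n";
        $xml .= "      <text>" . htmlspecialchars( $text, ENT_QUOTES | ENT_XML1 ) . "</text>\n";
        $xml .= "    </revision>\n";
        $xml .= "  </page>\n";
    }
    return $xml . "</mediawiki>\n";
}

$xml = buildImportXml( array( 'HTML basics' => 'Put the page name in a <title> element.' ) );
echo $xml;
```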

+ } else if (isset( $options['fileslist']) && !strstr( $text, $separator ) && !isset( $options['morepages'])) {
+ $pages = preg_split( "/\s+/", $text );
+ for ($i=0,$cnt=count($pages);$pages[$i] && $i<$cnt;$i++) {
+ $text = file_get_contents( $pages[$i]);
+ $title = titleFromFilename($pages[$i]);
+ insertNewArticle( $title, $text, $comment, $flags );

This seems to be meant to allow passing a file containing a list of filenames to import. The main problem here is that the file is split on all whitespace; thus any pathnames containing spaces will be incorrectly split.

Generally where we accept lists of target pages or files, we do the separation by newline, which won't interfere with spaces inside the target page/file name.
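
A minimal sketch of newline-based separation, compared with the whitespace split used in the patch. The function name parseFileList and the sample paths are made up for this example.

```php
<?php
// Hypothetical helper: one path per line, blank lines ignored.
function parseFileList( $listText ) {
    $paths = array();
    // Accept Unix, Windows and old Mac line endings.
    foreach ( preg_split( '/\r\n|\r|\n/', $listText ) as $line ) {
        $line = trim( $line );
        if ( $line !== '' ) {
            $paths[] = $line;
        }
    }
    return $paths;
}

$list = "docs/First page.txt\ndocs/Second.txt\n";

// Splitting on newlines keeps the path containing a space in one piece.
$byLine = parseFileList( $list );

// Splitting on all whitespace, as in the patch, breaks that path in two.
$byWhitespace = preg_split( '/\s+/', trim( $list ) );

var_dump( $byLine, $byWhitespace );
```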

alessandra.bilardi wrote:

I don't understand whether you want to remove the line
+ echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );
or whether you want this:
+ if( is_object( $title ) ) {
+ echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );

About $separator: I changed it so that the user can now choose the separator from the command line. I also removed the 'extra newline' with:
+ $separator="/".$separator."\s*/";
+ $pages = preg_split( $separator, $text );

As for the "\s+" used to split the files list, I changed it to "\n".
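
The separator change described above can be sketched as follows. One addition not in the quoted lines, and purely an assumption here, is the call to preg_quote(): a separator supplied on the command line may contain characters such as "/" or "." that would otherwise be read as part of the regular expression. The function name splitPages is hypothetical.

```php
<?php
// Hypothetical helper: split on a user-supplied separator and swallow the
// whitespace that directly follows it, returning title => text pairs.
function splitPages( $text, $separator ) {
    // preg_quote() keeps the separator literal inside the pattern.
    $pattern = '/' . preg_quote( $separator, '/' ) . '\s*/';
    $parts = preg_split( $pattern, $text );
    $pages = array();
    for ( $i = 1, $cnt = count( $parts ); $i + 1 < $cnt; $i += 2 ) {
        $pages[ $parts[$i] ] = $parts[$i + 1];
    }
    return $pages;
}

$text = "<title>First title<title>\nFirst text\n<title>Second title<title>\nSecond text\n";
$pages = splitPages( $text, '<title>' );

// The page texts no longer begin with a newline.
var_dump( $pages );
```

Note that the trailing \s* also strips whitespace after the opening separator, so a title written with leading spaces loses them.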

The modified script is here: http://gbrowse.org/reports/importTextsFile_php

Thanks,
Alessandra Bilardi.

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

sumanah wrote:

+need-review to signal to developers that this patch needs reviewing. Alessandra, it'll be easier for them to review it if you attach the patch to Bugzilla per https://www.mediawiki.org/wiki/Patch#Posting_a_patch . Thanks!

Comment on attachment 5396
Patch against current trunk of the above

The patch won't apply, and the issues raised above have not been addressed.

TTO claimed this task.
TTO subscribed.

importTextFile.php was deleted years ago.