
Syntax extensions: special character, e.g. underscore, for non-breaking space (&nbsp;)
Open, Lowest, Public, Feature

Description

Author: messias+spam

Description:
Well written articles need a lot of non-breaking spaces. I'd like to propose
either a double-tilde (~~) or an underscore (_) for this job:

e.g.:
800&nbsp;km² (800 km²)
--> 800~~km² / 800_km²

12°&nbsp;34.56'&nbsp;N (12° 34.56' N)
--> 12°~~34.56'~~N / 12°_34.56'_N

July&nbsp;4th (July 4th)
--> July~~4th / July_4th

Some writers use the shorter en&nbsp;dash –&nbsp;which is then also surrounded
by spaces&nbsp;–...
-->Some writers use the shorter en~~dash –~~which is then also surrounded by
spaces~~–...
-->Some writers use the shorter en_dash –_which is then also surrounded by
spaces_–...
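A minimal Python sketch of the proposed substitution, assuming the parser simply expands the chosen marker to &nbsp; in running text. expand_nbsp_markers is a hypothetical name, and the sketch ignores links, URLs and other markup where _ already has a meaning, which is the main objection raised in the comments below. The only protection it adds is for MediaWiki's existing ~~~~ signature syntax: the tilde variant matches exactly two tildes and nothing longer.

```python
import re

# Hypothetical helper illustrating the proposal; not MediaWiki code.
NBSP = "&nbsp;"

# One pattern per candidate marker. The tilde pattern matches a run of exactly
# two tildes, so longer runs such as the ~~~~ signature are left untouched.
PATTERNS = {
    "~~": re.compile(r"(?<!~)~~(?!~)"),
    "_": re.compile(r"_"),
}


def expand_nbsp_markers(text: str, marker: str) -> str:
    """Replace every occurrence of marker ("~~" or "_") with &nbsp;.

    Deliberately naive: links, URLs and other markup are not protected.
    """
    return PATTERNS[marker].sub(NBSP, text)


print(expand_nbsp_markers("800~~km²", "~~"))        # 800&nbsp;km²
print(expand_nbsp_markers("12°_34.56'_N", "_"))     # 12°&nbsp;34.56'&nbsp;N
print(expand_nbsp_markers("July~~4th ~~~~", "~~"))  # July&nbsp;4th ~~~~
```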


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Non-breaking_space
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=13619

Details

Reference
bz3461

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 8:50 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz3461.
bzimport added a subscriber: Unknown Object (MLST).

codemonk wrote:

Underscore (_) would be perfect, because it is also easily accessible on any
non-English keyboard layout.

avarab wrote:

There's already a Unicode character for a non-breaking space, use that.

codemonk wrote:

(In reply to comment #2)

> There's already a Unicode character for a non-breaking space, use that.

It is not easy to copy U+00A0, to type "&nbsp;", or to press Alt+0160 every
time a non-breaking space is needed. The idea is to make entering one really
easy, preferably with a single keystroke.

wiki.bugzilla wrote:

The underscore is already needed, and will remain needed, for what it is mainly
meant for, e.g. in external links. It may not be easy to distinguish between
the different uses. Imagine wanting to use a normal underscore _and_ a
non-breaking space together in one link.
Therefore the underscore should not be used for this proposal.

dg wrote:

Non-breaking spaces are not used in URLs. I think we use URLs in two different
ways: in named links, where the URL is the first part between [ and a space (no
problem here; all underscores there are unambiguously underscores), and in
automatically linked URLs such as http://google.com, which are also
automatically detectable.

I do agree that this needs to be analyzed in detail to see where underscores are
currently used.

This really would be a useful feature.

Non-breaking spaces are nearly impossible for technical laymen to insert (and
invisible in the source code when inserted directly). The function should be
context-sensitive, so that when the underscore is between two numbers, the &nbsp;
becomes a &thinsp; (for separating numbers into three-digit groups). I don't see
any real problems, because all existing uses of underscores are in specific
syntactic contexts where the replacement could be disabled. Only emphasis by
enclosing text in _underscores_ is not always obvious, but this never shows up
in article text, only in discussions.
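The context-sensitive rule described in this comment might look roughly like the Python sketch below: an underscore between two digits becomes &thinsp;, any other underscore becomes &nbsp;. expand_underscores is a hypothetical name, and the sketch leaves out the part about disabling the replacement in syntactic contexts such as links. Note that &thinsp; on its own is a breakable space, a point raised later in the thread.

```python
import re

# Hypothetical sketch of the context-sensitive rule; not MediaWiki code.
# An underscore with a digit on both sides is treated as a digit-group separator.
_BETWEEN_DIGITS = re.compile(r"(?<=\d)_(?=\d)")


def expand_underscores(text: str) -> str:
    # Handle digit groups first, so the generic rule below only sees the rest.
    text = _BETWEEN_DIGITS.sub("&thinsp;", text)
    # Every remaining underscore becomes a non-breaking space.
    return text.replace("_", "&nbsp;")


print(expand_underscores("1_000_000 km"))  # 1&thinsp;000&thinsp;000 km
print(expand_underscores("800_km²"))       # 800&nbsp;km²
```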

gangleri wrote:

(In reply to comment #6)

> This really would be a useful feature.

Agree. But remember that there are a lot of wikilinks where underscores are used
to mean "space". This happens when editors copy and paste parts of the URL.

Please remember that [[_]] is *not* a valid title but [[&nbsp;]] *is*. See
http://test.wikipedia.org/w/index.php?title=%C2%A0&redirect=no

This means that, in order to be backward compatible, "_" should be a substitute
for &nbsp; only outside [ ... ] and [[ ... ]].
But what about full URLs like http://this_example.com ?
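The backward-compatible variant proposed in this comment (use "_" as &nbsp; only outside [ ... ] and [[ ... ]]) could be sketched in Python as below. The regular expression is an assumption made for illustration: it also leaves bare http(s) URLs alone, which is one conservative answer to the http://this_example.com question, and it does not handle nested brackets or templates. underscores_outside_links is a hypothetical name.

```python
import re

# Hypothetical sketch; not MediaWiki code.
# Spans in which "_" keeps its current meaning: internal links [[...]],
# external links [...], and bare http(s) URLs. Nested brackets are not handled.
_PROTECTED = re.compile(r"\[\[.*?\]\]|\[[^\]]*\]|https?://\S+")


def underscores_outside_links(text: str) -> str:
    out = []
    pos = 0
    for m in _PROTECTED.finditer(text):
        # Convert underscores in the plain-text stretch before this span ...
        out.append(text[pos:m.start()].replace("_", "&nbsp;"))
        # ... and copy the link or URL itself unchanged.
        out.append(m.group(0))
        pos = m.end()
    # Plain text after the last protected span.
    out.append(text[pos:].replace("_", "&nbsp;"))
    return "".join(out)


print(underscores_outside_links(
    "See [[Main_Page]] and 800_km² at http://this_example.com"))
# See [[Main_Page]] and 800&nbsp;km² at http://this_example.com
```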

Underscores in links are already wrong now (or at least not needed), so it should
be no problem to remove them from the database with a script. After that, they
could also be used as non-breaking spaces in links (your extreme example could be
sorted out during the conversion, I think. But even if there are further problems
I'm not aware of, a form like [[The link|The_link]] should work, IMHO.)

Full URLs are no problem either. If they are valid, they will normally render as
external links without replacement; if they are invalid, as in your example, the
underscore will be replaced with a non-breaking space, which is just logical. You
can't expect invalid URLs to render as if they were valid.

michael wrote:

On a Mac, just type alt-space to enter a literal Unicode non-breaking space. Unfortunately, editors using some browsers
occasionally convert these to plain spaces throughout an article. The exceedingly rare MSIE 5.0/Mac does this, and I
think some rare version of MSIE/Windows does too, but I don't know which.

ayg wrote:

Actually, the situation is much worse than that. All Gecko-based browsers do
it. See bug 6790, and https://bugzilla.mozilla.org/show_bug.cgi?id=218277
(linked from there).

(In reply to comment #2)

> There's already a Unicode character for a non-breaking space, use that.

It doesn't work: MediaWiki replaces it with a regular space. It's very, very, very
annoying.

Ah... Gecko replaces it. Sorry for the spam.

(In reply to comment #6)
Isn't thinsp problematic since it's breakable?

wikibugs wrote:

In my opinion, typing Alt+0160 on Windows is nearly as fast as typing an underscore. On a Mac, it is even easier. So my suggestion is to replace all literal non-breaking spaces with “&nbsp;” on submission. This would
– prevent Firefox 2 users from eating all non-breaking spaces when editing
– make it possible for typographers to actually see which sort of space is used in the source.
Note that the Gecko behaviour has been changed for Gecko 1.9, so Firefox 3 won’t have that bug anymore. Until Firefox 3 is out, though, Firefox 2 users will have to keep typing “&nbsp;”.
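The submission-time replacement suggested here is a one-line transformation. A sketch in Python, assuming a hypothetical normalize_on_submit hook that sees the wikitext before it is stored (it does not model MediaWiki's actual save path):

```python
# Hypothetical save-time hook; not part of MediaWiki.
NBSP_CHAR = "\u00a0"  # U+00A0 NO-BREAK SPACE as a literal character


def normalize_on_submit(wikitext: str) -> str:
    """Replace literal U+00A0 characters with the visible &nbsp; entity."""
    return wikitext.replace(NBSP_CHAR, "&nbsp;")


print(normalize_on_submit("800\u00a0km²"))  # 800&nbsp;km²
```

Because the stored text then contains the visible entity rather than U+00A0, a browser that turns U+00A0 into an ordinary space in the edit box has nothing left to convert, and typographers can see which space was used.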

ui2t5v002 wrote:

As an alternative, add the non-breaking spaces automatically on page render where they are required. This is the more wiki-like approach, where the software tries to output what you mean and you don't have to specify formatting by hand as in HTML. See Bug 13619.
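The automatic approach suggested in this comment might start out like the Python sketch below, which inserts &nbsp; between a number and a following unit. The unit list and the name add_nbsp_before_units are hypothetical, and the list is deliberately tiny; as a later comment points out, units such as "a" (year) make a complete, language-independent list very hard.

```python
import re

# Hypothetical, deliberately tiny unit list; a real one would be per-language.
UNITS = ["km²", "km", "kg", "m", "%"]

# Longest units first, so "km²" is tried before "km" and "m".
_ALTERNATIVES = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True)
)
# A digit, one ordinary space, then a known unit that is not followed by
# another word character (so "5 mice" is not read as "5 m").
_NUMBER_UNIT = re.compile(r"(\d) (" + _ALTERNATIVES + r")(?!\w)")


def add_nbsp_before_units(text: str) -> str:
    return _NUMBER_UNIT.sub(r"\1&nbsp;\2", text)


print(add_nbsp_before_units("800 km², 100 % and 5 mice"))
# 800&nbsp;km², 100&nbsp;% and 5 mice
```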

kshwiki uses "_" (underscore) as a character denoting that there certainly is no audible pause between two words as opposed to " " (space) which may represent an audible pause, or not. There are cases where not using "_" creates gross ambiguities. Otherwise, authors are usually lazy and don't use it.

sowerk wrote:

For all the examples in the very first post, it would be better to use the thin space instead of the non-breaking space. In general there are only a few cases where you really need a non-breaking space, but a lot of cases where you need the thin space.

Automation as proposed in comment 13 is extremely difficult, since the thin space is used in a lot of different cases for good typography (copy the examples into an editor with a proportional font):

Paragraphs: § 1
Units: 15 km, 100 %
Calculation: 3 + 2 = 5
Abbreviations in some languages: z. B. (German)

The fourth example is hardly implementable for all languages, and the second one is even worse; just take the unit “a” (year). There might be cases where it is better to use the unit in text, but how could the software determine whether the unit or the article is meant?

Therefore I think a way to manually implement the thin space is required.

Using the correct Unicode character is possible but could lead to copy-paste problems, since the thin space and the non-breaking space cannot be told apart from ordinary spaces in monospace editors. Furthermore, they are difficult to type unless you use a keyboard layout like NEO2.

I would propose the following, which was discussed on w:de some years ago
(the discussion fell asleep back then):

Use of underscores for thin- and non-breaking-spaces within the wiki-code:

One underscore for thin-space: _ ⇒ “ ”
Two underscores for n-b-space: __ ⇒ “ ”
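A Python sketch of the one/two-underscore scheme, emitting the &thinsp; and &nbsp; entities. expand_space_markers is a hypothetical name. One conflict the proposal does not mention: MediaWiki's behaviour switches such as __NOTOC__ already use double underscores, so the sketch keeps tokens of the form __UPPERCASE__ unchanged; that uppercase heuristic is an assumption, not an exact list of switches.

```python
import re

# Hypothetical sketch of the "_" / "__" proposal; not MediaWiki code.
# Alternatives are tried left to right: behaviour switches such as __NOTOC__
# (approximated as __UPPERCASE__) first, then "__", then "_".
_MARKERS = re.compile(r"__[A-Z]+__|__|_")


def _replace(m):
    token = m.group(0)
    if token == "__":
        return "&nbsp;"    # two underscores: non-breaking space
    if token == "_":
        return "&thinsp;"  # one underscore: thin space
    return token           # a behaviour switch like __NOTOC__: keep as-is


def expand_space_markers(text: str) -> str:
    return _MARKERS.sub(_replace, text)


print(expand_space_markers("__NOTOC__ §_1 Seite__3"))
# __NOTOC__ §&thinsp;1 Seite&nbsp;3
```

Trying "__" before "_" matters: with the order reversed, a double underscore would be read as two thin spaces.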

Numbers above 2100 could automatically be grouped with a thin space in some languages like German, e.g. 2500 ⇒ “2 500” (lower numbers could be years).
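The digit-grouping rule could be sketched in Python as follows. The threshold of 2100 comes from this comment; the name group_digits, the use of the &thinsp; entity, and the choice to skip zero-padded numbers and digits that follow a decimal point or comma are assumptions made for the example.

```python
import re

# Hypothetical sketch; not MediaWiki code.
# A run of four or more digits not preceded by a digit, "." or ","
# (so the fractional part of 3.14159 is not grouped).
_INTEGER = re.compile(r"(?<![\d.,])\d{4,}(?!\d)")


def group_digits(text: str, threshold: int = 2100) -> str:
    def repl(m):
        digits = m.group(0)
        # Leave likely years (<= threshold) and zero-padded codes alone.
        if digits.startswith("0") or int(digits) <= threshold:
            return digits
        # format(n, ",") puts a comma between every group of three digits.
        return format(int(digits), ",").replace(",", "&thinsp;")

    return _INTEGER.sub(repl, text)


print(group_digits("In 2014 about 2500 of 1234567 entries; pi is 3.14159."))
# In 2014 about 2&thinsp;500 of 1&thinsp;234&thinsp;567 entries; pi is 3.14159.
```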

Underscores are hardly ever used, except for links (there a filter can easily
be implemented). In those rare remaining cases, the nowiki-tag can be used.

This would allow every user with minimal experience to use correct typography, avoid long lists of common abbreviations as discussed in bug 13619, and ensure that copy-paste errors involving spaces are easily detectable. As far as I know, all common browsers now support both the non-breaking space and the thin space.

What would be the problem with using ~, which is what LaTeX (for example) uses for &nbsp;? I believe ~ is quite rare in wikitext, so we could consider wikilint'ing it away before enabling the wikitext change. _ is going to be more problematic, since it already appears in many links; at the very least it would require us to treat _ differently on the left-hand side of a wikilink or external link than on the right-hand side.

Would it be possible to also add some syntax for U+202F (narrow non-breaking space), i.e. a single tilde (with a double tilde being &nbsp;)? Also, if a shorter wiki syntax takes another ten years, a named entity for U+202F would be nice in the meantime, to avoid confusion between the digits of a numeric character reference and the displayed digits next to it.
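The tilde variant from the last two comments could be sketched in Python as below, following the later suggestion: a single tilde becomes U+202F and a double tilde becomes &nbsp;. Since the comment implies there is no named entity for U+202F, the sketch emits the numeric reference &#x202F;. Runs of three or more tildes are left alone because MediaWiki already uses ~~~, ~~~~ and ~~~~~ for signatures. expand_tildes is a hypothetical name.

```python
import re

# Hypothetical sketch; not MediaWiki code.
# One or two tildes standing alone; runs of three or more tildes are
# signature syntax (~~~, ~~~~, ~~~~~) and never match.
_TILDES = re.compile(r"(?<!~)(~~|~)(?!~)")
_REPLACEMENTS = {
    "~": "&#x202F;",  # U+202F NARROW NO-BREAK SPACE (numeric reference)
    "~~": "&nbsp;",   # U+00A0 NO-BREAK SPACE
}


def expand_tildes(text: str) -> str:
    return _TILDES.sub(lambda m: _REPLACEMENTS[m.group(1)], text)


print(expand_tildes("12~345 m, July~~4th, signed ~~~~"))
# 12&#x202F;345 m, July&nbsp;4th, signed ~~~~
```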

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:02 AM