Page MenuHomePhabricator

Preserve `data-parsoid` attribute on internal copy-paste so that Parsoid preserves e.g. syntax layout
Closed, ResolvedPublic

Description

Intention:
Copy a table within an article to use as a template for a similar table with different data

Steps to Reproduce:

  1. Paste the following table into a page using the wikitext editor
{| class="wikitable"
|+ Variables
|-
!colspan=2|Type||Mechanical variable||Analogous electrical variable
|-
|rowspan=2|Power conjugate pair||Effort variable||Force||Voltage
|-
|Flow variable||Velocity||Current
|-
|rowspan=2|Hamiltonian variables||Effort Hamiltonian||Momentum||Flux linkage
|-
|Flow Hamiltonian||Displacement||Charge
|}
  1. Save the page and open VE
  2. Copy and paste the table to a new location. It is best to include in the selection a small amount of text before and after the table to guarantee that the whole table is selected.
  3. Save the page
  4. Examining the copy in the wikitext editor

Actual Results:
The copy of the table looks like this:

{| class="wikitable"
|+
<nowiki> </nowiki>Variables

! colspan="2" |
Type
!
Mechanical variable
!
Analogous electrical variable
|-
| rowspan="2" |Power conjugate pair
|Effort variable
|Force
|Voltage
|-
|Flow variable
|Velocity
|Current
|-
| rowspan="2" |Hamiltonian variables
|Effort Hamiltonian
|Momentum
|Flux linkage
|-
|Flow Hamiltonian
|Displacement
|Charge
|}

Note the refactoring of one table line per row to one table cell per row. Note also the totally spurious nowiki tags in the table caption line.

Expected Results:
I expect copy paste to create an exact clone of the thing being copied. No refactoring or parsing of the selected code should take place at all.

This would not be so big an issue if there was zero change to the rendered page, but this is not the case; note the taller height of the cells in the title row.

Reproducible: Always


Version: unspecified
Severity: normal

Details

Reference
bz72426

Event Timeline

betalbs

  • the pasted table does not have <nowiki> </nowiki> inserted
  • the pasted table has for the title: <p class="ve-ce-paragraphNode ve-ce-generated-wrapper ve-ce-branchNode"><span typeof="mw:Nowiki" class="ve-ce

-mwNowikiAnnotation">&nbsp;</span>Variables_pasted</p>

the original table has:
<p class="ve-ce-paragraphNode ve-ce-generated-wrapper ve-ce-branchNode">Variables</p>

  • the pasted table looks identical to the original and can be edited/modified without problems

test2

    • the pasted table does have <nowiki> </nowiki> inserted
  • the pasted table looks identical to the original and can be edited/modified without problems

Created attachment 16867
Screenshot of test case tables

Attached:

tables.png (638×582 px, 15 KB)

Created attachment 16868
Screenshot of test case tables after saving

Attached:

tables.png (638×590 px, 15 KB)

Yeah, right now Parsoid defaults to the "one cell per line" wikitext syntax for tables; maybe it should switch over to the all-cells-of-one-row-in-one-line format?

(The nowiki item is bug 70857.)

More generally, however, I don't think it's plausible to try to retain non-semnatic syntax conventions in this edge case (normally people will be modifying/moving a table or adding a new one, not copying an existing one and then for some reason expected a particular form of wikitext).

I have added some screenshots of the results I am getting from the example table. They clearly show that the tables *do not* look identical after saving, but there is no visible difference before saving. As well as being a problem that VE has tried to parse the data before pasting, it is also a problem that the rendered difference is not visible until after saving.

More generally, however, I don't think it's plausible to try to retain
non-semnatic syntax conventions in this edge case (normally people will be
modifying/moving a table or adding a new one, not copying an existing one
and then for some reason expected a particular form of wikitext).

Adding a new table has no relevance to this issue (nothing is being pasted). Moving an existing table, I would very much expect it to be retained exactly. This is a bit like cutting and pasting some text and having your word processor go over it with a spellchecker without being asked.

I don't really consider this an edge case. I frequently copy tables and other elements within an article and between articles in a step and repeat. I have often witnessed other users doing such things, it is not just me. Think about all those television episode articles for instance.

I don't agree with the change made to the description of this bug. I don't think that VE should necessarily default to all-in-one-line syntax. To me, the bug is that it does not preserve the integrity of the data being copied, whatever that syntax may be.

(In reply to Spinningspark from comment #6)

More generally, however, I don't think it's plausible to try to retain
non-semantic syntax conventions in this edge case (normally people will be
modifying/moving a table or adding a new one, not copying an existing one
and then for some reason expected a particular form of wikitext).

Adding a new table has no relevance to this issue (nothing is being pasted).

Exactly.

Moving an existing table, I would very much expect it to be retained
exactly. This is a bit like cutting and pasting some text and having your
word processor go over it with a spellchecker without being asked.

Cutting and pasting a table inside VE does retain the exact same HTML. It's just bug 70857 that's causing them to grow in size (and which I'm sure will be fixed in the next few days).

Note that it is *not* the case that the HTML of tables is preserved for non-VE tables being pasted into VE, as the tables have to be sanitised because MediaWiki heavily restricts what you can do to a table.

I don't really consider this an edge case. I frequently copy tables and
other elements within an article and between articles in a step and repeat.
I have often witnessed other users doing such things, it is not just me.
Think about all those television episode articles for instance.

I try not so do to. :-) But aren't almost all of these tables you're talking about actually templates?

(In reply to Spinningspark from comment #7)

I don't agree with the change made to the description of this bug. I don't
think that VE should necessarily default to all-in-one-line syntax. To me,
the bug is that it does not preserve the integrity of the data being copied,
whatever that syntax may be.

Sorry, Bugzilla doesn't handle edit conflicts well. However, this isn't VE but Parsoid making this "change", so I'll leave it for the Parsoid team to weigh in on the broader point.

But I have absolutely not raised a bug asking for all-in-one-line syntax to be the default. Why are you insisting on changing the description to that? That is another issue entirely.

(In reply to Spinningspark from comment #10)

But I have absolutely not raised a bug asking for all-in-one-line syntax to
be the default. Why are you insisting on changing the description to that?
That is another issue entirely.

I didn't insist; I changed it only once, then you edit conflicted whilst I was responding and caused it to be reverted and de-reverted.

To summarize the discussion from IRC, we figured out that we had a gap in our collective cut-paste handling, and found a solution to preserve syntax on cut-paste that should work.

  • VE preserves the data-parsoid attribute on paste which Parsoid can then use to preserve the syntax. Parsoid should be prepared to handle mangled data-parsoid contents in the generic paste scenario.
  • Once Parsoid moves the data-parsoid attribute out of the DOM into metadata storage and replaces them with element ids, on paste. VE preserves the element id of the original element. At this point, we think that solution should work even if this introduces duplicate element ids into the DOM. To be verified when we get to that point.

Per discussion with Parsoid team, moving this back to VE side and re-titling.

Preserving data-parsoid is fine, but the table issue is caused by wrapper paragraphs being converted to regular paragraphs. We should possibly not throw away wrapper properties but that needs to be investigated.

gerritadmin wrote:

Change 173805 had a related patch set uploaded by Esanders:
Remove data-parsoid removal hack

https://gerrit.wikimedia.org/r/173805

gerritadmin wrote:

Change 173807 had a related patch set uploaded by Esanders:
Create 'preserveGenerated' mode for cloneElements and use in copy

https://gerrit.wikimedia.org/r/173807

(In reply to ssastry from comment #12)

To summarize the discussion from IRC, we figured out that we had a gap in
our collective cut-paste handling, and found a solution to preserve syntax
on cut-paste that should work.

  • VE preserves the data-parsoid attribute on paste which Parsoid can then

use to preserve the syntax. Parsoid should be prepared to handle mangled
data-parsoid contents in the generic paste scenario.

Subbu, is Parsoid prepared for this yet? Ed has submitted a commit that will cause VE to preserve data-parsoid on paste; can I merge this safely?

Parsoid can handle the preserved data-parsoid .. but, i haven't verified that nothing breaks if data-parsoid is mangled badly. I think we handle it, but we should verify that with some testing. I'll try to do some testing (of mangled data-parsoid contents) tomorrow .. but I am about to head out on a longish vacation. I'll poke marcoil otherwise. If you or Ed can do some basic sanity testing, that would be helpful as well.

(In reply to ssastry from comment #18)

Parsoid can handle the preserved data-parsoid .. but, i haven't verified
that nothing breaks if data-parsoid is mangled badly. I think we handle it,
but we should verify that with some testing. I'll try to do some testing (of
mangled data-parsoid contents) tomorrow .. but I am about to head out on a
longish vacation. I'll poke marcoil otherwise. If you or Ed can do some
basic sanity testing, that would be helpful as well.

What does "mangled badly" mean in this context? VE would only relocate data-parsoid, never modify it (although I suppose relocation out of context could count as mangled?)

This is what edsanders had said: "also you should be aware that the user agent may randomly mangle attributes - so if Parsoid doesn't understand the data-parsoid attribute it should fail gracefully".

Roan isn't convinced this could happen as above. Also, based on an IRC chat, by relocation I think he meant that data-parsoid could get cloned because of copy-paste. That should be fine since that just copies over any formatting and idiosyncrasies of the copied content. As far as I can tell, Parsoid should be able to handle this just fine.

Ed / Roan: Could you clarify? We don't want to overengineer in the serializer for a non-existent use case.

Jdforrester-WMF renamed this task from VisualEditor: Preserve `data-parsoid` attribute on internal copy-paste so that Parsoid preserves e.g. syntax layout to Preserve `data-parsoid` attribute on internal copy-paste so that Parsoid preserves e.g. syntax layout.Nov 24 2014, 1:48 AM
Jdforrester-WMF assigned this task to Esanders.
Jdforrester-WMF set Security to None.
Jdforrester-WMF moved this task from External and Administrivia to Blocked on the VisualEditor board.

Change 173805 merged by jenkins-bot:
Remove data-parsoid removal hack

https://gerrit.wikimedia.org/r/173805

Change 173807 merged by jenkins-bot:
Create 'preserveGenerated' mode for cloneElements and use in copy

https://gerrit.wikimedia.org/r/173807