abiword-dev Mailing List Archive: Some information on Generic Change Tracking and abiword...

From: Ben Martin <monkeyiq_at_users.sourceforge.net>
Date: Sat Apr 30 2011 - 06:52:04 CEST

Now that I have some useful cases working for abiword and the generic
change tracking proposal [1], I thought I would write to the list with
some information.

For those interested in what all this is about, there are some lead-in
style blog posts here:
http://monkeyiq.blogspot.com/2011/04/change-tracking-why.html
http://monkeyiq.blogspot.com/2011/04/odf-and-generic-change-tracking.html
http://monkeyiq.blogspot.com/2011/04/odf-and-generic-change-tracking-part-ii.html

The code is available online at:
https://github.com/monkeyiq/odf-2011-track-changes-git-svn
The test suite is available at:
https://github.com/monkeyiq/odf-2011-track-changes-tests

There are 13 test documents in attribute-change for ac:change and style
handling and one test in change-element-type which checks for handling
of a paragraph which goes from text:p to text:h and back and includes
some other style changes too, requiring interleaving ac:change
attributes across XML elements.

The consecutive changing of text:style-name uses the serialization I
proposed on the oasis collab list cited below:
http://lists.oasis-open.org/archives/office-collab/201104/msg00042.html

I also used the collapsing of text:spans into ac:change where
appropriate, as mentioned by Frank Meies in his reply to my above
suggestion. This makes it into the test suite as example642.abw and,
like all other tests, is stable across an abw->odt->abw->odt conversion
cycle, i.e., the two odt files match and validate against the same RNG
schema.

The code in ODe_ChangeTrackingACChange.cpp and
ODi_ChangeTrackingACChange.cpp handles exporting and importing
ac:change attributes in ODT. Both of these classes allow the internal
PP_Revision and PP_RevisionAttr data structures to be inspected or
set up. For example, to export from an existing in-memory document to
ODF, the main method would be:
std::string createACChange( std::list< const PP_Revision* > rl );

A more complete use might be:

ODe_ChangeTrackingACChange acChange;
acChange.setCurrentRevision( spanidref );
acChange.setAttributeLookupFunction( "text:style-name",
acChange.getLookupODFStyleFunctor( m_rAutomatiStyles, m_rStyles ) );
ODe_writeString( m_pParagraphContent,
acChange.createACChange( aclist ) );

The setCurrentRevision() call tells the acChange object to ignore any
PP_Revision objects in aclist whose revision is higher than spanidref.
The setAttributeLookupFunction() call is the most complex part of the
example. It is given a Boost functor which knows how to calculate the
value of an ODT attribute from the abiword attributes in a PP_Revision.
The acChange class handles converting the calculated ODT attributes
into ac:change ODT+CT XML attributes.

As such a mapping (abiword style -> ODT style) is likely to be used in
many places in the code, the acChange class itself can supply an
appropriate functor with getLookupODFStyleFunctor().
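
To make the functor idea concrete, here is a minimal sketch of a
per-attribute lookup registry. It is not the real
ODe_ChangeTrackingACChange code: ToyRevision, ToyLookupRegistry,
toyStyleName() and the "Toy..." style names are all hypothetical, the
signatures are assumptions, and std::function stands in for the Boost
functor.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for PP_Revision: a revision id plus abiword properties.
struct ToyRevision {
    int id;
    std::map<std::string, std::string> props;
};

// A lookup functor computes the value of one ODF attribute for one revision.
using AttributeLookup = std::function<std::string(const ToyRevision&)>;

// Hypothetical registry keyed by ODF attribute name (not the real acChange class).
class ToyLookupRegistry {
public:
    void setAttributeLookupFunction(const std::string& odfAttr, AttributeLookup f) {
        assert(f && "lookup functor must be callable");
        m_lookups[odfAttr] = std::move(f);
    }

    // Returns the computed value, or an empty string if nothing is registered.
    std::string lookup(const std::string& odfAttr, const ToyRevision& rev) const {
        auto it = m_lookups.find(odfAttr);
        return it == m_lookups.end() ? std::string() : it->second(rev);
    }

private:
    std::map<std::string, AttributeLookup> m_lookups;
};

// Toy abiword-properties -> ODF-style-name mapping; the names are invented.
std::string toyStyleName(const ToyRevision& rev) {
    auto has = [&rev](const char* key, const char* value) {
        auto it = rev.props.find(key);
        return it != rev.props.end() && it->second == value;
    };
    const bool bold = has("font-weight", "bold");
    const bool italic = has("font-style", "italic");
    if (bold && italic) return "ToyBoldItalic";
    if (bold) return "ToyBold";
    if (italic) return "ToyItalic";
    return "ToyPlain";
}
```

Registering toyStyleName under "text:style-name" and then calling
lookup() for a revision mirrors, in miniature, how the exporter can ask
for an ODT attribute value without knowing how it is derived.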

Reading back ac:change attributes from ODT is as easy as:
PP_RevisionAttr x;
ODi_ChangeTrackingACChange ct( this );
ct.ctAddACChange( x, ppAtts );

Though again you will have to handle the mapping from ODT speak to
abiword speak. At the moment I have left that open to the calling code;
I might move to functors for that mapping too at some stage. For an
example see:
ODi_TextContent_ListenerState::ctAddACChangeODFTextStyle()
which translates ODF text:style-name into abiword properties.

One reason I used functors when generating the ac:change XML attributes
in an export is that the exporting code needs to know the list of
revisions that have attributes associated with them (explained shortly).
On the flip side, when reading an ODT file the file itself will tell the
code this list of revisions.

The acChange class has to handle shifting existing abiword revisions
back one level, which it does internally. This helps calling code a
bunch as it doesn't have to worry about that (or the reverse during
load). The primary difference is that abiword stores, for a revision,
what is changed during that revision (5,bold for bold being added in
revision 5). ODT+CT, on the other hand, stores in ac:change what the
value *used to be*, not what it is *set to* during a revision.

For example, in abiword we might have a collection of attributes:
2 foo=bar
4 foo=boo
5 foo=goo

The ODT+CT will want something like:
<text:span foo="goo"
ac:change1="2,insert,foo,"
ac:change2="4,modify,foo,bar"
ac:change3="5,modify,foo,boo" ... />

So one can see that having the acChange class able to shift these
things around between the two styles helps out the calling code, which
only wants to think in an abiword revision fashion.
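
The shift for the foo example above can be sketched as follows. This is
only an illustration under stated assumptions, not the code in
ODe_ChangeTrackingACChange.cpp: the names RevisionValue,
ShiftedAttribute and shiftToOldValues() are hypothetical, and the sketch
assumes the first revision is always recorded as an "insert" with an
empty old value, as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One abiword-style entry: in revision `first` the attribute was set to `second`.
using RevisionValue = std::pair<int, std::string>;

struct ShiftedAttribute {
    std::string currentValue;            // value carried on the element itself
    std::vector<std::string> acChanges;  // bodies of ac:change1, ac:change2, ...
};

// Convert "what each revision set" into "what the value used to be".
// Each entry records the value from the previous revision, so every
// value is shifted back one level and the last one becomes the current value.
ShiftedAttribute shiftToOldValues(const std::string& attr,
                                  const std::vector<RevisionValue>& revs) {
    ShiftedAttribute out;
    std::string previous;  // empty: nothing existed before the first revision
    for (std::size_t i = 0; i < revs.size(); ++i) {
        // Revisions are expected in ascending order.
        assert(i == 0 || revs[i - 1].first < revs[i].first);
        const char* type = (i == 0) ? "insert" : "modify";
        out.acChanges.push_back(std::to_string(revs[i].first) + "," + type +
                                "," + attr + "," + previous);
        previous = revs[i].second;
    }
    out.currentValue = previous;
    return out;
}
```

Feeding in the three revisions 2 foo=bar, 4 foo=boo and 5 foo=goo yields
the current value goo and the three ac:change bodies shown in the
text:span above.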

Another interesting challenge is that abiword wants to store what
*changes* between revisions for style data, whereas ODF wants to cite a
style instead. To see this, one has to convert from abw to odt and
attempt to come back to abw again.

See, for example, to-bold-italic-and-back-to-normal.abw, where one has
the c element (similar to text:span in odt):
<c
revision="1,
!2{font-weight:bold}{author:0},
!3{font-style:italic}{author:0},
!4{font-weight:normal}{author:0}">text</c>

Here we move to bold, add italic, and then remove the bold again to be
left with just italic. This gives the following ODF:

<text:span
  text:style-name="T1"
  delta:insertion-change-idref="4"
  delta:insertion-type="insert-around-content"
  ac:change4="2,insert,text:style-name,"
  ac:change5="3,modify,text:style-name,T2"
  ac:change6="4,modify,text:style-name,T4">text</text:span>

Where T1 is italic, T2 is bold, and T4 is bold+italic. All is well, but
when converting this back to an abw file going from T4 to T1 we run into
the issue where we need to simplify the properties. On loading we will
see something like this from the ac:change loading code:

!2{font-weight:bold}{author:0},
!3{font-weight:bold,font-style:italic}{author:0},
!4{ ,font-style:italic}{author:0}

This is different to what we started with and needs to be simplified
back to what abiword expected. This is coded in
ODi_TextContent_ListenerState::ctSimplifyStyles() which calls
getDefaultStyle(). The ctSimplifyStyles() method will remove the
redundant font-style:italic from revision 4, and notice that
font-weight:bold is no longer in revision 4. It will then call
getDefaultStyle() with the property font-weight to obtain the value
font-weight:normal which is added to revision 4. This recreates the
original revision 4. The same process is then performed for all
revisions in the revision attribute.
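
The simplification described above can be sketched roughly like this.
It is a simplified model, not the actual ctSimplifyStyles() code:
LoadedRevision, DefaultLookup and simplifyStyles() are hypothetical
names, properties are modelled as a plain string map, and the default
lookup is passed in as a callable standing in for getDefaultStyle().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Props = std::map<std::string, std::string>;

// Properties as seen for one revision right after loading ac:change data.
struct LoadedRevision {
    int id;
    Props props;
};

// Stand-in for getDefaultStyle(): returns the default value of a property.
using DefaultLookup = std::function<std::string(const std::string&)>;

// Reduce each revision to what actually changed relative to the previous one.
std::vector<LoadedRevision> simplifyStyles(const std::vector<LoadedRevision>& loaded,
                                           const DefaultLookup& defaultFor) {
    std::vector<LoadedRevision> result;
    Props previous;  // full property set of the preceding revision
    for (std::size_t i = 0; i < loaded.size(); ++i) {
        const LoadedRevision& rev = loaded[i];
        assert(i == 0 || loaded[i - 1].id < rev.id);  // ascending revisions
        Props delta;
        // Keep only properties whose value differs from the previous state.
        for (const auto& kv : rev.props) {
            auto it = previous.find(kv.first);
            const std::string before =
                (it != previous.end()) ? it->second : defaultFor(kv.first);
            if (before != kv.second) delta[kv.first] = kv.second;
        }
        // A property that vanished has gone back to its default value.
        for (const auto& kv : previous) {
            if (rev.props.count(kv.first) == 0) {
                const std::string fallback = defaultFor(kv.first);
                if (kv.second != fallback) delta[kv.first] = fallback;
            }
        }
        result.push_back(LoadedRevision{rev.id, delta});
        previous = rev.props;
    }
    return result;
}
```

With a default lookup that returns "normal" for every property, the
loaded revisions 2, 3 and 4 above reduce to font-weight:bold,
font-style:italic and font-weight:normal respectively, matching the
original abw revision attribute.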

As Martin suggested on this list recently, I made a new
getDefaultStyle() in pd_document which getDefaultStyle() in the ODi code
uses. The ODi getDefaultStyle() function attempts to find the default
value in that style (the "Normal" style) and, if that fails, falls back
to some hard-coded values (for properties which "Normal" doesn't
explicitly state). Note that getDefaultStyle() in pd_doc is just a dumb
implementation returning "Normal".
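
The two-step lookup might look something like the sketch below. The
function name defaultPropertyValue() and the contents of the fallback
table are assumptions for illustration only; the real hard-coded
fallbacks live in the ODi code and are not listed in this email.

```cpp
#include <cassert>
#include <map>
#include <string>

using Props = std::map<std::string, std::string>;

// Look a property up in the "Normal" style first, then in a fallback table.
// Returns an empty string when neither source knows the property.
std::string defaultPropertyValue(const Props& normalStyle, const std::string& prop) {
    assert(!prop.empty());
    auto it = normalStyle.find(prop);
    if (it != normalStyle.end()) return it->second;

    // Hypothetical fallbacks for properties "Normal" does not state explicitly.
    static const Props fallbacks = {
        {"font-weight", "normal"},
        {"font-style", "normal"},
    };
    auto fb = fallbacks.find(prop);
    return fb != fallbacks.end() ? fb->second : std::string();
}
```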

I'd like to make the RNG schemas a bit smarter too. For example, for
style tests I can see value in generating the RNG from the
ODT/content.xml and a template RNG file. This way the styles in the odt
can be read and injected into the RNG file so it is testing style
applications at a more semantic level. But this email is becoming a
novelette already, so details on that can follow.

[1]
http://lists.oasis-open.org/archives/office-comment/201007/msg00010.html

Received on Sat Apr 30 06:52:26 2011