metadata namespace screw cases and DC/RDF

From: Paul Rohr (paul@abisource.com)
Date: Tue May 14 2002 - 13:43:02 EDT


    At 12:23 PM 5/14/02 -0400, Dom Lachowicz wrote:
    >On Tue, 2002-05-14 at 11:57, Karl Ove Hufthammer wrote:
    >> [1] <URL: http://dublincore.org/documents/dces/ >
    >> [2] <URL: http://dublincore.org/documents/dcmes-qualifiers/ >
    >> [3] <URL: http://dublincore.org/documents/dcmes-xml/ >
    >> [4] <URL: http://dublincore.org/resources/faq/ >
    >
    >I've taken Paul's suggestion, and now we use #define statements instead
    >of methods. I've updated the MSWord importer as needed.
    >
    >I've also taken Karl's suggestion, and now we support a superset of the
    >Dublin Core elements. The supersetted term is "keywords", which, IMO, is
    >sorely lacking from their list.
    >
    >I'll probably export the properties as DC/RDF sometime soon to make our
    >metadata and m tags go away. The AbiWord importer will have to get
    >seriously smarter to properly handle the namespacing issues involved.

    Karl and Dom,

    I've just finished skimming the DC stuff too, and I agree that we'll need a
    superset approach. By design, DC is a small, well-defined set of metadata
    that's considered useful for public indexing of all content. Transparently
    capturing that data is a Good Thing. However, people and organizations use
    word processor properties dialogs to store all kinds of other stuff. I
    suspect that trying to shoehorn that whole mess into "pure" DC is a losing
    battle.

    As for switching to RDF, the big question for me is whether adding all that
    code is really needed to help us solve this namespace problem. If not, I'm
    tempted to follow the simpler precedent already in place for HTML:

      http://www.ietf.org/rfc/rfc2731.txt

    In short, the idea would be that we preface any of *our* keys which are
    DC-compatible with the DC prefix. All others -- whether defined by Word or
    by users -- go at top level. Implementing this should just take a quick
    upgrade to Dom's current #defines.
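
    A minimal sketch of that rule follows. It is in Python purely for
    illustration; the real thing would live in the #defines. The
    storage_name() helper and its user_defined flag are hypothetical.
    DC_ELEMENTS is just the 15 elements of the Dublin Core element set.
    Our own DC-compatible keys get a "DC." prefix, and everything else is
    stored verbatim at top level.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def storage_name(key: str, user_defined: bool = False) -> str:
    """Return the name stored in <m name="..."> for a property key.

    Hypothetical helper: one of *our* keys whose first dotted component is
    a DC element (e.g. "title", "date.created") gets a "DC." prefix.
    Everything else, including every user-defined key, is kept verbatim
    at top level.
    """
    if not user_defined and key.split(".", 1)[0] in DC_ELEMENTS:
        return "DC." + key
    return key


for key, user in [("title", False), ("date.created", False),
                  ("keywords", False), ("pages", False),
                  ("Checked by", True), ("date", True)]:
    print(f"{key!r:16} -> {storage_name(key, user)!r}")
```

    Under this rule, Dom's extra "keywords" term is not a DC element, so it
    stays at top level. A user who happens to type a property called "date"
    is not silently promoted into the DC namespace.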

    the screw cases
    ---------------
    For example, what would the RDF equivalent of the following markup be?

      <m name="DC.title">World Domination</m>
      <m name="DC.creator">Abi the Ant</m>
      <m name="DC.language">en-US</m>
      <m name="DC.subject">My secret 10-year plan. Shh! Don't tell!</m>
      <m name="DC.date.created">1998-08-01T09:14:37-05:00</m>
      <m name="DC.date.printed">2002-05-13T13:15:30Z</m>
      <m name="pages">143</m>
      <m name="Checked by">Legal Review Committee #43</m>
      <m name="Typist">MLM</m>
      <m name="Playlist">Dave Brubeck, Time Further Out; Tom Tom Club</m>
      <m name="$%&^$$&$">See, that property name is in Inuktitut. I don't speak
    Inuktitut -- it's too cold for us ants up there -- but I just love working
    on a word processor that lets me do stuff like this.</m>
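
    One concrete way to see the namespace problem: in RDF/XML each property
    is written as an element whose name is a namespace-qualified XML name.
    That means the property key itself must be a legal XML name. The <m> tag
    above avoids this by carrying the key as an attribute value, where almost
    anything goes.

    The Python sketch below is an illustration, not AbiWord code. It uses
    the standard library's expat-based parser to test which of the example
    keys could serve directly as an element name. The key that appears in
    this archive as "$%&^$$&$" is tested exactly as shown.

```python
import xml.etree.ElementTree as ET


def is_xml_element_name(name: str) -> bool:
    """Return True if `name` is usable, unprefixed, as an XML element name.

    Instead of re-implementing the XML Name production, ask the standard
    library's expat-based parser to parse "<name/>". A successful parse must
    also yield a root tag equal to `name`, which rejects inputs such as
    "a b='1'" that parse as an element plus an attribute.
    """
    try:
        root = ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    return root.tag == name


# Property names from the example above. "DC." keys would become elements
# in a DC namespace, so only the part after the prefix is checked here.
names = ["DC.title", "DC.creator", "DC.language", "DC.subject",
         "DC.date.created", "DC.date.printed", "pages", "Checked by",
         "Typist", "Playlist", "$%&^$$&$"]

for name in names:
    local = name[len("DC."):] if name.startswith("DC.") else name
    verdict = "ok" if is_xml_element_name(local) else "needs an encoding scheme"
    print(f"{name!r}: {verdict}")
```

    Two keys fail the check: "Checked by" (because of the space) and
    "$%&^$$&$". A pure RDF/XML export would need some extra mapping layer
    for keys like these, which is exactly the kind of code the question
    above is about.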

    If this example seems contrived, think again. There are large organizations
    who like to keep close track of who did what in the production and review
    process. There are also creative individuals who like to mention which
    tracks from their playlist helped inspire a given work.

    Can you imagine someone like Abi doing the latter while operating inside the
    corporate constraints of the former? I can. ;-)

    a few other notes
    -----------------
    1. As those of you who read the date examples above carefully may have
    noticed, I'm not sure whether our canonical datetime output should include
    the timezone offsets or not. For details, see:

      http://www.w3.org/TR/NOTE-datetime

    2. Where possible, we should certainly map the standard Word/RTF
    properties onto their properly qualified DC equivalents. The user-visible
    names obviously wouldn't have all that dotted DC gibberish, though. That
    way, people can get decent DC compatibility (ignoring the controlled
    vocabulary stuff, of course) just by typing into a friendly dialog, or by
    importing their existing documents into AbiWord. ;-)

    3. FWIW, I'm not sure it's all that safe to map Word's Company onto DC's
    publisher. Word actually has a separate Publisher property on its Custom
    tab.

    4. Getting back to my original metadata vs. document properties thread, is
    a DC.language property the right place to store the document's default
    language, or should we be using PROPS for that instead?
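
    On note 1: the W3C profile linked above allows the time zone designator
    to be either "Z" (UTC) or a +hh:mm/-hh:mm offset. Here is a small Python
    sketch of the two candidate canonical forms; the function name is made up
    for illustration. Normalizing to UTC drops the author's local offset.
    Keeping the offset preserves it, but then the same instant can be
    spelled more than one way.

```python
from datetime import datetime, timedelta, timezone


def to_w3c_dtf(dt: datetime, keep_offset: bool) -> str:
    """Format an aware datetime as a W3C-DTF (ISO 8601 profile) string.

    keep_offset=True  -> local time plus its offset, e.g. ...T09:14:37-05:00
    keep_offset=False -> the same instant converted to UTC, suffixed with "Z"
    Fractional seconds are dropped in both forms.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("W3C-DTF times need a time zone; got a naive datetime")
    if keep_offset:
        return dt.isoformat(timespec="seconds")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# The DC.date.created value from the example markup.
created = datetime(1998, 8, 1, 9, 14, 37, tzinfo=timezone(timedelta(hours=-5)))
print(to_w3c_dtf(created, keep_offset=True))   # 1998-08-01T09:14:37-05:00
print(to_w3c_dtf(created, keep_offset=False))  # 1998-08-01T14:14:37Z
```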

    bottom line
    -----------
    I think DC is a small but very useful subset of the metadata our users will
    want to capture. For me, the jury's still out on whether RDF adds any value
    beyond that.

    Paul,
    trying to think ahead



    This archive was generated by hypermail 2.1.4 : Tue May 14 2002 - 13:45:41 EDT