Metadata

From Pelennor

Metadata in Pelennor

Pelennor envisions the use of two varieties of metadata: structured and arbitrary. Structured metadata conforms to a referenceable schema, so that its purpose is understood by all users; it is the preferred form. Arbitrary metadata takes the form of isolated attribute-value pairs or ad hoc schemas and is fundamentally weak in semantics. Nevertheless, there may be cases where arbitrary metadata is useful as a form of temporary data storage or for informal "tagging" purposes.
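
The distinction can be illustrated with a minimal sketch. Everything named here (the SCHEMAS registry, the "example:rating" schema, and both classes) is hypothetical and chosen only for illustration; it is not Pelennor's actual design. The sketch assumes that "conforming to a referenceable schema" can be approximated by checking field names and types against a shared registry.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical schema registry: schema name -> required field names and types.
# The schema name and its fields are invented for this example.
SCHEMAS = {
    "example:rating": {"score": int, "rater": str},
}


@dataclass
class StructuredMetadata:
    """Metadata whose fields are checked against a named, shared schema."""
    schema: str
    fields: dict

    def __post_init__(self):
        spec = SCHEMAS.get(self.schema)
        if spec is None:
            raise ValueError(f"unknown schema: {self.schema}")
        if set(self.fields) != set(spec):
            raise ValueError(f"fields must be exactly {sorted(spec)}")
        for name, expected in spec.items():
            if not isinstance(self.fields[name], expected):
                raise ValueError(f"{name} must be of type {expected.__name__}")


@dataclass
class ArbitraryMetadata:
    """An isolated attribute-value pair with no schema behind it."""
    key: str
    value: Any


# Any reader who knows the schema can interpret this record.
rating = StructuredMetadata("example:rating", {"score": 4, "rater": "alice"})

# Nothing tells a reader what this pair is supposed to mean.
tag = ArbitraryMetadata("tag", "holiday")
```

The structured record is rejected at construction time if it names an unknown schema or carries the wrong fields, whereas the arbitrary pair accepts anything, which is the weakness in semantics described above.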

Uses of metadata

Metadata is traditionally seen as orthogonal to data schemas (basic symbolic meaning) or domain ontologies. It is often debated when and whether it is appropriate to mix metadata with the latter, especially when the dividing line becomes invisible at the implementation level. For example, "4 wheel drive" would customarily be part of the "automobile" domain ontology, rather than an arbitrary piece of metadata. Nevertheless, the domain ontology itself may be constructed using the same building blocks that we call "metadata" elsewhere. Indeed, our ontology may, in practice, be as mutable as "pure" metadata. Some have even suggested that ontologies are too rigid, given today's rapidly changing information storage needs. There are also many practical obstacles to creating standardized domain ontologies, not the least of which is the perceived need for a centralized authority.

Simple, unstructured metadata cannot add true semantic richness to large blocks of semantically poor data, as evidenced by current web and desktop search engines. However, once data is decomposed into a graph, even without a defined schema, metadata can be used to flexibly layer additional semantics.
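
One way to picture this layering is a small in-memory graph of (subject, predicate, object) triples, where metadata lives in a separate named layer over the same nodes. This is only a sketch under assumptions: the LayeredGraph class, the layer names, and the example predicates are all hypothetical, and nothing here reflects how Pelennor itself stores data.

```python
from collections import defaultdict


class LayeredGraph:
    """Stores (subject, predicate, object) triples grouped into named layers."""

    def __init__(self):
        self.layers = defaultdict(set)

    def add(self, subject, predicate, obj, layer="data"):
        self.layers[layer].add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, layers=None):
        """Return matching triples, sorted, from the chosen layers (default: all)."""
        names = list(self.layers) if layers is None else layers
        found = set()
        for name in names:
            for s, p, o in self.layers.get(name, ()):
                if subject in (None, s) and predicate in (None, p):
                    found.add((s, p, o))
        return sorted(found)


g = LayeredGraph()

# A document decomposed into a graph; no schema is declared anywhere.
g.add("report", "has_section", "summary")
g.add("summary", "written_by", "alice")

# Metadata layered on afterwards, without touching the data layer.
g.add("summary", "reviewed_by", "bob", layer="review")

data_only = g.query(subject="summary", layers=["data"])
everything = g.query(subject="summary")
```

Because the review layer refers to the same node names as the data layer, a query across all layers sees the added semantics, while a query restricted to the data layer is unaffected.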

Production of metadata

It is largely impractical to employ humans in the direct production of metadata, due both to the time cost of annotation and to the high potential for context ambiguity; "tagging" is a traditional example. Nevertheless, rich metadata can be harvested by exploiting the design of the information workflow, so that humans need not be aware that they are helping to produce it. Beyond that, metadata should be generated automatically when possible, using pattern recognition and machine-learning techniques as necessary.
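
Harvesting metadata from the workflow can be sketched as a save operation that records who, when, and from what as a side effect, so the user never annotates anything explicitly. The Workspace class, its method names, and the metadata keys below are assumptions made for this example, not part of Pelennor.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Workspace:
    """A toy document store that records metadata as a side effect of saving."""
    user: str
    clock: Callable[[], float] = time.time
    documents: dict = field(default_factory=dict)
    metadata: list = field(default_factory=list)  # (subject, key, value) records

    def save(self, name, text, derived_from=None):
        # The version number is inferred from earlier saves, not typed by the user.
        version = 1 + sum(1 for s, k, _ in self.metadata
                          if s == name and k == "version")
        self.documents[name] = text
        self.metadata.append((name, "version", version))
        self.metadata.append((name, "modified_by", self.user))
        self.metadata.append((name, "modified_at", self.clock()))
        if derived_from is not None:
            # A "save as" action reveals a relationship between documents.
            self.metadata.append((name, "derived_from", derived_from))
        return version


ws = Workspace(user="alice")
ws.save("draft", "first attempt")
ws.save("draft", "second attempt")
ws.save("final", "second attempt", derived_from="draft")
```

Each call to save is an ordinary user action, yet the store ends up with version history, authorship, timestamps, and a provenance link between the two documents.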

Types of metadata

As a crude definition, structured metadata is part of a larger "well-formed web of meaning", while arbitrary metadata tends to stand on its own and lack significant context. This distinction does not necessarily imply anything about how either kind is stored.

Structured

  • Access controls - as used in traditional filesystem permissions
  • Access history and versioning
  • Relationships between data in different domain ontologies
  • Conversational context - the semantic value related to the scope and association of data entities used during a process or interaction; may also be used as a form of long-term conversation state
  • Pre-defined key-value pairs, such as rank, score, collection, or references

Arbitrary

  • User-defined or keyword-harvested tags - weak semantics in the form used by current desktop search engines