February 13, 2009

pygloss primer

I've been working pretty solidly on my pygloss project: a Python Glossonomy web application. So what's Glossonomy, you ask? Well, it's something between a Glossary - a list of terms and definitions - and a Taxonomy - a controlled vocabulary with relationships between terms (like Narrower/Wider, Container/Part, Related etc.). Also in this space are Ontologies, Topic Maps and Concept Maps. And since I don't want to be constrained by preconceptions, I've coined a new word - Glossonomy!

Here are a couple of scenarios that might be familiar...
  • Two groups of people (communities, departments, agencies) are working in similar spaces and have similar interests. Now they need to collaborate, but they find they're using different terms for the same thing, or the same term to mean different things. It can take months to iron out assumptions and errors caused by simple misconceptions.
  • A business has invested in defining a formal vocabulary to support its core processes, and this is published as an MS Word document. Because it covers the whole business it's weighty and mind-numbingly boring. People don't really use it as intended.
The tools I mention above all try to address these issues, and so does pygloss, but with some Web 2.0 influences at work. This gives us the following characteristics/goals:
  • 100% web application
  • Bundles related Terms and Concepts into Domains which have self-managing user Communities.
  • Encourages sharing Terms and Concepts across Domains - cross-pollination, reuse and enrichment.
  • Search and Visualisation tools to explore related concepts.
  • The URI is important - terms get enduring URLs so they can be referenced from other places reliably.
  • Support for semantic web standards - RDF, SKOS etc.
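To make the goals above a little more concrete, here is a minimal sketch of how concepts, their terms and an enduring term URL might fit together. This is not the pygloss data model: the class, the field names, the `term_url` helper, the `example.org` base and the sample definitions are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Concept:
    """One meaning, labelled by one or more terms (hypothetical structure)."""
    concept_id: str
    definition: str
    terms: list = field(default_factory=list)
    # relation name (e.g. "broader", "narrower", "related") -> target concept ids
    relations: dict = field(default_factory=dict)

    def relate(self, relation, other):
        """Record a named relationship from this concept to another."""
        self.relations.setdefault(relation, []).append(other.concept_id)


def term_url(domain, term, base="http://example.org"):
    """Build a readable URL for a term within a domain (assumed pattern)."""
    return "%s/domain/%s/term/%s" % (
        base, quote(domain, safe=""), quote(term, safe=""))


# Made-up sample data, for illustration only.
course = Concept("c1", "A programme of study.", terms=["Course"])
paper = Concept("c2", "A unit of study within a course.", terms=["Paper", "Module"])
paper.relate("broader", course)
course.relate("narrower", paper)

print(term_url("im.sdr", "Course"))
```

Percent-encoding the domain and term keeps the URL valid when a term contains spaces or punctuation, at the cost of a little readability.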
Other extended capabilities could include
  • Glossary Extraction from parsed documents.
  • Term, Concept Extraction, based on submitted content (docs, web pages etc) using Natural Language techniques.
In future posts I'll cover the following:
  • Some screenshots and previews of the latest pygloss 0.2 features and the data model.
  • Relationships to WordNet and other taxonomy tools.
  • Some 'under the hood' stuff - how it hangs together (expect some Python, Ajax, ZODB and Xapian learnings here).
  • How this project relates to my earlier cvcore work.
But I'll sign off now with an appetizer - some visualisations produced by pygloss 0.1...

Using the AT&T dot library, automated layouts delivered as SVG, PDF, PNG etc...
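For a feel of what goes into dot, here is a rough sketch that turns a list of term relationships into DOT source text. It is not how pygloss does it: the `to_dot` function and the sample edges are assumptions for illustration, and the sketch does not escape quotes inside term names.

```python
def to_dot(edges, name="glossonomy"):
    """Return DOT source for a directed graph of (source, relation, target) edges."""
    lines = ["digraph %s {" % name]
    for source, relation, target in edges:
        # Each relationship becomes one labelled edge.
        lines.append('    "%s" -> "%s" [label="%s"];' % (source, target, relation))
    lines.append("}")
    return "\n".join(lines)


# Made-up sample relationships, for illustration only.
edges = [
    ("COURSE", "narrower", "PAPER"),
    ("COURSE", "related", "QUALIFICATION"),
]

print(to_dot(edges))
```

Saved to a file, the output can be laid out from the command line with something like `dot -Tsvg terms.dot -o terms.svg` (swap `-Tsvg` for `-Tpng` or `-Tpdf` for the other formats).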


And from thejit - interactive visualisations, using client-side JavaScript...

A 'hypertree' layout on the central term 'COURSE' and others within a radius of 2 'hops'
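Picking out the terms within a radius of 2 hops is essentially a bounded breadth-first search. Here is a minimal sketch of that idea, assuming the relationships are held as a plain adjacency dict; the `within_hops` function and the sample graph are made up for illustration and are not taken from pygloss.

```python
from collections import deque


def within_hops(graph, start, radius=2):
    """Return {term: hop distance} for every term within `radius` hops of `start`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # already at the edge of the radius, don't expand further
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Made-up sample graph, for illustration only.
graph = {
    "COURSE": ["PAPER", "QUALIFICATION"],
    "PAPER": ["COURSE", "ASSESSMENT"],
    "QUALIFICATION": ["COURSE"],
    "ASSESSMENT": ["PAPER", "GRADE"],
    "GRADE": ["ASSESSMENT"],
}

print(within_hops(graph, "COURSE", radius=2))
```

In this sample, GRADE sits three hops from COURSE, so it is left out of the result.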



And a radial graph layout, composed of all the terms from the domain IM.SDR...

3 comments:

  1. Really interesting project - I think the problem of mismatched terminology is common to all projects where organisations try to interoperate in some way. Formal taxonomies using OWL, RDF etc. can be pretty intimidating and need quite a bit of expertise to construct, not to mention potentially endless arguments over definitions. So having tools to do this in a co-operative web space could be really productive. Are you thinking about allowing multiple definitions of a term within a domain to evolve over time? Trying to get everything "right" first time is a killer.
  2. Derek - thanks for your words of encouragement

    Re multiple definitions, yes we need to show when there are several concepts linked to one term - a 'disambiguation' view. The larger the domain, the more prevalent this will be, e.g. WordNet. And of course there are multiple terms for one concept.

    The challenge here is the URI pattern. The term is the only natural identifier for the concept, but as you've pointed out it's not unique - in fact it's a many-to-many relation. I believe getting the URI design 'right' (i.e. intuitive) is a key concern, so I'm experimenting with some patterns right now.

    On a related note, from the perspective of the concept, I envisage a scenario where the domain owners wish to enforce a rule that there must be a single 'preferred term' for each concept (and 0-N alternate terms). This would provide more natural support for thesauri (like SONZ, FONZ etc.) and the SKOS object model.

    I'll soon post some links to the alpha release of glossio - stay tuned :)
  3. OK - VERY ALPHA - but please have a look anyway...

    http://gloss.io/domain/fonz/term/Accessioning

    This is an example of one term with two meanings, from the Functions of New Zealand Thesaurus (FONZ). The (c) link goes to the concept.

    FYI I just completed stress testing to figure out how much. Next on the list: tidy up some broken stuff, do the theming, and get the editing functionality back together for some community input.

    I suggest trying the search tool (using Google-esque query syntax), and the visualisations from /concept. Note IE needs the Adobe SVG plugin; Firefox 3.0 works great.

About Me

Wellington, New Zealand
Software Architect / Developer with a keen interest in open data and applications of the Semantic Web.