May 31st, 2005 — 12:00am
Concept maps popped onto the radar last week when an article in Wired highlighted a concept mapping tool called Cmap. Cmap is one of a variety of concept mapping tools that’s in use in schools and other educational settings to teach children to model the structure and relationships connecting – well – concepts.
The root idea of using concept mapping in educational settings is to move away from static models of knowledge, and toward dynamic models of relationships between concepts that allow new kinds of reasoning, understanding, and knowledge. That sounds a lot like the purpose of OWL.
It might be a stretch to say that by advocating concept maps, schools are in fact training kids to create ontologies as a basic learning and teaching method, and a vehicle for communicating complex ideas – but it’s a very interesting stretch all the same. As Information Architects, we’re familiar with the ways that structured visualizations of interconnected things – pages, topics, functions, etc. – communicate complex notions quickly and more effectively than words. But most of the rest of the world doesn’t think and communicate this way – or at least isn’t consciously aware that it does.
It seems reasonable that kids who learn to think in terms of concept maps from an early age might start using them to directly communicate their understandings of all kinds of things throughout life. It might be a great way to communicate the complex thoughts and ideas at play when answering a simple question like “What do you think about the war in Iraq?”
Author Nancy Kress explores this exact idea in the science fiction novel ‘Beggars In Spain’, calling the constructions “thought strings”. In Kress’ book, thought strings are the preferred method of communication for extremely intelligent genetically engineered children, who have in effect moved to realms of cognitive complexity that exceed the structural capacity of ordinary languages. As Kress describes them, the density and multidimensional nature of thought strings make it much easier to share nuanced understandings of extremely complex domains, ideas, and situations in a compact way.
I’ve only read the first novel in the trilogy, so I can’t speak to how Kress develops the idea of thought strings, but there’s a clear connection between the construct she defines and the concept map as laid out by Novak, who says, “it is best to construct concept maps with reference to some particular question we seek to answer or some situation or event that we are trying to understand”.
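The common ground between concept maps and OWL is easiest to see if you reduce both to statements. A minimal sketch in Python, with the concepts and link labels invented purely for illustration:

```python
# A concept map reduced to subject-predicate-object statements,
# the same triple model that RDF and OWL build on.
# Concepts and link labels are invented examples.
concept_map = {
    ("plants", "require", "water"),
    ("plants", "require", "sunlight"),
    ("sunlight", "drives", "photosynthesis"),
    ("photosynthesis", "produces", "sugar"),
}

def linked_from(concept, triples):
    """List every labeled link leading out of one concept."""
    return sorted((label, target)
                  for (source, label, target) in triples
                  if source == concept)

print(linked_from("plants", concept_map))
# -> [('require', 'sunlight'), ('require', 'water')]
```

The point of the exercise: once a map is a set of statements rather than a static picture, it can answer Novak’s “particular question we seek to answer” programmatically.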
Excerpts from the Wired article:
“Concept maps can be used to assess student knowledge, encourage thinking and problem solving instead of rote learning, organize information for writing projects and help teachers write new curricula. “
“We need to move education from a memorizing system and repetitive system to a dynamic system,” said Gaspar Tarte, who is spearheading education reform in Panama as the country’s secretary of governmental innovation.”
“We would like to use tools and a methodology that helps children construct knowledge,” Tarte said. “Concept maps was the best tool that we found.”
Comment » | Modeling, Semantic Web
May 16th, 2005 — 12:00am
Thursday night I was at Casablanca in Harvard Square for an information architecture meet and greet after Lou’s Enterprise IA seminar. I ordered a Wolver’s. It was dim and noisy, so after shouting three times and pointing, I ended up with a Wolaver’s…
Not a surprise, right? My first thought was “What’s in my glass?” My second thought – I was surrounded by information architects – was about the semantic angle on the situation. It seems like a fair mistake to make in a loud and crowded bar. But as someone who works there, the bartender should know the environmental context, the ways it affects fundamental tasks like talking and answering questions, and any alternatives to what he thought I said that are close enough to be easily mistaken. Before I get too far, I’ll point out that I liked the mistake enough to order another.
Setting aside for a moment the notion of a semantically adept agent system that monitors interactions between bartenders and patrons to prevent mistakes like this, let’s look at something more likely: how does Google fare in this situation? Some post-socialization research shows that as far as Google is concerned, all roads do in fact lead to Wolaver’s. Even when Google’s results list begins with a link to a page on Wolver’s Ale from the originating brewery, it still suggests that you might want ‘wolaver’s ale’. Maybe this explains the bartender’s mistake.
Here’s the breakdown: Google US suggests “wolaver’s ale” when you search for “wolvers ale” and “wolver’s ale”, but not the other way around. When you search for “Wolavers”, Google suggests the correctly punctuated “Wolaver’s”. You can get to the American ale, but not the British.
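For what it’s worth, plain string similarity (here via Python’s stdlib difflib, which is emphatically not how Google generates suggestions; real suggestions are driven by query logs) actually favors the British name. The brand list is invented:

```python
import difflib

# Crude stand-in for a "did you mean" suggester using string
# similarity alone. Google's real suggestions come from query
# popularity, not edit distance; the brand list is invented.
known_brands = ["wolaver's ale", "wolver's ale"]

def suggest(query):
    matches = difflib.get_close_matches(query.lower(), known_brands,
                                        n=1, cutoff=0.6)
    return matches[0] if matches else None

print(suggest("wolvers ale"))  # -> wolver's ale
```

Note the output: by spelling alone, the British ale is the closer match to “wolvers ale”, so Google’s pull toward Wolaver’s must come from somewhere other than closeness of spelling.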
More surprising, it’s the same from Google UK, when searching only British pages. (Someone tell me how pages become part of the UK? Maybe when they’re sent off to full-time boarding school?)
Google’s insistence on taking me from wherever I start to “Wolaver’s Ale” comes from more than simple American brew chauvinism. This is what happens when the wrong factors drive decisions about the meanings of things; it’s these basic decisions about semantics that determine whether or not a thing correctly meets the needs of the people looking for answers to a question.
You might say semantic misalignment (or whatever we choose to call this condition) is fine, since Google’s business is aimed at doing something else, but I can’t imagine that business leadership and staff at Wolver’s would be too happy to see Google directing traffic away from them by suggesting that people didn’t want to find them in the first place. Neither Wolver’s nor Wolaver’s seems to have Google ads running for their names, but what if they did? By now we’re all familiar with the fact that googling ‘miserable failure‘ returns a link to the White House web site. This reflects a popularly defined association rich in cultural significance, but that isn’t going to satisfy a paying customer who is losing business because a semantically unaware system works against them.
This is a good example of a situation in which intelligent disambiguation, based on relationships and inferencing within a defined context, has direct business ramifications.
Here’s a preview of the full size table that shows the results of checking some variants of wolvers / wolavers:

Comment » | Semantic Web
April 25th, 2005 — 12:00am
Reading the online edition of the New York Times just before leaving work this afternoon, I came across an ironic mistake that shows the utility of a well developed semantic framework modeling the terms and relationships that define different editorial contexts. In an article discussing the Matrix Online multiplayer game, text identifying the movie character the Oracle mistakenly linked to a business profile page on the company of the same name. In keeping with the movie’s sinister depictions of technology as a tool for creating deceptive mediated realities, by the time I’d driven home and made mojitos for my visiting in-laws, the mistake was corrected…
Ironic humor aside, it’s unlikely that NYTimes Digital editors intended to confuse a movie character with a giant software company. It’s possible that the NYTimes Digital publishing platform uses some form of semantic framework to oversee automated linking of terms that exist in one or more defined ontologies, in which case this mistake implies some form of mis-categorization at the article level, invoking the wrong ontology. Or perhaps this is an example of an instance where a name in the real world exists simultaneously in two very different contexts, and there is no semantic rule to govern how the system handles reconciliation of conflicts or invocation of manual intervention in cases when life refuses to fit neatly into a set of ontologies. That’s a design failure in the governance components of the semantic framework itself.
It’s more likely that the publishing platform automatically searches for company names in articles due for publication, and then creates links to the corresponding profile information page without reference to a semantic framework that employs contextual models to discriminate between ambiguous or conflicting term usage. For a major content creator and distributor like the NY Times, that’s a strategic oversight.
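As a rough sketch of the kind of contextual discrimination such a framework would provide, the link target can be keyed by editorial section as well as by the matched term. The sections and URLs below are invented for illustration:

```python
# Sketch of context-aware term linking: the link target depends on
# the article's editorial section, not just the matched string.
# Sections and URL paths are invented for illustration.
link_table = {
    ("oracle", "business"): "/companies/oracle-corp",
    ("oracle", "movies"): "/films/matrix/characters/the-oracle",
}

def link_for(term, section, table):
    # Fall back to no link at all, rather than the wrong link,
    # when the term/context pair is not modeled.
    return table.get((term.lower(), section))

print(link_for("Oracle", "movies", link_table))
# -> /films/matrix/characters/the-oracle
```

The design choice worth noting is the fallback: an unlinked term is a minor loss, while a mislinked one is the embarrassment described above.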
In this screen capture, you can see the first version of the article text, with the link to the Oracle page clearly visible:
Mistake:

The new version, without the mistaken link, is visible in this screen capture:
New Version:

Comment » | Semantic Web
February 18th, 2005 — 12:00am
mSpace is a new framework – including user interface – for interacting with semantically structured information that appeared on Slashdot this morning.
According to the supporting literature, mSpace handles both ontologically structured data, and RDF based information that is not modelled with ontologies.
What is potentially most valuable about the mSpace framework is a useful, usable interface for both navigating / exploring RDF-based information spaces, and editing them.
From the mSpace sourceforge site:
“mSpace is an interaction model designed to allow a user to navigate in a meaningful manner the multi-dimensional space that an ontology can provide. mSpace offers potentially useful slices through this space by selection of ontological categories.
mSpace is fully generalised and as such, with a little definition, can be used to explore any knowledge base (without the requirement of ontologies!).
Please see mspace.ecs.soton.ac.uk for more information.”
From the abstract of the technical report, titled “mSpace: exploring the Semantic Web”:
“Information on the web is traditionally accessed through keyword searching. This method is powerful in the hands of a user that is experienced in the domain they wish to acquire knowledge within. Domain exploration is a more difficult task in the current environment for a user who does not precisely understand the information they are seeking. Semantic Web technologies can be used to represent a complex information space, allowing the exploration of data through more powerful methods than text search. Ontologies and RDF data can be used to represent rich domains, but can have a high barrier to entry in terms of application or data creation cost.
The mSpace interaction model describes a method of easily representing meaningful slices through these multidimensional spaces. This paper describes the design and creation of a system that implements the mSpace interaction model in a fashion that allows it to be applied across almost any set of RDF data with minimal reconfiguration. The system has no requirement for ontological support, but can make use of it if available. This allows the visualisation of existing non-semantic data with minimal cost, without sacrificing the ability to utilise the power that semantically-enabled data can provide.”
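To get a very rough sense of the “slices through these multidimensional spaces” idea from the abstract, here is a faceted-filtering sketch in Python. The records are invented (I believe the mSpace demo browsed a classical music collection, so the example borrows that domain):

```python
# Rough sketch of an mSpace-style column browser: each facet is a
# dimension, and choosing a value in one column "slices" the data
# shown in the next column. Records are invented examples.
records = [
    {"era": "Baroque", "composer": "Bach", "form": "Fugue"},
    {"era": "Baroque", "composer": "Vivaldi", "form": "Concerto"},
    {"era": "Classical", "composer": "Mozart", "form": "Symphony"},
    {"era": "Classical", "composer": "Haydn", "form": "Symphony"},
]

def slice_values(records, facet, **chosen):
    """Values available in one facet after earlier facet choices."""
    rows = [r for r in records
            if all(r[f] == v for f, v in chosen.items())]
    return sorted({r[facet] for r in rows})

print(slice_values(records, "composer", era="Baroque"))
# -> ['Bach', 'Vivaldi']
```

Note that nothing here requires an ontology; the slicing works on any tabular or triple-shaped data, which matches the report’s claim about non-semantic data.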
Comment » | Modeling, Semantic Web, User Experience (UX)
February 7th, 2005 — 12:00am
In the latest issue of ACMQueue, Tim Bray is interviewed about his career path and early involvement with the SGML and XML standards. In recounting that history, Bray makes four points about the slow pace of adoption for RDF, and reiterates his conviction that the current quality of RDF-based tools is an obstacle to their adoption and the success of the Semantic Web.
Here are Bray’s points, with some commentary based on recent experiences with RDF and OWL based ontology management tools.
1. Motivating people to provide metadata is difficult. Bray says, “If there’s one thing we’ve learned, it’s that there’s no such thing as cheap meta-data.”
This is plainly a problem in spaces much beyond RDF. I hold the concept and the label meta-data itself partly responsible, since the term meta-data explicitly separates the descriptive/referential information from the idea of the data itself. I wager that user adoption of meta-data tools and processes will increase as soon as we stop dissociating a complete package into two distinct things, with different implied levels of effort and value. I’m not sure what a unified label for the base level unit construct made of meta-data and source data would be (an asset maybe?), but the implied devaluation of meta-data as an optional or supplemental element means that the time and effort demands of accurate and comprehensive tagging seem onerous to many users and businesses. Thus the proliferation of automated taxonomy and categorization generation tools…
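To make the unified “asset” suggestion concrete, here is a sketch in which the descriptive fields are constitutive parts of the construct rather than optional extras. The class and field names are my own invention:

```python
from dataclasses import dataclass, field

# Sketch of treating content and its descriptors as one "asset"
# rather than data plus optional metadata. An asset without
# descriptors is incomplete, not merely "missing metadata".
# Class and field names are invented for illustration.
@dataclass
class Asset:
    content: str
    title: str = ""
    subjects: list = field(default_factory=list)

    def is_findable(self):
        """Complete assets carry their own descriptive fields."""
        return bool(self.title and self.subjects)

a = Asset(content="...", title="Q3 report", subjects=["finance"])
print(a.is_findable())  # -> True
```

The framing matters more than the code: when description is part of the unit itself, tagging stops looking like supplemental effort with a separate (and lower) implied value.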
2. Inference based processing is ineffective. Bray says, “Inferring meta-data doesn’t work… Inferring meta-data by natural language processing has always been expensive and flaky with a poor return on investment.”
I think this isn’t specific enough to agree with without qualification. However, I have seen analysis of a number of inferencing systems, and they tend to be slow, especially when processing and updating large RDF graphs. I’m not a systems architect or an engineer, but it does seem that none of the various solutions now available directly solves the problem of allowing rapid, real-time inferencing. This is an issue with structures that change frequently, or during high-intensity periods of the ontology life-cycle, such as initial build and editorial review.
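A toy illustration of why live inferencing gets expensive: even the simplest rule, the transitivity of subClassOf, requires repeated passes over the graph, and the derived triples must be recomputed whenever the underlying graph changes. The class hierarchy is invented:

```python
# Naive forward-chaining of one rule (subClassOf is transitive)
# over a triple set. Each pass rescans all pairs, and any edit to
# the graph invalidates the inferred closure: a toy version of why
# real-time inferencing over changing RDF graphs is costly.
# The hierarchy is invented for illustration.
triples = {
    ("Ale", "subClassOf", "Beer"),
    ("Beer", "subClassOf", "Beverage"),
    ("Beverage", "subClassOf", "Product"),
}

def transitive_closure(triples):
    inferred = set(triples)
    while True:
        new = {(a, "subClassOf", d)
               for (a, p, b) in inferred if p == "subClassOf"
               for (c, q, d) in inferred if q == "subClassOf" and b == c}
        if new <= inferred:
            return inferred
        inferred |= new

print(len(transitive_closure(triples)))  # -> 6
```

Three asserted triples yield three more inferred ones here; on a graph of millions of triples under frequent edits, the same rescan-until-fixpoint pattern is exactly the cost described above.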
3. Bray says, “To this day, I remain fairly unconvinced of the core Semantic Web proposition. I own the domain name RDF.net. I’ve offered the world the RDF.net challenge, which is that for anybody who can build an actual RDF-based application that I want to use more than once or twice a week, I’ll give them RDF.net. I announced that in May 2003, and nothing has come close.”
Again, I think this needs some clarification, but it brings out a serious potential barrier to the success of RDF and the Semantic Web by showcasing the poor quality of existing tools as a direct negative influencer on user satisfaction. I’ve heard this from users working with both commercial and home-built semantic structure management tools, and at all levels of usage from core to occasional.
To this I would add the idea that RDF was meant for interpretation by machines not people, and as a consequence the basic user experience paradigms for displaying and manipulating large RDF graphs and other semantic constructs remain unresolved. Mozilla and Netscape did wonders to make the WWW apparent in a visceral and tangible fashion; I suspect RDF may need the same to really take off and enter the realm of the less-than-abstruse.
4. RDF was not intended to be a Knowledge Representation language. Bray says, “My original version of RDF was as a general-purpose meta-data interchange facility. I hadn’t seen that it was going to be the basis for a general-purpose KR version of the world.”
This sounds a bit like a warning, or at least a strong admonition against reaching too far. OWL and variants are new (relatively), so it’s too early to tell if Bray is right about the scope and ambition of the Semantic Web effort being too great. But it does point out that the context of the standard bears heavily on its eventual functional achievement when put into effect. If RDF was never meant to bear its current load, then it’s not a surprise that an effective suite of RDF tools remains unavailable.
Comment » | Semantic Web, Tools
August 17th, 2004 — 12:00am
Here are a few examples of how Gmail has fared at matching advertising content to the content of email messages sent to my Gmail address.
A forwarded review of King Arthur gives me “King Arthur Competition” and “King Arthur – Was He Real?” For something this easy and contemporary, I would have expected to see suggestions about movie times and locations, offers to publish my screenplay, and collections of King Arthur collectibles.
An anecdote about Eamon de Valera delivers Shillelagh (sic), “Irish Clan Aran Sweaters”, and “Classic Irish Imports”. This is truly an easy one, since there’s only a small pool of similar source terms to sort through. “No, I meant Eamon de Valera, the famous Irish ballet dancer…” Will Gmail suggest links with correct spellings at some future date, or offer correct links to things that you’ve mis-spelled?
A message about another forwarded email sent a few moments before brings “Groupwise email”, “Ecarboncopy.com”, and “Track Email Reading Time”. These are accurate by topic, but not interesting.
A recent email exchange on how to use an Excel spreadsheet template for card sorting analysis offers four links. Three are sponsored, the other is ‘related’. The sponsored links include “OLAP Excel Browser”, “Microsoft Excel Templates”, and “Analysis Services Guide”. The related link is “Generating Spreadsheets with PHP and PEAR”. These are simple word matches – none of them really approaches the central issue of the conversation, which concerned how best to use automated tools for card sorting.
Last month, in the midst of an exchange about making vacation plans for the 4th of July with family, Gmail offered “Free 4th of July Clip Art”, “Fireworks Weather Forecasts”, and “U.S. Flags and patriotic items for sale”. Given the obvious 4th of July theme, this performance is less impressive, but still solid, offering me a convenience-based service in a timely and topical fashion.
Most interesting of all, a message mentioning a relative of mine named Arena yields links for “Organic Pastas” and “Fine Italian Pasta Makers”. Someone’s doing something right with controlled vocabularies and synonym rings, since it’s clear that Google knows Arena is an Italian surname in this instance and not a large structure for performances: even though it only appeared in the text of the email once, and there was no context to indicate which meaning it carried.
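A synonym ring with explicit senses, which is roughly the mechanism I am guessing at above, can be sketched in a few lines. The senses and expansion terms here are invented:

```python
# Sketch of a synonym ring with word senses: one ambiguous term
# maps to several senses, each carrying its own expansion terms.
# All senses and expansions are invented for illustration.
senses = {
    "arena": {
        "surname.it": ["italian", "family", "pasta"],
        "venue": ["stadium", "concert", "tickets"],
    },
}

def expand(term, sense):
    """Expansion terms for one sense; unknown terms expand to themselves."""
    return senses.get(term.lower(), {}).get(sense, [term])

print(expand("Arena", "surname.it"))  # -> ['italian', 'family', 'pasta']
```

The hard part, of course, is the step this sketch skips: picking the right sense from a single occurrence with no disambiguating context, which is exactly what impressed me about the pasta links.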
Beyond the obvious – you send me a message, Gmail parses it for terms and phrases that match a list of sponsored links, and I see the message and the links side-by-side – what’s happening here?
Three things:
1. Gmail is product placement for your email. In the same way that the Coke can visible on the kitchen table during a passing shot in the latest romantic comedy from Touchstone pictures is more an advertising message than part of the overall mise en scene, those sponsored links are a commercially driven element of the experience of Gmail that serves a specific agenda exterior to your own.
2. Gmail converts advertisements (sponsored links) into a form of hypertext that should be called advertext. Gmail is creating a new advertext network composed of Google’s sponsored links in companion to your correspondence. Before Gmail, the sponsored links that Google returned in accompaniment to search queries were part of an information space outside your immediate personal universe; now they sit inside it, alongside your private correspondence.
3. Gmail connects vastly different information spaces and realms of thinking. Google’s sponsored links bridge any remaining gap between personal, private, individual conversations, and the commercialized subset of cyberspace that is Google’s ad-verse. You will inevitably come to understand the meaning and content of your messages differently as a result of seeing them presented in a context informed by and composed of advertising.
The implications of the third point are the most dramatic. When all of our personal spaces are fully subject to colonization by the ad-verse, what communication is left that isn’t an act of marketing or advertisement?
Comment » | Ideas, The Media Environment
June 22nd, 2004 — 12:00am
Five minutes after logging into my shiny new gmail account today and sending out a hello message to a few friends, I got a taste of new technology pranksterism: an old friend sent a reply to my hello loaded with keywords for everyone’s favorite flavors of spam. Naturally, my friend had read the Gmail intro that outlines their keyword targeted ad policy, stating that one of the conditions of participating in the beta was that Google would serve up ads related to the content of my messages within the new UI.
I don’t know how aggressively Google will match ads to content, but I haven’t seen anything tied to Scranton, PA on my screen yet. As a riposte, my friend should soon see plenty of discount remedies for embarrassing medical conditions, debilitating psychological illnesses, and other matters of questionable taste.
Funny or not, I find it a bit spooky that my mail is being parsed in order to drive advertising. Yes, un-encrypted email is basically as private as a post-card – but it’s highly unlikely that the local post office is going to slip a brochure for travel agencies and package vacations into friends’ mailboxes to accompany the post-cards I send them while I’m visiting Barcelona or Tenerife.
And then there are the inevitable followup questions: what kinds of patterns is Google building on top of this? Are they using geomatching to ID clusters of themes within zip codes? Maybe creating a history of my searching behavior and the number of times I follow the links placed by the engine, to establish a baseline for how susceptible I am to advertising? Or how often people in certain networks read and reply to messages with certain kinds of content?
I don’t think paranoia is appropriate, but there is a double-edged sword in every technology – especially one like this that combines accumulating personal data with tremendous interpretive power.
And even if I did sign up for the free account knowing that Gmail use implied acceptance of this practice, privacy remains a fundamental right. You can’t create valid and binding contracts that require or permit illegal activity.
Look out for travel guides to Scranton…
Comment » | Ideas, The Media Environment
May 29th, 2004 — 12:00am
It’s been a while since I’ve had time to read the Word of the Day emails that I get from the good people at Merriam Webster and Yourdictionary.com: long enough that I’ve set up a filter directing their daily contributions to the betterment of my vocabulary into one of those dead-end Outlook folders that you see highlighted in bold, but never manage to do anything with other than bulk delete every few months, when you notice the number of unread messages has crossed from two to three digits. (The count of unread words of the day in my folder is now 91 – just about time to purge again.)
But now, thanks to the atrocious epidemic of spam that’s raging without surcease, I don’t need to feel bad about ignoring the latest juicy word to drop into my Inbox.
Now, instead of knowing that it will only be shunted aside and ignored for months before its summary termination, I can calmly watch as it’s disposed of without ado.
Now all I need to do for a rich and unusual lexical lesson is peruse the subject lines of the dozens of spam messages that the layers of filters deployed by my ISP haven’t corralled as parasitic trash.
Thanks to the pertinacious conclave of spammers who’ve found the means to pollute the Internet with offers of discount medicines and penile enlargement disguised behind word combinations generated by dictionaries and scripts, there’s a veritable smorgasbord of uncanny solecisms gracing my inbox every day.
Things like “libidinous plutarchy”, “inconspicuous megohm”, “charcoal expectorant”, and others not even worth mentioning despite their remarkable incongruity bring me unforeseen verbal richness.
Aside from the surrealists and their experiments with automatic writing during the 30’s, who but a spammer would ever think to send out a message about “albania seethe pfennig columbia” – which, by the way, would make a great name for a comic book villainess: “You haven’t won yet, Albania Seethe! Justice will be done!”
My day is already good when I can look forward to reading about “erosible integument”, which I seem to remember overhearing the last time I was within fifty yards of a geochemistry lab.
“Systemic cohomology” sounds like a pretty cool degenerative disease, or maybe a death metal band.
“Afghanistan surname baboon” is the sort of thing I’d expect to hear coming from one of those early artificial intelligence programs trying to recreate human speech: the sort that you used to see on Nova in the early 80’s; you know the scene – lots of twenty-something guys who haven’t been out in the sunlight enough even though they’re at UC San Jose are all standing around a radio-shacked amateur version of a speaker cabinet looking intently at an amber monitor, while one of them types “Hello. How are you today?” on a keyboard without a cover, only to end up visibly crestfallen when a tinny synthesized voice spits out something akin to gibberish above, and in the end they utter the inevitable combination of exuberant pronouncements regarding natural language processing, and conditioned realism about the fallacies of science fiction expectations.
Some of the spammers no doubt prefer to take a more Zen minimalist approach to fomenting palaver, using single words that bespeak a substantial degree of amphiboly; “gasify”, “archfool”, “deciduous”, “involute” and “burg” are examples of this tradition.
Then there are the imperatives, not to be casually ignored without some measure of trepidation: “deconvolve”, “rebut”, “throb”, and “migrate” for example.
With all these SAT words flowing uninterruptedly into my mailbox, there’s practically no excuse for not doing the Times crossword in pen.
So I say “Thank You Spammers!” Spam On! Whenever I want a tasty linguistic morsel, I’ll just shut off my spam filters…
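For the curious, the dictionary-and-script trick described above takes only a few lines to reproduce. The word list is borrowed from the subject lines quoted earlier:

```python
import random

# The spammers' trick as described: bolt random dictionary words
# together into a faux subject line. The word list is lifted from
# the spam subjects quoted above.
words = ["libidinous", "plutarchy", "inconspicuous", "megohm",
         "charcoal", "expectorant", "albania", "seethe",
         "pfennig", "columbia"]

def spam_subject(rng):
    """Two distinct random words, glued into a subject line."""
    return " ".join(rng.sample(words, 2))

print(spam_subject(random.Random(7)))
```

Swap in an actual dictionary file and you have the whole smorgasbord of uncanny solecisms, minus the discount medicines.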
Comment » | The Media Environment
May 3rd, 2004 — 12:00am
Here are some snippets from an article in the Web Services Journal that nicely explains some of the business benefits of a services-based architecture that uses ontologies to integrate disparate applications and knowledge spaces.
Note that XML / RDF / OWL – all from the W3C – together make up only part of the story on new tools for making it easy for systems (and users, and businesses…) to understand and work with complicated information spaces and relationships. There’s also Topic Maps, which do a very good job of visually mapping relationships that people and systems can understand.
Article: Semantic Mapping, Ontologies, and XML Standards
The key to managing complexity in application integration projects
Snippets:
Another important notion of ontologies is entity correspondence. Ontologies that are leveraged in more of a B2B environment must leverage data that is scattered across very different information systems, and information that resides in many separate domains. Ontologies in this scenario provide a great deal of value because we can join information together, such as product information mapped to on-time delivery history mapped to customer complaints and compliments. This establishes entity correspondence.
So, how do you implement ontologies in your application integration problem domain? In essence, some technology – either an integration broker or applications server, for instance – needs to act as an ontology server and/or mapping server.
An ontology server houses the ontologies that are created to service the application integration problem domain. There are three types of ontologies stored: shared, resource, and application. Shared ontologies are made up of definitions of general terms that are common across and between enterprises. Resource ontologies are made up of definitions of terms used by a specific resource. Application ontologies are native to particular applications, such as an inventory application. Mapping servers store the mappings between ontologies (stored in the ontology server). The mapping server also stores conversion functions, which account for the differences between schemas native to remote source and target systems. Mappings are specified using a declarative syntax that provides reuse.
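The mapping-server portion of the excerpt can be sketched as a registry of term correspondences plus conversion functions. All ontology, term, and unit names below are invented for illustration:

```python
# Minimal sketch of the mapping-server idea from the excerpt: a
# registry of term correspondences between ontologies, plus
# conversion functions covering schema-level differences.
# All ontology, term, and unit names are invented.
term_mappings = {
    ("supplier_ontology", "item_no"): ("inventory_ontology", "sku"),
    ("supplier_ontology", "weight_lb"): ("inventory_ontology", "weight_kg"),
}
conversions = {
    # supplier quotes weight in pounds, inventory stores kilograms
    ("supplier_ontology", "weight_lb"): lambda v: round(v * 0.4536, 3),
}

def translate(ontology, term, value):
    """Map one term (and convert its value) into the target ontology."""
    target = term_mappings.get((ontology, term), (ontology, term))
    convert = conversions.get((ontology, term), lambda v: v)
    return target, convert(value)

print(translate("supplier_ontology", "item_no", "A-100"))
# -> (('inventory_ontology', 'sku'), 'A-100')
```

Declarative mappings like these are reusable across integrations, which is the point the article makes about specifying mappings in a declarative syntax.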
RDF uses XML to define a foundation for processing metadata and to provide a standard metadata infrastructure for both the Web and the enterprise. The difference between the two is that XML is used to transport data using a common format, while RDF is layered on top of XML defining a broad category of data. When the XML data is declared to be of the RDF format, applications are then able to understand the data without understanding who sent it.
Comment » | Semantic Web