Understanding Taxonomies

Posted: October 14, 2005


Taxonomies have long been left to large directories and Yellow Pages properties (both online and offline). As local advertising moves further into internet-based properties, the organization of data becomes more and more critical.

First, let’s take a look at the three most common information retrieval and organization systems at present.

Search
We all know what search is. The entire information source (on the web, most commonly a webpage, but it could be a PDF, a PowerPoint file, etc.) is indexed, and then the search algorithm decides which pages are served as the result of a query. The advantage of search is that it is very easy for a user to find what they’re looking for. The downsides are the sheer amount of information that must be stored and the potential for manipulation of the algorithm.
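
As a rough illustration of the indexing step described above, here is a minimal sketch of an inverted index in Python. The sample pages and the function names are made up for this example, and there is no ranking at all, which is where a real search algorithm does most of its work.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical sample data, not taken from any real site.
documents = {
    "page1": "new Volvo dealers in Philadelphia",
    "page2": "used cars and new cars",
    "page3": "mens shoes and accessories",
}
index = build_index(documents)
matches = search(index, "new cars")
```

Every word on every page ends up in the index, which hints at why storage grows so quickly.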

Taxonomy Structure
This is when every piece of information is organized into predefined classifications. An example of this structure for an IYP (Internet Yellow Pages) site might be:
Retail > Clothing & Accessories > Men’s shoes.
The advantage of this structure is that the classifications are predefined, so results are much more structured and harder to manipulate. The difficult part takes place behind the scenes, in how this information is organized. Often, these websites use a hybrid approach, letting a user either search or browse the categories.

Tagging
This is where a user tags his page or website with a specific piece of information declaring that the site or webpage is about a particular topic. As this is a freeform system, there is no predefined structure. This allows some users to either misclassify their information or choose a tag that isn’t used often, so the information contained on the site is not easily found.
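
To make the rarely used tag problem concrete, here is a small sketch in Python. The sites, the tags, and the helper names are all hypothetical; the point is only that a freeform tag is found by exact match, so a misspelled or unusual tag hides the page.

```python
from collections import Counter

# Hypothetical pages and the freeform tags their owners chose.
page_tags = {
    "shoe-store.example": ["shoes", "mens shoes"],
    "boot-barn.example": ["shoes", "boots"],
    "loafer-shop.example": ["footware"],  # misspelled tag, used by nobody else
}

def pages_for_tag(page_tags, tag):
    """Return the pages that carry exactly this tag."""
    return sorted(page for page, tags in page_tags.items() if tag in tags)

def rare_tags(page_tags, max_uses=1):
    """Return tags used on at most max_uses pages, a hint they may be hard to find."""
    counts = Counter(tag for tags in page_tags.values() for tag in set(tags))
    return sorted(tag for tag, n in counts.items() if n <= max_uses)
```

A visitor browsing the tag "shoes" would never see the loafer shop, even though it sells shoes.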

Advantages & Disadvantages of Information Retrieval Methods

Search
First, one must build a system capable of housing all the data. Second, one must be able to gather all the data. Third, one must build an algorithm that brings up the most relevant results. The largest drawback is that users manipulate the data (read: SEO for websites). In a closed loop (i.e. you’re only indexing your own information) this system is less susceptible to manipulation.

Taxonomy
Large websites are often given a single category. This can be highly restrictive, as some websites fall into multiple categories. As I’m going to focus on taxonomies, I’ll go into more detail below.

Tagging
The major flaw in tagging is that you are relying on a freeform trust system for users. Some users will mislabel information, and others won’t understand the best keywords with which to tag their documents. This can lead to some information sources not being properly categorized, so the end user cannot find them.

Let’s focus on Taxonomies.

The first decision in creating a taxonomy is how many layers you are going to use.

There are two types of taxonomies, flat and layered.

A flat taxonomy has only top level categories. There are no subcategories. The entire classification system could easily be shown on a single page.

A layered taxonomy system uses a hierarchy of subcategories. The top levels are usually very broad classifications (such as shopping, USA, internet, etc) and provide little value other than to house subcategories. The ‘meat’ of a layered system is how subcategories are organized, and how a user would logically navigate through these categories to find the proper information.
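
The difference between the two types can be sketched with plain Python data structures. This is only one possible representation, assumed for illustration: a list for the flat case and nested dictionaries for the layered case. The category names reuse examples from this post.

```python
# Flat: a single list of top level categories, no subcategories.
flat_taxonomy = ["Automobiles", "Clothing", "Restaurants"]

# Layered: nested dicts, where each key is a category and
# each value holds that category's subcategories.
layered_taxonomy = {
    "Retail": {
        "Clothing & Accessories": {
            "Men's shoes": {},
        },
    },
    "Cars": {
        "New Car Dealers": {
            "Volvo dealerships": {},
        },
    },
}

def paths(tree, prefix=()):
    """Yield every category path in a layered taxonomy as a tuple of names."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from paths(children, path)

def depth(tree):
    """Number of layers in the taxonomy (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())
```

Walking `paths(layered_taxonomy)` gives the browse trail a visitor would follow, one layer at a time.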

Advantages & Disadvantages of a Flat Structure Taxonomy

A flat structure, meaning there are only top level categories, is useful if:

  • You want broad category structures.
  • You are handling a specific vertical, thus your information is already limited in nature.
  • You want to give people easy choices when selecting their information.
  • You aren’t displaying all the categories at once, and are using some sort of search for category retrieval. (i.e. you might have 5,000 categories, but because you’re either using search or another way of integrating the information, there is no need to classify the information into subcategories. Often a database will use this structure).
  • You want to start small and grow very gradually, so you aren’t overloading categories with information sources. With this method, it’s best to limit the data you work with and to think about how large you will grow. If you commit your database to one system and outgrow it, reclassifying all the data into a layered structure can be a painful process.

It’s not useful if:

  • You have many categories and you want to display them all on the home page.
  • You want people to be able to select very specific classifications for their information.
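
For the case above where a large flat list is reached through search rather than displayed all at once, a category lookup might look like the following sketch. The function name, the substring matching rule, and the sample categories are assumptions made for this example.

```python
def find_categories(categories, query, limit=10):
    """Return up to `limit` category names containing the query, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = [name for name in categories if needle in name.lower()]
    return sorted(hits)[:limit]

# Hypothetical flat list; a real one might hold thousands of entries.
categories = [
    "Car Dealers",
    "Car Rental",
    "Carpet Cleaning",
    "Men's Shoes",
    "Volvo Dealers",
]
```

Note that a plain substring match for "car" also returns "Carpet Cleaning", a small reminder that even category search needs some thought about relevance.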

The second choice is deep layers. This is useful if:

  • You want to be able to display the top level categories on a page, and let people browse through your categories (think DMOZ, Yahoo.com, Yellow Pages, etc).
  • You want to allow information sources to choose very specific categories.
  • You are classifying data by geography.
  • You plan to have many information sources, and having a single layer with 10k choices just isn’t useful for a visitor.

The disadvantages of deep layers are:

  • You must create the entire taxonomy, which can be very time consuming.
  • If you want to map data to another taxonomy, it can be a very tedious process.
  • Broad information retrieval is harder (i.e. returning all ‘car’ information and not just car > new cars > Volvo dealers).
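
One possible way to soften the broad retrieval drawback is to store each listing’s full category path and match on the start of that path. The sketch below assumes that storage choice; the listings and the function name are hypothetical.

```python
# Each listing is stored under one full category path (hypothetical sample data).
listings = {
    "Main Street Volvo": ("car", "new cars", "Volvo dealers"),
    "Budget Used Autos": ("car", "used cars"),
    "Corner Shoe Shop": ("retail", "shoes"),
}

def listings_under(listings, prefix):
    """Return listings whose category path starts with the given prefix."""
    prefix = tuple(prefix)
    return sorted(
        name for name, path in listings.items()
        if path[:len(prefix)] == prefix
    )
```

Calling `listings_under(listings, ("car",))` returns every car listing regardless of depth, while a longer prefix narrows the result to one branch.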

Creating Taxonomies

If you are using a single layer taxonomy, the setup process is relatively easy. Choose your categories and you’re done. If you find information you wish to include that doesn’t fit into your current scheme, it’s easy to add a new category.

Deep layer taxonomies are much more difficult. You must determine the top level first.

If you are using geographic data, the first choice is going to be how to integrate this data. Will you start with regional data at the top level (so that either the country or state listings form a first page category), and then, once you reach your lowest level of geographic data (USA > Pennsylvania > Philadelphia), display automobiles > new cars > etc.?

  • Are you going to go broad at the lowest level of a geographic search so you will only have one car category in Philly?
  • Will you cross reference categories so that Philly new cars is crossed with the top level category cars > new cars > Volvo?

Cross referencing is useful if you are going to allow someone to appear in two categories at once, such as new Volvo dealers and cars in Philadelphia.
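
One way to picture crossing a geography with a category is to keep two separate indexes and intersect them. The class below is a sketch under that assumption; the class name, the method names, and the listings are invented for illustration.

```python
from collections import defaultdict

class CrossIndex:
    """Index listings by geography path and by category path independently."""

    def __init__(self):
        self.by_geo = defaultdict(set)
        self.by_category = defaultdict(set)

    def add(self, listing, geo_path, category_path):
        # Register the listing under every ancestor prefix of both paths,
        # so broad and narrow lookups both work.
        for i in range(1, len(geo_path) + 1):
            self.by_geo[tuple(geo_path[:i])].add(listing)
        for i in range(1, len(category_path) + 1):
            self.by_category[tuple(category_path[:i])].add(listing)

    def find(self, geo_path, category_path):
        """Return listings that fall under both the geography and the category."""
        in_geo = self.by_geo.get(tuple(geo_path), set())
        in_category = self.by_category.get(tuple(category_path), set())
        return in_geo & in_category

# Hypothetical listings.
cross = CrossIndex()
cross.add("Main Street Volvo",
          ["USA", "Pennsylvania", "Philadelphia"], ["cars", "new cars", "Volvo"])
cross.add("Pittsburgh Volvo",
          ["USA", "Pennsylvania", "Pittsburgh"], ["cars", "new cars", "Volvo"])
cross.add("Philly Shoes",
          ["USA", "Pennsylvania", "Philadelphia"], ["retail", "shoes"])
```

With this layout the same data answers both "new cars in Philadelphia" and "all cars in Pennsylvania" without duplicating the category tree under each city.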

How will you cross reference your data? If you have:
Information > buying cars > buying Volvos
– isn’t that relevant to the category:
Cars > New Car Dealers > Volvo dealerships?

DMOZ’s approach is to link ‘similar’ categories like this together. Some IYPs have decided to populate a listing into multiple categories (thus, when creating a profile, you’re often asked to pick 1-5 categories in which your listing will appear).
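
The two approaches just described, linking similar categories and placing one listing in several categories, can be sketched side by side. This is not how DMOZ or any particular IYP implements it; the function names, the listing, and the second category are assumptions for the example.

```python
MAX_CATEGORIES = 5  # assumed cap, based on the "pick 1-5 categories" example

def link_related(related, a, b):
    """Record that categories a and b are similar, in both directions."""
    related.setdefault(a, set()).add(b)
    related.setdefault(b, set()).add(a)

def assign_categories(assignments, listing, categories):
    """Place one listing into several categories, enforcing the 1-5 limit."""
    chosen = set(categories)
    if not 1 <= len(chosen) <= MAX_CATEGORIES:
        raise ValueError(f"pick between 1 and {MAX_CATEGORIES} categories")
    assignments[listing] = chosen

# Approach 1: link the two similar categories to each other.
related = {}
link_related(
    related,
    "Information > buying cars > buying Volvos",
    "Cars > New Car Dealers > Volvo dealerships",
)

# Approach 2: let one (hypothetical) listing appear in more than one category.
assignments = {}
assign_categories(
    assignments,
    "Main Street Volvo",
    ["Cars > New Car Dealers > Volvo dealerships", "Cars > Used Car Dealers"],
)
```

Linking keeps each listing in one place and sends the visitor sideways; multiple assignment keeps the visitor in place and duplicates the listing.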

How will you add new categories? No matter how well you’ve mapped your taxonomies, new categories will spring up. Five years ago the category ‘Reality TV Shows’ wasn’t thought of - now it’s a large category. Emerging technologies and trends will always make you expand - do not lock yourself into a static system. Plan for how you will expand your system.
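
If the taxonomy is held as nested dictionaries (an assumption made for this sketch, not a requirement), adding a category later can be a small operation rather than a rebuild. The function name and the parent category "Television" are hypothetical.

```python
def add_category(tree, path):
    """Create any missing categories along `path`; return True if anything was added."""
    node = tree
    added = False
    for name in path:
        if name not in node:
            node[name] = {}
            added = True
        node = node[name]
    return added

# Hypothetical existing taxonomy.
taxonomy = {"Television": {"Dramas": {}}}
was_new = add_category(taxonomy, ["Television", "Reality TV Shows"])
```

The storage side is the easy part; deciding where a new category belongs, and what to cross reference it with, is still an editorial decision.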

Deep layer taxonomies allow users a lot of control. They allow for a lot of data manipulation in cross referencing categories. They allow other databases to show very specific information you’ve chosen. They allow for quite a bit of flexibility - however, you must understand how your system is set up, how it will expand, and how to deal with eventualities before such a system can go live. Once it’s mapped, planned, and executed, such a system can be an amazing source of structured information retrieval for users.

Questions to consider when creating a taxonomy:

  • Top level categories - How will you structure them?
  • Geographies - How will they be integrated?
  • Expanding categories - What is your strategy for adding more?
  • Cross referencing categories - Will you use this feature, and if so, how?
  • Search functionality of categories -
    • How will users search your taxonomies for results?
    • Will you show both individual information results and the categories in which they appear?
    • Will you show related categories?
  • Database retrieval & storage - How will data be stored and found?
  • User display - What is the user interface like?

Taxonomies are not new. They’ve been around for hundreds of years. However, with databases now housing and manipulating millions of pieces of data, the structure, storage, retrieval, and expansion of such systems can create a truly unique environment for your users. While planning up front how a system will be laid out can be a significant amount of work, the flexible systems that result will allow you to repurpose data in ways not yet even considered.

Brad Geddes

