chir.ag/tech [archive]

 
 
 
 
 
 
 


ARCHIVE: Tagline Generator - Timeline-based Tag Clouds

Tue. Nov 14th 2006, 01:28am:

Many people have asked me how they can make their own timeline-based tag cloud like my US Presidential Speeches Tag Cloud. After a lot of cleaning up, I've finally released the complete PHP 5 source code that works pretty well with very basic configuration.

The Tagline Generator is a simple PHP codebase that lets you generate chronological tag clouds from simple text data sources without manually tagging the data entries. Once you have populated the data source and configured the generator, it makes a list of all the unique words in the data and counts how many times each one is used. Next it identifies the different variations of each word using the Porter Stemming Algorithm and combines them under the most common variation. For example, "promised", "promises", "promising", and "promise" might be grouped under "promises".
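
To make the counting and grouping step concrete, here is a minimal sketch in Python. It is not the released PHP code. The function names are made up for illustration, and the `crude_stem` suffix stripper is only a stand-in for the Porter Stemming Algorithm, which handles far more cases.

```python
import re
from collections import Counter, defaultdict

# Assumption: a crude suffix stripper standing in for the Porter stemmer.
# It is only good enough to illustrate the grouping idea.
SUFFIXES = ("ing", "ed", "es", "s", "e")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_words(text):
    """Count every unique lowercase word in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def group_variants(counts):
    """Merge words that share a stem under their most common variation."""
    groups = defaultdict(Counter)
    for word, n in counts.items():
        groups[crude_stem(word)][word] = n

    merged = {}
    for variants in groups.values():
        # Label the group with its most frequent variant
        # (ties are broken alphabetically so the result is deterministic).
        label = max(variants, key=lambda w: (variants[w], w))
        merged[label] = sum(variants.values())
    return merged


text = "He promised. She promises and promises; promising to promise."
grouped = group_variants(count_words(text))
print(grouped["promises"])
```

In this sample the four variations share the stem "promis", so all five occurrences are counted under "promises", the most frequent form.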

Then it removes the most common words like "the", "and", "this", "that" and some not so common language-specific words like "hitherto" and "notwithstanding". Once these words are removed, it makes a "tag cloud" in which the more commonly used words are shown in a bigger font size than the less frequently used ones. Additionally, it tries to figure out how long ago a given word hit its peak usage, and brightens the recently used words while fading away words that haven't been used in a while.
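
The filtering, sizing and fading described above could look roughly like the following Python sketch. Again, this is not the released PHP code. The stop-word list, the `build_cloud` function, the pixel range and the linear scaling are all assumptions chosen for illustration, and the actual generator may weigh size and brightness differently.

```python
# Assumption: a tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "this", "that", "hitherto", "notwithstanding"}


def build_cloud(timeline, min_px=10, max_px=40):
    """Build a tag cloud from chronological word counts.

    timeline: list of (period_label, {word: count}) pairs, oldest first.
    Returns {word: {"size_px": int, "brightness": float}}.
    """
    totals = {}      # overall count per word
    peak_index = {}  # index of the period where the word peaked
    peak_count = {}  # count in that peak period

    for i, (_, counts) in enumerate(timeline):
        for word, n in counts.items():
            if word in STOP_WORDS:
                continue
            totals[word] = totals.get(word, 0) + n
            # On ties the later period wins, so a steadily used word stays bright.
            if n >= peak_count.get(word, 0):
                peak_count[word] = n
                peak_index[word] = i

    if not totals:
        return {}

    lo, hi = min(totals.values()), max(totals.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    last = len(timeline) - 1

    cloud = {}
    for word, total in totals.items():
        # Linear font scaling between min_px and max_px.
        size = min_px + (max_px - min_px) * (total - lo) / span
        # 1.0 means the word peaked in the latest period, 0.0 in the earliest.
        age = last - peak_index[word]
        brightness = 1.0 - age / max(last, 1)
        cloud[word] = {"size_px": round(size), "brightness": round(brightness, 2)}
    return cloud


timeline = [
    ("1800", {"the": 50, "union": 4, "war": 1}),
    ("1900", {"the": 40, "union": 1, "war": 1}),
    ("2000", {"the": 30, "terror": 6, "war": 1}),
]
for word, style in sorted(build_cloud(timeline).items()):
    print(word, style)
```

Here "union" peaked in the earliest period, so it gets the lowest brightness, while "terror" peaked in the latest period and is both the largest and the brightest tag.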

Demo: To view a demo and start using the Tagline Generator yourself, check out the Tagline Generator page.