Monday, July 18, 2011

Word Clouds and Social Graphs in Yammer

Introduction

Yammer is a tool for internal corporate communication that brings together all of a company’s employees inside a private and secure enterprise social network. Assuming you are a Yammer member, the Yammer API allows you to create your own applications, such as word clouds, user clouds, user walls or social graphs, that demonstrate how Yammer can move enterprise culture toward shared thinking by breaking down departmental silos. For secure API authorization, Yammer uses the open protocol OAuth.

During the last 2 years I developed command line tools in the Ruby programming language to produce word clouds and social graphs, and published the graphs in Yammer. Over time the Yammer analytics graphs became popular in the community and found many practical uses in presentations for enterprise leadership, demonstrating the value of social communities. We had a lot of fun when some users started to manipulate the daily word clouds by repeating favorite words in their yams. It was also amazing how fast some users recognized the user name when I published a personalized word cloud of a high-ranking Yammer member of the organization and asked: 'Whose word cloud is it?'.

Due to the popularity of my Yammer analytics I was often asked 'How did you do this?' or 'Is there a packaged app for this?', and I promised to write a blog post. So here is the first blog post I have ever written. It is about the software ecosystem I use and the creation of word clouds and social graphs with Yammer data.

The purpose of this blog post is to describe the roadmap and the software ecosystem that I used to create Yammer analytics. A detailed description for Windows users, 'How to Run Yammer Analytics', will be posted later, with the code published on GitHub.

Software Ecosystem

First of all I have to say: I don't have a packaged app to create Yammer analytics. The software ecosystem in use consists of open source programs that are available for the operating systems OS/X, Linux and Windows.

The programming languages in my environment are Ruby, Java and Python. For the word cloud creation you can use the IBM Word Cloud Generator (WCG), which runs on Java. For Yammer access via the API there is the 'yammer4r' gem. Running 'gem install yammer4r' also installs its dependencies, such as JSON and the OAuth security software for Ruby.

To avoid excessive Yammer API usage, I recommend storing the data downloaded from Yammer in a database. I selected MySQL Community Server as the database for the Yammer data, but that is not a must. To access MySQL easily from Ruby, I installed the 'mysql' gem and the 'activerecord' gem from Rails: 'gem install mysql' and 'gem install activerecord'. I recommend installing Rails, which includes the 'activerecord' gem.

For downloading user images from Yammer, the famous cURL program with the Ruby 'curb' gem comes into play: 'gem install curb'. For image editing and converting, ImageMagick with the Ruby gem 'rmagick' is installed on my PC: 'gem install rmagick'. Last but not least, the Graphviz visualization software is required for creating social graphs. Further products I played with for social network analysis are NodeBox (Mac OS/X only) and the Java-based Vizster program.


bhuelbue:~
→ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)

bhuelbue:~
→ ruby -v
ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-darwin10.5.0]

bhuelbue:~ 
→ mysql -V
mysql  Ver 14.14 Distrib 5.1.54, for apple-darwin10.5.0 (i386) using  EditLine wrapper

bhuelbue:~  
→ curl -V
curl 7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8r zlib/1.2.3
Protocols: tftp ftp telnet dict ldap http file https ftps 
Features: GSS-Negotiate IPv6 Largefile NTLM SSL libz

bhuelbue:~
→ convert --version
Version: ImageMagick 6.6.6-10 2011-01-05 Q8 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2011 ImageMagick Studio LLC
Features:  OpenCL

Yammer Database

In the Yammer API description you can find all data fields that are delivered for messages and user data via the API. With the 'yammer_create_oauth.rb' program delivered with the 'yammer4r' Ruby package you can register your Ruby program in Yammer. Once you have registered your Yammer Ruby program and have a valid 'oauth.yml' file, you can start to download user information and messages from Yammer and save them in a database. I have 7 tables in my MySQL 'ycyam_development' database, plus the 'schema_migrations' bookkeeping table that Rails migrations create. The most important tables for Yammer analytics are 'uyams' for user information and 'fyams' for the public messages.

mysql> show tables;
+-----------------------------+
| Tables_in_ycyam_development |
+-----------------------------+
| flikes                      |
| fyams                       |
| fyfiles                     |
| fyimgs                      |
| fymodules                   |
| gyams                       |
| schema_migrations           |
| uyams                       |
+-----------------------------+
8 rows in set (0.00 sec)

mysql> describe uyams;
+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int(11)      | NO   | PRI | NULL    | auto_increment |
| yamid           | int(11)      | NO   |     | NULL    |                |
| name            | varchar(128) | NO   |     | NULL    |                |
| full_name       | varchar(128) | YES  |     | NULL    |                |
| email           | varchar(128) | YES  |     | NULL    |                |
| network_id      | int(11)      | NO   |     | NULL    |                |
| network_name    | varchar(64)  | NO   |     | NULL    |                |
| location        | varchar(45)  | YES  |     | NULL    |                |
| job_title       | varchar(128) | YES  |     | NULL    |                |
| state           | varchar(45)  | NO   |     | NULL    |                |
| stats_followers | int(11)      | YES  |     | NULL    |                |
| stats_updates   | int(11)      | YES  |     | NULL    |                |
| stats_following | int(11)      | YES  |     | NULL    |                |
| expertise       | text         | YES  |     | NULL    |                |
| url             | varchar(255) | YES  |     | NULL    |                |
| web_url         | varchar(255) | YES  |     | NULL    |                |
| mugshot_url     | varchar(255) | YES  |     | NULL    |                |
| created_at      | datetime     | YES  |     | NULL    |                |
| updated_at      | datetime     | YES  |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+
19 rows in set (0.00 sec)

mysql> describe fyams;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| yamid         | int(11)      | NO   |     | NULL    |                |
| sender_id     | int(11)      | NO   |     | NULL    |                |
| replied_to_id | int(11)      | YES  |     | NULL    |                |
| thread_id     | int(11)      | YES  |     | NULL    |                |
| message_type  | varchar(45)  | YES  |     | NULL    |                |
| sender_type   | varchar(45)  | YES  |     | NULL    |                |
| client_type   | varchar(45)  | YES  |     | NULL    |                |
| url           | varchar(255) | YES  |     | NULL    |                |
| web_url       | varchar(255) | YES  |     | NULL    |                |
| body_parsed   | text         | YES  |     | NULL    |                |
| body_plain    | text         | YES  |     | NULL    |                |
| created_at    | datetime     | YES  |     | NULL    |                |
| updated_at    | datetime     | YES  |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
14 rows in set (0.00 sec)
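
To illustrate how a downloaded message could be mapped onto the 'fyams' columns, here is a minimal Ruby sketch. It is not the program I used. The method name 'fyam_row' and the sample payload are hypothetical, and the assumption that the API nests the message text under "body" with "parsed" and "plain" keys should be checked against the Yammer API description.

```ruby
require 'json'

# Map one message hash (parsed from the API's JSON) to a hash whose keys
# match the columns of the 'fyams' table shown above.
# Assumption: the message text is nested as "body" => {"parsed", "plain"}.
# 'created_at' and 'updated_at' are omitted in this sketch.
def fyam_row(message)
  body = message['body'] || {}
  {
    yamid:         message['id'],
    sender_id:     message['sender_id'],
    replied_to_id: message['replied_to_id'],
    thread_id:     message['thread_id'],
    message_type:  message['message_type'],
    sender_type:   message['sender_type'],
    client_type:   message['client_type'],
    url:           message['url'],
    web_url:       message['web_url'],
    body_parsed:   body['parsed'],
    body_plain:    body['plain']
  }
end

# Hypothetical sample payload, for illustration only.
sample = JSON.parse(<<-JSON)
  {
    "id": 1001,
    "sender_id": 42,
    "replied_to_id": null,
    "thread_id": 1001,
    "message_type": "update",
    "body": { "parsed": "Hello YCN", "plain": "Hello YCN" }
  }
JSON

p fyam_row(sample)
```

A hash like this could then be handed to an ActiveRecord model for the 'fyams' table, so that each message is fetched from the API only once.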

Word Cloud Generator

The public posts of employees produce about 30k of data per day, or much more, depending on the number of members in the Yammer network and their activity. Word clouds are a popular graphical representation of the most commonly used words in the messages posted by Yammer members in a given time frame.

A word cloud of all messages of one day in Yammer YCN.

Thanks to the IBM Word Cloud Generator (WCG) by IBM researcher Jonathan Feinberg, word clouds are easy for anyone to create. WCG is a command line application and can be used in an automated process to produce a word cloud. It is a Java application and requires only a Java 5 or Java 6 runtime environment. WCG uses a configuration file, 'config.txt', to control all of the settings that affect the output. Within 'config.txt' there is a pointer to a stop-word file, where you can exclude unwanted words. A typical call from the command line to generate a word cloud 'wc.png' from an input file 'msg_body.txt' is shown below.

bhuelbue:~  
→ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)
bhuelbue:~
→ java -jar ibm-word-cloud.jar -c config.txt -w 800 -h 800 < msg_body.txt > wc.png
IBM Word Cloud Generator build 32
Copyright (c)2009 IBM

Yammer messages are unstructured text, so WCG uses each word's relative frequency as its weight for the font size. If the input file for WCG is instead a tab-separated data file containing weighted phrases, such as the names of the active members who posted Yammer messages, you can produce a user cloud.
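
As a rough illustration of how such a weighted input could be prepared, the Ruby sketch below counts the messages per sender and emits one "name<TAB>weight" line per member. The method name 'user_cloud_lines' and the sample names are hypothetical, and the exact column layout that WCG expects is an assumption to verify against the WCG documentation.

```ruby
# Count how many messages each member posted and return one
# "name<TAB>count" line per member, most active first
# (ties are ordered alphabetically).
# Assumption: WCG reads the phrase first and its weight second.
def user_cloud_lines(sender_names)
  counts = Hash.new(0)
  sender_names.each { |name| counts[name] += 1 }
  counts.sort_by { |name, count| [-count, name] }
        .map { |name, count| "#{name}\t#{count}" }
end

# Hypothetical sender names, one entry per posted message.
names = %w[alice bob alice carol alice bob]
puts user_cloud_lines(names)
```

Writing these lines to a file gives an input in which the weight is the number of posts, so the most active members would appear largest in the cloud.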

A user cloud of one day with active member names

Social Graphs

A more personalized view of activity in Yammer is the user wall, which shows the avatars of all members active during a given time frame within one cluster. With the Yammer API you can access the user information of all Yammer members within a network. Once you have the "mugshot_url" for a user, you can download the user image for further use in social graphs. Yammer allows avatar images in bmp, png, gif and jpeg format, so the user images need to be normalized in size and format. I used cURL to download the user images and ImageMagick to convert and edit them. The social graphs are produced with the open source visualization software Graphviz. Here is a sample 'test.gv' file to produce a user wall.

graph g {
  size="6,6";
  ratio="fill";
  rankdir="LR";

  n1 [label="", shape="box", width=.6, height=.6, image="~/img/n1.jpg"];
  n2 [label="", shape="box", width=.6, height=.6, image="~/img/n2.jpg"];
  n3 [label="", shape="box", width=.6, height=.6, image="~/img/n3.jpg"];
  ... (many more nodes here)
  nn [label="", shape="box", width=.6, height=.6, image="~/img/nn.jpg"];
}

With the user images stored in the directory '~/img' and the above 'test.gv' you can run the following Graphviz command:

bhuelbue:~
→ osage -Tpng -o test.png test.gv 

The output of the above command is the following graph in png format 'test.png':

User wall of all active members in Yammer during one week
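
With many active members, a file like 'test.gv' is easier to generate than to write by hand. The following Ruby sketch shows one possible way. The method name 'user_wall_gv' and the sample image paths are hypothetical. The node attributes are copied from the sample file above.

```ruby
# Build Graphviz source for a user wall: one image node per avatar file,
# using the same graph and node attributes as the sample 'test.gv'.
def user_wall_gv(image_paths)
  lines = ['graph g {', '  size="6,6";', '  ratio="fill";', '  rankdir="LR";', '']
  image_paths.each_with_index do |path, i|
    lines << %(  n#{i + 1} [label="", shape="box", width=.6, height=.6, image="#{path}"];)
  end
  lines << '}'
  lines.join("\n") + "\n"
end

# Hypothetical image paths, for illustration only.
puts user_wall_gv(%w[img/n1.jpg img/n2.jpg])
```

To write a real file, the image list could come from something like Dir.glob(File.expand_path('~/img/*.jpg')).sort, with the resulting string saved as 'test.gv' before running the 'osage' command.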

Yammer messages are organized in threads. Each Yammer message includes the 'sender_id', the 'thread_id' and the 'replied_to_id'. The 'sender_id' points directly to the user who posted the message, and the 'replied_to_id' points to another message. If you look up the 'sender_id' of that other message, you have a conversation between two Yammer users. You can now collect all Yammer conversations in a given time frame and generate a 'conversation.gv' file for automated graph generation with a Graphviz command. See a skeleton of the 'conversation.gv' file below.

digraph g {
  size="7,7";
  ratio="fill";
  rankdir="LR";

  m1 [label="", shape="box", width=.6, height=.6, image="~/img/m1.jpg"];
  m2 [label="", shape="box", width=.6, height=.6, image="~/img/m2.jpg"];
  m1  ->  m2;
  m3 [label="", shape="box", width=.6, height=.6, image="~/img/m3.jpg"];
  m4 [label="", shape="box", width=.6, height=.6, image="~/img/m4.jpg"];
  m3 ->  m4;
  m2 [label="", shape="box", width=.6, height=.6, image="~/img/m2.jpg"];
  m5 [label="", shape="box", width=.6, height=.6, image="~/img/m5.jpg"];
  m2  ->  m5;
  ... (many other nodes and edges)
  mx [label="", shape="box", width=.6, height=.6, image="~/img/mx.jpg"];
  my [label="", shape="box", width=.6, height=.6, image="~/img/my.jpg"];
  mx ->  my;
}

With 'conversation.gv' and all user images stored in '~/img', run the following Graphviz program from the command prompt.

bhuelbue:~
→ fdp -Tpng -o conversation.png conversation.gv 

The result is a directed graph 'conversation.png':

Directed graph of all Yammer conversations for a typical day
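
The lookup from 'replied_to_id' to the sender of the original message can be sketched in Ruby as follows. This is only an illustration under stated assumptions: the method names, the sample messages and the image naming scheme ('m<user id>.jpg') are hypothetical. Drawing each edge from the replier to the original sender and skipping replies to one's own messages are choices made for this sketch.

```ruby
require 'set'

# messages: array of hashes with :yamid, :sender_id and :replied_to_id,
# named after the 'fyams' columns.
# Returns unique [replier_id, original_sender_id] pairs.
def conversation_edges(messages)
  sender_of = {}
  messages.each { |m| sender_of[m[:yamid]] = m[:sender_id] }

  edges = Set.new
  messages.each do |m|
    original_sender = sender_of[m[:replied_to_id]]
    # Skip messages that are not replies, replies to messages outside the
    # collected time frame, and replies to one's own message.
    next if original_sender.nil? || original_sender == m[:sender_id]
    edges << [m[:sender_id], original_sender]
  end
  edges.to_a
end

# Render the edges as Graphviz source in the style of 'conversation.gv'.
# Assumption: avatars are stored as "<image_dir>/m<user id>.jpg".
def conversation_gv(edges, image_dir = '~/img')
  out = ['digraph g {', '  size="7,7";', '  ratio="fill";', '  rankdir="LR";', '']
  edges.flatten.uniq.each do |uid|
    out << %(  m#{uid} [label="", shape="box", width=.6, height=.6, image="#{image_dir}/m#{uid}.jpg"];)
  end
  edges.each { |from, to| out << "  m#{from} -> m#{to};" }
  out << '}'
  out.join("\n") + "\n"
end

# Hypothetical messages, for illustration only.
messages = [
  { yamid: 1, sender_id: 10, replied_to_id: nil },
  { yamid: 2, sender_id: 20, replied_to_id: 1 },
  { yamid: 3, sender_id: 10, replied_to_id: 2 },
  { yamid: 4, sender_id: 20, replied_to_id: 1 }
]
puts conversation_gv(conversation_edges(messages))
```

Unlike the skeleton above, this sketch declares every node once before the edges. Graphviz accepts both forms.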

For Mac OS/X users, NodeBox is a wonderful program to visualize data using the Python language. By mapping each member name to its email domain and suppressing the .com top-level domain, you get a thread analysis showing which companies are connected on the Yammer Customer Network (YCN). The directed graph below was produced with NodeBox and demonstrates how companies share messages on YCN.

NodeBox social graph for conversation between domains
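
The mapping from members to companies can be prepared before the data is handed to a visualization tool. The Ruby sketch below is one hedged way to do it. The method names, the sample addresses and the decision to drop conversations within the same domain are assumptions made for illustration only.

```ruby
# Reduce an email address to a company label: take the domain part,
# lower-case it and drop a trailing ".com". Returns nil without a domain.
def company_label(email)
  domain = email.to_s.split('@', 2)[1]
  return nil if domain.nil? || domain.empty?
  domain.downcase.sub(/\.com\z/, '')
end

# Turn user-level edges [from_id, to_id] into counted company-level edges.
# email_of maps a user id to an email address.
# Assumption: conversations inside one company are not of interest here.
def domain_edges(edges, email_of)
  counts = Hash.new(0)
  edges.each do |from, to|
    a = company_label(email_of[from])
    b = company_label(email_of[to])
    next if a.nil? || b.nil? || a == b
    counts[[a, b]] += 1
  end
  counts
end

# Hypothetical users and edges, for illustration only.
email_of = { 10 => 'ann@example.com', 20 => 'bob@Sample.com', 30 => 'cy@example.com' }
p domain_edges([[10, 20], [30, 20], [10, 30]], email_of)
```

The counts could serve as edge weights, so that frequently connected companies stand out in the graph.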

Vizster is a Java program to visualize online social networks. In Vizster you can load data either from an XML file describing the network or from a custom MySQL database. I wrote a Ruby program that uses the Yammer API to create XML files and to load MySQL database tables with social network data for Vizster.

Yammer social network graph produced with Vizster 

In my database schema for Vizster, the Yammer user data and the enterprise users' internal Active Directory data were synchronized and stored in the table 'profiles'. In both Vizster tables, 'graphs' and 'profiles', I added the 'id' field, and I pluralized the default Vizster 'graph' table to 'graphs'. This was required to load the tables with a Ruby program that interfaced to Yammer via the API and to MySQL via 'activerecord'. In order to use the Yammer 'profiles' table in Vizster you need to amend the 'VizsterDBLoader.java' program to reflect the pluralized table name 'graphs', the new fields and the modified queries.

CREATE TABLE  `fyam_development`.`graphs` (
  `id` int(11) NOT NULL auto_increment,
  `uid1` int(10) unsigned NOT NULL default '0',
  `uid2` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

CREATE TABLE  `yam_development`.`profiles` (
  `id` int(11) NOT NULL auto_increment,
  `yamid` int(11) NOT NULL,
  `name` varchar(255) NOT NULL,
  `full_name` varchar(255) default NULL,
  `network_id` int(11) NOT NULL,
  `network_name` varchar(64) NOT NULL,
  `location` varchar(64) default NULL,
  `job_title` varchar(128) default NULL,
  `stats_followers` int(11) default NULL,
  `stats_updates` int(11) default NULL,
  `stats_following` int(11) default NULL,
  `url` varchar(255) default NULL,
  `web_url` varchar(255) default NULL,
  `mugshot_url` varchar(255) default NULL,
  `userid` varchar(16) default NULL,
  `fullname` varchar(255) default NULL,
  `displayname` varchar(255) default NULL,
  `employmenttype` varchar(128) default NULL,
  `reportsto` varchar(16) default NULL,
  `distinguishedname` varchar(255) default NULL,
  `globalid` int(10) unsigned default NULL,
  `unixid` int(10) unsigned default NULL,
  `city` varchar(128) default NULL,
  `country` varchar(128) default NULL,
  `phonetieline` varchar(128) default NULL,
  `telephonenumber` varchar(128) default NULL,
  `buildingabbr` varchar(128) default NULL,
  `buildingcode` varchar(128) default NULL,
  `buildingname` varchar(128) default NULL,
  `businessunitcode` varchar(64) default NULL,
  `lastmodified` varchar(16) default NULL,
  `manager` varchar(2) default NULL,
  `mrrole` varchar(2) default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

4 comments:

  1. Hi Bruno! Thank you so much for creating such a thorough and informative blogpost. This is of huge value to everyone trying to build analytics and visualize their Yammer networks. Thanks for sharing snippets of the code also.

    - Maria Ogneva, Head of Community, Yammer

  2. Bruno: As the admin for Fortune 100 Yammer Network, all I can say is this is fantastic! You rock!

    The Northstar Nerd
    http://www.NorthstarNerd.org/

  3. Hi Bruno,

    I am trying to implement Yamanalytics on windows

    the rmagick gem build is failing - the other gem builds worked.

    can you contact me at yzidell@gmail.com?

    thanks!

  4. Thanks for the detailed and hell of informative post. i have enjoyed a lot.currently associated with a ruby on rails development company who are planing to do something similar like this.
