Tuesday, August 9, 2011

Yamanalysis - A Ruby Yammer API Usage Example

This guide is written for Yammer users who have a computer available - an 'old grandpa device with real buttons' - and want to run Yammer analysis. Yamanalysis is a known topic in my Yammer networks, where I have published social analytics such as 'word cloud', 'user cloud' and 'thread analysis'.

Yamanalysis - Yammer Analysis

In my first blog post 'Word Clouds and Social Graphs in Yammer' I promised to deliver a detailed description for Windows users, 'How to Run Yammer Analytics', with the Ruby code published on Github.

Prerequisites

This guide is targeted at Windows, so I assume you have Windows XP, Windows 7 or any other Windows instance up and running with access to the internet. You should have administrator rights in order to install software on the Windows system. Also check briefly that you have enough disk space available for the installation: you need about 1GB. I further assume you have installed an editor like Notepad++ to change source code. The download of the Ruby Development-Kit requires 7-Zip. Download it from 7-Zip and install it.

You should also be familiar with the Windows command interpreter, cmd.exe. Used without parameters, cmd displays the Windows version and copyright information. All my Ruby programs for Yammer analysis are started via the command line interface. I start cmd.exe with parameter /u for unicode output.

You should also know how to change environment variables in Windows: Management of environment variables is provided in the System Properties dialog box. Open Control Panel-Performance and Maintenance-System (or right-click on My Computer and choose "Properties"). In the box that opens, click the "Advanced" tab to obtain the next dialog box. Next, click the button "Environment Variables". Select the 'Path' variable and click the 'Edit' button. The dialog box shows how to change the 'Path' variable.

Manage Environment Variable - Path

Last but not least, Java Version 5 or 6 is required for many social network analysis programs like the IBM WCG, Cytoscape, Vizster or Gephi. Make sure you have Java available on your Windows instance. The program 'yam_thread_img.rb' will also generate SQL code for Gephi and a text file for import into Cytoscape.

Downloads

I created a directory 'C:\Downloads' where I saved all my software downloads:

  • MySQL: dev.mysql.com ➔ C:\Downloads\mysql-5.5.14-win32.msi
  • MySQL GUI: MySQLGUITools ➔ C:\Downloads\mysql-gui-tools-5.0-r17-win32.msi
  • ImageMagick: imagemagick.org ➔ C:\Downloads\ImageMagick-6.7.1-0-Q8-windows-dll.exe
  • Curl: curl.haxx.se ➔ C:\Downloads\curl-7.21.6-devel-mingw32.zip
  • Ruby: rubyinstaller.org ➔ C:\Downloads\rubyinstaller-1.9.2-p180.exe
  • Dev-Kit: rubyinstaller.org ➔ C:\Downloads\DevKit-tdm-32-4.5.1-20101214-1400-sfx.exe
  • Graphviz: graphviz.org ➔ C:\Downloads\graphviz-2.28.0.msi
  • IBM WCG: IBM WCG ➔ C:\Downloads\wordcloud-build-32.zip

Install MySQL

Start with the installation of the database software and download MySQL from dev.mysql.com. I selected version 5.5.14 and a mirror nearby. The MySQL download started and was saved into 'C:\Downloads\mysql-5.5.14-win32.msi'. During the installation the following configuration options were selected:

  • 'Detailed Configuration'
  • 'Server Machine'
  • 'Multifunctional Database'
  • C: Installation Path
  • Enable TCP/IP Networking Port 3306
  • Server SQL mode: 'Enable Strict Mode'
  • Best Support for Multilingualism. Makes UTF8 the default character set.
  • Install as Windows Service 'MySQL' and 'Launch the MySQL Server automatically'.
  • Include Bin Directory in Windows Path
  • Select a new root password for MySQL
  • Enable root access from remote machines.

After all selections were made, the MySQL installation continued with 'Prepare configuration':

  ‣ Write configuration file (C:\Programme\MySQL\MySQL Server 5.5\my.ini)
  ‣ Start service
  ‣ Apply security settings

The installation finished showing the following messages:

  • Configuration file created.
  • Windows service MySQL installed.
  • Service started successfully.
  • Security settings applied.

Press [Finish] to close the Wizard. The wizard now has created the 'C:\Programme\MySQL\MySQL Server 5.5\my.ini' configuration file and installed MySQL as a service on your Windows instance.

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Dokumente und Einstellungen\Administrator>mysql -u root -p
Enter password: ******

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.14 MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

mysql> use mysql;
Database changed

mysql> select Host, User, Password from user;

+-----------+------+-------------------------------------------+
| Host      | User | Password                                  |
+-----------+------+-------------------------------------------+
| localhost | root | *172047A30C1D575EF849AEA88AB210FC5914B575 |
| %         | root | *172047A30C1D575EF849AEA88AB210FC5914B575 |
+-----------+------+-------------------------------------------+
2 rows in set (0.00 sec)

mysql> quit

my.ini

My recommendation is to make everything utf8 compliant, so that we do not run into issues with special characters in global Yammer posts written in different languages. To ignore client information and use the default server character set, use '--skip-character-set-client-handshake'. This makes MySQL behave like MySQL 4.0. Find below an extract of the 'my.ini' file with the utf8 character settings that I have in use:

...
[mysqld]
...
# The default character set that will be used when a new schema or table is
# created and no character set is defined
init_connect='SET collation_connection=utf8_general_ci; SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake
...

MySQL GUI Tools

If you would like to have the older GUI for the MySQL Administrator or Query Browser, then install MySQLGUITools. Click the '» No thanks, just take me to the downloads!' link text, select a mirror site and download the installation software into 'C:\Downloads\mysql-gui-tools-5.0-r17-win32.msi'. Accept the default installation folder and select the complete setup type.

The newer GUI tool is named 'MySQL Workbench' and provides DBAs and developers an integrated tools environment for:
  • Database Design & Modeling
  • SQL Development (replacing MySQL Query Browser)
  • Database Administration (replacing MySQL Administrator)
To download it, follow the link MySQL Workbench. Downloading and installing MySQL Workbench should cause no problems.

ImageMagick

ImageMagick is required to resize and convert user photos into jpg image format in my Ruby programs. It is required for successful installation of RMagick, a Ruby interface for ImageMagick. Install the latest version of ImageMagick following the link imagemagick.org. From this link you should download 'ImageMagick-6.7.1-0-Q8-windows-dll.exe' into the directory 'C:\Downloads'.

When you run this program please install ImageMagick into a directory without white spaces, otherwise it will not work. I installed ImageMagick into 'C:/ImageMagick-6.7.1-Q8'. It is also important for a successful 'ruby rmagick' installation to click the checkbox 'Install development headers and libraries for C and C++' within the installation setup.

If Ruby is properly installed, then you can install the 'rmagick' ruby gem using the following command:

gem install rmagick --platform=ruby -- --with-opt-lib=C:/ImageMagick-6.7.1-Q8/lib --with-opt-include=C:/ImageMagick-6.7.1-Q8/include

Curl

In order to download the user images from Yammer onto your PC, my programs require curl to be installed and interfaced to Ruby using the 'curb' gem. Download curl from curl.haxx.se to 'C:\Downloads\curl-7.21.6-devel-mingw32.zip' and extract it to 'C:\Downloads\curl7.21.6' using 7-Zip or WinZip. Add C:\Downloads\curl7.21.6\bin;C:\Downloads\curl7.21.6\lib; to your cmd path.

C:\project\yam>path
PATH=C:\ImageMagick-6.7.1-Q8;C:\Downloads\curl7.21.6\bin;C:\Downloads\curl7.21.6\lib;C:\Ruby192\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Programme\Graphviz 2.28\bin;C:\Programme\MySQL\MySQL Server 5.5\bin

C:\project\yam>curl -V
curl 7.21.6 (i386-pc-win32) libcurl/7.21.6 OpenSSL/0.9.8r zlib/1.2.5 libidn/1.18 libssh2/1.2.8 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtmp rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN Largefile NTLM SSL SSPI libz

Later on when we have installed Ruby 1.9.2 into the 'C:\Ruby192' directory and the Development Kit into C:\Devkit, we will install the 'curb' gem using the following command:

C:\Ruby192>gem install curb --platform=ruby -- --with-curl-lib="C:/Downloads/curl7.21.6/bin" --with-curl-include="C:/Downloads/curl7.21.6/include"
Temporarily enhancing PATH to include DevKit...
Building native extensions.  This could take a while...
Successfully installed curb-0.7.15
1 gem installed
Installing ri documentation for curb-0.7.15...
Installing RDoc documentation for curb-0.7.15...

Ruby

Install Ruby 1.9.2, downloaded from rubyinstaller.org, into the directory 'C:\Ruby192' and check that you have 'C:\Ruby192\bin;' in your path for the cmd window. You should see the following Ruby version or a higher version.

C:\project\yam>ruby -v
ruby 1.9.2p180 (2011-02-18) [i386-mingw32]

C:\project\yam>gem -v
1.5.2

C:\project\yam>irb
irb(main):001:0> 2.times { |n| puts "Congratulation #{n==1 ? "dear Yammer friend" : "to You"}" }
Congratulation to You
Congratulation dear Yammer friend
=> 2
irb(main):002:0> quit

C:\project\yam>

Ruby newcomers may want to read the 'bookofruby.pdf' that comes with the Ruby installation. You will find the book in 'C:\Ruby192\doc'. My personal recommendation for beginners is the Ruby On Rails Tutorial by Michael Hartl.

Development-Kit

One of the challenges many Ruby on Windows users have is how to easily use native RubyGems from the community such as Curl or RMagick. The Development-Kit for Windows is a great help.
Download the latest Development-Kit via rubyinstaller.org and save it in 'C:\Downloads'. My version at the time was 'DevKit-tdm-32-4.5.1-20101214-1400-sfx.exe'. Follow the instructions on the rubyinstaller wiki. I installed the Development-Kit into 'C:\Devkit'.

cd \Devkit
C:\Devkit>ruby dk.rb init

In the generated file config.yml you should now see:

# This configuration file contains the absolute path locations of all
# installed Rubies to be enhanced to work with the DevKit. This config
# file is generated by the 'ruby dk.rb init' step and may be modified
# before running the 'ruby dk.rb install' step. To include any installed
# Rubies that were not automagically discovered, simply add a line below
# the triple hyphens with the absolute path to the Ruby root directory.
#
# Example:
#
# ---
# - C:/ruby19trunk
# - C:/ruby192dev
#
--- 
- C:/Ruby192

You can verify this with the following ruby program in a cmd window:

C:\Devkit>ruby dk.rb review
Based upon the settings in the 'config.yml' file generated
from running 'ruby dk.rb init' and any of your customizations,
DevKit functionality will be injected into the following Rubies
when you run 'ruby dk.rb install'.

C:/Ruby192

C:\Devkit>

Finally run

C:\Devkit>ruby dk.rb install

Before we continue to install the Ruby gems, perform a PATH check in your cmd window. It is important to have the following entries at the top of the path, so that the installation of the Curl and RMagick gems will find the right DLL entries:

  • C:\ImageMagick-6.7.1-Q8;
  • C:\Downloads\curl7.21.6\bin;
  • C:\Downloads\curl7.21.6\lib;
  • C:\Ruby192\bin;
  • C:\Programme\Graphviz 2.28\bin;
  • C:\Programme\MySQL\MySQL Server 5.5\bin

C:\project\yam>path
PATH=C:\ImageMagick-6.7.1-Q8;C:\Downloads\curl7.21.6\bin;C:\Downloads\curl7.21.6\lib;C:\Ruby192\bin;C:\Programme\Graphviz 2.28\bin;C:\Programme\MySQL\MySQL Server 5.5\bin; .... other entries

C:\project\yam>
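The PATH check above can also be scripted. What follows is only a minimal sketch, not part of Yamanalysis: the helper name 'check_path' is made up for illustration, and the idea that 'too late' means 'after C:\WINDOWS\system32' is my own assumption about what 'on top of the path' should mean.

```ruby
# Hypothetical helper (not part of Yamanalysis): report required directories
# that are missing from a Windows-style PATH string, or that only appear
# after the system directory.
REQUIRED_DIRS = [
  'C:\ImageMagick-6.7.1-Q8',
  'C:\Downloads\curl7.21.6\bin',
  'C:\Downloads\curl7.21.6\lib',
  'C:\Ruby192\bin'
].freeze

# Windows paths are case-insensitive; a trailing backslash is ignored.
def normalize(dir)
  dir.strip.chomp('\\').downcase
end

def check_path(path_string, required = REQUIRED_DIRS, system_dir = 'C:\WINDOWS\system32')
  entries = path_string.split(';').map { |d| normalize(d) }.reject(&:empty?)
  # Assumption: an entry counts as "on top" when it comes before system_dir.
  system_index = entries.index(normalize(system_dir)) || entries.length
  problems = []
  required.each do |dir|
    pos = entries.index(normalize(dir))
    if pos.nil?
      problems << "missing: #{dir}"
    elsif pos > system_index
      problems << "too late in PATH: #{dir}"
    end
  end
  problems
end

sample = 'C:\WINDOWS\system32;C:\Ruby192\bin'
puts check_path(sample)
```

On a real system you would pass ENV['PATH'] instead of the sample string. An empty result means all required entries were found before the system directory.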

Ruby Gems

Open a cmd window and install the following Ruby gems:

  • Curl
gem install curb --platform=ruby -- --with-curl-lib="C:/Downloads/curl7.21.6/bin" --with-curl-include="C:/Downloads/curl7.21.6/include"
  • RMagick
gem install rmagick --platform=ruby -- --with-opt-lib=C:/ImageMagick-6.7.1-Q8/lib --with-opt-include=C:/ImageMagick-6.7.1-Q8/include
  • MySQL
gem install mysql
  • Rails
gem install rails
  • Yammer4r
gem install yammer4r

Graphviz

Graphviz is used to create the Yammer Analysis graphs in various image formats. Install Graphviz 2.28.0 from graphviz.org and run the graphviz-2.28.0.msi file. Open a cmd window and check Graphviz is properly installed:

C:\project\yam>dot -V
dot - graphviz version 2.28.0 (20110507.0327)
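The version checks used in this guide ('ruby -v', 'curl -V', 'dot -V') only succeed when the tool can be found via the PATH. As a rough sketch, the lookup can be done from Ruby as well. The method names 'find_on_path' and 'path_dirs' are hypothetical, and the check only tests whether a file exists, not whether it really runs.

```ruby
# Look up a command in a list of directories, trying Windows-style extensions.
# Returns the full path of the first match, or nil if nothing was found.
def find_on_path(command, dirs, exts = ['', '.exe', '.bat', '.cmd'])
  dirs.each do |dir|
    exts.each do |ext|
      candidate = File.join(dir, command + ext)
      return candidate if File.file?(candidate)
    end
  end
  nil
end

# Split a PATH string with the platform's separator
# (';' on Windows, ':' elsewhere).
def path_dirs(path_string = ENV['PATH'].to_s)
  path_string.split(File::PATH_SEPARATOR).reject(&:empty?)
end

if __FILE__ == $PROGRAM_NAME
  %w[ruby gem curl dot mysql java].each do |tool|
    found = find_on_path(tool, path_dirs)
    puts format('%-8s %s', tool, found || 'NOT FOUND')
  end
end
```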

Word-Cloud Generator

IBM alphaWorks technologies is a great resource and offers a Java application that creates attractive "word clouds" from your source texts. You can download the 'wordcloud-build-32.zip' file (236kB) containing the Word-Cloud Generator tool from IBM WCG. If you do not have an IBM user ID, please register first.

Unzip the distribution archive in the directory of your choice. To uninstall, delete the distribution directory. I have unzipped the 'IBM Word Cloud' directory into 'C:/Downloads/wordcloud-build-32'.

C:\Downloads\wordcloud-build-32\IBM Word Cloud

Keep the IBM WCG jar file 'C:/Downloads/wordcloud-build-32/IBM Word Cloud/ibm-word-cloud.jar' in mind for your customization of the Yamanalysis program in 'yam_config.rb'.

Yamanalysis - Download and Configure

Download the Yamanalysis source code from Github. The file 'yam_config.rb' and the MySQL file 'db/yam_development_new.sql' require customization. You have to customize settings such as the OAuth filename, the directory path where you installed the IBM Word Cloud Generator, your default mail domain, and the database name, userid and password for MySQL.

When you use Yamanalysis for the first time, you have to download a certain amount of messages for analysis using the program 'yam_last_msg.rb'. The parameter '$YAM_HIST_CNT = 250' downloads 250 * 20 = 5,000 messages in one program call for a first download of public Yammer messages. If you run a daily download, do not forget to lower this value according to your needs.
I use '$YAM_HIST_CNT = 50'.
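The arithmetic behind '$YAM_HIST_CNT' is simple enough to put into a few lines of Ruby. This is just an illustration with made-up method names. The only number taken from Yamanalysis is the chunk size of 20 messages per API call.

```ruby
MESSAGES_PER_CALL = 20 # chunk size per API call, as described in this guide

# Messages downloaded in one program run for a given $YAM_HIST_CNT value.
def messages_per_run(hist_cnt)
  hist_cnt * MESSAGES_PER_CALL
end

# Smallest $YAM_HIST_CNT value that covers the wanted number of messages.
def hist_cnt_for(target_messages)
  (target_messages.to_f / MESSAGES_PER_CALL).ceil
end

puts messages_per_run(250) # 5000
puts messages_per_run(50)  # 1000
puts hist_cnt_for(1_250)   # 63
```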

Customize the following files:

  • 'yam_config.rb'
  • 'db/yam_development_new.sql'

Review the configuration for the IBM WCG and customize the font parameter for your word cloud with a proper value:

  • 'wordcloud/configuration.txt'

As soon as you have finished your Yamanalysis configuration, you should create the required MySQL database. Open a cmd window and create the MySQL database:

C:\project\yam>mysql < db/yam_development_new.sql -u root -p

Yammer Registration

The Yamanalysis programs make use of the Ruby 'yammer4r' gem, which requires credentials stored in an OAuth file for successful Yammer data access via the API. The OAuth file is generated by running the Ruby program 'yam_create_oauth_yml.rb', which needs to be done in 4 steps:

  • Register your application on Yammer: Register Yammer

https://www.yammer.com/client_applications/new

Fill in the required fields and press the 'Create application' button. You will receive a new screen with the OAuth settings 'Consumer Key' and 'Consumer secret'.

  • Yammer Oauth Settings

Save the 'Consumer key' and the 'Consumer secret' into a txt file:

Consumer key
1A25samPlehHKG4FaddGg
Consumer secret
XSmzOLPsamPleEGHGKA767o7Vhq7g3p1HgiaTmPPd0

From there, prepare the call of the Ruby program 'yam_create_oauth_yml.rb' in a cmd window. Remember that the name of the generated OAuth file is set in 'yam_config.rb':

C:\project\yam>ruby yam_create_oauth_yml.rb -k 1A25samPlehHKG4FaddGg -s XSmzOLPsamPleEGHGKA767o7Vhq7g3p1HgiaTmPPd0

Please visit the following URL in your browser to authorize your application,
then enter the 4 character security code when done:
https://www.yammer.com/oauth/authorize?oauth_token=oUlsamPlEzHkWC4GlqFw


You now have to authorize the application and then enter a 4 character long secret to create the OAuth file. Copy the authorize link
'https://www.yammer.com/oauth/authorize?oauth_token=oUlsamPlEzHkWC4GlqFw'
from the cmd window into your browser window and you receive the following screen:

Select the Network for Yamanalysis Application - Click Authorize

Click the 'Authorize' button and you will receive the code you should type into the cmd window, where the 'yam_create_oauth_yml.rb' is running:

  • Application enabled: 4 character secret code.

Add the code in your 'yam_create_oauth_yml.rb' cmd window

C:\project\yam>ruby yam_create_oauth_yml.rb -k 1A25samPlehHKG4FaddGg -s XSmzOLPsamPleEGHGKA767o7Vhq7g3p1HgiaTmPPd0

Please visit the following URL in your browser to authorize your application,
then enter the 4 character security code when done:
https://www.yammer.com/oauth/authorize?oauth_token=oUlsamPlEzHkWC4GlqFw
X12Z

You now should have a valid 'oauth.yml' file in your Yamanalysis project directory.

yam_user.rb

With a valid 'oauth.yml' file you can now download all user information from your Yammer network using the Ruby program 'yam_user.rb':

C:\project\yam>ruby yam_user.rb 
5983393-aaron-guest-Aaron Leon inserted ... 
...
5544237-alf-guest-Alf Døj inserted ... 
5980792-alfredvanpaaschen-guest-Alfred van Paaschen inserted ... 
Part A-1 done ... 50 messages
5702034-alison-guest-Alison Michalk inserted ... 
5530658-mckenziea-guest-Alissa McKenzie inserted ...
...
5842193-yholland-guest-Yvonne Holland inserted ... 
Part Y-1 done ... 17 messages
...
3850574-dlehoang-guest-Zu LeHoang inserted ... 
Part Z-1 done ... 6 messages

Inserted 1703
Updated  0
Duration 208.5936 sec.

Yammer user information is downloaded by letter ('A'..'Z') in chunks of 50 user records for each API call and stored in the table 'yusers' of your MySQL database. If the 'mugshot_url' has changed, the user photo is deleted from the './img' directory in order to force a later reload.

A download of ca. 1,700 user records takes about 210 seconds of processing time, and a download of 7,000 users took 775 seconds.
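The 'by letter, in chunks of 50' download can be pictured with the following sketch. It is not the code of 'yam_user.rb': 'fetch_page' is a stand-in for the real Yammer API call, which is not shown here, and the stub data is invented so that the example runs without network access.

```ruby
# fetch_page is a stand-in for the real API call: it takes a letter and a
# 1-based page number and returns an array of user hashes, with at most
# page_size entries per call.
def download_users_by_letter(fetch_page, page_size = 50)
  users = []
  ('A'..'Z').each do |letter|
    page = 1
    loop do
      chunk = fetch_page.call(letter, page)
      users.concat(chunk)
      break if chunk.size < page_size # last (or empty) page for this letter
      page += 1
    end
  end
  users
end

# In-memory stub instead of the network: 120 users under 'A', 6 under 'Z'.
FAKE_USERS = {
  'A' => (1..120).map { |i| { 'name' => "a#{i}" } },
  'Z' => (1..6).map { |i| { 'name' => "z#{i}" } }
}.freeze

stub = lambda do |letter, page|
  (FAKE_USERS[letter] || []).each_slice(50).to_a[page - 1] || []
end

puts download_users_by_letter(stub).size # 126
```

A short page ends the loop for a letter, so a letter without users costs exactly one call.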

yam_user_wall_img.rb

The Ruby program 'yam_user_wall_img.rb' reads the table 'yusers' and downloads all user photos into the './img' directory using the 'mugshot_url'. The program is limited to 2,000 image downloads in one go. Adjust the limit value in the source code to your requirement.
Adding 2,000 (the current limit value) to the offset value and 1 to the part value in the source code enables you to produce more 'user walls' when you have more than 2,000 members in your network. Feel free to modify the program to your needs:

part = 0      # add 1 for the next part
limit = 2000  # modify what you like
offset = 0 + part * limit
downloadPhotos(limit, offset)
genGraphviz(part, limit, offset)
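If you do not want to edit 'part' by hand for every run, the parts can be computed up front. This is a sketch under the assumption that the formula 'offset = 0 + part * limit' is all there is to it. The method name 'wall_parts' is hypothetical, and the two calls into the real program are only mentioned in a comment.

```ruby
# Compute the (part, limit, offset) values needed to cover user_count rows,
# using the same formula as the program: offset = 0 + part * limit.
def wall_parts(user_count, limit = 2000)
  parts = (user_count.to_f / limit).ceil
  (0...parts).map { |part| { part: part, limit: limit, offset: part * limit } }
end

wall_parts(7000).each do |p|
  puts "part #{p[:part]}: limit #{p[:limit]}, offset #{p[:offset]}"
  # In the real program, downloadPhotos(p[:limit], p[:offset]) and
  # genGraphviz(p[:part], p[:limit], p[:offset]) would be called here.
end
```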

A download of ca. 1,700 photos takes about 1,650 seconds. After the download of all images with a valid image extension, a Graphviz '.gv' file is generated and the 'jpg' image is produced with the Graphviz 'osage' layout.

C:\project\yam>ruby yam_user_wall_img.rb 
User Count: 1703
Downloading 1703 photos ...
Yammer ERROR: 5530256 No valid ext in https://www.yammer.com/yamage/photos/274144/New_Twitter_photo_small
retry ...
Yammer ERROR: 5530256 No valid ext in https://www.yammer.com/yamage/photos/274144/New_Twitter_photo_small
retry ...
5530256 missing
GV File '/Users/bhuelbue/Documents/ruby/yam/graph/userwall_patchwork_p0.gv' created.
PNG file is ' /Users/bhuelbue/Documents/ruby/yam/graph/userwall_patchwork_p0_osage.png'
Duration 1649.2340 sec.

The produced image is shown below:

'yam_user_wall_img.rb' result image in directory './graph'
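The 'No valid ext' message in the log above comes from a photo URL without a file extension. A check of this kind might look roughly like the sketch below. The list of accepted extensions and the method name 'image_ext' are assumptions of mine, not taken from 'yam_user_wall_img.rb'.

```ruby
require 'uri'

VALID_EXTS = %w[.jpg .jpeg .png .gif].freeze # assumed list of accepted types

# Return the lower-cased image extension of a photo URL, or nil when the
# URL path has no extension from VALID_EXTS.
def image_ext(url)
  ext = File.extname(URI.parse(url).path.to_s).downcase
  VALID_EXTS.include?(ext) ? ext : nil
end

url = 'https://www.yammer.com/yamage/photos/274144/New_Twitter_photo_small'
puts image_ext(url).inspect # nil
```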

yam_last_msg.rb

The program downloads all public Yammer messages into the table 'ymessages'. Message attachments like likes, files, images and modules (questions, praise, ...) are stored in the tables 'ylikes', 'yfiles', 'yimgs' and 'ymodules'.

The number of downloaded messages is limited by the parameter $YAM_HIST_CNT in the 'yam_config.rb' program. A value of 250 will call the Yammer API 250 times, and one API call will give you a chunk of 20 messages. So for an initial load you can download 250 * 20 = 5,000 messages. Once you have enough historical messages you should lower $YAM_HIST_CNT to your needs, e.g. 25 (25 * 20 = 500 messages). Find below the console log for an initial load of about 15,000 messages:

C:\project\yam>ruby yam_last_msg.rb 
Messages: 0
Likes   : 0
Modules : 0
Images  : 0
Files   : 0
Last yamid 103500000
19 messages received ...
2011-08-06:106698418 2 0
2011-08-06:106697097 2 0
2011-08-06:106694187 2 0
2011-08-06:106688716 2 1
2011-08-06:106682164 2 0
...
2010-04-15:40820231 2 0
2010-04-15:40818740 2 0
19 messages inserted ...
Messages: 14721
Likes   : 7671
Modules : 763
Images  : 562
Files   : 203
14721 messages retrieved
Duration 3451.4682 sec.

yam_word_cloud.rb

This program creates a 'wordcloud/msgcloud.txt' file to be processed with the IBM Word Cloud Generator Java program. You have to specify the IBM WCG jar file location in 'yam_config.rb', and you can change the 'word cloud' layout by changing the 'wordcloud/configuration.txt' file. By default all messages between 'yesterday' and the 'current date' are loaded into the 'word cloud'. If you need a special time frame use:

C:\project\yam>ruby yam_word_cloud.rb -h
Usage: yam_word_cloud.rb [OPTIONS]
Create Yammer Word Cloud
[-ft] options yyyy-mm-dd
    -f, --from=yyyy-mm-dd            Select the starting date
    -t, --to=yyyy-mm-dd              Select the end date

    -h, --help                       Show this help message.

C:\project\yam>ruby yam_word_cloud.rb -f 2011-08-06 -t 2011-08-07
Between 2011-08-06 and 2011-08-07 ...
Messages: 3.7k
IBM WCG for 'C:/project/yam/wordcloud/wc_yam_20110806.png' ...
IBM Word Cloud Generator build 32
Copyright (c)2009 IBM
IBM WCG 'C:/project/yam/wordcloud/wc_yam_20110806.png'

IBM WCG 'word cloud' with input from table 'ymessages'
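For readers new to Ruby: '-f' and '-t' options with a 'yesterday until today' default can be built with OptionParser from the standard library. The following is a sketch of that idea and not the code of 'yam_word_cloud.rb'. The method name 'parse_range' and the check that 'from' must not be after 'to' are my own additions.

```ruby
require 'optparse'
require 'date'

# Parse -f/--from and -t/--to (yyyy-mm-dd). Without options the range is
# yesterday..today, relative to the given 'today'.
def parse_range(argv, today = Date.today)
  range = { from: today - 1, to: today }
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: example.rb [OPTIONS]'
    opts.on('-f', '--from=yyyy-mm-dd', 'Select the starting date') do |v|
      range[:from] = Date.strptime(v, '%Y-%m-%d')
    end
    opts.on('-t', '--to=yyyy-mm-dd', 'Select the end date') do |v|
      range[:to] = Date.strptime(v, '%Y-%m-%d')
    end
  end
  parser.parse!(argv.dup) # work on a copy, leave the caller's array alone
  raise ArgumentError, 'from must not be after to' if range[:from] > range[:to]
  range
end

r = parse_range(%w[-f 2011-08-06 -t 2011-08-07])
puts "Between #{r[:from]} and #{r[:to]} ..."
```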

yam_thread_img.rb

The program 'yam_thread_img.rb' generates a Graphviz '.gv' file by processing the data in the 'ymessages' table in your MySQL db within a given time frame. The required user photos will be downloaded into the sub-directory 'img'.

By default, when no specific time parameters are given, all messages from yesterday until the current date are processed. It is recommended to run 'yam_last_msg.rb' first to update the 'ymessages' table with the latest Yammer posts available. The generated '.gv' file is then processed by various Graphviz layout programs, which produce '.png' images in the sub-directory 'graph'.

In addition to the '.png' graphs, the program generates the tables 'nodes' and 'edges' in your MySQL db for a proper 'Import Database' in the Gephi program for further social network exploration.

Gephi Import Database Table 'nodes' and 'edges'

Another additional goodie delivered by this Ruby program is the import '.txt' file for the Cytoscape network analysis program. Use the import menu in the Cytoscape program and select first column as Source Interaction and the second column as Target Interaction. You can ignore the Interaction Type. Use the 'File' -> 'Import' -> 'Import from Table' menu and navigate to the '.txt' file for Cytoscape in the 'graph' sub-directory.

Cytoscape Import Network from your MySQL database
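A two-column source/target file of this kind can be derived from reply relations. The sketch below assumes that an edge goes from the sender of a reply to the sender of the message it replies to, and that the columns are separated by a tab. Both are assumptions for illustration, as the exact file layout written by 'yam_thread_img.rb' is not shown in this post.

```ruby
# Turn message rows into [source, target] pairs: the sender of a reply points
# to the sender of the message that was replied to. Replies to messages that
# are not in the given rows are skipped.
def reply_edges(messages)
  sender_of = {}
  messages.each { |m| sender_of[m[:yamid]] = m[:sender_id] }
  messages.each_with_object([]) do |m, edges|
    target = sender_of[m[:replied_to_id]]
    edges << [m[:sender_id], target] if target
  end
end

# One line per edge: first column source, second column target.
def edge_lines(edges)
  edges.map { |source, target| "#{source}\t#{target}" }
end

rows = [
  { yamid: 1, sender_id: 10, replied_to_id: nil },
  { yamid: 2, sender_id: 11, replied_to_id: 1 },
  { yamid: 3, sender_id: 10, replied_to_id: 2 }
]
puts edge_lines(reply_edges(rows))
```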

Read below how to call the Ruby program:

C:\project\yam>ruby yam_thread_img.rb -h
Usage: yam_thread_img.rb [OPTIONS]
Create Yammer Thread Analysis
[-ft] options yyyy-mm-dd
    -f, --from=yyyy-mm-dd            Select the starting date
    -t, --to=yyyy-mm-dd              Select the end date

    -h, --help                       Show this help message.

C:\project\yam>ruby yam_thread_img.rb -f 2011-08-07 -t 2011-08-08
Between 2011-08-07 and 2011-08-08 ...
135 threads found between 2011-08-07 and 2011-08-08
189 yamids (messages)
68 senderids (people)
25 initial posts
126 relations
GV File 'C:/project/yam/graph/thread_analysis_2011-08-07_2011-08-08.gv' created.
PNG file is 'C:/project/yam/graph/thread_analysis_2011-08-07_2011-08-08_dot.png'
PNG file is 'C:/project/yam/graph//thread_analysis_2011-08-07_2011-08-08_fdp.png'
PNG file is 'C:/project/yam/graph/thread_analysis_2011-08-07_2011-08-08_sfdp.png'
PNG file is 'C:/project/yam/graph/thread_analysis_2011-08-07_2011-08-08_twopi.png'
PNG file is 'C:/project/yam/graph/thread_analysis_2011-08-07_2011-08-08_circo.png'
PNG file is 'C:/project/yam/graph/thread_analysis_2011-08-07_2011-08-08_neato.png'
Cytoscape File 'C:/project/yam/graph/cyto_2011-08-07_2011-08-08.txt' created.
Gephi File 'C:/project/yam/graph/gephi_2011-08-07_2011-08-08.sql' created.
mysql yam_development < C:/project/yam/graph/gephi_2011-08-07_2011-08-08.sql running ...
Duration 15.5149 sec.

Graphviz FDP Layout
Graphviz TWOPI Layout
Graphviz CIRCO Layout
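To give an idea of what happens between the '.gv' file and the '.png' images, here is a small sketch that writes a Graphviz graph from a list of edges and builds one command line per layout program. It is a simplified illustration with hypothetical method names. The real '.gv' file also contains the user photos and more attributes.

```ruby
# Build a minimal Graphviz 'digraph' text from [from, to] pairs.
def to_gv(edges, name = 'threads')
  lines = ["digraph #{name} {"]
  edges.each { |from, to| lines << "  \"#{from}\" -> \"#{to}\";" }
  lines << '}'
  lines.join("\n") + "\n"
end

# The layout programs that appear in the console log above.
LAYOUTS = %w[dot fdp sfdp twopi circo neato].freeze

# One command line per layout, e.g. x.gv -> x_dot.png, x_fdp.png, ...
def layout_commands(gv_file)
  base = gv_file.sub(/\.gv\z/, '')
  LAYOUTS.map { |l| "#{l} -Tpng \"#{gv_file}\" -o \"#{base}_#{l}.png\"" }
end

puts to_gv([[11, 10], [10, 11]])
puts layout_commands('graph/thread_analysis.gv')
```

Each command line could then be run with 'system', provided the Graphviz 'bin' directory is in your PATH.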

Monday, July 18, 2011

Word Clouds and Social Graphs in Yammer

Introduction

Yammer is a tool for internal corporate communications that brings together all of a company’s employees inside a private and secure enterprise social network. Assuming you are a Yammer member, the Yammer API allows developers to create their own applications like word clouds, user clouds, user walls or social graphs, demonstrating how Yammer can move the global enterprise culture in the direction of shared thinking by breaking up functional department silo thinking. For secure API authorization Yammer makes use of the open protocol OAuth.

During the last 2 years I developed command line tools using the Ruby programming language to produce word clouds and social graphs, and published the graphs in Yammer. Over time the Yammer analytic graphs became popular in the community and found many practical uses in presentations for enterprise leadership demonstrating the value of social communities. We had a lot of fun when some users started to manipulate the daily word clouds by repeating favorite words in their yams. It was also amazing how fast some users recognized the user name when I published a personalized word cloud of a high-ranking Yammer member of the organization and asked: 'Whose word cloud is it?'.

Due to the popularity of my Yammer analytics I was often asked: 'How did you do this?' or 'Is there a packaged app for this?' and I promised to write a blog post. So here is the first blog post I ever wrote. It's about the software ecosystem I have in use and the creation of word clouds and social graphs with Yammer data.

The purpose of this blog post is mainly to describe the roadmap and the software ecosystem that I used to create Yammer analytics. A detailed description for Windows users, 'How to Run Yammer Analytics', will be posted later with the code published on Github.

Software Ecosystem

First of all I have to say: I don't have a packaged app to create Yammer analytics. The software ecosystem in use consists of open source programs that are available for all operating systems: OS/X, Linux and Windows.

The programming languages in my environment are Ruby, Java and Python. For the word cloud creation you can use the IBM WCG with Java. For Yammer access via the API there is the 'yammer4r' gem available. Using 'gem install yammer4r' will also install all dependencies like JSON and the OAuth security software for Ruby. To avoid excessive Yammer API usage it is recommended to store the data downloaded from Yammer in a database. I selected MySQL community server as the database for the Yammer data, but that is not a must. To use the MySQL client easily from Ruby, I installed the 'mysql' gem and the 'activerecord' gem from Rails: 'gem install mysql' and 'gem install activerecord'. I recommend installing Rails, which includes the 'activerecord' gem. For downloads of user images from Yammer the famous cURL program with the Ruby 'curb' gem comes into action: 'gem install curb'. For image editing and converting, ImageMagick with the Ruby gem 'rmagick' is installed on my PC: 'gem install rmagick'. Last but not least, the Graphviz visualization software is required for creating social graphs. Further products I played with for social network analysis are NodeBox, for Mac OS/X only, and the Java based Vizster program.


bhuelbue:~
→ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)

bhuelbue:~
→ ruby -v
ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-darwin10.5.0]

bhuelbue:~ 
→ mysql -V
mysql  Ver 14.14 Distrib 5.1.54, for apple-darwin10.5.0 (i386) using  EditLine wrapper

bhuelbue:~  
→ curl -V
curl 7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8r zlib/1.2.3
Protocols: tftp ftp telnet dict ldap http file https ftps 
Features: GSS-Negotiate IPv6 Largefile NTLM SSL libz

bhuelbue:~
→ convert --version
Version: ImageMagick 6.6.6-10 2011-01-05 Q8 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2011 ImageMagick Studio LLC
Features:  OpenCL

Yammer Database

In the Yammer API description you can find all data fields that are delivered for messages and user data via the API.  With the 'yammer_create_oauth.rb' program delivered with the 'yammer4r' ruby package you can register your Ruby program in Yammer. Once you have registered your Yammer Ruby program and you do have a valid 'oauth.yml' file you can start to download user information and messages from Yammer and save all information into a database. I have 7 tables in my MySQL 'ycyam_development' database. The most important tables for Yammer analytics are 'uyams' and 'fyams' for user information and for the public messages.

mysql> show tables;
+-----------------------------+
| Tables_in_ycyam_development |
+-----------------------------+
| flikes                      |
| fyams                       |
| fyfiles                     |
| fyimgs                      |
| fymodules                   |
| gyams                       |
| schema_migrations           |
| uyams                       |
+-----------------------------+
8 rows in set (0.00 sec)

mysql> describe uyams;
+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int(11)      | NO   | PRI | NULL    | auto_increment |
| yamid           | int(11)      | NO   |     | NULL    |                |
| name            | varchar(128) | NO   |     | NULL    |                |
| full_name       | varchar(128) | YES  |     | NULL    |                |
| email           | varchar(128) | YES  |     | NULL    |                |
| network_id      | int(11)      | NO   |     | NULL    |                |
| network_name    | varchar(64)  | NO   |     | NULL    |                |
| location        | varchar(45)  | YES  |     | NULL    |                |
| job_title       | varchar(128) | YES  |     | NULL    |                |
| state           | varchar(45)  | NO   |     | NULL    |                |
| stats_followers | int(11)      | YES  |     | NULL    |                |
| stats_updates   | int(11)      | YES  |     | NULL    |                |
| stats_following | int(11)      | YES  |     | NULL    |                |
| expertise       | text         | YES  |     | NULL    |                |
| url             | varchar(255) | YES  |     | NULL    |                |
| web_url         | varchar(255) | YES  |     | NULL    |                |
| mugshot_url     | varchar(255) | YES  |     | NULL    |                |
| created_at      | datetime     | YES  |     | NULL    |                |
| updated_at      | datetime     | YES  |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+
19 rows in set (0.00 sec)

mysql> describe fyams;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| yamid         | int(11)      | NO   |     | NULL    |                |
| sender_id     | int(11)      | NO   |     | NULL    |                |
| replied_to_id | int(11)      | YES  |     | NULL    |                |
| thread_id     | int(11)      | YES  |     | NULL    |                |
| message_type  | varchar(45)  | YES  |     | NULL    |                |
| sender_type   | varchar(45)  | YES  |     | NULL    |                |
| client_type   | varchar(45)  | YES  |     | NULL    |                |
| url           | varchar(255) | YES  |     | NULL    |                |
| web_url       | varchar(255) | YES  |     | NULL    |                |
| body_parsed   | text         | YES  |     | NULL    |                |
| body_plain    | text         | YES  |     | NULL    |                |
| created_at    | datetime     | YES  |     | NULL    |                |
| updated_at    | datetime     | YES  |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
14 rows in set (0.00 sec)
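
To illustrate how a downloaded message could be mapped onto the 'fyams' columns shown above, here is a minimal sketch. It is not the code from my Github repository. The method name 'message_to_fyam_row' is hypothetical, and the JSON field names (in particular the nested 'body' with 'plain' and 'parsed') are assumptions that should be checked against the Yammer API description. The 'created_at' and 'updated_at' columns are left out on the assumption that the database layer fills them.

```ruby
require 'json'

# Hypothetical helper: map one message hash (parsed from API JSON)
# to a hash whose keys match the 'fyams' columns.
def message_to_fyam_row(msg)
  body = msg['body'] || {}            # assumed: body is a nested object
  {
    yamid:         msg['id'],         # Yammer's own message id
    sender_id:     msg['sender_id'],
    replied_to_id: msg['replied_to_id'],
    thread_id:     msg['thread_id'],
    message_type:  msg['message_type'],
    sender_type:   msg['sender_type'],
    client_type:   msg['client_type'],
    url:           msg['url'],
    web_url:       msg['web_url'],
    body_parsed:   body['parsed'],
    body_plain:    body['plain']
  }
end

# Made-up sample message, for illustration only.
sample = JSON.parse(<<~JSON)
  {"id": 101, "sender_id": 7, "replied_to_id": null, "thread_id": 101,
   "message_type": "update",
   "body": {"plain": "Hello YCN", "parsed": "Hello YCN"}}
JSON

row = message_to_fyam_row(sample)
puts row[:body_plain]
```

Fields that are missing in the input simply become nil, which matches the nullable columns in the table.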

Word Cloud Generator

The amount of data produced daily by employees' public posts is about 30k or considerably more, depending on the number of members in the Yammer network and their activity. Word clouds are a popular graphic representation of the most commonly used words in the messages posted by Yammer members in a given time frame.

A word cloud of all messages of one day in Yammer YCN.

Thanks to the IBM Word Cloud Generator (WCG) by IBM researcher Jonathan Feinberg, word clouds are easy for anyone to create. WCG is a command line application and can be used in an automated process to produce a word cloud. It is a Java application and requires only a Java 5 or Java 6 runtime environment. WCG uses a configuration file 'config.txt' to control all of the settings that affect the output. Within 'config.txt' there is a pointer to a stop-word file, where you can exclude unwanted words. A typical call from the command line to generate a word cloud 'wc.png' from an input file 'msg_body.txt' is shown below.

bhuelbue:~  
→ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)
bhuelbue:~
→ java -jar ibm-word-cloud.jar -c config.txt -w 800 -h 800 < msg_body.txt > wc.png
IBM Word Cloud Generator build 32
Copyright (c)2009 IBM

Yammer messages are unstructured text, so the WCG program uses a word's relative frequency as its weight for the font size. If the input file for WCG is instead a tab-separated data file containing weighted phrases, such as the names of the active members who posted Yammer messages, you can produce a user cloud.
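
A minimal sketch of how such a tab-separated input could be prepared: count the messages per sender and write one line per member. The method name 'user_cloud_lines' and the sample data are hypothetical. The column order (phrase first, weight second) is an assumption, and it has to match the settings in your 'config.txt'.

```ruby
# Hypothetical helper: turn a list of sender ids into "name<TAB>count" lines.
# names_by_id maps a Yammer user id to the member name.
def user_cloud_lines(sender_ids, names_by_id)
  counts = Hash.new(0)
  sender_ids.each do |id|
    # Fall back to a placeholder when the user is unknown.
    counts[names_by_id.fetch(id, "user#{id}")] += 1
  end
  # Most active members first, ties ordered by name.
  counts.sort_by { |name, n| [-n, name] }
        .map { |name, n| "#{name}\t#{n}" }
end

# Made-up sample data, for illustration only.
names = { 1 => 'alice', 2 => 'bob' }
lines = user_cloud_lines([1, 2, 1, 1, 3], names)
puts lines
```

The resulting lines can be written to a file and passed to WCG in place of 'msg_body.txt'.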

A user cloud of one day with active member names

Social Graphs

A more personalized view of activities in Yammer is the user wall, which shows the avatars of all members active during a given time frame in one cluster. With the Yammer API you can access user information of all Yammer members within a network. Once you have the "mugshot_url" for a user, you can download the user image for further use in social graphs. Yammer allows avatar images in bmp, png, gif and jpeg format, so the user images need to be converted to a common size and format. I used cURL to download the user images and ImageMagick to convert and edit them. The social graphs are produced with the open source visualization software Graphviz. Here is a sample 'test.gv' file to produce a user wall.

graph g {
  size="6,6";
  ratio="fill";
  rankdir="LR";

  n1 [label="", shape="box", width=.6, height=.6, image="~/img/n1.jpg"];
  n2 [label="", shape="box", width=.6, height=.6, image="~/img/n2.jpg"];
  n3 [label="", shape="box", width=.6, height=.6, image="~/img/n3.jpg"];
  // ... (many more nodes here)
  nn [label="", shape="box", width=.6, height=.6, image="~/img/nn.jpg"];
}
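
The image files referenced in 'test.gv' have to be fetched and normalized first. The following sketch only builds the cURL and ImageMagick command lines for one avatar and prints them. The method name 'avatar_commands', the 60x60 pixel size, the 'raw_' file prefix and the sample URL are all assumptions for illustration, and it assumes ImageMagick detects the input format from the file content.

```ruby
# Hypothetical helper: build (but do not run) the commands that download
# one avatar and convert it to a JPEG of a common size.
def avatar_commands(yamid, mugshot_url, img_dir, size = 60)
  raw = File.join(img_dir, "raw_#{yamid}")    # download target, any format
  jpg = File.join(img_dir, "n#{yamid}.jpg")   # file name used in the .gv file
  [
    ['curl', '-s', '-L', '-o', raw, mugshot_url],
    ['convert', raw, '-resize', "#{size}x#{size}", jpg]
  ]
end

# Placeholder URL, for illustration only.
cmds = avatar_commands(1, 'https://example.invalid/mugshot/1', 'img')
cmds.each { |cmd| puts cmd.join(' ') }

# To actually execute them:
#   cmds.each { |cmd| system(*cmd) or abort("failed: #{cmd.first}") }
```

Keeping the commands as arrays avoids shell quoting problems when a URL or path contains special characters.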

With the user images stored in the directory '~/img' and the above 'test.gv' you can run the following Graphviz command:

bhuelbue:~
→ osage -Tpng -o test.png test.gv 

The output of the above command is the following graph in png format 'test.png':

User wall of all active members in Yammer during one week
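
A file like 'test.gv' does not have to be written by hand. As a minimal sketch, assuming the avatars are already stored as 'n<id>.jpg' in one directory, the node lines can be generated from a list of user ids. The method name 'user_wall_dot' is hypothetical.

```ruby
# Hypothetical helper: build the Graphviz source for a user wall.
# ids is a list of user ids, img_dir the directory holding n<id>.jpg.
def user_wall_dot(ids, img_dir)
  lines = ['graph g {', '  size="6,6";', '  ratio="fill";', '  rankdir="LR";', '']
  ids.each do |id|
    lines << %(  n#{id} [label="", shape="box", width=.6, height=.6, image="#{img_dir}/n#{id}.jpg"];)
  end
  lines << '}'
  lines.join("\n") + "\n"
end

dot = user_wall_dot([1, 2, 3], 'img')
puts dot
# File.write('test.gv', dot) would save it for the osage command.
```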

Yammer messages are organized in threads. Each Yammer message carries a 'sender_id', a 'thread_id' and a 'replied_to_id'. The 'sender_id' points directly to the user who issued the message, and the 'replied_to_id' points to another message. If you look up the 'sender_id' of that other message, you have a conversation between two Yammer users. You can now collect all Yammer conversations in a given time frame and generate a 'conversation.gv' file for automated graph generation with a Graphviz command. A skeleton of the 'conversation.gv' file is shown below.

digraph g {
  size="7,7";
  ratio="fill";
  rankdir="LR";

  m1 [label="", shape="box", width=.6, height=.6, image="~/img/m1.jpg"];
  m2 [label="", shape="box", width=.6, height=.6, image="~/img/m2.jpg"];
  m1  ->  m2;
  m3 [label="", shape="box", width=.6, height=.6, image="~/img/m3.jpg"];
  m4 [label="", shape="box", width=.6, height=.6, image="~/img/m4.jpg"];
  m3 ->  m4;
  m2 [label="", shape="box", width=.6, height=.6, image="~/img/m2.jpg"];
  m5 [label="", shape="box", width=.6, height=.6, image="~/img/m5.jpg"];
  m2  ->  m5;
  // ... (many other nodes and edges)
  mx [label="", shape="box", width=.6, height=.6, image="~/img/mx.jpg"];
  my [label="", shape="box", width=.6, height=.6, image="~/img/my.jpg"];
  mx ->  my;
}

With 'conversation.gv' and all user images stored in '~/img', run the following Graphviz program from the command prompt.

bhuelbue:~
→ fdp -Tpng -o conversation.png conversation.gv 

The result is a directed graph 'conversation.png':

Directed graph of all Yammer conversations for typical day
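
The lookup from 'replied_to_id' to the sender of the original message can be sketched as follows. This is an illustration under assumptions, not the code from my repository. The method names are hypothetical. The edge direction (from the replying user to the user being replied to) is an assumption. Replies to one's own messages and replies to messages outside the downloaded time frame are skipped.

```ruby
require 'set'

# messages: array of hashes with :yamid, :sender_id and :replied_to_id.
# Returns unique [replier_id, original_sender_id] pairs.
def conversation_edges(messages)
  sender_of = {}
  messages.each { |m| sender_of[m[:yamid]] = m[:sender_id] }
  edges = Set.new
  messages.each do |m|
    parent_sender = sender_of[m[:replied_to_id]]
    next if parent_sender.nil? || parent_sender == m[:sender_id]
    edges << [m[:sender_id], parent_sender]
  end
  edges.to_a
end

# Build the Graphviz source: one image node per user, one line per edge.
def conversation_dot(edges, img_dir)
  out = ['digraph g {', '  size="7,7";', '  ratio="fill";', '  rankdir="LR";', '']
  edges.flatten.uniq.each do |id|
    out << %(  m#{id} [label="", shape="box", width=.6, height=.6, image="#{img_dir}/m#{id}.jpg"];)
  end
  edges.each { |from, to| out << "  m#{from} -> m#{to};" }
  out << '}'
  out.join("\n") + "\n"
end

# Made-up sample messages, for illustration only.
msgs = [
  { yamid: 10, sender_id: 1, replied_to_id: nil },
  { yamid: 11, sender_id: 2, replied_to_id: 10 },
  { yamid: 12, sender_id: 5, replied_to_id: 11 },
  { yamid: 13, sender_id: 2, replied_to_id: 10 }
]
edges = conversation_edges(msgs)
conv_dot = conversation_dot(edges, 'img')
puts conv_dot
```

Unlike the skeleton above, this sketch declares every node only once and lists the edges after the nodes. Graphviz accepts both forms.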

For Mac OS X users, NodeBox is a wonderful program to visualize data using the Python language. By mapping each member name to its email domain and suppressing the .com top-level domain, you get a thread analysis showing which companies are connected on the Yammer Customer Network (YCN). The directed graph below was produced with NodeBox and demonstrates how companies share messages on YCN.

NodeBox social graph for conversation between domains
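
The mapping from a member to a company label can be sketched in a few lines of Ruby. The method name 'company_label' is hypothetical, and the sketch assumes the 'email' field of the user record is used as the source of the domain.

```ruby
# Hypothetical helper: reduce an email address to a company label by
# taking the domain part and dropping a trailing ".com".
def company_label(email)
  domain = email.to_s.split('@', 2)[1].to_s.downcase
  domain.sub(/\.com\z/, '')
end

# Made-up addresses, for illustration only.
labels = ['alice@Example.com', 'bob@uni.edu', nil].map { |e| company_label(e) }
p labels
```

Applying this label to both ends of every user-to-user conversation gives the domain-to-domain edges for the graph.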

Vizster is a Java program to visualize online social networks. In Vizster you can load data either from an XML file describing the network or from a custom MySQL database. A Ruby program using the Yammer API was written to create the XML files and to load the MySQL database tables with social network data for Vizster.

Yammer social network graph produced with Vizster 

In my database schema for Vizster, the Yammer user data and the internal Active Directory data of the enterprise users were synchronized and stored in the table 'profiles'. In both Vizster tables, 'graphs' and 'profiles', I added the 'id' field, and I pluralized the default Vizster 'graph' table to 'graphs'. This was required to load the tables with a Ruby program that interfaced to Yammer via the API and to MySQL via 'activerecord'. To use the Yammer 'profiles' table in Vizster you need to amend the 'VizsterDBLoader.java' program to reflect the pluralized table name 'graphs', the new fields and the modified queries.

CREATE TABLE  `fyam_development`.`graphs` (
  `id` int(11) NOT NULL auto_increment,
  `uid1` int(10) unsigned NOT NULL default '0',
  `uid2` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

CREATE TABLE  `yam_development`.`profiles` (
  `id` int(11) NOT NULL auto_increment,
  `yamid` int(11) NOT NULL,
  `name` varchar(255) NOT NULL,
  `full_name` varchar(255) default NULL,
  `network_id` int(11) NOT NULL,
  `network_name` varchar(64) NOT NULL,
  `location` varchar(64) default NULL,
  `job_title` varchar(128) default NULL,
  `stats_followers` int(11) default NULL,
  `stats_updates` int(11) default NULL,
  `stats_following` int(11) default NULL,
  `url` varchar(255) default NULL,
  `web_url` varchar(255) default NULL,
  `mugshot_url` varchar(255) default NULL,
  `userid` varchar(16) default NULL,
  `fullname` varchar(255) default NULL,
  `displayname` varchar(255) default NULL,
  `employmenttype` varchar(128) default NULL,
  `reportsto` varchar(16) default NULL,
  `distinguishedname` varchar(255) default NULL,
  `globalid` int(10) unsigned default NULL,
  `unixid` int(10) unsigned default NULL,
  `city` varchar(128) default NULL,
  `country` varchar(128) default NULL,
  `phonetieline` varchar(128) default NULL,
  `telephonenumber` varchar(128) default NULL,
  `buildingabbr` varchar(128) default NULL,
  `buildingcode` varchar(128) default NULL,
  `buildingname` varchar(128) default NULL,
  `businessunitcode` varchar(64) default NULL,
  `lastmodified` varchar(16) default NULL,
  `manager` varchar(2) default NULL,
  `mrrole` varchar(2) default NULL,
  `created_at` datetime default NULL,
  `updated_at` datetime default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;