Introduction

The TechNet Wiki Ninja Belts are an award initiative for authors of articles on TechNet Wiki. As shown in that link, there are a lot of requirements for each progression up through the belts. These requirements cover numerous activities across TechNet Wiki, as well as statistics from the MSDN profile page. Because this information is scattered and difficult to collect, the work involved in maintaining the Ninja Belt Awards list was becoming unmanageable.

I had been mentioning the need for a "centralized database of everything" for TechNet Wiki activities for several years, as the number of activities grew. But it was the Ninja Belts that were the final straw, and they led to the work I am about to describe.

The only viable way to maintain the list of who has which belt was to pull all the information into a single relational database and automate the process of calculating who has earned each belt.

The Database

Here is a database diagram of all the tables relevant to the Ninja Belt calculator. Click to enlarge.

[Update: this has grown considerably since this article was first published.]
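For readers who prefer code to diagrams, here is a minimal EF6 code-first sketch of a couple of the core tables. The entity and column names are my guesses based on the sources described below, not the real schema.

```csharp
// A hedged EF6 code-first sketch of two core tables. Entity and column
// names are illustrative guesses, not the actual production schema.
using System.Data.Entity;

public class User
{
    public int Id { get; set; }        // the Wiki User Id everything ties back to
    public string Name { get; set; }
    public int BeltLevel { get; set; } // current belt, updated by the calculator
}

public class FeaturedArticle
{
    public int Id { get; set; }
    public int UserId { get; set; }    // foreign key to User
    public string Title { get; set; }
    public string Url { get; set; }
}

public class NinjaDbContext : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<FeaturedArticle> FeaturedArticles { get; set; }
}
```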

 

The Sources

Here is a brief explanation of each of the sources of data I used to collect all of this information.

Top Contributor Winners

This was one of the most painful sets of data to collect, even though it was all created by me, and partly database-generated in the first place! I wrote a crawler in the almighty WPF, which [thanks to the WebBrowser component] crawls through the last week's worth of revisions, listed here. Unfortunately, the auto-generation doesn't give me the winners; that is a subjective decision I have to make. It simply offers the top 10 most-changed articles, from which I select a winner. Also unfortunately, not realizing I would ever need to crawl this data back out, I used to tweak the text heavily, which meant writing a "crawler for my crawler results" was quite a job!
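To give a flavour of the approach, here is a minimal sketch of driving the WPF WebBrowser component as a crawler. The class and handler names are mine for illustration; the real tool is more involved.

```csharp
// Minimal sketch of using the WPF WebBrowser component as a crawler.
// Class and method names here are illustrative, not the original tool's.
using System;
using System.Windows.Controls;
using System.Windows.Navigation;

public class RevisionCrawler
{
    private readonly WebBrowser _browser = new WebBrowser();

    public void Start(Uri revisionListUrl)
    {
        _browser.LoadCompleted += OnPageLoaded;
        _browser.Navigate(revisionListUrl);
    }

    private void OnPageLoaded(object sender, NavigationEventArgs e)
    {
        // WebBrowser.Document is a COM object; casting it needs a reference
        // to the Microsoft.mshtml interop assembly.
        var doc = (mshtml.HTMLDocument)_browser.Document;
        string html = doc.body.outerHTML;

        // ...scan the raw HTML for revision entries here, then Navigate
        // to the next page of revisions until the week is covered...
    }
}
```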

I have now extended my Top Contributor Winners blog generation tool to allow me to select the winners and THEN generate the final blog. This means I can include the new winners' statistics section in the TC blog. It also means the winners are saved when the blog is generated, with no need to crawl the blog any more.

Guru Winners

The TechNet Guru competition is another of my early initiatives, to help celebrate the work that our community brings to the Wiki. Each month, Microsoft employees, Wiki councillors and MVPs from many technical categories review and award medals to the authors of new content published on TechNet Wiki. These winners are mostly listed here, but as that list is manually maintained, it is often outdated. In recent times, Gurus have been generating their own user pages of winning articles, pages listing the winners of different categories, and many blogs have analysed the groups and other stats of our winners.

So to collect this "Guru win" data, I adapted the techniques used for the Top Contributors crawler. I made a new crawler for the Guru Awards blog URLs, which I again had to start manually, but which pulled the data into the database. Without boring you too much: this crawler had to go down into each article to pull out the Wiki User Id, which is what we use to tie things together in the database.
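As an illustration of that user-id step: profile links use the ProfileUrlRedirect.ashx form shown later in this article, so a simple pattern match over the article HTML is enough. The regex below is my assumption about the markup, not the crawler's actual code.

```csharp
// Hedged sketch: pull the Wiki User Id out of a profile link such as
// http://social.technet.microsoft.com/wiki/149154/ProfileUrlRedirect.ashx
// The regex is an assumption about the markup, not the original code.
using System.Text.RegularExpressions;

static int? ExtractWikiUserId(string articleHtml)
{
    var match = Regex.Match(articleHtml, @"wiki/(\d+)/ProfileUrlRedirect\.ashx",
                            RegexOptions.IgnoreCase);
    return match.Success ? (int?)int.Parse(match.Groups[1].Value) : null;
}
```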

Featured Articles

I wrote another crawler to pull the initial list of Featured Articles from this wiki page that listed all previous featured articles. Now that this is imported, the Ninja Bloggers can simply add a new entry through our Azure ASP.NET MVC 5 intranet tool, now called the TechNet Wiki Dojo. This tool has since been updated several times, most recently throughout the second half of 2016, which keeps this category of stats up to date for the next calculator run.
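For a sense of how small such an entry form can be, here is an illustrative ASP.NET MVC 5 action along those lines, reusing the hypothetical FeaturedArticle entity and context sketched earlier; it is not the Dojo's actual code.

```csharp
// Illustrative ASP.NET MVC 5 handler for the "add a Featured Article" form.
// Controller shape and names are assumptions, not the Dojo's real code.
using System.Web.Mvc;

public class FeaturedArticlesController : Controller
{
    private readonly NinjaDbContext _db = new NinjaDbContext();

    [HttpPost]
    [ValidateAntiForgeryToken]
    public ActionResult Create(FeaturedArticle model)
    {
        if (!ModelState.IsValid)
            return View(model);            // redisplay the form with errors

        _db.FeaturedArticles.Add(model);   // EF6 insert
        _db.SaveChanges();
        return RedirectToAction("Index");
    }
}
```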

Interviews

This was pulled with yet another crawler from this other wiki page that lists all past interviews. Again, I added an input form for Ninja Bloggers to add new interviews into the database via our intranet tool. This again keeps the data up to date for future belt calculations.

Profile Statistics

This is a little harder than the rest. This information is pulled from the Statistics and Achievements sections of the profile page of each user - here is mine.

Every time we want to run the Belt Calculator, we need to ensure we use the latest set of figures from these pages. Unfortunately, we do not have direct access to any of the databases behind TechNet Wiki or User Profiles, which is understandable for privacy reasons. However, using a crawler to pull the data from each user's profile page, based on user id (http://social.technet.microsoft.com/wiki/149154/ProfileUrlRedirect.ashx), would be a slow and painful exercise to repeat every time.

Thankfully, in their wisdom, Microsoft serve the data that populates the profile page from a JSON-based web service. Once you know where and how, it's a simple matter of calling the same service to pull the raw data into my own system. "Known users" are users who have won an award, or had a featured article or interview. My new multi-threaded JSON client application can pull all known users (at the time of writing, just over 350) in under 10 seconds! Uploading all this data into the Azure database then takes considerably longer, but I'm working on reducing that too.
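The fan-out itself is straightforward; below is a sketch of the pattern using HttpClient and Task.WhenAll. The service URL is a placeholder, since the real profile endpoint isn't documented here.

```csharp
// Sketch of fetching all known users' profile JSON concurrently.
// The endpoint URL is a placeholder; the real service address is not shown here.
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

static async Task<string[]> FetchProfilesAsync(IEnumerable<int> userIds)
{
    using (var client = new HttpClient())
    {
        // Start all requests at once, then await the whole batch.
        var tasks = userIds.Select(id =>
            client.GetStringAsync("https://profile.service.invalid/json?userId=" + id));
        return await Task.WhenAll(tasks);
    }
}
```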

The Tools

As discussed above, each of these sets of data had to be imported, and it was past the point of being able to do it manually. So I wrote a bunch of [mostly single-use] tools to crawl web pages for the data instead.

For a brief introduction to how to do this, you can read this MSDN Gallery sample that I uploaded a while ago, which explains some basic techniques for web page manipulation. However, in most of my cases it was quicker to fall back on old techniques: scanning through the raw HTML looking for markers, such as the start of a section or a bullet point, then finding the text between expected HTML tags for the article/author name and article/author URL.
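Here is that marker-scanning idea in miniature: find the section, then take the text between a known pair of tags. The markers and tags are supplied by the caller; none of this is tied to a specific page layout.

```csharp
// The marker-scanning technique in miniature. Marker and tag values are
// supplied by the caller; nothing here is tied to a specific page layout.
using System;

static string ExtractBetween(string html, string sectionMarker,
                             string openTag, string closeTag)
{
    int section = html.IndexOf(sectionMarker, StringComparison.OrdinalIgnoreCase);
    if (section < 0) return null;

    int open = html.IndexOf(openTag, section, StringComparison.OrdinalIgnoreCase);
    if (open < 0) return null;

    int start = open + openTag.Length;
    int close = html.IndexOf(closeTag, start, StringComparison.OrdinalIgnoreCase);
    return close < 0 ? null : html.Substring(start, close - start);
}

// Example with invented markers: the first link text after a bullet point.
// string title = ExtractBetween(html, "<li>", "\">", "</a>");
```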

The Central Database & Website

Once all this data was collected into a single SQL database, the first obvious need was to provide access to the data for our bloggers and councillors.

I have already been providing data about my weekly crawls from an Azure-based website for over a year. There is a feature within it to generate the weekly crawl winners blog, and even a translation tool, which can generate the blog in any language it has been loaded with.

This was a great opportunity to bring the site up to MVC 5 and EF6, which was a very painless exercise. As mentioned above, it has been reworked several times since and is constantly being updated with extra tools and features.


I've tried to make it as simple as possible for bloggers to add Featured Articles and Interviews, with a link from the home page.

You'll also notice a section for Guru awards which includes tools to monitor the voting process, but that's another story :)

The snapshot above is very old; the menu is much longer now, and varies depending on a user's privileges. There are some trade secrets too, including two how-to videos, which I can't share here. If you are reading this because you are helping me with the awards or the calculator, please ask me for more info.

MVC is a most excellent way to quickly produce a website off the back of a database; I highly recommend it to anyone facing any kind of database-driven web development. As one of the UK's longest-serving ASP.NET developers, I can happily say I am delighted to add this to my core set of development skills.

The Calculator

Finally, we get to the calculator, which this article was meant to be about!

Off the back of all that collected information, with up-to-date profile statistics, we can finally run the calculator. For this I tried several coding styles, but found by far the quickest to be simply doing some multi-threaded crunching through each user to see if they pass the next belt level's requirements. This takes seconds, after which I update the User table with any new Belt Ids. I also output the results as an ordered data grid, and as a plain text list for copying out to emails, etc.
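In outline, the crunch looks something like the sketch below: a Parallel.ForEach over the users, promoting each one while they meet the next level's requirements. It reuses the hypothetical User entity from the database sketch above; the Belt type and the requirements check are stand-ins, not the real rules.

```csharp
// Outline of the multi-threaded belt pass. The Belt type and the
// requirements check are stand-ins; the real rules compare award counts,
// profile statistics, and so on.
using System.Collections.Generic;
using System.Threading.Tasks;

class Belt { public int Level; /* thresholds for awards, articles, etc. */ }

static void RunBeltCalculator(IList<User> users, IReadOnlyList<Belt> belts)
{
    Parallel.ForEach(users, user =>
    {
        // Keep promoting while the user meets the next belt's requirements.
        while (user.BeltLevel < belts.Count &&
               MeetsRequirements(user, belts[user.BeltLevel]))
        {
            user.BeltLevel++;
        }
    });
    // New BeltLevel values are then written back to the User table in one pass.
}

static bool MeetsRequirements(User user, Belt next)
{
    // Stand-in for the real checks against the collected statistics.
    return false;
}
```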

Summary

It has indeed been a hard slog to collect all this information into one central source, and to build the tools to create and use it. The result, however, has kept us running these awards; it is the only way we would have been able to keep the calculator going. It has also been a lot of fun, and has helped me upgrade my skill set once again to the latest and greatest MS has to offer. I feel we are very lucky to have such a large community of developers, reaching back almost a whole generation now.


[Updated for 2017]

So much has changed since this was written. The second half of 2016 was particularly turbulent, with major rewrites and new features. There are changes every month.

Most people reading this article these days will be those taking on one of these tasks. So I wish you the very best, and as I've probably already said to you - don't hesitate to ask me any questions you need, to help you help us!