MSRChallenge About



MSR Challenge. We invite researchers to demonstrate the usefulness of their mining tools on the source code repositories, bug data, and mailing list archives of the FreeBSD distribution, the Ultimate Debian Database, and the GNOME desktop suite by participating in the two MSR Challenge tracks: general and prediction.

The winners of both tracks will receive an award.


Since 2006, the IEEE Working Conference on Mining Software Repositories (MSR) has hosted a mining challenge. The MSR Mining Challenge brings together researchers and practitioners who are interested in applying, comparing, and challenging their mining tools and approaches on software repositories for open source projects. Previous years examined a single project, multiple projects in isolation, or a single distribution of projects (GNOME). This year, the MSR Challenge involves examining the FreeBSD operating system and distribution, the GNOME Desktop Suite of projects, and the Debian/Ubuntu Distribution Database. The emphasis this year is on how the projects are inter-related, how they interact, and possibly how they evolve and function within a larger software ecosystem.

There will be two challenge tracks: #1: general and #2: prediction. The winner of each track will be given the MSR 2010 Challenge Award.

Challenge #1: General

In this category you can demonstrate the usefulness of your mining tools. The main task will be to find interesting insights by analyzing the software repositories of the projects within FreeBSD, the GNOME Desktop Suite, and the package-related metadata of the Debian/Ubuntu Distribution Database.

FreeBSD is a BSD-licensed BSD Unix distribution. It includes packages for desktop, server, and embedded uses. FreeBSD also takes responsibility for porting many programs to its distribution via FreeBSD-ports.

The GNOME Desktop Suite of projects is very mature, is composed of a number of individual projects (nautilus, epiphany, evolution, etc.), and provides lots of input for mining tools.

The Ultimate Debian Database (UDD) is a database of packages, package dependencies and related bugs. It describes the Debian and Ubuntu distributions.

One could examine multiple projects within these ecosystems. For instance, one might examine API usage across all projects, train a predictive model on one project and assess its accuracy on another, or examine how developers' activity spans multiple projects.
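As an illustration of the last idea, the following minimal sketch finds developers whose commits span more than one project. It assumes the commit logs have already been reduced to (project, author) pairs; the function name, the sample data, and the author names are all hypothetical and are not part of the challenge data.

```python
from collections import defaultdict


def developers_spanning_projects(commits, min_projects=2):
    """Return authors who committed to at least `min_projects` distinct projects.

    `commits` is an iterable of (project, author) pairs. This input shape is an
    assumption; real repository logs would need to be parsed into it first.
    """
    projects_by_author = defaultdict(set)
    for project, author in commits:
        projects_by_author[author].add(project)
    return {
        author: sorted(projects)
        for author, projects in projects_by_author.items()
        if len(projects) >= min_projects
    }


# Hypothetical sample data, for illustration only.
sample_commits = [
    ("nautilus", "alice"),
    ("epiphany", "alice"),
    ("evolution", "bob"),
    ("nautilus", "carol"),
    ("evolution", "carol"),
    ("evolution", "carol"),
]

cross_project = developers_spanning_projects(sample_commits)
print(cross_project)
```

In practice, matching author identities across repositories (different e-mail addresses or name spellings for the same person) is likely to be the harder part and is not handled here.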

Participation is straightforward:

The challenge report should describe the results of your work and cover the following aspects: questions addressed, input data, approach and tools used, derived results and their interpretation, and conclusions. Keep in mind that the report will be evaluated by a jury. Make sure that the contributions, purpose, scope, results, and importance or relevance of your work are highlighted within your report. Reports must be at most 4 pages long and in the ICSE format (

The submission will be via Easychair ( Each report will undergo a thorough review, and accepted challenge reports will be published as part of the MSR 2010 proceedings. Authors of selected papers will be invited to give a presentation at the MSR conference in the MSR Challenge track.

Data

Feel free to use any data source for the Mining Challenge. For your convenience, we provide repository logs, mirrored repositories, Bugzilla database dumps, and various other forms of data linked at the bottom.

Challenge #2: Prediction

This year, the prediction track of the MSR Mining Challenge involves predicting the growth of bug reports within Debian, measured by the final bug number, between February 1st and April 30th, 2010 (both days included). We want you to predict the newest bug number to appear on April 30th.
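One simple baseline for this task is to fit a straight line to past (date, newest bug number) observations and extrapolate it to April 30th. The sketch below does this with an ordinary least-squares fit in plain Python. It is only an assumption about how one might approach the prediction, not a method prescribed by the challenge, and the observations are made-up round numbers rather than real Debian bug numbers.

```python
from datetime import date


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_bug_number(observations, target):
    """Extrapolate the newest bug number to the `target` date.

    `observations` is a list of (date, newest_bug_number) pairs, oldest first.
    """
    origin = observations[0][0]
    xs = [(day - origin).days for day, _ in observations]
    ys = [bug_number for _, bug_number in observations]
    slope, intercept = fit_line(xs, ys)
    return round(slope * (target - origin).days + intercept)


# Made-up observations for illustration; NOT real Debian bug numbers.
observations = [
    (date(2010, 1, 1), 1000),
    (date(2010, 1, 11), 1100),
    (date(2010, 1, 21), 1200),
]

prediction = predict_bug_number(observations, date(2010, 4, 30))
print(prediction)
```

A linear trend ignores effects such as release cycles or seasonal variation in reporting, so it is best read as a starting point to compare more elaborate models against.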

Participation is as follows:

Prediction submissions will be scored by their distance from the last bug number that occurs on April 30th 2010.
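The text does not define "distance" precisely. A minimal sketch, assuming it means the absolute difference between the predicted and the actual bug number, could rank submissions as follows. The team names and numbers are hypothetical, and the handling of ties is an assumption as well.

```python
def prediction_distance(predicted, actual):
    """Absolute difference between predicted and actual bug number (assumed metric)."""
    return abs(predicted - actual)


def rank_submissions(submissions, actual):
    """Return (team, predicted) pairs sorted by distance, closest first.

    `submissions` maps a team name to its predicted bug number. Ties keep
    their original order because Python's sort is stable.
    """
    return sorted(
        submissions.items(),
        key=lambda item: prediction_distance(item[1], actual),
    )


# Hypothetical submissions and outcome, for illustration only.
submissions = {"team_a": 2150, "team_b": 2200, "team_c": 2300}
ranking = rank_submissions(submissions, actual=2190)
print(ranking)
```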

Frequently Asked Questions

Important Dates

These need to be hammered down:


Thanks To

People behind the Challenge
