Judging a Commit by Its Cover
For the MSR 2016 Challenge, I applied data science to test a gut feeling I had: commits with weird, unusual log messages often contain lower quality code than commits with boring, ordinary messages.
So: Are commits that look fishy…
…actually hiding dubious code?
Is my hunch true? (Spoiler: marginally.)
To test this, we correlated build statuses from Travis-CI with the cross-entropy of commit messages under n-gram language models. Here’s the abstract:
Developers summarize their changes to code in commit messages. When a message seems “unusual,” however, this puts doubt into the quality of the code contained in the commit. We trained n-gram language models and used cross-entropy as an indicator of commit message “unusualness” for over 120,000 commits from open source projects. Build statuses collected from Travis-CI were used as a proxy for code quality. We then compared the distributions of failed and successful commits with regard to the “unusualness” of their commit message. Our analysis yielded significant results when correlating cross-entropy with build status.
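If you’re wondering what “unusualness” looks like in code, here’s a minimal sketch of the core measurement. It assumes an add-one-smoothed bigram model and naive whitespace tokenization; the study’s actual model order, smoothing, and preprocessing may differ, and `tokenize`, `train_bigram_model`, and `cross_entropy` are names invented here for illustration.

```python
import math
from collections import Counter

def tokenize(message):
    """Lowercase whitespace tokenization with sentence boundary markers."""
    return ["<s>"] + message.lower().split() + ["</s>"]

def train_bigram_model(messages):
    """Count unigrams and bigrams over a corpus of commit messages."""
    unigrams, bigrams = Counter(), Counter()
    for message in messages:
        tokens = tokenize(message)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def cross_entropy(message, unigrams, bigrams):
    """Per-token cross-entropy (in bits) under an add-one smoothed bigram model."""
    tokens = tokenize(message)
    vocab = len(unigrams)
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(
        math.log2((bigrams[pair] + 1) / (unigrams[pair[0]] + vocab))
        for pair in pairs
    )
    return -log_prob / len(pairs)

# Toy corpus for illustration; the study trained on real project histories.
corpus = ["fix typo in readme", "bump version to 1.2.0", "fix failing test"]
unigrams, bigrams = train_bigram_model(corpus)

print(cross_entropy("fix typo in readme", unigrams, bigrams))         # low: ordinary
print(cross_entropy("AAAAARGH WHY WONT IT BUILD", unigrams, bigrams)) # high: unusual
```

With a score per commit, the rest is bookkeeping: split the scores by Travis-CI build status and compare the two distributions, e.g. with a non-parametric test. See the paper for the exact methodology.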
Intrigued? Read the preprint! (And fork the replication code and data!)
EDIT: Added my presentation.
Thanks to my supervisor, Abram Hindle, and my colleagues S. Kalen Romansky and Shaiful Chowdhury for their reviews and comments!