Has this ever happened to you?

You’re writing source code, say this:

package ca.ualberta.cs.example;

public class Hello {
    public static void main(string args[]) {
        if (args.length != 2)
            int exitstatus = 2;
            system.out.println("not enough args");
            system.exit(exitstatus);
        }
        system.out.println("hello, world!");
    }
}

And when you go to compile it, you’re bombarded with error messages:

Hello.java:6: error: variable declaration not allowed here
            int exitstatus = 2;
                ^
Hello.java:10: error: <identifier> expected
        system.out.println("hello, world!");
                          ^
Hello.java:10: error: illegal start of type
        system.out.println("hello, world!");
                           ^
Hello.java:12: error: class, interface, or enum expected
}
^
4 errors

Four error messages! So the error must be on at least line 6, right? Right…?

After scratching your head for a while, you figure out that line 5, which looks like this:

if (args.length != 2)

…should actually have an open brace at the end of the line, like this:

if (args.length != 2) {

How the crumb was I supposed to figure that out from pouring over the error messages?

There’s got to be a better way

Introducing Sensibility, a tool designed to not only find where these syntax errors are, but also to tell you how you might be able to fix it!

The secret: use natural language models—specifically n-gram and LSTM language models—to figure out where the “unnatural” code lurks.

But Eddie… using NLP on code? That’s craziness!

Nope! It’s naturalness.

Sensibility can produce a valid fix for about 50% of all single-token syntax errors, often producing the true fix that a novice programmer would have made. For example, given the broken code above, Sensibility produces this output:

Hello.java:5:29: try inserting '{'
        if (args.length != 2)
                               ^
                               {
Hello.java:6:13: try replacing int with {
            int exitStatus = 2;
            ^
            {

Either of which will fix the file, but the former is probably what you intended :).

For more details, read our paper Syntax and Sensibility: Using language models to detect and correct syntax errors to appear in SANER 2018. Here’s a link to a preprint. And here’s the BibTeX.

@inproceedings{eddie2018SANER2018sasulmtdacse,
 accepted = {2017-12-18},
 author = {Eddie Antonio Santos and Joshua Charles Campbell and Dhvani Patel and Abram Hindle and José Nelson Amaral},
 authors = {Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, José Nelson Amaral},
 booktitle = {25th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018)},
 date = {2018-04-21},
 funding = {NSERC Discovery, MITACS Accelerate},
 location = {Campobasso, Italy},
 pagerange = {1--11},
 pages = {1--11},
 role = { Author},
 title = {Syntax and Sensibility: Using language models to detect and correct syntax errors},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/santos2018SANER-syntax.pdf},
 venue = {25th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018)},
 year = {2018}
}

The code is available on GitHub.

Special thanks to my coauthors, Joshua Charles Campbell, Dhvani Patel, Abram Hindle and José Nelson Amaral.