Gavel: An Expo Judging System

Gavel is an automated end-to-end expo judging system. We’ve used it to automate judging at HackMIT, a 1000-person event with over 200 projects and 100 judges. Dozens of other events have also used Gavel since the software was released in private beta in late 2015.

Gavel fully automates expo judging logistics and project ranking — the system tells judges which projects to look at, collects judges’ votes, and produces a ranking of projects.

Here’s a demo of the judge interface:

Model

Gavel’s model is pretty different from traditional scoring methods: it’s based on the method of pairwise comparisons, and the system uses a probabilistic model to infer a ranking from judges’ comparisons. For reasons described in detail in this post, pairwise comparisons work much better than having judges assign numeric scores, like rating each project on a scale from 1 to 10.
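
To make this concrete, here’s a minimal sketch of the general technique, fitting the classical Bradley-Terry model, where each project has a latent score and project i beats project j with probability s_i / (s_i + s_j). This is an illustration, not Gavel’s exact algorithm (which also accounts for judge quality), and the (winner, loser) input format is hypothetical.

from collections import defaultdict

def bradley_terry(items, comparisons, iterations=100):
    # comparisons is a list of (winner, loser) pairs
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    scores = {item: 1.0 for item in items}
    for _ in range(iterations):
        new_scores = {}
        for i in items:
            # standard minorization-maximization update for Bradley-Terry
            denom = sum(
                pair_counts[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items if j != i
            )
            new_scores[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(new_scores.values())
        scores = {i: s / total for i, s in new_scores.items()}
    return sorted(items, key=scores.get, reverse=True)

Each judge answer of the form “this project was better than the last one I saw” becomes one (winner, loser) pair.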

History

The HackMIT team has used Gavel for four events now. We implemented the first prototype, a text-message-based system built on top of Twilio, at Blueprint in Spring 2015. In Fall 2015, we switched to a new algorithm and reimplemented the system as a web app for HackMIT. We continued to use that implementation for Blueprint in Spring 2016. For this year’s HackMIT, we completely redesigned the frontend and the admin panel of the application, changing basically everything except for the core algorithms. Now the system is a whole lot more user-friendly for both judges and event organizers.

Here’s what judging looked like at HackMIT this year:

Public Release

Using this system has made judging logistics super easy for us, and we believe that it has also led to higher-quality results. We thought that other events could benefit from using Gavel too, so we’re releasing the software to the public!

We’re releasing the latest version of the system, Gavel v1.0, which is what we used at HackMIT 2016. From this point on, Gavel will be developed as an open source project.

Gavel is free software released under AGPLv3, available for download here.

If you use Gavel for your event or have any feedback about anything, I’d love to hear about it.

Validity, Trust, and the Design of Interfaces

In secure communication schemes, there are three main goals: confidentiality, integrity, and authenticity. A lot of real-world software and systems don’t get integrity and authenticity quite right, often as a result of poor interface design.

Considering adversarial models, we see that there are very few situations where integrity by itself is useful. Standalone integrity checks would protect against network errors, for example, but they are not useful for much beyond that. In fact, when dealing with adversaries, integrity without authenticity is worthless. On the other hand, authenticity implies integrity, so that should be the gold standard for security.
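
To illustrate the difference, here’s a minimal sketch in Python using the standard library: a plain hash can be recomputed by anyone who modifies the message, so it detects only accidental corruption, while an HMAC can’t be forged without the secret key, so it provides authenticity (and therefore integrity). The key and message here are hypothetical.

import hashlib
import hmac

message = b"transfer $100 to alice"
key = b"pre-shared secret"  # hypothetical key shared by the two parties

# Integrity only: detects accidental corruption, but an adversary who
# tampers with the message can recompute this value themselves.
checksum = hashlib.sha256(message).hexdigest()

# Authenticity (which implies integrity): forging a valid tag requires
# knowing the secret key.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(msg, received_tag):
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    # constant-time comparison to avoid timing side channels
    return hmac.compare_digest(expected, received_tag)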

Along the same lines, in digital signature systems, validity without trust is worthless. Data being correctly signed by some public key doesn’t mean anything unless the key is trusted. This is especially relevant in web of trust systems such as GPG, where a shared keyring is used to store public keys. Any program can add keys to the keyring, and some programs enable automatic key retrieval, so the mere presence of a public key in the keyring is meaningless. Data signed by a key can only be trusted if the public key has been assigned an explicit trust value or if it can be trusted under some web of trust model.
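
As a sketch of what “trusted under some web of trust model” means, here’s a simplified version of GPG-style key validity in Python. The thresholds mirror GPG’s defaults (one fully trusted signer, or three marginally trusted ones); the real algorithm also bounds certification path length and requires signers to be valid keys themselves, and the data structures here are hypothetical.

FULLS_NEEDED = 1      # GPG default: one fully trusted signer suffices
MARGINALS_NEEDED = 3  # GPG default: or three marginally trusted signers

def key_is_valid(key, signers, ultimate, full, marginal):
    # signers maps a key to the set of keys that have signed it
    if key in ultimate:
        return True  # explicitly trusted by the user
    sigs = signers.get(key, set())
    fully_trusted = len(sigs & (ultimate | full))
    marginally_trusted = len(sigs & marginal)
    return (fully_trusted >= FULLS_NEEDED
            or marginally_trusted >= MARGINALS_NEEDED)

Note that the keyring itself plays no role in this computation: a key’s mere presence in the keyring counts for nothing.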

Interfaces and Security Implications

An interface is a boundary between components in a system, either between two pieces of software or between a program and a human. In security-sensitive applications, interface design is critical. Anything that relies on security-related software needs to know exactly what guarantees the software provides. With libraries, it’s necessary to know what is the client’s responsibility and what the library handles internally. When possible, software should do the most intuitive and secure thing by default, and the simplest way to use the software should also be the most secure way.

Bugs in the implementation of software, once identified, can be fixed without consequence. On the other hand, bad interface design is incredibly difficult to fix once there exists software that uses the API — changes could break compatibility with all existing software that uses the API.

Case Studies

To better understand the importance of good interface design in security-critical applications, we can critique existing systems, looking at both human-computer interface boundaries and software-software interface boundaries.

GPGMail for macOS

GPGTools distributes a suite of GPG-related software for macOS. Among these tools is GPGMail, an Apple Mail plugin that lets users send and receive signed and encrypted mail using PGP. When someone receives PGP-signed email, it looks like this:

GPGMail Header

The indication to the user is that everything is fine — there’s a nice big check mark visible. With the default configuration, the plugin automatically downloads the public key and verifies that the signature is valid. Perfect!

Except if we click on the check mark, we can see that everything is not ok:

GPGMail Key Information

In this screen, we can see that “this signature is not to be trusted”. The signature is valid, but it cannot be trusted! It’s a simple thing to check, but users have to remember to do it manually for every single email they receive if they want any security guarantees. Otherwise, someone could spoof an email, sign it with some other public key that has been uploaded to the key servers, and the mail client would faithfully download the untrusted key and report that the signature is valid. As the software is designed, the check mark basically means nothing in terms of security.

This is bad design! It’s not a bug in the cryptographic protocols or the implementation of the software, but it’s arguably at least as bad. The mail plugin behaves in an unintuitive manner, and the check mark gives users a false sense of security.

Better Designs

There are email clients that handle this better. For example, Evolution shows the following when viewing an email signed by an untrusted key:

Evolution showing a warning about an untrusted signature

There’s a clear warning in the user interface. When the email is signed by a trusted key, it looks very different:

Evolution showing an email signed with a trusted signature

CPAN Signature Checks

CPAN is a package manager for Perl. Like many other package managers, CPAN has support for digital signatures, which can be enabled using the check_sigs option:

CPAN packages can be digitally signed by authors and thus verified with the security provided by strong cryptography. The exact mechanism is defined in the Module::Signature module.

Unfortunately, due to the way Module::Signature’s interface is built, App::Cpan doesn’t really end up providing any strong cryptographic guarantees.

CPAN checks signatures like this:

my $rv = eval { Module::Signature::_verify($chk_file) };

if ($rv == Module::Signature::SIGNATURE_OK()) {
    $CPAN::Frontend->myprint("Signature for $chk_file ok\n");
    return $self->{SIG_STATUS} = "OK";
} else {
    # print error message and abort
}

Module::Signature, in turn, runs this:

my @cmd = (
    $gpg, qw(--verify --batch --no-tty), @quiet, ($KeyServer ? (
        "--keyserver=$keyserver",
        ($AutoKeyRetrieve and $version ge '1.0.7')
            ? '--keyserver-options=auto-key-retrieve'
            : ()
    ) : ()), $fh->filename
);

In the code, $AutoKeyRetrieve is enabled by default, and $KeyServer is set to pool.sks-keyservers.net, a public PGP key server. This means that when checking a signature, if the public key is not found in the local keyring, it will automatically be fetched from the key server.

To verify the signature, Module::Signature runs gpg in a subprocess and checks the return value, yielding SIGNATURE_OK if gpg returns 0 and yielding SIGNATURE_BAD otherwise.

Unfortunately, gpg’s return value doesn’t indicate whether the signature is trusted or not — it indicates whether the signature is valid or not. When using a shared keyring and especially when enabling automatic key retrieval, this guarantee doesn’t mean anything. It’s no better than a checksum — it offers no protection against an adversary.
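
One workaround is for the caller to parse gpg’s machine-readable status output instead of relying on the exit code alone. Here’s a sketch in Python (not how Module::Signature actually works) that accepts a signature only if gpg reports both a good signature and a trusted key; gpg prints status keywords such as GOODSIG and TRUST_FULLY or TRUST_ULTIMATE on the file descriptor passed to --status-fd.

import subprocess

def verify_trusted(signed_file):
    result = subprocess.run(
        ["gpg", "--verify", "--batch", "--status-fd", "1", signed_file],
        capture_output=True, text=True,
    )
    # status lines look like "[GNUPG:] GOODSIG <keyid> <uid>"
    keywords = [line.split()[1] for line in result.stdout.splitlines()
                if line.startswith("[GNUPG:]") and len(line.split()) > 1]
    good_signature = "GOODSIG" in keywords
    trusted_key = ("TRUST_FULLY" in keywords
                   or "TRUST_ULTIMATE" in keywords)
    return result.returncode == 0 and good_signature and trusted_key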

Because of this vulnerability, a man-in-the-middle attacker can run arbitrary code on machines running cpan install, even when CPAN has signature checks enabled.

Who is to Blame?

It’s unclear exactly which package is responsible for this security issue. Is it App::Cpan, Module::Signature, or gpg? App::Cpan uses a function called verify() without understanding exactly what verify means. Module::Signature uses the return value from gpg and doesn’t provide any distinction in signature status besides SIGNATURE_OK and SIGNATURE_BAD. gpg doesn’t expose a nice programmatic interface for simultaneously verifying a signature and validating the trustworthiness of the key that was used, instead only printing warning text to stderr when a key is untrusted.

At this point, it’s pretty hard to fix this issue. It’s hard for Module::Signature to change its API — it’s depended on by hundreds of modules, and an API change could break many of them.

Better Designs

There are libraries similar to Module::Signature that do a better job with the design of their API. For example, GNOME Camel, which is used by Evolution, has a function camel_cipher_context_verify_sync(), which returns one of the following results after verifying a signature:

typedef enum _camel_cipher_validity_sign_t {
    CAMEL_CIPHER_VALIDITY_SIGN_NONE,
    CAMEL_CIPHER_VALIDITY_SIGN_GOOD,
    CAMEL_CIPHER_VALIDITY_SIGN_BAD,
    CAMEL_CIPHER_VALIDITY_SIGN_UNKNOWN,
    CAMEL_CIPHER_VALIDITY_SIGN_NEED_PUBLIC_KEY
} CamelCipherValiditySign;

This return type is much richer than the binary return type of Module::Signature’s verify(), so it’s possible to disambiguate between good signatures made with a trusted key and valid signatures made with an untrusted key.
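
The same principle can be applied in any language: returning a rich status type instead of a boolean forces callers to confront the valid-but-untrusted case explicitly. A hypothetical sketch in Python:

from enum import Enum, auto

class SignStatus(Enum):
    NONE = auto()
    GOOD = auto()             # valid signature, trusted key
    BAD = auto()              # invalid signature
    UNKNOWN = auto()          # valid signature, untrusted key
    NEED_PUBLIC_KEY = auto()

def describe(status):
    if status is SignStatus.GOOD:
        return "signature verified"
    if status is SignStatus.UNKNOWN:
        return "valid signature, but the signing key is not trusted"
    return "verification failed"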

Conclusion

In both human-computer and software-software interfaces, interface design is critical to security. Bad designs can lead to serious security vulnerabilities, and these vulnerabilities can be incredibly difficult to fix. For this reason, interfaces of security-related components must be designed carefully, have secure defaults, and clearly communicate the responsibilities of the library and the responsibilities of the user.

FAQ Readers Redux

Another year, another HackMIT FAQ readers experiment. From May 31 to August 7, we had the following item in HackMIT’s FAQ:

FAQ Item

Like last year, the experiment wasn’t particularly scientific. We just wanted to give people the opportunity to email us random stuff.

Last year, we did a qualitative analysis of the emails we received, so for a change, this year, we took a very quantitative approach.

Basics

We had 227 unique individuals email us (compared to 493 last year). We responded to 220 of these people, giving us a response rate of 97% (up from 80% last year).

The distribution of response times was pretty good:

Response time distribution

A handful of team members were responsible for the majority of responses:

Team response volume

Some team members got pretty competitive trying to be the fastest to respond to emails:

Median response times

Minimum response times

Time

Even though the experiment began on May 31, we received a lot more email once registration opened on July 1, receiving the greatest number of emails on July 2:

Emails per day

People sent us email at all hours, but the morning seemed to be the least popular time to send messages:

Emails per time of day

Domains

Unsurprisingly, gmail.com and mit.edu were the most popular email domains among FAQ readers:

Domains

Text

Emoji

Emoji were quite popular in our emails. Among all the emails that were sent and received, here are the most used emoji:

Emoji use

Apparently, the HackMIT team really loves emoji, accounting for 67% of the total emoji use:

Team emoji use

Sentiment

Luckily, the majority of emails we received were positive (according to a sentiment analysis engine):

Email sentiment

Here’s the most negative email we received:

Negative email

And here’s the most positive:

Positive email

Conclusion

Okay, so most of this data analysis is pretty silly. It’s not meant to be taken too seriously! We had a great time going overboard and making pretty graphs.

In case anyone is curious about how we did the analysis, here’s a short summary. We archived all the emails that were sent to faq-readers@mit.edu, and after the conclusion of the experiment, we loaded all the data into a Jupyter notebook using Python’s mailbox library. We analyzed the data using NumPy, pandas, NLTK, TextBlob, talon, and emoji, and we made the graphs using matplotlib and Seaborn.
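
For the curious, here’s a minimal sketch of the loading step in Python (the archive filename is hypothetical, and the real analysis did more work, like pairing questions with our responses to compute response times):

import mailbox
from email.utils import parsedate_to_datetime

mbox = mailbox.mbox("faq-readers.mbox")  # hypothetical archive path

# collect a timestamp for each message with a parseable Date header
timestamps = []
for message in mbox:
    try:
        timestamps.append(parsedate_to_datetime(message["Date"]))
    except (TypeError, ValueError):
        continue  # skip messages with missing or malformed Date headers

# e.g., the emails-per-time-of-day chart is a histogram over hours
emails_per_hour = [0] * 24
for ts in timestamps:
    emails_per_hour[ts.hour] += 1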

Thanks to Claire, Kimberli, Stef, and the rest of the HackMIT team for feedback on this post!