Algorithms in the Real World: Committee Assignment

I recently had another chance to use a fancy algorithm to solve a real-world problem. These opportunities don’t come up all that often, but when they do, it’s pretty exciting!

Every year, MIT HKN has a bunch of eligible students who need to be matched to committees. As part of the assignment process, the officers decide how many spots are available on each committee, and then we have every eligible member rank the committees. In the past, officers matched people manually, looking at the data and trying to give each person one of their 1st or 2nd choices. Unfortunately, this is time-consuming and unlikely to result in an optimal assignment if we’re trying to maximize overall happiness.

Example

To see how assignments can be suboptimal, we can go through an example.

Committee   Capacity
Tutoring    2
Outreach    1
Social      2

Person     Tutoring   Outreach   Social
Alice      1          3          2
Bob        1          3          2
Charlie    1          2          3
Dave       2          1          3
Eve        2          1          3

In the above table, 1 means first choice, and 3 means third choice. We could imagine assigning people by going down the list and assigning each person to their highest-ranked committee that has available slots. This would result in assigning Alice to Tutoring (1st choice), Bob to Tutoring (1st choice), Charlie to Outreach (2nd choice), Dave to Social (3rd choice), and Eve to Social (3rd choice).

Not only is this algorithm unfair to the people at the bottom of the list, but it’s also suboptimal. If we want to minimize the sum of the rankings for committees we placed people on, we could go with Alice–Social, Bob–Social, Charlie–Tutoring, Dave–Outreach, Eve–Tutoring. This results in a “cost” of 8, which is optimal, rather than the cost of 10 we got with the first assignment, which was constructed greedily.
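For a problem this small, the optimal cost of 8 can be double-checked by brute force. Here's a quick sketch in Python, using the names and rankings from the example table:

```python
from itertools import permutations

# rankings[person][committee] from the example table (1 = first choice)
rankings = {
    "Alice":   {"Tutoring": 1, "Outreach": 3, "Social": 2},
    "Bob":     {"Tutoring": 1, "Outreach": 3, "Social": 2},
    "Charlie": {"Tutoring": 1, "Outreach": 2, "Social": 3},
    "Dave":    {"Tutoring": 2, "Outreach": 1, "Social": 3},
    "Eve":     {"Tutoring": 2, "Outreach": 1, "Social": 3},
}
people = list(rankings)

# one slot per open spot: Tutoring x2, Outreach x1, Social x2
slots = ["Tutoring", "Tutoring", "Outreach", "Social", "Social"]

# try every distinct way of handing the five slots to the five people
best = min(
    sum(rankings[p][c] for p, c in zip(people, assignment))
    for assignment in set(permutations(slots))
)
print(best)  # → 8
```

Of course, this enumeration blows up combinatorially, which is exactly why a real algorithm is needed for the full data set.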

In the actual data set, there were 8 committees and 57 eligible members, so it wouldn’t have been feasible to manually find an optimal assignment.

Problem Statement

More formally: we have n committees and m people. Each committee cᵢ has a capacity of sᵢ people, where we’re guaranteed that s₁ + s₂ + ⋯ + sₙ = m. We know people’s preferences, which for any given person p is a permutation of the committees, πₚ, mapping the highest-ranked committee to 1 and the lowest-ranked committee to n. Our goal is to find an assignment f of people to committees that solves the following optimization problem:

minimize    Σₚ πₚ(f(p))
subject to  Σₚ δ(f(p), cᵢ) = sᵢ  for all i

Above, δ is the Kronecker delta, equal to 1 when its two arguments are equal and 0 otherwise. Essentially, we want to find the assignment that minimizes cost while satisfying our constraints of having a specific number of people on each committee.

Algorithm

It turns out that the committee assignment problem can be transformed into an instance of the assignment problem (which can also be thought of as finding a minimum weight matching in a bipartite graph).

In the assignment problem, you have n people, n jobs, and a cost matrix C, where Cᵢⱼ is the cost of having person i do job j, and you want to find the minimum-cost assignment such that each person is assigned to a unique job.

The committee assignment problem can be trivially transformed into an instance of the assignment problem, simply by making sᵢ copies of each committee cᵢ, counting each copy as a separate “job”, and constructing an appropriate cost matrix from the πₚs.

Luckily, the assignment problem is well-studied in computer science, and there’s a known solution — the Hungarian algorithm solves this problem. There’s even an implementation of the algorithm built into SciPy. This makes solving the committee assignment problem really easy — it only requires a little bit of code to implement the transformation described above.
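As a sketch of that transformation in Python (assuming SciPy is available; scipy.optimize.linear_sum_assignment is SciPy's built-in solver for the assignment problem), the earlier example can be solved like this:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# rankings[p][c]: person p's ranking of committee c (1 = first choice)
rankings = np.array([
    [1, 3, 2],  # Alice:  Tutoring, Outreach, Social
    [1, 3, 2],  # Bob
    [1, 2, 3],  # Charlie
    [2, 1, 3],  # Dave
    [2, 1, 3],  # Eve
])
capacities = [2, 1, 2]  # Tutoring, Outreach, Social

# duplicate each committee's column once per open slot, so that each
# slot becomes a separate "job" in the assignment problem
cost = np.repeat(rankings, capacities, axis=1)
slot_to_committee = np.repeat(np.arange(len(capacities)), capacities)

rows, cols = linear_sum_assignment(cost)
assignment = slot_to_committee[cols]  # committee index for each person
total_cost = cost[rows, cols].sum()
print(total_cost)  # → 8
```

The variable names here are my own; the point is just that the transformation plus the library call is only a handful of lines.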

Results

Using this algorithmic approach to solve the committee assignment problem worked really well for us! We made some slight modifications to the process — one of the committees hand-picked their members, and then we used the algorithm on the remaining 52 members and 7 committees. When running the program, we decided to minimize the sum of the squares of the costs rather than minimizing just the sum, which penalizes giving any one person a low choice more heavily.

With 52 people and 7 committees, our implementation ran in less than half a second and gave us an assignment with 37 people getting their first choice, 13 getting their second choice, and 2 getting their third choice.

Gavel: An Expo Judging System

Gavel is an automated end-to-end expo judging system. We’ve used it to automate judging at HackMIT, a 1000-person event with over 200 projects and 100 judges. Dozens of other events have also used Gavel since the software was released in private beta in late 2015.

Gavel fully automates expo judging logistics and project ranking — the system tells judges which projects to look at, collects judges’ votes, and produces a ranking of projects.

Here’s a demo of the judge interface:

Model

Gavel’s model is pretty different from traditional scoring methods — it’s based on the method of pairwise comparisons, and the system uses fancy math to come up with a ranking. For reasons described in detail in this post, pairwise comparisons work much better than having judges input scores from 1 to 10 or something like that.
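To make the "fancy math" a little more concrete: the classic way to turn pairwise comparisons into a ranking is the Bradley–Terry model, which assigns each project a latent strength and fits the strengths to the observed votes. Gavel's actual model is more sophisticated, but here's a minimal sketch of the basic idea, using the standard minorization–maximization updates and a made-up win matrix:

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths with the standard MM updates.

    wins[i][j] = number of comparisons in which item i beat item j.
    Returns a strength vector; a higher strength means a better rank.
    """
    comparisons = wins + wins.T      # total matchups for each pair
    win_counts = wins.sum(axis=1)    # total wins for each item
    p = np.ones(wins.shape[0])
    for _ in range(iters):
        denom = comparisons / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0)
        p = win_counts / denom.sum(axis=1)
        p /= p.sum()                 # normalize for numerical stability
    return p

# toy data: project 0 wins most of its head-to-head comparisons
wins = np.array([
    [0, 4, 3],
    [1, 0, 3],
    [2, 1, 0],
])
strengths = bradley_terry(wins)
ranking = np.argsort(-strengths)  # best project first
```

This works with sparse, incomplete comparison data, which is exactly the situation at an expo where each judge only sees a small fraction of the projects.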

History

The HackMIT team has used Gavel for four events now. We implemented the first prototype, a text message based system built on top of Twilio, at Blueprint in Spring 2015. In Fall 2015, we switched to a new algorithm and reimplemented the system as a web app for HackMIT. We continued to use that implementation for Blueprint in Spring 2016. For this year’s HackMIT, we completely redesigned the frontend and the admin panel of the application, changing basically everything except for the core algorithms. Now the system is a whole lot more user-friendly for both judges and event organizers.

Here’s what judging looked like at HackMIT this year:

Public Release

Using this system has made judging logistics super easy for us, and we believe that it has also led to higher-quality results. We thought that other events could benefit from using Gavel too, so we’re releasing the software to the public!

We’re releasing the latest version of the system, Gavel v1.0, which is what we used at HackMIT 2016. From this point on, Gavel will be developed as an open source project.

Gavel is free software released under AGPLv3, available for download here.

If you use Gavel for your event or have any feedback about anything, I’d love to hear about it.

Validity, Trust, and the Design of Interfaces

In secure communication schemes, there are three main goals — confidentiality, integrity, and authenticity. There is a lot of real-world software and systems out there that don’t get integrity and authenticity quite right, often as a result of poor interface design.

Considering adversarial models, we see that there are very few situations where integrity by itself is useful. Standalone integrity checks would protect against network errors, for example, but they are not useful for much beyond that. In fact, when dealing with adversaries, integrity without authenticity is worthless. On the other hand, authenticity implies integrity, so that should be the gold standard for security.

Along the same lines, in digital signature systems, validity without trust is worthless. Data being correctly signed by some public key doesn’t mean anything unless the key is trusted. This is especially relevant in web of trust systems such as GPG, where a shared keyring is used to store public keys. Any program can add keys to the keyring, and some programs enable automatic key retrieval, so the mere presence of a public key in the keyring is meaningless. Data signed by a key can only be trusted if the public key has been assigned an explicit trust value or if it can be trusted under some web of trust model.

Interfaces and Security Implications

An interface is a boundary between components in a system, either between two pieces of software or between a program and a human. In security-sensitive applications, interface design is critical. Anything that relies on security-related software needs to know exactly what guarantees that software provides. With libraries, it’s necessary to know what is the client’s responsibility and what the library handles internally. When possible, software should do the most intuitive and secure thing by default, and the simplest way to use the software should also be the most secure way.

Bugs in the implementation of software, once identified, can be fixed without consequence. On the other hand, bad interface design is incredibly difficult to fix once there exists software that uses the API — changes could break compatibility with all existing software that uses the API.

Case Studies

To better understand the importance of good interface design in security-critical applications, we can critique existing systems, looking at both human-computer interface boundaries and software-software interface boundaries.

GPGMail for macOS

GPGTools distributes a suite of GPG-related software for macOS. Among these tools is GPGMail, an Apple Mail plugin that lets users send and receive signed and encrypted mail using PGP. When someone receives PGP-signed email, it looks like this:

GPGMail Header

The indication to the user is that everything is fine — there’s a nice big check mark visible. With the default configuration, the plugin automatically downloads the public key and verifies that the signature is valid. Perfect!

Except if we click on the check mark, we can see that everything is not ok:

GPGMail Key Information

In this screen, we can see that “this signature is not to be trusted”. The signature is valid, but it cannot be trusted! It’s a simple thing to check, but the user has to remember to manually check this for every single email received if they want security guarantees. Otherwise, someone could spoof an email and sign it with some other public key that has been uploaded to the key servers, and the mail client would faithfully automatically download the untrusted key from the public key servers and verify that the signature is valid. As the software is designed, the check mark basically means nothing in terms of security.

This is bad design! It’s not a bug in the cryptographic protocols or the implementation of the software, but it’s arguably at least as bad. The mail plugin behaves in an unintuitive manner, and the check mark gives users a false sense of security.

Better Designs

There are email clients that handle this better. For example, Evolution shows the following when viewing an email signed by an untrusted key:

Evolution showing a warning about an untrusted signature

There’s a clear warning in the user interface. When the email is signed by a trusted key, it looks very different:

Evolution showing an email signed with a trusted signature

CPAN Signature Checks

CPAN is a package manager for Perl. Like many other package managers, CPAN has support for digital signatures, which can be enabled using the check_sigs option:

CPAN packages can be digitally signed by authors and thus verified with the security provided by strong cryptography. The exact mechanism is defined in the Module::Signature module.
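For reference, signature checking is enabled from the CPAN shell with something like the following (the option name comes from the quote above; the exact incantation may vary by CPAN version):

```
cpan> o conf check_sigs 1
cpan> o conf commit
```
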

Unfortunately, due to the way Module::Signature’s interface is built, App::Cpan doesn’t really end up providing any strong cryptographic guarantees.

CPAN checks signatures like this:

my $rv = eval { Module::Signature::_verify($chk_file) };

if ($rv == Module::Signature::SIGNATURE_OK()) {
    $CPAN::Frontend->myprint("Signature for $chk_file ok\n");
    return $self->{SIG_STATUS} = "OK";
} else {
    # print error message and abort
}

Module::Signature, in turn, runs this:

my @cmd = (
    $gpg, qw(--verify --batch --no-tty), @quiet, ($KeyServer ? (
        "--keyserver=$KeyServer",
        ($AutoKeyRetrieve and $version ge '1.0.7')
            ? '--keyserver-options=auto-key-retrieve'
            : ()
    ) : ()), $fh->filename
);

In the code, $AutoKeyRetrieve is enabled by default, and $KeyServer is set to pool.sks-keyservers.net, a public PGP key server. This means that when checking the signature, if the public key is not found in the local keyring, it will automatically be fetched from the key server.

To verify the signature, Module::Signature runs gpg in a subprocess and checks the return value, yielding SIGNATURE_OK if gpg returns 0 and yielding SIGNATURE_BAD otherwise.

Unfortunately, gpg’s return value doesn’t indicate whether the signature is trusted or not — it indicates whether the signature is valid or not. When using a shared keyring and especially when enabling automatic key retrieval, this guarantee doesn’t mean anything. It’s no better than a checksum — it offers no protection against an adversary.
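A caller that wants both guarantees can ask gpg for machine-readable status output via --status-fd instead of relying only on the exit code. Here's a sketch of the idea in Python; the status keywords come from gpg's documented status-fd protocol, but the helper functions themselves are hypothetical:

```python
import subprocess

# gpg status keywords indicating the signing key is actually trusted
TRUSTED = {"TRUST_FULLY", "TRUST_ULTIMATE"}

def classify(status_output):
    """Classify a gpg --status-fd transcript: is the signature valid,
    and is the signing key trusted?"""
    valid = False
    trusted = False
    for line in status_output.splitlines():
        if not line.startswith("[GNUPG:] "):
            continue
        keyword = line.split()[1]
        if keyword == "GOODSIG":
            valid = True          # signature is cryptographically valid
        elif keyword in TRUSTED:
            trusted = True        # and the key is trusted
    return valid, trusted

def verify(path):
    # hypothetical wrapper: route status lines to stdout with --status-fd 1
    proc = subprocess.run(
        ["gpg", "--verify", "--batch", "--status-fd", "1", path],
        capture_output=True, text=True)
    return classify(proc.stdout)
```

With an interface like this, a valid-but-untrusted signature (GOODSIG plus, say, TRUST_UNDEFINED) is distinguishable from a valid and trusted one, which is exactly the distinction Module::Signature's binary return value throws away.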

Because of this vulnerability, a man-in-the-middle attacker can run arbitrary code on machines running cpan install, even when CPAN has signature checks enabled.

Who is to Blame?

It’s unclear exactly which package is responsible for this security issue. Is it App::Cpan, Module::Signature, or gpg? App::Cpan uses a function called verify() without understanding exactly what verify means. Module::Signature uses the return value from gpg and doesn’t provide any distinction in signature status besides SIGNATURE_OK and SIGNATURE_BAD. gpg doesn’t expose a nice programmatic interface for simultaneously verifying a signature and validating the trustworthiness of the key that was used, instead only printing warning text to stderr when a key is untrusted.

At this point, it’s pretty hard to fix this issue. It’s hard for Module::Signature to change its API — it’s depended on by hundreds of modules, and an API change could break many of them.

Better Designs

There are libraries similar to Module::Signature that do a better job with the design of their API. For example, GNOME Camel, which is used by Evolution, has a function camel_cipher_context_verify_sync(), which returns one of the following results after verifying a signature:

typedef enum _camel_cipher_validity_sign_t {
    CAMEL_CIPHER_VALIDITY_SIGN_NONE,
    CAMEL_CIPHER_VALIDITY_SIGN_GOOD,
    CAMEL_CIPHER_VALIDITY_SIGN_BAD,
    CAMEL_CIPHER_VALIDITY_SIGN_UNKNOWN,
    CAMEL_CIPHER_VALIDITY_SIGN_NEED_PUBLIC_KEY
} CamelCipherValiditySign;

This return type is much richer than the binary return type of Module::Signature’s verify(), so it’s possible to disambiguate between good signatures made with a trusted key and valid signatures made with an untrusted key.

Conclusion

In both human-computer and software-software interfaces, interface design is critical to security. Bad designs can lead to serious security vulnerabilities, and these vulnerabilities can be incredibly difficult to fix. For this reason, interfaces of security-related components must be designed carefully, have secure defaults, and clearly communicate the responsibilities of the library and the responsibilities of the user.