I get excited whenever I get to use interesting algorithms to solve real world problems.
Every year at HackMIT, we provide hosting for hackers who travel to come to the event. We get a bunch of MIT students to sign up as hosts in exchange for receiving free air mattresses from us, and things work out pretty well.
To make the whole process go smoothly, we need to match hackers with hosts. This is actually somewhat complicated.
Hackers may need hosting on Friday night, Saturday night, or both; hosts may be able to host different numbers of people on Friday and Saturday. Hackers may be allergic to pets; hosts may live in cat-friendly dorms. Hackers may not be okay with smoking; hosts may live in dorms where smoking is allowed. Hackers or hosts may or may not care about being paired with someone of the same gender.
We send out a hosting survey to all hackers and hosts to make sure that everyone is comfortable with their living arrangements.
In the past, we’ve manually matched people based on compatibility, but that gets pretty painful once there are a couple hundred hackers who need to be matched.
Instead, we can phrase the problem as a bipartite matching problem: we have two sets of people, hackers and hosts, and based on the results of the survey, we can figure out who is compatible with whom. If hosts can take multiple hackers, we can deal with that via node duplication.
A compatibility graph might look something like this:
A simple algorithm to find matches could loop through the hackers and match them with the first compatible host who is available. It’s a greedy algorithm, and it’s fast and easy to code. If we run this algorithm on our sample data, we get the following matching:
This is a maximal matching — we cannot add any more matches to this graph. Unfortunately, it is not a maximum matching — if we had done a better job, we could have matched a greater number of people.
We have to use a more sophisticated algorithm to get a better result. We can treat this as a max flow problem and use an algorithm like Edmonds-Karp or Hopcroft-Karp to compute a maximum bipartite matching:
This is much better. Using this algorithm guarantees that we match as many people as possible!
With real world data, we can see that using a good algorithm actually makes a difference. For HackMIT 2015, we had 359 hackers and 234 unique hosts. Some hosts were willing to host for multiple days; taking that into account, we effectively had 367 hosts. There wasn’t much room for error.
The greedy algorithm manages to match 95% of the hackers with a maximal matching, which is actually not that bad.
The optimal algorithm matches 100%.