I recently had the pleasure of working on some exciting social features for Lumosity’s Family Plans. What better way to encourage people to train than to leverage the motivating community of their family and friends? One exciting component of the “Social Family Plan” feature is the weekly challenge. Members work together to collectively accomplish a single goal; this is something like “everyone completes three workouts” or “each person completes a workout in their least-proficient training category”.
Let’s take a look at a small, hypothetical feature which lends itself to interesting implementations, adapted from an old interview problem we used to assign to potential developers.
The Problem
Given a set of members on a family plan and all of their workouts from the past week (including the score and categories trained), let’s find the person least proficient in a given set of categories.
To be more precise, given the following input of a set of workouts and categories:
> finder workouts.csv attention speed
Where workouts.csv looks like:
Groucho | 15800 | attention |
Groucho | 23580 | speed, attention |
Harpo | 8200 | memory |
Harpo | 17750 | attention, speed, flexibility |
Which returns:
> Harpo, 17750
Approach
One strategy we can use here is to use a greedy algorithm to quickly find each family member’s lowest total score in a given set of categories. You can find a Ruby implementation of this strategy in the related Github repository, with the covered examples.
If we take a closer look at this problem, we can identify it as the weighted set cover problem:
Given a set U of n elements, a collection S1, S2, … , Sm of subsets of U, with weights wi, find a collection C of these sets Si whose union is equal to U and such that the sum of its weights is minimized.
We can use a solution to the weighted set cover problem to determine the lowest-scoring set of workouts that contain all of the given categories per person. Once we have each person’s lowest score, all we need to do is choose the minimum.
Let’s use a greedy algorithm for finding one member’s lowest-scoring set:
While there are still requested categories:
workout = get the lowest scoring workout
add workout to solution set
remove categories from requested categories
Return the sum of the solution set's workout scores
To get the lowest scoring workout, we do the following (this is the greedy part):
For each workout:
num_shared = # of categories shared between current workout's categories
and requested categories (set intersection)
score_weight = workout's score / num_shared
if score_weight is the lowest workout score so far
set the workout as the best choice
Return the best choice
Finally, pick the person with the lowest score.
Drawbacks
Why wouldn’t you want to do this?
One significant reason is that correctness is not guaranteed. Remember how I kept stressing that this implementation uses a greedy algorithm? There are some problems for which a greedy algorithm will not always yield the optimal result. This problem is one of them.
For example, given a user with the following workouts:
Zeppo | 2100 | memory, problem_solving |
Zeppo | 2000 | memory, flexibility |
Zeppo | 700 | problem_solving |
Zeppo | 500 | flexibility |
If you request:
> finder drawbacks.csv memory flexibility problem_solving
The result is:
> Zeppo, 3200
The optimal solution is:
> Zeppo, 2600
So what’s happening? The greedy algorithm’s nature is to choose what’s best at that moment. Let’s walk through:
- The first choice is 500: flexibility: it offers the minimum ratio of price / categories included.
- Of the remaining options, it chooses 700: problem_solving, which again offers the best ratio.
- Finally, it selects 2000: memory, flexibility as it covers memory for the least ratio.
The problem comes after step one; it should select 2100: memory, problem_solving, but doesn’t because the strategy is to “choose the most fulfilling option first”.
Benefits
If it’s not always right, why would this solution be even remotely reasonable?
It has a reasonable time complexity. If you do it right, the greedy algorithm has a time complexity of O(n log m) where n is the number of requested categories and m is the number of workouts. Other solutions can take a long time. One “succinct” solution involves computing every combination of workout sets, finding ones that have all the requested categories, and then choosing the one with the lowest score. The time complexity of this towers over that of the greedy algorithm.
You might say to yourself, “Well right is right! Who cares what the running time will be?” What about if member had, say, thousands (maybe even millions) of gameplays? And we have to calculate this information every time the home page is loaded. Still no big deal? Yeah, exactly. Sometimes imprecision is a reasonable tradeoff in the name of speed.
In Conclusion
Greedy algorithms can be a great solution to your problem, but remember to be careful and thoughtful about their implications. There are a myriad of intelligent solutions to this problem, with varying efficiencies. Have a good one? Share it with us!
Again, a Ruby implementation can be found in the related repository. Feel free to make a pull request with your own solution if you feel so inclined.