Stochastic Friendship Maximization Algorithm
Problem Statement
Given N potential friends, determine optimal pairings.
Overview
The algorithm has two stages: (a) obtain pairwise numerical friendship scores, and (b) identify the set of pairings that maximizes total friendship.
Stage 1: Obtaining Rankings
The procedure for obtaining rankings is:
– Gather N potential friends and the same number of conversation prompts.
– Pair participants and have them talk casually for roughly five minutes on one of the prompts. When time is up, have each person secretly rate the perceived friendship potential of the pairing on a 0-10 scale.
– Repeat using a round-robin schedule so that every possible pairing is measured.
The desired outcome of this stage is an N x N matrix of perceived friendship potential. The diagonal should be set to zero (self-friendship is lonely, and so assumed to be suboptimal). The matrix should be complete but generally asymmetric, since each pairing was rated by both members and their two scores need not agree.
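For concreteness, here is a minimal sketch of how the matrix might be assembled, assuming the ratings are collected as (rater, ratee, score) records; the record format and function name are illustrative and not part of the original procedure.

    import numpy as np

    # Assemble the N x N perceived-friendship matrix from (rater, ratee, score)
    # records gathered during the round-robin conversations.
    def build_rating_matrix(n, records):
        ratings = np.zeros((n, n))
        for rater, ratee, score in records:
            ratings[rater, ratee] = float(score)
        np.fill_diagonal(ratings, 0.0)  # self-friendship assumed suboptimal
        return ratings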
Discussion of Stage 1: Obtaining Rankings
Ideally the prompts should have the potential either to be divisive or to spark connection – small talk is to be avoided. A separate writeup on topic selection is in progress.
It is worth noting that some data validation and cleanup may be required. Initial empirical trials produced missing values, numbers outside the scoring range (e.g. “one million”), and non-numerical results (e.g. a picture of a cat wearing a party hat).
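One possible cleanup pass is sketched below; the policy of falling back to a neutral midpoint score of 5 is an assumption for illustration, not the original procedure.

    import math

    # Hypothetical cleanup of a single raw score: coerce to float, clamp to the
    # 0-10 scale, and fall back to a neutral midpoint for anything unusable
    # (missing values, non-numeric answers, cat pictures).
    def clean_score(raw, default=5.0):
        try:
            value = float(raw)
        except (TypeError, ValueError):
            return default                      # non-numeric result
        if math.isnan(value):
            return default                      # missing value
        return min(max(value, 0.0), 10.0)       # clamp out-of-range numbers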
Stage 2: Friendship Maximization
First, the full asymmetric friendship matrix is transformed into a symmetric matrix by taking the element-wise geometric mean of the matrix and its transpose.
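In code this is an element-wise square root of the product of the matrix and its transpose; a minimal sketch, with the function name assumed for illustration:

    import numpy as np

    # Element-wise geometric mean of the asymmetric rating matrix and its
    # transpose: sym[i, j] = sqrt(ratings[i, j] * ratings[j, i]).
    def symmetrize(ratings):
        sym = np.sqrt(ratings * ratings.T)   # element-wise product, not matmul
        np.fill_diagonal(sym, 0.0)
        return sym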
Next, a stochastic optimization is performed: random pairings are generated, the friendship score of each is calculated, and this is repeated for a fixed number of iterations. The objective is the mean of the entries of the symmetric matrix corresponding to the proposed pairs. The pairing with the highest score is declared the friendship-maximizing pairing.
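The sketch below shows one way the search could look, assuming an even number of participants and drawing random pairings by shuffling the indices and pairing adjacent entries; names and defaults are illustrative.

    import numpy as np

    # Fully stochastic search: draw random pairings, score each as the mean of
    # the symmetric friendship scores of its pairs, and keep the best one seen.
    # Assumes an even number of participants.
    def maximize_friendship(sym, iterations=100_000, seed=None):
        rng = np.random.default_rng(seed)
        n = sym.shape[0]
        best_pairs, best_score = None, -np.inf
        for _ in range(iterations):
            order = rng.permutation(n)
            pairs = [(order[i], order[i + 1]) for i in range(0, n, 2)]
            score = np.mean([sym[a, b] for a, b in pairs])
            if score > best_score:
                best_pairs, best_score = pairs, score
        return best_pairs, best_score

Chaining the two stages is then a one-liner: maximize_friendship(symmetrize(ratings)).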
Discussion of Stage 2: Friendship Maximization
The first version of this algorithm used the arithmetic mean rather than the geometric mean to calculate symmetric friendship scores from the asymmetric matrix. The geometric mean is recommended because highly dissimilar rankings are unlikely to produce a good friendship. Specifically, a (5,5) friendship ranking and a (9,1) friendship ranking have the same arithmetic mean (5), but the (5,5) pairing is much more likely to produce a mutually satisfactory friendship and has a higher geometric mean (5 versus 3).
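A quick numerical check of that example (arithmetic mean on the left, geometric mean on the right):

    import numpy as np

    # (5,5) vs (9,1): same arithmetic mean, very different geometric mean.
    print(np.mean([5, 5]), np.sqrt(5 * 5))   # 5.0 5.0
    print(np.mean([9, 1]), np.sqrt(9 * 1))   # 5.0 3.0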
Multiple cost functions were examined for the optimization stage, and most produced unsatisfactory results. For example, using the median as the objective led to unacceptably high variance in individual pairwise friendship scores – simply put, the median assigns no cost to making a few really bad pairings.
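To illustrate with made-up numbers, a proposed set of pairings containing one terrible match can still score perfectly under the median, while the mean penalizes it:

    import numpy as np

    # Illustrative pairwise scores for one candidate set of pairings.
    scores = [9, 9, 9, 1]
    print(np.median(scores), np.mean(scores))   # 9.0 7.0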
Finally, the choice of a fully stochastic search was made for three reasons:
Serendipity: Finding the global maximum would deny the participants the feeling of luck, which is an important part of the human condition.
Efficiency: Data entry from paper forms took two orders of magnitude longer than running 100,000 iterations, so optimizing the code does not make sense.
Simplicity: I had already had a few beers before coding this up, and so keeping it straightforward was important.
Future Work
There are three improvements I would like to see:
1. A Bayesian extension, where social web mining can be used to generate prior distributions.
2. Using simulated annealing rather than a full stochastic search.
3. Additional empirical testing to improve the conversation prompts.