Sep 18, 2017

Representing investor preferences as a vector

At 4Degrees we're focused on forming stronger and more authentic relationships between professionals. My co-founder and I both come from venture capital, so we've decided to focus our early efforts toward that mission in the VC and entrepreneur space.

One of the questions we've struggled with early on is how to intelligently match investors to startups. That type of matching has a lot of potential value, both to flag interesting relationships that already exist and to tee up introductions that could be mutually beneficial.

The first problem we ran into is that there's no good public, structured dataset with this information. This isn't a deal breaker for us: we know a few sources of this information that aren't public and/or aren't structured, and we have the capability to wrangle them into the shape we need.

Probably more important, the traditional way of thinking about investor interests makes the obvious solutions in this space frustrating.

To illustrate, I'll start with the "good" case. Adam is a partner at Pritzker Group Venture Capital focused on healthcare technology investment. Silvervue is a startup providing solutions to hospitals. Adam is interested in Silvervue. All is good.

Now the "bad". Chris is another partner at PGVC focused on B2B investment. Outbound Engine is a marketing automation platform focused on the SMB (small and medium-sized business) market. Using the simple logic above (which predominates in even the sophisticated approaches today), Chris should be interested in Outbound Engine. But he's not. While Chris does make some investments in SMB-targeted businesses, his true focus is on enterprise-targeted businesses. Even if a dataset does differentiate between SMB and enterprise investment (most don't), Chris does technically invest in SMB businesses. He just needs to see a more robust set of validating factors to lean in.

That brings us to the issue with current approaches: they assume investment interest is binary. As Chris' example shows, it's not.

In talking with Ben Blaiszik last week, we came across an interesting alternative. What if we treated investment interest as a continuous range, varying by sector? Perhaps instead of just being interested in SMB-targeted startups, Chris has a 0.3 interest value (in comparison to his 0.8 interest in enterprise-targeted companies).
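To make the contrast concrete, here is a minimal sketch of the two representations for Chris. The 0.3 and 0.8 values are the ones from the example above; the sector keys and function names are hypothetical, chosen only for illustration.

```python
# Binary view: Chris either invests in a sector or he doesn't.
binary_interest = {"smb": True, "enterprise": True}

# Graded view: each sector carries an interest value between 0 and 1.
# The 0.3 and 0.8 values come from the example in the text.
graded_interest = {"smb": 0.3, "enterprise": 0.8}


def is_match_binary(interests, sector):
    """Return True if the investor is tagged with the sector at all."""
    return interests.get(sector, False)


def interest_level(interests, sector):
    """Return the graded interest for a sector, or 0.0 if it is absent."""
    return interests.get(sector, 0.0)


# The binary view reports a flat "yes" for an SMB startup,
# while the graded view shows that the interest is weak.
print(is_match_binary(binary_interest, "smb"))
print(interest_level(graded_interest, "smb"))
```

The binary lookup can't tell Outbound Engine apart from an enterprise-focused company; the graded one can.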

Transforming the data in this way allows for much more intelligent and accurate matching of investor interests to startup focuses.

But why stop there? On the flip side, a startup's sector(s) could also be vectorized for more accurate representation.
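With both sides expressed as vectors over the same sectors, a match score falls out of a standard similarity measure. The sketch below uses cosine similarity, which is one reasonable choice rather than something we've settled on. The sector list and the startup weights are made up for illustration; only Chris' 0.3 and 0.8 come from the example above.

```python
import math

# Hypothetical, fixed ordering of sectors shared by investors and startups.
SECTORS = ["smb", "enterprise", "healthcare"]


def to_vector(weights):
    """Turn a {sector: weight} mapping into a list ordered like SECTORS."""
    return [weights.get(sector, 0.0) for sector in SECTORS]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


chris = to_vector({"smb": 0.3, "enterprise": 0.8})

# Illustrative startup vectors, not real data.
smb_startup = to_vector({"smb": 1.0})
enterprise_startup = to_vector({"smb": 0.1, "enterprise": 0.9})

smb_score = cosine_similarity(chris, smb_startup)
enterprise_score = cosine_similarity(chris, enterprise_startup)

# The enterprise-focused startup scores higher for Chris than the SMB one.
print(round(smb_score, 2), round(enterprise_score, 2))
```

Cosine similarity ignores overall magnitude, so an investor with uniformly lukewarm interests looks the same as one with uniformly strong interests. A plain dot product would keep that distinction, if it turns out to matter.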

The implications beyond this single matching problem are very interesting as well. This type of data structure allows us to make more powerful connections between investors and between entrepreneurs. And the structure could perhaps be extended to a multitude of other attributes: personal interests, industry expertise, skillsets. The list goes on and on.

At first blush, this type of structure presents a data collection challenge. Humans aren't really conditioned to apply gradations to their categories like this. But that doesn't trip us up for long: the far more interesting application is categorization at scale. And when you think about the use of probabilistic classifiers for automated categorization, this data structure is actually particularly well-suited. Rather than setting a binary threshold and converting a probability estimate to a 0 or 1, why not just store that probability directly as an element of the vector?
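As a rough sketch of that last point, suppose a probabilistic classifier has already produced a per-sector probability for an investor. The numbers, the sector names, and the 0.5 cutoff below are all hypothetical; the point is only the difference between thresholding the probabilities and keeping them.

```python
# Hypothetical classifier output: estimated probability that an investor
# is interested in each sector. These numbers are made up.
probabilities = {"smb": 0.35, "enterprise": 0.9, "healthcare": 0.05}

SECTOR_ORDER = ["smb", "enterprise", "healthcare"]


def threshold(probs, cutoff=0.5):
    """Conventional approach: collapse each probability to 0 or 1."""
    return {sector: 1 if p >= cutoff else 0 for sector, p in probs.items()}


def as_vector(probs, sectors):
    """Alternative: keep each probability as an element of the vector."""
    return [probs.get(sector, 0.0) for sector in sectors]


binary_labels = threshold(probabilities)
interest_vector = as_vector(probabilities, SECTOR_ORDER)

# Thresholding discards the weak-but-real SMB signal; the vector keeps it.
print(binary_labels)
print(interest_vector)
```

How well this works depends on how well calibrated the classifier's probabilities are, which is a separate question from the data structure itself.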