Datasets for network analysis


A huge area of interest in statistics and machine learning is graph recovery, for both directed and undirected graphs. The applications of network recovery tools are practically limitless, ranging from social media to genomics and sports analytics. I have read about many different methods for recovering causal relationships among a group of variables, and the complexity of the underlying algorithms ranges from simple correlation measures to more sophisticated approaches such as random forests or variational Bayes.
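At the simple end of that spectrum, a correlation-based approach fits in a few lines. The snippet below is a minimal, hypothetical illustration. The helper names `pearson` and `correlation_network` and the 0.5 cutoff are my own choices and do not come from any particular package. It links two variables whenever the absolute Pearson correlation between them reaches a threshold. Because correlation is symmetric, this recovers only an undirected graph and says nothing about causal direction.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(data, threshold=0.5):
    """Hypothetical helper: undirected graph from thresholded |correlation|.

    data: dict mapping variable name -> list of observations.
    Returns a set of frozenset({u, v}) edges with |r| >= threshold.
    """
    edges = set()
    for u, v in combinations(sorted(data), 2):
        if abs(pearson(data[u], data[v])) >= threshold:
            edges.add(frozenset((u, v)))
    return edges
```

For instance, with `A = [1, 2, 3, 4, 5]`, `B = [2, 4, 6, 8, 10]` and `C = [1, -1, 0, -1, 1]`, only the edge between A and B survives, since C has zero correlation with both.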

In my opinion, one of the major shortcomings in the field of network recovery is the lack of methods capable of inferring causal networks from high-dimensional data. With the rise of Big Data, I believe this gap in available software will become more and more glaring.

Researchers developing new methods for network recovery usually follow the same pipeline: they first assess a method on synthetic data and then on real-life data. For this reason, I am posting here a few publicly available databases that offer a wide breadth of interesting datasets to analyze.
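The synthetic half of that pipeline can be sketched as follows: plant a known graph, simulate data from it, run the recovery method, and compare its output to the planted edges. This is only a sketch under stated assumptions. The linear Gaussian structural equation model used as the data generator, the precision/recall scoring, and the function names `simulate_linear_sem` and `edge_scores` are all illustrative choices of mine, not a standard benchmark.

```python
import random


def simulate_linear_sem(edges, n_vars, n_samples, weight=0.8, noise=1.0, seed=0):
    """Hypothetical generator: samples from a linear Gaussian SEM on variables 0..n_vars-1.

    edges: list of (parent, child) pairs. Assumes parent < child, so that
    generating variables in index order respects the causal order.
    """
    if any(p >= c for p, c in edges):
        raise ValueError("expected parent < child for every edge")
    rng = random.Random(seed)  # seeded for reproducible benchmarks
    parents = {j: [p for p, c in edges if c == j] for j in range(n_vars)}
    samples = []
    for _ in range(n_samples):
        x = [0.0] * n_vars
        for j in range(n_vars):
            # Each variable is a weighted sum of its parents plus Gaussian noise.
            x[j] = sum(weight * x[p] for p in parents[j]) + rng.gauss(0.0, noise)
        samples.append(x)
    return samples


def edge_scores(true_edges, predicted_edges):
    """Precision, recall and F1 of a predicted edge set against the planted graph."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    tp = len(true_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For a planted chain 0 -> 1 -> 2 and a method that predicts the edges (0, 1) and (0, 2), `edge_scores` returns (0.5, 0.5, 0.5). For methods that output undirected graphs, store edges as `frozenset` pairs so that (u, v) and (v, u) compare equal.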

Stanford Network Analysis Platform (SNAP)


NYC open data

UN data

