Unsupervised Learning Algorithms for Graph Data Open Access
Downloadable ContentDownload PDF
This thesis presents new techniques for unsupervised learning from graph data. Data from many applications, including social networks, transportation systems, images, and climate variables measured over multiple latitudes and longitudes, can be represented as graphs. Graphs provide a convenient representation of relationships between variables. Unsupervised learning is a critical first step in making sense of unlabeled data. The recent surge in the deployment of sensors and data acquisition systems have led to an explosion in the amount of data available, a significant portion of which is unlabeled. Many existing approaches for unsupervised machine learning on graphs are computationally expensive and do not scale well to large amounts of data. The work presented here addresses this issue by proposing theoretically justifiable algorithms that provide a good tradeoff between computation time and accuracy. This thesis focuses on two unsupervised learning tasks: spectral clustering and nearest neighbor search using graph kernels.I present two novel sampling-based approaches to spectral clustering that scale to a significantly larger number of input points compared to the current state of the art. I also present theoretical results showing that these sampling-based approaches yield a good approximation with respect to a popular spectral clustering objective. In the case of nearest neighbor search, graph kernels are used to propose a hierarchical scheme that allows for efficient queries when the input data has a large number of points in a high dimensional space.