Language models are either trained only on repository data or post-trained using the repository after having been trained on a huge dataset such as Wikipedia. Either way, since the distribution of the repository data (usually a domain-specific corpus) and the real-world distribution of concepts (such...
A large portion of the data available on the web resides in the so-called "Deep Web". The deep web consists of private or hidden databases that lie behind form-like query interfaces, which allow users to browse these databases in a controlled manner. While hidden database interfaces are normally...
In this dissertation, I explore an expanding area of interest in the field of business analytics: the extraction of meaningful association structures and their connection to applicable business theory. In particular, I develop a series of models based on dynamic Bayesian methods to address some...
We exist in a world of "big data." With every web page browsed, e-mail sent, status shared, photo uploaded, tweet tweeted, or search query issued, we leave digital imprints behind that exponentially increase the massive amount of data floating in the ether, specifically in cables and airwaves....
This thesis presents new techniques for unsupervised learning from graph data. Data from many applications, including social networks, transportation systems, images, and climate variables measured over multiple latitudes and longitudes, can be represented as graphs. Graphs provide a convenient...