Bari,Nima-PhDDissertation Open Access
As we march through this millennium, clustering big data has become a central concern for data and computer scientists. We live in an age of unprecedented growth in available large-scale data, both structured and unstructured. Data is being collected at exceptional rates from a broad range of websites and applications. For instance, as Berkovich et al. describe, "Facebook reports about 6 billion new photos are uploaded each month and 72 hours of video is uploaded every minute." Researchers and developers face large amounts of data that need to be processed, analyzed, and clustered. Analysis of big data drives nearly every aspect of daily life, including but not limited to retail services, mobile services, financial services, manufacturing, and life sciences. This dissertation addresses big data clustering with efficient time complexity. Clustering based on the 23-bit Golay code is among the most efficient approaches, running in linear time. This dissertation introduces a novel big data computational model. The model uses a 23-bit questions template as an efficient clustering method that can be applied to any type of big data (movies, video, text, images, and so on) at large volume, with at least 8.3 million unique data points. One of the most important features of this 23-bit questions method is that it is based on fuzzy logic for clustering big data points. Marcel J.E. Golay introduced the mathematical model of the Golay code in 1945. This dissertation focuses on investigating and developing a 23-bit questions metaknowledge template, inspired by the Golay code, for big data clustering.
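The arithmetic behind the 23-bit template can be illustrated with a minimal sketch. It assumes that each data point is described by 23 yes/no answers packed into one 23-bit integer, and that the "8.3 million" figure refers to the 2^23 = 8,388,608 distinct 23-bit vectors. The function names `encode_answers` and `hamming_distance` are hypothetical and do not come from the dissertation. The last lines check a well-known property of the binary Golay code: its 2^12 codewords, each surrounded by a Hamming sphere of radius 3, cover the 23-bit space exactly.

```python
from math import comb

N_BITS = 23  # assumption: one bit per yes/no question in the template


def encode_answers(answers):
    """Pack 23 yes/no answers (booleans) into one 23-bit integer.

    Hypothetical helper for illustration; the first answer becomes
    the most significant bit.
    """
    if len(answers) != N_BITS:
        raise ValueError(f"expected {N_BITS} answers, got {len(answers)}")
    code = 0
    for answer in answers:
        code = (code << 1) | int(bool(answer))
    return code


def hamming_distance(a, b):
    """Count the questions on which two encoded data points disagree."""
    return bin(a ^ b).count("1")


# Number of distinct 23-bit vectors.
total_codes = 2 ** N_BITS

# Number of 23-bit vectors within Hamming distance 3 of a given codeword.
sphere_size = sum(comb(N_BITS, i) for i in range(4))

# The binary Golay code has 2^12 codewords.
num_codewords = 2 ** 12

# Perfect-code property: the radius-3 spheres tile the whole space.
assert sphere_size * num_codewords == total_codes
```

Under these assumptions, two data points whose encoded answers lie within a small Hamming distance of the same codeword could be treated as candidates for the same cluster; how the dissertation actually assigns clusters is not specified in this abstract.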