Electronic Thesis/Dissertation




Big Data is the term for the exponential growth of data on the Internet. The importance of Big Data lies not in how large it is, but in what information can be obtained by analyzing it. Such analysis helps businesses make smarter decisions and reduces time and cost. Performing this analysis requires searching very large files, and at the scale of Big Data a sequential search is prohibitively inefficient in terms of time and energy. Any new technique that allows efficient search in very large files is therefore in high demand. This research presents an innovative approach to efficient searching with fuzzy criteria in very large information systems (Big Data). Organizing efficient access to a large amount of information by an "approximate" or "fuzzy" indication is a rather complicated Computer Science problem. Usually, the solution relies on a brute-force approach, which results in a sequential look-up of the file. In many cases, this substantially undermines system performance. The suggested technique uses a different approach based on the Pigeonhole Principle. It searches for binary strings that approximately match the given request. The problem is considered in the following form: the data to be searched are represented as bit-attribute vectors, and the search operation consists of finding the subset of these vectors that lies within a particular Hamming distance of the request. The analysis of this new method shows a significant gain in performance in the organization of this search. It substantially reduces the number of sequential search operations and improves efficiency by several orders of magnitude in terms of speed, cost, and energy.
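The general idea of a Pigeonhole-based search can be illustrated with a minimal sketch. It is not the implementation from this work, only one common way to apply the principle, under stated assumptions: each record is an n-bit integer, the vector is cut into d + 1 contiguous segments for a maximum Hamming distance d, and one hash table per segment is kept in memory. If two vectors differ in at most d bits, at least one of the d + 1 segments must be identical in both, so only records sharing an exact segment with the request need to be checked. The names `PigeonholeIndex`, `segment_bounds`, and `hamming` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def segment_bounds(n_bits: int, n_segments: int):
    """Split n_bits into n_segments contiguous (shift, width) ranges."""
    base, extra = divmod(n_bits, n_segments)
    bounds, shift = [], 0
    for i in range(n_segments):
        width = base + (1 if i < extra else 0)
        bounds.append((shift, width))
        shift += width
    return bounds


class PigeonholeIndex:
    """Hypothetical index: one exact-match table per segment (assumes n_bits > max_dist)."""

    def __init__(self, vectors, n_bits: int, max_dist: int):
        self.vectors = list(vectors)
        self.max_dist = max_dist
        # max_dist + 1 segments: at most max_dist of them can contain a differing bit.
        self.bounds = segment_bounds(n_bits, max_dist + 1)
        self.tables = [defaultdict(list) for _ in self.bounds]
        for idx, v in enumerate(self.vectors):
            for table, (shift, width) in zip(self.tables, self.bounds):
                key = (v >> shift) & ((1 << width) - 1)
                table[key].append(idx)

    def search(self, query: int):
        """Return sorted indices of stored vectors within max_dist of query."""
        candidates = set()
        for table, (shift, width) in zip(self.tables, self.bounds):
            key = (query >> shift) & ((1 << width) - 1)
            candidates.update(table.get(key, ()))
        # Verify candidates only; non-candidates cannot be within max_dist.
        return sorted(
            i for i in candidates
            if hamming(self.vectors[i], query) <= self.max_dist
        )


if __name__ == "__main__":
    records = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
    index = PigeonholeIndex(records, n_bits=8, max_dist=2)
    print(index.search(0b10110000))
```

In this sketch the full Hamming distance is computed only for records that share at least one exact segment with the request, instead of for every record in the file. How much work this saves depends on the segment width and on how the data are distributed.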
