Data Structures, Algorithms and Applications for Big Data Analytics: Single, Multiple and All Repeated Patterns Detection in Discrete Sequences

atmire.migration.oldid: 5504
dc.contributor.advisor: Alhajj, Reda
dc.contributor.author: Xylogiannopoulos, Konstantinos
dc.contributor.committeemember: Rokne, Jon
dc.contributor.committeemember: Pardalos, Panayote
dc.contributor.committeemember: Kawash, Jalal
dc.contributor.committeemember: Helaoui, Mohamed
dc.date.accessioned: 2017-04-27T19:09:57Z
dc.date.available: 2017-04-27T19:09:57Z
dc.date.issued: 2017
dc.date.submitted: 2017 [en]
dc.description.abstract: This thesis focuses on the detection of single, multiple and all repeated patterns in sequences. Many algorithms exist for single pattern detection; they take the pattern to be detected as an input argument and return the position(s) where it occurs. However, to the best of my knowledge, there is nothing in the literature on all repeated patterns detection, i.e., the detection of every pattern that occurs at least twice in one or more sequences. This is an important problem because its outcome has practical applications, e.g., forecasting in weather analysis or finance by detecting patterns that exhibit periodicity. The main obstacle to detecting all repeated patterns is that the data structures currently used in computer science do not scale well for this purpose because of their space and time complexity: analyzing sequences of megabytes can require terabytes of space to construct the data structure and execute the algorithm. To overcome these problems, my research has focused on the simultaneous optimization of space and time complexity by introducing a new data structure (LERP-RSA), together with a mathematical foundation that proves its correctness and validity. A new algorithm (ARPaD), which takes advantage of the characteristics of this data structure and allows big data mining with space and time optimization, has also been created. Additionally, algorithms for single (SPaD) and multiple (MPaD) pattern detection have been created, based on the LERP-RSA, which outperform any other known algorithm for pattern detection in terms of efficiency and minimal resource usage.
The combination of the data structure and the algorithm permits the analysis of sequences of enormous size, greater than a trillion characters, in realistic time using conventional hardware. Moreover, several methodologies and applications have been developed to provide solutions for important problems in diverse scientific and commercial fields such as Finance, Event and Time Series, Bioinformatics, Marketing, Business, Clickstream Analysis, Data Stream Analysis, Image Analysis, Network Security and Mathematics. [en_US]
dc.identifier.citation: Xylogiannopoulos, K. (2017). Data Structures, Algorithms and Applications for Big Data Analytics: Single, Multiple and All Repeated Patterns Detection in Discrete Sequences (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25522 [en_US]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/25522
dc.identifier.uri: http://hdl.handle.net/11023/3754
dc.language.iso: eng
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary [en]
dc.publisher.place: Calgary [en]
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Computer Science
dc.title: Data Structures, Algorithms and Applications for Big Data Analytics: Single, Multiple and All Repeated Patterns Detection in Discrete Sequences
dc.type: doctoral thesis
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Calgary
thesis.degree.name: Doctor of Philosophy (PhD)
ucalgary.item.requestcopy: true
Files
Original bundle
Name: ucalgary_2017_xylogiannopoulos_konstantinos.pdf
Size: 3.63 MB
Format: Adobe Portable Document Format
Description: Thesis