Users Tracing in Online Text Systems

atmire.migration.oldid: 4475
dc.contributor.advisor: Safavi-Naini, Reihaneh
dc.contributor.author: Le, Hoi
dc.date.accessioned: 2016-06-07T17:31:36Z
dc.date.available: 2016-06-07T17:31:36Z
dc.date.issued: 2016
dc.date.submitted: 2016 [en]
dc.description.abstract: Privacy in online systems, including social networks and specialized websites such as review systems and movie forums, has become a primary concern for the people who use these websites. Users of these websites must register accounts and enter personal information, which may be directly related to their identities. Their reviews, tweets, comments, and chat messages reveal more information about them through their writing characteristics. This threatens to reveal their identities and other personal information. A patient's records need to be accessible for research purposes or be provided to a third party, yet the user's identity and health status must remain protected. Current methods provide tools to eliminate portions of text in the records that can be used to infer such sensitive information. We provide a new approach to selecting the parts of the text that must be removed. The novelty of this approach is its use of information-theoretic measures to capture the definition of sensitive inference. Using this approach, we almost double the number of detected inferences compared to existing state-of-the-art systems. Human characteristics such as writing style can be used, together with other information, to identify users. This information can be used to trace users' activities across websites by performing writing style matching. To protect users from being traced, obfuscating their writing styles is necessary. However, this is not an easy task. In this thesis, we show that there are security flaws in existing work and design a writing style obfuscation algorithm that has a number of important security properties. As stylometry techniques have been extended to new domains such as tweets, comments, chat messages, and code, the same privacy concerns exist in both traditional and new domains. A number of challenges exist; for example, authors can be traced or identified across domains. We have analysed the privacy of multi-user Twitter accounts and shown that authors can be recognized using data from other domains such as blogs. [en_US]
dc.identifier.citation: Le, H. (2016). Users Tracing in Online Text Systems (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/28613 [en_US]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/28613
dc.identifier.uri: http://hdl.handle.net/11023/3042
dc.language.iso: eng
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary [en]
dc.publisher.place: Calgary [en]
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Computer Science
dc.subject.classification: Privacy, Information Theory, Document Redaction, Secure Obfuscation, Tweet Privacy [en_US]
dc.title: Users Tracing in Online Text Systems
dc.type: doctoral thesis
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Calgary
thesis.degree.name: Doctor of Philosophy (PhD)
ucalgary.item.requestcopy: true