Users Tracing in Online Text Systems

Le, Hoi

Date

2016

Authors

Le, Hoi

Abstract

Privacy in online systems, including social networks and specialized websites such as review systems and movie forums, has become a primary concern for the people who use these websites. Users of these websites must register accounts and enter personal information, which may be directly related to their identities. Their reviews, tweets, comments, and chat messages reveal further information about them through their writing characteristics. This threatens to reveal their identities and other personal information. Similarly, a patient's records may need to be accessible for research purposes or be provided to a third party, yet the individual's identity or health status must remain protected. Current methods provide tools to eliminate portions of text in the records that can be used to infer such sensitive information. We provide a new approach to selecting the parts of the text that must be removed. The novelty of this approach is its use of information-theoretic measures to capture the definition of sensitive inference. Using this approach, we almost double the number of detected inferences compared to existing state-of-the-art systems. Human characteristics such as writing characteristics can be used, along with other information, to identify people. This information can be used to trace users' activities across websites by performing writing style matching. To protect users from being traced, obfuscating their writing styles is necessary. However, this is not an easy task. In this thesis, we show that there are security flaws in current works and design a writing style obfuscation algorithm that has a number of important security properties. As stylometry techniques have been extended to new domains such as tweets, comments, chat messages, and source code, the same privacy concerns exist in both traditional and new domains. A number of challenges exist; for example, authors can be traced or identified across domains. We have analysed the privacy of multi-user Twitter accounts and shown that authors can be recognized using data from other domains such as blogs.

Keywords

Computer Science

Citation

Le, H. (2016). Users Tracing in Online Text Systems (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/28613

URI

http://hdl.handle.net/11023/3042

Collections

Open Theses and Dissertations