Privacy in online systems, including social networks and specialized websites such as review
systems and movie forums, has become a primary concern for the people who use these
websites. Users of these websites must register
accounts and enter personal information, which may be directly related to their identities.
Their reviews, tweets, comments, and chat messages
reveal further information about them through their writing characteristics. This
threatens to expose their identities and other personal information.
A patient's records may need to be made accessible for research purposes or provided
to a third party, while the patient's identity and health status must remain protected.
Current methods provide tools to remove the portions of text in such
records that could be used to infer this sensitive information. We propose a new approach
to selecting the parts of the text that must be removed. The novelty of this approach
lies in using information-theoretic measures to capture the definition of a sensitive inference.
Using this approach, we nearly double the number of detected inferences compared with
existing state-of-the-art systems.
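The abstract does not state which information-theoretic measure is used, so the following is only a hedged illustration of the general idea, not the thesis's method. It assumes pointwise mutual information (PMI) as the measure: each term is scored by log2 of P(term, attribute) / (P(term) P(attribute)) over a labeled corpus, and terms whose score exceeds a threshold are redacted. The names `pmi_scores`, `redact`, and `normalize`, the binary sensitive label, the word-level tokenization, the threshold, and the toy records are all hypothetical.

```python
from __future__ import annotations

import math

PUNCT = ".,;:!?()"


def normalize(word: str) -> str:
    # Lowercase and strip surrounding punctuation (simplistic; an assumption for illustration).
    return word.strip(PUNCT).lower()


def pmi_scores(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Hypothetical helper: PMI between a term's presence in a record and a binary sensitive label.

    `records` is an assumed list of (text, has_sensitive_attribute) pairs.
    Terms that never co-occur with the attribute (PMI = -inf) are omitted.
    """
    n = len(records)
    if n == 0:
        return {}
    term_count: dict[str, int] = {}
    joint_count: dict[str, int] = {}
    n_sensitive = 0
    for text, label in records:
        if label:
            n_sensitive += 1
        terms = {normalize(w) for w in text.split()} - {""}
        for t in terms:  # count each term at most once per record
            term_count[t] = term_count.get(t, 0) + 1
            if label:
                joint_count[t] = joint_count.get(t, 0) + 1
    p_s = n_sensitive / n
    scores: dict[str, float] = {}
    for t, joint in joint_count.items():  # joint > 0 here, so p_s > 0 as well
        p_t = term_count[t] / n
        p_ts = joint / n
        scores[t] = math.log2(p_ts / (p_t * p_s))
    return scores


def redact(text: str, scores: dict[str, float], threshold: float) -> str:
    # Replace each word whose PMI with the sensitive attribute exceeds `threshold`.
    return " ".join(
        "[REDACTED]" if scores.get(normalize(w), float("-inf")) > threshold else w
        for w in text.split()
    )


# Toy, made-up records: True marks records that carry the sensitive attribute.
records = [
    ("patient prescribed insulin for glucose control", True),
    ("insulin dose adjusted after glucose test", True),
    ("patient reports mild headache", False),
    ("routine checkup no complaints", False),
]
scores = pmi_scores(records)
print(redact("insulin given to patient", scores, threshold=0.5))  # → [REDACTED] given to patient
```

Here "insulin" occurs only in records with the attribute, so its PMI is 1 bit and it is removed, while "patient" occurs equally in both groups (PMI 0) and is kept. On a corpus this small, common words can also receive high scores by chance, which is one reason a real system would need more careful estimation.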
Human characteristics such as writing style can be used to identify users and uncover more information about them. Writing characteristics can also be used to trace users' activities
across websites by performing writing-style
matching. To protect users from being traced, their writing styles must be obfuscated.
However, this is not an easy task.
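To make the idea of writing-style matching concrete, here is a minimal sketch of a common stylometric baseline. It is not the matching technique studied in the thesis. Each text is reduced to a profile of character trigram frequencies, and an anonymous text is attributed to the known author whose profile has the highest cosine similarity. The function names `char_ngrams`, `cosine`, and `match_author`, the choice of n = 3, and the example texts are illustrative assumptions.

```python
from __future__ import annotations

import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter[str]:
    # Frequency profile of overlapping character n-grams (n = 3 is an assumed default).
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    # Cosine similarity between two sparse frequency profiles; 0.0 if either is empty.
    dot = sum(count * b[g] for g, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_author(anonymous: str, known: dict[str, str], n: int = 3) -> tuple[str, float]:
    """Hypothetical helper: return the known author most similar in style, with the score."""
    query = char_ngrams(anonymous, n)
    sims = {author: cosine(query, char_ngrams(text, n)) for author, text in known.items()}
    best = max(sims, key=sims.get)
    return best, sims[best]


# Made-up writing samples for two hypothetical users.
known = {
    "alice": "Honestly, I reckon the film was brilliant, honestly brilliant.",
    "bob": "the movie sucked lol. worst movie ever lol",
}
author, score = match_author("honestly, a brilliant film, I reckon", known)
print(author)  # → alice
```

In a cross-domain setting, the `known` profiles could in principle be built from one kind of text (for example, blog posts) and the anonymous text taken from another (for example, tweets). Obfuscation, in these terms, means altering a text so that such similarity scores no longer single out its true author.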
In this thesis, we show that current works have security flaws, and we design a
writing-style obfuscation algorithm with a number of important security properties.
As stylometry techniques have been extended to
new domains such as tweets, comments, chat messages, and source code, the same privacy concerns
arise in both traditional and new domains. A number of challenges exist; for example, authors can be traced or identified
across domains. We analysed the privacy of multi-user Twitter accounts and showed that
authors can be recognized using data from other domains such as blogs.