As the volume of data that enterprises and individuals need to store and use grows rapidly, data owners are motivated to outsource their complex local data management systems to the cloud for its flexibility and cost savings. To protect data privacy and prevent unsolicited access in the cloud and beyond, sensitive data must be encrypted before outsourcing; this, however, renders traditional data utilization services based on plaintext keyword search obsolete. Enabling a privacy-assured search service over encrypted cloud data is therefore of paramount importance. Given the potentially large number of on-demand data users and the huge amount of outsourced data files in the cloud, this problem is particularly challenging, because the service must also meet requirements for performance, system usability, and scalability. This research project aims to explore such a privacy-assured and effective cloud data utilization service with high service-level performance and usability by investigating two challenging research tasks: fuzzy keyword search and ranked keyword search over encrypted cloud data.
Fuzzy keyword search, as opposed to exact keyword matching, tolerates minor typos and format inconsistencies in user search requests, and thus greatly enhances system usability and the user search experience. The challenge is that two similar words are no longer similar after the one-way cryptographic transformation used for encrypted keyword search. To address this problem, we plan to explore a new symbol-based trie-traversal search approach, in which transformed fuzzy keywords extracted from data files are stored in a multi-way tree structure to support efficient search while protecting keyword privacy. [2,5]
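As a rough illustration of the symbol-based trie idea, the following Python sketch builds a multi-way tree over one-way transformed fuzzy keywords and traverses it at search time. Several details are assumptions made only for this example and are not specified above: the wildcard construction of the fuzzy set (covering edit distance 1), the use of HMAC-SHA256 as the one-way transformation, the truncation to eight two-character symbols, and all function names.

```python
import hashlib
import hmac


def fuzzy_set(word):
    """Wildcard variants of `word` covering edit distance <= 1 (assumed construction)."""
    variants = {word}
    for i in range(len(word) + 1):
        variants.add(word[:i] + "*" + word[i:])      # models an insertion at position i
    for i in range(len(word)):
        variants.add(word[:i] + "*" + word[i + 1:])  # models a substitution or deletion at i
    return variants


def trapdoor(key, variant, symbol_len=2, n_symbols=8):
    """One-way transform of a variant, split into fixed-length hex symbols."""
    digest = hmac.new(key, variant.encode(), hashlib.sha256).hexdigest()
    end = symbol_len * n_symbols
    return [digest[i:i + symbol_len] for i in range(0, end, symbol_len)]


def build_index(key, keyword_to_files):
    """Data owner side: insert every transformed variant into a multi-way tree."""
    root = {}
    for keyword, files in keyword_to_files.items():
        for variant in fuzzy_set(keyword):
            node = root
            for symbol in trapdoor(key, variant):
                node = node.setdefault(symbol, {})
            # "files" cannot collide with a symbol, since symbols are 2 hex characters
            node.setdefault("files", set()).update(files)
    return root


def search(index, trapdoors):
    """Server side: walk the tree symbol by symbol; only transformed values are seen."""
    results = set()
    for symbols in trapdoors:
        node = index
        for symbol in symbols:
            node = node.get(symbol)
            if node is None:
                break
        else:
            results |= node.get("files", set())
    return results


key = b"k" * 32  # hypothetical secret shared by data owner and authorized users
index = build_index(key, {"castle": {"doc1"}, "network": {"doc2"}})
query = [trapdoor(key, v) for v in fuzzy_set("cassle")]  # query with a typo
matches = search(index, query)
```

In this sketch the misspelled query "cassle" and the indexed keyword "castle" share the variant "cas*le", so their transformed values coincide and the traversal reaches the same leaf, even though the server never sees either word in plaintext.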
Ranked keyword search further improves file retrieval accuracy and allows the user to find the most (or least) relevant information efficiently. We explore the statistical measure approach (i.e., relevance score) from information retrieval (IR) and hide the scores in an order-preserving manner. The resulting design is expected to facilitate efficient server-side ranking without compromising keyword privacy. For practical performance, the choice of system parameters and the corresponding security/efficiency tradeoffs are yet to be thoroughly investigated. [1,4]
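To make the idea of order-preserved score hiding concrete, the sketch below computes a relevance score, quantizes it, and maps it through a keyed, strictly increasing function so that a server can rank files without seeing the original scores. Everything here is an illustrative assumption: the TF x IDF style formula, the 1024 quantization levels, and in particular the toy mapping, which only demonstrates the order-preserving property and is not a secure order-preserving encryption scheme.

```python
import hashlib
import hmac
import math


def relevance_score(tf, file_len, n_files, n_files_with_kw):
    """A TF x IDF style score (assumed formula); requires tf >= 1."""
    return (1 + math.log(tf)) / file_len * math.log(1 + n_files / n_files_with_kw)


def quantize(score, max_score, levels=1024):
    """Map a real-valued score to an integer level in [0, levels)."""
    return min(levels - 1, int(score / max_score * levels))


def op_encode(key, level):
    """Toy order-preserving map: sum of keyed positive gaps up to `level`.

    Strictly increasing in `level`, so comparisons survive the mapping.
    NOT a secure OPE scheme; for illustration only.
    """
    total = 0
    for i in range(level + 1):
        gap = hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
        total += 1 + int.from_bytes(gap[:4], "big")  # every gap is at least 1
    return total


def server_rank(encoded_scores):
    """Server side: rank file ids by encoded score, highest first."""
    return sorted(encoded_scores, key=encoded_scores.get, reverse=True)


key = b"s" * 32  # hypothetical secret held by the data owner
# Hypothetical term frequencies for one keyword in three files of equal length
scores = {
    "fileA": relevance_score(5, 100, 1000, 10),
    "fileB": relevance_score(1, 100, 1000, 10),
    "fileC": relevance_score(20, 100, 1000, 10),
}
top = max(scores.values())
encoded = {f: op_encode(key, quantize(s, top)) for f, s in scores.items()}
ranking = server_rank(encoded)
```

The number of quantization levels and the size of the gaps are examples of the system parameters mentioned above: coarser levels leak less about the score distribution but can merge distinct scores into ties, which is one form of the security/efficiency tradeoff.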
Another promising research direction we propose to explore is secure multi-keyword semantic search, which takes into account the conjunction of keywords, the sequence of keywords, and even complex natural-language semantics to produce highly relevant search results while maintaining stringent privacy guarantees. [3]