# Summary: Fuzzy Keyword Search over Encrypted Data in Cloud Computing

This is a summary of the paper “Fuzzy Keyword Search over Encrypted Data in Cloud Computing” by Li et al. (INFOCOM 2010). All the information is based on my understanding of the paper and may contain inaccuracies.

## Overview

This work is motivated by the security concerns present in the cloud computing environment. As a solution for protecting data privacy, the data is encrypted before being uploaded. Since searchable encryption schemes only provide exact keyword match, there is a need for fuzzy search over encrypted data.

The fuzzy search can identify items that are not exactly the same, but similar enough to satisfy a given similarity threshold. This solution considers edit distance as the similarity metric, defined as the number of operations (substitution, insertion and deletion) required to transform one string into another. The result of the fuzzy search is a set containing file IDs whose corresponding files contain words that are similar to the query keyword (i.e., $ed(w, w') \leq k$, for query keyword $w$ and the similarity threshold $k$). The file IDs are encrypted using symmetric encryption and are retrieved using trapdoors calculated using the same encryption key.

Considering all the operations (substitution, insertion and deletion) separately can create large fuzzy sets, thus increasing the storage requirements. The wildcard-based fuzzy search uses wildcards to denote all the operations at a given position in the word. Using this approach, the number of elements in the fuzzy sets are reduced from $(2 \ell + 1) \times + 1$ to $2 \ell + 1 + 1$ ($\ell$ is the length of the keyword).

## Contributions

The wildcard-based approach reduces the size of the fuzzy sets used to find candidates for matching the query keyword.

The scheme uses symmetric encryption, which offers good performance.

Search privacy is assured by the system.

## Limitations

Even reducing the size of the fuzzy sets, the number of possibilities is still large.

The system requires a predefined set of query keywords.

Multiple keywords are not supported in one query.

The proposal does not present an experimental evaluation.

## Final Notes

A simple presentation about this work is available at SlideShare: