# Summary: Efficient Similarity Search over Encrypted Data

This is a summary of the paper “Efficient Similarity Search over Encrypted Data” by Kuzu et al. (ICDE 2012). All the information is based on my understanding of the paper and may contain inaccuracies.

## Overview

This paper proposes a method to perform similarity search over encrypted data by using locality sensitive hashing (LSH).

The similarity searchable encryption scheme used is a non-deterministic method that can generate a trapdoor $T$ for a feature $f$ of a particular data item. A secure index $I$ is created for the data items and it is used to find items that have a specific feature.

Three schemes are proposed: Basic single server, two-round multi-server and one-round multi-server.

The basic single server search scheme has six main steps:

1. Key generation: When private keys are generated.
2. Index construction: A secure index $I$ is created for the collection of data items
3. Data encryption: The data items are encrypted and uploaded, and the keys and hash functions are shared with the users.
4. Trapdoor construction: Using LSH, a trapdoor is constructed for the query feature and it is sent to the server.
5. Search: The server searches on the index using the received trapdoor and returns the bit vectors that represent items that can be part of the result. The user decrypts the bit vectors, ranks the identifiers and sends the desired identifiers to the server. Finally, the server returns the encrypted items.
6. Data decryption: The user decrypts the items that belong to the result.

The basic scheme can leak the association between identifiers and trapdoors, which can be used to perform statistical attacks. The solution for this uses two servers: One server stores the index and the other stores the data items. After users decrypt and rank the bit vectors from the server holding the index, they can request the items from the other server by using the desired identifiers.

Although the multi-server approach is more secure, it still requires the user to do a considerable amount of work to decrypt the bit vectors and rank the identifiers. To reduce the client computation, a one-round search scheme uses the Paillier cryptosystem, which is non-deterministic and homomorphic additive, to rank the items. In this scheme, the server holding the index receives a trapdoor from the user and computes scores between the trapdoor and the other bit vectors. The encrypted scores are sent directly to the other server, which decrypts them and ranks the items according to the scores. After that, the encrypted results are sent to the user, who decrypts them and obtains the desired plaintexts.

In general terms, the two-round multi-server scheme is good for scenarios that have a large number of data items, and the one-round scheme is better if the items have a large number of features.

## Contributions

The index constructed using LSH is secure under an adapted definition of semantic security.

The fuzzy search provides a fault-tolerant way to search for keywords, allowing the user to find the desired results even if the query contains typos or other errors.

Information leakages (e.g., the association between identifiers and trapdoors) that allow statistical attacks are also considered by this work.

## Limitations

The multi-server model assumes that the servers are honest-but-curious and do not collaborate with each other, with might be unrealistic in some scenarios.

Although the one-round scheme minimizes the computation perform by the user, it incurs in an extra communication cost between the servers.