This is a summary of the paper “Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Strong Privacy Guarantee” by Wang et al. (INFOCOM 2015). All the information is based on my understanding of the paper and may contain inaccuracies.
This work proposes a searchable encryption protocol based on inverted indices. The authors highlight the advantages of inverted indices, especially for indexing large amounts of text documents. Based on these advantages and on its wide adoption in the context of plaintext data, this research presents a system that encrypts the existing inverted indices while maintaining their search capabilities.
The focus of the paper is on overcoming two important limitations presented by the previous works:
- Since the server can see the patterns of the encrypted array representing the index for a given keyword, the index must be rebuilt after a query uses that keywords.
- Lack of support for conjunctive multi-keyword search.
The main building blocks for this work are inverted indices themselves, private set intersection and blind storage:
- A inverted index has a number of inverted lists (one for each keyword), and each list has the IDs of the documents that contain that keyword.
- Private set intersection can securely find the intersection of the inverted lists that satisfy each of the keywords in the query.
- The blind storage protocol securely retrieves documents from the cloud server after the document IDs have been obtained using the searchable encryption protocol.
There are three main entities in the system:
- A data owner who constructs the index, encrypts the documents and issues trapdoors for queries.
- A honest-but-curious cloud server that stores the encrypted documents and the encrypted index. The server performs the search using the index, but should not learn anything about the documents, the index or the queries.
- Trusted users that request trapdoors from the data owner and then use those trapdoors to perform queries on the server.
The main steps of the protocol are as follows:
- It is assumed that the data owner already has a plaintext inverted index for the documents. The encryption of the index is done by creating polynomials for each inverted list and encrypting the coefficients using the Paillier cryptosystem. The index and the encrypted documents are then sent to the cloud server.
- A user contacts the data owner to obtain a trapdoor for a given query. The trapdoor contains the coefficients of the query polynomial, which is padded with random terms to avoid leakage of the number of keywords in the query.
- The server uses the encrypted index provided by the data owner and the trapdoor provided by the user to calculate the encrypted intersection of the inverted lists that contain the query keywords.
- The user decrypts the result with the help of the data owner, obtaining the IDs of the documents that satisfy the query.
- Using a blind storage protocol, the user can retrieve the documents from the cloud server without leaking any additional information.
A experimental evaluation is performed using datasets of up to 12000 documents:
- The index generation step happens only once and its cost depends mainly on the size of the encryption keys used in the system.
- The trapdoor generation can become expensive due to matrix multiplications, but the authors claim that it can be optimized with special matrix multiplication algorithms.
- In the computation done by the server, the encryption takes most of the time, but it can be parallelized for better performance.
Assuming a existing inverted index contributes to an easier application of this method to systems that already use inverted indices over plaintext data.
The use of cheaper operations (e.g.: multiplication and exponentiation) instead of more expensive ones (e.g.: pairing) contributes to a better performance in comparison to other works.
Allowing conjunctive multi-keyword queries is a desirable feature for the use of this system in real-world scenarios.
The data owner must always be online and participating in the system by issuing trapdoors for the queries and helping in the decryption of results. This can become a bottleneck for the scalability of the system.
Considering that the data owner has to participate in the protocol anyway, it is not clear why the processing to find the documents’ IDs should be done at the cloud server and not at the data owner. In other words, it seems the cloud could be responsible only for the storage and the distribution of documents for the authorized users, but the search using the inverted index could be done at the data owner.
The proposal does not support disjunctive multi-keyword queries, which are also frequently used in real-word applications.
A presentation introducing this work is available at SlideShare.