This is a summary of the paper “CryptDB: Protecting Confidentiality with Encrypted Query Processing” by Popa et al. (SOSP 2011). All the information is based on my understanding of the paper and may contain inaccuracies. For further information, please visit the official website.
CryptDB is a system that enables SQL queries over encrypted data, claiming high security and good performance (i.e., low overhead compared to an unencrypted DBMS). This work assumes a passive adversary model, in which a curious DBA tries to learn something from the data or the database server is compromised by an external threat.
The basic flow of execution of CryptDB is as follows:
- The data owner encrypts their data and sends it to the cloud server.
- Users can then send plaintext queries to a proxy, which parses the query and rewrites it to a secure format by encrypting variables and changing column names.
- The proxy sends the query to the cloud server, which executes it and returns the encrypted result to the proxy.
- The proxy decrypts the results and sends them to the users.
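The flow above can be sketched in a few lines. This is a minimal, hypothetical illustration and not CryptDB's code: an in-memory SQLite database stands in for the MySQL server, the names `SCHEMA`, `KEY`, `proxy_insert`, and `proxy_select_eq` are made up for the example, and `det_encrypt` is a toy deterministic cipher built from HMAC-SHA256 that should not be used for real data.

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key-held-only-by-the-proxy"  # hypothetical key, never sent to the server

# Hypothetical mapping from real names to anonymized names, kept by the proxy.
SCHEMA = {"employees": ("table1", {"name": "c1", "dept": "c2"})}

# SQLite stands in for the cloud DBMS; it only ever sees ciphertext blobs.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE table1 (c1 BLOB, c2 BLOB)")


def _xor_stream(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 keystream (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def det_encrypt(key: bytes, plaintext: str) -> bytes:
    """Toy deterministic encryption: the IV is derived from the plaintext."""
    data = plaintext.encode()
    iv = hmac.new(key, b"iv" + data, hashlib.sha256).digest()[:16]
    return iv + _xor_stream(key, iv, data)


def det_decrypt(key: bytes, blob: bytes) -> str:
    return _xor_stream(key, blob[:16], blob[16:]).decode()


def proxy_insert(table: str, row: dict) -> None:
    """Rewrite an INSERT: anonymize names and encrypt every value."""
    anon_table, cols = SCHEMA[table]
    names = ", ".join(cols[c] for c in row)
    marks = ", ".join("?" for _ in row)
    values = [det_encrypt(KEY, v) for v in row.values()]
    server.execute(f"INSERT INTO {anon_table} ({names}) VALUES ({marks})", values)


def proxy_select_eq(table: str, wanted: str, where_col: str, value: str) -> list:
    """Rewrite SELECT wanted FROM table WHERE where_col = value, then decrypt."""
    anon_table, cols = SCHEMA[table]
    sql = f"SELECT {cols[wanted]} FROM {anon_table} WHERE {cols[where_col]} = ?"
    rows = server.execute(sql, (det_encrypt(KEY, value),)).fetchall()
    return [det_decrypt(KEY, r[0]) for r in rows]


proxy_insert("employees", {"name": "Alice", "dept": "sales"})
proxy_insert("employees", {"name": "Bob", "dept": "hr"})
proxy_insert("employees", {"name": "Carol", "dept": "sales"})

names = proxy_select_eq("employees", "name", "dept", "sales")
print(names)  # the names of the employees in the sales department
```

The server can answer the equality predicate because equal plaintexts produce equal ciphertexts, yet it never sees the key, the real column names, or the plaintext values.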
The implementation relies on user-defined functions incorporated into MySQL. Each column is expanded into several new columns that hold the different encryptions of its values. The cryptosystems used include: random (AES or Blowfish in CBC mode with a random initialization vector), deterministic (Blowfish or AES in CMC mode), order-preserving encryption, homomorphic encryption (Paillier) using ciphertext packing, and word search (SEARCH).
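To make the homomorphic scheme more concrete, here is a small sketch of textbook Paillier in pure Python, showing how a server can compute a SUM by multiplying ciphertexts without ever decrypting them. The primes are tiny and fixed, the function names are my own, and ciphertext packing is not shown, so this is only an illustration of the arithmetic and not of CryptDB's implementation.

```python
import math
import secrets

# Small fixed primes keep the numbers readable; real keys are far larger.
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # modular inverse; valid because the generator is g = n + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N as g^m * r^n mod n^2 with a fresh random r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover m using the private values LAM and MU."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2


salaries = [1200, 3400, 560]
encrypted = [encrypt(s) for s in salaries]  # what the server stores

encrypted_total = 1  # 1 is a valid encryption of 0 (with r = 1)
for c in encrypted:
    encrypted_total = add_encrypted(encrypted_total, c)  # done by the server

total = decrypt(encrypted_total)  # done by the proxy
print(total)  # 5160
```

Only `add_encrypted` runs on the server side; `encrypt` and `decrypt` need key material that stays with the proxy.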
The evaluation uses the TPC-C workload and shows a 21-25% loss in throughput compared to unencrypted MySQL. Storage grows by a factor of about 3.76.
CryptDB uses different encryption schemes to support different operations that are commonly seen in a DBMS (e.g., equality checks, order comparisons, aggregates and joins). These different encryption schemes are combined using the idea of adjustable query-based encryption, which changes the encryption for data items at runtime. To achieve this, an onion model is used: the data items have layers of encryption, which are decrypted in case an inner layer is required for some computation.
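The onion idea can be illustrated with two layers: a deterministic layer wrapped inside a random one. This is a minimal sketch under my own assumptions. The toy cipher is built from HMAC-SHA256 and is not one of the schemes CryptDB uses, the function and variable names are hypothetical, and a real onion has more layers than shown here.

```python
import hashlib
import hmac
import secrets


def _xor_stream(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 keystream (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def det_layer(key: bytes, data: bytes) -> bytes:
    """Deterministic layer: equal inputs give equal outputs."""
    iv = hmac.new(key, data, hashlib.sha256).digest()[:16]
    return iv + _xor_stream(key, iv, data)


def rnd_layer(key: bytes, data: bytes) -> bytes:
    """Random layer: a fresh IV hides whether two inputs are equal."""
    iv = secrets.token_bytes(16)
    return iv + _xor_stream(key, iv, data)


def peel(key: bytes, blob: bytes) -> bytes:
    """Remove one layer, given that layer's key."""
    return _xor_stream(key, blob[:16], blob[16:])


k_det = secrets.token_bytes(32)  # stays with the proxy
k_rnd = secrets.token_bytes(32)  # released to the server only when needed

# Initial state: every value is stored as RND(DET(value)).
values = [b"sales", b"hr", b"sales"]
stored = [rnd_layer(k_rnd, det_layer(k_det, v)) for v in values]

# Under the outer layer, rows 0 and 2 look unrelated to the server.
looks_equal_before = stored[0] == stored[2]

# An equality query arrives: the proxy hands over k_rnd and the server
# peels the outer layer, leaving DET(value) in the column.
stored_after = [peel(k_rnd, blob) for blob in stored]

# The server can now match an encrypted constant against the column.
target = det_layer(k_det, b"sales")  # computed by the proxy
matches = [i for i, blob in enumerate(stored_after) if blob == target]
print(looks_equal_before, matches)
```

After the peel, the column stays at the deterministic layer, so the server learns which rows share a value even though it still cannot read the values themselves.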
The keys for decryption are derived from users’ passwords, and are not stored in the cloud server.
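As a rough sketch of what deriving a key from a password can look like, the snippet below uses PBKDF2-HMAC-SHA256 from Python's standard library. The choice of PBKDF2, the iteration count, and the name `derive_user_key` are my assumptions for the example and are not necessarily what CryptDB does.

```python
import hashlib
import secrets


def derive_user_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key from a password; nothing secret needs to be stored."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is not secret and can be stored; the password and key are not stored.
salt = secrets.token_bytes(16)

key_at_login = derive_user_key("correct horse battery staple", salt)
key_next_login = derive_user_key("correct horse battery staple", salt)
wrong_key = derive_user_key("wrong password", salt)

print(key_at_login == key_next_login)  # True: the key is re-derived at each login
print(key_at_login == wrong_key)       # False: a wrong password gives a useless key
```

Because the key can be recomputed from the password at login time, an attacker who copies the server's disk finds ciphertext and salts but no decryption keys.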
Although the onion model provides flexibility, the security level decreases over time as the layers are decrypted. It is possible to re-encrypt layers, but ultimately the security of the system is given by the security of the innermost layer (i.e., deterministic encryption).
Although CryptDB protects against attacks targeting the cloud server, attacks against the proxy can leak the keys and data of logged-in users. This happens because the proxy derives the keys at runtime and keeps them while a user is using the system. Users who are not logged in are not affected.
CryptDB does not guarantee the integrity or freshness of data, or the completeness of results. It also does not cover attacks on user machines.
I made a presentation introducing CryptDB and uploaded the slides to SlideShare.
The source code of CryptDB is available at the official website. I tried to run it and had a few problems with the installation at first, but recently I found a Docker container that made everything easier.