The Security of DAS: Auditing for Outsourced Data

闫巧芝 2009.4.16

Outline

Background

Current approach

Signature Based

Data Structure Based

Probability Based

My thinking

Introduction

Providing Database as a Service (DAS)

Instead of handling all database management tasks in house, the user sends all of their data to a service provider.

Benefit: the user eliminates the in-house hardware, software and expertise needed to run a DBMS.

Challenges: communication overhead.

Trusted client VS Untrusted server

Security Concern

•Our Focus

•Validation of query results (Correctness, Completeness and Freshness)

What an untrusted server may do:

•Tamper with data: insert, delete or alter data stored on the server

•Return only part of the results: out of laziness, or through hostile deletion

•Ignore or delay update operations

Outline

Introduction

Current approach

Signature Based

Data Structure Based

Probability Based

My thinking

•Starting point: the client audits the result to guarantee the server's honesty

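Signature Based: digital signature based verification methods, advantages and disadvantages

1. RSA Signature

Tuples are verified one by one, so the encryption and decryption time is high; for large result sets, bandwidth consumption and verification time are large; can verify correctness but not completeness.

2. Batch Verification of RSA Signature

Batch verification improves efficiency.

3. Condensed RSA signature

Aggregates multiple data signatures into one; the returned result set carries only one signature; reduces bandwidth consumption and client-side computation; can only aggregate signatures produced by the same user.

4. Aggregated-BGLS

Can aggregate signatures from multiple users into one; in the multi-user case the verification cost is high.

5. Digital Signature Aggregation and Chaining (DSAC)

Can verify the completeness of range queries; must prevent leakage of tuple order information; verifying the correctness of the result set is costly; completeness is hard to verify for attribute columns that have no ordering.

Can verify correctness and the completeness of Range Queries.

For details, see the report by 杨平平.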
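The aggregation idea behind Condensed RSA (item 3) can be illustrated with a toy sketch. It assumes textbook RSA over two small Mersenne primes and SHA-256 reduced mod N as the hash; the key sizes, the function names and the sample tuples are all hypothetical and chosen for readability, not security.

```python
import hashlib

# Toy RSA key (assumption: small, well-known primes; not secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def digest(tuple_bytes: bytes) -> int:
    """Hash a tuple into Z_N (stand-in for a full-domain hash)."""
    return int.from_bytes(hashlib.sha256(tuple_bytes).digest(), "big") % N

def sign(tuple_bytes: bytes) -> int:
    """Data owner: one RSA signature per tuple."""
    return pow(digest(tuple_bytes), D, N)

def condense(signatures) -> int:
    """Server: multiply the signatures of the returned tuples into one value."""
    agg = 1
    for s in signatures:
        agg = (agg * s) % N
    return agg

def verify_condensed(tuples, agg) -> bool:
    """Client: one exponentiation checks all returned tuples at once."""
    expected = 1
    for t in tuples:
        expected = (expected * digest(t)) % N
    return pow(agg, E, N) == expected

tuples = [b"t1=(10,1)", b"t2=(20,2)", b"t3=(30,3)"]
agg = condense(sign(t) for t in tuples)
```

Note that the server decides which signatures to multiply, so a result with a tuple silently dropped still verifies; this matches the point above that plain signatures give correctness but not completeness.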

Outline

Introduction

Current approach

Signature Based

Data Structure Based

Probability Based

My thinking


•Merkle Hash Tree Based Method

Server: tuples t1=(10,1) and t2=(20,2) are stored as

etuple          a    b
x4Z3tfX2ShOSM   10   E(1)=2
JpO8eLTVgwV1E   20   E(2)=5

[Figure: Merkle Tree at the Service Provider]

Table T, column A: 1, 2, 3, 4

Leaves: h(1) h(2) h(3) h(4)

Internal nodes: h(1|2) h(3|4)

Root: h(1|2|3|4), signed as Signature(h(1|2|3|4))

Query Q: SELECT A FROM T WHERE A = 2

User Device (holds the Public Key): receives T' (A = 2) and the VO, rebuilds h(2), h(1|2) and h(1|2|3|4), and checks the root against Signature(h(1|2|3|4))

•Merkle Hash Tree Based Method

Select A from T where A < 3

•A = {1, 2}, VO = { h(3), h(4), Signature(h(1|2|3|4)) }

•A = 2, VO = { h(1), h(3|4), Signature(h(1|2|3|4)) }
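A minimal sketch of how a client could fold a VO back into the root for the A = 2 case. It assumes SHA-256 for h, a four-leaf tree, and that the signature on the root has already been checked, so the signed root is compared directly; the helper names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(value: int) -> bytes:
    """Hash of a single attribute value, e.g. h(2)."""
    return h(b"leaf:" + str(value).encode())

def node(left: bytes, right: bytes) -> bytes:
    """Hash of two children, e.g. h(1|2)."""
    return h(b"node:" + left + right)

def build_root(values):
    """Owner side: root over a power-of-two number of leaves."""
    level = [leaf(v) for v in values]
    while len(level) > 1:
        level = [node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify(value, vo, signed_root):
    """Client side: vo lists (sibling_hash, sibling_is_left) from leaf to root."""
    cur = leaf(value)
    for sibling, sibling_is_left in vo:
        cur = node(sibling, cur) if sibling_is_left else node(cur, sibling)
    return cur == signed_root

root = build_root([1, 2, 3, 4])  # stands in for the value under Signature(h(1|2|3|4))
vo_for_2 = [(leaf(1), True), (node(leaf(3), leaf(4)), False)]  # { h(1), h(3|4) }
```

The VO carries one hash per tree level, which is why its size grows with the logarithm of the table size rather than with the table itself.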


Merkle Hash Tree Based Method

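Advantages and disadvantages

☆ Can verify the correctness of tuples

☆ Can verify the completeness of results for simple tuple-level range queries

◇ Does not support verifying the correctness / completeness of join, sum and avg

◇ An MHT must be built for every attribute involved in range queries, which is costly

◇ The tree is a full binary tree and every data update requires adjusting the MHT, so maintenance is difficult

◇ Many nodes, so the transmission cost is high.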

Outline

Background

Current Methods

Signature Based

Data Structure Based

Probability Based

My thinking

Current methods

Challenge Token Method [Sion 2005 VLDB]

Fake Tuple Method [Min Xie 2007 VLDB] Freshness [Min Xie 2008 EDBT]

Dual Encryption Method [Haixun Wang 2008 CIKM]

Challenge Token Method

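[Figure: Client and Server both hold the data segment S1. The Client sends the queries {q1, q2, …, qn} together with the token H(qi(S1)); the Server returns the results {r1, r2, …, rn} together with the index i.]

Motivation: to prevent laziness

H is a One-Way Hashing Function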
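A minimal sketch of the challenge token exchange, under the assumption that the client precomputes H(qi(S1)) for a secret index i and the server can only name i by actually running the queries. SHA-256, the sample segment and the function names are hypothetical choices for illustration.

```python
import hashlib
import random

def H(rows) -> str:
    """One-way hash of a query result (order-independent canonical form)."""
    return hashlib.sha256(repr(sorted(rows)).encode()).hexdigest()

def make_challenge(segment, queries, rng):
    """Client: pick a secret query index i and publish only H(q_i(S1))."""
    i = rng.randrange(len(queries))
    return i, H(queries[i](segment))

def honest_server(segment, queries, token):
    """Server: run every query, then report which result matches the token."""
    results = [q(segment) for q in queries]
    index = next(j for j, r in enumerate(results) if H(r) == token)
    return results, index

S1 = list(range(0, 40, 5))
queries = [
    lambda s: [x for x in s if x < 10],
    lambda s: [x for x in s if x < 20],
    lambda s: [x for x in s if x < 30],
]
secret_i, token = make_challenge(S1, queries, random.Random(0))
results, claimed_i = honest_server(S1, queries, token)
server_is_honest = claimed_i == secret_i
```

A lazy server that skips execution has to guess the index, so it is caught with probability (n - 1)/n per batch in this simplified setting.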

Current methods

Integrity Auditing of Outsourced Data




Fake Tuple Method

Outline

Motivation

Security Model & Checking Method

Checking Tuple Generating

Freshness Guarantee

Drawbacks of previous methods

Major concern:

The server side is not a purely traditional relational engine; it has to be combined with some additional data structure.

What if the server is just a traditional relational engine?

How do we check correctness?

How do we check completeness?

Fake Tuple Method


A High-Level View of Fake Tuple Method

Only simple selection queries are considered.

The process (the server stores the fake tuples, which the user knows exactly):

1. The user sends a query Q to the server.

2. The server executes Q on its site and returns the result R to the user.

3. The user checks on its site whether all checking tuples covered by Q are really in the result R.

There is a probability that the attacker escapes being caught.

Correctness Concern

Goal: Protect Tuple From Being Tampered

Assume the data has n fields {A1, A2, …, An}

A tuple t (a1, a2, …, an)

header = H(a_0 a_1 … a_n)        Original Tuple

header = H(a_0 a_1 … a_n) + 1    Checking Tuple

H is a One-Way Hashing Function
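A minimal sketch of the header idea, assuming the header equals H over the tuple's fields for an original tuple and H + 1 for a checking tuple. SHA-256 truncated to 64 bits, the field encoding and the function names are hypothetical; in the real setting the tuple is also encrypted so the server cannot recompute H itself.

```python
import hashlib

MOD = 2**64  # assumption: headers are stored as 64-bit integers

def H(fields) -> int:
    """One-way hash over the concatenated field values."""
    data = "|".join(map(str, fields)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def make_header(fields, checking: bool) -> int:
    """Owner: H(fields) for an original tuple, H(fields) + 1 for a checking tuple."""
    return (H(fields) + (1 if checking else 0)) % MOD

def classify(fields, header: int) -> str:
    """Client: recompute H and decide what kind of tuple came back."""
    base = H(fields)
    if header == base:
        return "original"
    if header == (base + 1) % MOD:
        return "checking"
    return "tampered"

original_header = make_header((10, 1), checking=False)
checking_header = make_header((20, 2), checking=True)
```

One header column therefore serves two purposes: it detects altered field values and it tells the client which returned tuples to count as checking tuples.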

Checking Method

Goal of the check: the checking tuples covered by Q should be contained in the result.

Directly checking whether a set Ssub is contained in the result set S can be very inefficient to implement.

Instead, they check tuple counts:

Number of checking tuples in result R: CountA

Number of checking tuples that should be contained in R: CountS

Check whether CountA equals CountS.

[Figure: query Q over the original tuples TO and the checking tuples TC]

Fake Tuple Method


A Naïve Checking Tuple Generating Scheme

A naïve solution:

•Randomly generate a series of checking tuples

•Maintain a copy at the client site

•When a query Q is sent to the server, also run the query at the client site, and check whether the checking tuple count of the client-side result equals the count in the result from the server

•Drawbacks:

1. The set of checking tuples has to be stored at the client site, which may cause great storage overhead

2. Searching it may be inefficient.

A Novel Deterministic Method (1)

How about instead of storing the tuples, we just store a Function on the client side?

Intuitive benefits: small storage cost; efficient to get the checking tuple count

A Novel Deterministic Method (2)

We capture the original data's distribution using a histogram

Associate each bucket with the parameters for our deterministic function F (here assumed to be a linear function)

Use the function F together with the distribution / density information of the original data to generate the checking tuples
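A minimal sketch of a per-bucket linear generating function, assuming each histogram bucket stores a start value and a step, and checking values are start, start + step, … inside the bucket. The `Bucket` fields and the sample parameters are hypothetical; the point is that the client obtains CountS by arithmetic on a few parameters instead of storing the checking tuples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bucket:
    lo: int     # inclusive lower bound of the histogram bucket
    hi: int     # exclusive upper bound
    start: int  # first checking value, lo <= start < hi
    step: int   # spacing of checking values (smaller where the data is denser)

def checking_values(b: Bucket):
    """Owner: materialise the checking values before outsourcing."""
    return list(range(b.start, b.hi, b.step))

def expected_count(buckets, c0: int, c1: int) -> int:
    """Client: CountS for the range [c0, c1], computed from parameters only."""
    total = 0
    for b in buckets:
        low, high = max(c0, b.start), min(c1, b.hi - 1)
        if high < low:
            continue
        first = -(-(low - b.start) // b.step)  # ceiling division
        last = (high - b.start) // b.step
        total += max(0, last - first + 1)
    return total

def audit(count_a: int, buckets, c0: int, c1: int) -> bool:
    """Client: accept the result only if CountA equals CountS."""
    return count_a == expected_count(buckets, c0, c1)

buckets = [Bucket(0, 100, 3, 10), Bucket(100, 200, 100, 25)]
```

For example, with these parameters the range [90, 130] covers the checking values 93, 100 and 125, so a result carrying fewer than three checking tuples fails the audit.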

Checking Tuple Generating

The Generating Process

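[Figure: the generating process, plotted over x-axis values x1 to x4 and y-axis values y1 to y4, with stages numbered 1 to 3: Function Generating, Checking Tuple Generating, Encrypt]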

Completeness Checking

Join Auditing

Join two tables T1 and T2: SELECT * FROM T1, T2 WHERE T1.B = T2.B

We have 4 cases here:

•Checking tuples from T1 join with checking tuples from T2

•Checking tuples from T1 join with original tuples from T2

•Original tuples from T1 join with checking tuples from T2

•Original tuples from T1 join with original tuples from T2
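A minimal sketch of how join results split into the four cases, assuming every tuple carries a mark "O" (original) or "C" (checking) that the client can recover from the header. The table rows are illustrative values, not data from the source.

```python
from collections import Counter

# (A, B, mark) for T1 and (B, C, mark) for T2; rows are illustrative.
T1 = [(1, 2, "O"), (2, 3, "O"), (3, 4, "C"), (4, 5, "C")]
T2 = [(2, 2, "O"), (3, 3, "O"), (4, 1, "O"), (5, 10, "C")]

def join(t1, t2):
    """Server: plain equi-join on B; it treats all tuples alike."""
    return [(a, b, c, m1, m2)
            for (a, b, m1) in t1
            for (b2, c, m2) in t2
            if b == b2]

def case_counts(rows) -> Counter:
    """Client: bucket result rows by the pair of marks, e.g. 'CO' or 'CC'."""
    return Counter(m1 + m2 for (_, _, _, m1, m2) in rows)

def expected_cc(t1_checking_b, t2_checking_b) -> int:
    """Client: number of checking-checking pairs it can predict by itself."""
    return sum(1 for b1 in t1_checking_b for b2 in t2_checking_b if b1 == b2)

result = join(T1, T2)
counts = case_counts(result)
cc_ok = counts["CC"] == expected_cc([4, 5], [5])
```

Only the checking-checking case is fully predictable from client-side knowledge in this sketch; the mixed cases depend on original data the client does not keep.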

Join Auditing

[Figure: join of T1 (A, B, Header1) and T2 (B, C, Header2) on T1.B = T2.B. Each tuple is marked O (original) or C (checking), and the joined rows are labelled Case 1 to Case 4 by their pair of marks, for example (3, 4, 1) with marks C, O and (4, 5, 10) with marks C, C.]

Bucket Optimize Concern

How can we compress the data structure?

Fake Tuple Method

Outline Motivation

Security Model & Checking Method

Checking Tuple Generating

Freshness Guarantee

A High-Level View of Freshness Guarantee

Accompany “normal” updates with “fake” ones that the attacker cannot detect

Clients know about “fake” operations

Check whether results of “fake” updates that should have been executed actually show up in query result

A High-Level View of Implementation

Four deterministic elements:

At a deterministic time slot

According to deterministic generation functions

To get a deterministic update generation

To generate the deterministic fake tuples
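A minimal sketch of the freshness check, assuming time is cut into fixed slots and the fake update for each slot is derived from a client secret with HMAC. The secret, the slot length and the function names are hypothetical; the source only states that slots and generation functions are deterministic.

```python
import hashlib
import hmac

SECRET = b"client-secret"  # hypothetical key known only to the clients
SLOT_SECONDS = 60          # hypothetical slot length

def slot_of(timestamp: int) -> int:
    return timestamp // SLOT_SECONDS

def fake_update(slot: int) -> int:
    """Deterministically derive the fake value written during a slot."""
    mac = hmac.new(SECRET, str(slot).encode(), hashlib.sha256).digest()
    return int.from_bytes(mac[:8], "big")

def is_fresh(result_values, now: int) -> bool:
    """Client: the fake update of the last completed slot must be visible."""
    return fake_update(slot_of(now) - 1) in result_values

now = 10 * SLOT_SECONDS + 5
up_to_date = {fake_update(9), 42}  # server applied the slot 9 update
stale = {fake_update(7), 42}       # server stopped applying updates at slot 7
```

Because any client can recompute `fake_update` for any slot, no per-update state needs to be exchanged between clients.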

Fake Tuple Method

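Advantages and disadvantages

☆ Can verify the correctness of results;

☆ Can verify the completeness of the result sets returned by simple queries and join operations;

☆ Supports update operations by the client;

☆ The client does not need to maintain a large number of tuples (it only stores the fake tuple generating function);

☆ Verification is done through tuple counts, so efficiency is good.

☆ Can guarantee data Freshness

◇ As the original data grows, the fake tuples stored on the server grow proportionally, which wastes server storage space, lowers query efficiency and increases transmission load

◇ Update operations require adjusting and maintaining the fake tuples

◇ The client has to maintain a grid structure and adjust it as updates occur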

Current methods





Motivation

Implementation details

•Data Transformation

•Query Composition

Security

•What if the two service providers collude to cheat?

Cost

•Using two service providers doubles the cost.

•Run-time cost is also high.

Motivation

Current methods





Motivation

Implementation

•Data Transformation

•Query Composition

Overview

Built on top of any encryption scheme that supports queries over encrypted data.

The idea is to encrypt some tuples in a database with two keys. Given a tuple r in the original database T and two encryption keys k and k', the encrypted database Ts can contain both Ek(r) and Ek'(r).

Server: the tuple t = (1) is stored under both keys

etuple          aI
x4Z3tfX2ShOSM   Ek(1) = 2
JpO8eLTVgwV1E   Ek'(1) = 5
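A toy sketch of the dual encryption audit. It assumes an affine map mod a prime as a stand-in for an encryption scheme that supports equality queries (it is not secure and is not the scheme from the cited paper), and a hypothetical `is_dual` rule that tells the client which tuples were stored under both keys.

```python
P = 2**61 - 1  # prime modulus for the toy cipher

K = (48271, 12345)    # key k  (hypothetical values)
K2 = (69621, 67890)   # key k' (hypothetical values)

def enc(key, v: int) -> int:
    """Toy deterministic 'encryption': affine map mod P."""
    a, b = key
    return (a * v + b) % P

def is_dual(v: int) -> bool:
    """Assumption: a client-side rule decides which tuples get two copies."""
    return v % 3 == 0

def outsource(values):
    """Owner: store Ek(r) for every tuple and Ek'(r) for the dual ones."""
    store = set()
    for v in values:
        store.add(enc(K, v))
        if is_dual(v):
            store.add(enc(K2, v))
    return store

def query(store, wanted):
    """Server: membership test over ciphertexts; keys are never seen."""
    return store & wanted

def audit(result, asked) -> bool:
    """Client: a dual tuple must come back under both keys or under neither."""
    return all((enc(K, v) in result) == (enc(K2, v) in result)
               for v in asked if is_dual(v))

store = outsource(range(1, 10))
asked = [3, 4, 6]
wanted = {enc(K, v) for v in asked} | {enc(K2, v) for v in asked}
result = query(store, wanted)
```

A server that drops Ek(6) but, unaware of the correspondence, still returns Ek'(6) fails the audit; a server that can link the two copies can drop both, which is presumably why the source lists query correspondence attacks as a concern.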

Data Transformation

Dual information

Client concerns:

1. Is t valid?

2. Which part does t belong to?

q: SELECT * FROM T WHERE predicate;

predicate: attribute (=, >, <, …) value

Query Encryption

Ek(t) Ek’(t)

Query Correspondence Attacks

Check Completeness

[Figure: message flow between Client and Server. Recoverable labels: the query q; a query set Q over q1, q2, …, qi; a query under the key k'; the Server returns the query results on the dual encrypted database.]

Query Generating

Result-based method

Random method

Query-based method

My thinking

Result-based Checking

t = (1, 2, 3)

a1   a2   a3
1    2    3
…    …    …

Generate query q:

Can’t control the size of query result

Random Checking

According to the histogram of ai, roughly p percent of the data falls in the range [c0, c1].

The checking tuples may make up only a small percentage of the result!

Far from efficient

Query-based Method

q: SELECT * FROM T WHERE C;

c1, c2, …, ck: user-specified

overlap threshold

relax(c) rule

Thanks for your attention!

Q&A?
