The Security of DAS: Auditing for Outsourced Data 闫巧芝 2009.4.16
Outline
Background
Current approach
Signature Based
Data Structure Based
Probability Based ★
My thinking
Introduction
Providing Database-as-a-Service (DAS)
Instead of handling all database management tasks in-house, the user sends all of its data to a service provider.
The benefit: the user eliminates the in-house hardware, software, and expertise needed to run a DBMS.
Challenges: communication overhead.
Trusted client vs. untrusted server
Security Concern
• Our focus: validation of query results (correctness, completeness, and freshness)
Server misbehavior:
• Tampering with data: inserting, deleting, or altering data stored on the server
• Returning only part of the results, either out of laziness or through hostile deletion
• Ignoring or delaying update operations
• Starting point: the client audits the result to guarantee the server's honesty
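Signature Based
Digital-signature-based verification methods: advantages and disadvantages
1. RSA Signature
Verifies tuple by tuple, so the encryption and decryption time overhead is high; with a large result set, bandwidth consumption and verification time are large.
Can verify correctness but not completeness.
2. Batch Verification of RSA Signature
Batch verification improves efficiency.
3. Condensed RSA signature
Condenses the signatures of multiple data items into one; the returned result set carries a single signature; reduces bandwidth consumption and client computation; can only condense signatures from the same user.
4. Aggregated-BGLS
Can aggregate the signatures of multiple users into one; with multi-user signatures, the verification cost is high.
5. Digital Signature Aggregation and Chaining (DSAC)
Can verify the completeness of range queries; must prevent leakage of tuple ordering information; verifying result-set correctness is costly; completeness is hard to verify for attribute columns with no ordering.
Can verify correctness and the completeness of range queries.
See the report by 杨平平 for details.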
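The condensed RSA idea from item 3 can be sketched as follows. This is a toy illustration only: the key is built from two small Mersenne primes, the hash-to-integer step is a simple stand-in, and the names `sign`, `condense`, and `verify_condensed` are hypothetical rather than taken from any of the surveyed papers.

```python
import hashlib
from math import prod

# Toy RSA key from two Mersenne primes; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def h(tuple_bytes: bytes) -> int:
    """Hash a tuple into Z_N (simplified stand-in for a full-domain hash)."""
    return int.from_bytes(hashlib.sha256(tuple_bytes).digest(), "big") % N

def sign(tuple_bytes: bytes) -> int:
    """Owner side: one RSA signature per tuple."""
    return pow(h(tuple_bytes), D, N)

def condense(signatures) -> int:
    """Server side: multiply the per-tuple signatures into a single value."""
    return prod(signatures) % N

def verify_condensed(tuples, condensed_sig: int) -> bool:
    """Client side: one exponentiation checks the whole result set."""
    expected = prod(h(t) for t in tuples) % N
    return pow(condensed_sig, E, N) == expected

tuples = [b"10|1", b"20|2", b"30|3"]
sigma = condense(sign(t) for t in tuples)
ok = verify_condensed(tuples, sigma)
tampered_ok = verify_condensed([b"10|1", b"20|9", b"30|3"], sigma)
```

The server returns the tuples plus the single value `sigma`, which is where the bandwidth saving comes from. Because all signatures share one modulus, the sketch also shows why only signatures from the same signer can be condensed.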
Data Structure Based
• Merkle Hash Tree Based Method
Server stores the tuples t1 = (10, 1) and t2 = (20, 2) as:
etuple | a | b
x4Z3tfX2ShOSM | 10 | E(1) = 2
JpO8eLTVgwV1E | 20 | E(2) = 5
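Data Structure Based
• Merkle Hash Tree Based Method: example
Service Provider
Table T, attribute A: 1, 2, 3, 4
Merkle Tree over A:
leaves: h(1) h(2) h(3) h(4)
inner nodes: h(1|2) h(3|4)
root: h(1|2|3|4)
signed root: Signature(h(1|2|3|4))
User Device
Holds the Public Key
Query Q: SELECT A FROM T WHERE A = 2
Receives the result T' (A = 2) and the verification object (VO): h(1), h(3|4), Signature(h(1|2|3|4)); recomputes h(2), h(1|2), and h(1|2|3|4) and checks the signature
Results and VOs:
• SELECT A FROM T WHERE A < 3: A = {1, 2}, VO = { h(3), h(4), Signature(h(1|2|3|4)) }
• SELECT A FROM T WHERE A = 2: A = 2, VO = { h(1), h(3|4), Signature(h(1|2|3|4)) }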
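The A = 2 example can be traced in code. The sketch below is a minimal illustration under assumptions: SHA-256 stands in for h, concatenation stands in for "|", and the signature check is replaced by comparing against a root digest the client already trusts. The function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(value: int) -> bytes:
    """Hash of a single attribute value, e.g. h(2)."""
    return h(str(value).encode())

def node(left: bytes, right: bytes) -> bytes:
    """Hash of two children, e.g. h(1|2)."""
    return h(left + right)

# Owner side: build the tree over A = 1, 2, 3, 4.
leaves = [leaf(v) for v in (1, 2, 3, 4)]
h12 = node(leaves[0], leaves[1])
h34 = node(leaves[2], leaves[3])
root = node(h12, h34)  # the owner would sign this digest

# Server side: answer A = 2 and attach the VO { h(1), h(3|4) }.
result = 2
vo = {"h1": leaves[0], "h34": h34}

def verify_a_equals_2(result: int, vo: dict, trusted_root: bytes) -> bool:
    """Client side: rebuild the path from the result up to the root."""
    recomputed = node(node(vo["h1"], leaf(result)), vo["h34"])
    return recomputed == trusted_root

valid = verify_a_equals_2(result, vo, root)
forged = verify_a_equals_2(3, vo, root)  # server lies about the value
```

In the scheme on the slide the client would verify Signature(h(1|2|3|4)) with the public key rather than hold the root itself; the path recomputation is the same either way.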
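Data Structure Based
Merkle Hash Tree Based Method
Advantages
☆ Can verify tuple correctness
☆ Can verify result completeness for simple tuple-level range queries
Disadvantages
◇ Does not support verifying the correctness / completeness of join, sum, or avg
◇ An MHT must be built for every attribute involved in range queries, which is costly
◇ The tree is a full binary tree, so every data update requires adjusting the MHT; maintenance is difficult
◇ Many nodes, so the transmission cost is high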
Current methods
Challenge Token Method [Sion 2005 VLDB]
Fake Tuple Method [Min Xie 2007 VLDB] Freshness [Min Xie 2008 EDBT]
Dual Encryption Method [Haixun Wang 2008 CIKM]
Challenge Token Method
Motivation: to prevent laziness
Data segment: S1
Client to Server: queries {q1, q2, …, qn} and the challenge token H(qi(S1))
Server to Client: results {r1, r2, …, rn} and the index i
H is a one-way hashing function
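The exchange can be sketched as below. This is a hedged illustration, not the protocol from the cited paper: it assumes the client can evaluate the secret query qi on S1 itself, models queries as Python predicates, and uses SHA-256 for H. The data and function names are hypothetical.

```python
import hashlib

def H(rows) -> str:
    """One-way hash of a query result (rows sorted for a canonical form)."""
    return hashlib.sha256(repr(sorted(rows)).encode()).hexdigest()

S1 = [3, 8, 15, 22, 42, 57]  # hypothetical data segment

queries = [
    lambda x: x < 10,       # q1
    lambda x: x % 2 == 0,   # q2
    lambda x: x > 20,       # q3
]

def make_challenge(segment, queries, secret_index: int) -> str:
    """Client: token is H(qi(S1)); the index i is kept secret."""
    return H([x for x in segment if queries[secret_index](x)])

def honest_server(segment, queries, token: str):
    """Server: must actually run the queries to find which result matches."""
    results = [[x for x in segment if q(x)] for q in queries]
    index = next((i for i, r in enumerate(results) if H(r) == token), None)
    return results, index

token = make_challenge(S1, queries, secret_index=1)
results, index = honest_server(S1, queries, token)
```

A lazy server that skips the queries can only guess the index, so a correct `index` is evidence that the batch was executed.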
Fake Tuple Method
Integrity Auditing of Outsourced Data [Min Xie 2007 VLDB]
Outline
Motivation
Security Model & Checking Method
Checking Tuple Generating
Freshness Guarantee
Drawbacks of previous methods
Major concern:
The server side is not a purely traditional relational engine; it has to be combined with additional data structures.
What if the server is just a traditional relational engine?
How to check correctness?
How to check completeness?
A High-Level View of Fake Tuple Method
Only simple selection queries are considered.
The process (the server has stored the fake tuples, which the user knows exactly):
1. The user sends a query Q to the server.
2. The server executes Q on its site and returns the result R to the user.
3. The user checks on its site whether all checking tuples covered by Q are really in the result R.
There is some probability that the attacker escapes being caught.
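The escape probability can be estimated with a simple counting argument. The sketch below assumes the attacker cannot tell checking tuples from original ones and deletes tuples from the correct result uniformly at random; under that assumption the chance that no checking tuple is hit is hypergeometric. The function name and parameters are hypothetical.

```python
from math import comb

def escape_probability(result_size: int, checking_in_result: int, deleted: int) -> float:
    """Probability that `deleted` randomly dropped tuples are all original ones."""
    originals = result_size - checking_in_result
    if deleted > originals:
        return 0.0  # at least one checking tuple must be among the deleted
    return comb(originals, deleted) / comb(result_size, deleted)

# Result of 10 tuples, 2 of them checking tuples, attacker drops 1 tuple.
p_one = escape_probability(10, 2, 1)
# The same result, attacker drops 5 tuples.
p_five = escape_probability(10, 2, 5)
```

The probability falls quickly as the attacker deletes more tuples or as the share of checking tuples grows, which is the trade-off against the extra storage the checking tuples cost.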
Correctness Concern
Goal: protect tuples from being tampered with
Assume the data has n fields {A1, A2, …, An}
A tuple t = (a1, a2, …, an)
Original tuple: header = H(a1 a2 ... an)
Checking tuple: header = H(a1 a2 ... an) + 1
H is a one-way hashing function
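One way to read the header idea is sketched below. It assumes a convention in which an original tuple carries header H(a1 ... an) and a checking tuple carries that value plus one; this convention, the SHA-256 choice for H, and the function names are assumptions made for illustration. The sketch also assumes the whole tuple, header included, is encrypted before upload, so the server cannot run the same test.

```python
import hashlib

def H(attrs) -> int:
    """One-way hash over the attribute values of a tuple."""
    data = "|".join(map(str, attrs)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def make_original(attrs) -> tuple:
    return (H(attrs), *attrs)

def make_checking(attrs) -> tuple:
    # Assumed convention: checking tuples are marked by header + 1.
    return (H(attrs) + 1, *attrs)

def classify(stored) -> str:
    """Client side, after decryption: tell the tuple kinds apart."""
    header, *attrs = stored
    base = H(attrs)
    if header == base:
        return "original"
    if header == base + 1:
        return "checking"
    return "tampered"

header, a, b = make_original((10, 1))
kinds = (
    classify(make_original((10, 1))),
    classify(make_checking((20, 2))),
    classify((header, a, 2)),  # attribute altered, header left unchanged
)
```

Any change to an attribute without a matching header update shows up as "tampered", which is how the header serves the correctness goal.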
Checking Method
Goal of the check: the checking tuples covered by Q should be contained in the result.
Directly testing whether a set Ssub is contained in the result set S can be very inefficient to implement.
Instead, the method compares tuple counts:
CountA: number of checking tuples in the result R
CountS: number of checking tuples that should be contained in R
Check whether CountA equals CountS.
[Figure: original tuples TO, checking tuples TC, and the region covered by query Q]
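The count comparison can be sketched for a range query. The data, the "O"/"C" flag, and the function names below are hypothetical; in the actual scheme the tuple kind would only be recognizable to the client, not stored as a visible flag.

```python
def run_query(table, low, high):
    """Server: a plain range selection on the first attribute."""
    return [t for t in table if low <= t[0] <= high]

def count_a(result, is_checking) -> int:
    """CountA: checking tuples actually present in the result R."""
    return sum(1 for t in result if is_checking(t))

def count_s(checking_values, low, high) -> int:
    """CountS: checking tuples that the query range should cover."""
    return sum(1 for v in checking_values if low <= v <= high)

def audit(result, checking_values, low, high, is_checking) -> bool:
    return count_a(result, is_checking) == count_s(checking_values, low, high)

original = [12, 18, 25, 31]
checking = [15, 22, 40]
stored = [(v, "O") for v in original] + [(v, "C") for v in checking]
is_checking = lambda t: t[1] == "C"  # stand-in for the client-side test

honest = run_query(stored, 10, 30)
lazy = [t for t in honest if t[0] != 22]  # server silently drops one tuple

honest_passes = audit(honest, checking, 10, 30, is_checking)
lazy_passes = audit(lazy, checking, 10, 30, is_checking)
```

Comparing two integers avoids the set containment test, at the price of only detecting deletions that happen to hit a checking tuple.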
A Naïve Checking Tuple Generating Scheme
A naïve solution:
Randomly generate a series of checking tuples
Maintain a copy at the client site
When a query Q is sent to the server, also run the query at the client site, and check whether the checking tuple count of the client-side result equals the count in the result from the server
• Drawbacks:
1. The set of checking tuples must be stored at the client site, which may cause great storage overhead
2. Searching may be inefficient
A Novel Deterministic Method (1)
Instead of storing the tuples, how about storing just a function on the client side?
Intuitive benefits: small storage cost; efficient to get the checking tuple count
A Novel Deterministic Method (2)
Capture the original data's distribution using a histogram
Associate each bucket with the parameters of the deterministic function F (a linear function is assumed here)
Use F together with the distribution / density information of the original data to generate the checking tuples
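A minimal sketch of the idea follows, under the assumption that F is linear within each histogram bucket, so the checking values in a bucket form an arithmetic progression `start + k * step`. The `Bucket` class, its fields, and the sample parameters are hypothetical; a denser bucket would simply get a smaller step.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bucket:
    low: int    # bucket covers [low, high)
    high: int
    start: int  # first checking value, assumed to satisfy low <= start
    step: int   # spacing of checking values in this bucket

    def checking_values(self):
        """Values uploaded to the server as checking tuples."""
        return list(range(self.start, self.high, self.step))

    def count_in(self, q_low: int, q_high: int) -> int:
        """Checking values v with q_low <= v <= q_high, without enumeration."""
        lo = max(q_low, self.start)
        hi = min(q_high, self.high - 1)
        if hi < lo:
            return 0
        first_k = -(-(lo - self.start) // self.step)  # ceiling division
        last_k = (hi - self.start) // self.step
        return max(0, last_k - first_k + 1)

histogram = [Bucket(0, 100, 5, 10), Bucket(100, 200, 103, 25)]

def expected_count(q_low: int, q_high: int) -> int:
    """Client: CountS computed from the bucket parameters alone."""
    return sum(b.count_in(q_low, q_high) for b in histogram)

total = expected_count(20, 160)
```

The client stores only four integers per bucket, yet can compute the expected checking tuple count for any range in time proportional to the number of buckets.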
Checking Tuple Generating
The Generating Process
[Figure: three-step process over a histogram with buckets x1 to x4 and values y1 to y4: (1) Function Generating, (2) Checking Tuple Generating, (3) Encrypt]
Join Auditing
Join two tables T1 and T2:
SELECT * FROM T1, T2 WHERE T1.B = T2.B
There are 4 cases:
• Checking tuples from T1 join with checking tuples from T2
• Checking tuples from T1 join with original tuples from T2
• Original tuples from T1 join with checking tuples from T2
• Original tuples from T1 join with original tuples from T2
[Figure: example join of T1 (attributes A, B, Header1) and T2 (attributes B, C, Header2) on T1.B = T2.B. Each tuple is marked O (original) or C (checking); joined rows such as (3, 4, 1) marked C/O and (4, 5, 10) marked C/C fall into Case 1 to Case 4.]
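The four cases can be made concrete by tagging each joined row with the kinds of its two source tuples. The tables below are hypothetical sample data, and the "O"/"C" flag again stands in for whatever test lets the client recognize checking tuples.

```python
from collections import Counter

# (A, B, kind) and (B, C, kind); kind is "O" (original) or "C" (checking).
T1 = [(3, 4, "C"), (4, 5, "C"), (1, 2, "O"), (2, 3, "O")]
T2 = [(4, 1, "O"), (5, 10, "C"), (2, 2, "O"), (3, 3, "C")]

def join_with_kinds(t1, t2):
    """Equi-join on B, keeping the kind of each contributing tuple."""
    return [
        (a, b1, c, k1, k2)
        for a, b1, k1 in t1
        for b2, c, k2 in t2
        if b1 == b2
    ]

def case_counts(joined) -> Counter:
    """Count joined rows per (kind in T1, kind in T2) combination."""
    return Counter((k1, k2) for *_, k1, k2 in joined)

joined = join_with_kinds(T1, T2)
cases = case_counts(joined)
```

Only the original/original rows belong to the real answer; the client can predict the counts of the other three combinations and compare them with what the server returns.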
A High-Level View of Freshness Guarantee
Accompany “normal” updates with “fake” ones that the attacker cannot tell apart
Clients know about the “fake” operations
Check whether the results of the “fake” updates that should have been executed actually show up in the query result
A High-Level View of Implementation
Four deterministic elements:
At a deterministic time slot
According to deterministic generation functions
To get a deterministic update generation
To generate the deterministic fake tuples
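A hedged sketch of the deterministic chain: the time slot number is fed into a keyed function that only the client can evaluate, and the output is the fake tuple inserted in that slot. The HMAC-based generator, the secret, and the tuple layout `(slot, value)` are assumptions for illustration, not the generation functions of the cited paper.

```python
import hashlib
import hmac

SECRET = b"client-only-secret"  # hypothetical key known only to clients

def fake_update_for_slot(slot: int) -> tuple:
    """Deterministic generation function: slot number to fake tuple."""
    digest = hmac.new(SECRET, str(slot).encode(), hashlib.sha256).digest()
    value = int.from_bytes(digest[:4], "big") % 1000
    return (slot, value)

def expected_fake_tuples(current_slot: int) -> set:
    """All fake tuples that should exist once slot `current_slot` has passed."""
    return {fake_update_for_slot(s) for s in range(current_slot + 1)}

def is_fresh(server_tuples, current_slot: int) -> bool:
    """Client: every fake update due by now must be visible in the result."""
    return expected_fake_tuples(current_slot) <= set(server_tuples)

up_to_date = [fake_update_for_slot(s) for s in range(6)]  # slots 0..5 applied
stale = up_to_date[:4]                                    # slots 4 and 5 ignored

fresh_ok = is_fresh(up_to_date, current_slot=5)
stale_ok = is_fresh(stale, current_slot=5)
```

Because the client can regenerate every fake update from the slot number, it needs no log of past updates to notice that the server is serving an old snapshot.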
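Fake Tuple Method
Advantages
☆ Can verify the correctness of results
☆ Can verify the completeness of result sets for simple queries and join operations
☆ Supports client update operations
☆ The client need not maintain a large number of tuples (it only stores the fake tuple generating function)
☆ Verification works on tuple counts, so efficiency is good
☆ Can guarantee data freshness
Disadvantages
◇ As the original data grows, the fake tuples stored on the server grow proportionally, wasting server storage, lowering query efficiency, and increasing transmission load
◇ Update operations require adjusting and maintaining the fake tuples
◇ The client must maintain a grid structure and adjust it as updates occur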
Dual Encryption Method [Haixun Wang 2008 CIKM]
Outline
Motivation
Implementation details: Data Transformation, Query Composition
Motivation
Security
• What if the two service providers collude to cheat?
Cost
• Using two service providers doubles the cost.
• Run-time cost is also high.
Overview
Builds on top of any encryption scheme that supports queries over encrypted data.
The idea is to encrypt some tuples in a database with two keys. Given a tuple r in the original database T and two encryption keys k and k', the encrypted database Ts can contain both Ek(r) and Ek'(r).
Example: the tuple t = (1) stored at the server
etuple | aI
x4Z3tfX2ShOSM | Ek(1) = 2
JpO8eLTVgwV1E | Ek'(1) = 5
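The two-key idea can be illustrated with a toy model. The sketch below is an assumption-heavy stand-in: a keyed HMAC tag plays the role of an encryption scheme that supports equality queries (it is not real encryption and cannot be decrypted), and the function names and keys are hypothetical.

```python
import hashlib
import hmac

K, K_PRIME = b"primary-key", b"secondary-key"  # hypothetical keys k and k'

def E(key: bytes, value) -> str:
    """Toy deterministic tag that supports equality tests only."""
    return hmac.new(key, repr(value).encode(), hashlib.sha256).hexdigest()

def build_server_table(tuples, dual_subset):
    """Every tuple under k, plus a chosen subset again under k'."""
    return [E(K, r) for r in tuples] + [E(K_PRIME, r) for r in dual_subset]

def server_lookup(table, token: str):
    """Server: equality search over opaque rows."""
    return [row for row in table if row == token]

def audit_dual_value(table, value) -> bool:
    """Client: for a dual-encrypted value, both copies must be returned."""
    primary = server_lookup(table, E(K, value))
    secondary = server_lookup(table, E(K_PRIME, value))
    return len(primary) == len(secondary)

table = build_server_table([1, 2, 3], dual_subset=[1])
cheating = [row for row in table if row != E(K, 1)]  # server hides one copy

honest_ok = audit_dual_value(table, 1)
cheating_ok = audit_dual_value(cheating, 1)
```

The server cannot link the two copies of a tuple, so dropping rows risks removing only one of them. The check is only meaningful for values in the dual-encrypted subset.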
Data Transformation
Dual information
Client concerns:
1. Is t a valid tuple?
2. Which part does t belong to?
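Check Completeness
[Figure: client and server exchange for the completeness check. Recoverable labels: the query q, the query set Q = {(q1), (q2), ..., (qi)}, and a query term under the second key k'.]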
Result-based Checking
Example tuple t = (1, 2, 3):
a1 | a2 | a3
1 | 2 | 3
… | … | …
Generate query q:
Drawback: cannot control the size of the query result
Random Checking
According to the histogram of ai, roughly p percent of the data falls in the range [c0, c1].
Checking tuples may make up only a small percentage of the result.
Far from efficient.
Query-based Method
q: SELECT * FROM T WHERE C;
C: c1, c2, ..., ck
User-specified overlap threshold
relax(c) rule