A. Our contribution
In this paper, our main goal is to develop an efficient
solution for the E2E decentralized e-voting scheme without
the authenticated channel. To obtain this goal, we first redesign
elliptic curve cryptosystem based the privacy-preserving multiparty
sum protocol (PPSP for short) that is a variant of
Yang et al.’s solution [16]. Our PPSP is more efficient than
both the original protocol [16] and the 2-round anonymous
veto protocol [13] that is employed in [3]–[5]. Secondly,
we combine PPSP with a modified authentication method
to obtain the new decentralized e-voting scheme that has a
number of the following advantages:
No trusted party engages in the e-voting system. The
(untrusted) voting server only computes the public parameters
for the voters.
Each voter clicks his choice (e.g., yes/no buttons) on the
voting website to cast his encrypted ballot to the voting
server via the public network (e.g., Internet). No one
knows his selection beyond him (even if there are up
to some voter colluding with the voting server).
Our solution is efficient and convenient. Excepting the
pre-processing parameters stage, each voter only interacts
once with the voting server.
B. Organization
The main content of this paper is organized as follows.
Section II reviews the necessary preliminaries that used in
this work. Our main contribution is presented in Section III.
Finally, Section IV concludes the obtained results of this paper.
44 trang |
Chia sẻ: Tuệ An 21 | Ngày: 08/11/2024 | Lượt xem: 64 | Lượt tải: 0
Bạn đang xem trước 20 trang tài liệu Luận án Nghiên cứu phát triển một số giao thức tính tổng bảo mật hiệu quả trong mô hình dữ liệu phân tán đầy đủ và ứng dụng, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
BỘ GIÁO DỤC
VÀ ĐÀO TẠO
VIỆN HÀN LÂM KHOA HỌC
VÀ CÔNG NGHỆ VIỆT NAM
HỌC VIỆN KHOA HỌC VÀ CÔNG NGHỆ
-------------------------------
VŨ DUY HIẾN
DANH MỤC CÔNG TRÌNH CÔNG BỐ
NGHIÊN CỨU PHÁT TRIỂN MỘT SỐ GIAO THỨC TÍNH
TỔNG BẢO MẬT HIỆU QUẢ TRONG MÔ HÌNH DỮ LIỆU
PHÂN TÁN ĐẦY ĐỦ VÀ ỨNG DỤNG
LUẬN ÁN TIẾN SĨ NGÀNH HỆ THỐNG THÔNG TIN
Mã số: 9 48 01 04
Hà Nội, 2024
Danh mục các công trình của tác giả
1. Duy-Hien Vu, The-Dung Luong, Tu-Bao Ho, and Chung-Tien Nguyen.
Privacy-preserving frequency mining protocol based on elliptic curve ElGamal
cryptosystem. HNUE Journal of Science, 63:87-96, 2018
2. Duy-Hien Vu, The-Dung Luong, Tu-Bao Ho, and Chung-Tien Nguyen. An
Efficient Approach for Electronic Voting Scheme without An Authenticated
Channel. In Proceedings of the 10th International Conference on Knowledge
and Systems Engineering, 376-381. IEEE, 2018
3. Duy-Hien Vu, The-Dung Luong, and Tu-Bao Ho. An efficient approach for
secure multi-party computation without authenticated channel. Information
Sciences, 527:356-368, 2020
4. Duy-Hien Vu, Trong-Sinh Vu, and The-Dung Luong. An efficient and practical
approach for privacy-preserving Naive Bayes classifcation. Journal of
Information Security and Applications, 68, 2022
5. Vu Duy Hien, Luong The Dung, and Hoang Duc Tho. An Efficient Solution for
Privacy-preserving Naive Bayes Classifcation in Fully Distributed Data Model.
Journal of Science and Technology on Information Security, 15:56-62, 2022
1
HNUE JOURNAL OF SCIENCE DOI: ...
Natural Sciences 2018, Volume ..., Issue ..., pp. ...-...
This paper is available online at
PRIVACY-PRESERVING FREQUENCY MINING PROTOCOL BASED ON ELLIPTIC
CURVE ELGAMAL CRYPTOSYSTEM
Vu Duy Hien
1
, Luong The Dung
2
, Ho Tu Bao
3
and Nguyen Chung Tien
2
1
Faculty of Management Information Systems, Banking Academy of Vietnam
2
Faculty of Information Security, Academy of Cryptography Techniques
3
School of Knowledge Science, Japan Advanced Institute of Science and Technology
Abstract. Privacy-preserving frequency mining is a quite simple technique, but it is very
useful in privacy-preserving machine learning and data mining. In this paper, we construct an
elliptic curve analog of the ElGamal system-based protocol for privacy-preserving frequency
mining in fully distributed setting. In comparison to the original protocol of Yang et al., our
solution has much lower communication overhead. Moreover, the experiments show that the
executing time of our proposed solution is also lower than that of the original one.
Keywords: Privacy-preserving data mining, Secure multi-party computation, Elliptic curve
cryptosystem.
1. Introduction
The term data mining has appeared in the database community since 1990s. This term aims to
discover knowledge from large datasets. However, for the data that contains the sensitive and private
information (e.g., the patients' disease information, the customers' income), traditional data mining
process is incompatible. So, the issues of privacy preservation in data mining has attracted a lot of
attention from the research community. This called privacy-preserving data mining (PPDM for short).
Basically, a privacy-preserving data mining solution has three basic properties as follows:
Accuracy: the accuracy of output result is not lost.
Privacy: the sensitive and private information is not disclosed.
Efficiency: the PPDM solution’s performance is high enough to be used to develop the practical
applications.
Where the accuracy and privacy characteristics are strictly required.
There are two approaches to construct a PPDM solution: perturbation-based and cryptographic-
based approaches. The solutions based on the perturbation approach are very efficient, but have a
trade-off between privacy and accuracy. For the PPDM solutions based on cryptography, the privacy
of data holders is safely preserved and the output result is accurately guaranteed, but the performance
is quite poor [1].
In this work, we focus cryptography-based privacy-preserving frequency mining (PPFM for short)
protocol that is a quite simple technique, but it is very useful in privacy-preserving machine
learning and data mining [2]. Furthermore, we consider the PPFM solution for fully distributed
setting where the data set is distributed across a large number of users, and each record is only held by
one party.
Vu Duy Hien, Luong The Dung, Ho Tu Bao and Nguyen Chung Tien
2
In the literature, many cryptographic solutions have proposed for PPFM in fully distributed
setting. They are used to construct the practical applications such as ID3 tree and association rules
mining [2], Naive Bayes classifier [2], electronic voting system [3-5].
To the best of our knowledge, the first cryptographic protocol for PPFM in fully distributed
scenario was introduced in [2] by Yang et al. This solution does not need communication channels
between different users. It also does not require multi-round interaction between any party and the
miner. In addition, this protocol provides strong privacy for each user without loss of accuracy.
However, because the solution of Yang et al. [2] is based on ElGamal cryptosystem, so the
performance of [2] is quite poor.
Lately, Hao et al. proposed a series of election voting systems [3, 4] based on a privacy-
preserving frequency counting protocol that called 2-round anonymous veto [6]. These protocols
can safely protect the information of each voter’s ballot. Moreover, they also guarantee that the
voting result is counted correctly. However, the computational complexity and communication
cost of each voter in [3] are very expensive. Inspiring from the works [6] and [3], the authors
developed the voting scheme [4] using the DRE-i system to compute the restructured public key
for each voter. So the voters’ costs reduce greatly, but the total computational complexity of
voting system increases, even the performance of [4] is poorer than that of [2].
Based on Boneh-Franklin identity-based encryption, Wu et al. constructed a privacy-
preservation protocol [7] for mining of support counts in fully distributed scenario. The authors
show that this protocol is very efficient and practical, but its privacy is not guaranteed since the
secret master key s is known by all parties. Several other protocols [8{11] that have the same ideal
with PPFM have proposed. However, these solutions have the low privacy level, since they need
to use a trusted third party.
Recently, Hao et al. proposed the verifiable classroom voting system [5] that is also based on
elliptic curve analog of the ElGamal system. Although the computational complexity and
communication cost of each voter is optimized, the total computational complexity of the voting
system is equal to that of the protocol [4].
In briefly, most of existing solutions for PPFM in fully distributed setting have a trade-off
between privacy and efficiency. Therefore, it is very significant to develop the efficient PPFM
solutions for fully distributed setting while the accuracy is intact and the privacy is still protected
safely.
In this paper, our main goal is to develop the efficient solution for PPFM in fully distributed
setting. To obtain this goal, we first redesign the original PPFM protocol mentioned in Yang et
al.’s protocol [2]. Next, we optimize this redesigned PPFM protocol based on elliptic curve analog
of the ElGamal system. And therefore, our solution’s performance is better than that of [2]. To
illustrate the efficiency of our solution, we implement it to compute the frequency value for
different numbers of users from 2000 to 10000.
Received July 25, 2018. Revised August 8, 2018. Accepted August 15, 2018.
Contact Vu Duy Hien, e-mail: hienvd@bav.edu.vn.
2. Preliminaries
2.1. Problem definition
In the fully distributed setting, there are 𝑛 users {𝑈1, , 𝑈𝑛}, in which each user 𝑈𝑖 holds a
private boolean value 𝑣𝑖 {0,1}, and the miner who needs to find out the sum of all users’ private
values 𝑠 = ∑ 𝑣𝑖
𝑛
𝑖=1 .
Privacy-preserving frequency mining protocol based on elliptic curve ElGamal cryptosystem
3
Inspiring from the work of Yang et al. [2], we design elliptic curve analog of the ElGamal
system-based PPFM protocol that allows the miner to compute the value s without knowing the
private values.
2.2. Definition of Privacy
In this study, our protocol is based on the semi-honest model that each user must follow the
rules of the protocol, but anyone may be corrupted. Thus, we have the definition of privacy for
frequency mining in fully distributed setting [2, 12] as follows:
Definition 1. Assume that each user 𝑈𝑖 has private keys 𝑝𝑖 , 𝑞𝑖 and public keys 𝑃𝑖, 𝑄𝑖 . A
frequency mining protocol protects each user’s privacy against the miner and 𝑡 corrupted users in
the semi-honest model if, ∀𝐼 ⊆ {1, 2, , 𝑛} such that |𝐼| = 𝑡 , there exists a probabilistic
polynomial-time algorithm M such that:
{𝑀 (𝑠, [𝑣𝑖, 𝑝𝑖, 𝑞𝑖]𝑖∈𝐼 , [𝑃𝑗, 𝑄𝑗]𝑗∉𝐼)} ≡
𝑐 {𝑣𝑖𝑒𝑤𝑀𝑖𝑛𝑒𝑟,{𝑈𝑖}𝑖∈𝐼([𝑣𝑖,𝑝𝑖,𝑞𝑖]𝑖=1
𝑛 )}
Where ≡
𝑐
is computational indistinguishability.
This definition states that the computation is secure and the honest users’ privacy is
guaranteed, if the miner and the corrupted users learn nothing from the output s and the public
values of the honest users.
2.3. Elliptic curve analog of the ElGamal system
In this section, we review elliptic curve analog of the ElGamal system [13] that is the main
facility to construct our solution.
Let 𝐸(𝐹𝑑) be an elliptic curve over a finite field 𝐹𝑞 with a point 𝑂 at infinity and q be a large
prime, in which the discrete logarithm problem on the elliptic curve 𝐸 is hard. In addition, G is a
base point of the elliptic curve E with order q (i.e., 𝑞. 𝐺 = 𝑂). The private key is the random
number 𝑑 [1; 𝑞 − 1], and the corresponding public key curve point is 𝑄 = 𝑑. 𝐺.
To encrypt the plaintext m, the sender uses the public key 𝑄 to compute the ciphertext 𝐶
from the plaintext m as follows: he randomly chooses k from [1; 𝑞 − 1] and computes the
ciphertext 𝐶(𝐶1 = 𝑃𝑚 + 𝑘. 𝑄; 𝐶2 = 𝑘. 𝐺) where 𝑃𝑚 is a point of 𝐸 and 𝑥𝑃𝑚 = 𝑚. To decrypt
the ciphertext 𝐶 using the private key 𝑑 , the receiver may compute 𝑚 = 𝑥𝑀 , in which 𝑀 =
𝐶1 + (−𝑑. 𝐶2).
Under the decisional Diffie-Hellman assumption for the curve E, elliptic curve analog of the
ElGamal system is semantically secure.
3. Privacy-preserving frequency mining protocol in fully distributed
setting
3.1. Setup
Let 𝐸(𝐹𝑑) be an elliptic curve with a point 𝑂 at infinity and d be a large prime, in which the
discrete logarithm problem on the elliptic curve 𝐸 is hard. In addition, 𝐺 is a base point of the
elliptic curve E with order d (i.e., 𝑑. 𝐺 = 𝑂).
Each user 𝑈𝑖 keeps a private value 𝑣𝑖 {0,1}. Nobody knows this value, beyond him. Before
the PPFM protocol starts, each user chooses two private keys 𝑝𝑖, 𝑞𝑖 [1; 𝑑 − 1], after that he
computes the corresponding public keys 𝑃𝑖 = 𝑝𝑖 . 𝐺, 𝑄𝑖 = 𝑞𝑖. 𝐺. These public keys sent to the miner
before the protocol starts.
3.2. Protocol
Vu Duy Hien, Luong The Dung, Ho Tu Bao and Nguyen Chung Tien
4
The PPFM protocol in fully distributed setting consists of three main phases described in
Figure 1.
PHASE 1: PRE-COMPUTING
Miner pre-computes the public values:
𝑃 = ∑ 𝑝𝑖
𝑛
𝑖=1 ; 𝑄 = ∑ 𝑞𝑖
𝑛
𝑖=1
Miner 𝑼𝒊: 𝑃, 𝑄
PHASE 2: COMPUTING THE MESSAGE
𝑼𝒊 computes:
𝑀𝑖 = 𝑣𝑖 . 𝐺 + 𝑞𝑖. 𝑃 − 𝑝𝑖 . 𝑄
𝑼𝒊 Miner: 𝑀𝑖
PHASE 3: SECURE FREQUENCY COMPUTATION
Miner computes:
𝑀 = ∑ 𝑀𝑖
𝑛
𝑖=1
𝐾 ∶= 𝑂.
𝐹𝑜𝑟 𝑠 = 0 𝑡𝑜 𝑛:
𝐼𝑓 𝐾 = 𝑀, 𝑡ℎ𝑒𝑛 𝑜𝑢𝑡𝑝𝑢𝑡 𝑠.
𝐸𝑙𝑠𝑒 𝐾 ∶= 𝐾 + 𝐺.
Figure 1. A privacy-preserving frequency mining protocol for fully distributed setting
3.3. Proof of Correctness
In this section, we show that the final output of the PPFM protocol in fully distributed setting
based on elliptic curve analog of the ElGamal system is the sum of all parties’ private values. To
do this, we prove the following theorem.
Theorem 1. The protocol for privacy-preserving frequency mining presented in Figure 1
exactly counts the number of 1’s values of all users’ inputs.
Proof. We show that, in this protocol, if the miner finds out a value s, then s is the secure sum
of all parties’ private values.
Suppose that s.G = M. Then:
s.G = ∑ 𝑀𝑖
𝑛
𝑖=1
s.G = ∑ (𝑣𝑖. 𝐺 + 𝑞𝑖. 𝑃 − 𝑝𝑖 . 𝑄)
𝑛
𝑖=1
s.G = ∑ 𝑣𝑖 . 𝐺 + ∑ (𝑞𝑖 ∑ 𝑃𝑘
𝑛
𝑘=1 − 𝑝𝑖 ∑ 𝑄𝑘
𝑛
𝑘=1 )
𝑛
𝑖=1
𝑛
𝑖=1
s.G = ∑ 𝑣𝑖 . 𝐺 + ∑ 𝑞𝑖 ∑ 𝑝𝑘 . 𝐺
𝑛
𝑘=1 − ∑ 𝑝𝑖
𝑛
𝑖 ∑ 𝑞𝑘 . 𝐺
𝑛
𝑘=1
𝑛
𝑖=1
𝑛
𝑖=1
s.G = ∑ 𝑣𝑖 . 𝐺
𝑛
𝑖=1
Thus, 𝑠. 𝐺 = ∑ 𝑣𝑖. 𝐺
𝑛
𝑖=1 , and therefore 𝑠 = ∑ 𝑣𝑖
𝑛
𝑖=1 . Note that the value of s is not too large,
so it can be computed by the brute-force method.
3.4. Privacy Analysis
In this section, we first prove that the PPFM protocol in fully distributed setting protects each
honest user’s privacy in the semi-honest model under the necessary assumptions. Then, we show
that this protocol still preserves each honest user’s privacy in the case of (𝑛 − 2) parties
colluding with the miner.
We recall that, each user 𝑈𝑖 only sends a point 𝑀𝑖 that is the ciphertext of his private value.
This point is represented as the following equation:
Privacy-preserving frequency mining protocol based on elliptic curve ElGamal cryptosystem
5
𝑀𝑖 = 𝑣𝑖 . 𝐺 − 𝑝𝑖 . 𝑄 + 𝑞𝑖 ∑ 𝑝𝑘 . 𝐺
𝑛
𝑘=1
We easily decide that the ciphertext 𝑀𝑖 is equivalent to the first part of an elliptic curve
analog of the ElGamal (𝑃𝑚 + 𝑞𝑖. 𝑃, 𝑞𝑖. 𝐺) respectively 𝑃𝑚 = 𝑣𝑖 . 𝐺 − 𝑝𝑖 . 𝑄, the private key is ∑ 𝑝𝑖
and 𝑞𝑖 is uniformly chosen at random from [1,2, , 𝑑 − 1]. Under the decisional Diffie-Hellman
assumption on the elliptic curve, the elliptic curve analog of the ElGamal cryptosystem is
semantically secure. Thus, our protocol preserves each honest user’s privacy in the semi-honest
model.
Continuously, we prove that the new privacy-preserving sum protocol protects each user’s
privacy (even if there are up to 𝑛 − 2 users colluding with the miner) as long as the elliptic curve
analog of the ElGamal encryption scheme is secure. We have the following theorem:
Theorem 2. The protocol for privacy-preserving frequency mining in fully distributed setting
presented in Figure 1 protects each honest user’s privacy against the miner and up to (𝑛 − 2)
corrupted users.
Proof. We construct a simulator M that simulates computing the joint view of the miner and
the corrupted users by a polynomial time algorithm. In particular, we give an algorithm that
computes the view of the miner and the corrupted users in polynomial time only using the final
output s, corrupted users’ knowledge, public keys, and some elliptic curve analog of the ElGamal
encryption. Therefore, combining our algorithm with a simulator for the ciphertexts, we obtain a
complete proof.
Without loss of generality, we assume that 𝑈1 and 𝑈2 do not collude and 𝐼 = {3, 4, , 𝑛}. In
the protocol presented in Figure 1, each user only sends a point 𝑀𝑖 to the miner. So our algorithm
only simulates the computation for 𝑀1, 𝑀2. Below is the computations of simulator M based on
the view of the miner and the corrupted users using some encryption as its input: (𝑈12, 𝑉12) =
{𝑣2. 𝐺 + 𝑞1. (𝑝2. 𝐺), 𝑝2. 𝐺}, (𝑈21, 𝑉21) = {𝑣1. 𝐺 + 𝑞2. (𝑝1. 𝐺), 𝑝1. 𝐺}.
Simulator M computes 𝑀1, 𝑀2 as follows:
𝑀1
′ = 𝑈12 + 𝑄1. ∑ 𝑝𝑖
𝑖∈𝐼
− 𝑈21 − 𝑃1. ∑ 𝑞𝑖
𝑖∈𝐼
𝑀2
′ = 𝑈21 + 𝑄2. ∑ 𝑝𝑖
𝑖∈𝐼
− 𝑈12 − 𝑃2. ∑ 𝑞𝑖
𝑖∈𝐼
Thus, following the definition 1, our PPFM protocol for fully distributed scenario is
semantically secure.
3.5. Performance Evaluation
In this section, we implement our solution and the original protocol [2] in the C# language of
Visual Studio 2010 environment, using the System.Numerics namespace to compare the
performance of them (i.e., communication overhead and time complexity). Note that all public
key operations in our protocol are defined over the safe curve 25519 [14], and the protocol [2]
uses 256 𝑏𝑖𝑡𝑠 private keys and 3072 𝑏𝑖𝑡𝑠 public keys that have the same security level with the
curve 25519. Moreover, our experiments run on the laptop with a 2.6𝐺𝐻𝑧 Intel core 𝑖5 processor
and 4𝐺𝐵 memory.
For the communication overhead comparison, we consider the number of communication
messages and these length (bits) in all phases of our solution and the protocol [2].
For the time complexity comparison, we measure the total executing time of each protocol
for different numbers of users, from 2000 to 10000. This time consists of the time for each user
Vu Duy Hien, Luong The Dung, Ho Tu Bao and Nguyen Chung Tien
6
to perform the necessary computations and the time required for the miner. We assume that all
users perform their tasks at the same time, and the network latency is not included in the total
executing time.
3.5.1. Communication Overhead
Considering the protocol of Yang et al. [2], before this protocol starts, each user needs to
send two public keys to the miner. After the miner computes two public keys, he sends these keys
for all users. In the first phase of [2], each user 𝑈𝑖 also needs to send two values 𝑚𝑖; ℎ𝑖 to the
miner. Because each public key is 3072 bits length, the protocol [2] exchanges 6n messages using
18432n bits where n is the number of users.
For our solution, before it starts, each user needs to send two public keys (i.e., two points) to
the miner. Next, in the first phase, the miner computes two public keys, after that he sends them to
all users. In the second phase, each user needs to only send a point 𝑀𝑖 to the miner. Because each
point of the curve consists of two elements in which each element is 256 bits length, so our
solution only exchanges 10n messages using 2560n bits in which n is the number of users.
Table 1 presents the communication overhead comparison between our solution and Yang et
al.’s protocol [2]. We can see that our solution exchanges more number of messages than the
protocol of Yang et al. However, the proposed solution transfers much lower number of bits than
the protocol [2].
Table 1. The communication overhead comparison between our solution and Yang et al.’s
protocol.
Protocols The number of messages The number of bits
The protocol [2] 6n 18432n
Our solution 10n 2560n
3.5.2. Time complexity of the protocol
As presented before, the new protocol is improved from the solution [2]. In particular, in
Yang et al.’s protocol, each user must compute two values 𝑚𝑖 and ℎ𝑖 to send to the miner. Based
on the tuples of two values, the miner computes the multiplication of the values
𝑚𝑖
ℎ𝑖
. Hence, the
computational complexity of the miner is high.
Unlike the protocol [2], in our solution, each user only computes a unique point 𝑀𝑖 and the
miner only computes the sum of the points 𝑀𝑖 . However, this only makes each user’s
computational complexity increase negligibly. Furthermore, the computational complexity of the
miner reduces greatly. Thus, the total executing time of our protocol is much lower than that of
the original protocols of Yang et al. as shown in Figure 2.
Privacy-preserving frequency mining protocol based on elliptic curve ElGamal cryptosystem
7
Figure 2. The computing frequency value time in fully distributed setting comparisons between
our solution and Yang et al.’s protocol
According to the comparison results, we can state that our solution is more efficient