Luận án Nghiên cứu phát triển một số giao thức tính tổng bảo mật hiệu quả trong mô hình dữ liệu phân tán đầy đủ và ứng dụng

A. Our contribution In this paper, our main goal is to develop an efficient solution for the E2E decentralized e-voting scheme without the authenticated channel. To obtain this goal, we first redesign elliptic curve cryptosystem based the privacy-preserving multiparty sum protocol (PPSP for short) that is a variant of Yang et al.’s solution [16]. Our PPSP is more efficient than both the original protocol [16] and the 2-round anonymous veto protocol [13] that is employed in [3]–[5]. Secondly, we combine PPSP with a modified authentication method to obtain the new decentralized e-voting scheme that has a number of the following advantages: No trusted party engages in the e-voting system. The (untrusted) voting server only computes the public parameters for the voters. Each voter clicks his choice (e.g., yes/no buttons) on the voting website to cast his encrypted ballot to the voting server via the public network (e.g., Internet). No one knows his selection beyond him (even if there are up to some voter colluding with the voting server). Our solution is efficient and convenient. Excepting the pre-processing parameters stage, each voter only interacts once with the voting server. B. Organization The main content of this paper is organized as follows. Section II reviews the necessary preliminaries that used in this work. Our main contribution is presented in Section III. Finally, Section IV concludes the obtained results of this paper.

44 trang | Chia sẻ: Tuệ An 21 | Ngày: 08/11/2024 | Lượt xem: 568 | Lượt tải: 0

Bạn đang xem trước 20 trang tài liệu Luận án Nghiên cứu phát triển một số giao thức tính tổng bảo mật hiệu quả trong mô hình dữ liệu phân tán đầy đủ và ứng dụng, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên

BỘ GIÁO DỤC VÀ ĐÀO TẠO VIỆN HÀN LÂM KHOA HỌC VÀ CÔNG NGHỆ VIỆT NAM HỌC VIỆN KHOA HỌC VÀ CÔNG NGHỆ ------------------------------- VŨ DUY HIẾN DANH MỤC CÔNG TRÌNH CÔNG BỐ NGHIÊN CỨU PHÁT TRIỂN MỘT SỐ GIAO THỨC TÍNH TỔNG BẢO MẬT HIỆU QUẢ TRONG MÔ HÌNH DỮ LIỆU PHÂN TÁN ĐẦY ĐỦ VÀ ỨNG DỤNG LUẬN ÁN TIẾN SĨ NGÀNH HỆ THỐNG THÔNG TIN Mã số: 9 48 01 04 Hà Nội, 2024 Danh mục các công trình của tác giả 1. Duy-Hien Vu, The-Dung Luong, Tu-Bao Ho, and Chung-Tien Nguyen. Privacy-preserving frequency mining protocol based on elliptic curve ElGamal cryptosystem. HNUE Journal of Science, 63:87-96, 2018 2. Duy-Hien Vu, The-Dung Luong, Tu-Bao Ho, and Chung-Tien Nguyen. An Efficient Approach for Electronic Voting Scheme without An Authenticated Channel. In Proceedings of the 10th International Conference on Knowledge and Systems Engineering, 376-381. IEEE, 2018 3. Duy-Hien Vu, The-Dung Luong, and Tu-Bao Ho. An efficient approach for secure multi-party computation without authenticated channel. Information Sciences, 527:356-368, 2020 4. Duy-Hien Vu, Trong-Sinh Vu, and The-Dung Luong. An efficient and practical approach for privacy-preserving Naive Bayes classifcation. Journal of Information Security and Applications, 68, 2022 5. Vu Duy Hien, Luong The Dung, and Hoang Duc Tho. An Efficient Solution for Privacy-preserving Naive Bayes Classifcation in Fully Distributed Data Model. Journal of Science and Technology on Information Security, 15:56-62, 2022 1 HNUE JOURNAL OF SCIENCE DOI: ... Natural Sciences 2018, Volume ..., Issue ..., pp. ...-... This paper is available online at PRIVACY-PRESERVING FREQUENCY MINING PROTOCOL BASED ON ELLIPTIC CURVE ELGAMAL CRYPTOSYSTEM Vu Duy Hien 1 , Luong The Dung 2 , Ho Tu Bao 3 and Nguyen Chung Tien 2 1 Faculty of Management Information Systems, Banking Academy of Vietnam 2 Faculty of Information Security, Academy of Cryptography Techniques 3 School of Knowledge Science, Japan Advanced Institute of Science and Technology Abstract. Privacy-preserving frequency mining is a quite simple technique, but it is very useful in privacy-preserving machine learning and data mining. In this paper, we construct an elliptic curve analog of the ElGamal system-based protocol for privacy-preserving frequency mining in fully distributed setting. In comparison to the original protocol of Yang et al., our solution has much lower communication overhead. Moreover, the experiments show that the executing time of our proposed solution is also lower than that of the original one. Keywords: Privacy-preserving data mining, Secure multi-party computation, Elliptic curve cryptosystem. 1. Introduction The term data mining has appeared in the database community since 1990s. This term aims to discover knowledge from large datasets. However, for the data that contains the sensitive and private information (e.g., the patients' disease information, the customers' income), traditional data mining process is incompatible. So, the issues of privacy preservation in data mining has attracted a lot of attention from the research community. This called privacy-preserving data mining (PPDM for short). Basically, a privacy-preserving data mining solution has three basic properties as follows: Accuracy: the accuracy of output result is not lost. Privacy: the sensitive and private information is not disclosed. Efficiency: the PPDM solution’s performance is high enough to be used to develop the practical applications. Where the accuracy and privacy characteristics are strictly required. There are two approaches to construct a PPDM solution: perturbation-based and cryptographic- based approaches. The solutions based on the perturbation approach are very efficient, but have a trade-off between privacy and accuracy. For the PPDM solutions based on cryptography, the privacy of data holders is safely preserved and the output result is accurately guaranteed, but the performance is quite poor [1]. In this work, we focus cryptography-based privacy-preserving frequency mining (PPFM for short) protocol that is a quite simple technique, but it is very useful in privacy-preserving machine learning and data mining [2]. Furthermore, we consider the PPFM solution for fully distributed setting where the data set is distributed across a large number of users, and each record is only held by one party. Vu Duy Hien, Luong The Dung, Ho Tu Bao and Nguyen Chung Tien 2 In the literature, many cryptographic solutions have proposed for PPFM in fully distributed setting. They are used to construct the practical applications such as ID3 tree and association rules mining [2], Naive Bayes classifier [2], electronic voting system [3-5]. To the best of our knowledge, the first cryptographic protocol for PPFM in fully distributed scenario was introduced in [2] by Yang et al. This solution does not need communication channels between different users. It also does not require multi-round interaction between any party and the miner. In addition, this protocol provides strong privacy for each user without loss of accuracy. However, because the solution of Yang et al. [2] is based on ElGamal cryptosystem, so the performance of [2] is quite poor. Lately, Hao et al. proposed a series of election voting systems [3, 4] based on a privacy- preserving frequency counting protocol that called 2-round anonymous veto [6]. These protocols can safely protect the information of each voter’s ballot. Moreover, they also guarantee that the voting result is counted correctly. However, the computational complexity and communication cost of each voter in [3] are very expensive. Inspiring from the works [6] and [3], the authors developed the voting scheme [4] using the DRE-i system to compute the restructured public key for each voter. So the voters’ costs reduce greatly, but the total computational complexity of voting system increases, even the performance of [4] is poorer than that of [2]. Based on Boneh-Franklin identity-based encryption, Wu et al. constructed a privacy- preservation protocol [7] for mining of support counts in fully distributed scenario. The authors show that this protocol is very efficient and practical, but its privacy is not guaranteed since the secret master key s is known by all parties. Several other protocols [8{11] that have the same ideal with PPFM have proposed. However, these solutions have the low privacy level, since they need to use a trusted third party. Recently, Hao et al. proposed the verifiable classroom voting system [5] that is also based on elliptic curve analog of the ElGamal system. Although the computational complexity and communication cost of each voter is optimized, the total computational complexity of the voting system is equal to that of the protocol [4]. In briefly, most of existing solutions for PPFM in fully distributed setting have a trade-off between privacy and efficiency. Therefore, it is very significant to develop the efficient PPFM solutions for fully distributed setting while the accuracy is intact and the privacy is still protected safely. In this paper, our main goal is to develop the efficient solution for PPFM in fully distributed setting. To obtain this goal, we first redesign the original PPFM protocol mentioned in Yang et al.’s protocol [2]. Next, we optimize this redesigned PPFM protocol based on elliptic curve analog of the ElGamal system. And therefore, our solution’s performance is better than that of [2]. To illustrate the efficiency of our solution, we implement it to compute the frequency value for different numbers of users from 2000 to 10000. Received July 25, 2018. Revised August 8, 2018. Accepted August 15, 2018. Contact Vu Duy Hien, e-mail: hienvd@bav.edu.vn. 2. Preliminaries 2.1. Problem definition In the fully distributed setting, there are 𝑛 users {𝑈1, , 𝑈𝑛}, in which each user 𝑈𝑖 holds a private boolean value 𝑣𝑖 {0,1}, and the miner who needs to find out the sum of all users’ private values 𝑠 = ∑ 𝑣𝑖 𝑛 𝑖=1 . Privacy-preserving frequency mining protocol based on elliptic curve ElGamal cryptosystem 3 Inspiring from the work of Yang et al. [2], we design elliptic curve analog of the ElGamal system-based PPFM protocol that allows the miner to compute the value s without knowing the private values. 2.2. Definition of Privacy In this study, our protocol is based on the semi-honest model that each user must follow the rules of the protocol, but anyone may be corrupted. Thus, we have the definition of privacy for frequency mining in fully distributed setting [2, 12] as follows: Definition 1. Assume that each user 𝑈𝑖 has private keys 𝑝𝑖 , 𝑞𝑖 and public keys 𝑃𝑖, 𝑄𝑖 . A frequency mining protocol protects each user’s privacy against the miner and 𝑡 corrupted users in the semi-honest model if, ∀𝐼 ⊆ {1, 2, , 𝑛} such that |𝐼| = 𝑡 , there exists a probabilistic polynomial-time algorithm M such that: {𝑀 (𝑠, [𝑣𝑖, 𝑝𝑖, 𝑞𝑖]𝑖∈𝐼 , [𝑃𝑗, 𝑄𝑗]𝑗∉𝐼)} ≡ 𝑐 {𝑣𝑖𝑒𝑤𝑀𝑖𝑛𝑒𝑟,{𝑈𝑖}𝑖∈𝐼([𝑣𝑖,𝑝𝑖,𝑞𝑖]𝑖=1 𝑛 )} Where ≡ 𝑐 is computational indistinguishability. This definition states that the computation is secure and the honest users’ privacy is guaranteed, if the miner and the corrupted users learn nothing from the output s and the public values of the honest users. 2.3. Elliptic curve analog of the ElGamal system In this section, we review elliptic curve analog of the ElGamal system [13] that is the main facility to construct our solution. Let 𝐸(𝐹𝑑) be an elliptic curve over a finite field 𝐹𝑞 with a point 𝑂 at infinity and q be a large prime, in which the discrete logarithm problem on the elliptic curve 𝐸 is hard. In addition, G is a base point of the elliptic curve E with order q (i.e., 𝑞. 𝐺 = 𝑂). The private key is the random number 𝑑  [1; 𝑞 − 1], and the corresponding public key curve point is 𝑄 = 𝑑. 𝐺. To encrypt the plaintext m, the sender uses the public key 𝑄 to compute the ciphertext 𝐶 from the plaintext m as follows: he randomly chooses k from [1; 𝑞 − 1] and computes the ciphertext 𝐶(𝐶1 = 𝑃𝑚 + 𝑘. 𝑄; 𝐶2 = 𝑘. 𝐺) where 𝑃𝑚 is a point of 𝐸 and 𝑥𝑃𝑚 = 𝑚. To decrypt the ciphertext 𝐶 using the private key 𝑑 , the receiver may compute 𝑚 = 𝑥𝑀 , in which 𝑀 = 𝐶1 + (−𝑑. 𝐶2). Under the decisional Diffie-Hellman assumption for the curve E, elliptic curve analog of the ElGamal system is semantically secure. 3. Privacy-preserving frequency mining protocol in fully distributed setting 3.1. Setup Let 𝐸(𝐹𝑑) be an elliptic curve with a point 𝑂 at infinity and d be a large prime, in which the discrete logarithm problem on the elliptic curve 𝐸 is hard. In addition, 𝐺 is a base point of the elliptic curve E with order d (i.e., 𝑑. 𝐺 = 𝑂). Each user 𝑈𝑖 keeps a private value 𝑣𝑖 {0,1}. Nobody knows this value, beyond him. Before the PPFM protocol starts, each user chooses two private keys 𝑝𝑖, 𝑞𝑖  [1; 𝑑 − 1], after that he computes the corresponding public keys 𝑃𝑖 = 𝑝𝑖 . 𝐺, 𝑄𝑖 = 𝑞𝑖. 𝐺. These public keys sent to the miner before the protocol starts. 3.2. Protocol Vu Duy Hien, Luong The Dung, Ho Tu Bao and Nguyen Chung Tien 4 The PPFM protocol in fully distributed setting consists of three main phases described in Figure 1. PHASE 1: PRE-COMPUTING  Miner pre-computes the public values: 𝑃 = ∑ 𝑝𝑖 𝑛 𝑖=1 ; 𝑄 = ∑ 𝑞𝑖 𝑛 𝑖=1  Miner 𝑼𝒊: 𝑃, 𝑄 PHASE 2: COMPUTING THE MESSAGE  𝑼𝒊 computes: 𝑀𝑖 = 𝑣𝑖 . 𝐺 + 𝑞𝑖. 𝑃 − 𝑝𝑖 . 𝑄  𝑼𝒊  Miner: 𝑀𝑖 PHASE 3: SECURE FREQUENCY COMPUTATION  Miner computes: 𝑀 = ∑ 𝑀𝑖 𝑛 𝑖=1 𝐾 ∶= 𝑂. 𝐹𝑜𝑟 𝑠 = 0 𝑡𝑜 𝑛: 𝐼𝑓 𝐾 = 𝑀, 𝑡ℎ𝑒𝑛 𝑜𝑢𝑡𝑝𝑢𝑡 𝑠. 𝐸𝑙𝑠𝑒 𝐾 ∶= 𝐾 + 𝐺. Figure 1. A privacy-preserving frequency mining protocol for fully distributed setting 3.3. Proof of Correctness In this section, we show that the final output of the PPFM protocol in fully distributed setting based on elliptic curve analog of the ElGamal system is the sum of all parties’ private values. To do this, we prove the following theorem. Theorem 1. The protocol for privacy-preserving frequency mining presented in Figure 1 exactly counts the number of 1’s values of all users’ inputs. Proof. We show that, in this protocol, if the miner finds out a value s, then s is the secure sum of all parties’ private values. Suppose that s.G = M. Then: s.G = ∑ 𝑀𝑖 𝑛 𝑖=1 s.G = ∑ (𝑣𝑖. 𝐺 + 𝑞𝑖. 𝑃 − 𝑝𝑖 . 𝑄) 𝑛 𝑖=1 s.G = ∑ 𝑣𝑖 . 𝐺 + ∑ (𝑞𝑖 ∑ 𝑃𝑘 𝑛 𝑘=1 − 𝑝𝑖 ∑ 𝑄𝑘 𝑛 𝑘=1 ) 𝑛 𝑖=1 𝑛 𝑖=1 s.G = ∑ 𝑣𝑖 . 𝐺 + ∑ 𝑞𝑖 ∑ 𝑝𝑘 . 𝐺 𝑛 𝑘=1 − ∑ 𝑝𝑖 𝑛 𝑖 ∑ 𝑞𝑘 . 𝐺 𝑛 𝑘=1 𝑛 𝑖=1 𝑛 𝑖=1 s.G = ∑ 𝑣𝑖 . 𝐺 𝑛 𝑖=1 Thus, 𝑠. 𝐺 = ∑ 𝑣𝑖. 𝐺 𝑛 𝑖=1 , and therefore 𝑠 = ∑ 𝑣𝑖 𝑛 𝑖=1 . Note that the value of s is not too large, so it can be computed by the brute-force method. 3.4. Privacy Analysis In this section, we first prove that the PPFM protocol in fully distributed setting protects each honest user’s privacy in the semi-honest model under the necessary assumptions. Then, we show that this protocol still preserves each honest user’s privacy in the case of (𝑛 − 2) parties colluding with the miner. We recall that, each user 𝑈𝑖 only sends a point 𝑀𝑖 that is the ciphertext of his private value. This point is represented as the following equation: Privacy-preserving frequency mining protocol based on elliptic curve ElGamal cryptosystem 5 𝑀𝑖 = 𝑣𝑖 . 𝐺 − 𝑝𝑖 . 𝑄 + 𝑞𝑖 ∑ 𝑝𝑘 . 𝐺 𝑛 𝑘=1 We easily decide that the ciphertext 𝑀𝑖 is equivalent to the first part of an elliptic curve analog of the ElGamal (𝑃𝑚 + 𝑞𝑖. 𝑃, 𝑞𝑖. 𝐺) respectively 𝑃𝑚 = 𝑣𝑖 . 𝐺 − 𝑝𝑖 . 𝑄, the private key is ∑ 𝑝𝑖 and 𝑞𝑖 is uniformly chosen at random from [1,2, , 𝑑 − 1]. Under the decisional Diffie-Hellman assumption on the elliptic curve, the elliptic curve analog of the ElGamal cryptosystem is semantically secure. Thus, our protocol preserves each honest user’s privacy in the semi-honest model. Continuously, we prove that the new privacy-preserving sum protocol protects each user’s privacy (even if there are up to 𝑛 − 2 users colluding with the miner) as long as the elliptic curve analog of the ElGamal encryption scheme is secure. We have the following theorem: Theorem 2. The protocol for privacy-preserving frequency mining in fully distributed setting presented in Figure 1 protects each honest user’s privacy against the miner and up to (𝑛 − 2) corrupted users. Proof. We construct a simulator M that simulates computing the joint view of the miner and the corrupted users by a polynomial time algorithm. In particular, we give an algorithm that computes the view of the miner and the corrupted users in polynomial time only using the final output s, corrupted users’ knowledge, public keys, and some elliptic curve analog of the ElGamal encryption. Therefore, combining our algorithm with a simulator for the ciphertexts, we obtain a complete proof. Without loss of generality, we assume that 𝑈1 and 𝑈2 do not collude and 𝐼 = {3, 4, , 𝑛}. In the protocol presented in Figure 1, each user only sends a point 𝑀𝑖 to the miner. So our algorithm only simulates the computation for 𝑀1, 𝑀2. Below is the computations of simulator M based on the view of the miner and the corrupted users using some encryption as its input: (𝑈12, 𝑉12) = {𝑣2. 𝐺 + 𝑞1. (𝑝2. 𝐺), 𝑝2. 𝐺}, (𝑈21, 𝑉21) = {𝑣1. 𝐺 + 𝑞2. (𝑝1. 𝐺), 𝑝1. 𝐺}. Simulator M computes 𝑀1, 𝑀2 as follows: 𝑀1 ′ = 𝑈12 + 𝑄1. ∑ 𝑝𝑖 𝑖∈𝐼 − 𝑈21 − 𝑃1. ∑ 𝑞𝑖 𝑖∈𝐼 𝑀2 ′ = 𝑈21 + 𝑄2. ∑ 𝑝𝑖 𝑖∈𝐼 − 𝑈12 − 𝑃2. ∑ 𝑞𝑖 𝑖∈𝐼 Thus, following the definition 1, our PPFM protocol for fully distributed scenario is semantically secure. 3.5. Performance Evaluation In this section, we implement our solution and the original protocol [2] in the C# language of Visual Studio 2010 environment, using the System.Numerics namespace to compare the performance of them (i.e., communication overhead and time complexity). Note that all public key operations in our protocol are defined over the safe curve 25519 [14], and the protocol [2] uses 256 𝑏𝑖𝑡𝑠 private keys and 3072 𝑏𝑖𝑡𝑠 public keys that have the same security level with the curve 25519. Moreover, our experiments run on the laptop with a 2.6𝐺𝐻𝑧 Intel core 𝑖5 processor and 4𝐺𝐵 memory. For the communication overhead comparison, we consider the number of communication messages and these length (bits) in all phases of our solution and the protocol [2]. For the time complexity comparison, we measure the total executing time of each protocol for different numbers of users, from 2000 to 10000. This time consists of the time for each user Vu Duy Hien, Luong The Dung, Ho Tu Bao and Nguyen Chung Tien 6 to perform the necessary computations and the time required for the miner. We assume that all users perform their tasks at the same time, and the network latency is not included in the total executing time. 3.5.1. Communication Overhead Considering the protocol of Yang et al. [2], before this protocol starts, each user needs to send two public keys to the miner. After the miner computes two public keys, he sends these keys for all users. In the first phase of [2], each user 𝑈𝑖 also needs to send two values 𝑚𝑖; ℎ𝑖 to the miner. Because each public key is 3072 bits length, the protocol [2] exchanges 6n messages using 18432n bits where n is the number of users. For our solution, before it starts, each user needs to send two public keys (i.e., two points) to the miner. Next, in the first phase, the miner computes two public keys, after that he sends them to all users. In the second phase, each user needs to only send a point 𝑀𝑖 to the miner. Because each point of the curve consists of two elements in which each element is 256 bits length, so our solution only exchanges 10n messages using 2560n bits in which n is the number of users. Table 1 presents the communication overhead comparison between our solution and Yang et al.’s protocol [2]. We can see that our solution exchanges more number of messages than the protocol of Yang et al. However, the proposed solution transfers much lower number of bits than the protocol [2]. Table 1. The communication overhead comparison between our solution and Yang et al.’s protocol. Protocols The number of messages The number of bits The protocol [2] 6n 18432n Our solution 10n 2560n 3.5.2. Time complexity of the protocol As presented before, the new protocol is improved from the solution [2]. In particular, in Yang et al.’s protocol, each user must compute two values 𝑚𝑖 and ℎ𝑖 to send to the miner. Based on the tuples of two values, the miner computes the multiplication of the values 𝑚𝑖 ℎ𝑖 . Hence, the computational complexity of the miner is high. Unlike the protocol [2], in our solution, each user only computes a unique point 𝑀𝑖 and the miner only computes the sum of the points 𝑀𝑖 . However, this only makes each user’s computational complexity increase negligibly. Furthermore, the computational complexity of the miner reduces greatly. Thus, the total executing time of our protocol is much lower than that of the original protocols of Yang et al. as shown in Figure 2. Privacy-preserving frequency mining protocol based on elliptic curve ElGamal cryptosystem 7 Figure 2. The computing frequency value time in fully distributed setting comparisons between our solution and Yang et al.’s protocol According to the comparison results, we can state that our solution is more efficient

Các file đính kèm theo tài liệu này:

luan_an_nghien_cuu_phat_trien_mot_so_giao_thuc_tinh_tong_bao.pdf
HienVD-abstract-EN.pdf
HienVD-abstract-VN.pdf
HienVD-thesis.pdf
NCS VuDuyHien. Mẫu 4-HV Trang thông tin đóng góp mới TV TA.docx
QĐ 187 ngay 25.3.2024 vv HD danh gia cap HV Vu Duy Hien_0001.pdf
Trang thông tin đóng góp mới TV TA_0001.pdf