MINISTRY OF EDUCATION
AND TRAINING
VIETNAM ACADEMY OF
SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
-------------------------------
DAO XUAN KY
APPLY MARKOV CHAINS MODEL AND
FUZZY TIME SERIES FOR FORECASTING
Major: Math Fundamentals for Informatics
Code: 62.46.01.10
SUMMARY OF MATHEMATICS DOCTORAL
DISSERTATION
Ha Noi, 2017
This work is completed at:
Graduate University of Science and Technology
Vietnam Academy of Science and Technology
Supervisor 1: Assoc. Prof. Dr. Doan Van Ban
Supervisor 2: Dr. Nguyen Van Hung
Reviewer 1:
Reviewer 2:
Reviewer 3:
This Dissertation will be officially presented in front of the Doctoral Dissertation
Grading Committee, meeting at:
Graduate University of Science and Technology
Vietnam Academy of Science and Technology
At . hrs . day . month. year .
This Dissertation is available at:
1. Library of Graduate University of Science and Technology
2. National Library of Vietnam
LIST OF PUBLISHED WORKS
[1] Dao Xuan Ky and Luc Tri Tuyen. A Markov-fuzzy combination model for stock market forecasting. International Journal of Applied Mathematics and Statistics, 55(3):109–121, 2016.
[2] Đào Xuân Kỳ, Lục Trí Tuyên, and Phạm Quốc Vương. A combination of higher order Markov model and fuzzy time series for stock market forecasting. In Hội thảo lần thứ 19: Một số vấn đề chọn lọc của Công nghệ thông tin và truyền thông (The 19th Workshop on Selected Problems of Information and Communication Technology), Hà Nội, pages 1–6, 2016.
[3] Đào Xuân Kỳ, Lục Trí Tuyên, Phạm Quốc Vương, and Thạch Thị Ninh. Mô hình Markov-chuỗi thời gian mờ trong dự báo chứng khoán (A Markov-fuzzy time series model for stock forecasting). In Hội thảo lần thứ 18: Một số vấn đề chọn lọc của Công nghệ thông tin và truyền thông (The 18th Workshop on Selected Problems of Information and Communication Technology), TP HCM, pages 119–124, 2015.
[4] Lục Trí Tuyên, Nguyễn Văn Hùng, Thạch Thị Ninh, Phạm Quốc Vương, Nguyễn Minh Đức, and Đào Xuân Kỳ. A normal-hidden Markov model in forecasting stock index. Journal of Computer Science and Cybernetics, 28(3):206–216, 2012.
[5] Dao Xuan Ky and Luc Tri Tuyen. A higher order Markov model for time series forecasting. International Journal of Applied Mathematics and Statistics, 57(3):1–18, 2018.
Introduction
Forecasting a time series whose predictive variable X changes over time, while achieving predictive accuracy, has always been a challenge to scientists, not only in Vietnam but also globally. The reason is that it is not easy to find a suitable probability distribution for the predicted variable at each point of time t. Historical data need to be collected and analyzed in order to find a good fit. However, a distribution can only fit the data at a particular time in the series, and varies at other points of time. Therefore, using a single fixed distribution for the predicted variable is not appropriate for this analysis.
For the above-mentioned reason, building a time series forecasting model requires connecting historical and future values, that is, setting up a model of the dependence between the data observed at the present time t and those in the past at times t-1, t-2, ... If a relation of the form
X_t = φ_1 X_{t-1} + φ_2 X_{t-2} + ... + φ_p X_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}
is set up, we obtain an autoregressive integrated moving average (ARIMA) [15] model. This model is widely applied thanks to its well-developed theory, and is integrated into almost all current statistical software such as Eviews, SPSS, Matlab, R, etc.
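As a toy illustration of the autoregressive part of such a model (the thesis itself relies on statistical packages such as R, not on this code), the following sketch simulates an AR(1) series X_t = φ·X_{t-1} + ε_t and recovers φ by least squares; the coefficient 0.7, the seed, and the series length are arbitrary illustrative choices:

```python
import random

# Simulate an AR(1) process X_t = phi * X_{t-1} + eps_t (a special case of ARIMA(1,0,0)).
random.seed(42)
phi_true = 0.7          # assumed autoregressive coefficient (illustrative)
n = 5000
x = [0.0]
for _ in range(n - 1):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))

# Least-squares estimate: phi_hat = sum(x_{t-1} x_t) / sum(x_{t-1}^2).
num = sum(x[t - 1] * x[t] for t in range(1, n))
den = sum(x[t - 1] ** 2 for t in range(1, n))
phi_hat = num / den
print(round(phi_hat, 2))   # typically close to 0.7 for a series this long
```

In practice one would use a packaged ARIMA routine, which also handles the moving-average terms and differencing; this sketch only shows the dependence on past values that the equation above describes.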
However, many real time series turn out not to evolve linearly, so a model such as ARIMA does not fit them. R. Parrelli pointed out in [28] that there are nonlinear relations in the variance indicators of economic or financial time series. The generalized autoregressive conditional heteroskedasticity (GARCH) model [25,28] is the most popular nonlinear time series forecasting model to mention. Its limitation lies in the assumption that the data follow a fixed distribution (usually the normal distribution), whereas real data often show statistically significant skewness [39] (while the normal distribution is symmetric). Another time series forecasting approach is the Artificial Neural Network (ANN), which was developed more recently. ANN models are not based on a predetermined distribution of the data; instead, like the human brain, they try to find rules and paths through training on data, experimental testing, and summarizing of results. ANN models are usually used for data classification purposes [23]. More recently, a new theory of statistical machine learning called the Support Vector Machine (SVM), serving both forecasting and classification, has caught the attention of scientists [36,11,31]. SVM is applied widely in many areas such as function approximation, regression analysis, and forecasting [11,31]. Its biggest limitation is that, with huge training sets, it requires enormous computation as well as the complexity of solving the underlying regression problem.
To address the limitations and promote the strengths of existing models, a new research direction was introduced, called combined analysis (CA), i.e. the combination of different methods to increase forecast accuracy. Numerous studies have been conducted along this line, and many combined models have been published [43,5,6]. Some methods use the Markov chain (MC) as well as the hidden Markov model (HMM). Rafiul Hassan [19] developed a combined model by matching an HMM with an ANN and a GA to forecast next-day stock prices. This model aims to identify patterns in the historical data similar to the current one; the ANN and GA are then used to interpolate the neighboring values of the identified data pattern. Yang [41] combined the HMM with a synchronous clustering technique to increase the accuracy of the forecasting model. The weighted Markov model was used by Peng [27] in predicting and analyzing the disease transmission rate in Jiangsu, China. These combined models proved to give practical and meaningful results, and to increase prediction accuracy compared to traditional models [27,41,19]. Nevertheless, despite their significant improvements in prediction accuracy, the above-mentioned models still face difficulties with fuzzy data (data containing uncertain elements).
To deal with fuzzy data, a new research direction was introduced recently, called Fuzzy Time Series (FTS). The first results of this theory worth mentioning are those of Song and Chissom [34]. Subsequent studies focused on improving the fuzzy time series model and finding better ways to perform the forecasting analysis. Jilani and Nan combined a heuristic model with the fuzzy time series model to improve its accuracy [24]. Chen and Hwang extended the fuzzy time series model to a two-factor model [14], and Hwang and Yu then developed it into an n-factor model to forecast stock indicators [21]. In a recent paper [35], BaiQing Sun extended the fuzzy time series model to a multi-order model to forecast future stock prices. Qisen Cai [10] combined the fuzzy time series model with ant colony optimization and regression to obtain a better outcome. In Vietnam, the fuzzy time series model has recently been applied in a number of specific areas; examples include the study of Nguyen Duy Hieu and colleagues [2] on semantic analysis. Additionally, Nguyen Cong Dieu [3,4] combined the fuzzy time series model with techniques for adjusting model parameters or exploiting specific characteristics of the data in order to improve forecast accuracy. Nguyen Cat Ho [1] used hedge algebras in the fuzzy time series model, achieving higher forecast accuracy than several existing models.
Up to now, although many new models combining existing ones have been proposed to improve forecast accuracy, these models tend to be complex while their accuracy does not improve accordingly. Therefore, another direction arises: to simplify the model while maintaining forecast accuracy.
The objective of this dissertation focuses on two key issues. The first is to model a time series by states, each of which is a specified probability distribution (the normal distribution), and to evaluate the suitability of this model on experimental data. The second is to combine the Markov chain model with a new fuzzy time series model to improve forecast accuracy and, in addition, to extend the higher order Markov chain model to accommodate seasonal data.
The dissertation consists of three chapters. Chapter I presents an overview of Markov chains, hidden Markov models, and fuzzy time series models. Chapter II presents the modelling of a time series by states, in which 1) each state is a normal distribution with mean μ_i and variance σ_i², i = 1,2,...,m, where m is the number of states; and 2) the states evolve over time following a Markov chain. The model was then tested on the VN-Index to evaluate its forecasting performance. The last part of Chapter II analyzes the limitations and mismatches between forecasting models and a fixed probability distribution, as motivation for the combined model proposed in Chapter III. Chapter III presents the combination of Markov chain and fuzzy time series models for time series forecasting. This chapter also presents the extended higher order Markov chains, with two variants: the conventional higher order Markov chain (CMC) and the improved higher order Markov chain (IMC). These models were then programmed in the R language and tested on the same data sets as the models used for comparison.
Chapter 1 - Overview & Proposal
1.1. Markov chain
1.1.1. Definitions
Consider an economic or physical system S with m possible states, labeled by the set I = {1,2,...,m}. The system S evolves randomly in discrete time (t = 0,1,2,...,n,...); let C_n denote the random variable corresponding to the state of the system S at time n (C_n ∈ I).
Definition 1.1.1. A sequence of random variables (C_n, n ∈ ℕ) is a Markov chain if and only if, for all c_0, c_1, ..., c_n ∈ I:
Pr(C_n = c_n | C_0 = c_0, C_1 = c_1, ..., C_{n-1} = c_{n-1}) = Pr(C_n = c_n | C_{n-1} = c_{n-1})   (1.1.1)
(provided these conditional probabilities are well defined).
Definition 1.1.2. A Markov chain is called homogeneous if and only if the probability in (1.1.1) does not depend on n, and non-homogeneous otherwise.
For the time being we consider the homogeneous case, in which
γ_ij = Pr(C_n = j | C_{n-1} = i),
and the matrix Γ is defined by
Γ = (γ_ij).
To fully determine the evolution of a Markov chain, it is necessary to fix an initial distribution for the state C_0, for example a vector
p = (p_1, p_2, ..., p_m).
In this chapter we restrict ourselves to homogeneous Markov chains, each characterized by the pair (p, Γ).
Definition 1.2.3. A transition matrix Γ is called regular if there exists a positive integer k such that all elements of the matrix Γ^k are strictly positive.
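Definition 1.2.3 can be checked numerically. A minimal sketch (ours, not part of the dissertation) raises Γ to successive powers and tests whether every entry becomes strictly positive; the two example matrices and the power cutoff are illustrative assumptions:

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(gamma, max_power=50):
    """Return True if some power Gamma^k (k <= max_power) has all entries > 0."""
    power = gamma
    for _ in range(max_power):
        if all(entry > 0 for row in power for entry in row):
            return True
        power = mat_mul(power, gamma)
    return False

# A chain that cannot stay in state 1 but mixes after two steps: regular.
gamma1 = [[0.0, 1.0],
          [0.5, 0.5]]
# A deterministic 2-cycle: every power has zero entries, so not regular.
gamma2 = [[0.0, 1.0],
          [1.0, 0.0]]
print(is_regular(gamma1), is_regular(gamma2))
```

Truncating at a finite power is only a practical test; the definition itself quantifies over all positive integers k.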
1.1.2. Markov chain classification
Take i ∈ I and let d(i) be the greatest common divisor of the set of integers n such that γ_ii^(n) > 0.
Definition 1.2.4. If d(i) > 1, state i is called periodic with period d(i). If d(i) = 1, state i is aperiodic.
It is easy to see that if γ_ii > 0 then i is aperiodic. The converse, however, is not true in general.
Definition 1.2.5. A Markov chain all of whose states are aperiodic is called an aperiodic Markov chain.
Definition 1.2.6. State i is said to reach state j (written i → j) if there exists an integer n such that γ_ij^(n) > 0; i ↛ j means that i cannot reach j.
Definition 1.2.7. States i and j are said to communicate if i → j and j → i, or if i = j. We write i ↔ j.
Definition 1.2.8. State i is called essential if it communicates with every state that it reaches; otherwise it is called inessential.
The relation ↔ is an equivalence relation on the state space I and therefore partitions I into classes. The equivalence class containing i is denoted Cl(i).
Definition 1.2.9. A Markov chain is called irreducible if it has only one equivalence class.
Definition 1.2.10. A subset E of the state space I is called closed if
Σ_{j∈E} γ_ij = 1 for every i ∈ E.
Definition 1.2.11. A state i ∈ I of the Markov chain (C_t) is called recurrent if, starting from i, the chain returns to i with probability one; otherwise i is called transient.
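The period d(i) of Definition 1.2.4 can be approximated by inspecting the diagonal of Γ^n for small n. A sketch (ours; the example matrices and the finite power cutoff are assumptions, since d(i) is defined over all n):

```python
from math import gcd

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def period(gamma, i, max_power=24):
    """d(i): gcd of all n <= max_power with (Gamma^n)_{ii} > 0 (0 if none found)."""
    d = 0
    power = gamma
    for n in range(1, max_power + 1):
        if power[i][i] > 0:
            d = gcd(d, n)   # gcd(0, n) == n, so the first hit initializes d
        power = mat_mul(power, gamma)
    return d

# Deterministic 2-cycle: both states have period 2.
cycle = [[0.0, 1.0],
         [1.0, 0.0]]
# Chain with a self-loop in state 0: gamma_00 > 0, hence aperiodic (d = 1).
loopy = [[0.5, 0.5],
         [1.0, 0.0]]
print(period(cycle, 0), period(loopy, 0))
```

The `loopy` example also illustrates the remark above: a positive diagonal entry forces d(i) = 1, while the converse may fail.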
1.1.3. Markov matrix estimation
Consider a Markov chain (C_t), t = 1,2,..., and suppose we observe n successive states c_1, c_2, ..., c_n. Writing c^(n) = (c_1, c_2, ..., c_n) for the realization of the random variables C^(n), the likelihood of the transition probability matrix is given by:
Pr(C^(n) = c^(n)) = Pr(C_1 = c_1) ∏_{t=2}^{n} Pr(C_t = c_t | C_{t-1} = c_{t-1})
                  = Pr(C_1 = c_1) ∏_{t=2}^{n} γ_{c_{t-1} c_t}.
Define the transition counts n_ij as the number of times state i is immediately followed by state j in the chain c^(n); then the likelihood takes the form
L(p) = Pr(C_1 = c_1) ∏_{i=1}^{m} ∏_{j=1}^{m} γ_ij^{n_ij}.
We need to maximize the likelihood function L(p) with respect to the unknowns γ_ij. To solve this problem, we first take the logarithm of L(p), turning the product into a sum whose derivative is easy to compute:
ℓ(p) = log L(p) = log Pr(C_1 = c_1) + Σ_{i,j} n_ij log γ_ij.
Since Σ_j γ_ij = 1, we can substitute γ_i1 = 1 − Σ_{j=2}^{m} γ_ij for each i and differentiate with respect to the remaining parameters:
∂ℓ/∂γ_ij = n_ij/γ_ij − n_i1/γ_i1.
Setting the derivative to zero at γ̂_ij gives
n_ij/γ̂_ij = n_i1/γ̂_i1, i.e. γ̂_ij/γ̂_i1 = n_ij/n_i1,
which holds for all j; summing over j and using Σ_j γ̂_ij = 1 yields
γ̂_ij = n_ij / Σ_{j=1}^{m} n_ij.
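Since the maximum-likelihood estimate γ̂_ij = n_ij / Σ_j n_ij reduces to counting transitions, it is straightforward to implement. A minimal sketch (ours), with a made-up 0/1 state sequence:

```python
from collections import Counter

def estimate_transition_matrix(states, m):
    """Maximum-likelihood estimate gamma_hat[i][j] = n_ij / sum_j n_ij.
    `states` is a sequence of observed states in {0, ..., m-1}."""
    counts = Counter(zip(states, states[1:]))           # transition counts n_ij
    gamma_hat = []
    for i in range(m):
        row_total = sum(counts[(i, j)] for j in range(m))
        gamma_hat.append([counts[(i, j)] / row_total if row_total else 0.0
                          for j in range(m)])
    return gamma_hat

chain = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]   # a made-up observed state sequence
print(estimate_transition_matrix(chain, 2))
```

Each row of the estimate sums to one, as a transition matrix must; states never visited get an all-zero row, which in practice would be handled by smoothing.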
1.2. Hidden Markov Model
An HMM consists of two basic components: a chain of observations {X_t, t = 1,...,T} and a chain of hidden states {C_t ∈ {1,2,...,m}, t = 1,...,T} that generates those observations. In fact, the HMM is a special case of the dependent mixture model [16], with the C_t playing the role of mixture components.
1.2.1. Definitions and notation
Let X^(t) and C^(t) denote the histories of observations and states from time 1 to time t. The simplest HMM can then be summarized as follows:
Pr(C_t | C^(t−1)) = Pr(C_t | C_{t−1}),  t = 2,3,...,T,
Pr(X_t | X^(t−1), C^(t)) = Pr(X_t | C_t),  t ∈ ℕ.
We now introduce some notation used throughout the study. In the case of discrete observations, define
p_i(x) = Pr(X_t = x | C_t = i).
In the continuous case, p_i(x) is the probability density function of X_t given that the Markov chain is in state i at time t.
We denote the transition matrix of the homogeneous Markov chain by Γ, with components γ_ij defined by
γ_ij = Pr(C_{t+1} = j | C_t = i).
From now on, the m distributions p_i(x) are called the state-dependent distributions of the model.
1.2.2. Likelihood and maximum likelihood estimation
For a discrete observation X_t, define u_i(t) = Pr(C_t = i) for i = 1,2,...,m. We have
Pr(X_t = x) = Σ_{i=1}^{m} Pr(C_t = i) Pr(X_t = x | C_t = i) = Σ_{i=1}^{m} u_i(t) p_i(x).   (1.2.1)
For convenience in calculation, formula (1.2.1) can be rewritten in matrix form:
Pr(X_t = x) = (u_1(t),...,u_m(t)) diag(p_1(x),...,p_m(x)) 1′ = u(t) P(x) 1′,
in which P(x) is the diagonal matrix whose i-th diagonal element is p_i(x). On the other hand, by the Markov property, u(t) = u(1) Γ^{t−1}, where u(1) is the initial distribution of the chain, usually taken to be the stationary distribution δ. Thus we have
Pr(X_t = x) = u(1) Γ^{t−1} P(x) 1′.   (1.2.2)
Now let L_T denote the likelihood of the model given T observations x_1, x_2, ..., x_T, i.e. L_T = Pr(X^(T) = x^(T)). From the joint probability formula
Pr(X^(T), C^(T)) = Pr(C_1) ∏_{k=2}^{T} Pr(C_k | C_{k−1}) ∏_{k=1}^{T} Pr(X_k | C_k),   (1.2.3)
summing over all possible state sequences C^(T) and proceeding as for formula (1.2.2), we obtain
L_T = δ P(x_1) Γ P(x_2) ... Γ P(x_T) 1′.
If the initial distribution δ is the stationary distribution of the Markov chain, then
L_T = δ Γ P(x_1) Γ P(x_2) ... Γ P(x_T) 1′.
To compute the likelihood efficiently, reducing the number of operations the computer needs to perform, we define the vectors α_t for t = 1,...,T by
α_t = δ P(x_1) Γ P(x_2) ... Γ P(x_t) = δ P(x_1) ∏_{s=2}^{t} Γ P(x_s).   (1.2.4)
Then we have
L_T = α_T 1′ and α_t = α_{t−1} Γ P(x_t), t ≥ 2.   (1.2.5)
Hence L_T is easily computed by a recursive algorithm. To find the parameter set maximizing L_T, two methods can be used:
Direct maximization of L_T (MLE): Starting from equation (1.2.5), we compute the logarithm of L_T, which is advantageous for finding the maximum, based on the forward probabilities α_t. For t = 0,1,...,T, define the vectors φ_t = α_t / w_t, where w_t = α_t 1′ = Σ_i α_t(i), and write B_t = Γ P(x_t). We have
φ_0 = δ, w_0 = δ 1′ = 1;
w_t φ_t = w_{t−1} φ_{t−1} B_t;
L_T = α_T 1′ = w_T (φ_T 1′) = w_T.
Then L_T = w_T = ∏_{t=1}^{T} (w_t / w_{t−1}). From the relation above we have w_t = w_{t−1} (φ_{t−1} B_t 1′), and therefore
log L_T = Σ_{t=1}^{T} log(w_t / w_{t−1}) = Σ_{t=1}^{T} log(φ_{t−1} B_t 1′).
EM algorithm: This algorithm, called the Baum–Welch algorithm [9], applies to a homogeneous Markov chain (not necessarily stationary). It uses forward probabilities (FWP) and backward probabilities (BWP) to compute L_T.
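The scaled forward recursion above translates directly into code. The sketch below (ours, with invented toy parameters δ, Γ, and p_i) accumulates log L_T = Σ_t log(φ_{t-1} B_t 1′) while renormalizing φ_t to avoid numerical underflow:

```python
import math

def hmm_loglik(delta, gamma, p, xs):
    """log L_T via the scaled forward recursion w_t phi_t = w_{t-1} phi_{t-1} B_t,
    with B_t = Gamma P(x_t).  delta: initial distribution; gamma: transition
    matrix; p[i][x]: state-dependent probability of symbol x in state i."""
    m = len(delta)
    phi = delta[:]                     # phi_0 = delta (so w_0 = 1)
    loglik = 0.0
    for x in xs:
        # One step of the recursion: (phi Gamma P(x))_j, component by component.
        v = [sum(phi[i] * gamma[i][j] for i in range(m)) * p[j][x]
             for j in range(m)]
        w = sum(v)                     # scale factor w_t / w_{t-1} = phi B_t 1'
        loglik += math.log(w)
        phi = [vj / w for vj in v]     # renormalize to keep values well scaled
    return loglik, phi                 # phi = phi_T, reused for forecasting

# Toy 2-state model with binary observations (all numbers are illustrative).
delta = [0.5, 0.5]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
p = [[0.8, 0.2],    # state 0 mostly emits symbol 0
     [0.3, 0.7]]    # state 1 mostly emits symbol 1
ll, phi_T = hmm_loglik(delta, gamma, p, [0, 0, 1, 1, 1])
print(ll)
```

This implements the stationary-start form L_T = δ Γ P(x_1) ... Γ P(x_T) 1′ used by the scaled recursion; for a non-stationary start, the first factor would be δ P(x_1) instead.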
1.2.3. Forecasting distribution
For discrete observations, the forecasting distribution Pr(X_{T+h} = x | X^(T) = x^(T)) is a ratio of likelihoods, by conditional probability:
Pr(X_{T+h} = x | X^(T) = x^(T)) = Pr(X^(T) = x^(T), X_{T+h} = x) / Pr(X^(T) = x^(T))
 = δ P(x_1) B_2 B_3 ... B_T Γ^h P(x) 1′ / (δ P(x_1) B_2 B_3 ... B_T 1′)
 = α_T Γ^h P(x) 1′ / (α_T 1′).
Writing φ_T = α_T / (α_T 1′), we have
Pr(X_{T+h} = x | X^(T) = x^(T)) = φ_T Γ^h P(x) 1′.
The forecasting distribution can thus be written as a mixture of the state-dependent distributions:
Pr(X_{T+h} = x | X^(T) = x^(T)) = Σ_{i=1}^{m} ξ_i(h) p_i(x),
where the weight ξ_i(h) is the i-th component of the vector φ_T Γ^h.
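The mixture form of the forecast distribution is easy to compute once φ_T is available. A sketch (ours; φ_T, Γ, and the p_i are invented toy values) that also shows the weights ξ(h) = φ_T Γ^h settling down as h grows:

```python
def mat_vec(v, g):
    """Row vector times matrix: (v Gamma)_j."""
    m = len(v)
    return [sum(v[i] * g[i][j] for i in range(m)) for j in range(m)]

def forecast_distribution(phi_T, gamma, p, h):
    """Pr(X_{T+h} = x | X^(T)) = sum_i xi_i(h) p_i(x), with xi(h) = phi_T Gamma^h."""
    xi = phi_T[:]
    for _ in range(h):
        xi = mat_vec(xi, gamma)
    n_symbols = len(p[0])
    return [sum(xi[i] * p[i][x] for i in range(len(xi))) for x in range(n_symbols)]

# Toy filtered state distribution phi_T and model parameters (illustrative).
phi_T = [0.7, 0.3]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
p = [[0.8, 0.2],
     [0.3, 0.7]]
for h in (1, 5, 50):
    print(h, forecast_distribution(phi_T, gamma, p, h))
```

As h grows, φ_T Γ^h approaches the stationary distribution of Γ (here (2/3, 1/3)), so the long-horizon forecast converges to the unconditional mixture, in line with the remark in Section 1.2.5.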
1.2.4. Viterbi algorithm
The objective of the Viterbi algorithm is to find the best state sequence i_1, i_2, ..., i_T corresponding to the observation sequence x_1, x_2, ..., x_T, i.e. the sequence maximizing the joint probability Pr(C^(T) = c^(T), X^(T) = x^(T)). Set
ξ_{1i} = Pr(C_1 = i, X_1 = x_1) = δ_i p_i(x_1),
and, for t = 2,3,...,T,
ξ_{ti} = max_{c_1,c_2,...,c_{t−1}} Pr(C^(t−1) = c^(t−1), C_t = i, X^(t) = x^(t)).
Then it can be seen that the probabilities ξ_{tj} satisfy the following recursion for t = 2,3,...,T and j = 1,2,...,m:
ξ_{tj} = (max_i ξ_{t−1,i} γ_ij) p_j(x_t).
The best state sequence i_1, i_2, ..., i_T is determined by backward recursion from
i_T = argmax_{i=1,...,m} ξ_{Ti},
and, for t = T−1, T−2, ..., 1,
i_t = argmax_{i=1,...,m} (ξ_{ti} γ_{i,i_{t+1}}).
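The ξ_{tj} recursion with backtracking can be sketched as follows (our code, with invented toy parameters); storing the argmax index at each step is equivalent to the backward formula for i_t above:

```python
def viterbi(delta, gamma, p, xs):
    """Most likely state sequence via the xi_{tj} recursion with backtracking."""
    m = len(delta)
    xi = [delta[i] * p[i][xs[0]] for i in range(m)]          # xi_{1i}
    back = []                                                # argmax pointers
    for x in xs[1:]:
        prev, xi, ptr = xi, [], []
        for j in range(m):
            # best predecessor of state j: argmax_i xi_{t-1,i} gamma_ij
            best_i = max(range(m), key=lambda i: prev[i] * gamma[i][j])
            ptr.append(best_i)
            xi.append(prev[best_i] * gamma[best_i][j] * p[j][x])
        back.append(ptr)
    # Backtrack from i_T = argmax_i xi_{Ti}
    path = [max(range(m), key=lambda i: xi[i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy 2-state model with binary observations (illustrative numbers).
delta = [0.5, 0.5]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
p = [[0.8, 0.2],
     [0.3, 0.7]]
print(viterbi(delta, gamma, p, [0, 0, 0, 1, 1, 1]))
```

In practice the products are computed in log space to avoid underflow for long sequences; the plain products are kept here to mirror the formulas above.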
1.2.5. State forecasting
For state forecasting we simply use the classical Bayes formula. For i = 1,2,...,m,
Pr(C_{T+h} = i | X^(T) = x^(T)) = α_T Γ^h(·, i) / L_T = (φ_T Γ^h)_i,
where Γ^h(·, i) denotes the i-th column of Γ^h. Note that, as h → ∞, Γ^h converges to the stationary distribution of the Markov chain.
1.3. Fuzzy time series
1.3.1. Some concepts
Let U be the universe of discourse; this space determines a set of objects. If A is a crisp subset of U, then we can determine exactly its characteristic function χ_A: U → {0,1}, where χ_A(u) = 1 if u ∈ A and χ_A(u) = 0 otherwise.
Definition 1.3.1. [34]: Let U be the universe of discourse with U = {u_1, u_2, ..., u_n}. A fuzzy set A in U is defined by
A = f_A(u_1)/u_1 + f_A(u_2)/u_2 + ... + f_A(u_n)/u_n,
where f_A is the membership function of the fuzzy set A, f_A: U → [0;1], and f_A(u_i) is the degree of membership (grade) of u_i in A.
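Definition 1.3.1 can be made concrete by storing the membership function as a table. A sketch (ours; the universe and the membership grades are invented for illustration):

```python
# Universe of discourse U = {u1, ..., u5}, e.g. five intervals of an index.
U = ["u1", "u2", "u3", "u4", "u5"]

# Fuzzy set A = 0.0/u1 + 0.5/u2 + 1.0/u3 + 0.5/u4 + 0.0/u5
# ("values around u3"), represented by its membership function f_A.
f_A = {"u1": 0.0, "u2": 0.5, "u3": 1.0, "u4": 0.5, "u5": 0.0}

def membership(u):
    """Degree of membership f_A(u) in [0, 1]; 0 for points outside the table."""
    return f_A.get(u, 0.0)

# A crisp set, by contrast, has a {0, 1}-valued characteristic function;
# one common bridge is an alpha-cut, here at level 0.5.
crisp_A = {u for u in U if membership(u) >= 0.5}
print(membership("u3"), sorted(crisp_A))
```

The contrast with the characteristic function above is the essential point: a crisp set admits only grades 0 and 1, while a fuzzy set allows any grade in [0, 1].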
Definition 1.3.2. [34]: Let Y(t) (t = 0,1,2,...) be a time series whose values lie in the universe of discourse,