MINISTRY OF EDUCATION
AND TRAINING
VIETNAM ACADEMY OF
SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
-------------------------------
DAO XUAN KY
APPLY MARKOV CHAINS MODEL AND
FUZZY TIME SERIES FOR FORECASTING
Major: Math Fundamentals for Informatics
Code: 62.46.01.10
SUMMARY OF MATHEMATICS DOCTORAL
DISSERTATION
Ha Noi, 2017
This work is completed at:
Graduate University of Science and Technology
Vietnam Academy of Science and Technology
Supervisor 1: Assoc. Prof. Dr. Doan Van Ban
Supervisor 2: Dr. Nguyen Van Hung
Reviewer 1:
Reviewer 2:
Reviewer 3:
This Dissertation will be officially presented in front of the Doctoral Dissertation
Grading Committee, meeting at:
Graduate University of Science and Technology
Vietnam Academy of Science and Technology
At . hrs . day . month. year .
This Dissertation is available at:
1. Library of Graduate University of Science and Technology
2. National Library of Vietnam
LIST OF PUBLISHED WORKS
[1] Dao Xuan Ky and Luc Tri Tuyen. A Markov-fuzzy combination model for stock market forecasting. International Journal of Applied Mathematics and Statistics, 55(3):109–121, 2016.
[2] Đào Xuân Kỳ, Lục Trí Tuyên, and Phạm Quốc Vương. A combination of higher order Markov model and fuzzy time series for stock market forecasting. In Hội thảo lần thứ 19: Một số vấn đề chọn lọc của Công nghệ thông tin và truyền thông (The 19th Workshop on Selected Problems of Information and Communication Technology), Hà Nội, pages 1–6, 2016.
[3] Đào Xuân Kỳ, Lục Trí Tuyên, Phạm Quốc Vương, and Thạch Thị Ninh. Mô hình Markov-chuỗi thời gian mờ trong dự báo chứng khoán (A Markov-fuzzy time series model for stock forecasting). In Hội thảo lần thứ 18: Một số vấn đề chọn lọc của Công nghệ thông tin và truyền thông (The 18th Workshop on Selected Problems of Information and Communication Technology), TP HCM, pages 119–124, 2015.
[4] Lục Trí Tuyên, Nguyễn Văn Hùng, Thạch Thị Ninh, Phạm Quốc Vương, Nguyễn Minh Đức, and Đào Xuân Kỳ. A normal-hidden Markov model in forecasting stock index. Journal of Computer Science and Cybernetics, 28(3):206–216, 2012.
[5] Dao Xuan Ky and Luc Tri Tuyen. A higher order Markov model for time series forecasting. International Journal of Applied Mathematics and Statistics, 57(3):1–18, 2018.
Introduction
Forecasting a time series whose predictive variable X changes over time, while achieving predictive accuracy, has always been a challenge to scientists, not only in Vietnam but also globally. The reason is that it is not easy to find a suitable probability distribution for the predicted variable at each point of time t. Historical data need to be collected and analyzed in order to find a good fit. However, a distribution can only fit the data at a particular time in the series, and varies at other points of time. Therefore, using a single fixed distribution for the predicted variable is not appropriate for this analysis.
For the above-mentioned reason, building a time series forecasting model requires connecting historical and future values, that is, setting up a model of the dependence between the data observed at the present time t and those in the past at times t-1, t-2, ... If a relation of the form
X_t = φ_1 X_{t-1} + φ_2 X_{t-2} + ... + φ_p X_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}
is set up, we obtain an autoregressive integrated moving average (ARIMA) [15] model. This model is widely applied thanks to its well-developed theory, and is integrated into almost all current statistical software such as Eviews, SPSS, Matlab, R, etc.
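As a toy illustration of the autoregressive part of such a model (the thesis itself relies on statistical packages such as R, not on this code), the following sketch simulates an AR(1) series X_t = φ·X_{t-1} + ε_t and recovers φ by least squares; the coefficient 0.7, the seed, and the series length are arbitrary illustrative choices:

```python
import random

# Simulate an AR(1) process X_t = phi * X_{t-1} + eps_t (a special case of ARIMA(1,0,0)).
random.seed(42)
phi_true = 0.7          # assumed autoregressive coefficient (illustrative)
n = 5000
x = [0.0]
for _ in range(n - 1):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))

# Least-squares estimate: phi_hat = sum(x_{t-1} x_t) / sum(x_{t-1}^2).
num = sum(x[t - 1] * x[t] for t in range(1, n))
den = sum(x[t - 1] ** 2 for t in range(1, n))
phi_hat = num / den
print(round(phi_hat, 2))   # typically close to 0.7 for a series this long
```

In practice one would use a packaged ARIMA routine, which also handles the moving-average terms and differencing; this sketch only shows the dependence on past values that the equation above describes.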
However, many real time series turn out not to evolve linearly, so a model such as ARIMA does not fit them. R. Parrelli pointed out in [28] that there are nonlinear relations in the variance indicators of economic or financial time series. The generalized autoregressive conditional heteroskedasticity (GARCH) model [25,28] is the most popular nonlinear time series forecasting model to mention. Its limitation lies in the assumption that the data follow a fixed distribution (usually the normal distribution), whereas real data often show statistically significant skewness [39] (while the normal distribution is symmetric). Another time series forecasting approach is the Artificial Neural Network (ANN), which was developed more recently. ANN models are not based on a predetermined distribution of the data; instead, like the human brain, they try to find rules and paths through training on data, experimental testing, and summarizing of results. ANN models are usually used for data classification purposes [23]. More recently, a new theory of statistical machine learning called the Support Vector Machine (SVM), serving both forecasting and classification, has caught the attention of scientists [36,11,31]. SVM is applied widely in many areas such as function approximation, regression analysis, and forecasting [11,31]. Its biggest limitation is that, with huge training sets, it requires enormous computation as well as the complexity of solving the underlying regression problem.
To address the limitations and promote the strengths of existing models, a new research direction was introduced, called combined analysis (CA), i.e. the combination of different methods to increase forecast accuracy. Numerous studies have been conducted along this line, and many combined models have been published [43,5,6]. Some methods use the Markov chain (MC) as well as the hidden Markov model (HMM). Rafiul Hassan [19] developed a combined model by matching an HMM with an ANN and a GA to forecast next-day stock prices. This model aims to identify patterns in the historical data similar to the current one; the ANN and GA are then used to interpolate the neighboring values of the identified data pattern. Yang [41] combined the HMM with a synchronous clustering technique to increase the accuracy of the forecasting model. The weighted Markov model was used by Peng [27] in predicting and analyzing the disease transmission rate in Jiangsu, China. These combined models proved to give practical and meaningful results, and to increase prediction accuracy compared to traditional models [27,41,19]. Nevertheless, despite their significant improvements in prediction accuracy, the above-mentioned models still face difficulties with fuzzy data (data containing uncertain elements).
To deal with fuzzy data, a new research direction was introduced recently, called Fuzzy Time Series (FTS). The first results of this theory worth mentioning are those of Song and Chissom [34]. Subsequent studies focused on improving the fuzzy time series model and finding better ways to perform the forecasting analysis. Jilani and Nan combined a heuristic model with the fuzzy time series model to improve its accuracy [24]. Chen and Hwang extended the fuzzy time series model to a two-factor model [14], and Hwang and Yu then developed it into an n-factor model to forecast stock indicators [21]. In a recent paper [35], BaiQing Sun extended the fuzzy time series model to a multi-order model to forecast future stock prices. Qisen Cai [10] combined the fuzzy time series model with ant colony optimization and regression to obtain a better outcome. In Vietnam, the fuzzy time series model has recently been applied in a number of specific areas; examples include the study of Nguyen Duy Hieu and colleagues [2] on semantic analysis. Additionally, Nguyen Cong Dieu [3,4] combined the fuzzy time series model with techniques for adjusting model parameters or exploiting specific characteristics of the data in order to improve forecast accuracy. Nguyen Cat Ho [1] used hedge algebras in the fuzzy time series model, achieving higher forecast accuracy than several existing models.
Up to now, although many new models combining existing ones have been proposed to improve forecast accuracy, these models tend to be complex while their accuracy does not improve accordingly. Therefore, another direction arises: to simplify the model while maintaining forecast accuracy.
The objective of this dissertation focuses on two key issues. The first is to model a time series by states, each of which is a specified probability distribution (the normal distribution), and to evaluate the suitability of this model on experimental data. The second is to combine the Markov chain model with a new fuzzy time series model to improve forecast accuracy and, in addition, to extend the higher order Markov chain model to accommodate seasonal data.
The dissertation consists of three chapters. Chapter I presents an overview of Markov chains, hidden Markov models, and fuzzy time series models. Chapter II presents the modelling of a time series by states, in which 1) each state is a normal distribution with mean μ_i and variance σ_i², i = 1,2,...,m, where m is the number of states; and 2) the states evolve over time following a Markov chain. The model was then tested on the VN-Index to evaluate its forecasting performance. The last part of Chapter II analyzes the limitations and mismatches between forecasting models and a fixed probability distribution, as motivation for the combined model proposed in Chapter III. Chapter III presents the combination of Markov chain and fuzzy time series models for time series forecasting. This chapter also presents the extended higher order Markov chains, with two variants: the conventional higher order Markov chain (CMC) and the improved higher order Markov chain (IMC). These models were then programmed in the R language and tested on the same data sets as the models used for comparison.
Chapter 1 - Overview & Proposal
1.1. Markov chain
1.1.1. Definitions
Consider an economic or physical system S with m possible states, labeled by the set I = {1,2,...,m}. The system S evolves randomly in discrete time (t = 0,1,2,...,n,...); let C_n denote the random variable corresponding to the state of the system S at time n (C_n ∈ I).
Definition 1.1.1. A sequence of random variables (C_n, n ∈ ℕ) is a Markov chain if and only if, for all c_0, c_1, ..., c_n ∈ I:
Pr(C_n = c_n | C_0 = c_0, C_1 = c_1, ..., C_{n-1} = c_{n-1}) = Pr(C_n = c_n | C_{n-1} = c_{n-1})   (1.1.1)
(provided these conditional probabilities are well defined).
Definition 1.1.2. A Markov chain is called homogeneous if and only if the probability in (1.1.1) does not depend on n, and non-homogeneous otherwise.
For the time being we consider the homogeneous case, in which
γ_ij = Pr(C_n = j | C_{n-1} = i),
and the matrix Γ is defined by
Γ = (γ_ij).
To fully determine the evolution of a Markov chain, it is necessary to fix an initial distribution for the state C_0, for example a vector
p = (p_1, p_2, ..., p_m).
In this chapter we restrict ourselves to homogeneous Markov chains, each characterized by the pair (p, Γ).
Definition 1.2.3. A transition matrix Γ is called regular if there exists a positive integer k such that all elements of the matrix Γ^k are strictly positive.
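Definition 1.2.3 can be checked numerically. A minimal sketch (ours, not part of the dissertation) raises Γ to successive powers and tests whether every entry becomes strictly positive; the two example matrices and the power cutoff are illustrative assumptions:

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(gamma, max_power=50):
    """Return True if some power Gamma^k (k <= max_power) has all entries > 0."""
    power = gamma
    for _ in range(max_power):
        if all(entry > 0 for row in power for entry in row):
            return True
        power = mat_mul(power, gamma)
    return False

# A chain that cannot stay in state 1 but mixes after two steps: regular.
gamma1 = [[0.0, 1.0],
          [0.5, 0.5]]
# A deterministic 2-cycle: every power has zero entries, so not regular.
gamma2 = [[0.0, 1.0],
          [1.0, 0.0]]
print(is_regular(gamma1), is_regular(gamma2))
```

Truncating at a finite power is only a practical test; the definition itself quantifies over all positive integers k.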
1.1.2. Markov chain classification
Take i ∈ I and let d(i) be the greatest common divisor of the set of integers n such that γ_ii^(n) > 0.
Definition 1.2.4. If d(i) > 1, state i is called periodic with period d(i). If d(i) = 1, state i is aperiodic.
It is easy to see that if γ_ii > 0 then i is aperiodic. The converse, however, is not true in general.
Definition 1.2.5. A Markov chain all of whose states are aperiodic is called an aperiodic Markov chain.
Definition 1.2.6. State i is said to reach state j (written i → j) if there exists an integer n such that γ_ij^(n) > 0; i ↛ j means that i cannot reach j.
Definition 1.2.7. States i and j are said to communicate if i → j and j → i, or if i = j. We write i ↔ j.
Definition 1.2.8. State i is called essential if it communicates with every state that it reaches; otherwise it is called inessential.
The relation ↔ is an equivalence relation on the state space I and therefore partitions I into classes. The equivalence class containing i is denoted Cl(i).
Definition 1.2.9. A Markov chain is called irreducible if it has only one equivalence class.
Definition 1.2.10. A subset E of the state space I is called closed if
Σ_{j∈E} γ_ij = 1 for every i ∈ E.
Definition 1.2.11. A state i ∈ I of the Markov chain (C_t) is called recurrent if, starting from i, the chain returns to i with probability one; otherwise i is called transient.
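The period d(i) of Definition 1.2.4 can be approximated by inspecting the diagonal of Γ^n for small n. A sketch (ours; the example matrices and the finite power cutoff are assumptions, since d(i) is defined over all n):

```python
from math import gcd

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def period(gamma, i, max_power=24):
    """d(i): gcd of all n <= max_power with (Gamma^n)_{ii} > 0 (0 if none found)."""
    d = 0
    power = gamma
    for n in range(1, max_power + 1):
        if power[i][i] > 0:
            d = gcd(d, n)   # gcd(0, n) == n, so the first hit initializes d
        power = mat_mul(power, gamma)
    return d

# Deterministic 2-cycle: both states have period 2.
cycle = [[0.0, 1.0],
         [1.0, 0.0]]
# Chain with a self-loop in state 0: gamma_00 > 0, hence aperiodic (d = 1).
loopy = [[0.5, 0.5],
         [1.0, 0.0]]
print(period(cycle, 0), period(loopy, 0))
```

The `loopy` example also illustrates the remark above: a positive diagonal entry forces d(i) = 1, while the converse may fail.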
1.1.3. Markov matrix estimation
Consider a Markov chain (C_t), t = 1,2,..., and suppose we observe n successive states c_1, c_2, ..., c_n. Writing c^(n) = (c_1, c_2, ..., c_n) for the realization of the random variables C^(n), the likelihood of the transition probability matrix is given by:
Pr(C^(n) = c^(n)) = Pr(C_1 = c_1) ∏_{t=2}^{n} Pr(C_t = c_t | C_{t-1} = c_{t-1})
                  = Pr(C_1 = c_1) ∏_{t=2}^{n} γ_{c_{t-1} c_t}.
Define the transition counts n_ij as the number of times state i is immediately followed by state j in the chain c^(n); then the likelihood takes the form
L(p) = Pr(C_1 = c_1) ∏_{i=1}^{m} ∏_{j=1}^{m} γ_ij^{n_ij}.
We need to maximize the likelihood function L(p) with respect to the unknowns γ_ij. To solve this problem, we first take the logarithm of L(p), turning the product into a sum whose derivative is easy to compute:
ℓ(p) = log L(p) = log Pr(C_1 = c_1) + Σ_{i,j} n_ij log γ_ij.
Since Σ_j γ_ij = 1, we can substitute γ_i1 = 1 − Σ_{j=2}^{m} γ_ij for each i and differentiate with respect to the remaining parameters:
∂ℓ/∂γ_ij = n_ij/γ_ij − n_i1/γ_i1.
Setting the derivative to zero at γ̂_ij gives
n_ij/γ̂_ij = n_i1/γ̂_i1, i.e. γ̂_ij/γ̂_i1 = n_ij/n_i1,
which holds for all j; summing over j and using Σ_j γ̂_ij = 1 yields
γ̂_ij = n_ij / Σ_{j=1}^{m} n_ij.
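Since the maximum-likelihood estimate γ̂_ij = n_ij / Σ_j n_ij reduces to counting transitions, it is straightforward to implement. A minimal sketch (ours), with a made-up 0/1 state sequence:

```python
from collections import Counter

def estimate_transition_matrix(states, m):
    """Maximum-likelihood estimate gamma_hat[i][j] = n_ij / sum_j n_ij.
    `states` is a sequence of observed states in {0, ..., m-1}."""
    counts = Counter(zip(states, states[1:]))           # transition counts n_ij
    gamma_hat = []
    for i in range(m):
        row_total = sum(counts[(i, j)] for j in range(m))
        gamma_hat.append([counts[(i, j)] / row_total if row_total else 0.0
                          for j in range(m)])
    return gamma_hat

chain = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]   # a made-up observed state sequence
print(estimate_transition_matrix(chain, 2))
```

Each row of the estimate sums to one, as a transition matrix must; states never visited get an all-zero row, which in practice would be handled by smoothing.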
1.2. Hidden Markov Model
An HMM consists of two basic components: a chain of observations {X_t, t = 1,...,T} and a chain of hidden states {C_t ∈ {1,2,...,m}, t = 1,...,T} that generates those observations. In fact, the HMM is a special case of the dependent mixture model [16], with the C_t playing the role of mixture components.
1.2.1. Definitions and notation
Let X^(t) and C^(t) denote the histories of observations and states from time 1 to time t. The simplest HMM can then be summarized as follows:
Pr(C_t | C^(t−1)) = Pr(C_t | C_{t−1}),  t = 2,3,...,T,
Pr(X_t | X^(t−1), C^(t)) = Pr(X_t | C_t),  t ∈ ℕ.
We now introduce some notation used throughout the study. In the case of discrete observations, define
p_i(x) = Pr(X_t = x | C_t = i).
In the continuous case, p_i(x) is the probability density function of X_t given that the Markov chain is in state i at time t.
We denote the transition matrix of the homogeneous Markov chain by Γ, with components γ_ij defined by
γ_ij = Pr(C_{t+1} = j | C_t = i).
From now on, the m distributions p_i(x) are called the state-dependent distributions of the model.
1.2.2. Likelihood and maximum likelihood estimation
For a discrete observation X_t, define u_i(t) = Pr(C_t = i) for i = 1,2,...,m. We have
Pr(X_t = x) = Σ_{i=1}^{m} Pr(C_t = i) Pr(X_t = x | C_t = i) = Σ_{i=1}^{m} u_i(t) p_i(x).   (1.2.1)
For convenience in calculation, formula (1.2.1) can be rewritten in matrix form:
Pr(X_t = x) = (u_1(t),...,u_m(t)) diag(p_1(x),...,p_m(x)) 1′ = u(t) P(x) 1′,
in which P(x) is the diagonal matrix whose i-th diagonal element is p_i(x). On the other hand, by the Markov property, u(t) = u(1) Γ^{t−1}, where u(1) is the initial distribution of the chain, usually taken to be the stationary distribution δ. Thus we have
Pr(X_t = x) = u(1) Γ^{t−1} P(x) 1′.   (1.2.2)
Now let L_T denote the likelihood of the model given T observations x_1, x_2, ..., x_T, i.e. L_T = Pr(X^(T) = x^(T)). From the joint probability formula
Pr(X^(T), C^(T)) = Pr(C_1) ∏_{k=2}^{T} Pr(C_k | C_{k−1}) ∏_{k=1}^{T} Pr(X_k | C_k),   (1.2.3)
summing over all possible state sequences C^(T) and proceeding as for formula (1.2.2), we obtain
L_T = δ P(x_1) Γ P(x_2) ... Γ P(x_T) 1′.
If the initial distribution δ is the stationary distribution of the Markov chain, then
L_T = δ Γ P(x_1) Γ P(x_2) ... Γ P(x_T) 1′.
To compute the likelihood efficiently, reducing the number of operations the computer needs to perform, we define the vectors α_t for t = 1,...,T by
α_t = δ P(x_1) Γ P(x_2) ... Γ P(x_t) = δ P(x_1) ∏_{s=2}^{t} Γ P(x_s).   (1.2.4)
Then we have
L_T = α_T 1′ and α_t = α_{t−1} Γ P(x_t), t ≥ 2.   (1.2.5)
Hence L_T is easily computed by a recursive algorithm. To find the parameter set maximizing L_T, two methods can be used:
Direct maximization of L_T (MLE): Starting from equation (1.2.5), we compute the logarithm of L_T, which is advantageous for finding the maximum, based on the forward probabilities α_t. For t = 0,1,...,T, define the vectors φ_t = α_t / w_t, where w_t = α_t 1′ = Σ_i α_t(i), and write B_t = Γ P(x_t). We have
φ_0 = δ, w_0 = δ 1′ = 1;
w_t φ_t = w_{t−1} φ_{t−1} B_t;
L_T = α_T 1′ = w_T (φ_T 1′) = w_T.
Then L_T = w_T = ∏_{t=1}^{T} (w_t / w_{t−1}). From the relation above we have w_t = w_{t−1} (φ_{t−1} B_t 1′), and therefore
log L_T = Σ_{t=1}^{T} log(w_t / w_{t−1}) = Σ_{t=1}^{T} log(φ_{t−1} B_t 1′).
EM algorithm: This algorithm, called the Baum–Welch algorithm [9], applies to a homogeneous Markov chain (not necessarily stationary). It uses forward probabilities (FWP) and backward probabilities (BWP) to compute L_T.
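The scaled forward recursion above translates directly into code. The sketch below (ours, with invented toy parameters δ, Γ, and p_i) accumulates log L_T = Σ_t log(φ_{t-1} B_t 1′) while renormalizing φ_t to avoid numerical underflow:

```python
import math

def hmm_loglik(delta, gamma, p, xs):
    """log L_T via the scaled forward recursion w_t phi_t = w_{t-1} phi_{t-1} B_t,
    with B_t = Gamma P(x_t).  delta: initial distribution; gamma: transition
    matrix; p[i][x]: state-dependent probability of symbol x in state i."""
    m = len(delta)
    phi = delta[:]                     # phi_0 = delta (so w_0 = 1)
    loglik = 0.0
    for x in xs:
        # One step of the recursion: (phi Gamma P(x))_j, component by component.
        v = [sum(phi[i] * gamma[i][j] for i in range(m)) * p[j][x]
             for j in range(m)]
        w = sum(v)                     # scale factor w_t / w_{t-1} = phi B_t 1'
        loglik += math.log(w)
        phi = [vj / w for vj in v]     # renormalize to keep values well scaled
    return loglik, phi                 # phi = phi_T, reused for forecasting

# Toy 2-state model with binary observations (all numbers are illustrative).
delta = [0.5, 0.5]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
p = [[0.8, 0.2],    # state 0 mostly emits symbol 0
     [0.3, 0.7]]    # state 1 mostly emits symbol 1
ll, phi_T = hmm_loglik(delta, gamma, p, [0, 0, 1, 1, 1])
print(ll)
```

This implements the stationary-start form L_T = δ Γ P(x_1) ... Γ P(x_T) 1′ used by the scaled recursion; for a non-stationary start, the first factor would be δ P(x_1) instead.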
1.2.3. Forecasting distribution
For discrete observations, the forecasting distribution Pr(X_{T+h} = x | X^(T) = x^(T)) is a ratio of likelihoods, by conditional probability:
Pr(X_{T+h} = x | X^(T) = x^(T)) = Pr(X^(T) = x^(T), X_{T+h} = x) / Pr(X^(T) = x^(T))
 = δ P(x_1) B_2 B_3 ... B_T Γ^h P(x) 1′ / (δ P(x_1) B_2 B_3 ... B_T 1′)
 = α_T Γ^h P(x) 1′ / (α_T 1′).
Writing φ_T = α_T / (α_T 1′), we have
Pr(X_{T+h} = x | X^(T) = x^(T)) = φ_T Γ^h P(x) 1′.
The forecasting distribution can thus be written as a mixture of the state-dependent distributions:
Pr(X_{T+h} = x | X^(T) = x^(T)) = Σ_{i=1}^{m} ξ_i(h) p_i(x),
where the weight ξ_i(h) is the i-th component of the vector φ_T Γ^h.
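The mixture form of the forecast distribution is easy to compute once φ_T is available. A sketch (ours; φ_T, Γ, and the p_i are invented toy values) that also shows the weights ξ(h) = φ_T Γ^h settling down as h grows:

```python
def mat_vec(v, g):
    """Row vector times matrix: (v Gamma)_j."""
    m = len(v)
    return [sum(v[i] * g[i][j] for i in range(m)) for j in range(m)]

def forecast_distribution(phi_T, gamma, p, h):
    """Pr(X_{T+h} = x | X^(T)) = sum_i xi_i(h) p_i(x), with xi(h) = phi_T Gamma^h."""
    xi = phi_T[:]
    for _ in range(h):
        xi = mat_vec(xi, gamma)
    n_symbols = len(p[0])
    return [sum(xi[i] * p[i][x] for i in range(len(xi))) for x in range(n_symbols)]

# Toy filtered state distribution phi_T and model parameters (illustrative).
phi_T = [0.7, 0.3]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
p = [[0.8, 0.2],
     [0.3, 0.7]]
for h in (1, 5, 50):
    print(h, forecast_distribution(phi_T, gamma, p, h))
```

As h grows, φ_T Γ^h approaches the stationary distribution of Γ (here (2/3, 1/3)), so the long-horizon forecast converges to the unconditional mixture, in line with the remark in Section 1.2.5.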
1.2.4. Viterbi algorithm
The objective of the Viterbi algorithm is to find the best state sequence i_1, i_2, ..., i_T corresponding to the observation sequence x_1, x_2, ..., x_T, i.e. the sequence maximizing the joint probability Pr(C^(T) = c^(T), X^(T) = x^(T)). Set
ξ_{1i} = Pr(C_1 = i, X_1 = x_1) = δ_i p_i(x_1),
and, for t = 2,3,...,T,
ξ_{ti} = max_{c_1,c_2,...,c_{t−1}} Pr(C^(t−1) = c^(t−1), C_t = i, X^(t) = x^(t)).
Then it can be seen that the probabilities ξ_{tj} satisfy the following recursion for t = 2,3,...,T and j = 1,2,...,m:
ξ_{tj} = (max_i ξ_{t−1,i} γ_ij) p_j(x_t).
The best state sequence i_1, i_2, ..., i_T is determined by backward recursion from
i_T = argmax_{i=1,...,m} ξ_{Ti},
and, for t = T−1, T−2, ..., 1,
i_t = argmax_{i=1,...,m} (ξ_{ti} γ_{i,i_{t+1}}).
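The ξ_{tj} recursion with backtracking can be sketched as follows (our code, with invented toy parameters); storing the argmax index at each step is equivalent to the backward formula for i_t above:

```python
def viterbi(delta, gamma, p, xs):
    """Most likely state sequence via the xi_{tj} recursion with backtracking."""
    m = len(delta)
    xi = [delta[i] * p[i][xs[0]] for i in range(m)]          # xi_{1i}
    back = []                                                # argmax pointers
    for x in xs[1:]:
        prev, xi, ptr = xi, [], []
        for j in range(m):
            # best predecessor of state j: argmax_i xi_{t-1,i} gamma_ij
            best_i = max(range(m), key=lambda i: prev[i] * gamma[i][j])
            ptr.append(best_i)
            xi.append(prev[best_i] * gamma[best_i][j] * p[j][x])
        back.append(ptr)
    # Backtrack from i_T = argmax_i xi_{Ti}
    path = [max(range(m), key=lambda i: xi[i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy 2-state model with binary observations (illustrative numbers).
delta = [0.5, 0.5]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
p = [[0.8, 0.2],
     [0.3, 0.7]]
print(viterbi(delta, gamma, p, [0, 0, 0, 1, 1, 1]))
```

In practice the products are computed in log space to avoid underflow for long sequences; the plain products are kept here to mirror the formulas above.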
1.2.5. State forecasting
For state forecasting we simply use the classical Bayes formula. For i = 1,2,...,m,
Pr(C_{T+h} = i | X^(T) = x^(T)) = α_T Γ^h(·, i) / L_T = (φ_T Γ^h)_i,
where Γ^h(·, i) denotes the i-th column of Γ^h. Note that, as h → ∞, Γ^h converges to the stationary distribution of the Markov chain.
1.3. Fuzzy time series
1.3.1. Some concepts
Let U be the universe of discourse; this space determines a set of objects. If A is a crisp subset of U, then we can determine exactly its characteristic function χ_A: U → {0,1}, where χ_A(u) = 1 if u ∈ A and χ_A(u) = 0 otherwise.
Definition 1.3.1. [34]: Let U be the universe of discourse with U = {u_1, u_2, ..., u_n}. A fuzzy set A in U is defined by
A = f_A(u_1)/u_1 + f_A(u_2)/u_2 + ... + f_A(u_n)/u_n,
where f_A is the membership function of the fuzzy set A, f_A: U → [0;1], and f_A(u_i) is the degree of membership (grade) of u_i in A.
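Definition 1.3.1 can be made concrete by storing the membership function as a table. A sketch (ours; the universe and the membership grades are invented for illustration):

```python
# Universe of discourse U = {u1, ..., u5}, e.g. five intervals of an index.
U = ["u1", "u2", "u3", "u4", "u5"]

# Fuzzy set A = 0.0/u1 + 0.5/u2 + 1.0/u3 + 0.5/u4 + 0.0/u5
# ("values around u3"), represented by its membership function f_A.
f_A = {"u1": 0.0, "u2": 0.5, "u3": 1.0, "u4": 0.5, "u5": 0.0}

def membership(u):
    """Degree of membership f_A(u) in [0, 1]; 0 for points outside the table."""
    return f_A.get(u, 0.0)

# A crisp set, by contrast, has a {0, 1}-valued characteristic function;
# one common bridge is an alpha-cut, here at level 0.5.
crisp_A = {u for u in U if membership(u) >= 0.5}
print(membership("u3"), sorted(crisp_A))
```

The contrast with the characteristic function above is the essential point: a crisp set admits only grades 0 and 1, while a fuzzy set allows any grade in [0, 1].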
Definition 1.3.2. [34]: Let Y(t) (t = 0,1,2,...) be a time series whose values lie in the universe of discourse,