Tendency Mining in Dynamic Association Rules Based on SVM Classifier

A method of tendency mining in dynamic association rules based on a compatibility feature vector SVM classifier is proposed. First, the set of class association rules (CARs) is mined using tendency mining in dynamic association rules. Second, an SVM classifier based on compatibility feature vectors is constructed to classify the obtained CARs, taking advantage of the strength of SVM on highly complex data; the model is built by weighting the rules. Finally, the method is compared with traditional methods with respect to mining accuracy. The results indicate that the method has lower time complexity and higher classification accuracy than the traditional methods, which helps make the mining of dynamic association rules more accurate and effective.


INTRODUCTION
Associative classification is an important prediction method in data mining that integrates association rule mining and classification. Since the first associative classification algorithm, CBA [1], was introduced in 1998, the design and application of such algorithms has been an active research area, and several new associative classification algorithms have appeared. In 2001, Li proposed the CMAR [2] algorithm based on multiple association rules, which uses an improved frequent pattern growth method (FP-Growth) to mine the rules and uses multiple strong association rules, combined by weight, to determine the class label of a new instance. The method was shown to have higher accuracy than the CBA algorithm. In 2003, Yin proposed the predictive classification algorithm CPAR [3], which uses a greedy method to extract a smaller set of rules from the data set. In 2004, Antonio proposed an algorithm based on positive and negative rules [4].
Then in 2005, Wang proposed the HARMONY [5] algorithm, which directly mines the highest-confidence rules that cover each sample. Adriano Veloso proposed the Lazy classifier [6]. However, this method has performance problems on large data sets, so it cannot mine all the rules there. Because associative classification algorithms consider all possible relationships between items, they may produce a large number of redundant rules. On this basis, this paper proposes a method of tendency mining in dynamic association rules based on associative classification. Tendency mining introduces a tendency threshold into the traditional support-confidence framework so that only the dynamic association rules that follow a certain trend are mined. Associative classification is characterized by high precision, so combining the two approaches can give association rule mining more accurate support.

TENDENCY MINING IN DYNAMIC ASSOCIATION RULES
Tendency mining [7] is a method of dynamic association rule mining that exploits how rules change over time, based on the support vector (SV) and the confidence vector (CV) of a rule. It uses a tendency threshold to eliminate useless rules, which reduces the candidate item sets, and then generates the tendency rules in order to find valuable rules and improve the quality of mining. In this paper, the dynamic tendency rules are determined based on confidence. To mine valuable dynamic association rules, the following change patterns [8,9] should be known first:
(1) Stable change. Some patterns do not change noticeably over time.
(2) Strengthening trend change. Some patterns show an obvious rising trend over time.
(3) Weakening trend change. Some patterns show an obvious downward trend over time.
(4) Cyclical change. The identical pattern repeats at a regular time interval.
(5) Random change. Some patterns show no obvious regularity.
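As an illustration of the two vectors on which tendency mining relies, the sketch below computes SV and CV for a single rule A ⇒ B from transactions that have already been split into time periods. It assumes that support and confidence are measured within each period separately; the function name `support_confidence_vectors` and the toy transactions are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Sequence, Tuple

Transaction = FrozenSet[str]


def support_confidence_vectors(
    periods: Sequence[Sequence[Transaction]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[List[float], List[float]]:
    """Return (SV, CV) for antecedent => consequent, one entry per time period."""
    sv: List[float] = []
    cv: List[float] = []
    both = antecedent | consequent
    for transactions in periods:
        n = len(transactions)
        # Count transactions containing A, and those containing both A and B.
        count_a = sum(1 for t in transactions if antecedent <= t)
        count_ab = sum(1 for t in transactions if both <= t)
        # Assumption: support is relative to the size of the period itself.
        sv.append(count_ab / n if n else 0.0)
        cv.append(count_ab / count_a if count_a else 0.0)
    return sv, cv


# Two toy time periods with four transactions each (illustrative data only).
periods = [
    [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"c"}), frozenset({"b"})],
    [frozenset({"a", "b"}), frozenset({"a", "b"}), frozenset({"c"}), frozenset({"a"})],
]
sv, cv = support_confidence_vectors(periods, frozenset({"a"}), frozenset({"b"}))
print(sv)  # [0.25, 0.5]
```

In this toy case both the support and the confidence of the rule rise from the first period to the second, which corresponds to the strengthening trend pattern.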
On this basis, dynamic association rules can be defined as follows (Definitions 1 to 3 appear at the end of the paper).
As the confidence counterpart of Definition 3, if the elements in CV satisfy Conf(A ⇒ B)_i > Conf(A ⇒ B)_{i+1}, the rule is called a confidence drop type dynamic association rule.
Definition 4: If the elements in SV do not meet Definitions 1 to 3, but over a particular time period t = {t_1, t_2, …, t_n} the values of the elements in SV alternate, the rule is called a support cycle type dynamic association rule. It is similar for CV.
Definition 5: If a rule does not meet Definitions 1 to 4, but for a time series of length n there exists a sub-sequence U of length m in SV in which the support of each item is less than that of the next item, then U is called a rising time support vector sequence; in the opposite case it is called a drop type sequence. If no rising sequence is longer than m, U is called the biggest rising time support vector sequence. The drop type, and the corresponding sequences in CV, are defined similarly.
Definition 6: Let L be the length of the biggest rising time support vector sequence and S the length of the biggest drop time support vector sequence in SV, and let n be the length of the time series. The trend degree based on the support vector is

SRI = max{L, S} / n
The trend degree based on the confidence vector, with L and S measured in CV instead, is

CRI = max{L, S} / n
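The trend degree can be sketched in a few lines. The example below is only an illustration under two assumptions that the text does not state explicitly: the rising and drop sequences are contiguous runs, and their length is counted in elements (so a single element is a run of length 1). The names `longest_runs` and `trend_degree` are hypothetical.

```python
from typing import Sequence, Tuple


def longest_runs(values: Sequence[float]) -> Tuple[int, int]:
    """Return (L, S): lengths of the longest strictly rising and dropping runs."""
    if not values:
        return 0, 0
    best_up = best_down = up = down = 1
    for prev, cur in zip(values, values[1:]):
        # Extend the current run or restart it at the current element.
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        best_up = max(best_up, up)
        best_down = max(best_down, down)
    return best_up, best_down


def trend_degree(values: Sequence[float]) -> float:
    """Compute max{L, S} / n for a support vector (SRI) or confidence vector (CRI)."""
    n = len(values)
    if n == 0:
        return 0.0
    rising, dropping = longest_runs(values)
    return max(rising, dropping) / n


sv = [0.10, 0.12, 0.15, 0.14, 0.18]
print(longest_runs(sv))   # (3, 2)
print(trend_degree(sv))   # 0.6
```

The same function yields CRI when it is applied to a confidence vector instead of a support vector.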
Definition 7: If a rule meets one of Definitions 1 to 4, it is called a highly interesting rule. If its trend degree is at least the given DRI threshold, it is called a strong rule.
Strong rules can therefore be described as follows.
For a given data set D and thresholds min_sup, min_conf and DRI, a rule is called a strong rule when it meets the following conditions, where s stands for its support, c for its confidence, SRI for its trend degree based on the support vector and CRI for its trend degree based on the confidence vector: (1) s ≥ min_sup; (2) c ≥ min_conf; (3) SRI ≥ DRI (CRI ≥ DRI).

METHOD OF ASSOCIATIVE CLASSIFICATION
Associative classification can generally be divided into two steps. The first, and most important, step is to find the set of all frequent and accurate rules, called class association rules (CARs).

Mining CARs
The traditional dynamic association rule mining algorithm is improved by introducing a tendency threshold, so that rules following a certain trend are mined on the basis of support and confidence. A rule mined according to Definitions 1 to 6 falls into one of the following cases.
(1) s < min_sup: the rule is not important enough and is deleted.
(2) s ≥ min_sup, c < min_conf: the rule has low accuracy and is deleted.
(3) s ≥ min_sup, c ≥ min_conf, SRI < DRI: the rule has no value and is deleted.
(4) s ≥ min_sup, c ≥ min_conf, 1 ≥ SRI ≥ DRI: the rule is valuable and is retained.
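The four cases can be read as a simple screening function, sketched below. The function name `screen_rule`, the returned strings and the threshold values in the usage lines are illustrative assumptions, not part of the original algorithm.

```python
def screen_rule(s: float, c: float, sri: float,
                min_sup: float, min_conf: float, dri: float) -> str:
    """Apply cases (1) to (4) in order and report what happens to the rule."""
    if s < min_sup:
        return "delete: support too low"      # case (1)
    if c < min_conf:
        return "delete: confidence too low"   # case (2)
    if sri < dri:
        return "delete: trend too weak"       # case (3)
    return "keep"                             # case (4)


# Illustrative thresholds only.
thresholds = dict(min_sup=0.1, min_conf=0.5, dri=0.5)
print(screen_rule(s=0.2, c=0.8, sri=0.6, **thresholds))  # keep
print(screen_rule(s=0.2, c=0.8, sri=0.4, **thresholds))  # delete: trend too weak
```

Checking support first means that the cheapest test removes most candidates before confidence and trend degree are examined.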
The algorithm of tendency mining in dynamic association rules is described as follows.
In the algorithm, each frequent item set l_j has an i-th element value, s(A ⇒ B)_ij stands for the i-th value of the support vector SV_j, and SRI_j stands for the trend degree.
(L, FV, s) = Dynamic-frequent-item-set-algorithm(FP-Growth);
for each frequent-item-set, and then build CV_j;
Return; }

The SVM Classifier
The SVM classifier [10] based on compatibility feature vectors is constructed from the set of class association rules. In the process, all class association rules are weighted according to a given strategy, and their compatibility with the original data set is then calculated to produce a collection of feature vectors. In this way, each pattern is represented by a feature vector and each rule in the CARs corresponds to one feature. The SVM classifier is then built from the feature vectors. This paper adds the tendency threshold to the weighting method when constructing the SVM classifier, in order to improve the classification method. The score metric of a rule indicates its importance, so it determines the weight of the rule, which represents the rule's discriminative capability in the SVM classifier. The rule weighting formula is as follows.
In the formula, R_i^C stands for a rule whose label is C; the other terms are the distribution of label C in the training set and the number of rules whose label is C. When weighting the rules, the distribution of the data set over each category is calculated first and stored in d. Then each rule, together with its weight, is stored in a new list called w. The next step is to build the compatibility feature vectors. In a distributed learning system, the feature vector is a key attribute for describing a data set. A feature vector consists of pairs <f_i, v_i>, in which f_i represents a feature and v_i its value.
The compatibility between the class association rules and the original training set is obtained by building new feature vectors that measure the compatibility between rules and patterns. Each feature in the vector describes the compatibility between a pattern and a class association rule, so the number of features equals the number of rules in w. In the given model, a pattern is compatible with a rule when the following conditions are met: (1) their class labels are the same; (2) the compatibility between them is above 0.
Suppose the pattern space is a continuous attribute space of dimension n that contains m patterns with the same label in the training set, and c stands for the number of labels. The compatibility between the pattern x_p = (x_p1, x_p2, …, x_pn) and the rule R_i in w can be calculated by the following formula.
Here, x_pn (1 ≤ p ≤ m) stands for the continuous attributes in the original database and µ_ik(x_px) stands for the membership of the rule R_i with x_pn. The pair <f_i, v_i> can then be calculated as follows.
When a rule and a pattern are compatible, f_p^{R_i} stands for the characteristic value of the pattern x_p for the rule R_i and w_i stands for the weight of the rule. So <f_i, v_i> can be rewritten as <x_p^{R_i}, f_p^{R_i}>.
For each pattern x_p in D, its compatibility with the rules in w is calculated first. The new features are then added to create the feature vector FV_p, described as follows.
Finally, the feature vectors form the feature vector set FV_s = {FV_p | p = 1, 2, …, m}.
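The construction of one feature vector FV_p can be sketched as follows. Because the compatibility and feature-value formulas are not reproduced in the text, this example makes two explicit assumptions: the compatibility of a pattern with a rule is the product of per-attribute membership degrees, and the feature value is the rule weight multiplied by that compatibility when the two conditions for compatibility hold, and 0 otherwise. The class `WeightedRule`, the triangular membership function and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class WeightedRule:
    label: str
    weight: float
    # One membership function per attribute (hypothetical representation).
    memberships: Sequence[Callable[[float], float]]


def compatibility(pattern: Sequence[float], rule: WeightedRule) -> float:
    """Assumed form: product of the per-attribute membership degrees."""
    degree = 1.0
    for value, mu in zip(pattern, rule.memberships):
        degree *= mu(value)
    return degree


def feature_vector(pattern: Sequence[float], label: str,
                   rules: Sequence[WeightedRule]) -> List[float]:
    """Build one feature per rule; incompatible rules contribute 0."""
    fv: List[float] = []
    for rule in rules:
        degree = compatibility(pattern, rule)
        # Conditions (1) same class label and (2) compatibility above 0.
        compatible = rule.label == label and degree > 0
        fv.append(rule.weight * degree if compatible else 0.0)
    return fv


def triangular(center: float, width: float) -> Callable[[float], float]:
    """Simple triangular membership function, used only for illustration."""
    return lambda x: max(0.0, 1.0 - abs(x - center) / width)


rules = [
    WeightedRule("yes", 0.8, [triangular(0.0, 1.0), triangular(1.0, 1.0)]),
    WeightedRule("no", 0.5, [triangular(1.0, 1.0), triangular(0.0, 1.0)]),
]
fv = feature_vector([0.5, 1.0], "yes", rules)
print(fv)  # [0.4, 0.0]
```

Each pattern thus yields a vector with as many entries as there are rules in w, and the set of these vectors plays the role of FV_s.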
Once the feature vectors have been built, the next step is to construct the SVM classifier. Traditional classification algorithms aim to reduce the number of rules as far as possible, which affects classification accuracy. The SVM algorithm has advantages on problems of high complexity, so as many rules as possible can be used to generate the classifier without harming the results. In real applications the labels of actual instances are unknown; here the classifier is used to predict test cases whose labels are known.

EXPERIMENTAL RESULTS
To test and verify the classification results of the proposed method, six data sets from the UCI repository were selected as experimental objects. Classification is evaluated with 10-fold cross-validation [11]. The relevant attribute information of the six data sets is shown in Table 1. The experiments were run on a computer with the Windows XP operating system, and the programming language is C#. The FP-Growth algorithm is used to generate the CARs. Under the same experimental environment and data, the method is compared with the classic associative classification algorithms CBA and CMAR.
First, the weighted rules were used to prune the rule set; the results are shown in Table 2. The confidence, support and trend degree thresholds were set to 1%, 50% and 50%. As shown in Table 3, the method performs better than the two algorithms. CBA sorts the rules by confidence and its classifier is based on single-rule matching, whereas the proposed method sorts the rules by both confidence and support, so it is more efficient than CBA. It also establishes a more effective pruning strategy than CMAR by considering the compatibility between rules and patterns, so its results are more accurate than those of CMAR.
During the experiments, it was found that the change in support frequency is too random to yield effective decision-making information. The method generates fewer rules on uniformly distributed data sets. The results are not ideal when the data set is unbalanced, but they are better for large data sets such as pageBlocks and waveform. The DRI and confidence thresholds affect the results considerably. However, raising the confidence threshold does not improve accuracy, and a high confidence threshold may produce decision rules of little value. Future work should therefore consider the confidence and support thresholds together to obtain a better classifier.
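For readers unfamiliar with the evaluation protocol, the sketch below shows how a 10-fold split of sample indices can be generated. It is a generic illustration written in Python rather than the C# used in the experiments; the function name `k_fold_indices`, the fixed seed and the absence of stratification are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


sizes = [len(test) for _, test in k_fold_indices(25)]
print(sizes)  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

The classifier would be trained on each `train` list and scored on the matching `test` list, and the ten accuracies averaged.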

CONCLUSION
This paper presents a method of tendency mining in dynamic association rules based on an SVM classifier built on compatibility feature vectors. First, it classifies the dynamic association rules according to the tendency mining method. Second, it improves the rule weighting method used to build the classifier and describes the relevant algorithm. Comparison with traditional mining algorithms on sample data sets indicates that the method can improve the quality, efficiency and accuracy of dynamic association rule mining. The classification accuracy of the algorithm is affected by the frequency and the confidence, so future work needs to improve the algorithm with the confidence and support considered at the same time. In addition, the tendency threshold tends to prune away too many rules that are valuable to users, so it is necessary to consider reducing the weight of the trend when building the classifier.
/* … stands for the length of the biggest rising sub-sequence of the support vector and K stands for the length of the biggest declining sub-sequence of the support vector in FV */
}
callIntGenRule(L, SRI, min_conf, DRI); // Get the tendency association rules.
Procedure callIntGenRule(L, SRI, min_conf, DRI) {
(R, c) = rule-generation-sub-algorithm(L, SRI, min_conf, DRI);

Definition 1: If each element in the support vector (SV) of the rule A ⇒ B meets the condition Sup(A ⇒ B) > min_sup, the rule is called a stability rule based on support. Similarly, if each element in the confidence vector (CV) of the rule A ⇒ B meets the condition Conf(A ⇒ B) > min_conf, the rule is called a stability rule based on confidence.
Definition 2: If the rule does not meet Definition 1, but the elements in SV satisfy Sup(A ⇒ B)_i < Sup(A ⇒ B)_{i+1}, the rule is called a support rising type dynamic association rule. It is similar in CV: if the elements in CV satisfy Conf(A ⇒ B)_i < Conf(A ⇒ B)_{i+1}, the rule is called a confidence rising type dynamic association rule.
Definition 3: If the rule does not meet Definition 1, but the elements in SV satisfy Sup(A ⇒ B)_i > Sup(A ⇒ B)_{i+1}, the rule is called a support drop type dynamic association rule. It is similar in CV.