A. Sharaf Eldin¹, T. Hassan A. Soliman², M. Mohamed M. Ghareeb³
¹Information Systems Department, Faculty of Computers and Information, Helwan University, Cairo, Egypt
²Information Systems Department, Faculty of Computers and Information, Ain Shams University, Cairo, Egypt
³Information Systems Department, Modern Academy, Cairo, Egypt

In the current work, Mutated Rules for Sequential Mining (MRS), an efficient evolutionary sequential pattern mining algorithm, is proposed for mining DNA sequences. MRS applies the mutation genetic operator to generate new rules in the population. After pruning redundant rules, MRS efficiently obtains the new frequent patterns, which are in turn mutated to generate the rules of the next population. Moreover, MRS does not rely on database projection for mining, which incurs high CPU and memory utilization. To handle sequences of varying lengths without alignment, MRS is modified into a second algorithm, MMRS. MMRS outperforms both MRS and the Apriori algorithm when applied to DNA sequences from GenBank, with respect to three main factors: the number of bases per sequence, the support threshold, and the number of sequences in each trial.

With the completion of several genome projects, the number of available DNA sequences has grown tremendously, which requires more complex techniques to be developed. The biggest excitement currently lies with the availability of complete genome sequences for different organisms.
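The mutate-and-prune loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' MRS or MMRS implementation. Several choices are assumptions made only for illustration: a pattern is a contiguous run of bases, support is the fraction of sequences containing the pattern, and "mutation" means substituting one base or appending one base. The function names `support`, `mutate`, and `mine` are hypothetical.

```python
BASES = "ACGT"


def support(pattern: str, sequences: list[str]) -> float:
    """Fraction of sequences that contain `pattern` as a contiguous substring.

    Substring matching needs no alignment, so sequences may differ in length.
    """
    return sum(pattern in s for s in sequences) / len(sequences)


def mutate(pattern: str) -> set[str]:
    """Assumed mutation operator: substitute one base, or append one base."""
    variants = set()
    for i, old in enumerate(pattern):
        for b in BASES:
            if b != old:
                variants.add(pattern[:i] + b + pattern[i + 1:])
    for b in BASES:
        variants.add(pattern + b)
    return variants


def mine(sequences: list[str], min_support: float, max_generations: int = 10) -> set[str]:
    """Iteratively mutate frequent patterns, pruning candidates already known."""
    # Generation 0: frequent single bases.
    population = {b for b in BASES if support(b, sequences) >= min_support}
    frequent = set(population)
    for _ in range(max_generations):
        candidates = set()
        for p in population:
            candidates |= mutate(p)
        candidates -= frequent  # prune redundant (already found) patterns
        # Keep only candidates that meet the support threshold.
        population = {c for c in candidates if support(c, sequences) >= min_support}
        if not population:
            break
        frequent |= population
    return frequent


if __name__ == "__main__":
    # Toy sequences of different lengths (not GenBank data).
    seqs = ["ACGTAC", "TTACG", "GACGA"]
    print(sorted(mine(seqs, min_support=1.0), key=lambda p: (len(p), p)))
```

In this sketch each generation scans the original sequences directly to count support, so no projected databases are built. The cost is one pass over the sequences per candidate pattern.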