Dong Lin and Chu Xiangfeng
To identify the "key" gene sequences that distinguish different biological types, we construct a gene sequence screening model based on a two-way clustering algorithm. First, a primary model based on the fuzzy c-means (FCM) algorithm clusters similar samples; the two-way clustering algorithm then refines the result to filter out the "key" gene sequences. Because an empirically chosen threshold yields inaccurate predictions, we introduce a threshold model based on bootstrap sampling to determine the clustering threshold. The optimal threshold for each species is selected as the one with the highest confidence at the level α = 0.05. Validation shows that the classification of gene coding intervals achieves 90% validity and 88% accuracy, a 50% improvement over the empirical-threshold algorithm. Because random noise masks part of the intron fluctuations and interferes with gene identification, a wavelet transform is introduced into DNA coding region prediction to filter noise from the gene signal. To address the imprecision of coding region prediction, we therefore establish a DNA sequence coding region prediction model based on the wavelet transform. With this model, the detection rate reaches 81%, 27% higher than the neural network method, and the prediction accuracy reaches 75%, 36% higher than Fourier analysis.
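The abstract names FCM as the primary clustering step but gives no parameters, so the following is only a minimal sketch of standard fuzzy c-means, not the authors' implementation. The function name `fcm`, the fuzzifier `m = 2`, the Euclidean distance, and the random initialisation are all assumptions made for illustration.

```python
import random


def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length feature vectors.

    Returns (centers, memberships); memberships[k][i] is the degree to
    which sample k belongs to cluster i. All defaults are assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the samples weighted by membership**m.
        centers = []
        for i in range(n_clusters):
            w = [u[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[k] * points[k][d] for k in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from the distances to the new centres.
        shift = 0.0
        for k in range(n):
            dist = [
                max(sum((points[k][d] - c[d]) ** 2 for d in range(dim)) ** 0.5,
                    1e-12)  # floor avoids division by zero
                for c in centers
            ]
            new_row = [
                1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0))
                          for j in range(n_clusters))
                for i in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[k])))
            u[k] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break

    return centers, u
```

In a pipeline like the one described, each sample would be a vector of features derived from a gene sequence, and the soft memberships could then be passed to the later two-way clustering and thresholding steps; how the authors encode sequences as features is not stated in the abstract.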