English Abstract
With the growing volume of electronic text, especially news, the main goal of this research is to compare deep learning methods and the WordNet ontology for the automatic classification of news texts. In terms of purpose, this is applied research. Because it describes how news texts can be classified automatically with deep learning and WordNet ontology methods, and compares and evaluates the classification performance of those methods, it is also comparative research in terms of how it was carried out.
For this purpose, the English-language 20 Newsgroups news data, comprising 19,997 texts, were first pre-processed in a Python environment and then classified with deep learning methods (a convolutional neural network and long short-term memory), with the WordNet ontology, and with a hybrid approach. In the deep learning methods, the important words of each text were first determined with a tokenizer. The words were then converted into feature vectors with an embedding function and used as the input to the deep learning models. In the WordNet ontology method, the most important words of each text were first identified with TF-IDF, and their lexical equivalents from the WordNet synsets were then added to each text's set of most important words. Finally, in the hybrid approach, which combines the two methods, each text was turned into an ontology-based text, and it was determined which of the three methods gives the best classification performance. Four criteria were used to evaluate all three methods: accuracy, precision, recall, and F-measure. The findings showed that, for accuracy, precision, recall, and F-measure respectively, the convolutional neural network scored 0.70, 0.92, 0.38, and 0.54; the WordNet ontology method scored 0.39, 0.71, 0.39, and 0.50; and the hybrid method scored 0.79, 0.93, 0.42, and 0.58. On accuracy, precision, and F-measure, the convolutional neural network outperformed the WordNet ontology method, while recall was nearly identical (0.38 versus 0.39). The precision of the convolutional neural network is 0.92, which is higher than the ontology method's precision of 0.71.
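As a consistency check on the figures above, the F values can be recomputed from the second and third of the four reported criteria, precision and recall. The sketch below assumes that F denotes the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state this explicitly, and the function and variable names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "cnn": (0.92, 0.38),
    "wordnet": (0.71, 0.39),
    "hybrid": (0.93, 0.42),
}

# Round to two decimals, matching the precision of the reported values.
f_scores = {name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()}
print(f_scores)
```

Under that assumption, the recomputed values agree with the reported F scores of 0.54, 0.50, and 0.58 to two decimal places.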
A possible explanation is that deep learning methods avoid many problems, such as the curse of dimensionality and data sparsity, and have strong learning ability and higher prediction accuracy. Methods based on neural networks can handle very large amounts of data, whereas the performance of the WordNet ontology method degrades as the amount of data grows; its main advantage is improved accuracy from avoiding repeated words and from attending to the semantic relationships between words. Across the four criteria, the hybrid method performed best, with the highest precision, 0.93, of the three methods. A possible explanation is that it uses the strengths of both methods at the same time. In general, the importance of this research can be stated as follows: although machine learning methods, including deep learning, are powerful tools for text classification, they use only the syntactic information and the connections between words in the text. Adding knowledge of the semantic connections between the words of the texts to these methods can lead to more accurate text classification.
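To make the ontology-based enrichment step concrete, the following is a minimal sketch of selecting each text's highest-scoring TF-IDF terms and adding their lexical equivalents. It is not the study's implementation: the small `TOY_SYNSETS` dictionary is a hypothetical stand-in for WordNet synset lookups, and the function names, the two sample sentences, and the choice of `k` are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet synsets; a real pipeline would query WordNet.
TOY_SYNSETS = {
    "car": ["automobile", "auto"],
    "game": ["match"],
}


def top_tfidf_terms(docs, k=2):
    """Return the k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


def expand_with_synonyms(terms, synsets):
    """Append each term's lexical equivalents, skipping duplicates."""
    expanded = list(terms)
    for term in terms:
        for synonym in synsets.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


docs = [
    "the car dealer sold the car".split(),
    "the team won the game".split(),
]
keywords = top_tfidf_terms(docs, k=2)
enriched = [expand_with_synonyms(terms, TOY_SYNSETS) for terms in keywords]
print(enriched)
```

In a hybrid setup of the kind described, the enriched term lists would then be tokenized and embedded as input to the neural network; that step is omitted here.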