English Abstract
This study aims to predict co-authorship among researchers in knowledge and information science using link prediction on a knowledge graph of the domain. In terms of purpose, the research is applied; in terms of approach, it is an exploratory survey. The research population consists of articles published in journals ranked Q1-Q3 in the Scopus database in the field of knowledge and information science over a ten-year period (2015-2024). During data analysis, particularly in the dimensionality reduction step, articles by authors without collaborators, authors with fewer than three articles, authors with fewer than three co-authorships, and authors who had not published an article since 2020 were excluded from the study population, in order to focus on a stable network of active authors engaged in scientific collaboration. After preprocessing, a heterogeneous knowledge graph of the field was designed, comprising entities such as authors, articles, journals, affiliations, keyword clusters, and countries, together with several features of and relationships between these entities. The HeteroGNN module, a deep learning method, was used to generate vector representations of the network entities, and its output vectors were then used as input to the LinkPredictor module. With a threshold of 0.5 set in the evaluation stage, potential co-authorship relationships with scores above this value were identified. Analysis of the identified relationships revealed that the authors involved share significant similarities in several respects, including affiliation, country, number of prior collaborations, number of published articles, journals in which they had published, journal quartile ranking and SJR, author keywords, previous mutual collaborators, citation counts, and language of previous publications.
These factors can be used in decision-making when forming research teams. To evaluate the accuracy of the prediction model for researchers in knowledge and information science, the dataset was divided into three subsets: training (70%), validation (15%), and testing (15%). The model was trained for 100 epochs, and the evaluation metrics Hits@3, Hits@5, Hits@10, Precision, Recall, F1-score, and AUC were calculated on the test set, yielding 0.0098, 0.0163, 0.0327, 0.8207, 1.0000, 0.9015, and 0.9416, respectively. In addition, 5-fold cross-validation was applied to assess the stability of the model across different data samples. The evaluation metrics computed for all folds confirmed the model’s strong ability to distinguish between positive and negative samples. The average values of Hits@3, Hits@5, Hits@10, Precision, Recall, F1-score, and AUC across the folds were 0.0078, 0.0133, 0.0273, 0.8424, 0.9808, 0.9062, and 0.9525, respectively. The designed model and the potential relationships it predicts are expected to serve as a basis for recommending suitable scientific collaborators in knowledge and information science, and to inform decisions on the development of scientific collaboration and on science policy.
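
As an illustration of how threshold-based and ranking-based metrics of the kind reported here relate to one another, the following minimal Python sketch computes Precision, Recall, and F1-score at a 0.5 threshold alongside Hits@K for a handful of toy scores. It is not the study's evaluation code. The function names `threshold_metrics` and `hits_at_k`, the toy scores, and the Hits@K definition (the share of positive pairs scored strictly above the K-th highest-scoring negative pair, as used in some link prediction benchmarks) are all assumptions, since the abstract does not specify them.

```python
from typing import Sequence, Tuple


def threshold_metrics(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, float, float]:
    """Precision, recall and F1 when scores above `threshold` count as predicted links."""
    tp = sum(s > threshold for s in pos_scores)  # true links predicted as links
    fp = sum(s > threshold for s in neg_scores)  # non-links predicted as links
    fn = len(pos_scores) - tp                    # true links that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def hits_at_k(
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    k: int,
) -> float:
    """Share of positives scored strictly above the k-th highest negative.

    Assumed definition; the source does not state which Hits@K variant it uses.
    """
    if len(neg_scores) < k:
        return 1.0  # fewer than k negatives: every positive counts as a hit
    kth_negative = sorted(neg_scores, reverse=True)[k - 1]
    return sum(s > kth_negative for s in pos_scores) / len(pos_scores)


# Toy scores, invented purely for illustration (not data from the study).
toy_pos = [0.95, 0.90, 0.80, 0.70, 0.60]        # scores of true co-authorships
toy_neg = [0.97, 0.92, 0.85, 0.40, 0.30, 0.20]  # scores of sampled non-links

precision, recall, f1 = threshold_metrics(toy_pos, toy_neg, threshold=0.5)
hits3 = hits_at_k(toy_pos, toy_neg, k=3)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} hits@3={hits3:.4f}")
```

On these toy scores every positive pair clears the 0.5 threshold, so Recall is 1.0, while only two of the five positives outrank the third-best negative, so Hits@3 is 0.4. This shows only that the two kinds of metric measure different things; it does not by itself explain the values reported above.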