Abstract
Personality encompasses the behaviors, thinking styles, speaking patterns, ways of perceiving the environment, and interpersonal interactions that are consistently observed in an individual as a recognizable pattern. An individual's personality influences all aspects of their life, shaping daily activities, emotions, preferences, and decisions. Identifying an individual's personality traits is therefore of particular importance and can support many customized services and products. Personality can be assessed with established instruments such as the Big Five personality traits model or the Myers-Briggs Type Indicator (MBTI). The traditional approach relies on questionnaires, which are time-consuming to complete, and individuals may be unwilling to spend significant time filling them out. Today, social networks such as Facebook, Twitter, and YouTube are increasingly popular spaces for sharing thoughts, feelings, experiences, and personal information as text, audio, images, or video. This vast volume of user-generated data contains valuable information that, once extracted and analyzed, can be used for automatic personality prediction and, in turn, can help businesses understand user needs. Consequently, data from users' social networks can be utilized to predict their personality.
Most current methods infer personality traits from static frames or short audio-visual segments, which yields unreliable results and an ill-posed machine learning problem. Clip-level models discard a large number of frames and therefore overlook short-term personality cues, producing weaker results, while methods that retain every frame are computationally expensive. In addition, deep learning-based methods often neglect domain-specific knowledge about personality, which undermines their reliability; psychological and physiological findings need to be integrated into model design. In this research, features are extracted from the different modalities with psychological cues incorporated. Furthermore, graph fusion is employed to explore the relationships and correlations between the features of the different modalities, with the final prediction made at the level of the entire video, keeping the model interpretable and explainable. The results show that the GATGraphV1 model, applied to four extracted modalities (audio, visual, and two textual modalities), achieved an average accuracy (ACC) of 0.8990 and an average concordance correlation coefficient (CCC) of 0.4104. On the VPTD dataset, the model achieved an average ACC of 0.8428 and an average CCC of 0.0491.
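For illustration, the following is a minimal sketch of graph-attention fusion over one node per modality, in the spirit of the approach described above. The class name ModalityGraphFusion, the layer sizes, the fully connected modality graph, and the input feature dimensions are all illustrative assumptions and do not reproduce the thesis's actual GATGraphV1 architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ModalityGraphFusion(nn.Module):
        # Toy single-head graph-attention fusion over one node per modality
        # (hypothetical sketch, not the thesis's GATGraphV1).
        def __init__(self, in_dims, hidden=128, traits=5):
            super().__init__()
            # Project each modality's feature vector into a shared node space.
            self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in in_dims])
            # Attention scoring over concatenated node pairs [h_i || h_j].
            self.attn = nn.Linear(2 * hidden, 1)
            self.out = nn.Linear(hidden, traits)

        def forward(self, feats):  # feats: list of (batch, dim_i) tensors
            h = torch.stack([p(x) for p, x in zip(self.proj, feats)], dim=1)  # (B, N, H)
            B, N, H = h.shape
            # Pairwise attention logits over the complete modality graph.
            hi = h.unsqueeze(2).expand(B, N, N, H)
            hj = h.unsqueeze(1).expand(B, N, N, H)
            e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)  # (B, N, N)
            alpha = torch.softmax(e, dim=-1)       # attention weights per node
            h = torch.relu(alpha @ h)              # aggregate neighbor messages
            video = h.mean(dim=1)                  # video-level readout over modality nodes
            return torch.sigmoid(self.out(video))  # Big Five scores in [0, 1]

    # Example: audio, visual, and two textual feature vectors for a batch of 8.
    dims = (128, 512, 300, 768)  # illustrative dimensions
    feats = [torch.randn(8, d) for d in dims]
    scores = ModalityGraphFusion(list(dims))(feats)
    print(scores.shape)  # torch.Size([8, 5])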
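For reference, the two reported metrics can be computed as below, assuming the ChaLearn First Impressions convention in which ACC is defined as one minus the mean absolute error over trait scores in [0, 1], and CCC is Lin's concordance correlation coefficient; whether the thesis uses exactly this ACC definition is an assumption.

    import numpy as np

    def mean_acc(pred, true):
        # ChaLearn-style accuracy for trait scores in [0, 1]: 1 - mean absolute error.
        return 1.0 - np.mean(np.abs(pred - true))

    def ccc(pred, true):
        # Lin's concordance correlation coefficient.
        mx, my = pred.mean(), true.mean()
        vx, vy = pred.var(), true.var()
        cov = np.mean((pred - mx) * (true - my))
        return 2 * cov / (vx + vy + (mx - my) ** 2)

    # Toy check on synthetic trait predictions.
    rng = np.random.default_rng(0)
    t = rng.uniform(size=100)
    p = np.clip(t + rng.normal(scale=0.1, size=100), 0, 1)
    print(mean_acc(p, t), ccc(p, t))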
Keywords: Personality Traits Recognition, Multimodal Data, Deep Learning, Big Five Personality Traits