Title |
Comparison of Character Embedding-Based Deep Learning Models for Classifying the Difficulty Level of BIM Requirements in Bid Sentences |
DOI |
https://doi.org/10.12652/Ksce.2024.44.6.0843 |
Keywords |
Character embedding; Convolutional neural network; Bi-directional long short-term memory; Bid; BIM requirement |
Abstract |
With the mandatory application of Building Information Modeling (BIM) in construction projects, analyzing BIM requirements from the bidding stage has become increasingly important. To handle the vast amount of bid text effectively, research on natural language processing has been actively conducted. However, domestic bid documents often mix Korean and English, and inconsistent renderings of identical expressions limit improvements in text quality even after pre-processing, which in turn degrades the performance of morphological analyzers. These challenges contribute to the degradation of analytical model performance. To mitigate the performance degradation caused by pre-processing, this study proposes a deep learning model that classifies the difficulty level of BIM requirements in bid sentences using character embeddings. The study analyzes the impact of key parameters on three types of character embedding models: a simple character embedding, a character embedding combined with a bidirectional long short-term memory (BiLSTM) layer, and a character embedding combined with a convolutional neural network (CNN) layer. Based on this analysis, the optimal model was derived, and the BIM requirement difficulty classification performance was compared across the character embedding models. The results were also compared with those of word embedding-based models. Among the three models, the character embedding model with a CNN layer achieved the highest F1 score (0.98), with the number and size of the convolutional filters having the greatest impact on F1 score variation. Furthermore, the character embedding models improved the F1 score by approximately 15% compared to the word embedding models. |
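The abstract describes a character-embedding classifier with a CNN layer as the best-performing variant. The following is a minimal sketch of that kind of architecture, not the authors' implementation: the character vocabulary size, sequence length, number of difficulty classes, embedding dimension, and the filter count and size are all assumed values chosen only for illustration.

```python
# Minimal sketch (assumed hyperparameters, not the authors' model) of a
# character-embedding + CNN classifier for BIM-requirement difficulty levels.
import tensorflow as tf

CHAR_VOCAB_SIZE = 2000   # assumed: distinct Korean/English characters after indexing
MAX_SEQ_LEN = 300        # assumed: characters per bid sentence (padded/truncated)
NUM_CLASSES = 3          # assumed: number of difficulty levels
EMBED_DIM = 64           # assumed: character embedding dimension
NUM_FILTERS = 128        # convolution filter count (a key parameter in the study)
FILTER_SIZE = 5          # convolution filter (kernel) size (a key parameter in the study)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_SEQ_LEN,), dtype="int32"),
    # Character-level embedding: each character index maps to a dense vector.
    tf.keras.layers.Embedding(CHAR_VOCAB_SIZE, EMBED_DIM),
    # 1-D convolution over the character sequence, then global max pooling.
    tf.keras.layers.Conv1D(NUM_FILTERS, FILTER_SIZE, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The BiLSTM variant mentioned in the abstract would replace the `Conv1D`/`GlobalMaxPooling1D` pair with a `tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(...))` layer; the simple character-embedding variant would feed the embedding output directly into the dense classification layers.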