Journal of the Korean Society of Civil Engineers
Title Comparison of Character Embedding-Based Deep Learning Models for Classifying the Difficulty Level of BIM Requirements in Bid Sentences
Authors 김정수(Kim, Jeongsoo)
DOI https://doi.org/10.12652/Ksce.2024.44.6.0843
Page pp.843-853
ISSN 1015-6348
Keywords Character embedding; Convolutional neural network; Bi-directional long short-term memory neural network; Bid; BIM requirement
Abstract With the mandatory application of Building Information Modeling (BIM) in construction projects, the analysis of BIM requirements
from the bidding stage has become increasingly important. To effectively handle the vast amount of bid text information, research on
natural language processing has been actively conducted. However, Korean bid documents often contain a mixture of Korean and
English, and the inconsistent representation of identical expressions limits text quality improvement even after pre-processing, which
in turn affects the performance of morphological analyzers. These challenges contribute to the degradation of analytical model
performance. To mitigate performance degradation caused by pre-processing, this study proposes a deep learning model that classifies
the difficulty level of BIM requirements in bid sentences using character embeddings. The study analyzes the impact of key parameters
on three types of character embedding models: a simple character embedding, a character embedding combined with a bi-directional
long short-term memory (Bi-LSTM) layer, and a character embedding combined with a convolutional neural network (CNN) layer. Based on
this analysis, the optimal model was derived, and the performance of BIM requirement difficulty classification was compared across
the character embedding models. Additionally, the results were compared with those of other models based on word embeddings.
Among the three models, the character embedding model utilizing CNN achieved the highest F1 score (0.98), with the number and
size of convolutional filters having the most significant impact on F1 score variation. Furthermore, it was confirmed that the use of
character embedding models resulted in an approximately 15% improvement in the F1 score compared to word embedding models.
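
As a rough illustration of why character-level input avoids dependence on a morphological analyzer, the sketch below maps each character of a mixed Korean/English sentence to an integer index, which is the form an embedding layer would consume. This is not the paper's pipeline: the function names, the padding and unknown-character indices, the maximum length, and the sample sentences are all assumptions made for the example.

```python
# Illustrative sketch only: names, padding scheme, and sample sentences are
# assumptions, not taken from the paper.
PAD, UNK = 0, 1  # reserved indices for padding and unseen characters


def build_char_vocab(sentences):
    """Map every distinct character to an integer index (2, 3, ...)."""
    vocab = {}
    for sentence in sentences:
        for ch in sentence:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 2
    return vocab


def encode(sentence, vocab, max_len):
    """Convert a sentence to a fixed-length list of character indices."""
    ids = [vocab.get(ch, UNK) for ch in sentence[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical mixed Korean/English bid sentences
train = ["BIM 모델 제출", "LOD 300 이상 작성"]
vocab = build_char_vocab(train)
encoded = encode("BIM 모델 작성", vocab, max_len=12)
```

Because the vocabulary is built from individual characters, Korean syllables, Latin letters, digits, and spaces are handled uniformly, with no tokenizer or morphological analysis step. A downstream embedding layer followed by a CNN or Bi-LSTM layer, as compared in the study, would take index sequences of this kind as input.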