CoAtNet for Chest X-Ray Report Generation with Bi-LSTM and Multi-Head Attention

Keywords: Medical Image Captioning; Chest X-Ray Report Generation; Gamma Correction; CoAtNet; Bi-LSTM; Multi-Head Attention; Data Imbalance Handling

Authors: R. A. Akbar, R. E. Putra, W. Yustanti

September 23, 2025
October 12, 2025
October 20, 2025


In clinical environments, the Chest X-Ray (CXR) is the most widely used diagnostic instrument, and it supports diagnosis particularly through the accompanying medical report. However, manual report preparation is time-consuming, depends heavily on radiologists' expertise, and is prone to errors under high workloads and limited expert staffing. An automated system based on artificial intelligence is therefore needed to reduce radiologists' workload while improving consistency. This study aims to develop an automated medical report generation system with a balanced data distribution, a reliable visual encoder, and bidirectional contextual understanding. The main contributions are: an undersampling strategy applied to majority captions, followed by oversampling of minority labels while preserving the proportions of higher-frequency labels; a Bi-LSTM decoder with Multi-Head Attention (MHA) to strengthen textual context understanding; and CoAtNet as a visual encoder that combines the strengths of CNNs and Transformers. The methodology comprises image preprocessing with gamma correction for contrast enhancement, data selection, data balancing through combined undersampling and oversampling, and a CoAtNet encoder paired with a Bi-LSTM and MHA decoder. Experiments were conducted on the IU X-Ray dataset, and performance was evaluated with the BLEU and ROUGE-L metrics. The CoAtNet configuration with Bi-LSTM and MHA, combined with the undersampling-oversampling strategy, achieved the best performance, with a cumulative score of 1.642 (the sum of the five metric values) and BLEU-1 to BLEU-4 and ROUGE-L scores of 0.480, 0.329, 0.245, 0.183, and 0.405, respectively. These findings indicate that combining data balancing strategies with CoAtNet and Bi-LSTM produces more accurate automated medical reports and reduces bias toward the majority label.
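The abstract names gamma correction as the contrast-enhancement step. The following is a minimal sketch of that operation on 8-bit grayscale intensities, assuming the common power-law form out = 255 · (in / 255)^γ applied through a 256-entry lookup table. The γ value used in the paper is not stated on this page, so the 0.8 below is purely illustrative, and the function names are hypothetical. Conventions differ between tools: some use 1/γ as the exponent, which reverses whether a given γ brightens or darkens the image.

```python
# Sketch of power-law (gamma) correction for 8-bit grayscale pixels.
# Assumption: out = 255 * (in / 255) ** gamma; gamma = 0.8 is illustrative only.

def gamma_lut(gamma: float) -> list[int]:
    """Build a 256-entry lookup table mapping intensity v to 255 * (v / 255) ** gamma."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255 * (v / 255) ** gamma) for v in range(256)]


def gamma_correct(pixels: list[list[int]], gamma: float) -> list[list[int]]:
    """Apply gamma correction to a 2-D grid of 8-bit intensities via the lookup table."""
    lut = gamma_lut(gamma)
    return [[lut[p] for p in row] for row in pixels]


# gamma < 1 lifts dark and mid-tone intensities; 0 and 255 stay fixed.
image = [[0, 64, 128], [192, 255, 32]]
print(gamma_correct(image, gamma=0.8))
```

A lookup table is used because an 8-bit image has only 256 possible intensities, so the power function is evaluated once per level rather than once per pixel.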
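For readers checking the reported numbers, the sketch below computes ROUGE-L as the longest-common-subsequence (LCS) F-measure on whitespace-tokenized text. It also reproduces the cumulative-score arithmetic from the five reported values. It assumes β = 1.2, the value used in the widely used COCO caption evaluation code (pycocoevalcap). Whether the authors used that toolkit, and how they tokenized reports, are assumptions not confirmed on this page.

```python
# Sketch of sentence-level ROUGE-L (LCS-based F-measure), assuming whitespace tokens
# and beta = 1.2 as in the COCO caption evaluation code.

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """ROUGE-L F-measure weighting recall beta**2 times as heavily as precision."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return ((1 + beta**2) * precision * recall) / (recall + beta**2 * precision)


print(rouge_l("heart size normal", "the heart size is normal"))

# Cumulative score as reported: the plain sum of the five metric values.
reported = {"BLEU-1": 0.480, "BLEU-2": 0.329, "BLEU-3": 0.245,
            "BLEU-4": 0.183, "ROUGE-L": 0.405}
cumulative = sum(reported.values())
print(f"{cumulative:.3f}")  # 1.642
```

An unweighted sum gives BLEU four of the five terms, so the cumulative score leans toward n-gram precision more than toward ROUGE-L's subsequence-based recall.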

How to Cite

Akbar, R. A., Putra, R. E., & Yustanti, W. (2025). CoAtNet for Chest X-Ray Report Generation with Bi-LSTM and Multi-Head Attention. Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics, 7(4), 654-672. https://doi.org/10.35882/ijeeemi.v7i4.271
