A Deep Learning Framework for Extracting and Summarizing Text from Images

Abbas EL DOR; Osama Emad Abdulhussein

doi:10.51173/ijds.v3i1.56

Authors

Abbas EL DOR, Department of Computer Science, Faculty of Sciences, Lebanese University, Beirut, Lebanon. ORCID: https://orcid.org/0009-0006-2461-1801
Osama Emad Abdulhussein, Department of Computer Science, Faculty of Sciences, Lebanese University, Beirut, Lebanon. ORCID: https://orcid.org/0009-0004-2410-3861

Keywords:

Text Summarization, Image Text Extraction, Natural Language Processing (NLP), BiLSTM, Optical Character Recognition (OCR)

Abstract

In the digital era, substantial amounts of textual information are embedded in images, especially across news outlets, social platforms, and scanned documents. This presents a significant technical challenge: efficiently extracting and summarizing text from images in an automated way that preserves context and meaning. Traditional text summarization techniques are not directly applicable to image-based content because they depend on pre-structured input text. In this paper, we propose a framework that integrates Optical Character Recognition (OCR) and advanced Natural Language Processing (NLP) models to address this challenge. The proposed method applies OCR to extract raw text from images, followed by deep learning-based summarization using models such as LSTM, Bi-LSTM, BERT, and T5. These models are trained on large-scale news datasets to enhance their ability to generate coherent summaries from unstructured text. To ensure accessibility and practical usability, our framework is deployed via an interactive web-based interface that allows end-users to upload images and receive concise summaries in real time. Experimental evaluation demonstrates the efficacy of the proposed approach, particularly with transformer-based models, in delivering high-quality summarization from visual text sources.
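
The two-stage flow described in the abstract (OCR first, then summarization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `clean_ocr_text`, `frequency_summarizer`, and `summarize_image` are hypothetical, the OCR engine and the summarization model are passed in as plain callables, and the word-frequency summarizer is only a simple extractive stand-in for the LSTM, Bi-LSTM, BERT, and T5 models named above.

```python
import re
from collections import Counter
from typing import Callable


def clean_ocr_text(raw: str) -> str:
    """Normalize raw OCR output before summarization.

    Rejoins words hyphenated across line breaks and collapses whitespace.
    Caveat: a genuine hyphenated compound split at a line end is merged too.
    """
    joined = re.sub(r"-\s*\n\s*", "", raw)
    return re.sub(r"\s+", " ", joined).strip()


def frequency_summarizer(text: str, max_sentences: int = 1) -> str:
    """Stand-in extractive summarizer (an assumption, not the paper's models).

    Scores each sentence by the mean corpus frequency of its words and
    returns the top-scoring sentences in their original order.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def summarize_image(image_bytes: bytes,
                    ocr: Callable[[bytes], str],
                    summarize: Callable[[str], str]) -> str:
    """Run the pipeline: OCR -> text cleanup -> summarization.

    `ocr` and `summarize` are injected so that any OCR engine or trained
    model can be plugged in without changing the pipeline itself.
    """
    return summarize(clean_ocr_text(ocr(image_bytes)))
```

Passing the stages in as callables keeps the sketch independent of any particular OCR library or model checkpoint; a web handler of the kind the abstract mentions would call `summarize_image` with the uploaded file's bytes.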
