Evaluating AI Language Models in News Retrieval: A Comparative Study of ChatGPT-Plus and DeepSeek (R1)

Omar Al-Janabi; Osamah Mohammed Alyasiri; Elaf Ayyed Jebur; Shahad Mohgoob Nafl

doi:10.51173/ijds.v2i2.33

Authors

Omar Al-Janabi College of Medicine, University of Baghdad, Baghdad, 10047, Iraq https://orcid.org/0000-0002-1044-8234
Osamah Mohammed Alyasiri Karbala Technical Institute, Al-Furat Al-Awsat Technical University, Karbala, 56001, Iraq; School of Computer Sciences, Universiti Sains Malaysia, Penang, 11800, Malaysia https://orcid.org/0000-0002-2345-2443
Elaf Ayyed Jebur College of Medicine, University of Baghdad, Baghdad, 10047, Iraq https://orcid.org/0000-0002-7210-025X
Shahad Mohgoob Nafl College of Medicine, University of Baghdad, Baghdad, 10047, Iraq https://orcid.org/0000-0001-8746-1587

Keywords:

LLMs, Information Retrieval, News Accessing, ChatGPT, DeepSeek

Abstract

The increasing complexity of how humans interact with and process information has driven significant advances in Natural Language Processing (NLP), which has moved from task-specific architectures to generalized frameworks applicable across multiple tasks. Despite the success of these frameworks, challenges persist in specialized domains such as translation, where instruction tuning may prioritize fluency over accuracy. Against this backdrop, the present study conducts a comparative evaluation of ChatGPT-Plus and DeepSeek (R1) on a high-fidelity bilingual retrieval-and-translation task. A single standardized prompt directs each model to access the Arabic-language news section of the College of Medicine, University of Baghdad, retrieve the three most recent articles, and translate them into English. ChatGPT-Plus fulfilled the prompt successfully, extracting authentic Arabic content and delivering fluent, semantically accurate English translations. DeepSeek (R1), by contrast, failed to retrieve the requested articles and instead produced only generic procedural advice, which indicates that it lacks both real-time web access and a retrieval-augmented generation (RAG) mechanism.
