Materials and Methods: Five publicly available chatbots (GPT-5o, GPT-4o, GPT-4.1, Claude Opus-4, and Gemini Pro) were tested with a standardized prompt on postoperative care after lung resection. Each chatbot's response was independently assessed by a thoracic surgeon using two validated scoring systems: the modified Ensuring Quality Information for Patients (mEQIP) tool and the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool. Readability was evaluated with the Average Reading Level Consensus (ARLC) index. Descriptive and comparative analyses were performed. Because no human or patient data were used, the study was exempt from ethical approval.
Results: The mean mEQIP score across models was 84.7 ± 5.5%, indicating high content quality, and the mean QAMAI score was 27.2 ± 2.0 out of 30, reflecting high accuracy and completeness. GPT-4.1 and GPT-5o achieved the highest scores, whereas Gemini Pro provided the least comprehensive content. The mean ARLC grade was 11.0 ± 0.6, corresponding to an eleventh-grade reading level.
Conclusion: AI chatbots can produce accurate, guideline-consistent postoperative information after thoracic surgery; however, their language complexity often exceeds the reading level of most patients. Simplifying language and improving transparency are essential before chatbots can be safely integrated into postoperative patient education.
Keywords: artificial intelligence, chatbot, thoracic surgery, postoperative care, patient education, readability



