Medlinker MedGPT Achieves Groundbreaking 96% Diagnostic Consistency with Experts from 3A Hospitals – A Milestone in China’s Medical Generative AI

If you ask what the hottest topic of the first half of the year has been, most people would probably name generative AI. On the last day of the first half of 2023, a major piece of generative AI news shook the medical ecosystem and capped a half-year in which generative AI was rarely out of the spotlight.

On June 30, Medlinker held China's first consistency evaluation between an AI doctor and real doctors in Chengdu and Beijing, broadcast live throughout the day. The results showed that the AI doctor powered by Medlinker MedGPT and physicians from 3A hospitals reached 96% agreement in their evaluation scores.

Since ChatGPT became popular at the beginning of the year, generative AI has shown great potential in many industries and may eventually drive disruptive innovation in a number of them. According to a McKinsey report, generative AI could add $2.6-4.4 trillion to global GDP each year; for comparison, the UK's GDP in 2021 was only $3.1 trillion.

As generative AI has outperformed expectations in early applications across industries, research institutions have raised their forecasts for the size of the global generative AI market. According to a recent report by MarketsandMarkets, the market is expected to reach $11.3 billion in 2023 and $51.8 billion in 2028, a compound annual growth rate of 35.6%.
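As a rough check on how the quoted growth rate ties the two market-size figures together, the sketch below back-solves the 2023 base implied by the 2028 figure and the stated rate. It assumes five compounding periods between 2023 and 2028, which is an assumption about how the report counts years; the function names are illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end / start) ** (1 / years) - 1


def implied_start(end: float, rate: float, years: int) -> float:
    """Starting value implied by an end value and a compound growth rate."""
    return end / (1 + rate) ** years


# Figures quoted in the text: $51.8 billion in 2028, 35.6% CAGR.
# Assumption: 2023 -> 2028 is treated as five compounding periods.
base_2023 = implied_start(end=51.8, rate=0.356, years=5)

print(f"Implied 2023 market size: ${base_2023:.1f} billion")
print(f"Round-trip CAGR: {cagr(base_2023, 51.8, 5):.1%}")
```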

Against this backdrop, primary-market investment in generative AI is also picking up. In the second half of May alone, Hyro, a conversational AI company, raised $20 million in Series B funding, and Hippocratic AI, which is building a healthcare-specific generative AI model, raised $50 million in a seed round.

In the medical field, generative AI is seen as a powerful enabler. Applications are already being deployed in drug discovery and development, medical imaging, and diagnostics.

In fact, generative AI has been used in drug discovery and development for some time. It can learn the mapping from a protein's sequence to its structure and, backed by substantial computing power, handle complex high-dimensional data mapping problems, making possible protein structure prediction that was previously almost out of reach. It can also generate entirely new proteins that do not exist in nature, based on predefined properties and structures.

In medical imaging, generative AI can add value in several ways. First, it can generate synthetic data from raw data and use it to produce enhanced final images, overcoming limits imposed by the imaging principles and hardware of the devices and reducing the loss of image quality caused by improper operation. Second, it can generate large amounts of synthetic image data to augment training sets. This matters in data-scarce scenarios such as rare diseases or areas where data is unevenly distributed. Third, it can predict a patient's health status and disease risk from existing data. The industry can already assess future risk of cardiovascular and cerebrovascular disease by observing how retinal vessels and nerves change across a population, letting the generative AI learn these patterns and predict how a subject's condition will develop next. Alzheimer's risk prediction and myopia progression prediction have been explored in the same way.

Beyond these two areas, generative AI is being explored across the whole clinical diagnosis and treatment process, with the aim of supporting doctors' decisions and improving the patient experience. Before the consultation, generative AI can use its data retrieval and reasoning capabilities to better predict a patient's likely condition, improving the accuracy of triage and guidance. During the consultation, it can assist doctors with diagnosis, treatment guidance, and prognosis by applying data analysis and intelligent algorithms to multimodal data such as the patient's symptoms and medical history. After the consultation, it can reduce the burden on medical staff by answering patients' questions about their conditions, drug side effects, and preventive measures online around the clock; it can also serve as an education tool that teaches patients sound health knowledge and preventive measures.

For physicians, generative AI also serves as a convenient repository of medical knowledge, helping them keep up with the latest research advances, evidence-based medicine, and clinical guidelines, which strengthens professional competence and quality of care. In addition, generative AI converses far more naturally than earlier human-machine dialogue systems, which should greatly improve the patient experience.

However, these clinical visions are still some distance from implementation. If you have used ChatGPT, you will have seen that its biggest problem is "confident nonsense": ask the exact same question several times and you may get a different answer each time. The root of the problem is that current generative AI is mainly built on general-purpose large language models similar to GPT, which rely heavily on the statistical probability of text to generate answers, so the accuracy of those answers cannot be guaranteed.
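Why a probability-driven generator can answer the same question differently each time can be illustrated with a toy example. The sketch below is not how any particular model works internally: real language models sample token by token, whereas here whole answers are drawn from a made-up distribution. The candidate answers and their probabilities are invented for illustration.

```python
import random

# Hypothetical distribution over candidate answers to one fixed question.
# The labels and probabilities are made up for illustration.
candidates = {"gastritis": 0.5, "peptic ulcer": 0.3, "acid reflux": 0.2}


def sample_answer(rng: random.Random) -> str:
    """Draw one answer at random, weighted by its probability."""
    return rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]


def greedy_answer() -> str:
    """Always return the single most probable answer."""
    return max(candidates, key=candidates.get)


rng = random.Random(0)  # seeded so the run is reproducible
sampled = [sample_answer(rng) for _ in range(200)]

print("distinct sampled answers:", sorted(set(sampled)))
print("greedy answer:", greedy_answer())
```

Sampling yields several different answers to the same question, while always picking the most probable answer is repeatable but still not guaranteed to be correct, which is why consistency and accuracy have to be addressed separately.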

This is unacceptable in medical applications, where accuracy and consistency are the bottom line. Solving the problem requires fine-tuning and engineering optimization of existing general-purpose large language models, together with matching review mechanisms, so that the resulting services are practical and deliver consistent diagnosis and treatment.

With 96% diagnostic consistency with experts from 3A hospitals, Medlinker leads a generative AI breakthrough

In April 2023, Medlinker announced the launch of MedGPT, a large language model based on the Transformer architecture and tuned for medical application scenarios. The model has up to 100 billion parameters, draws on up to 2 billion pieces of medical text data and up to 8 million pieces of clinical consultation data, and has been intensively tuned by 100 doctors. To address the shortcomings of general-purpose large language models in medical settings, MedGPT includes several optimizations specific to those settings.

First, MedGPT adds a consistency check mechanism to the model. Through a clinical medical rule checker, MedGPT's draft answers are checked against clinical medical rules to ensure medical accuracy before a formal answer is given to the patient.
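The idea of gating a model's answer behind a rule check can be sketched as follows. This is only an illustration of the general pattern, not MedGPT's actual checker, whose rules and interfaces are not described in the source. The `Rule` class, the two example rules, and the draft answer fields are all hypothetical, and the rules are not clinical guidance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    violated: Callable[[Dict], bool]  # returns True when the draft breaks the rule


# Hypothetical rules, for illustration only.
RULES: List[Rule] = [
    Rule(
        "diagnosis_without_evidence",
        lambda d: bool(d.get("diagnosis")) and not d.get("evidence"),
    ),
    Rule(
        "drug_conflicts_with_allergy",
        lambda d: bool(set(d.get("drugs", [])) & set(d.get("allergies", []))),
    ),
]


def check(draft: Dict) -> List[str]:
    """Return the names of all rules the draft answer violates."""
    return [rule.name for rule in RULES if rule.violated(draft)]


def release(draft: Dict) -> Dict:
    """Release the draft only if no rule is violated; otherwise hold it."""
    violations = check(draft)
    if violations:
        return {"status": "held", "violations": violations}
    return {"status": "released", "answer": draft}


ok = release({"diagnosis": "gastritis", "evidence": ["endoscopy report"], "drugs": ["omeprazole"]})
bad = release({"diagnosis": "gastritis", "evidence": [], "drugs": ["penicillin"], "allergies": ["penicillin"]})
print(ok["status"], bad["status"], bad.get("violations"))
```

Keeping the rules separate from the generator means a held answer can be regenerated or escalated to a human without changing the model itself.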

Second, Medlinker has established a multi-dimensional evaluation system for the accuracy of MedGPT's diagnosis and treatment. For example, the consultation scenario focuses on the consultation accuracy rate, while the diagnosis scenario focuses on the diagnostic evidence sufficiency rate, the disease accuracy rate, and the missed diagnosis rate. With this system, the consistency and accuracy of MedGPT across the whole diagnosis and treatment process can be analyzed from multiple angles.
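The source does not define these rates, so the sketch below uses assumed definitions purely to show how such metrics might be computed from reviewed cases: disease accuracy is taken as the share of cases whose first predicted diagnosis matches the first reference diagnosis, and missed diagnosis rate as the share of reference diagnoses absent from the prediction. The case data is invented.

```python
from typing import Dict, List

Case = Dict[str, List[str]]


def disease_accuracy_rate(cases: List[Case]) -> float:
    """Assumed definition: primary predicted diagnosis equals primary reference diagnosis."""
    hits = sum(
        1 for c in cases
        if c["predicted"] and c["predicted"][0] == c["reference"][0]
    )
    return hits / len(cases)


def missed_diagnosis_rate(cases: List[Case]) -> float:
    """Assumed definition: reference diagnoses that the prediction did not list."""
    total = sum(len(c["reference"]) for c in cases)
    missed = sum(len(set(c["reference"]) - set(c["predicted"])) for c in cases)
    return missed / total


# Invented example cases, for illustration only.
cases = [
    {"reference": ["gastritis"], "predicted": ["gastritis"]},
    {"reference": ["hypertension", "type 2 diabetes"], "predicted": ["hypertension"]},
    {"reference": ["asthma"], "predicted": ["bronchitis"]},
    {"reference": ["gout"], "predicted": ["gout", "osteoarthritis"]},
]

print(f"disease accuracy rate: {disease_accuracy_rate(cases):.0%}")
print(f"missed diagnosis rate: {missed_diagnosis_rate(cases):.0%}")
```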

This alone is not enough. To measure MedGPT's output, a real-world benchmark of consistency with physicians, based on expert review, is also needed. That is the purpose of Medlinker's consistency assessment: to measure how closely MedGPT's decisions match those of real doctors in a single-blind test, with the results evaluated by a committee of experts.

To this end, Medlinker held China's first consistency test between an AI doctor and real doctors on June 30 in Chengdu, broadcast live throughout the day. More than 120 real patients took part in the one-day evaluation, together with 10 physicians at attending level or above from the departments of cardiology, gastroenterology, respiratory medicine, endocrinology, nephrology, orthopedics, and urology at West China Hospital of Sichuan University.

To keep the evaluation sound and scientific, the consultation session was specially designed. After entering the consultation room, the patient describes his or her condition to a medical assistant, who relays the patient's complaints separately to the real doctor and to the AI doctor through online text input and helps doctor and patient complete multiple rounds of communication.

Once enough information has been collected to make a decision, the real doctor and the AI doctor each issue an examination order or a diagnosis, and the patient can complete the examination on site at the hospital. The patient then brings the examination results back for a follow-up consultation, at which the AI doctor and the real doctor each provide a clinical diagnosis and treatment plan, which are then summarized. This process lets real doctors and the AI doctor make independent diagnoses under essentially the same conditions without influencing each other. If participating patients still have doubts about the results, they can speak face to face with the attending physicians from West China Hospital stationed on site, to ensure patient satisfaction.

After the consultations, seven expert professors from Peking University People's Hospital, China-Japan Friendship Hospital, and other hospitals reviewed the 91 valid cases produced by the evaluation. The AI doctor was scored on seven dimensions: accuracy of consultation, accuracy of diagnosis, accuracy of treatment recommendations, accuracy of auxiliary examination plans, accuracy of data analysis, provision of interpretable information, and natural language consultation and interaction. After three hours of comparative analysis, and combining the judgments and scores of all members of the expert panel, the real doctors scored 7.5 and the AI doctor scored 7.2. The score consistency between the AI doctor and the 3A hospital doctors reached 96%.
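The source does not state how the 96% figure was derived. One reading that matches the reported numbers is the ratio of the AI doctor's overall score to the real doctors' overall score; the sketch below works through that arithmetic under this assumption.

```python
def score_consistency(ai_score: float, doctor_score: float) -> float:
    """Assumed definition: ratio of the lower overall score to the higher one."""
    low, high = sorted((ai_score, doctor_score))
    return low / high


# Overall scores reported by the expert panel.
doctor_score = 7.5
ai_score = 7.2

consistency = score_consistency(ai_score, doctor_score)
print(f"score consistency: {consistency:.0%}")
```

Under this reading the figure compares aggregate scores, so it says how close the two overall ratings were rather than how often the two diagnoses agreed case by case.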

This result exceeded expectations and was highly regarded by the review experts. The reviewers generally agreed that MedGPT gathers enough information through multiple rounds of questioning and advances the consultation only while maintaining medical accuracy, which lowers the probability of misdiagnosis and missed diagnosis.

Surprisingly, MedGPT also identified diseases outside the department in question based on the patient's complaints, and offered other possible diagnoses. This is not easy to do in a routine specialty consultation. According to the reviewers, MedGPT's knowledge coverage exceeds that of some less experienced real doctors. What is more, MedGPT not only achieves a high level of consistency but can also, for the first time, order the necessary medical tests when the diagnosis is still unclear, then make an accurate diagnosis and design a treatment plan based on the test data the patient returns. This is routine for real doctors, but it is a major breakthrough for AI.

As early as May, MedGPT already had a range of medical testing capabilities that work with Medlinker's cloud-based services (e.g. "cloud test"), enabling patients to complete the entire process of consultation, testing, diagnosis, and drug purchase without leaving home. After patients receive their medications, MedGPT also proactively provides medication guidance and management, intelligent follow-up, rehabilitation guidance, and other disease management services.

At present, the Medlinker MedGPT plugin application platform has integrated more than 1,000 medical multimodal capabilities, both its own and third-party, which greatly enriches the end-to-end intelligent diagnosis and treatment experience. Medlinker is also rapidly expanding disease coverage: by the end of this year, MedGPT will increase the number of covered diseases (ICD-10 subcategories) from the current 100 to 300, and the share of patient visits it can cover will rise from 60% to 80%. Although MedGPT is still in the testing phase, progress so far suggests it is approaching its first real-world deployment in support of doctors.

With each step forward, MedGPT is making new history: it is the first medical-specific large model released, it made the first AI leap from online consultation to medical examination, and it completed the first diagnostic consistency assessment between an AI doctor and real doctors with outstanding results. With this series of "firsts" in medical-specific generative AI, Medlinker has become a leader in the field.