Using ChatGPT for PubMed Searches: Be Smart!

With the advent of new AI tools aimed at making academic life simpler, it can be tempting to reach for them when you find yourself in uncharted territory. And while some of these tools, like ChatGPT, can be helpful starting points, they should not be relied on entirely for your medical research.

PubMed is a freely available biomedical database with over 36 million records from leading journals in the medical field. It is usually the starting point for most medical research. However, it is not always easy to navigate, and some researchers struggle to build a comprehensive search. Although PubMed is taught to medical students at the undergraduate level, and to other students in the health sciences, it can take a while before you feel comfortable searching on your own.

Searching in PubMed requires a balance of controlled vocabulary terms (MeSH terms) and keywords. These are meant to complement each other and allow you to find the most complete set of results.

ChatGPT can definitely help you get started, and even make helpful suggestions, but we want to make sure you’re using it properly! 

When doing any kind of searching, it is important to break down your research question to its basic searchable components. In the health sciences, we point people to the PICO model, which allows you to identify the patient population (P), the intervention (I), the comparator (C) and the patient outcomes (O). We would then combine these components to come up with a searchable question. 
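To make the combining step concrete, here is a small Python sketch (purely illustrative, not part of PubMed) that assembles a Boolean search string from PICO-style concept groups: synonyms within a group are joined with OR, the groups are joined with AND, and every phrase is wrapped in quotation marks so it is searched as-is.

```python
def build_query(concept_groups):
    """Combine concept groups into one PubMed-style Boolean search.

    Each group is a list of synonymous terms. Terms within a group are
    joined with OR, and the groups themselves are joined with AND.
    Phrases are wrapped in quotation marks so they stay intact.
    """
    clauses = []
    for terms in concept_groups:
        quoted = " OR ".join(f'"{t}"' for t in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)

# PICO concepts for our example question (population, condition,
# intervention; the comparator is often left implicit).
query = build_query([
    ["former smoker", "ex-smoker"],
    ["chronic obstructive pulmonary disease", "COPD"],
    ["pulmonary rehabilitation", "breathing exercises"],
])
print(query)
# prints: ("former smoker" OR "ex-smoker") AND ("chronic obstructive
# pulmonary disease" OR "COPD") AND ("pulmonary rehabilitation" OR
# "breathing exercises")
```

The synonym lists here are deliberately short; as we will see below, a real search needs many more terms per concept.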

For the purposes of this blog post, we will be using the following research question as an example: In former smokers with chronic obstructive pulmonary diseases, is pulmonary rehabilitation an effective treatment method?

We can start by asking ChatGPT to tell us the MeSH terms that are appropriate for use in this question:

Looks good, right? Wrong! 

Let’s talk about what works first. ChatGPT does a great job of breaking down the question and telling you which concepts should be searched. The way the search is entered into the database is also correct: in PubMed, search phrases should be enclosed in quotation marks and search fields should be entered in square brackets.

The most glaring problem is that ChatGPT has made up MeSH terms. Pulmonary Rehabilitation is not a MeSH term! Neither is Former Smokers!  If you enter “Pulmonary Rehabilitation”[mesh] or “Former Smokers”[mesh] into PubMed as ChatGPT suggests, you would get zero results. The closest MeSH term for Former Smokers is Ex-Smokers, but there is no close MeSH term for Pulmonary Rehabilitation.

Although ChatGPT’s suggestions are valuable, you always need to check the MeSH database for the accuracy of the terms provided. 
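If you like to script that check, NCBI’s E-utilities expose the MeSH database through the same `esearch` endpoint used for PubMed itself. A minimal sketch follows; the endpoint and parameters come from NCBI’s E-utilities documentation, the helper function names are my own, and note that a zero-hit response is a strong hint (not absolute proof) that a heading does not exist.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def mesh_lookup_url(term):
    """Build an esearch URL that queries the MeSH database for a term."""
    params = {"db": "mesh", "term": term, "retmode": "json"}
    return f"{EUTILS}?{urlencode(params)}"

def mesh_has_match(term):
    """Fetch the esearch result and report whether any MeSH record matched."""
    with urlopen(mesh_lookup_url(term)) as resp:
        data = json.load(resp)
    return int(data["esearchresult"]["count"]) > 0

if __name__ == "__main__":
    # "Pulmonary Rehabilitation" is one of the headings ChatGPT invented.
    for term in ["Ex-Smokers", "Pulmonary Rehabilitation"]:
        print(term, mesh_has_match(term))
```

Even with a script like this, the MeSH browser remains the authoritative place to confirm the exact heading and its entry terms.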

Next, let’s take a look at what ChatGPT will generate when asked to build a keyword search: 

(“Former Smokers” OR “Smoking Cessation” OR “Tobacco Use Cessation”) AND (“Chronic Obstructive Pulmonary Disease” OR COPD) AND (“Pulmonary Rehabilitation” OR “Respiratory Therapy” OR “Exercise Therapy for Lungs”) AND (“Treatment Effectiveness” OR “Therapeutic Efficacy” OR “Outcome Assessment”)

One of the first things that jumps out at me, and what I’ve written in red, is the acronym COPD. While it is not incorrect to enter the acronym as a keyword, the missing quotation marks are what worries me. Without quotation marks, PubMed triggers Automatic Term Mapping, a problematic feature that adds unnecessary search terms, and therefore results, based on what the database thinks you are searching for.

The image below will show you the difference between searching with and without quotation marks:

PubMed has translated the search to include the correct MeSH term, but it has also included the individual words (disease, pulmonary, obstructive, chronic) as their own stand-alone keywords. Are you still going to find articles related to chronic obstructive pulmonary disease? Sure. But there are certainly going to be more irrelevant articles for you to sift through. Just look at the difference in the numbers – 109 thousand as compared to 62 thousand!
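You can reproduce this comparison yourself against the live database. Here is a rough sketch using NCBI’s E-utilities `esearch` endpoint (a real, documented API; the helper names are my own, and the exact counts will drift as PubMed grows, so none are hard-coded below):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def pubmed_count_url(query):
    """Build an esearch URL that asks PubMed only for the hit count."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": 0}
    return ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
            + urlencode(params))

def pubmed_count(query):
    """Number of PubMed records matching the query."""
    with urlopen(pubmed_count_url(query)) as resp:
        return int(json.load(resp)["esearchresult"]["count"])

if __name__ == "__main__":
    # Unquoted: Automatic Term Mapping expands the individual words.
    print(pubmed_count("chronic obstructive pulmonary disease"))
    # Quoted: searched as an exact phrase, so the count is smaller.
    print(pubmed_count('"chronic obstructive pulmonary disease"'))
```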

Another problem with ChatGPT is its inclusion of outcome search terms. We don’t normally build a search with outcomes, especially such generic ones. Instead, outcomes are screened for once we have our set of results. You will certainly find articles about pulmonary rehabilitation in former smokers with COPD that don’t use terms like “treatment effectiveness” and “outcome assessment” in the title and abstract. By putting these terms in the search, you are forcing the database to look for them and consequently eliminating relevant results. 

When asked to generate a search with MeSH terms and keywords for our initial question, ChatGPT combines what we’ve seen above and gives you one big search string to enter into the database: 

(“Former Smokers”[MeSH] OR “Smoking Cessation”[MeSH] OR “Tobacco Use Cessation”[MeSH] OR “Former Smokers” OR “Smoking Cessation” OR “Tobacco Use Cessation”) AND (“Chronic Obstructive Pulmonary Disease”[MeSH] OR COPD OR “Chronic Obstructive Pulmonary Disease”) AND (“Pulmonary Rehabilitation”[MeSH] OR “Respiratory Therapy”[MeSH] OR “Exercise Therapy for Lungs”[MeSH] OR “Pulmonary Rehabilitation” OR “Respiratory Therapy” OR “Exercise Therapy for Lungs”) AND (“Treatment Effectiveness”[MeSH] OR “Therapeutic Efficacy”[MeSH] OR “Outcome Assessment”[MeSH] OR “Treatment Effectiveness” OR “Therapeutic Efficacy” OR “Outcome Assessment”) 

At the time of this writing, this search yields two results. TWO! In fact, PubMed isn’t too happy with our search either, and issues the following warning: 

Not only has it kept the two made-up MeSH terms we already saw, but it has created a few new ones, too, including Exercise Therapy for Lungs, Treatment Effectiveness, Therapeutic Efficacy and Outcome Assessment. And Chronic Obstructive Pulmonary Disease is not technically a MeSH term either; the correct MeSH term is Pulmonary Disease, Chronic Obstructive. MeSH terms need to be exact.

Do better, ChatGPT…

Now, let’s take a look at a search that I, a human health sciences librarian, built for the same research question:

(“Ex-Smokers”[MeSH] OR “Tobacco Use Cessation”[mesh] OR “ex smoker*” OR “exsmoker*” OR “former smoker*” OR (“history” AND (“smoking” OR “cigarette*” OR “tobacco”))) AND (“Breathing Exercises”[Mesh] OR “breathing exercise*” OR “pulmonary rehab*” OR “respirat* rehab*” OR “respiratory muscle training” OR “breath* control*” OR “lung rehab*” OR “lung exercise*” OR “respiratory exercise*”) AND (“Pulmonary Disease, Chronic Obstructive”[mesh] OR “Chronic Obstructive Lung Disease*” OR “Chronic Obstructive Pulmonary Disease*” OR “COPD” OR “COAD” OR “Chronic Obstructive Airway Disease*” OR “Chronic Airflow Obstruction*” OR “Chronic Bronchitis” OR “Pulmonary Emphysema*” OR “Centrilobular Emphysema*” OR “Panlobular Emphysema*” OR “Focal Emphysema*”)

This search combines the correct MeSH terms with keywords and synonyms. I brainstormed different terms for all of the key concepts, and included conditions that fall under the umbrella term of chronic obstructive pulmonary diseases. All search terms are nicely encased in quotation marks to avoid automatic term mapping and I used truncation to account for different spellings. This search yielded 157 results.

Eat your heart out, ChatGPT!

While ChatGPT may not be great at building your search strategy, it can still be useful. It can help you break down your question into concepts and offer suggestions during the brainstorming process. Try asking the program to generate synonyms for words – it might bring up things that you never thought of before.

For example, I asked ChatGPT to generate a list of synonyms for cancer: 

They’re not all winners, and I wouldn’t enter them all into a search engine, but maybe I didn’t think to include malignancy and this was a great reminder. Or maybe I didn’t think to truncate the word cancer as cancer* to include terms such as cancers or cancerous, or to truncate malignan* to account for malignant, malignancy or malignancies. Thanks, ChatGPT! While I don’t recommend using ChatGPT for everything, using it as a thesaurus can be quite fruitful.

Try these tips out the next time you use ChatGPT, or any other AI program, and see the difference in your searches. Don’t forget to contact your librarians for specific questions related to PubMed. You can find a list of librarians by subject matter here.

Introducing the Health Sciences FAQ!

The team of Health Sciences librarians is pleased to announce the launch of the new Health Sciences FAQ. We have put together a list of 22 of the most common questions we’ve seen across the various health sciences fields and provided in-depth answers, as well as resources to help you.

Questions cover topics related to knowledge synthesis, including different types of reviews, foreground vs. background questions, the evidence pyramid, searching, medical databases and more! The FAQ is for anyone thinking about or currently undertaking research in the health sciences, including students in the disciplines of medicine, dentistry and nursing. Does it explain the difference between subject headings and keywords? You bet! Does it answer your PICO assignment? No (sorry!), but it does explain PICO and other question formulation frameworks. 

Still have questions? No problem! Feel free to submit a question for our consideration or leave a comment on an already-published post. Remember, for more immediate assistance during the semester, you can chat or text with a librarian from 10 am to 6 pm, Monday through Friday, and from 12 pm to 5 pm on Saturday and Sunday. Find more information about our Ask a Librarian service here.