How Virgo is using DINOv2 to analyze endoscopy videos for precision medicine

Endoscopic procedures, which involve a doctor using a flexible camera to look at an area of interest in a patient’s body, are important for the early detection of diseases. In the US, it’s estimated that more than 20 million gastrointestinal (GI) endoscopies are performed annually, for indications ranging from colorectal cancer screening to diagnosis of inflammatory bowel disease (IBD). Yet although endoscopy is one of the most common medical procedures, the medical community has rarely saved endoscopy video for further analysis.

Virgo, a company based in San Diego, California, believes endoscopy video and AI are a powerful combination that could lead to improved patient outcomes, including by helping match patients with clinical trials and by predicting the likelihood of response to a particular drug. Virgo’s aim is to build the future of endoscopy through scalable and secure data capture, workflow automation tools for clinical trials, and AI foundation models to advance precision medicine.

“We’ve captured over 1.75 million procedure videos, which to our knowledge is the largest dataset of its kind,” says Matt Schwartz, founder and CEO of Virgo. “This massive, diverse, and growing repository of endoscopy videos is now enabling us to develop next-generation foundation models specifically for endoscopy.”

Schwartz and his team developed VirgoCloud, a hardware and software solution that can connect to an existing endoscopic video processor to capture, compress, and encrypt procedure videos to send to a HIPAA-compliant web portal. While collecting and securely processing the video was the first step of the process, the team also built AutoIBD, its first AI model that could learn from the repository of videos and flag potentially eligible candidates for enrollment in IBD clinical trials.

When the team learned about Meta’s DINOv2, which provides practitioners with a state-of-the-art, generalized feature extractor for computer vision, Schwartz says they were eager to find out if the open source technology could supercharge their AI model.

“We’ve always been big fans of and believers in self-supervised learning techniques, in large part because we capture an incredible amount of video data and it’s impractical for us to get that data fully labeled,” he says. “We saw the DINOv2 open sourcing announcement and just dove right in.”

As a first step, the Virgo team tested whether they could use DINOv2 as a feature extractor for endoscopy videos. They found that it performed well across a number of tasks and decided to update their AutoIBD model with DINOv2 as the new backbone. The team saw an immediate performance boost and quickly worked to build DINOv2 feature extraction into the core architecture of VirgoCloud.

Inspired by the strong out-of-the-box performance of DINOv2, Virgo set its sights on developing a state-of-the-art AI foundation model specifically for endoscopy. Thanks to DINOv2’s open source license, Virgo jumpstarted its efforts by using the DINOv2 architecture and code repository. With that in place, the team prepared its industry-leading dataset and successfully trained its first model, which it is calling EndoDINO.

In their first research paper, the Virgo team demonstrated that EndoDINO achieves state-of-the-art performance in a wide range of AI benchmarks for endoscopy. These benchmarks include tasks such as anatomical landmark classification, disease severity scoring for ulcerative colitis, and polyp segmentation. The team is now working to demonstrate that their model is capable of analyzing a patient’s baseline colonoscopy and predicting things like age, sex, BMI, and, most importantly, likelihood of achieving clinical remission from IBD on a particular drug.

“DINOv2 was a game changer for us,” Schwartz says. “The overall DINOv2 framework helps us experiment faster because training for downstream tasks is relatively efficient after features are extracted. When it was first released, we were able to quickly leverage the models to improve patient recruitment for inflammatory bowel disease trials. We’re a relatively small team, but we have a massive dataset. With DINOv2 being open source, we were perfectly situated to use the architecture to accelerate our own self-supervised model development. I’m not sure we could have trained EndoDINO and achieved these breakthrough results without DINOv2.”
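The efficiency Schwartz describes comes from a common pattern: a frozen backbone turns each frame into a feature vector once, and only a small model is trained on those cached vectors for each new task. The sketch below illustrates that second step under stated assumptions and is not Virgo's implementation. The feature values are made-up toy numbers standing in for backbone embeddings, the labels "cecum" and "rectum" are hypothetical examples of an anatomical landmark task, and a nearest-centroid classifier is used in place of whatever downstream model a team might actually train.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence

Vector = Sequence[float]


def fit_centroids(features: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Average the cached feature vectors of each class into one centroid."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids: Dict[str, List[float]], vec: Vector) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Toy 3-dimensional vectors standing in for cached backbone features.
# These numbers are invented for illustration only.
train_features = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
train_labels = ["cecum", "cecum", "rectum", "rectum"]

# "Training" is a single pass over the cached features; the backbone is never touched.
centroids = fit_centroids(train_features, train_labels)
prediction = predict(centroids, [0.85, 0.15, 0.05])
```

Because the expensive step (running video frames through the backbone) happens once, trying a new downstream task only means refitting a small model like this one on the stored vectors.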

Virgo is making EndoDINO available to pharmaceutical companies and academic medical centers through a development platform called EndoML. Physician leaders from these medical centers will collaborate with Virgo to explore new AI applications with EndoDINO. Dr. Ali Soroush, assistant professor of Data-Driven and Digital Medicine (D3M) and Gastroenterology at the Icahn School of Medicine at Mount Sinai, is one of the first collaborators. (Soroush holds equity in Virgo Surgical Video Solutions through his role as an advisor.)

“EndoDINO advances beyond traditional task-specific models to deliver a foundation model with a broad knowledge base and multitask capabilities,” Soroush says. “This model can help uncover patterns and predictions previously beyond reach, paving the way for more precise and innovative applications in endoscopy and gastroenterology, particularly for less common conditions.”

Schwartz sees this as the start of a new era for AI in gastroenterology. “Self-supervised and unsupervised learning are clearly the future in machine learning and computer vision,” he says. “With EndoDINO, we’re enabling other groups to rapidly train powerful AI applications. This opens up all sorts of new opportunities to leverage endoscopy for precision medicine. We even see a world where hospitals can train and deploy their own models for things like real-time polyp detection and classification, tailored to their specific patient population.”