Google Health and Stanford Medicine introduce the SCIN dataset, crowdsourced real-world dermatology images for inclusive healthcare research

Written by Katie McCool

A close-up image of a woman's face. A magnifying glass is held up to her cheek, highlighting otherwise unseen red lumps, to represent the crowdsourced real-world skin images used to create the SCIN dataset.

Google Health and Stanford Medicine have collaborated to create the Skin Condition Image Network (SCIN) dataset. With a focus on democratizing access to diverse dermatology images, the SCIN dataset represents a novel approach to data collection, leveraging crowdsourcing to gather real-world skin images from internet users across the US.

Traditional health datasets often fall short in representing the breadth and diversity of real-world conditions, posing challenges for research, medical education, and the development of AI tools. Dermatology, with its wide array of conditions and manifestations across different skin tones, lends itself well to crowdsourcing, making it an ideal field in which to pioneer new methods for creating representative health datasets.

The SCIN dataset comprises over 10,000 images of skin, nail, or hair conditions contributed by thousands of individuals. By inviting voluntary contributions through Google Search advertisements, the initiative reached individuals at various stages of their health concerns, potentially capturing conditions before formal medical intervention. Up to three dermatologists then labeled each contribution, providing valuable insights into the diverse array of dermatological concerns.

Dr Karen DeSalvo, Chief Health Officer at Google Health, emphasized the significance of this database for AI, highlighting the importance of integrating technology into everyday healthcare experiences. However, she acknowledged the irreplaceable role of human expertise in health care, stating, “We must remember that AI is just a tool and at the end of the day, health is human.”

One of the key strengths of the SCIN dataset is its diversity: it encompasses a wide range of skin tones, ages, settings, and condition severities. This inclusivity is vital for developing tools that cater to the needs of all individuals, regardless of demographic background.

Dr Ivor Horn, Chief Health Equity Officer at Google, echoed DeSalvo’s sentiments, emphasizing the critical importance of representative data in AI model development, saying, “Without diverse, representative data, AI models can do more harm than good.”

The SCIN dataset not only addresses the need for representative dermatology images but also establishes a model for future healthcare data collection efforts. By leveraging crowdsourcing instead of standard electronic health record-based data extraction, Google Health and Stanford Medicine have introduced a scalable method for creating inclusive and diverse health datasets, enabling the potential for more equitable healthcare solutions.

As the SCIN dataset becomes accessible to researchers, medical professionals, and educators worldwide, it offers the prospect of advancing dermatology research, enhancing medical education, and supporting the development of AI-driven healthcare tools.

The use of search ads for crowdsourcing also presents opportunities beyond dermatology. This method could be applied to other health data types, particularly those that internet users commonly possess or can self-label. It holds promise for enhancing rare disease registries, diversifying clinical trial recruitment, and facilitating fully digital clinical studies.
