Data Description

The Mycetoma MicroImage Challenge uses a curated dataset of histopathological images, collected and annotated to support the development of automated diagnostic tools. These images are central to advancing the diagnosis of mycetoma, a neglected tropical disease that is difficult to diagnose. The dataset consists of 863 images, with 70% allocated to the training set, 10% to the validation set, and 20% to the testing set.
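
As a rough illustration of what the 70/10/20 allocation means for 863 images, the sketch below computes nominal per-set counts using largest-remainder rounding. The function name and the rounding rule are assumptions made for this example. The actual set sizes may differ, since images are assigned to sets per patient rather than one at a time.

```python
def nominal_split_counts(total, percents):
    """Round percentage shares to whole counts that still sum to `total`.

    Hypothetical helper: uses largest-remainder rounding, which is an
    assumption of this sketch, not the organisers' stated procedure.
    """
    # Integer arithmetic: (whole images, remainder out of 100) per set.
    quotas = {name: divmod(total * pct, 100) for name, pct in percents.items()}
    counts = {name: whole for name, (whole, _) in quotas.items()}

    # Hand out the images lost to flooring, largest remainder first.
    leftover = total - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda n: quotas[n][1], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


counts = nominal_split_counts(863, {"train": 70, "validation": 10, "test": 20})
print(counts)  # {'train': 604, 'validation': 86, 'test': 173}
```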

Data Acquisition

  • Device Used: Nikon Eclipse 80i digital microscope.
  • Resolution and Format: Images are captured at 800×600 pixels and 10× magnification, in RGB colour space.
  • Data Collection Site: The images show mycetoma grains in surgical biopsies from mycetoma patients, obtained from the Mycetoma Research Center (MRC) in Khartoum, Sudan.

Annotations and Labels

  • Annotation Process: Expert pathologists with specialised experience in mycetoma identified mycetoma infection and its type in each image. Each image was manually annotated to mark mycetoma grains, distinguishing between Actinomycetoma (bacterial) and Eumycetoma (fungal).
  • Annotation Details: Each grain within an image is labelled with its respective type. Manual segmentation is provided for grains, outlining their precise boundaries within the images.
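
One way to hold such an annotation in memory is sketched below: a type label paired with a binary segmentation mask. The class name, field names, label strings, and the nested-list mask layout are all assumptions made for illustration. The challenge's actual file format is not specified here.

```python
from dataclasses import dataclass

# Hypothetical label strings for the two infection types.
VALID_LABELS = {"actinomycetoma", "eumycetoma"}


@dataclass
class GrainAnnotation:
    """Illustrative record: one grain type plus a binary mask (1 = grain pixel)."""
    label: str
    mask: list  # list of rows, each a list of 0/1 values

    def __post_init__(self):
        if self.label not in VALID_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def grain_fraction(annotation):
    """Return the share of pixels in the mask that belong to a grain."""
    total = sum(len(row) for row in annotation.mask)
    grain = sum(sum(row) for row in annotation.mask)
    return grain / total


# Tiny 2x4 mask standing in for a full 800x600 segmentation.
example = GrainAnnotation(
    label="eumycetoma",
    mask=[[0, 1, 1, 0],
          [0, 0, 0, 0]],
)
print(grain_fraction(example))  # 0.25
```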

Data Integrity and Bias Mitigation

  • Splitting Strategy: To prevent statistical bias, images from the same patient are exclusively assigned to one of the sets (training, validation, or testing), ensuring that no patient’s images appear across multiple sets.
  • Class Distribution: The dataset reflects the diversity and variability of mycetoma cases, with a balanced representation of both infection types (Actinomycetoma and Eumycetoma). This helps models adapt to the real-world prevalence and variations of the disease.
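
The patient-level splitting described above can be sketched as follows. This is a minimal illustration, not the organisers' procedure: the function name, the image-to-patient mapping, the fixed seed, and the greedy assignment rule are all assumptions. Participants who carve out their own hold-out set from the training data could apply the same idea.

```python
import random
from collections import defaultdict


def split_by_patient(image_to_patient, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole patients to train/val/test so no patient spans two sets.

    `image_to_patient` maps an image id to a patient id (hypothetical input).
    """
    by_patient = defaultdict(list)
    for image_id, patient_id in image_to_patient.items():
        by_patient[patient_id].append(image_id)

    # Shuffle patients reproducibly so the assignment is not order-dependent.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    names = ("train", "val", "test")
    total = len(image_to_patient)
    targets = [fraction * total for fraction in fractions]
    splits = {name: [] for name in names}

    # Greedy rule: each patient goes to the set furthest below its
    # target image count, keeping image shares close to `fractions`.
    for patient in patients:
        deficits = [targets[i] - len(splits[names[i]]) for i in range(3)]
        best = max(range(3), key=lambda i: deficits[i])
        splits[names[best]].extend(by_patient[patient])
    return splits


# Synthetic example: 40 patients with 1 to 5 images each (120 images).
mapping = {
    f"img_{p}_{k}.jpg": f"patient_{p}"
    for p in range(40)
    for k in range(1 + p % 5)
}
splits = split_by_patient(mapping)
print({name: len(images) for name, images in splits.items()})
```

Because patients contribute different numbers of images, the resulting image shares only approximate the requested fractions.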

Data Accessibility

  • The training and validation datasets will be accessible after registration.
  • All registered participants will gain access to the data via the AfricaAI repository, ensuring secure and equal access for all.
  • Usage Conditions: Data usage is governed by the Creative Commons Attribution License (CC BY), which allows participants to use, distribute, and build upon the data, provided that proper attribution is given.