Resources
Table of contents
- Dataset Search and Aggregators
- Social Good, Open Data, and Benchmarks
- Health, Environment, and Government Data
- Domain-Specific and Educational ML Data
- Reminders for Students
Dataset Search and Aggregators
- Kaggle
  Huge library of datasets from many fields; ready for ML projects.
- UCI Machine Learning Repository
  Classic, academic machine learning datasets.
- Google Dataset Search
  Search engine for public datasets worldwide.
- Awesome Public Datasets (GitHub)
  Curated, topic-based collection of public datasets.
- Google Research Datasets
  High-quality datasets across many ML domains.
Social Good, Open Data, and Benchmarks
- AI for Good SDG Data Catalog
  Datasets mapped to the UN Sustainable Development Goals.
- SustainBench
  Benchmarks for ML in sustainability, climate, agriculture, and development.
- Datasets-for-Good (GitHub)
  Social and environmental impact datasets.
Health, Environment, and Government Data
- World Health Organization (WHO)
  Global health statistics and disease data.
- CDC Environmental Public Health Tracking
  US health and environmental data.
- UN Population Division
  Global demographic and social data.
- EPA Outdoor Air Quality Data
  Download daily US air quality data.
- NOAA Climate Data Online (CDO)
  US and global weather/climate records.
- FBI Uniform Crime Reporting (UCR)
  US crime statistics by type and region.
- GeoCodes / EarthCube
  Geospatial data for Earth and environmental science.
- Data.gov
  Large US government open data portal.
- San Diego Open Data Portal
  Local government data for community-focused projects.
Domain-Specific and Educational ML Data
- Retiring Adult / Folktables
  US Census-based datasets for income, housing, and equity studies.
- HumAID
  Labeled disaster and crisis-related social media data.
- #TidyTuesday
  Weekly real-world datasets, often on social topics.
Reminders for Students
- Aggregate-only datasets (summary statistics with no row-level records) may not be suitable for machine learning modeling.
  If unsure, consult cluster staff.
- If you don't see your topic here, try Google Dataset Search or ask GenAI for suggestions!
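
As a rough first check on the aggregate-only reminder above, the sketch below inspects a CSV's header names and row count. It is only a heuristic under stated assumptions: the function name `summarize_granularity`, the keyword set `AGGREGATE_HINTS`, the `min_rows` threshold, and the sample data are all made up for illustration and do not come from any of the listed sources. A flagged file may still be usable, so treat the result as a prompt to look closer, not a verdict.

```python
import csv
import io
import re

# Hypothetical keyword set: column-name tokens that often indicate
# pre-computed summaries rather than individual observations.
AGGREGATE_HINTS = {
    "count", "total", "sum", "mean", "avg", "average",
    "median", "rate", "pct", "percent",
}


def summarize_granularity(csv_text, min_rows=100):
    """Return a rough report on whether a CSV looks aggregate-only.

    Heuristic only: it checks header names against AGGREGATE_HINTS and
    counts data rows. The min_rows threshold is an arbitrary assumption.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    n_rows = sum(1 for _ in reader)

    flagged = []
    for name in columns:
        # Split on non-alphanumeric characters so "country" is not
        # mistaken for "count".
        tokens = re.split(r"[^a-z0-9]+", name.lower())
        if any(token in AGGREGATE_HINTS for token in tokens):
            flagged.append(name)

    return {
        "n_rows": n_rows,
        "aggregate_like_columns": flagged,
        "looks_aggregate_only": n_rows < min_rows and bool(flagged),
    }


# Made-up sample: one row per state, with summary columns only.
sample = """state,year,total_cases,mean_age
CA,2020,1200,41.2
TX,2020,950,39.8
"""

report = summarize_granularity(sample)
print(report)
```

If `looks_aggregate_only` comes back true, check whether the source also publishes row-level or microdata files before committing to the dataset, and consult cluster staff if it is still unclear.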