This post introduces publicly available healthcare datasets for anyone interested in learning statistical analysis and machine learning.
The National Heart, Lung, and Blood Institute (NHLBI) provides three datasets for educational purposes. To access them, you need to create an account and submit a request. The teaching datasets include:
- A subset of the Framingham Heart Study
- The Digitalis Investigation Group (DIG)
- The Childhood Asthma Management Program (CAMP)
Surveillance, Epidemiology, and End Results Program (SEER) Incidence Data, 1975–2017
SEER collects cancer incidence data from population-based cancer registries covering approximately 34.6 percent of the U.S. population. The SEER registries collect data on patient demographics, primary tumor site, tumor morphology, stage at diagnosis, and first course of treatment, and they follow up with patients for vital status.
Note that the dataset is installed through the SEER*Stat software, which runs only on Windows.