Publicly available datasets in healthcare

This post introduces publicly available datasets in healthcare for those who are interested in learning statistical analysis and machine learning.

The National Heart, Lung, and Blood Institute (NHLBI) provides three datasets for educational purposes. To access them, you need to create an account and submit a request here. The teaching datasets include:

Surveillance, Epidemiology, and End Results Program (SEER) Incidence Data, 1975 – 2017

SEER collects cancer incidence data from population-based cancer registries that cover approximately 34.6 percent of the U.S. population. The registries record patient demographics, primary tumor site, tumor morphology, stage at diagnosis, and first course of treatment, and they follow up with patients to track vital status.

Note that the dataset is installed through the SEER*Stat software, which runs only on Windows.
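
As a starting point for the kind of statistical analysis this post has in mind, here is a minimal sketch of tabulating a case listing once it has been exported from SEER*Stat to a CSV file. It assumes a hypothetical export with the columns `patient_id`, `age_group`, `primary_site`, `stage`, and `vital_status`; real SEER variable names and codes differ, and the five rows are made-up placeholders, not SEER data.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export of a case listing. Column names and values are
# illustrative placeholders only, not real SEER variables or patient data.
SAMPLE_EXPORT = """patient_id,age_group,primary_site,stage,vital_status
1,50-54,Lung and Bronchus,Distant,Dead
2,60-64,Breast,Localized,Alive
3,70-74,Lung and Bronchus,Regional,Dead
4,55-59,Breast,Localized,Alive
5,65-69,Colon,Regional,Alive
"""


def load_cases(text):
    """Parse CSV text into a list of dicts, one per case record."""
    return list(csv.DictReader(io.StringIO(text)))


def stage_distribution(cases, site):
    """Count cases by stage at diagnosis for a single primary site."""
    return Counter(c["stage"] for c in cases if c["primary_site"] == site)


def proportion_alive(cases):
    """Share of cases whose recorded vital status is 'Alive'."""
    if not cases:
        return 0.0
    alive = sum(1 for c in cases if c["vital_status"] == "Alive")
    return alive / len(cases)


if __name__ == "__main__":
    cases = load_cases(SAMPLE_EXPORT)
    print(stage_distribution(cases, "Lung and Bronchus"))
    print(f"Proportion alive: {proportion_alive(cases):.2f}")
```

The crude proportion alive ignores how long each patient was followed; a proper survival analysis of SEER data would account for follow-up time, for example with a Kaplan–Meier estimator.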

