How to split data into training, validation, and test sets

In this post, I am going to introduce several ways to split data into training, validation, and test sets for your machine learning project.

scikit-learn

Suppose you have a list of image paths (image_paths) and a list of corresponding mask paths (mask_paths), and you want to split each list into training (70%), validation (15%), and test (15%) sets while keeping every image paired with its mask.

### sklearn v.0.24.2
from sklearn.model_selection import train_test_split

# First, hold out 30% of the data; the remaining 70% becomes the training set
train_image_paths, test_image_paths, train_mask_paths, test_mask_paths = train_test_split(image_paths, mask_paths, test_size=0.3, random_state=0)

# Then split the held-out 30% in half to get the validation (15%) and test (15%) sets
val_image_paths, test_image_paths, val_mask_paths, test_mask_paths = train_test_split(test_image_paths, test_mask_paths, test_size=0.5, random_state=0)
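
The two calls above can be wrapped in a small helper so the split fractions are stated once. This is only a minimal sketch: the function name split_train_val_test, its parameters, and the file names in the usage lines are hypothetical and made up for illustration. The only library call it relies on is train_test_split.

```python
from sklearn.model_selection import train_test_split


def split_train_val_test(image_paths, mask_paths, val_fraction=0.15,
                         test_fraction=0.15, random_state=0):
    """Split paired lists into train/val/test (hypothetical helper)."""
    # Fraction of the data held out for validation and test together
    holdout_fraction = val_fraction + test_fraction

    # Step 1: training set vs. held-out set
    train_imgs, hold_imgs, train_masks, hold_masks = train_test_split(
        image_paths, mask_paths,
        test_size=holdout_fraction, random_state=random_state)

    # Step 2: divide the held-out set into validation and test sets
    val_imgs, test_imgs, val_masks, test_masks = train_test_split(
        hold_imgs, hold_masks,
        test_size=test_fraction / holdout_fraction, random_state=random_state)

    return (train_imgs, train_masks), (val_imgs, val_masks), (test_imgs, test_masks)


# Made-up file names, for illustration only
image_paths = [f"images/{i:03d}.png" for i in range(20)]
mask_paths = [f"masks/{i:03d}.png" for i in range(20)]

train, val, test = split_train_val_test(image_paths, mask_paths)
print(len(train[0]), len(val[0]), len(test[0]))
```

Because both lists are passed to the same train_test_split call, they are shuffled with the same indices, so each image stays aligned with its mask in every subset.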

to be updated
