Splitting a dataset into 2 partitions
- 1 minWe may sometimes need to split a dataset into 2 partitions, for example to create a training and testing dataset to fit a regression model and then perform internal validation.
#Create simulated data
set.seed(10)
dat <- data.frame(col1 = rnorm(n = 100, mean = 1, sd = 1), col2 = runif(n = 100, min = 0, max = 1))
#Create partitions
set.seed(20)
x <- sample(length(dat$col1), size = length(dat$col1)/3)
testing <- dat[x, ]
training <- dat[-x, ]
nrow(training)
## [1] 67
nrow(testing)
## [1] 33
head(training)
## col1 col2
## 2 0.8157475 0.5358950
## 4 0.4008323 0.8480705
## 8 0.6363240 0.9986345
## 11 2.1017795 0.7133420
## 12 1.7557815 0.6706220
## 13 0.7617664 0.7456896
head(testing)
## col1 col2
## 88 3.2205197 0.5554849
## 77 -0.4798270 0.7477093
## 28 0.1278412 0.8037932
## 52 0.6654434 0.8751127
## 93 1.0695448 0.7021693
## 94 2.1553483 0.9937923