Splitting a dataset into 2 partitions

Wednesday, April 06, 2016 - 1 min

We sometimes need to split a dataset into 2 partitions, for example into training and testing datasets, so that a regression model can be fitted on one and internally validated on the other.
#Create simulated data
set.seed(10)
dat <- data.frame(col1 = rnorm(n = 100, mean = 1, sd = 1),
                  col2 = runif(n = 100, min = 0, max = 1))
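#Create partitions: randomly sample one third of the row indices for testing
set.seed(20)
x <- sample(nrow(dat), size = floor(nrow(dat) / 3))
#Sampled rows form the testing set; the remaining rows form the training set
testing <- dat[x, ]
training <- dat[-x, ]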
nrow(training)
## [1] 67
nrow(testing)
## [1] 33
head(training)
##         col1      col2
## 2  0.8157475 0.5358950
## 4  0.4008323 0.8480705
## 8  0.6363240 0.9986345
## 11 2.1017795 0.7133420
## 12 1.7557815 0.6706220
## 13 0.7617664 0.7456896
head(testing)
##          col1      col2
## 88  3.2205197 0.5554849
## 77 -0.4798270 0.7477093
## 28  0.1278412 0.8037932
## 52  0.6654434 0.8751127
## 93  1.0695448 0.7021693
## 94  2.1553483 0.9937923
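
If we need to do this split more than once, the steps above could be wrapped in a small function. The following is only a sketch: `split_dataset`, `test_fraction` and `seed` are hypothetical names that are not part of the original code, and the model formula `col1 ~ col2` is an assumption chosen purely to illustrate fitting on one partition and validating on the other.

```r
# Hypothetical helper (not from the original post): split a data frame
# into training and testing partitions by randomly sampling row indices.
split_dataset <- function(data, test_fraction = 1/3, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n_test <- floor(nrow(data) * test_fraction)
  test_idx <- sample(nrow(data), size = n_test)
  # setdiff() avoids the pitfall of data[-integer(0), ], which drops every row
  train_idx <- setdiff(seq_len(nrow(data)), test_idx)
  list(training = data[train_idx, , drop = FALSE],
       testing  = data[test_idx, , drop = FALSE])
}

# Same simulated data as above
set.seed(10)
dat <- data.frame(col1 = rnorm(n = 100, mean = 1, sd = 1),
                  col2 = runif(n = 100, min = 0, max = 1))

parts <- split_dataset(dat, test_fraction = 1/3, seed = 20)

# Assumed example model: fit on the training partition only
fit <- lm(col1 ~ col2, data = parts$training)

# Internal validation: root mean squared error on the held-out testing partition
pred <- predict(fit, newdata = parts$testing)
rmse <- sqrt(mean((parts$testing$col1 - pred)^2))
rmse
```

Because the two columns are simulated independently, we should not expect the model to predict well here; the point is only that the testing rows are never used when fitting.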
Danny Wong Anaesthetist & Health Services Researcher