R read dataset

I usually store my datasets in an ASCII/CSV file where the first column is the output (or response) and the subsequent columns are the input variables, with one row per pattern/observation. When loading those datasets in R, I often find myself separating the input from the output into two variables in order to feed them into some algorithm. Therefore I created the following function, which can be added to .Rprofile:

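    read.dataset <- function(file, response=1, ...) {

       # extra arguments (sep, header, ...) are forwarded to read.table
       data <- read.table(file, ...)

       # inputs: every column except the response, as a matrix
       x <- as.matrix(data[,-response])
       # output: the response column
       y <- data[,response]

       dataset <- list(x=x, y=y)
       return(dataset)
    }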

With this function I can read the dataset in one line and access the input variables and the output separately:

    train <- read.dataset("somedata.train")
    fit <- lm(train$y ~ train$x)

The function also works when the output is not in the first column: just set the optional parameter response to the index of the output column. Any additional arguments are passed along to the R function read.table, for instance sep="," if the columns are delimited by commas instead of spaces.
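As a small illustration of both options, here is a sketch that writes a toy comma-separated file to a temporary location and reads it back with the response in the third column. The toy data, the column names a, b and y, and the temporary file are made up for this example; the function is repeated only so the snippet runs on its own.

```r
# Copy of the function from this post, repeated so the snippet is self-contained.
read.dataset <- function(file, response=1, ...) {
   data <- read.table(file, ...)
   x <- as.matrix(data[,-response])
   y <- data[,response]
   list(x=x, y=y)
}

# Hypothetical toy data: two inputs (a, b) and the response in the LAST column.
toy <- data.frame(a = c(1, 2, 3, 4),
                  b = c(0.5, 0.1, 0.9, 0.3),
                  y = c(2.1, 3.9, 6.2, 7.8))

# Write it as a comma-separated file with a header row.
path <- tempfile(fileext = ".csv")
write.table(toy, path, sep = ",", row.names = FALSE)

# response=3 picks the third column as output;
# sep and header are passed through to read.table.
train <- read.dataset(path, response = 3, sep = ",", header = TRUE)

dim(train$x)        # 4 rows, 2 input columns
colnames(train$x)   # "a" "b"
train$y
```

Since everything after response goes straight to read.table, any of its arguments (for example skip or na.strings) can be used the same way.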


Alberto Torres Barrán
