How to read multiple csv files

Downloading Data

For this session we are going to use a dataset from the ENCODE project, https://encode.project.org/.

You can download the txt file here:

Install Packages and Loading Libraries

The first step is to install the packages, from CRAN, Bioconductor or GitHub, and then to load the libraries.

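library(data.table) # fread()
library(plyr)       # loaded before dplyr so that it does not mask dplyr functions
library(dplyr)
library(tidyverse)  # purrr (map) and readr (write_csv)
library(fs)         # dir_create(), dir_ls()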
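
Before loading the libraries it can help to check which of them are actually installed. The snippet below is a minimal sketch of that check; `missing_packages` is a hypothetical helper written for this post, and the install line is left commented out so that nothing gets installed by accident.

```r
# Return the packages in `pkgs` that cannot be found on this machine.
# `missing_packages` is a hypothetical helper, not part of any library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("data.table", "plyr", "tidyverse", "fs")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing from CRAN:
# if (length(to_install) > 0) install.packages(to_install)
```

If `to_install` comes back empty, all of the packages used in this post are already available.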

Importing Data

Set your working directory to the folder where you saved the data set. We can then import the data set with the fread function from data.table.

my_data <- fread(filename) # filename: path to the txt file you downloaded
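
To see what fread returns for a file without a header row, here is a small sketch on made-up data. The three rows and the tab separator are assumptions for illustration only; the real file may be laid out differently.

```r
library(data.table)

# Hypothetical three-row sample: tab-separated, no header row
toy_text <- paste(
  "chr1\t10000\t10600\t1_Repetitive/CNV",
  "chr1\t10600\t11137\t13_Heterochrom/lo",
  "chr1\t11137\t11737\t8_Insulator",
  sep = "\n"
)

# text = lets fread parse a string instead of a file on disk
toy <- fread(text = toy_text, header = FALSE, sep = "\t")

names(toy) # default column names are assigned when there is no header
str(toy)
```

The default names are the reason the columns get renamed in the next step.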

Data wrangling

In order to have a clean data set we need to add column names and to remove the columns that are not needed for our final heatmap output.

colnames(my_data)[1:4] <- c("chrom","start","stop","type")
my_data <- my_data[,c(1:4)] #Keeping only the columns 1 to 4
my_data$type <- gsub('[[:digit:]]+_','',my_data$type) #removing the numbers and underscores from our type column
glimpse(my_data)
## Rows: 571,339
## Columns: 4
## $ chrom <chr> "chr1", "chr1", "chr1", "chr1", "chr1", "chr1", "chr1", "chr1", …
## $ start <int> 10000, 10600, 11137, 11737, 11937, 12137, 14537, 20337, 22137, 2…
## $ stop  <int> 10600, 11137, 11737, 11937, 12137, 14537, 20337, 22137, 22937, 2…
## $ type  <chr> "Repetitive/CNV", "Heterochrom/lo", "Insulator", "Weak_Txn", "We…
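
The gsub pattern used above can be tried on a few values in isolation. The labels below are hypothetical examples of the "number, underscore, name" shape; the sketch also shows an anchored variant that only strips a leading prefix, which gives the same result on labels like these.

```r
# Hypothetical raw labels: a numeric prefix, an underscore, then the name
raw_type <- c("1_Repetitive/CNV", "13_Heterochrom/lo", "11_Weak_Txn", "6_Weak_Enhancer")

# Pattern from the post: removes every run of digits followed by an underscore
clean_all <- gsub("[[:digit:]]+_", "", raw_type)

# Anchored variant: removes only a numeric prefix at the start of the string
clean_prefix <- sub("^[[:digit:]]+_", "", raw_type)

print(clean_all)
print(identical(clean_all, clean_prefix))
```

Note that the underscore inside "Weak_Txn" survives, because it is not preceded by a digit.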

Creating a new directory

We can now create a new directory where we can save our .csv files.

new_directory <- "~/Documents/MyWebPage/content/blog/example/my_csv_files/"
dir_create(new_directory)
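
If you want to try dir_create without touching your own folders, a sketch like the following works inside R's temporary directory. The folder name `my_csv_files_demo` is made up for this example.

```r
library(fs)

# Hypothetical location under the session's temporary directory
demo_dir <- path(tempdir(), "my_csv_files_demo")

dir_create(demo_dir) # creates the directory
dir_create(demo_dir) # calling it again is harmless when the directory already exists
dir_exists(demo_dir)
```

This is handy in a script that gets re-run, since the second call does not raise an error.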

Splitting our data frame

We are now ready to split our data frame into multiple .csv files, one per chromosome, and save them in the my_csv_files directory.

setwd("~/Documents/MyWebPage/content/blog/example/my_csv_files/")
my_data %>%
    group_by(chrom) %>%
    group_split() %>%
    walk(
        .f = function(data) {
            # one file per chromosome, named after it (e.g. "chr1")
            write_csv(data, file = unique(data$chrom))
        }
    )
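
The same split can be tried on a tiny data frame first. This is only a sketch under a few assumptions: the toy values are made up, the files go to a temporary folder instead of the working directory, and a `.csv` extension is added to the file names, which the code above does not do.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical toy data with two chromosomes
toy <- tibble(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(10000L, 10600L, 500L),
  stop  = c(10600L, 11137L, 900L),
  type  = c("Repetitive/CNV", "Heterochrom/lo", "Insulator")
)

out_dir <- file.path(tempdir(), "split_demo")
dir.create(out_dir, showWarnings = FALSE)

toy %>%
  group_by(chrom) %>%
  group_split() %>%
  walk(function(data) {
    # build the full output path, so no setwd() is needed
    out_file <- file.path(out_dir, paste0(unique(data$chrom), ".csv"))
    write_csv(data, file = out_file)
  })

list.files(out_dir)
```

Writing to full paths keeps the working directory unchanged, which tends to be easier to follow in longer scripts.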

Importing multiple csv files [1]

We can do this in more than one way: with the lapply function after generating a list of files, or with the map function. Let's start with the first way:

setwd("~/Documents/MyWebPage/content/blog/example/my_csv_files/")
fileslist <- list.files(pattern = "") # empty pattern: matches every file in the folder
csvFiles <- lapply(fileslist, function(x) read.table(x, header = TRUE, sep = ","))
csvFiles <- do.call("rbind", csvFiles) # stack the data frames into one
csvFiles |> head()
##   chrom start  stop           type
## 1  chr1 10000 10600 Repetitive/CNV
## 2  chr1 10600 11137 Heterochrom/lo
## 3  chr1 11137 11737      Insulator
## 4  chr1 11737 11937       Weak_Txn
## 5  chr1 11937 12137  Weak_Enhancer
## 6  chr1 12137 14537       Weak_Txn
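
Here is a self-contained sketch of the same lapply pattern in base R, on two made-up files in a temporary folder. It assumes the files end in `.csv`, so the pattern can be restricted to them, and it uses `full.names = TRUE` so that no setwd() is needed.

```r
# Hypothetical demo folder with two small csv files
demo_dir <- file.path(tempdir(), "lapply_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(chrom = "chr1", start = 10000L, stop = 10600L),
          file.path(demo_dir, "chr1.csv"), row.names = FALSE)
write.csv(data.frame(chrom = "chr2", start = 500L, stop = 900L),
          file.path(demo_dir, "chr2.csv"), row.names = FALSE)

# Only files ending in .csv, returned with their full paths
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

tables   <- lapply(files, read.csv) # one data frame per file
combined <- do.call(rbind, tables)  # stack them into a single data frame
print(combined)
```

Restricting the pattern protects against reading stray files that happen to sit in the same folder.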

Importing multiple csv files [2]

The second way is to import the csv files from our directory into a list using the map function.

directory_that_holds_files <- "~/Documents/MyWebPage/content/blog/example/my_csv_files/"

chromosomes_list <- directory_that_holds_files %>%
  dir_ls() %>%
  map(
    .f = function(path) read.table(path, header = TRUE, sep = ","))

Binding Rows

The final step is to use the bind_rows function to combine the list into a single data frame.

chromosomes_tbl <- chromosomes_list %>%
    set_names(dir_ls(directory_that_holds_files)) %>%
    bind_rows(.id = "file_path")
head(chromosomes_tbl)
##                                                                            file_path
## 1 /Users/andreasvenizelos/Documents/MyWebPage/content/blog/example/my_csv_files/chr1
## 2 /Users/andreasvenizelos/Documents/MyWebPage/content/blog/example/my_csv_files/chr1
## 3 /Users/andreasvenizelos/Documents/MyWebPage/content/blog/example/my_csv_files/chr1
## 4 /Users/andreasvenizelos/Documents/MyWebPage/content/blog/example/my_csv_files/chr1
## 5 /Users/andreasvenizelos/Documents/MyWebPage/content/blog/example/my_csv_files/chr1
## 6 /Users/andreasvenizelos/Documents/MyWebPage/content/blog/example/my_csv_files/chr1
##   chrom start  stop           type
## 1  chr1 10000 10600 Repetitive/CNV
## 2  chr1 10600 11137 Heterochrom/lo
## 3  chr1 11137 11737      Insulator
## 4  chr1 11737 11937       Weak_Txn
## 5  chr1 11937 12137  Weak_Enhancer
## 6  chr1 12137 14537       Weak_Txn
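
The file_path column above holds the full path of each file. If a shorter label is enough, one option is to name the list by file name before binding. The sketch below shows this on two made-up files in a temporary folder; the column name `file_name` and the `.csv` extension are assumptions for this example.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical demo folder with two small csv files
demo_dir <- file.path(tempdir(), "bind_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(chrom = "chr1", start = 10000L), file.path(demo_dir, "chr1.csv"))
write_csv(tibble(chrom = "chr2", start = 500L),   file.path(demo_dir, "chr2.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

combined <- files %>%
  set_names(basename(files)) %>% # list names become the values of the .id column
  map(function(f) read_csv(f, show_col_types = FALSE)) %>%
  bind_rows(.id = "file_name")

print(combined)
```

Each row keeps a record of the file it came from, without the long directory prefix.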

You can find the csv split files here :

Download FILES
