Wednesday, July 16, 2014

Get a custom file list with R

I'm working on a way to better manage the files on my computer. I've got tons of duplicate photos and mp3s that I've copied to remote drives. I've also got folders with hundreds of poorly named files, and opening one of them in thumbnail view is a nightmare!

Because I've been playing around with R recently, I thought I'd see what it could do to help. I've got a ways to go, but R has a great function, file.info, that returns the details of a file (its size, whether it's a directory, its permissions, and its modification, change, and access times) as a data frame.

file.info("C:\\users\\doug shartzer\\messy folder\\song1.mp3")
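To see what that data frame looks like without touching real music files, here is a small sketch that creates a throwaway file with tempfile() as a stand-in for song1.mp3 (the file and its contents are made up for illustration). It prints the column names and pulls out the columns that are most useful for sorting out a messy folder.

```r
# Create a small stand-in file in R's temp directory (hypothetical example file)
f <- tempfile(fileext = ".mp3")
writeLines("not really an mp3", f)

info <- file.info(f)

# One row per path; the row name is the path that was passed in
print(class(info))
print(names(info))

# Columns that help when triaging files: size in bytes, directory flag, last modified
info[, c("size", "isdir", "mtime")]
```

The exact set of columns varies a little by platform (Windows adds exe, Unix-like systems add owner and group columns), but size, isdir, mode, mtime, ctime, and atime are always there.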

You can also give file.info more than one file at a time as a character vector. In fact, you can cover an entire folder by passing it the output of dir, which returns a vector of all the files in that folder. Be sure to set dir's full.names argument to TRUE so it returns full paths; otherwise you get bare file names, which file.info looks up relative to the working directory.

file.info(dir("c:\\users\\doug shartzer\\messy folder", full.names = TRUE))
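Here is a rough sketch of why full.names matters. It builds a throwaway folder under tempdir() standing in for "messy folder" (the folder and file names are hypothetical) and compares what file.info gets with and without full paths.

```r
# Build a throwaway folder standing in for "messy folder" (hypothetical names)
folder <- file.path(tempdir(), "messy_folder_demo")
dir.create(folder, showWarnings = FALSE)
file.create(file.path(folder, c("song1.mp3", "photo1.jpg")))

# Without full.names, dir() returns bare names such as "song1.mp3"...
bare <- dir(folder)
# ...which file.info() resolves against the working directory, so they usually come back NA
info_bare <- file.info(bare)
info_bare$size

# With full.names = TRUE, each entry includes the folder path, so file.info() finds it
full <- dir(folder, full.names = TRUE)
info_full <- file.info(full)
info_full[, c("size", "mtime")]
```

The files created by file.create() are empty, so their size shows as 0, but the point is that every row in info_full is filled in.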

Even better, you can give dir a vector of folder names and get one giant data set of file details with a single line of code. Its recursive argument also pulls in the files inside every subfolder of each folder.

file.info(dir(c("c:\\", "d:\\", "e:\\"), full.names = TRUE, recursive = TRUE))
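Scanning whole drives can take a while, so a quick way to try the multi-folder, recursive call is on a tiny fake layout. In this sketch two folders under tempdir() stand in for the drives, and one of them has a nested subfolder (all names are hypothetical). It also copies the row names into a regular path column, which tends to be easier to work with later.

```r
# Two stand-in "drives", one with a nested subfolder (hypothetical layout)
root <- file.path(tempdir(), "recursive_demo")
drive_a <- file.path(root, "drive_a")
drive_b <- file.path(root, "drive_b")
dir.create(file.path(drive_a, "vacation"), recursive = TRUE, showWarnings = FALSE)
dir.create(drive_b, showWarnings = FALSE)
file.create(file.path(drive_a, "vacation", "beach.jpg"),
            file.path(drive_b, "track01.mp3"))

# dir() accepts a vector of folders; recursive = TRUE descends into subfolders
# and, by default, returns only files (not the subfolder names themselves)
paths <- dir(c(drive_a, drive_b), full.names = TRUE, recursive = TRUE)
files <- file.info(paths)

# Keep the path as a real column instead of only as row names
files$path <- rownames(files)
files[, c("path", "size")]
```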

Lastly, dir's pattern argument accepts a regular expression, so only files whose names match it are returned. That lets me limit my list to just pictures, music, and .mpg video files.

file.info(dir(c("c:\\", "d:\\", "e:\\"), recursive = TRUE, full.names = TRUE, pattern = "\\.(mpg|mp3|jpg)$", ignore.case = TRUE))
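Before pointing a pattern at whole drives, it can help to test it against a handful of file names with grepl(), which uses the same regular-expression rules. The names below are made up for illustration. Anchoring on a literal dot ("\\.") keeps names that merely end in "mp3" out of the list, and ignore.case = TRUE catches upper-case extensions such as .MP3.

```r
# Sample file names (made up for illustration)
candidates <- c("clip.mpg", "song.MP3", "photo.jpg", "notes.txt", "backup_mp3")

# Literal dot, then one of the extensions, then the end of the name
media_pattern <- "\\.(mpg|mp3|jpg)$"
matches <- grepl(media_pattern, candidates, ignore.case = TRUE)
candidates[matches]
```

Here the last two names are rejected: notes.txt has the wrong extension, and backup_mp3 has no dot before "mp3".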

So, building a list of the files I've got to go through appears to be a breeze!  Now I've got to figure out where to go from here...