Thursday, September 4, 2014

Collecting Camera (EXIF) Data through R

Slowly but surely, I'm working on a way to organize all the photos scattered across the various folders on my computer.  It's a mess.  I've got folders with thousands of pictures, copies of folders, files with different names, etc.

I've been trying to come up with a way to create a nice, organized folder that holds only one copy of each photo, sorted into subfolders.  It's tricky, though, because I don't really know what I have.

Enter EXIF data.  This is data that the camera writes into each file it creates when it captures a photo.  There's potentially a lot of data available, but it depends on the camera manufacturer.  It can include the photo's creation date, the dimensions of the photo, the camera make and model, and GPS data such as latitude, longitude and altitude.

There's supposedly a unique ID value cameras can assign, which is supposed to be globally unique.  Unfortunately, most of the pictures I took didn't have that available.  However, there's plenty of other data that I can combine to check for uniqueness.
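As a rough sketch of what "combining other data" could look like: assuming the values have already been collected into a data frame, pasting a few fields into a single key and calling duplicated() flags likely copies.  The column names and sample rows here are made up for illustration.

```r
# Hypothetical sample data; in practice each row would hold the EXIF values of one file.
photos <- data.frame(
  filename    = c("IMG_0001.JPG", "copy_of_IMG_0001.JPG", "IMG_0002.JPG"),
  cameramodel = c("Camera A", "Camera A", "Camera A"),
  createdate  = c("2014:08:30 10:15:02", "2014:08:30 10:15:02", "2014:08:30 10:15:09"),
  imagewidth  = c("3264", "3264", "3264"),
  imageheight = c("2448", "2448", "2448"),
  stringsAsFactors = FALSE
)

# Build one key per photo from fields that together should identify a shot.
photos$key <- paste(photos$cameramodel, photos$createdate,
                    photos$imagewidth, photos$imageheight, sep = "|")

# duplicated() marks every row whose key already appeared in an earlier row.
photos$iscopy <- duplicated(photos$key)

print(photos[, c("filename", "iscopy")])
```

A key like this is only a heuristic.  Two different shots taken within the same second on the same camera would collide, so a tiebreaker such as file size may be needed.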

There's a great, free tool called exiftool that you can use to read EXIF data.  You run it from a command prompt window.  That works out well, since R can send commands to the command prompt with the system function.
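Before building anything on top of the tool, it can help to confirm that R can actually reach it.  This is a minimal sketch, assuming the tool is installed under the name exiftool and is on the PATH.  It uses Sys.which to look for the program and exiftool's -ver option to print its version number.

```r
# Sys.which() returns an empty string when the program is not on the PATH.
exiftoolpath <- Sys.which("exiftool")

if (nzchar(exiftoolpath)) {
  # -ver asks exiftool to print its version number.
  version <- system2("exiftool", "-ver", stdout = TRUE)
  print(version)
} else {
  message("exiftool was not found on the PATH")
}
```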

Building on what I learned from this blog post, I built a function that I can utilize to go through a bunch of my photo files.  It's pretty basic.

#This function calls exiftool, a command line application that returns exif data from photo files, searches the output for pertinent data, and returns those values in a vector.  If nothing is found, 'UNKNOWN' is returned.


getexifdata <- function(filename){

cmd <- paste('exiftool -c', shQuote('%.6f'), shQuote(filename)) #build the command line we'll be using.  -c '%.6f' prints GPS coordinates as decimal degrees.

exifdata <- system(cmd, intern=TRUE)  #run the command.  intern=TRUE returns its output as a character vector, one element per line.


#the system command that calls exiftool returns a vector.  Each element holds one property of the photo.  It starts with the name of the property, and the actual value starts at the 35th character.

#these next few lines are searching the returned vector for specific property value using the name found at the beginning.  

#If found, collect the property value starting at character 35.  Since there can be multiple matches, collect only the first item found. Search different possible labels as camera companies name stuff differently.


#helper: find the first line whose property name matches one of the given labels and return its value (NA if there is no match).
getvalue <- function(labels){
pattern <- paste0('^(', paste(labels, collapse='|'), ') *:') #anchor every label at the start of the line.
substring(exifdata[grep(pattern, exifdata)[1]], 35) #substring reads to the end of the string by default.
}

imageheight <- getvalue(c('Exif Image Height', 'Image Height'))

imagewidth <- getvalue(c('Exif Image Width', 'Image Width'))

gpslatitude <- getvalue('GPS Latitude')

gpslongitude <- getvalue('GPS Longitude')

cameramodel <- getvalue('Camera Model Name')

createdate <- getvalue(c('Create Date', 'File Creation Date[^:]*')) #[^:]* also accepts labels that merely begin with 'File Creation Date'.


#If no value is found, NA is returned. Set to ‘UNKNOWN’


if (is.na(imagewidth)){imagewidth <- 'UNKNOWN'}

if (is.na(imageheight)){imageheight <- 'UNKNOWN'}

if (is.na(gpslatitude)){gpslatitude <- 'UNKNOWN'}

if (is.na(gpslongitude)){gpslongitude <- 'UNKNOWN'}

if (is.na(cameramodel)){cameramodel <- 'UNKNOWN'}

if (is.na(createdate)){createdate <- 'UNKNOWN'}

#return values as a vector.

return(c(imagewidth,imageheight, gpslatitude, gpslongitude,cameramodel, createdate))


}
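To see the fixed-column parsing in isolation, here is a small sketch that applies the same idea to a few hand-written lines laid out the way the function expects: property name padded to 32 characters, then a colon and a space, so the value starts at character 35.  The sample values and the firstvalue helper are made up for illustration, so this runs without exiftool installed.

```r
# Hand-written lines in the expected layout; the values are invented.
samplelines <- c(
  sprintf("%-32s: %s", "Camera Model Name", "Camera A"),
  sprintf("%-32s: %s", "Create Date", "2014:08:30 10:15:02"),
  sprintf("%-32s: %s", "Image Width", "3264")
)

# Hypothetical helper: value of the first matching line, or 'UNKNOWN'.
firstvalue <- function(lines, pattern) {
  hit <- grep(pattern, lines)[1]       # NA when nothing matches
  value <- substring(lines[hit], 35)   # the value starts at character 35
  if (is.na(value)) "UNKNOWN" else value
}

model <- firstvalue(samplelines, "^Camera Model Name")
latitude <- firstvalue(samplelines, "^GPS Latitude")
print(c(model, latitude))
```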


NOTE: Your mileage may definitely vary on this one.  Different cameras label their data differently.  If you're like me and you have pictures in your folders from lots of different camera manufacturers, you've got to take that into account and change the labels you are looking for.  Some cameras also record EXIF values that others leave out.
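One way to find out which labels a given camera uses is to split every output line into a name and a value and then browse the names.  The sketch below does that on invented sample lines in the same fixed-column layout.  With real files, the lines would come from the system call instead.

```r
# Invented sample lines; real ones would come from the exiftool output.
samplelines <- c(
  sprintf("%-32s: %s", "Make", "Camera Maker A"),
  sprintf("%-32s: %s", "Camera Model Name", "Camera A"),
  sprintf("%-32s: %s", "Exif Image Width", "3264"),
  sprintf("%-32s: %s", "Create Date", "2014:08:30 10:15:02")
)

# The name sits in the first 32 characters (padded with spaces); the value starts at character 35.
exiftable <- data.frame(
  tag   = trimws(substring(samplelines, 1, 32)),
  value = substring(samplelines, 35),
  stringsAsFactors = FALSE
)
print(exiftable)

# List the labels that mention "Width" to see what this camera calls it.
widthlabels <- exiftable$tag[grepl("Width", exiftable$tag)]
print(widthlabels)
```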

Another good place to get data on photo files is with the file.info function.  I talk about that here.
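For a quick taste, here is a minimal sketch of file.info run on a throwaway temporary file, so it works anywhere.  With photos you would pass the photo's path instead.

```r
# Create a small temporary file so the example does not depend on any photo.
tmp <- tempfile(fileext = ".txt")
writeLines("placeholder", tmp)

info <- file.info(tmp)
print(info$size)   # file size in bytes
print(info$mtime)  # last modification time

unlink(tmp)        # clean up the temporary file
```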