FAQs
Introduction
Below is a list of common issues/errors that you might come across when using the workflow we provide. It’s not exhaustive, but hopefully helpful for solving problems that are likely to occur if you are analysing your own data.
1 Getting the workflow up and running
GitHub issues
Using GitHub is not essential for using this workflow — you can download the repo from GitHub (Code > Download ZIP), unzip it to your machine, and start by opening the .Rproj file.
However, if you want to contribute improvements to the repo, or are just interested in getting to grips with GitHub for your own research projects, there are great resources online for getting started (see: setting up GitHub | cloning a repo | using GitHub with RStudio).
Using RStudio projects and here for reproducible filepaths
When we open a .Rproj file (which should be the starting point for this workflow), this opens a fresh instance of RStudio, with a dedicated project environment and access to all the folders and files contained inside that project folder. If you’re not familiar with using RStudio projects, have a look at this section of R for Data Science for a quick explanation.
By combining RStudio projects with the here package, we can create relative filepaths that work across machines, so you won’t have to edit a complicated filepath to get the code to run (for more info see the ‘Getting started’ section of the here documentation).
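For example, a minimal sketch of building a path relative to the project root with here (the folder and file names below are placeholders, not files from the workflow):
library(here) # builds filepaths relative to the .Rproj project root
tracking_file <- here("Data", "raw_tracking_data.csv") # resolves to <project root>/Data/raw_tracking_data.csv on any machine
df_raw <- read.csv(tracking_file)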
2 Reporting in a paper
How to report usage of this workflow in a paper
If you have used this workflow and wish to report this in a scientific publication, the text below is a suggestion for how you might do so.
“Animal tracking data in this study were cleaned following the key principles of clean data in Langley et al. (2023). Duplicated and erroneous data points were removed, a maximum speed filter of XX m/s was applied, and data YY hours after initial deployment were removed.”
3 Data problems
What if I have multiple data files for an individual?
Lots of data sets include multiple files for a single individual, when individuals are tracked more than once. The workflow can filter raw tracking data over multiple deployments. For it to work, make sure to check:
- The raw data contain individual ID, read in from the file name
- The metadata includes one row per unique tag deployment per individual
- The metadata contains a column with a unique value for each deployment (e.g., DeployID/Year)
- This identifying column is included in df_metadataslim
When joining the raw data and metadata, the raw data will be duplicated for each row of ID in the metadata (i.e., for each DeployID/Year). When we filter the data to the deployment period only (df_clean), the duplicated rows, where tracking DateTimes are outside of the individual’s deployment period, are removed.
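As a rough sketch of that logic (the column names DateTime, DeployDateTime and RetrievalDateTime are placeholders, not necessarily the names used in the workflow):
library(dplyr)
# joining by ID duplicates each raw fix once per deployment row in the metadata
df_merged <- df_raw %>%
  left_join(df_metadataslim, by = "ID")
# keeping only fixes inside each deployment window removes the duplicated rows
df_clean <- df_merged %>%
  filter(DateTime >= DeployDateTime & DateTime <= RetrievalDateTime)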
An example of how to use multiple ID columns in this way is included in the troubleshooting script, named Troubleshoot - multiple ID columns.R.
4 Geographic projection issues
What is a CRS?
Coordinate Reference Systems (or CRS) are the framework we use to represent locations from a spherical earth on a two-dimensional plot. Usually we want to “project” these 3D points onto a 2D plot in different ways, depending on the range and scale of our spatial data, and we use CRS codes to do this easily. Two common EPSG codes we use in our workflow are WGS 84 lat/long (4326) and spherical Mercator (3857), but you can also use the crsuggest package to automatically determine the appropriate CRS for your data.
The CRS for sf objects is usually defined at creation (e.g. the crs = argument of st_as_sf), but can also be retrieved or replaced with st_crs, and transformed from one projection to another with st_transform.
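For example, a minimal sketch (assuming coordinate columns named Lon and Lat in the cleaned data):
library(sf)
df_sf <- st_as_sf(df_clean, coords = c("Lon", "Lat"), crs = 4326) # define the CRS at creation (WGS 84 lat/long)
st_crs(df_sf) # retrieve the current CRS
df_merc <- st_transform(df_sf, 3857) # reproject to spherical Mercator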
More information on these concepts can be found in the sf documentation (see: The flat earth model | spherical geometry | miscellaneous).
What does this st_intersects warning mean?
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
This warning message relates to the way that sf deals with geographic coordinates and how it assumes they are projected. For some types of geometric operations, it is very important to be aware of the current projection (see this r-spatial post for a nice explanation of the issue). For more general information on sf error messages, see the sf misc page.
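One way to avoid the warning, if planar geometry is appropriate at the scale of your data, is to transform to a projected CRS before the operation (a sketch; df_sf is an sf object in lon/lat, like the one created above):
df_planar <- st_transform(df_sf, 3857) # reproject to a planar CRS first
st_intersects(df_planar, df_planar) # no longer triggers the longitude/latitude warning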
5 Timezones
How do I find a list of timezone codes?
To find the code for a specific timezone, you can search the full list of timezones that R supports by using OlsonNames():
head(OlsonNames())
[1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
[4] "Africa/Algiers" "Africa/Asmara" "Africa/Asmera"
- Lists of supported timezone codes can also be found online (e.g. Wikipedia).
Converting between timezones
If you have data from different timezones and you want to convert everything to a common format, you can use with_tz from the lubridate package (example below taken from the function page):
library(lubridate)
x <- ymd_hms("2009-08-07 00:00:01", tz = "America/New_York")
with_tz(x, "GMT") # converts US DateTime to GMT
[1] "2009-08-07 04:00:01 GMT"
6 Resource limitations
Vector memory exhausted
When you’re working with thousands or even millions of tracking data points, sometimes you might come across the following message:
Error: vector memory exhausted (limit reached?)
This essentially means you’ve run out of memory (RAM), and R, like a big baby, is refusing to carry on. While we’d recommend first trying code testing and optimization, an easy brute-force solution is to increase the amount of memory R can access, by allocating some of your disk alongside existing RAM. We can do this by editing the .Renviron file (which R reads when it starts up):
library(usethis) # usethis is a package with a selection of handy functions
usethis::edit_r_environ() # this function allows us to directly edit the .Renviron file from RStudio
Then add R_MAX_VSIZE=32Gb as a line to the .Renviron file that opens up (32Gb can be changed to whatever you want, but make sure this value includes the amount of RAM on your machine), then save it and restart the R session for changes to take effect. To return to the default allocation, run the code a second time and remove the line you added. More detail (and the source of the above solution) can be found on Stackoverflow.
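After the restart, a quick sketch of checking that the new limit was picked up (R_MAX_VSIZE is read from the environment, so it is visible from R):
Sys.getenv("R_MAX_VSIZE") # should return the value you set, e.g. "32Gb"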
7 P.S.: Why isn’t ExMove a package?
As sp is gradually replaced by sf, and the tidyverse gains traction in the R community, researchers (ourselves included) working with animal movement data are in a position where they need to either update their existing code or find modern learning resources. Unfortunately, there are only limited resources that show how to use sf and the tidyverse for animal movement in this way. Rather than create a new package (given that sf provides nearly all the required functions already), we opted to provide this code directly, in the same workflow format in which we would use it ourselves. This is so users can see and understand what is happening at each step, and can modify the workflow to suit their own needs. With the increasing importance of reproducibility in the scientific community, we hope that an open-access approach will help to make animal movement analysis more accessible and transparent.