FAQs
Introduction
Below is a list of common issues/errors that you might come across when using the workflow we provide. It’s not exhaustive, but hopefully helpful for solving problems that are likely to occur if you are analysing your own data.
1 Getting the workflow up and running
GitHub issues
Using GitHub is not essential for using this workflow — you can download the repo from GitHub (Code > Download ZIP), unzip it to your machine, and start by opening the .Rproj file.
However, if you want to contribute improvements to the repo, or are just interested in getting to grips with GitHub for your own research projects, there are great resources online for getting started (see: setting up GitHub | cloning a repo | using GitHub with RStudio).
Using RStudio projects and here for reproducible filepaths
When we open a .Rproj file (which should be the starting point for this workflow), this opens a fresh instance of RStudio, with a dedicated project environment and access to all the folders and files contained inside that project folder. If you’re not familiar with using RStudio projects, have a look at this section of R for Data Science for a quick explanation.
By combining RStudio projects with the here package, we can create relative filepaths that work across machines, so you won’t have to edit a complicated filepath to get the code to run (for more info see the ‘Getting started’ section of the here documentation).
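For example, a minimal sketch of building a path relative to the project root with here (the folder and file names below are placeholders, not files from the workflow):
library(here) # builds filepaths relative to the .Rproj project root
tracking_file <- here("Data", "raw_tracking_data.csv") # resolves to <project root>/Data/raw_tracking_data.csv on any machine
df_raw <- read.csv(tracking_file)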
2 Reporting in a paper
How to report usage of this workflow in a paper
If you have used this workflow and wish to report this in a scientific publication, the text below is a suggestion for how you might do so.
“Animal tracking data in this study were cleaned following the key principles of clean data in Langley et al. (2023). Duplicated and erroneous data points were removed, a maximum speed filter of XX m/s was applied, and data YY hours after initial deployment were removed.”
3 Data problems
What if I have multiple data files for an individual?
Lots of data sets include multiple files for a single individual, when individuals are tracked more than once. The workflow can filter raw tracking data over multiple deployments. For it to work, make sure to check:
- The raw data contain individual ID, read in from the file name
- The metadata includes one row per unique tag deployment per individual
- The metadata contains a column with a unique value for each deployment (e.g., DeployID/Year)
- This identifying column is included in df_metadataslim
When joining the raw data and metadata, the raw data will be duplicated for each row of ID in the metadata (i.e., for each DeployID/Year). When we filter the data to the deployment period only (df_clean), the duplicated rows, where tracking DateTimes are outside of the individual’s deployment period, are removed.
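As a rough sketch of that logic (the column names DateTime, DeployDateTime and RetrievalDateTime are placeholders, not necessarily the names used in the workflow):
library(dplyr)
# joining by ID duplicates each raw fix once per deployment row in the metadata
df_merged <- df_raw %>%
  left_join(df_metadataslim, by = "ID")
# keeping only fixes inside each deployment window removes the duplicated rows
df_clean <- df_merged %>%
  filter(DateTime >= DeployDateTime & DateTime <= RetrievalDateTime)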
An example of how to use multiple ID columns in this way is included in the troubleshooting script, named Troubleshoot - multiple ID columns.R.
4 Geographic projection issues
What is a CRS?
Coordinate Reference Systems (or CRS) are the framework we use to represent locations from a spherical earth on a two-dimensional plot. Usually we want to “project” these 3D points onto a 2D plot in different ways, depending on the range and scale of our spatial data, and we use CRS codes to do this easily. Two common EPSG codes we use in our workflow are WGS 84 lat/long (4326) and spherical Mercator (3857), but you can also use the crsuggest package to automatically determine the appropriate CRS for your data.
The CRS for sf objects is usually defined at creation (e.g. the crs = argument of st_as_sf), but can also be retrieved or replaced with st_crs, and transformed from one projection to another with st_transform.
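For example, a minimal sketch (assuming coordinate columns named Lon and Lat in the cleaned data):
library(sf)
df_sf <- st_as_sf(df_clean, coords = c("Lon", "Lat"), crs = 4326) # define the CRS at creation (WGS 84 lat/long)
st_crs(df_sf) # retrieve the current CRS
df_merc <- st_transform(df_sf, 3857) # reproject to spherical Mercator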
More information on these concepts can be found in the sf documentation (see: The flat earth model | spherical geometry | miscellaneous).
What does this st_intersects warning mean?
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
This warning message relates to the way that sf deals with geographic coordinates and how it assumes they are projected. For some types of geometric operations, it is very important to be aware of the current projection (see this r-spatial post for a nice explanation of the issue). For more general information on sf error messages, see the sf misc page.
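One way to avoid the warning, if planar geometry is appropriate at the scale of your data, is to transform to a projected CRS before the operation (a sketch; df_sf is an sf object in lon/lat, like the one created above):
df_planar <- st_transform(df_sf, 3857) # reproject to a planar CRS first
st_intersects(df_planar, df_planar) # no longer triggers the longitude/latitude warning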
5 Timezones
How do I find a list of timezone codes?
To find the code for a specific timezone, you can search the full list of timezones that R supports by using OlsonNames():
head(OlsonNames())
[1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
[4] "Africa/Algiers" "Africa/Asmara" "Africa/Asmera"
- Lists of supported timezone codes can also be found online (e.g. Wikipedia).
Converting between timezones
If you have data from different timezones and you want to convert everything to a common format, you can use with_tz from the lubridate package (example below taken from the function page):
library(lubridate)
x <- ymd_hms("2009-08-07 00:00:01", tz = "America/New_York")
with_tz(x, "GMT") # converts US DateTime to GMT
[1] "2009-08-07 04:00:01 GMT"
6 Resource limitations
Vector memory exhausted
When you’re working with thousands or even millions of tracking data points, sometimes you might come across the following message:
Error: vector memory exhausted (limit reached?)
This essentially means you’ve run out of memory (RAM), and R, like a big baby, is refusing to carry on. While we’d recommend first trying code testing and optimization, an easy brute-force solution is to increase the amount of memory R can access, by allocating some of your disk alongside existing RAM. We can do this by editing the .Renviron file (which R reads when it starts up):
library(usethis) # usethis is a package with a selection of handy functions
usethis::edit_r_environ() # this function allows us to directly edit the .Renviron file from RStudio
Then add R_MAX_VSIZE=32Gb as a line to the .Renviron file that opens up (32Gb can be changed to whatever you want, but make sure this value includes the amount of RAM on your machine), then save it and restart the R session for changes to take effect. To return to the default allocation, run the code a second time and remove the line you added. More detail (and the source of the above solution) can be found on Stackoverflow.
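After the restart, a quick sketch of checking that the new limit was picked up (R_MAX_VSIZE is read from the environment, so it is visible from R):
Sys.getenv("R_MAX_VSIZE") # should return the value you set, e.g. "32Gb"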
7 P.S.: Why isn’t ExMove a package?
As sp is gradually replaced by sf, and the tidyverse gains traction in the R community, researchers (ourselves included) working with animal movement data are in a position where they need to either update their existing code or find modern learning resources. Unfortunately, there are only limited resources that show how to use sf and the tidyverse for animal movement in this way. Rather than create a new package (given that sf provides nearly all the required functions already), we opted to provide this code directly, in the same workflow format in which we would use it ourselves. This is so users can see and understand what is happening at each step, and can modify the workflow to suit their own needs. With the increasing importance of reproducibility in the scientific community, we hope that an open-access approach will help to make animal movement analysis more accessible and transparent.