Appendix A — Resources

There are many approaches to Bat Data Science; this section lists the resources used to make these web pages. The references provide the background to the example code and, together with the web pages, show how that code can be extended or adapted to generate your own reports on bat surveys. All of this can be undertaken with the R statistical programming language (R Core Team 2023) through RStudio (Posit team 2022); the software and data used are open source.

A prime resource for learning Bat Data Science with R is the collection of online books¹; a comprehensive guide to these books and other R resources is the Big Book of R by Oscar Baruffa. This page references the online books, packages, websites and other resources with a focus on Bat Data Science.

A.1 General

R for Data Science (R4DS) is an excellent overview of data science with R. It introduces the tidyverse, a collection of packages providing essential data science tools (Wickham et al. 2019; Wickham 2023b); many of the individual packages are referenced below. The tidyverse packages have been widely adopted by R data scientists, and all share an underlying design philosophy, grammar, and data structures.

There are many free and other learning resources online; well structured courses are:

Other references:

Modern Data Science with R by Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton; a comprehensive guide to data science with R.

For an understanding of the Data Science versus Statistics debate (many argue they are the same), see David Donoho's paper 50 Years of Data Science (Donoho 2017).

To understand the link between digital skills and data science, see the Royal Statistical Society article.

To hear why spreadsheets aren't great for data science, listen to Tim Harford's More or Less on BBC Sounds. For a litany of mathematical mistakes, many involving spreadsheets, see Matt Parker's book Humble Pi: A Comedy of Maths Errors (Parker 2019).

A comparison between R and Excel for data wrangling, conveying the advantages of R, has been undertaken by Jumping Rivers. Interestingly, their blog also has a post on learning Excel as an R user; a good read for Excel users.

A.2 Tidy Data

Data can be read into R from CSV and Excel files with the readr (Wickham, Hester, and Bryan 2023) and readxl (Wickham and Bryan 2023) packages, respectively. See also Data import in R4DS.
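
As a minimal sketch of the import step, the example below writes a small made-up CSV file to a temporary location and reads it back with readr. The column names (DateTime, Species, Latitude, Longitude) and the values are assumptions for illustration only, not the layout of any particular detector export.

```r
library(readr)

# Write a tiny, made-up survey file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("DateTime,Species,Latitude,Longitude",
    "2023-06-01 21:42:10,Pipistrellus pipistrellus,50.9601,0.4502",
    "2023-06-01 22:05:33,Nyctalus noctula,50.9603,0.4507"),
  csv_path
)

# read_csv() returns a tibble; col_types makes the column parsing explicit.
bat_passes <- read_csv(
  csv_path,
  col_types = cols(
    DateTime  = col_datetime(format = "%Y-%m-%d %H:%M:%S"),
    Species   = col_character(),
    Latitude  = col_double(),
    Longitude = col_double()
  )
)

bat_passes
```

For an Excel workbook the equivalent call would be readxl::read_excel(path, sheet = 1), where path points to your own file.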

Once loaded into the R environment, the data are stored as a tibble (Müller and Wickham 2023). A tibble is tabulated data, in R terms a simplified data frame, which makes working in the tidyverse a little easier.

Data wrangling is made easy with functions from the dplyr (Wickham, François, et al. 2023) and tidyr (Wickham, Vaughan, and Girlich 2023) packages. See also Tidy Data and Data transformation in R4DS.
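
A short sketch of typical wrangling, using made-up observations: dplyr counts bat passes per night and species, and tidyr reshapes the result into a wide, report-style table. The object and column names are hypothetical.

```r
library(dplyr)
library(tidyr)

# Made-up observations: one row per bat pass.
passes <- tibble(
  night   = as.Date(c("2023-06-01", "2023-06-01", "2023-06-01", "2023-06-02")),
  species = c("Common pipistrelle", "Common pipistrelle", "Noctule", "Noctule")
)

# Count passes per night and species (long, tidy form).
counts <- passes |>
  count(night, species, name = "n_passes")

# Spread species into columns; night/species combinations with no passes become 0.
counts_wide <- counts |>
  pivot_wider(names_from = species, values_from = n_passes, values_fill = 0)

counts_wide
```

The long form (counts) is usually the better input for plotting and modelling; the wide form is mainly useful for presentation.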

Text was manipulated with the stringr (Wickham 2023a) package. See also Strings in R4DS.
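
The kind of text clean-up stringr supports can be sketched as follows. The species labels are made up to show inconsistent spacing and capitalisation.

```r
library(stringr)

# Made-up species labels with inconsistent spacing and case.
raw_names <- c("  pipistrellus  pipistrellus", "NYCTALUS NOCTULA ", "Myotis  daubentonii")

# str_squish() trims the ends and collapses internal whitespace;
# str_to_sentence() capitalises only the first letter ("Genus species").
clean_names <- raw_names |>
  str_squish() |>
  str_to_sentence()

# Extract the genus (first word) and flag pipistrelles.
genus          <- word(clean_names, 1)
is_pipistrelle <- str_detect(clean_names, "^Pipistrellus")
```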

The philosophy of tidy data is described by Wickham (2014) and Tierney and Cook (2023).

Data validation is made effective through the validate (van der Loo and de Jonge 2023) package.
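
A minimal sketch of rule-based checking with validate is shown below. The data frame and the three rules are assumptions chosen for illustration; real surveys would need rules suited to their own fields.

```r
library(validate)

# Made-up survey records; the last row is deliberately invalid.
survey <- data.frame(
  species  = c("Common pipistrelle", "Noctule", NA),
  latitude = c(50.96, 50.97, 95.00),   # 95 is outside the valid range
  passes   = c(12, 3, -1)              # a negative count is impossible
)

# Declare the rules once, then confront any data set with them.
rules <- validator(
  !is.na(species),
  latitude >= -90 & latitude <= 90,
  passes >= 0
)

checks <- confront(survey, rules)
summary(checks)   # one row per rule, with counts of passes and fails
```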

A.3 Meta Data

Computation with dates and times can be accomplished with the lubridate (Spinu, Grolemund, and Wickham 2023) package. See also Dates and times in R4DS.
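
Because a survey night spans midnight, a common task is to assign each bat pass to the night it belongs to. The sketch below shows one way to do this with lubridate; the timestamps are made up, and the 12-hour shift is an assumed convention rather than a method taken from these pages.

```r
library(lubridate)

# Made-up detector timestamps (day/month/year order), recorded in UK local time.
stamps    <- c("01/06/2023 21:42:10", "02/06/2023 00:15:05", "02/06/2023 03:58:41")
date_time <- dmy_hms(stamps, tz = "Europe/London")

# Shifting back 12 hours places after-midnight passes on the previous evening's date.
night <- as_date(date_time - hours(12))

# Round down to the hour, e.g. for hourly activity counts.
hour_bin <- floor_date(date_time, unit = "hour")
```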

Sunrise and sunset times can be obtained with the suncalc (Thieurmel and Elmarhraoui 2022) package.
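
As a sketch, getSunlightTimes() from suncalc returns the requested sun events for a date and position. The location, date and bat-pass time below are hypothetical.

```r
library(suncalc)

# Hypothetical survey location (southern England) and date.
sun <- getSunlightTimes(
  date = as.Date("2023-06-01"),
  lat  = 50.96,
  lon  = 0.45,
  keep = c("sunrise", "sunset"),
  tz   = "Europe/London"
)

# Minutes between sunset and a made-up bat pass on the same evening.
pass_time         <- as.POSIXct("2023-06-01 21:42:10", tz = "Europe/London")
mins_after_sunset <- as.numeric(difftime(pass_time, sun$sunset, units = "mins"))
```

Expressing activity as time after sunset, rather than clock time, makes nights comparable across the season.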

The hms (Müller 2023) package provides a simple class for storing durations or time-of-day values and displaying them in the hh:mm:ss format.
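
A brief sketch of the hms class, using made-up times: values are stored as seconds but print in hh:mm:ss form.

```r
library(hms)

# Times of day parsed from text (the sunset value is made up).
pass_time <- as_hms("21:42:10")
sunset    <- as_hms("21:03:00")

# hms values are stored as seconds, so a duration can be built from their difference.
elapsed <- hms(seconds = as.numeric(pass_time) - as.numeric(sunset))

elapsed
```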

The rnrfa (Vitolo 2022; Vitolo et al. 2016) package has a useful function, osg_parse(), for converting British National Grid (BNG) references to latitude and longitude in the WGS84 (Google Earth) coordinate system (EPSG code: 4326).
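
A minimal sketch of the conversion follows. The grid reference is a hypothetical six-figure reference in East Sussex, and the element names of the returned lists (easting, northing, lon, lat) are as I understand the package's behaviour; check them against your installed version.

```r
library(rnrfa)

# A hypothetical six-figure grid reference in East Sussex.
grid_ref <- "TQ722213"

# Easting and northing in metres (British National Grid, the default).
bng <- osg_parse(grid_refs = grid_ref)

# Longitude and latitude in WGS84 (EPSG:4326).
wgs84 <- osg_parse(grid_refs = grid_ref, coord_system = "WGS84")
```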

A.4 Aggregation

Tables have been produced with the gt (Iannone et al. 2023), gtExtras (Mock 2023) and flextable (Gohel and Skintzos 2023) packages.

The broman (Broman 2022) package provided some useful R functions.

The glue (Hester and Bryan 2022) package allows variables to be passed directly into strings.
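
For example, glue can build a report sentence from values computed elsewhere. The site name and counts below are made up.

```r
library(glue)

site     <- "Hypothetical Wood"   # made-up site name
n_passes <- 42
n_nights <- 5

# Anything inside {} is evaluated as R code and inserted into the string.
caption <- glue(
  "{n_passes} bat passes were recorded at {site} over {n_nights} nights, ",
  "a mean of {round(n_passes / n_nights, 1)} per night."
)

caption
```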

A.5 Visualisation

The graphics have been produced using the R package ggplot2 (Wickham, Chang, et al. 2023).
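
A minimal ggplot2 sketch, using made-up counts, shows the layered grammar: data and aesthetic mappings first, then a geometry, then labels and a theme.

```r
library(ggplot2)

# Made-up totals for illustration.
counts <- data.frame(
  species  = c("Common pipistrelle", "Soprano pipistrelle", "Noctule"),
  n_passes = c(120, 45, 12)
)

p <- ggplot(counts, aes(x = reorder(species, n_passes), y = n_passes)) +
  geom_col() +
  coord_flip() +   # horizontal bars keep long species names readable
  labs(x = NULL, y = "Bat passes", title = "Bat passes by species (made-up data)") +
  theme_minimal()

p
```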

There are many packages that extend ggplot’s capability:

Online books:

  • ggplot2 by Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen; helps understand how ggplot works, giving the power to tailor any plot specifically.
  • Fundamentals of Data Visualization by Claus O. Wilke; aims to provide a guide to making visualizations that reflect the data, tell a story, and look professional.

See also Graphics for communication in R4DS.

Colour can play a large part in visualisation and colours are easily misused; for an understanding of the issues see the paper Misuse of Colour in Science Communication (Crameri, Shephard, and Heron 2020).

The UK civil servants working in government analysis have produced constructive guidance on data visualisation through charts.

A.6 Maps

Geocomputation with R by Robin Lovelace, Jakub Nowosad and Jannes Muenchow is an excellent online book. It teaches a range of spatial skills, including: reading, writing and manipulating geographic data; making static and interactive maps; applying geocomputation to solve real-world problems; and modelling geographic phenomena.

sf (Pebesma 2018, 2023) provides support for simple features, a standardized way to encode spatial vector data².
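
As a sketch of simple features in practice, the example below turns hypothetical detector positions given as British National Grid eastings and northings (EPSG:27700) into sf points and re-projects them to WGS84 (EPSG:4326). The coordinates and identifiers are made up.

```r
library(sf)

# Hypothetical detector locations as eastings/northings in metres.
detectors <- data.frame(
  id       = c("D1", "D2"),
  easting  = c(572200, 572450),
  northing = c(121300, 121520)
)

# Build simple-feature points in EPSG:27700, then re-project to EPSG:4326.
detectors_bng   <- st_as_sf(detectors, coords = c("easting", "northing"), crs = 27700)
detectors_wgs84 <- st_transform(detectors_bng, crs = 4326)

st_coordinates(detectors_wgs84)   # matrix with X (longitude) and Y (latitude)
```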

ggspatial (Dunnington 2023) allows spatial data to be plotted with the power of ggplot2. It also gives access to OpenStreetMap tiles.

osmdata (Padgham et al. 2023) is an R package for downloading and using data from OpenStreetMap (OSM). Unlike the ggspatial package, which facilitates the download of raster tiles, osmdata provides access to the vector data underlying OSM.

elevatr (Hollister 2023) a package for accessing elevation data from various sources.

terra (Hijmans 2023b) a package of methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data.

tanaka (Giraud 2022) a package that performs the Tanaka method, enhancing the representation of topography on a map using shaded contour lines.

metR (Campitelli 2023) a package with several functions and utilities that make R better for handling meteorological data; used here for contour plots.

raster (Hijmans 2023a) a package for reading, writing, manipulating, analyzing and modeling of spatial data.

rnaturalearth (Massicotte and South 2023) a package with Natural Earth data, including world and country maps.

A.7 Statistics

R, specifically base R (R Core Team 2023), is a comprehensive software environment for statistical computing and graphics.

Summary statistics have been produced with the mosaic (Pruim, Kaplan, and Horton 2023) package.

broom (Robinson, Hayes, and Couch 2023) a package that takes the messy output of built-in functions in R and turns them into tidy tibbles; these can be easily tabulated.
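
A short sketch of what broom does with a fitted model: the data are made up, and the linear model is only a stand-in for whichever base R model or test you are running.

```r
library(broom)

# Made-up nightly pass counts against temperature.
nights <- data.frame(
  temp_c   = c(8, 10, 12, 13, 15, 17),
  n_passes = c(5, 9, 14, 13, 21, 26)
)

fit <- lm(n_passes ~ temp_c, data = nights)

tidy(fit)     # one row per model term: estimate, std.error, statistic, p.value
glance(fit)   # one-row model summary, including r.squared and AIC
```

Both results are tibbles, so they can be passed straight to a table package.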

dunn.test (Dinno 2017) a package that performs Dunn’s test of multiple comparisons using rank sums.

infer (Bray et al. 2023) a package for statistical inference that coheres with the tidyverse design framework, for example bootstrapping.
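
Bootstrapping with infer can be sketched as below: the made-up nightly counts are resampled with replacement 1000 times, the mean is calculated for each resample, and a percentile confidence interval is taken from the resulting distribution.

```r
library(infer)

set.seed(42)  # bootstrap resampling is random; fix the seed for repeatability

# Made-up nightly pass counts.
nights <- data.frame(n_passes = c(5, 9, 14, 13, 21, 26, 8, 17, 11, 19))

boot_means <- nights |>
  specify(response = n_passes) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

# 95% percentile interval for the mean number of passes per night.
ci <- get_confidence_interval(boot_means, level = 0.95, type = "percentile")
ci
```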

vegan (Oksanen et al. 2022) package of ordination methods, diversity analysis and other functions for community and vegetation ecologists.
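
As a sketch of a vegan diversity calculation, the example below computes the Shannon index and species richness for a made-up site-by-species matrix. Treating bat pass counts as abundances is an assumption made here for illustration; passes measure activity rather than numbers of individuals.

```r
library(vegan)

# Made-up community matrix: rows are sites, columns are species, cells are pass counts.
community <- matrix(
  c(120, 45, 12,
     30, 30, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Site A", "Site B"),
    c("Common pipistrelle", "Soprano pipistrelle", "Noctule")
  )
)

shannon  <- diversity(community, index = "shannon")  # higher when counts are more even
richness <- specnumber(community)                    # number of species present per site
```

Site B has perfectly even counts across three species, so its Shannon index equals log(3), the maximum for that richness.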

Online books:

Introduction to Modern Statistics by Mine Çetinkaya-Rundel and Johanna Hardin; a contemporary guide to statistical thinking and methods.

Statistical Inference via Data Science: A ModernDive into R and the Tidyverse by Chester Ismay and Albert Y. Kim.

Modern Statistics with R by Måns Thulin; covers wrangling and exploring data to inference and predictive modelling.

Other references:

The Office for Statistics Regulation, the independent regulatory arm of the UK Statistics Authority, has produced two key reference documents that have relevance for data scientists who publish in the public domain³.

A.8 Reporting

Reports can be produced through literate programming (Knuth 1984) with R Markdown (Allaire et al. 2023; Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020) and Quarto®. To use Quarto with R, install the rmarkdown R package; this also installs the knitr package (Xie 2014, 2015, 2023), which is needed to render documents containing R code.

Rendering reports into Microsoft Word or PowerPoint can be greatly enhanced by:

officedown (Gohel and Ross 2023) a package facilitating the formatting of Microsoft Word documents produced by R Markdown.

officer (Gohel 2023) a package that lets R users manipulate Word (.docx) and PowerPoint (.pptx) documents.

Online books:

officeverse by David Gohel; reporting from R with the packages officer, officedown, flextable.

R Markdown Cookbook by Yihui Xie, Christophe Dervieux, Emily Riederer; a book designed to provide a range of examples on how to extend the functionality of R Markdown documents.

R Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, Garrett Grolemund; details the large number of tasks that you could do with R Markdown.

A special mention should go to John MacFarlane, who created Pandoc, a universal document converter that turns Markdown and R Markdown documents (and many other types of document) into a large variety of output formats.

Online videos:

R Markdown Advanced Tips to Become a Better Data Scientist… | With Tom Mock

Welcome to Quarto Workshop! | Led by Tom Mock, RStudio

A.9 Interactive

leaflet (Cheng et al. 2023) an R interface to Leaflet, one of the most popular open-source JavaScript libraries for interactive maps.
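
A minimal sketch of an interactive map with leaflet: made-up detector positions are drawn as markers over the default OpenStreetMap tiles. The data frame and its column names are hypothetical.

```r
library(leaflet)

# Made-up detector positions in WGS84 (latitude/longitude).
detectors <- data.frame(
  id  = c("D1", "D2"),
  lat = c(50.960, 50.962),
  lng = c(0.450, 0.454)
)

m <- leaflet(detectors) |>
  addTiles() |>                                      # default OpenStreetMap base map
  addMarkers(lng = ~lng, lat = ~lat, popup = ~id)    # click a marker to see its id

m
```

Printing m in RStudio, or including it in an HTML report, renders the map as an htmlwidget.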

plotly (Sievert et al. 2023) an R graphing library that makes interactive, publication-quality graphs.

DT (Xie, Cheng, and Tan 2023) an R interface to the DataTables JavaScript library, displaying R matrices or data frames as interactive HTML tables that support filtering, pagination, and sorting.

reactable (Lin 2023) interactive data tables for R.

See also htmlwidgets for R.

References


  1. Nearly all the online reference books on R are created within the R environment, most commonly with R Markdown or Quarto.

  2. See https://en.wikipedia.org/wiki/Simple_Features

  3. In support of a planning application, for example.