4 Aggregation
I am the wisest man alive, for I know one thing, and that is that I know nothing.1
Data aggregation stands as a fundamental aspect of data analysis, representing the process by which tidy data (along with metadata) is collected or grouped and subsequently presented in a summarized format. In the realm of bat survey data, aggregation typically occurs through two primary methods:
- Aggregation over a specified time period.
- Spatial aggregation for locations or factors associated with a particular location (e.g., woodland).”
The bat survey data statics in the iBats package has 6,930 bat observations; a random selection of 5 rows is shown in Table 4.1.
Show the code
statics %>% # statics is a tidy data set from the iBats package
sample_n(5) %>%
arrange(DateTime) %>%
# Table made with gt
gt() %>%
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_labels(
columns = c(everything())
)
) %>%
# Make bat scientific name italic
tab_style(
style = list(
cell_text(style = "italic")
),
locations = cells_body(
columns = c(Species)
)
) %>%
tab_options(data_row.padding = px(2)) %>%
cols_align(
align = "left",
columns = DateTime)| Description | DateTime | Species | Longitude | Latitude |
|---|---|---|---|---|
| Static 4 | 2016-06-13 02:04:27 | Pipistrellus pipistrellus | -3.591858 | 50.33123 |
| Static 4 | 2016-08-05 00:09:43 | Pipistrellus pipistrellus | -3.591748 | 50.33136 |
| Static 2 | 2016-08-12 22:23:46 | Pipistrellus pipistrellus | -3.592583 | 50.33323 |
| Static 4 | 2016-08-24 00:12:15 | Pipistrellus pipistrellus | -3.592228 | 50.33091 |
| Static 2 | 2016-08-29 21:37:44 | Pipistrellus pipistrellus | -3.592583 | 50.33323 |
4.1 Count of Bats
The simplest form of aggregation is a count of bats2; as shown in Table 4.2
Show the code
# Libraries (Packages) used
library(tidyverse)
library(iBats)
library(gt)
statics %>%
group_by(Species) %>%
count() %>%
#arrange descending
arrange(desc(n)) %>%
# rename n as count
rename(`Bat Species` = Species, Count = n) %>%
# so table is produced with individual species on one row
ungroup() %>%
# Table made gt()
gt() %>%
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")
),
locations = cells_column_labels(
columns = c(everything())
)
) %>%
# Make bat scientific name italic
tab_style(
style = list(
cell_text(style = "italic")
),
locations = cells_body(
columns = c(`Bat Species`)
)
) %>%
tab_options(data_row.padding = px(2)) | Bat Species | Count |
|---|---|
| Pipistrellus pipistrellus | 4972 |
| Myotis spp. | 435 |
| Barbastella barbastellus | 419 |
| Nyctalus noctula | 315 |
| Pipistrellus spp. | 283 |
| Rhinolophus ferrumequinum | 160 |
| Plecotus spp. | 136 |
| Pipistrellus pygmaeus | 86 |
| Rhinolophus hipposideros | 73 |
| Pipistrellus nathusii | 32 |
| Eptesicus serotinus | 12 |
| Nyctalus leisleri | 3 |
| Myotis nattereri | 2 |
| Nyctalus spp. | 2 |
More informative tables can be made by adding meta data such as the Month and Year of the observations. This is easy done on a tidy data conforming to the minimal data requirement with the date_time_info function in the iBats package. Table 4.3 shows the monthly count (bat passes) of Annex II species3; .
Show the code
# Libraries (Packages) used
library(tidyverse)
library(iBats)
library(flextable)
library(officer)
# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)
# Vector of month names used in the factor function
month_levels_full <- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
# re-order months column with rev so months run top to bottom of the table
statics_plus <- statics_plus %>%
mutate(MonthFull = factor(MonthFull, levels = rev(month_levels_full), ordered = T))
AnnexII <- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")
table_border <- fp_border(color = "black", width = 1) # from library(officer)
statics_plus %>%
filter(Species %in% AnnexII) %>%
group_by(MonthFull, Species) %>%
count() %>%
# arrange descending
arrange(desc(MonthFull)) %>%
# rename n as count
rename(Month = MonthFull, `Bat Species` = Species, Count = n) %>%
# Table made with flextable
flextable(col_keys = colnames(.)) %>%
italic(j = 2, italic = TRUE, part = "body") %>%
fontsize(part = "header", size = 12) %>%
fontsize(part = "body", size = 12) %>%
colformat_double(j = "Count", digits = 4, big.mark = ",") %>%
width(j = 1, width = 2) %>%
width(j = 2, width = 2.5) %>%
width(j = 3, width = 1) %>%
merge_v(j = 1) %>%
border_inner_h(part = "body", border = table_border) %>%
hline_bottom(part = "body", border = table_border) %>%
bg(bg = "black", part = "header") %>%
color(color = "white", part = "header")Month | Bat Species | Count |
|---|---|---|
May | Barbastella barbastellus | 7 |
Rhinolophus ferrumequinum | 9 | |
Rhinolophus hipposideros | 10 | |
June | Barbastella barbastellus | 198 |
Rhinolophus ferrumequinum | 47 | |
Rhinolophus hipposideros | 11 | |
July | Barbastella barbastellus | 49 |
Rhinolophus ferrumequinum | 7 | |
Rhinolophus hipposideros | 4 | |
August | Barbastella barbastellus | 109 |
Rhinolophus ferrumequinum | 31 | |
Rhinolophus hipposideros | 23 | |
September | Barbastella barbastellus | 9 |
Rhinolophus ferrumequinum | 64 | |
Rhinolophus hipposideros | 17 | |
October | Barbastella barbastellus | 47 |
Rhinolophus ferrumequinum | 2 | |
Rhinolophus hipposideros | 8 |
Tables of numbers can be frustrating to read. To help readability the table can be annotated highlighting the count on a colour scale linked to the magnitude of the result. This is shown in Table 4.4 for the count of bats shown in Table 4.3.
Show the code
# Libraries (Packages) used
library(tidyverse)
library(flextable)
library(officer)
library(iBats)
# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)
# Vector of month names used in the factor function
month_levels_full <- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
# re-order months column with rev so months run top to bottom of the table
statics_plus <- statics_plus %>%
mutate(MonthFull = factor(MonthFull, levels = rev(month_levels_full), ordered = T))
AnnexII <- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")
table_border <- fp_border(color = "black", width = 1) # from library(officer)
table_data <- statics_plus %>%
filter(Species %in% AnnexII) %>%
group_by(MonthFull, Species) %>%
count() %>%
# arrange descending
arrange(desc(MonthFull)) %>%
# rename n as count
rename(Month = MonthFull, `Bat Species` = Species, Count = n)
# Find max and min counts
maxCount <- max(table_data$Count, na.rm = T)
minCount <- min(table_data$Count, na.rm = T)
#Make coloured palette
colourer <- scales::col_numeric(
palette = c("transparent", "orangered2"),
domain = c(minCount, maxCount))
# Table made with flextable
table_data %>%
flextable(col_keys = colnames(.)) %>%
italic(j = 2, italic = TRUE, part = "body") %>%
fontsize(part = "header", size = 12) %>%
fontsize(part = "body", size = 12) %>%
colformat_double(j = "Count", digits = 4, big.mark = ",") %>%
width(j = 1, width = 2) %>%
width(j = 2, width = 2.5) %>%
width(j = 3, width = 1) %>%
merge_v(j = 1) %>%
# Scale colour to count
bg(bg = colourer, j = "Count", part = "body") %>%
border_inner_h(part = "body", border = table_border) %>%
hline_bottom(part = "body", border = table_border) %>%
bg(bg = "black", part = "header") %>%
color(color = "white", part = "header")Month | Bat Species | Count |
|---|---|---|
May | Barbastella barbastellus | 7 |
Rhinolophus ferrumequinum | 9 | |
Rhinolophus hipposideros | 10 | |
June | Barbastella barbastellus | 198 |
Rhinolophus ferrumequinum | 47 | |
Rhinolophus hipposideros | 11 | |
July | Barbastella barbastellus | 49 |
Rhinolophus ferrumequinum | 7 | |
Rhinolophus hipposideros | 4 | |
August | Barbastella barbastellus | 109 |
Rhinolophus ferrumequinum | 31 | |
Rhinolophus hipposideros | 23 | |
September | Barbastella barbastellus | 9 |
Rhinolophus ferrumequinum | 64 | |
Rhinolophus hipposideros | 17 | |
October | Barbastella barbastellus | 47 |
Rhinolophus ferrumequinum | 2 | |
Rhinolophus hipposideros | 8 |
4.2 Summary Statistics
Summary statistics for the Common pipistrelle (Pipistrellus pipistrellus) observations at all static locations for each night is shown in Table 4.5; to make the table easier to read, the median column is highlighted with a graduated colour scale. The summary statistics are calculated with the favstats function from the mosaic package.
Show the code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)
# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)
# Group by Description and Night and Count the Observations
grouped_data <- statics_plus %>%
filter(Species == "Pipistrellus pipistrellus") %>%
group_by(Description, Night) %>%
tally()
# The summary statistics are saved into a variable riven_cond_stats
cond_stats <- favstats(n~Description, data = grouped_data)
# riven_cond_stats is made into a the table (using the code below)
cond_stats %>%
# Create the table with the gt package
gt() %>%
# Style the header to black fill and white text
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")),
locations = cells_column_labels(
columns = c(everything())
)
) %>%
gt_color_rows(median, palette = "ggsci::yellow_material") %>%
tab_options(data_row.padding = px(2)) | Description | min | Q1 | median | Q3 | max | mean | sd | n | missing |
|---|---|---|---|---|---|---|---|---|---|
| Static 1 | 1 | 1.0 | 2 | 3.5 | 73 | 6.391304 | 15.570190 | 23 | 0 |
| Static 2 | 1 | 6.0 | 10 | 17.5 | 42 | 12.031746 | 8.820738 | 63 | 0 |
| Static 3 | 1 | 3.5 | 6 | 11.0 | 40 | 9.740741 | 10.323955 | 27 | 0 |
| Static 4 | 10 | 33.5 | 78 | 155.5 | 438 | 115.129032 | 121.519749 | 31 | 0 |
| Static 5 | 1 | 4.0 | 7 | 9.0 | 31 | 8.103448 | 6.586484 | 29 | 0 |
Table 4.6 shows the summary statistics for Annex II Bat Species4 observations by month; with the maximum number of nightly bat passes highlighted. The table can be grouped by Month through switching Common and MonthFull names in the code below; this alternative table is shown in Table 4.7 with the mean (average) number of nightly bat passes highlighted.
Show the code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)
# Annex II Bats (in the static)
AnnexII <- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")
# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)
# List of bat common names and the scientific names
BatCommon <- c(
"Barbastella barbastellus" = "Barbastelle",
"Rhinolophus ferrumequinum" = "Greater horseshoe",
"Rhinolophus hipposideros" = "Lesser horseshoe")
# From Scientific name create a Common Name Vector
statics_plus$Common <- unname(BatCommon[statics_plus$Species])
# Group by Description and Night and Count the Observations
grouped_data <- statics_plus %>%
filter(Species %in% AnnexII) %>%
group_by(Common, MonthFull, Night) %>%
tally() %>%
summarise(Minimum = round(min(n, na.rm = T), 2),
Q1 = round(quantile(n, c(0.25), na.rm = T), 2),
Mean = round(mean(n, na.rm = T), 2),
Median = round(median(n, na.rm = T), 2),
Q3 = round(quantile(n, c(0.75), na.rm = T), 2),
Maximum = round(max(n, na.rm = T), 2),
SD = round(sd(n, na.rm = T), 2),
Nr = n())
# riven_cond_stats is made into a the table (using the code below)
grouped_data %>%
# Create the table with the gt package
gt(rowname_col = "MonthFull",
groupname_col = "Common") %>%
# Style the header to black fill and white text
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")),
locations = cells_column_labels(
columns = c(everything())
)
) %>%
tab_style(
style = list(
cell_fill(color = "midnightblue"),
cell_text(color = "white"),
cell_text(weight = "normal")
),
locations = cells_body(
columns = MonthFull
)) %>%
gt_color_rows(Maximum, palette = "ggsci::purple_material") %>%
tab_options(data_row.padding = px(2)) | Minimum | Q1 | Mean | Median | Q3 | Maximum | SD | Nr | |
|---|---|---|---|---|---|---|---|---|
| Barbastelle | ||||||||
| May | 1 | 1.00 | 1.75 | 1.5 | 2.25 | 3 | 0.96 | 4 |
| June | 1 | 1.50 | 18.00 | 18.0 | 31.50 | 42 | 15.67 | 11 |
| July | 4 | 5.75 | 8.17 | 8.0 | 10.25 | 13 | 3.43 | 6 |
| August | 1 | 3.00 | 7.27 | 6.0 | 8.00 | 22 | 6.13 | 15 |
| September | 1 | 1.00 | 1.80 | 1.0 | 2.00 | 4 | 1.30 | 5 |
| October | 4 | 5.00 | 6.71 | 6.0 | 7.00 | 13 | 2.98 | 7 |
| Greater horseshoe | ||||||||
| May | 2 | 2.50 | 3.00 | 3.0 | 3.50 | 4 | 1.00 | 3 |
| June | 1 | 1.50 | 3.13 | 3.0 | 4.00 | 7 | 1.81 | 15 |
| July | 1 | 1.00 | 1.75 | 1.5 | 2.25 | 3 | 0.96 | 4 |
| August | 1 | 1.00 | 2.07 | 2.0 | 3.00 | 6 | 1.39 | 15 |
| September | 3 | 3.25 | 10.67 | 4.0 | 6.25 | 43 | 15.91 | 6 |
| October | 1 | 1.00 | 1.00 | 1.0 | 1.00 | 1 | 0.00 | 2 |
| Lesser horseshoe | ||||||||
| May | 1 | 1.00 | 2.50 | 1.5 | 3.00 | 6 | 2.38 | 4 |
| June | 1 | 1.00 | 1.38 | 1.0 | 2.00 | 2 | 0.52 | 8 |
| July | 1 | 1.50 | 2.00 | 2.0 | 2.50 | 3 | 1.41 | 2 |
| August | 1 | 1.00 | 2.30 | 1.5 | 3.00 | 6 | 1.70 | 10 |
| September | 1 | 1.50 | 2.43 | 3.0 | 3.00 | 4 | 1.13 | 7 |
| October | 1 | 1.00 | 2.67 | 1.0 | 3.50 | 6 | 2.89 | 3 |
Show the code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)
# Annex II Bats (in the static)
AnnexII <- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")
# Add data and time information to the statics data using the iBats::date_time_info
statics_plus <- iBats::date_time_info(statics)
# List of bat common names and the scientific names
BatCommon <- c(
"Barbastella barbastellus" = "Barbastelle",
"Rhinolophus ferrumequinum" = "Greater horseshoe",
"Rhinolophus hipposideros" = "Lesser horseshoe")
# From Scientific name create a Common Name Vector
statics_plus$Common <- unname(BatCommon[statics_plus$Species])
# Group by Description and Night and Count the Observations
grouped_data <- statics_plus %>%
filter(Species %in% AnnexII) %>%
group_by(Common, MonthFull, Night) %>%
tally() %>%
summarise(Minimum = round(min(n, na.rm = T), 2),
Q1 = round(quantile(n, c(0.25), na.rm = T), 2),
Mean = round(mean(n, na.rm = T), 2),
Median = round(median(n, na.rm = T), 2),
Q3 = round(quantile(n, c(0.75), na.rm = T), 2),
Maximum = round(max(n, na.rm = T), 2),
SD = round(sd(n, na.rm = T), 2),
Nr = n())
# riven_cond_stats is made into a the table (using the code below)
grouped_data %>%
# Create the table with the gt package
gt(rowname_col = "Common",
groupname_col = "MonthFull") %>%
# Style the header to black fill and white text
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")),
locations = cells_column_labels(
columns = c(everything())
)
) %>%
gt_color_rows(Mean, palette = "ggsci::green_material") %>%
tab_options(data_row.padding = px(2)) | Minimum | Q1 | Mean | Median | Q3 | Maximum | SD | Nr | |
|---|---|---|---|---|---|---|---|---|
| May | ||||||||
| Barbastelle | 1 | 1.00 | 1.75 | 1.5 | 2.25 | 3 | 0.96 | 4 |
| Greater horseshoe | 2 | 2.50 | 3.00 | 3.0 | 3.50 | 4 | 1.00 | 3 |
| Lesser horseshoe | 1 | 1.00 | 2.50 | 1.5 | 3.00 | 6 | 2.38 | 4 |
| June | ||||||||
| Barbastelle | 1 | 1.50 | 18.00 | 18.0 | 31.50 | 42 | 15.67 | 11 |
| Greater horseshoe | 1 | 1.50 | 3.13 | 3.0 | 4.00 | 7 | 1.81 | 15 |
| Lesser horseshoe | 1 | 1.00 | 1.38 | 1.0 | 2.00 | 2 | 0.52 | 8 |
| July | ||||||||
| Barbastelle | 4 | 5.75 | 8.17 | 8.0 | 10.25 | 13 | 3.43 | 6 |
| Greater horseshoe | 1 | 1.00 | 1.75 | 1.5 | 2.25 | 3 | 0.96 | 4 |
| Lesser horseshoe | 1 | 1.50 | 2.00 | 2.0 | 2.50 | 3 | 1.41 | 2 |
| August | ||||||||
| Barbastelle | 1 | 3.00 | 7.27 | 6.0 | 8.00 | 22 | 6.13 | 15 |
| Greater horseshoe | 1 | 1.00 | 2.07 | 2.0 | 3.00 | 6 | 1.39 | 15 |
| Lesser horseshoe | 1 | 1.00 | 2.30 | 1.5 | 3.00 | 6 | 1.70 | 10 |
| September | ||||||||
| Barbastelle | 1 | 1.00 | 1.80 | 1.0 | 2.00 | 4 | 1.30 | 5 |
| Greater horseshoe | 3 | 3.25 | 10.67 | 4.0 | 6.25 | 43 | 15.91 | 6 |
| Lesser horseshoe | 1 | 1.50 | 2.43 | 3.0 | 3.00 | 4 | 1.13 | 7 |
| October | ||||||||
| Barbastelle | 4 | 5.00 | 6.71 | 6.0 | 7.00 | 13 | 2.98 | 7 |
| Greater horseshoe | 1 | 1.00 | 1.00 | 1.0 | 1.00 | 1 | 0.00 | 2 |
| Lesser horseshoe | 1 | 1.00 | 2.67 | 1.0 | 3.50 | 6 | 2.89 | 3 |
Socrates in Plato’s Republic (circa 380 BC)↩︎
note:- in this case it is a count of bat passes↩︎
https://ec.europa.eu/environment/nature/conservation/species/habitats_dir_en.htm↩︎
https://ec.europa.eu/environment/nature/conservation/species/habitats_dir_en.htm↩︎