4 Aggregation
I am the wisest man alive, for I know one thing, and that is that I know nothing.1
Data aggregation stands as a fundamental aspect of data analysis, representing the process by which tidy data (along with metadata) is collected or grouped and subsequently presented in a summarized format. In the realm of bat survey data, aggregation typically occurs through two primary methods:
- Aggregation over a specified time period.
- Spatial aggregation for locations or factors associated with a particular location (e.g., woodland).”
The bat survey data statics
in the iBats
package has 6,930 bat observations; a random selection of 5 rows is shown in Table 4.1.
Show the code
%>% # statics is a tidy data set from the iBats package
statics sample_n(5) %>%
arrange(DateTime) %>%
# Table made with gt
gt() %>%
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")
),locations = cells_column_labels(
columns = c(everything())
)%>%
) # Make bat scientific name italic
tab_style(
style = list(
cell_text(style = "italic")
),locations = cells_body(
columns = c(Species)
)%>%
) tab_options(data_row.padding = px(2)) %>%
cols_align(
align = "left",
columns = DateTime)
Description | DateTime | Species | Longitude | Latitude |
---|---|---|---|---|
Static 4 | 2016-06-13 02:04:27 | Pipistrellus pipistrellus | -3.591858 | 50.33123 |
Static 4 | 2016-08-05 00:09:43 | Pipistrellus pipistrellus | -3.591748 | 50.33136 |
Static 2 | 2016-08-12 22:23:46 | Pipistrellus pipistrellus | -3.592583 | 50.33323 |
Static 4 | 2016-08-24 00:12:15 | Pipistrellus pipistrellus | -3.592228 | 50.33091 |
Static 2 | 2016-08-29 21:37:44 | Pipistrellus pipistrellus | -3.592583 | 50.33323 |
4.1 Count of Bats
The simplest form of aggregation is a count of bats2; as shown in Table 4.2
Show the code
# Libraries (Packages) used
library(tidyverse)
library(iBats)
library(gt)
%>%
statics group_by(Species) %>%
count() %>%
#arrange descending
arrange(desc(n)) %>%
# rename n as count
rename(`Bat Species` = Species, Count = n) %>%
# so table is produced with individual species on one row
ungroup() %>%
# Table made gt()
gt() %>%
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")
),locations = cells_column_labels(
columns = c(everything())
)%>%
) # Make bat scientific name italic
tab_style(
style = list(
cell_text(style = "italic")
),locations = cells_body(
columns = c(`Bat Species`)
)%>%
) tab_options(data_row.padding = px(2))
Bat Species | Count |
---|---|
Pipistrellus pipistrellus | 4972 |
Myotis spp. | 435 |
Barbastella barbastellus | 419 |
Nyctalus noctula | 315 |
Pipistrellus spp. | 283 |
Rhinolophus ferrumequinum | 160 |
Plecotus spp. | 136 |
Pipistrellus pygmaeus | 86 |
Rhinolophus hipposideros | 73 |
Pipistrellus nathusii | 32 |
Eptesicus serotinus | 12 |
Nyctalus leisleri | 3 |
Myotis nattereri | 2 |
Nyctalus spp. | 2 |
More informative tables can be made by adding meta data such as the Month and Year of the observations. This is easy done on a tidy data conforming to the minimal data requirement with the date_time_info
function in the iBats
package. Table 4.3 shows the monthly count (bat passes) of Annex II species3; .
Show the code
# Libraries (Packages) used
library(tidyverse)
library(iBats)
library(flextable)
library(officer)
# Add data and time information to the statics data using the iBats::date_time_info
<- iBats::date_time_info(statics)
statics_plus
# Vector of month names used in the factor function
<- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
month_levels_full
# re-order months column with rev so months run top to bottom of the table
<- statics_plus %>%
statics_plus mutate(MonthFull = factor(MonthFull, levels = rev(month_levels_full), ordered = T))
<- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")
AnnexII
<- fp_border(color = "black", width = 1) # from library(officer)
table_border
%>%
statics_plus filter(Species %in% AnnexII) %>%
group_by(MonthFull, Species) %>%
count() %>%
# arrange descending
arrange(desc(MonthFull)) %>%
# rename n as count
rename(Month = MonthFull, `Bat Species` = Species, Count = n) %>%
# Table made with flextable
flextable(col_keys = colnames(.)) %>%
italic(j = 2, italic = TRUE, part = "body") %>%
fontsize(part = "header", size = 12) %>%
fontsize(part = "body", size = 12) %>%
colformat_double(j = "Count", digits = 4, big.mark = ",") %>%
width(j = 1, width = 2) %>%
width(j = 2, width = 2.5) %>%
width(j = 3, width = 1) %>%
merge_v(j = 1) %>%
border_inner_h(part = "body", border = table_border) %>%
hline_bottom(part = "body", border = table_border) %>%
bg(bg = "black", part = "header") %>%
color(color = "white", part = "header")
Month | Bat Species | Count |
---|---|---|
May | Barbastella barbastellus | 7 |
Rhinolophus ferrumequinum | 9 | |
Rhinolophus hipposideros | 10 | |
June | Barbastella barbastellus | 198 |
Rhinolophus ferrumequinum | 47 | |
Rhinolophus hipposideros | 11 | |
July | Barbastella barbastellus | 49 |
Rhinolophus ferrumequinum | 7 | |
Rhinolophus hipposideros | 4 | |
August | Barbastella barbastellus | 109 |
Rhinolophus ferrumequinum | 31 | |
Rhinolophus hipposideros | 23 | |
September | Barbastella barbastellus | 9 |
Rhinolophus ferrumequinum | 64 | |
Rhinolophus hipposideros | 17 | |
October | Barbastella barbastellus | 47 |
Rhinolophus ferrumequinum | 2 | |
Rhinolophus hipposideros | 8 |
Tables of numbers can be frustrating to read. To help readability the table can be annotated highlighting the count on a colour scale linked to the magnitude of the result. This is shown in Table 4.4 for the count of bats shown in Table 4.3.
Show the code
# Libraries (Packages) used
library(tidyverse)
library(flextable)
library(officer)
library(iBats)
# Add data and time information to the statics data using the iBats::date_time_info
<- iBats::date_time_info(statics)
statics_plus
# Vector of month names used in the factor function
<- c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
month_levels_full
# re-order months column with rev so months run top to bottom of the table
<- statics_plus %>%
statics_plus mutate(MonthFull = factor(MonthFull, levels = rev(month_levels_full), ordered = T))
<- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")
AnnexII
<- fp_border(color = "black", width = 1) # from library(officer)
table_border
<- statics_plus %>%
table_data filter(Species %in% AnnexII) %>%
group_by(MonthFull, Species) %>%
count() %>%
# arrange descending
arrange(desc(MonthFull)) %>%
# rename n as count
rename(Month = MonthFull, `Bat Species` = Species, Count = n)
# Find max and min counts
<- max(table_data$Count, na.rm = T)
maxCount <- min(table_data$Count, na.rm = T)
minCount
#Make coloured palette
<- scales::col_numeric(
colourer palette = c("transparent", "orangered2"),
domain = c(minCount, maxCount))
# Table made with flextable
%>%
table_data flextable(col_keys = colnames(.)) %>%
italic(j = 2, italic = TRUE, part = "body") %>%
fontsize(part = "header", size = 12) %>%
fontsize(part = "body", size = 12) %>%
colformat_double(j = "Count", digits = 4, big.mark = ",") %>%
width(j = 1, width = 2) %>%
width(j = 2, width = 2.5) %>%
width(j = 3, width = 1) %>%
merge_v(j = 1) %>%
# Scale colour to count
bg(bg = colourer, j = "Count", part = "body") %>%
border_inner_h(part = "body", border = table_border) %>%
hline_bottom(part = "body", border = table_border) %>%
bg(bg = "black", part = "header") %>%
color(color = "white", part = "header")
Month | Bat Species | Count |
---|---|---|
May | Barbastella barbastellus | 7 |
Rhinolophus ferrumequinum | 9 | |
Rhinolophus hipposideros | 10 | |
June | Barbastella barbastellus | 198 |
Rhinolophus ferrumequinum | 47 | |
Rhinolophus hipposideros | 11 | |
July | Barbastella barbastellus | 49 |
Rhinolophus ferrumequinum | 7 | |
Rhinolophus hipposideros | 4 | |
August | Barbastella barbastellus | 109 |
Rhinolophus ferrumequinum | 31 | |
Rhinolophus hipposideros | 23 | |
September | Barbastella barbastellus | 9 |
Rhinolophus ferrumequinum | 64 | |
Rhinolophus hipposideros | 17 | |
October | Barbastella barbastellus | 47 |
Rhinolophus ferrumequinum | 2 | |
Rhinolophus hipposideros | 8 |
4.2 Summary Statistics
Summary statistics for the Common pipistrelle (Pipistrellus pipistrellus) observations at all static
locations for each night is shown in Table 4.5; to make the table easier to read, the median column is highlighted with a graduated colour scale. The summary statistics are calculated with the favstats
function from the mosaic
package.
Show the code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)
# Add data and time information to the statics data using the iBats::date_time_info
<- iBats::date_time_info(statics)
statics_plus
# Group by Description and Night and Count the Observations
<- statics_plus %>%
grouped_data filter(Species == "Pipistrellus pipistrellus") %>%
group_by(Description, Night) %>%
tally()
# The summary statistics are saved into a variable riven_cond_stats
<- favstats(n~Description, data = grouped_data)
cond_stats
# riven_cond_stats is made into a the table (using the code below)
%>%
cond_stats # Create the table with the gt package
gt() %>%
# Style the header to black fill and white text
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")),
locations = cells_column_labels(
columns = c(everything())
)%>%
) gt_color_rows(median, palette = "ggsci::yellow_material") %>%
tab_options(data_row.padding = px(2))
Description | min | Q1 | median | Q3 | max | mean | sd | n | missing |
---|---|---|---|---|---|---|---|---|---|
Static 1 | 1 | 1.0 | 2 | 3.5 | 73 | 6.391304 | 15.570190 | 23 | 0 |
Static 2 | 1 | 6.0 | 10 | 17.5 | 42 | 12.031746 | 8.820738 | 63 | 0 |
Static 3 | 1 | 3.5 | 6 | 11.0 | 40 | 9.740741 | 10.323955 | 27 | 0 |
Static 4 | 10 | 33.5 | 78 | 155.5 | 438 | 115.129032 | 121.519749 | 31 | 0 |
Static 5 | 1 | 4.0 | 7 | 9.0 | 31 | 8.103448 | 6.586484 | 29 | 0 |
Table 4.6 shows the summary statistics for Annex II Bat Species4 observations by month; with the maximum number of nightly bat passes highlighted. The table can be grouped by Month through switching Common
and MonthFull
names in the code below; this alternative table is shown in Table 4.7 with the mean (average) number of nightly bat passes highlighted.
Show the code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)
# Annex II Bats (in the static)
<- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")
AnnexII
# Add data and time information to the statics data using the iBats::date_time_info
<- iBats::date_time_info(statics)
statics_plus
# List of bat common names and the scientific names
<- c(
BatCommon "Barbastella barbastellus" = "Barbastelle",
"Rhinolophus ferrumequinum" = "Greater horseshoe",
"Rhinolophus hipposideros" = "Lesser horseshoe")
# From Scientific name create a Common Name Vector
$Common <- unname(BatCommon[statics_plus$Species])
statics_plus
# Group by Description and Night and Count the Observations
<- statics_plus %>%
grouped_data filter(Species %in% AnnexII) %>%
group_by(Common, MonthFull, Night) %>%
tally() %>%
summarise(Minimum = round(min(n, na.rm = T), 2),
Q1 = round(quantile(n, c(0.25), na.rm = T), 2),
Mean = round(mean(n, na.rm = T), 2),
Median = round(median(n, na.rm = T), 2),
Q3 = round(quantile(n, c(0.75), na.rm = T), 2),
Maximum = round(max(n, na.rm = T), 2),
SD = round(sd(n, na.rm = T), 2),
Nr = n())
# riven_cond_stats is made into a the table (using the code below)
%>%
grouped_data # Create the table with the gt package
gt(rowname_col = "MonthFull",
groupname_col = "Common") %>%
# Style the header to black fill and white text
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")),
locations = cells_column_labels(
columns = c(everything())
)%>%
) tab_style(
style = list(
cell_fill(color = "midnightblue"),
cell_text(color = "white"),
cell_text(weight = "normal")
),locations = cells_body(
columns = MonthFull
%>%
)) gt_color_rows(Maximum, palette = "ggsci::purple_material") %>%
tab_options(data_row.padding = px(2))
Minimum | Q1 | Mean | Median | Q3 | Maximum | SD | Nr | |
---|---|---|---|---|---|---|---|---|
Barbastelle | ||||||||
May | 1 | 1.00 | 1.75 | 1.5 | 2.25 | 3 | 0.96 | 4 |
June | 1 | 1.50 | 18.00 | 18.0 | 31.50 | 42 | 15.67 | 11 |
July | 4 | 5.75 | 8.17 | 8.0 | 10.25 | 13 | 3.43 | 6 |
August | 1 | 3.00 | 7.27 | 6.0 | 8.00 | 22 | 6.13 | 15 |
September | 1 | 1.00 | 1.80 | 1.0 | 2.00 | 4 | 1.30 | 5 |
October | 4 | 5.00 | 6.71 | 6.0 | 7.00 | 13 | 2.98 | 7 |
Greater horseshoe | ||||||||
May | 2 | 2.50 | 3.00 | 3.0 | 3.50 | 4 | 1.00 | 3 |
June | 1 | 1.50 | 3.13 | 3.0 | 4.00 | 7 | 1.81 | 15 |
July | 1 | 1.00 | 1.75 | 1.5 | 2.25 | 3 | 0.96 | 4 |
August | 1 | 1.00 | 2.07 | 2.0 | 3.00 | 6 | 1.39 | 15 |
September | 3 | 3.25 | 10.67 | 4.0 | 6.25 | 43 | 15.91 | 6 |
October | 1 | 1.00 | 1.00 | 1.0 | 1.00 | 1 | 0.00 | 2 |
Lesser horseshoe | ||||||||
May | 1 | 1.00 | 2.50 | 1.5 | 3.00 | 6 | 2.38 | 4 |
June | 1 | 1.00 | 1.38 | 1.0 | 2.00 | 2 | 0.52 | 8 |
July | 1 | 1.50 | 2.00 | 2.0 | 2.50 | 3 | 1.41 | 2 |
August | 1 | 1.00 | 2.30 | 1.5 | 3.00 | 6 | 1.70 | 10 |
September | 1 | 1.50 | 2.43 | 3.0 | 3.00 | 4 | 1.13 | 7 |
October | 1 | 1.00 | 2.67 | 1.0 | 3.50 | 6 | 2.89 | 3 |
Show the code
# Libraries (Packages) used
library(tidyverse)
library(mosaic)
library(gt)
library(gtExtras)
library(iBats)
# Annex II Bats (in the static)
<- c("Barbastella barbastellus", "Rhinolophus ferrumequinum", "Rhinolophus hipposideros")
AnnexII
# Add data and time information to the statics data using the iBats::date_time_info
<- iBats::date_time_info(statics)
statics_plus
# List of bat common names and the scientific names
<- c(
BatCommon "Barbastella barbastellus" = "Barbastelle",
"Rhinolophus ferrumequinum" = "Greater horseshoe",
"Rhinolophus hipposideros" = "Lesser horseshoe")
# From Scientific name create a Common Name Vector
$Common <- unname(BatCommon[statics_plus$Species])
statics_plus
# Group by Description and Night and Count the Observations
<- statics_plus %>%
grouped_data filter(Species %in% AnnexII) %>%
group_by(Common, MonthFull, Night) %>%
tally() %>%
summarise(Minimum = round(min(n, na.rm = T), 2),
Q1 = round(quantile(n, c(0.25), na.rm = T), 2),
Mean = round(mean(n, na.rm = T), 2),
Median = round(median(n, na.rm = T), 2),
Q3 = round(quantile(n, c(0.75), na.rm = T), 2),
Maximum = round(max(n, na.rm = T), 2),
SD = round(sd(n, na.rm = T), 2),
Nr = n())
# riven_cond_stats is made into a the table (using the code below)
%>%
grouped_data # Create the table with the gt package
gt(rowname_col = "Common",
groupname_col = "MonthFull") %>%
# Style the header to black fill and white text
tab_style(
style = list(
cell_fill(color = "black"),
cell_text(color = "white", weight = "bold")),
locations = cells_column_labels(
columns = c(everything())
)%>%
) gt_color_rows(Mean, palette = "ggsci::green_material") %>%
tab_options(data_row.padding = px(2))
Minimum | Q1 | Mean | Median | Q3 | Maximum | SD | Nr | |
---|---|---|---|---|---|---|---|---|
May | ||||||||
Barbastelle | 1 | 1.00 | 1.75 | 1.5 | 2.25 | 3 | 0.96 | 4 |
Greater horseshoe | 2 | 2.50 | 3.00 | 3.0 | 3.50 | 4 | 1.00 | 3 |
Lesser horseshoe | 1 | 1.00 | 2.50 | 1.5 | 3.00 | 6 | 2.38 | 4 |
June | ||||||||
Barbastelle | 1 | 1.50 | 18.00 | 18.0 | 31.50 | 42 | 15.67 | 11 |
Greater horseshoe | 1 | 1.50 | 3.13 | 3.0 | 4.00 | 7 | 1.81 | 15 |
Lesser horseshoe | 1 | 1.00 | 1.38 | 1.0 | 2.00 | 2 | 0.52 | 8 |
July | ||||||||
Barbastelle | 4 | 5.75 | 8.17 | 8.0 | 10.25 | 13 | 3.43 | 6 |
Greater horseshoe | 1 | 1.00 | 1.75 | 1.5 | 2.25 | 3 | 0.96 | 4 |
Lesser horseshoe | 1 | 1.50 | 2.00 | 2.0 | 2.50 | 3 | 1.41 | 2 |
August | ||||||||
Barbastelle | 1 | 3.00 | 7.27 | 6.0 | 8.00 | 22 | 6.13 | 15 |
Greater horseshoe | 1 | 1.00 | 2.07 | 2.0 | 3.00 | 6 | 1.39 | 15 |
Lesser horseshoe | 1 | 1.00 | 2.30 | 1.5 | 3.00 | 6 | 1.70 | 10 |
September | ||||||||
Barbastelle | 1 | 1.00 | 1.80 | 1.0 | 2.00 | 4 | 1.30 | 5 |
Greater horseshoe | 3 | 3.25 | 10.67 | 4.0 | 6.25 | 43 | 15.91 | 6 |
Lesser horseshoe | 1 | 1.50 | 2.43 | 3.0 | 3.00 | 4 | 1.13 | 7 |
October | ||||||||
Barbastelle | 4 | 5.00 | 6.71 | 6.0 | 7.00 | 13 | 2.98 | 7 |
Greater horseshoe | 1 | 1.00 | 1.00 | 1.0 | 1.00 | 1 | 0.00 | 2 |
Lesser horseshoe | 1 | 1.00 | 2.67 | 1.0 | 3.50 | 6 | 2.89 | 3 |
Socrates in Plato’s Republic (circa 380 BC)↩︎
note:- in this case it is a count of bat passes↩︎
https://ec.europa.eu/environment/nature/conservation/species/habitats_dir_en.htm↩︎
https://ec.europa.eu/environment/nature/conservation/species/habitats_dir_en.htm↩︎