Groundwater quality

Multivariate statistical analyses were used in the Assessment to identify spatial patterns of groundwater chemistry that may be related to processes such as:

  • groundwater recharge
  • surface water – groundwater connectivity
  • aquifer connectivity
  • evolution of groundwater within different aquifers along the flow paths.

Piper plots were then used to compare the results of the multivariate statistical analyses.

Historical water chemistry data were collated from these tables in the Queensland and NSW groundwater databases:

  • ‘Water analysis’ in the DNRM groundwater database (Bioregional Assessment Programme, Dataset 2)
  • ‘Groundwater quality’ in the National Groundwater Information System (NGIS) groundwater database (Bureau of Meteorology, Dataset 1).

Data quality validation

Prior to use in multivariate statistical analyses, all major ion water chemistry data used in the Assessment were subjected to standard quality checks such as charge balance calculations. The common exclusion criterion for water chemistry data is a charge balance error (CBE) exceeding ±5% (Freeze and Cherry, 1979). However, due to the poor spatial coverage of sampling points in some parts of the Clarence-Moreton bioregion, and in agreement with other authors (e.g. Guggenmos et al., 2011; Güler et al., 2002; King et al., 2014), a threshold of ±10% was chosen in the Assessment so that only samples with significant charge imbalances were excluded.
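The charge balance check can be illustrated with a minimal sketch. It assumes the widely used form of the calculation, CBE = 100 × (Σcations − Σanions) / (Σcations + Σanions) with all concentrations in meq/L, and assumes input concentrations in mg/L. The equivalent weights, the function names and the restriction to seven major ions are illustrative assumptions, not details taken from the Assessment.

```python
# Equivalent weights (g per equivalent) = molar mass / |ionic charge|.
# Rounded textbook values; an assumption, not taken from the Assessment.
EQ_WEIGHT = {
    "Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
    "HCO3": 61.02, "Cl": 35.45, "SO4": 48.03,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "Cl", "SO4")


def charge_balance_error(sample_mg_per_l):
    """Return the charge balance error (percent) for one sample in mg/L."""
    # Convert each ion from mg/L to meq/L.
    meq = {ion: sample_mg_per_l[ion] / EQ_WEIGHT[ion] for ion in EQ_WEIGHT}
    cations = sum(meq[ion] for ion in CATIONS)
    anions = sum(meq[ion] for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def passes_cbe(sample_mg_per_l, threshold_pct=10.0):
    """True if |CBE| is within the threshold (10% here, as in the text)."""
    return abs(charge_balance_error(sample_mg_per_l)) <= threshold_pct
```

Passing `threshold_pct=5.0` to the hypothetical `passes_cbe` helper would apply the stricter, more common criterion instead.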

Multivariate statistical procedure

Hierarchical Cluster Analysis (HCA) is a multivariate statistical technique that incorporates a combination of any number of user-defined chemical and physical constituents including non-numerical parameters (Güler et al., 2002; Raiber et al., 2012). This method is commonly adopted in groundwater hydrochemical studies to identify certain patterns within a dataset to further understand the physical and chemical processes that underpin groundwater evolution (e.g. Stetzenbach et al., 1999; Güler et al., 2002; Thyne et al., 2004; Daughney and Reeves, 2006; O’Shea and Jankowski, 2006; Cloutier et al., 2008; Menció and Mas-Pla, 2008; Woocay and Walton, 2008; Daughney et al., 2011; Raiber et al., 2012).

The number of variables used in an HCA should be large enough to depict groundwater quality accurately, while remaining a representative subset that reflects the spatial variability of groundwater chemistry and the processes that control it. Nine variables were selected for the HCA of the Clarence-Moreton bioregion: Ca, Mg, Na, K, HCO3, Cl, SO4, electrical conductivity and pH. All variables except pH were log-transformed so that they more closely follow a normal distribution.
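The transformation of the nine input variables might be sketched as follows. The base-10 logarithm, the key name `EC` for electrical conductivity, and the decision to reject non-positive values are assumptions made for illustration; the Assessment does not state the logarithm base or how such values were handled.

```python
import math

# Variables that are log-transformed; pH is passed through unchanged.
# "EC" (electrical conductivity) is a hypothetical key name.
LOG_VARIABLES = ("Ca", "Mg", "Na", "K", "HCO3", "Cl", "SO4", "EC")


def transform_sample(sample):
    """Return a copy of one sample with all variables except pH log10-transformed."""
    transformed = {}
    for name in LOG_VARIABLES:
        value = sample[name]
        if value <= 0:
            # Assumption: zero or negative concentrations are treated as errors.
            raise ValueError(f"{name} must be positive to take a logarithm")
        transformed[name] = math.log10(value)
    # pH is itself a logarithmic quantity, so it is left as measured.
    transformed["pH"] = sample["pH"]
    return transformed
```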

The HCA presented in this work was carried out using the StatGraphics Centurion software (Manugistics Inc., USA) with two different linkage rules. First, the nearest neighbour linkage rule was used to identify monitoring sites with hydrochemical signatures significantly different from those of other sites (i.e. outliers, which were placed as residuals in a separate group). Second, Ward’s linkage rule, which is based on an analysis of variance, was used to group all non-residuals into distinct clusters (i.e. each site in a cluster is more similar to other sites in the same cluster than to any site from a different cluster). The square of the Euclidean distance (E), calculated over all variables included in the analysis, was used in the HCA as the measure of similarity. In previous studies (e.g. Güler et al., 2002; Daughney and Reeves, 2006; Cloutier et al., 2008; Daughney et al., 2011; Raiber et al., 2012), the outlined transformations of the input data, linkage rules and similarity measure were identified as the most appropriate techniques for classifying hydrochemical data. The outcome of this procedure is a dendrogram (e.g. Cloutier et al., 2008).
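The two-stage clustering described above can be approximated with SciPy, as a sketch rather than a reproduction of the StatGraphics workflow. Several choices here are assumptions: the function name `two_stage_hca`, the rule that every site outside the largest nearest neighbour (single linkage) group is a residual, and the user-supplied cut distance and cluster count. SciPy's Ward linkage takes Euclidean distances as input and reports merge heights in those units, so dendrogram heights would not be directly comparable with those from a squared-distance implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def two_stage_hca(X, outlier_distance, n_clusters):
    """Label sites: 0 marks residuals, 1..n_clusters are Ward clusters.

    X is an (n_sites, n_variables) array of already transformed variables.
    """
    X = np.asarray(X, dtype=float)

    # Stage 1: nearest neighbour (single) linkage, cut at a chosen distance.
    single = linkage(X, method="single", metric="euclidean")
    groups = fcluster(single, t=outlier_distance, criterion="distance")

    # Assumption: the largest group is the main population and every
    # other site is a residual (outlier).
    ids, counts = np.unique(groups, return_counts=True)
    is_main = groups == ids[np.argmax(counts)]

    # Stage 2: Ward linkage on the non-residual sites only.
    ward = linkage(X[is_main], method="ward")
    labels = np.zeros(len(X), dtype=int)
    labels[is_main] = fcluster(ward, t=n_clusters, criterion="maxclust")
    return labels
```

The Ward linkage matrix computed inside the function could also be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree for visual inspection.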

HCA is considered to be a semi-objective technique since an element of judgment is still required in the crucial step of determining the appropriate number of clusters that best represents the sample population. In this study, a step-wise procedure was conducted, which involved the visual inspection of the dendrogram (Cloutier et al., 2008; Raiber et al., 2012) and the comparison of centroid concentrations (represented by the median) for the different input variables for different clusters at different separation thresholds. The median was deemed to be a better indicator of central tendency as it is less sensitive to extreme values compared to the mean (Helsel and Hirsch, 2002).
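The comparison of cluster centroids could be supported by a small helper such as the one below, which computes the median of each variable within each cluster. The function name and the data layout (one dictionary per sample, one cluster label per sample) are illustrative assumptions; rerunning it with labels obtained at different separation thresholds would give the tables to compare.

```python
from collections import defaultdict
from statistics import median


def cluster_medians(samples, labels):
    """Return {cluster label: {variable: median value}}.

    samples is a list of dicts sharing the same variable names;
    labels gives the cluster assigned to each sample, in the same order.
    """
    grouped = defaultdict(list)
    for sample, label in zip(samples, labels):
        grouped[label].append(sample)
    return {
        label: {var: median(s[var] for s in members) for var in members[0]}
        for label, members in grouped.items()
    }
```

In a cluster with chloride values of 10, 30 and 1000, for example, the median is 30 whereas the mean is about 347, which illustrates why the median is less affected by extreme values.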

Last updated:
8 January 2018