
Water Quality Assurance/Quality Control and Data Management

This section contains information and guidance to assist with the development of quality assurance/quality control procedures and data management systems for water data.

  • Quality Assurance/Quality Control
  • Data Management

Quality assurance/quality control

Quality assurance/quality control (QA/QC) activities should be an integral part of any water management process, as they improve the transparency, consistency, comparability, completeness and accuracy of water data inventories.

Quality control (QC) is defined as a system of checks to assess and maintain the quality of the inventory being compiled. Quality control procedures are designed to provide routine technical checks to measure and control the quality of water data; to ensure data consistency, integrity, correctness and completeness; and to identify and address errors and omissions. Quality control checks should cover everything from data acquisition and handling, application of approved procedures and methods, and technical reviews of conversion factors and other estimation parameters, to the calculation of estimates and documentation. Examples of general quality control checks include:

  • checking for transcription errors in data input;
  • checking that measurements and derived estimates are calculated correctly;
  • checking that proper conversion factors were used;
  • checking that all sources and sinks have been accounted for; and
  • checking the appropriateness of conversion and estimation factors (a minimal sketch of such checks follows this list).
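
To make these checks concrete, below is a minimal sketch in Python of automated QC checks on a single water data record. The field names, ranges, and the litres-per-cubic-metre conversion check are illustrative assumptions, not prescribed values; real checks would follow the facility's documented QC procedures.

    # Minimal sketch of routine QC checks on a water data record
    # (hypothetical field names and limits).

    LITRES_PER_CUBIC_METRE = 1000.0  # expected conversion factor

    def qc_check_record(record):
        """Return a list of QC issues found in a single water data record."""
        issues = []

        # Range checks: flag likely transcription errors in data input.
        if not (0.0 <= record["ph"] <= 14.0):
            issues.append(f"pH out of range: {record['ph']}")
        if record["volume_m3"] < 0:
            issues.append(f"negative volume: {record['volume_m3']}")

        # Conversion check: confirm the proper conversion factor was used.
        expected_litres = record["volume_m3"] * LITRES_PER_CUBIC_METRE
        if abs(record["volume_l"] - expected_litres) > 1e-6 * max(1.0, expected_litres):
            issues.append("volume_l does not match volume_m3 * 1000")

        # Completeness check: all required fields accounted for.
        for field in ("site_id", "sample_date"):
            if not record.get(field):
                issues.append(f"missing required field: {field}")

        return issues

    # Example usage
    sample = {"site_id": "W-01", "sample_date": "2024-05-01",
              "ph": 7.2, "volume_m3": 3.5, "volume_l": 3500.0}
    print(qc_check_record(sample))  # -> [] when the record passes all checks

In practice, checks of this kind would run over every incoming batch, with any non-empty issue list logged and routed for correction.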

Quality assurance (QA) is a planned system of review procedures conducted outside the actual inventory compilation by personnel not directly involved in the inventory development process. It is an unbiased, independent review of methods and/or estimates that ensures the inventory continues to incorporate the most current scientific knowledge and data available. Quality assurance procedures may include expert peer reviews of calculations and assumptions, and audits to assess the quality of the inventory and to identify where improvements could be made.

Data management

At the facility level, it is important that collected water data and information are effectively managed to ensure the reliability of the inventory, whether it is used by regulators, stakeholders, partners, consultants, or the general public. A water data management system should include methods and processes to ensure the integrity of the water data and information. It should also cover monitoring and management of the data and information, as well as electronic spreadsheets and forms, data management software, and hardcopy records. It is important to ensure that data management controls exist throughout the data management system, particularly wherever data or information is transferred or changed. These controls are especially important for electronic spreadsheets, where they provide a record of any changes and limit access to prevent unauthorized revisions. Implementing controls on spreadsheets also enables them to be verified. Examples of water data controls are provided in the table below, and a sketch of one such control in code follows the table.

 
Water Data Management Processes and Examples of Water Data Controls

Data Collection
  • data entry controls, including tolerance limits, ranges for applicable data fields, computations, and data sequence checks
  • staff competency requirements
  • documented procedures and a schedule for data collection
  • prescribed sampling methodologies

Data Consolidation and Processing
  • checks on the conversion of data between their origin and their destination to confirm that the data are complete and accurate
  • controlled access to spreadsheets, including password protection and locking or protecting formulas or master data
  • approval and testing procedures for any changes to spreadsheets or models
  • error-checking procedures, including data reconciliations and recalculation

Data Transmission
  • where there are interfaces with other systems, the ability to recover the data in the event of incomplete data transmissions
  • the existence and adequacy of input, output, and transformation error-checking routines

Data Reporting
  • distribution lists for reports
  • documented procedures for management review of reports

System Security and Recovery
  • defined user access to water data in terms of roles and responsibilities
  • segregation of duties to ensure no one individual has complete control over all key processing functions
  • a documented process to ensure appropriate timing and frequency of data back-ups

Maintenance and Retention
  • clearly defined accountability and codes of conduct for staff and service providers
  • internal audit procedures and schedules for management reviews
  • documented data retention procedures
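
As an illustration of the data consolidation and data transmission controls listed above, the following sketch reconciles record counts and a simple checksum between a source extract and its destination copy. The record layout is a hypothetical example; a real reconciliation would use the facility's own record formats.

    # Sketch of a data reconciliation control: confirm that data transferred
    # between systems is complete and unaltered (hypothetical record layout).
    import hashlib

    def checksum(records):
        """Order-independent checksum over canonicalised records."""
        digest = hashlib.sha256()
        for rec in sorted(repr(r) for r in records):
            digest.update(rec.encode("utf-8"))
        return digest.hexdigest()

    def reconcile(source_records, destination_records):
        """Return a list of reconciliation failures (empty means success)."""
        failures = []
        if len(source_records) != len(destination_records):
            failures.append(f"record count mismatch: "
                            f"{len(source_records)} vs {len(destination_records)}")
        if checksum(source_records) != checksum(destination_records):
            failures.append("checksum mismatch: data changed in transit")
        return failures

    # Example usage
    src = [("W-01", "2024-05-01", 7.2), ("W-02", "2024-05-01", 6.9)]
    dst = list(src)
    print(reconcile(src, dst))  # -> [] when the transfer is complete and accurate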

ISO 9000

Data quality can be defined as the degree to which a set of characteristics of data fulfills requirements. Examples of such characteristics are completeness, validity, accuracy, consistency, availability and timeliness. Requirements are defined as needs or expectations that are stated, generally implied or obligatory.


Optimum use of Water Data Quality (DQ) checks

 

Water Data Quality (DQ) is a specialized area that supports the integrity of data management by covering gaps left by other data controls. It is one of the key functions that aids data governance by monitoring data to find exceptions undiscovered by current data management operations. Data quality checks may be defined at the attribute level to allow full control over the remediation steps.

DQ checks and business rules can easily overlap if an organization is not attentive to its DQ scope. Business teams should understand the DQ scope thoroughly in order to avoid overlap: data quality checks are redundant if business logic already covers the same functionality and fulfills the same purpose. The DQ scope of an organization should be defined in its DQ strategy and implemented accordingly. Some data quality checks may be translated into business rules after repeated instances of exceptions have been observed.

Below are a few areas of data flows that may need perennial DQ checks:

Completeness and precision DQ checks on all data may be performed at the point of entry for each mandatory attribute from each source system. Some attribute values are created long after the initial creation of the transaction; in such cases, administering these checks becomes tricky and should be done immediately after the attribute's defined source event occurs and the transaction's other core attribute conditions are met.
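
A minimal sketch of such a completeness check at the point of entry is shown below; the mandatory attribute names are assumptions for illustration only.

    # Sketch of a completeness DQ check at the point of entry: every mandatory
    # attribute must be present and non-empty (attribute names are hypothetical).

    MANDATORY_ATTRIBUTES = ["station_id", "parameter", "value", "sample_time"]

    def completeness_exceptions(record, mandatory=MANDATORY_ATTRIBUTES):
        """Return the mandatory attributes that are missing or empty."""
        return [attr for attr in mandatory if record.get(attr) in (None, "", [])]

    incoming = {"station_id": "W-07", "parameter": "turbidity", "value": 4.1}
    print(completeness_exceptions(incoming))  # -> ['sample_time']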

All data having attributes referring to Reference Data in the organization may be validated against the set of well-defined valid values of Reference Data to discover new or discrepant values through the validity DQ check. Results may be used to update Reference Data administered under Master Data Management (MDM).
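
For example, a validity check might compare coded attribute values against the reference data set and report anything not found there; the parameter codes below are placeholders.

    # Sketch of a validity DQ check: coded values are compared against the
    # well-defined set of valid Reference Data values (placeholder codes here).

    VALID_PARAMETER_CODES = {"PH", "TURB", "DO", "COND"}  # from Reference Data

    def validity_exceptions(records, valid_codes=VALID_PARAMETER_CODES):
        """Return new or discrepant codes that are absent from Reference Data."""
        observed = {rec["parameter_code"] for rec in records}
        return sorted(observed - valid_codes)

    batch = [{"parameter_code": "PH"}, {"parameter_code": "TEMP"}]
    print(validity_exceptions(batch))  # -> ['TEMP'], a candidate for MDM review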

All data sourced from a third party to the organization's internal teams may undergo an accuracy DQ check against the third-party data. These DQ check results are valuable when administered on data that has made multiple hops after its point of entry but before it becomes authorized or is stored for enterprise intelligence.
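
A sketch of such an accuracy check follows: values that have passed through several internal hops are compared back to the third-party source within a tolerance. The keys and the tolerance are illustrative assumptions.

    # Sketch of an accuracy DQ check: internal values (after several hops) are
    # compared back to the third-party source data within an agreed tolerance.

    TOLERANCE = 0.01  # illustrative relative tolerance

    def accuracy_exceptions(internal, third_party, tol=TOLERANCE):
        """Return keys whose internal value drifted from the third-party value."""
        exceptions = []
        for key, source_value in third_party.items():
            ours = internal.get(key)
            if ours is None or abs(ours - source_value) > tol * abs(source_value):
                exceptions.append(key)
        return exceptions

    third_party = {"W-01/2024-05-01": 7.20, "W-02/2024-05-01": 6.90}
    internal    = {"W-01/2024-05-01": 7.20, "W-02/2024-05-01": 7.45}
    print(accuracy_exceptions(internal, third_party))  # -> ['W-02/2024-05-01']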

All data columns that refer to Master Data may be validated with a consistency check. A DQ check administered at the point of entry discovers new data for the MDM process, but a DQ check administered after the point of entry discovers a failure of consistency (not merely an exception).
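
The sketch below illustrates that distinction: at the point of entry an unknown key is treated as new data for MDM, while downstream the same condition is a consistency failure. The station identifiers are illustrative.

    # Sketch of a consistency DQ check against Master Data: transaction rows
    # must reference keys that exist in MDM (station identifiers are made up).

    MASTER_STATIONS = {"W-01", "W-02", "W-03"}  # keys managed under MDM

    def consistency_exceptions(rows, master=MASTER_STATIONS, at_entry=True):
        """Return unknown master-data keys, labelled by where the check runs."""
        unknown = sorted({row["station_id"] for row in rows} - master)
        label = "new data for MDM" if at_entry else "consistency failure"
        return [(key, label) for key in unknown]

    rows = [{"station_id": "W-02"}, {"station_id": "W-09"}]
    print(consistency_exceptions(rows, at_entry=False))
    # -> [('W-09', 'consistency failure')]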

As data is transformed, multiple timestamps (and their positions) are captured and may be compared against each other, and against the allowed leeway, to validate the data's value, decay and operational significance against a defined SLA (service level agreement). This timeliness DQ check can be used to decrease the rate at which data value decays and to optimize the policies governing the data movement timeline.
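
A sketch of a timeliness check is shown below: the lag between a capture timestamp and an availability timestamp is compared against an SLA. The four-hour SLA is an assumption for illustration.

    # Sketch of a timeliness DQ check: the lag between the capture timestamp
    # and the availability timestamp is compared against a defined SLA.
    from datetime import datetime, timedelta

    SLA = timedelta(hours=4)  # illustrative service level agreement

    def timeliness_exception(captured_at, available_at, sla=SLA):
        """Return the lag if it breaches the SLA, otherwise None."""
        lag = available_at - captured_at
        return lag if lag > sla else None

    captured = datetime(2024, 5, 1, 8, 0)
    available = datetime(2024, 5, 1, 14, 30)
    print(timeliness_exception(captured, available))  # -> 6:30:00 (SLA breached)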

In an organization, complex logic is usually segregated into simpler logic across multiple processes. Reasonableness DQ checks validate that such logic yields a result within a specific range of values or within static interrelationships (aggregated business rules); they can discover complicated but crucial business processes, outliers in the data and drift from BAU (business as usual) expectations, and they may surface possible exceptions that eventually result in data issues. Such a check may be a simple generic aggregation rule applied over a large chunk of data, or it may be complicated logic over a group of attributes of a transaction pertaining to the organization's core business. This DQ check requires a high degree of business knowledge and acumen. Discovery of reasonableness issues may prompt policy and strategy changes by the business, by data governance, or by both.
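
As a simple illustration, the sketch below applies an aggregated reasonableness rule: the total daily volume across all stations must fall within a range derived from BAU expectations. The range and figures are hypothetical.

    # Sketch of a reasonableness DQ check: an aggregated figure must stay
    # within a range derived from BAU expectations (figures are hypothetical).

    EXPECTED_DAILY_TOTAL_M3 = (800.0, 1200.0)  # BAU range for total daily volume

    def reasonableness_exception(daily_volumes_m3, expected=EXPECTED_DAILY_TOTAL_M3):
        """Return a message if the aggregated total drifts outside the BAU range."""
        total = sum(daily_volumes_m3)
        low, high = expected
        if not (low <= total <= high):
            return f"daily total {total} m3 outside BAU range {low}-{high} m3"
        return None

    print(reasonableness_exception([310.5, 295.0, 120.0]))  # -> drift flagged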

Conformity checks and integrity checks need not be covered in all business needs; they are strictly at the discretion of the database architecture.

There are many places in the data movement where DQ checks may not be required. For instance, a DQ check for completeness and precision on not-null columns is redundant for data sourced from a database. Similarly, data should be validated for accuracy with respect to time when it is stitched together across disparate sources; however, that is a business rule and should not be within the DQ scope.

Regretfully, from a software development perspective, data quality is often seen as a non-functional requirement, and as such, key data quality checks and processes are not factored into the final software solution. Within healthcare, wearable technologies and Body Area Networks generate large volumes of data. The level of detail required to ensure data quality is extremely high and is often underestimated. This is also true for the vast majority of mHealth apps, EHRs and other health-related software solutions. The primary reason for this stems from the extra cost involved in adding a higher degree of rigor to the software architecture.

 

Criticism of existing tools and processes

The main reasons cited for criticism of existing data quality tools and processes are:

  • Project costs: costs are typically in the hundreds of thousands of dollars
  • Time: lack of enough time to deal with large-scale data-cleansing software
  • Security: concerns over sharing information, giving an application access across systems, and effects on legacy systems

 

Data standards information:

 
