Hydrometeorological Data Evaluation Tool (HDET)
HDET was designed and developed by Dr. Ramesh Teegavarapu, with the help of Dr. A. Goly, to identify and evaluate 1) anomalies and 2) outliers in hydrometeorological time series data. HDET can be used for any hydrological or meteorological data in both stand-alone mode and database connectivity mode.
Background and Objectives
Data cleaning is one of the first steps in the data storage and analysis process. It requires identification of outliers, non-homogeneous observations, and data sets suspected to be influenced by instrument and sensor errors, human errors, and transcription errors. Hydrologic and climate data measured under varying field conditions and with multiple sensors are known to be plagued by anomalies and outliers. Techniques for identifying outliers, and methods for evaluating the performance of anomaly detection methods, are critical to the task of maintaining unbiased, clean, error-free, and homogeneous data. HDET is a tool for identifying anomalies in hydro-meteorological and environmental data collected by several national, state and private agencies. The tool uses a number of statistical and data mining anomaly detection techniques. These techniques build on traditional approaches, which include: 1) classification-based; 2) near-neighbor-based; 3) clustering-based; 4) statistical; 5) information-theoretic; and 6) spectral. The information-theoretic and spectral techniques are being investigated and implemented.
Main Module of HDET
The main module consists of six sub-modules: 1) Data Input; 2) Exploratory Data Analysis; 3) Evaluate; 4) Performance Measures; 5) Reports; and 6) Help.
The architecture of HDET is shown below:
Exploratory Data Analysis (EDA) sub-Module
The EDA sub-module shown below provides visual and quantitative details of the data (e.g., stage data). It provides information on outliers and anomalies in the data and identifies them within the time series. Rule-based methods and domain knowledge are used to develop the EDA module. The outliers and anomalies are flagged for later evaluation.
The Setup sub-module offers two options: 1) novice user and 2) expert user. The expert user can select different methods (statistical, rule-based and others) to identify and flag outliers and anomalous observations. This module can be run in series or parallel mode of method selection and execution.
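As an illustration of how a statistical method and a rule-based method might be selected and executed one after the other, here is a minimal Python sketch. It is not HDET code: the function names (`iqr_flags`, `range_flags`, `run_in_series`), the choice of Tukey's interquartile-range fences as the statistical method, the plausible-range limits, and the sample stage values are all assumptions made for this example.

```python
from statistics import quantiles


def iqr_flags(values, k=1.5):
    """Statistical check (assumed for illustration): Tukey's fences.

    Flags values outside [Q1 - k*IQR, Q3 + k*IQR].
    """
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def range_flags(values, lower, upper):
    """Rule-based check (assumed): flag values outside plausible limits."""
    return [v < lower or v > upper for v in values]


def run_in_series(values, methods):
    """Apply each selected method in turn.

    An observation is flagged if at least one method flags it.
    """
    flags = [False] * len(values)
    for method in methods:
        flags = [a or b for a, b in zip(flags, method(values))]
    return flags


# Hypothetical stage readings with two suspicious values.
stage = [2.1, 2.3, 2.2, 2.4, 9.8, 2.3, 2.2, -1.0, 2.5, 2.4]

# Hypothetical plausible range for this gauge: 0.0 to 8.0.
methods = [iqr_flags, lambda v: range_flags(v, lower=0.0, upper=8.0)]

flags = run_in_series(stage, methods)
flagged_indices = [i for i, flagged in enumerate(flags) if flagged]
print(flagged_indices)
```

Because each method takes the series and returns one flag per observation, the methods are independent of each other and could equally be evaluated in parallel and then combined.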
The run sub-module allows the users to select a combination of methods in each “run” for identification of anomalies and outliers. Each run is ranked based on a 2 x 2 contingency table that uses expert-identified anomalous observations and those identified by HDET.
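The ranking idea can be sketched in Python as follows. The source does not state which score HDET derives from the 2 x 2 contingency table, so this example assumes the critical success index (hits divided by the sum of hits, misses and false alarms) purely for illustration. The function names, run names and flag values are hypothetical.

```python
def contingency_table(expert_flags, tool_flags):
    """Build a 2 x 2 table comparing expert flags with tool flags."""
    table = {"hits": 0, "false_alarms": 0, "misses": 0, "correct_negatives": 0}
    for expert, tool in zip(expert_flags, tool_flags):
        if expert and tool:
            table["hits"] += 1
        elif tool:
            table["false_alarms"] += 1
        elif expert:
            table["misses"] += 1
        else:
            table["correct_negatives"] += 1
    return table


def critical_success_index(table):
    """Assumed ranking score: hits / (hits + misses + false alarms)."""
    denominator = table["hits"] + table["misses"] + table["false_alarms"]
    return table["hits"] / denominator if denominator else 0.0


def rank_runs(expert_flags, runs):
    """Return (run name, score) pairs sorted from best to worst."""
    scores = {
        name: critical_success_index(contingency_table(expert_flags, flags))
        for name, flags in runs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expert-identified anomalies (True = anomalous observation).
expert = [False, False, True, False, True, False, False, True]

# Hypothetical flags produced by two runs with different method combinations.
runs = {
    "run_a": [False, False, True, False, False, False, False, True],
    "run_b": [False, True, True, False, True, False, True, True],
}

ranking = rank_runs(expert, runs)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Other scores computed from the same four counts could be substituted in `critical_success_index` without changing the rest of the sketch.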
The report sub-module lists the outliers and anomalous observations identified by all the runs, as well as the final set of results based on the best-ranked run.
HDET can be adaptively improved by including the most recent domain knowledge from the system, and it is expandable to include more methods and functions to handle any hydro-climatological variable. Future work will involve: 1) methods to correct anomalous observations and 2) infilling of missing data.
Please contact Dr. T. for more details: firstname.lastname@example.org