Using Data to Improve Care for Children

Clean the Data Using a Predefined Specification

Once you’ve identified the problems in your dataset, you will want to develop a cleaning routine. A cleaning routine helps you produce more reliable, consistent, and accurate results from your data.

A Typical Cleaning Routine

  1. Identify invalid data. Use your data quality standards and your key requirements to identify all invalid or inaccurate data.
  2. Investigate the reasons for the bad data. Having this understanding will assist you in taking the necessary actions to correct the data.
  3. Determine how the dirty data should be cleaned. Whenever possible, invalid data should be corrected so it can be used for your project.
  4. Perform accuracy tests to ensure the data were properly cleaned. An accuracy test is a physical comparison of the collected data with the actual event or object.

    For example, you may want to compare the written run report with the electronic version that was recently entered into the database.
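
The routine above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed implementation: the field names (`age`, `gender`), the valid ranges, and the helper names (`clean_records`, `accuracy_rate`) are all hypothetical, and your own specification should come from your data quality standards.

```python
# Hypothetical cleaning specification: field name -> cleaning function.
# Each function returns a corrected value, or None when the value
# cannot be corrected and needs investigation.

def _clean_age(value):
    """Return the age as an int in 0..120 (assumed range), else None."""
    try:
        age = int(str(value).strip())
    except ValueError:
        return None
    return age if 0 <= age <= 120 else None


def _clean_gender(value):
    """Map common spellings to 'M' or 'F' (assumed codes), else None."""
    mapping = {"m": "M", "male": "M", "f": "F", "female": "F"}
    return mapping.get(str(value).strip().lower())


SPEC = {"age": _clean_age, "gender": _clean_gender}


def clean_records(records, spec=SPEC):
    """Steps 1 and 3: flag invalid values and correct the ones we can.

    Returns (cleaned, problems). Each problem keeps the original value
    so the reason for the bad data can be investigated (step 2).
    """
    cleaned, problems = [], []
    for index, record in enumerate(records):
        fixed = dict(record)
        for field, cleaner in spec.items():
            new_value = cleaner(record.get(field))
            if new_value is None:
                problems.append((index, field, record.get(field)))
            fixed[field] = new_value
        cleaned.append(fixed)
    return cleaned, problems


def accuracy_rate(cleaned, source_records, fields):
    """Step 4: share of values that match a hand-checked source,
    such as values read off the written run reports."""
    checked = matched = 0
    for clean, source in zip(cleaned, source_records):
        for field in fields:
            checked += 1
            matched += clean.get(field) == source.get(field)
    return matched / checked if checked else 1.0


# Made-up example records, as they might appear in the database.
raw = [
    {"run_id": 1, "age": " 7", "gender": "male"},
    {"run_id": 2, "age": "207", "gender": "F"},
    {"run_id": 3, "age": "12", "gender": "unknown"},
]
cleaned, problems = clean_records(raw)

# Made-up values transcribed from the written run reports.
paper = [
    {"age": 7, "gender": "M"},
    {"age": 2, "gender": "F"},
    {"age": 12, "gender": "M"},
]
rate = accuracy_rate(cleaned, paper, ["age", "gender"])
```

In this sketch, values that cannot be corrected automatically are set to `None` and listed in `problems` rather than guessed, so someone can go back to the source and decide how each one should be fixed.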

These steps may seem time-consuming, but they are worth every minute!





rev. 04-Aug-2022




© NEDARC 2010

This website is supported by the Health Resources and Services Administration (HRSA) of the U.S. Department of Health and Human Services (HHS) as part of the Emergency Medical Services for Children Data Center award totaling $3,200,000 with 0% financed with non-governmental sources. The contents are those of the author(s) and do not necessarily represent the official views of, nor an endorsement, by HRSA, HHS, or the U.S. Government. For more information, please visit HRSA.gov.

(In accordance with the Americans with Disabilities Act, the information in this site is
available in alternate formats upon request.)