Data profiling

Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data. The purpose of these statistics may be to:


 * 1) Find out whether existing data can easily be used for other purposes
 * 2) Improve the ability to search the data by tagging it with keywords or descriptions, or by assigning it to a category
 * 3) Give metrics on data quality, including whether the data conforms to particular standards or patterns
 * 4) Assess the risk involved in integrating data for new applications
 * 5) Assess whether metadata accurately describes the actual values in the source database
 * 6) Understand data challenges early in any data-intensive project, so that surprises late in the project are avoided. Finding data problems late in the project can lead to delays and cost overruns.
 * 7) Gain an enterprise view of all data, for uses such as master data management, where key data is needed, or data governance, which aims to improve data quality.
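
As an illustration of the kind of statistics described above, the following is a minimal sketch in Python that uses only the standard library. The function name `profile_column`, the choice to treat empty strings as missing values, and the five-digit postal code pattern are all assumptions made for this example; they are not taken from any particular profiling tool.

```python
import re
from collections import Counter


def profile_column(values, pattern=None):
    """Collect basic statistics for one column of raw string values.

    `values` is a list of strings or None. `pattern` is an optional
    regular expression that each non-missing value is expected to match.
    """
    # Assumption for this sketch: None and blank strings count as missing.
    non_null = [v for v in values if v is not None and v.strip() != ""]

    stats = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min_length": min((len(v) for v in non_null), default=0),
        "max_length": max((len(v) for v in non_null), default=0),
        "most_common": Counter(non_null).most_common(1),
    }

    if pattern is not None:
        regex = re.compile(pattern)
        # fullmatch requires the whole value to conform, not just a prefix.
        matches = sum(1 for v in non_null if regex.fullmatch(v))
        stats["pattern_match_ratio"] = (
            matches / len(non_null) if non_null else None
        )

    return stats


# Hypothetical sample column: postal codes with gaps and one malformed entry.
zip_codes = ["02139", "10001", "", None, "9410", "10001"]
report = profile_column(zip_codes, pattern=r"\d{5}")
print(report)
```

For the sample column, the report counts two missing values and shows that three of the four remaining values match the pattern. Those are the sort of figures that could feed a data quality metric (item 3) or a check of whether a column's documented format matches its actual contents (item 5).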

Some companies also look at data profiling as a way to involve business users in what has traditionally been an IT function. Line-of-business users can often provide context about the data, giving meaning to columns that are poorly defined by metadata and documentation.