UMKC University Libraries

Finding Data

What is Data?

Data, as defined by Merriam-Webster, is factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.
Without data sets, it can be difficult to understand and apply data retrieval and analysis skills. While artificially generated data may help introduce a technique or skill, it can be gratifying to learn data science on real-world data.


Criteria to Consider

How will you use data to answer your research question? List the characteristics the data must have to address it, such as:
  • Time Period- Do you need historical data? Is there a gap in the data?
    • examples: Birthrates during the Depression; gun violence pre- and post-9/11
  • Frequency- Does currency matter for this topic? How often do you need the data to have been collected: quarterly, monthly, or annually?
    • examples: Yearly average income; reported COVID-19 cases per month in 2020
  • Geography- Where do you need data from? Do you need data at the county, state, national, or subnational level?
    • example: Population by racial breakdown by district before and after redistricting
  • Methodology- How is the data likely to have been collected?
    • examples: Yogurt vs. ice cream: survey of customer preference; observations of traffic violations in a neighborhood
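
One way to check the Time Period criterion is to test a candidate data set for gaps before committing to it. The following is a minimal Python sketch, assuming the data set has already been reduced to a list of the years it covers. The function name `find_missing_years` and the sample years are hypothetical and used for illustration only.

```python
def find_missing_years(years_present, start_year, end_year):
    """Return the years in [start_year, end_year] that the data set lacks."""
    present = set(years_present)
    return [year for year in range(start_year, end_year + 1) if year not in present]


# Hypothetical coverage of a candidate data set (made-up values, not real data).
coverage = [2015, 2016, 2018, 2019, 2020]

missing = find_missing_years(coverage, 2015, 2020)
print(missing)  # prints [2017]
```

A non-empty result does not rule a data set out, but it does flag a gap to account for when framing the research question.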


Who gathers/makes data?

Government Agencies- collect data through various surveys and publish it in data tables, data files, data portals, or reports organized by sector, with each sector identified by a 2- to 6-digit code. In the US, the Census Bureau conducts three types of censuses:

  • Decennial: A population and housing count that covers every resident in the US,
  • Economic: Measures the health of the nation's economy by providing vital statistics about industries and businesses, and
  • Census of Governments: Identifies the scope and nature of the nation's state and local government sector.

  • Pro: Large amounts of data are easily accessible.
  • Con: The sheer volume of data may be overwhelming to decipher at first.

Data Archive/Repository- a dedicated archive for storing and sharing digital data.

  • Pro: Provides easy access to research data.
  • Con: Repositories often do not assess data set quality, so issues with confidentiality, copyright, incomplete metadata, missing documentation, or unavailable formats may go unchecked.

Trade/Industry- collects data from members of various industries and fields, for example, electricians in a union.

  • Pro: Reports are often free.
  • Con: The data behind the published reports are often not free. The data may not be randomly sampled or statistically reliable, so be aware of potential biases.

International Organizations- collect statistics and data from countries that are members of the organization.

  • Pro: Often shared for free. 
  • Con: May contain inconsistencies.

Nonprofit Organizations- invest in mission-related data.

  • Pro: Valuable for filling in gaps in current government data and may be free, at least in part.
  • Con: The data may be biased.

Private Data Vendors- compile public or private data into a database. Often used as a pointer to find original data and verify accuracy.

  • Pro: Makes scattered data more available and accessible.
  • Con: Often requires paid access. Data could still have missing values, errors, inconsistencies, or problems with standardization, rounding, or selection bias.
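
Whatever the source, it can help to screen a data set for some of the problems listed above before relying on it. The sketch below counts missing values per column and exact duplicate rows in a small CSV, using only Python's standard library. The function name `screen_rows`, the column names, and the sample rows are hypothetical. Treating empty strings and "NA" as missing is an assumption that should be adjusted to match the data set's documentation.

```python
import csv
import io
from collections import Counter

# Assumption: these strings mark a missing value. Check the data set's
# documentation (codebook) for the markers it actually uses.
MISSING_MARKERS = {"", "NA"}


def screen_rows(csv_text):
    """Return (missing-value counts per column, number of duplicate rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = Counter()
    seen = set()
    duplicates = 0
    for row in reader:
        values = tuple(row.get(column) for column in columns)
        for column, value in zip(columns, values):
            # DictReader fills short rows with None, so treat None as missing too.
            if value is None or value.strip() in MISSING_MARKERS:
                missing[column] += 1
        if values in seen:
            duplicates += 1  # an exact repeat of an earlier row
        else:
            seen.add(values)
    return dict(missing), duplicates


# Made-up sample rows for illustration only, not real data.
sample = """county,year,population
A,2020,100
B,2020,
B,2020,
C,NA,300
"""

print(screen_rows(sample))
```

A screen like this only catches surface problems. Selection bias and rounding or standardization choices still have to be checked against the source's methodology notes.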