People Science: The Data

Article 3 in a series of 6.

The age of big data is here. We are creating 2.5 quintillion bytes of data every day, and 90% of all data today is less than two years old. Yet in this data-abundant world, HR data and people data are often found wanting. For many people scientists, the biggest and most persistent challenge is obtaining people data that is accessible, accurate, current, and relevant. Why is working with people data so problematic? Why hasn’t the age of big data given us the age of big HR data? Exploring the root of people data issues starts with understanding the difference between structured and unstructured data.

  • Structured data: information with a high degree of organization, e.g. data in HR or finance systems.
  • Unstructured data: information that is not organized in a predefined manner, e.g. emails, videos, social media.

Structured People Data

Information organized as structured data is what most of us think of as ‘data’, e.g. a spreadsheet with employee details organized into rows and columns. Once information is organized into a structured database, it can be explored and analyzed with statistics and data analytics. Before tackling unstructured people data, many organizations first need to get a handle on their structured data. People data is often scattered across different HR systems or spreadsheets, and it is frequently inaccurate, out of date, inconsistent, or incomplete. The first step in the People Science Journey is integrating all relevant HR data and people data into an accessible and accurate system of record: the single version of the truth. This foundation of accessible and robust people data is a prerequisite for people analytics and insights. Note that the data foundation must include all relevant people data, with relevance determined by the hypotheses being tested. Integrating all people data across all systems is an ideal state, but it is neither always possible nor always necessary. People Science knowledge, along with people analytics, is constrained by the people data foundation: a narrow foundation limits the breadth of analysis, while a poor-quality foundation limits the reliability of, and confidence in, the resulting insights.
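
The integration step can be sketched in a few lines of Python. This is a minimal illustration only, assuming two hypothetical sources (an HRIS extract and a payroll extract) that share an employee ID; the field names, the sample records, and the rule that the HRIS wins on conflict are all assumptions rather than a prescribed design. The point is that building a single version of the truth means both merging records and surfacing the gaps and inconsistencies that someone must resolve.

```python
# Hypothetical extracts from two HR systems, keyed by employee ID.
hris = {
    "E001": {"name": "Ana Silva", "department": "Finance", "hire_date": "2019-03-01"},
    "E002": {"name": "Ben Okafor", "department": "Sales", "hire_date": "2021-07-15"},
}
payroll = {
    "E001": {"name": "Ana Silva", "department": "Finance", "salary": 72000},
    "E002": {"name": "Ben Okafor", "department": "Marketing", "salary": 65000},
    "E003": {"name": "Chen Wei", "department": "IT", "salary": 80000},
}


def integrate(primary, secondary):
    """Merge two sources into one record per employee.

    Assumption: fields from `primary` win on conflict. Conflicts and
    employees missing from either source are returned as issues so a
    data steward can review them instead of silently overwriting data.
    """
    merged, issues = {}, []
    for emp_id in sorted(primary.keys() | secondary.keys()):
        a, b = primary.get(emp_id), secondary.get(emp_id)
        if a is None or b is None:
            missing_from = "primary" if a is None else "secondary"
            issues.append((emp_id, f"missing from {missing_from}"))
        a, b = a or {}, b or {}
        # Compare only the fields both systems hold for this employee.
        for field in sorted(a.keys() & b.keys()):
            if a[field] != b[field]:
                issues.append((emp_id, f"conflict on {field}: {a[field]!r} vs {b[field]!r}"))
        merged[emp_id] = {**b, **a}  # primary values override secondary ones
    return merged, issues


record, issues = integrate(hris, payroll)
for issue in issues:
    print(issue)
```

With the sample data, E002's department disagrees between systems and E003 exists only in payroll, so both are reported while the merged record still carries every available field.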

Unstructured People Data

When unstructured information is organized into structured data, it can be easily analyzed with statistics. Unfortunately, transforming unstructured data into structured data is exceedingly laborious and often manual. The HR/people function has been dealing with the challenge of organizing and structuring people information for a very long time, often without realizing that this was what it was doing. The annual performance review is a great example. Every year, inputs for each employee are gathered from objectives, metrics, strengths and opportunities, skills assessments, self-reviews, downward reviews, peer reviews, upward reviews, and so on, and based on these inputs employees receive a performance rating. Of all this information, only a few elements are stored as structured data, often only the final performance rating; the rest is stored as unstructured data and rarely accessed or analyzed again.

Until recently, the only way to leverage unstructured data was through the laborious process of organizing it into structured data. Emerging big data technologies are changing this by enabling direct analysis of unstructured data. For example, semantic analysis, natural language processing, and text mining tools can analyze text-heavy unstructured data directly. Marketers are already using these tools to analyze sentiment in social media. These technologies give data scientists access to a new world of data, and people scientists are not far behind.
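
To make the idea of direct text analysis concrete, here is a sketch of lexicon-based sentiment scoring, one of the simplest text mining techniques, applied to review-style comments. It is an assumption-laden toy: the word lists and comments are hypothetical, and commercial NLP tools use far larger lexicons or trained language models. It shows how free text can be turned into a structured score that sits alongside a performance rating.

```python
import re
from typing import List

# Hypothetical, deliberately tiny sentiment lexicon (illustration only).
POSITIVE = {"great", "supportive", "clear", "helpful", "excellent"}
NEGATIVE = {"unclear", "late", "stressful", "poor", "frustrating"}


def sentiment_score(text: str) -> float:
    """Return (positive - negative) / matched words, in [-1, 1].

    Returns 0.0 when no lexicon word appears in the text.
    """
    words: List[str] = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


# Hypothetical free-text review comments.
comments = [
    "My manager is supportive and the goals were clear.",
    "Feedback was late and the process felt stressful.",
    "Mixed year: great team, but priorities were unclear.",
]
for c in comments:
    print(f"{sentiment_score(c):+.2f}  {c}")
```

Word counting ignores negation ("not helpful") and context, which is exactly why richer NLP methods exist; the sketch only illustrates the shift from unread text to an analyzable number.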

External data

In addition to internal people data, people scientists often leverage external data, e.g. benchmarks, competitor analysis, and labor market data. External data can add context when robust benchmarks are available, or it can serve as a direct input into a statistical model. For example, attraction and retention are functions of both internal and external factors, so both types should be included as inputs or drivers in these models. Benchmarks and external labor market conditions are also very useful for adding context to attraction and retention analysis. In employment branding analysis, it is important to include external data from social media and company review sites such as Glassdoor.
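
One common way to bring an external pay benchmark into retention analysis is a compa-ratio: an employee's salary divided by a market reference for the role. The sketch below assumes hypothetical salaries, hypothetical market medians, and an arbitrary 0.9 threshold for "below market"; it simply shows how external data becomes an extra feature on each internal record, ready to be used as a model input.

```python
# Hypothetical internal records and external market medians by role.
employees = [
    {"id": "E001", "role": "Analyst", "salary": 62000},
    {"id": "E002", "role": "Analyst", "salary": 75000},
    {"id": "E003", "role": "Engineer", "salary": 88000},
]
market_median = {"Analyst": 68000, "Engineer": 100000}


def enrich(records, benchmarks, threshold=0.9):
    """Attach an external-benchmark feature to each internal record.

    compa_ratio = salary / market median for the role (1.0 = paid at market).
    `threshold` is an assumed cut-off for flagging below-market pay.
    """
    enriched = []
    for rec in records:
        ratio = rec["salary"] / benchmarks[rec["role"]]
        enriched.append({**rec, "compa_ratio": round(ratio, 3),
                         "below_market": ratio < threshold})
    return enriched


result = enrich(employees, market_median)
for rec in result:
    print(rec["id"], rec["compa_ratio"], rec["below_market"])
```

Keeping the benchmark as a separate lookup, rather than copying market figures into each employee record, makes it easier to refresh the external data as labor market conditions change.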

Data quality and governance

Rigorous data governance is essential to ensure a robust data foundation. Data governance is the overall management of data and includes all the processes needed to ensure data is managed correctly. Governance of HR data and people data should be based on the organization’s overarching data governance framework; if such a framework does not already exist, creating one will need to be part of building the people data foundation. An effective data governance framework should include goals, data rules and definitions, decision owners, data controls, stakeholders, data stewards, and a process for ongoing data management.
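
Data rules and data controls are often implemented as automated checks that run whenever people data is loaded or changed. The following is a minimal sketch, assuming hypothetical field names, a hypothetical set of valid status codes, and ISO dates; in practice the rules would come from the agreed data definitions, and failures would be routed to the responsible data stewards.

```python
from datetime import date
from typing import List

# Hypothetical data rules (in practice defined and owned by data stewards).
VALID_STATUSES = {"active", "leave", "terminated"}
REQUIRED_FIELDS = ("employee_id", "status", "hire_date")


def check_record(rec: dict) -> List[str]:
    """Apply each data rule to one employee record; return rule violations."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not rec.get(field):
            errors.append(f"missing {field}")
    status = rec.get("status")
    if status and status not in VALID_STATUSES:
        errors.append(f"invalid status {status!r}")
    hire = rec.get("hire_date")
    if hire:
        try:
            if date.fromisoformat(hire) > date.today():
                errors.append("hire_date is in the future")
        except ValueError:
            errors.append(f"hire_date {hire!r} is not YYYY-MM-DD")
    # Cross-field rule: a termination date only makes sense for leavers.
    if rec.get("termination_date") and status != "terminated":
        errors.append("termination_date set but status is not terminated")
    return errors


records = [
    {"employee_id": "E001", "status": "active", "hire_date": "2019-03-01"},
    {"employee_id": "E002", "status": "Active", "hire_date": "03/01/2019"},
    {"employee_id": "", "status": "leave", "hire_date": "2020-05-10",
     "termination_date": "2023-01-31"},
]
for rec in records:
    print(rec.get("employee_id") or "<blank>", check_record(rec))
```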

Today, building a high-quality foundation of people data is possible for all companies. New HR systems and data technologies enable data integration, automate previously error-prone manual processes, improve accessibility through cloud storage, and much more. As a result, People Science has emerged to leverage these new data assets and extract actionable knowledge from them.

The People Science Journey continues with part 4, The Tools.


See also

Article 1 – From Data Science to People Science

Article 2 – People Science the Science Foundation