People Science: Theories, Methods and Tools
Article 4 in a series of 6.
People Science, like data science, is an interdisciplinary field that uses a variety of tools to extract knowledge from people data. These tools include theories from across the social sciences, math and statistics, and data systems and technologies. This abundance of tools is both a blessing and a curse for people scientists. Each social science has its own measurement frameworks and theories, and there are often competing theories within a field as well. Math and statistics offer innumerable permutations of quantitative methods. The data technology landscape has exploded in the last few years and continues to expand rapidly with big data, BI, and data visualization tools. With so many tools to choose from, decision fatigue followed by paralysis is a real risk. What tools should you consider, and how do you choose?
In academia, empirical science is segmented into natural and social sciences. Natural sciences study the material universe and include the life and physical sciences. Social sciences study people and societies and include sociology, economics, psychology, etc. Traditionally, HR has relied heavily on I/O psychology, along with the behavioral sciences, for theories on people and performance. As HR analytics matured into talent analytics and people analytics, the theories have expanded to include natural sciences such as neuroscience. HR solutions have expanded as well, with assessment providers now offering assessments based on brain science. More recently, criticism of the long-held economic assumption of rationality has drawn attention to cognitive biases and their impact on behaviors and decisions.
Understanding cognitive biases and how they affect behaviors and decisions is critical in People Science, as these can significantly affect model design. For example, in a candidate conversion analysis, a common assumption is that a candidate who declines your company’s job offer and takes a job elsewhere makes this decision for logical reasons. This assumption that the candidate can rationally compare and choose between job offers is embedded in the analysis, is rarely tested, and can lead to spurious conclusions. Further complications arise because people are rarely aware of their cognitive biases and how these affect their behaviors and decisions. Scientific theories come with associated assumptions; unfortunately, outside of academia these assumptions are often overlooked. Because people scientists rely on theories from so many different fields, they need to be extra diligent in identifying and testing the assumptions embedded in each field and theory. When selecting a theory, consider 1) the challenge or hypothesis being tested, 2) model feasibility given the organizational context, and 3) what is being assumed and the validity of those assumptions.
Math and Statistics
The formal sciences study logic and math and include statistics, game theory, and decision theory. People Science, like most empirical sciences, uses the formal sciences as a tool, in conjunction with theories, to test hypotheses. Knowing how to normalize and report on HR data by transforming it into ratios, percentages, rates, or aggregated metrics is a critical component of People Science. The definition, calculation, and analysis of HR metrics and key performance indicators (KPIs) have been the core of HR analytics and people analytics and are likewise crucial in People Science.
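As a minimal sketch of why normalization matters, the Python snippet below turns raw separation counts into a turnover rate. The formula (separations divided by average headcount over the period) is one common convention and is an assumption here, not a definition taken from this article; the function name, department names, and figures are hypothetical.

```python
def turnover_rate(separations: int, headcount_start: int, headcount_end: int) -> float:
    """Separations divided by average headcount over the period (assumed convention)."""
    average_headcount = (headcount_start + headcount_end) / 2
    if average_headcount <= 0:
        raise ValueError("average headcount must be positive")
    return separations / average_headcount


# Hypothetical departments: identical raw separation counts, very different sizes.
departments = {
    "Sales": {"separations": 12, "start": 100, "end": 140},
    "Engineering": {"separations": 12, "start": 400, "end": 400},
}

# Normalize each raw count into a rate so the departments become comparable.
rates = {
    name: turnover_rate(d["separations"], d["start"], d["end"])
    for name, d in departments.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Both hypothetical departments lost the same number of people, yet the rates differ by more than a factor of three, which is the kind of difference a raw count hides.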
Statistics and probability theories are important in People Science, and these theories include assumptions as well. For example, ordinary least squares (OLS), the most common method used for linear regression and an essential tool for people scientists, is based on a series of data assumptions. One of these assumptions, no linear dependence, requires that the independent variables (or drivers) be linearly independent of each other, i.e. that no driver can be written as an exact linear combination of the others. When drivers are strongly correlated the variables are multicollinear, and strong multicollinearity can cause substantial errors in estimation, to the extent of turning a positive relationship into a negative coefficient! If variables are perfectly collinear, e.g. height in meters and height in inches, the coefficients cannot be uniquely estimated, and statistical software will typically produce an error or drop one of the variables. In People Science, data is rarely independent or normally distributed, and it is vital to know which underlying mathematical and statistical assumptions are built into a model and how to proceed if they are invalid. For example, the Calibrated Bayes statistical paradigm can be used with big data when random sampling assumptions are not valid. The selection of math and stats tools should be based on 1) the challenge or hypothesis, 2) data characteristics and properties, and 3) the validity of the associated model assumptions.
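The multicollinearity check described above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not a full diagnostic: it covers only the two-driver case, where the variance inflation factor (VIF) reduces to 1 / (1 - r^2) with r the Pearson correlation between the drivers. The function names, the tolerance, and the sample data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_drivers(x1, x2, tol=1e-9):
    """VIF for a model with exactly two drivers: 1 / (1 - r^2).

    Returns math.inf when the drivers are (numerically) perfectly collinear.
    """
    r_squared = pearson_r(x1, x2) ** 2
    if 1.0 - r_squared < tol:
        return math.inf
    return 1.0 / (1.0 - r_squared)


# Hypothetical data: the same height in two units, plus an unrelated driver.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
height_in = [h * 39.3701 for h in height_m]
tenure_years = [2, 7, 1, 9, 4]

vif_redundant = vif_two_drivers(height_m, height_in)    # perfectly collinear pair
vif_distinct = vif_two_drivers(height_m, tenure_years)  # weakly related pair

print("height (m) vs height (in):", vif_redundant)
print("height (m) vs tenure:", round(vif_distinct, 3))
```

An infinite VIF flags the meters/inches pair as redundant, so one of the two should be dropped before fitting. With more than two drivers, each VIF comes from regressing one driver on all the others, which is better left to a statistics package.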
The age of big data and data science has come with revolutionary new data technologies and solutions. For example, there are new ways to store data with NoSQL and NewSQL databases, unstructured data processing with Hadoop, machine learning algorithms, and much more. Unfortunately, the data tech available to people scientists is often constrained by a company’s investments in existing systems. The data tech landscape includes HR-specific solutions (HR tech) and everything else (non-HR tech). In People Science, the core HR system (HRMS or HRIS) is also the core of the data foundation. Beyond the core system, supplementary systems should be considered and evaluated based on business need and the people strategy. The non-HR data tech landscape includes data infrastructure, non-HR applications, and analytics solutions. While not designed specifically for HR or people data, many of these can easily be applied to people data or serve as sources of relevant non-HR data to include in analysis. People scientists do not need to be experts on data infrastructures and technology stacks; however, a basic understanding enables you to identify opportunities to leverage non-HR data tech for people analytics and analysis.
People scientists need access to statistical software for advanced modeling and analytics. Some solution-specific HR tech includes embedded analytics, but with very limited flexibility. Excel can be used for descriptive statistics, correlations, and basic linear regressions and is a reasonable option for occasional analysis. However, Excel is poorly suited to larger data sets, complex models, or heavy data manipulation and processing. Pure stats software programs, e.g. SPSS, SAS, and Stata, are specifically designed for structured data analytics and are heavily used in academia. SPSS is frequently used by social scientists and in business environments and, with its extensive help sections on statistical methodology, is a good option for less technical users. R, an open source option, is popular with many data scientists and is a good choice for technically savvy users. Learning a new stats program is time-consuming, and the most practical choice is often the one your people scientist already knows how to use.
New HR-specific analytics products are another option to consider. These products are designed specifically for HR data and include predefined metrics, analytics, and data visualization. The prebuilt models make it easier for HR teams without people scientists or strong stats backgrounds to analyze data. Unfortunately, embedded analytics also constrain flexibility and limit the range of analysis, models, and customization available. With all the new data tech available, many teams make the mistake of investing in analytics products without adding analytical skills to the team. People analytics tech adds value when the results are evaluated and interpreted by someone with a strong stats background.
The proliferation of tools for people scientists is both a blessing and a curse. It is easy for the various options to blend together into noise or to overwhelm us with choices to the point of decision fatigue. Furthermore, these decisions cannot be made in isolation, as the theories, models, and data tech each need to be considered when selecting the others. Theories and models need to be selected for each challenge or hypothesis and should be reevaluated periodically over time. Selecting data tech is a longer-term commitment, and it is important to consider flexible solutions that can be used on a wide variety of people data, models, and statistics, both today and in the future. In today’s rapidly changing business environment, People Science tools need to be dynamic and flexible so they can change with the business or the context.