Product Requirements Document

Salary Equity Analyzer

Author: Alena Reva

Version: 2.0 (Revised)

Statistical methodology enhancements, privacy safeguards, and improved accuracy measures

1. Introduction

1.1. Purpose

This document outlines the product requirements for the Salary Equity Analyzer, a web application designed to help organizations identify and address potential pay disparities based on gender, race, and other demographic factors. The application analyzes employee compensation data uploaded via CSV to provide actionable insights into pay equity, supporting fair and compliant compensation practices.

1.2. Scope

The initial version focuses on providing a robust tool for conducting comprehensive pay equity analysis. Core functionality includes secure CSV data import, statistical analysis of compensation data, visualization of results, and downloadable reports. The application does not retain user data after the session ends, prioritizing data privacy and security.

1.3. Target Audience

The primary users are Human Resources (HR) professionals, compensation analysts, and business leaders responsible for ensuring fair pay practices within their organizations.

2. User Personas

2.1. HR Analyst (Primary)

  • Background: Responsible for compensation, benefits, data analysis, and HR compliance

  • Needs: User-friendly tool to conduct complex pay equity analysis without advanced statistical expertise

  • Frustrations: Current methods are manual, time-consuming, and prone to errors

2.2. C-Level Executive (Secondary)

  • Needs: Clear, concise summary of pay equity status with key findings and recommendations

  • Frustrations: Lack of visibility into potential pay disparities and associated risks

3. Features and Functionality

3.1. Data Import

Users import compensation data via CSV upload. Data must follow the required template to ensure quality and consistency.

⚠️ IMPORTANT PRIVACY NOTICE

DO NOT include employee names in your data. Use employee IDs, code names, or other non-identifying unique identifiers only. This tool provides NO PRIVACY GUARANTEE. Data is processed through third-party systems and users are solely responsible for ensuring compliance with their organization’s data protection policies.

Required Data Fields:

  • Unique Identifier (Employee ID or code name ONLY)

  • Gender

  • Race

  • Job Title

  • Location (US State or Country)

  • Overall Years of Experience

  • Years in Current Role

  • Performance Rating (Below Midpoint / Midpoint / Above Midpoint)

  • Base Salary

3.2. Data Validation

Upon import, the application validates data format and required fields. Errors are flagged for correction.
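
The required-field check described above could be sketched as follows. This is a minimal illustration, not the specified implementation: the column names in REQUIRED_COLUMNS are hypothetical stand-ins for the template headers, which this document does not define.

```python
import csv
import io

# Hypothetical header names; the actual CSV template headers are not given here.
REQUIRED_COLUMNS = [
    "employee_id", "gender", "race", "job_title", "location",
    "years_experience", "years_in_role", "performance_rating", "base_salary",
]

def missing_columns(csv_text: str) -> list[str]:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip().lower() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]
```

An empty result means the header satisfies the template; any returned names can be shown to the user as errors to correct.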

Missing Data Handling

  • Display records with missing data and which fields are incomplete

  • User chooses: fix the data or exclude the record

  • Warning shown if more than 20% of records are excluded: results may not be representative
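
The missing-data rules above might look like this in code. It is a sketch under assumptions: records are taken to be dictionaries keyed by field name, and the function names are illustrative only.

```python
def find_incomplete(records: list[dict], required: list[str]) -> dict[int, list[str]]:
    """Map each incomplete record's index to the required fields it is missing."""
    incomplete = {}
    for i, row in enumerate(records):
        missing = []
        for field in required:
            value = row.get(field)
            # A field counts as missing when absent or blank after trimming.
            if value is None or not str(value).strip():
                missing.append(field)
        if missing:
            incomplete[i] = missing
    return incomplete

def exclusion_warning(n_excluded: int, n_total: int, threshold: float = 0.20) -> bool:
    """True when more than 20% of records have been excluded."""
    return n_total > 0 and n_excluded / n_total > threshold
```

The first function supports the "which fields are incomplete" display; the second decides whether the representativeness warning should appear after the user has chosen which records to exclude.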

Outlier Detection

  • Salary outliers identified using IQR method (below Q1 - 1.5×IQR or above Q3 + 1.5×IQR)

  • Users can include or exclude flagged outliers
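
As an illustration of the IQR rule, the sketch below flags salaries outside the two fences. The quartile convention is an assumption: it uses the "inclusive" method of Python's statistics.quantiles, and other conventions can give slightly different fences on small samples.

```python
import statistics

def iqr_outliers(salaries: list[float], k: float = 1.5) -> list[float]:
    """Return salaries below Q1 - k*IQR or above Q3 + k*IQR."""
    # Assumed quartile convention: linear interpolation over the observed range.
    q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [s for s in salaries if s < lower or s > upper]
```

The function only identifies candidates; whether a flagged salary is kept or dropped remains the user's choice, as stated above.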

3.3. Equity Analysis Engine

The core statistical engine performs regression analysis to identify pay disparities, controlling for legitimate factors (job title, experience, performance, location) to isolate the impact of gender and race on compensation.

3.4. Results Dashboard

  • Overall pay equity score (0–100)

  • Progressive pay gap breakdown by gender and race across multiple adjustment levels

  • 95% confidence intervals for all reported pay gaps

  • Intersectionality analysis (gender × race combinations)

  • At-risk employee identification

  • Model diagnostics summary (R², sample sizes, assumption checks)

3.5. Data Export

CSV export for pay gaps, at-risk employees, full model results, and descriptive statistics. PDF report with executive summary, methodology, and disclaimers.

4. Statistical Methodology

4.1. Core: Multiple Linear Regression

The analysis uses multiple linear regression, the industry standard for pay equity audits. The natural logarithm of salary (log(Salary)) is the dependent variable, which normalizes skewed salary distributions and allows results to be interpreted as percentage differences.

Regression Equation:

log(Salary) = β0 + β1(Experience) + β2(RoleExperience) + Σβj(JobTitle_j) + Σβk(Location_k) + Σβm(Performance_m) + γ1(Gender) + Σγn(Race_n) + ε

Categorical variables are dummy-coded. One group per category serves as the baseline. Coefficients are interpreted relative to the baseline.
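
To make the dummy-coding step concrete, here is a small sketch. The function name and the example levels are illustrative assumptions; in practice a statistics library would typically handle this encoding.

```python
def dummy_code(values: list[str], baseline: str) -> dict[str, list[int]]:
    """Dummy-code a categorical column, omitting the baseline level."""
    levels = sorted(set(values) - {baseline})
    # One 0/1 indicator column per non-baseline level.
    return {level: [1 if v == level else 0 for v in values] for level in levels}
```

For example, coding locations ["NY", "CA", "TX", "CA"] with "CA" as the baseline produces indicator columns for "NY" and "TX" only. Rows belonging to the baseline are all zeros, which is why each coefficient reads as a difference from the baseline group.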

Multicollinearity Assessment

  • Variance Inflation Factors (VIF) calculated for all predictors

  • Variables with VIF > 10 trigger a warning
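
For reference, the standard definition of the VIF for predictor j is shown below, where R_j^2 is the R² obtained by regressing predictor j on all of the other predictors:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Under this definition, the VIF > 10 warning threshold corresponds to R_j^2 > 0.9, meaning more than 90% of the variance in that predictor is explained by the other predictors.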

4.2. Progressive Adjustment Models

Five models are run with progressively more control variables to show how gaps change as legitimate factors are accounted for:

  • Model 1: Gender + Race only (unadjusted)

  • Model 2: + Job Title

  • Model 3: + Experience variables

  • Model 4: + Performance

  • Model 5: + Location (fully adjusted)
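
The cumulative structure of the five models can be sketched as below. The column names are hypothetical, and the sketch only builds the predictor lists; fitting each model is left to the statistics layer.

```python
# Hypothetical column names; each step lists only the predictors it adds.
MODEL_STEPS = [
    ("Model 1", ["gender", "race"]),
    ("Model 2", ["job_title"]),
    ("Model 3", ["years_experience", "years_in_role"]),
    ("Model 4", ["performance_rating"]),
    ("Model 5", ["location"]),
]

def cumulative_predictors(steps: list[tuple[str, list[str]]]) -> dict[str, list[str]]:
    """Return {model name: predictors}, each model extending the previous one."""
    result, so_far = {}, []
    for name, added in steps:
        so_far = so_far + added  # new list, so earlier models are not mutated
        result[name] = so_far
    return result
```

Comparing the gender and race coefficients across these five predictor sets is what shows how the gaps change as each group of controls is introduced.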

4.3. Pay Gap Calculation

  • Coefficients converted to percentages: (exp(β) - 1) × 100

  • 95% confidence intervals: CI = coefficient ± (1.96 × standard error), computed on the log scale; each bound is then converted to a percentage with the same formula

  • Intervals not containing zero indicate statistical significance at α = 0.05
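
The conversion can be sketched as follows. This is an illustrative helper, not the specified implementation; it assumes the coefficient and its standard error come from the log-salary regression.

```python
import math

def to_percent(log_coef: float) -> float:
    """Convert a log-scale coefficient to a percentage difference."""
    return (math.exp(log_coef) - 1) * 100

def pay_gap_percent(beta: float, std_err: float, z: float = 1.96) -> dict:
    """Percentage gap, its 95% CI, and significance at alpha = 0.05."""
    low, high = beta - z * std_err, beta + z * std_err
    return {
        "gap_pct": to_percent(beta),
        "ci_low_pct": to_percent(low),
        "ci_high_pct": to_percent(high),
        # Significant when the interval excludes zero (same on either scale).
        "significant": low > 0 or high < 0,
    }
```

For instance, a coefficient of -0.05 corresponds to a gap of roughly -4.9% rather than exactly -5%, which is why the exponential conversion is applied instead of reading the coefficient directly as a percentage.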

4.4. Intersectionality Analysis

  • Interaction terms created for Gender × Race combinations

  • Combined effect reported (main effects + interaction)

  • Only intersections with n ≥ 10 employees are reported
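
One way to combine the effects and apply the minimum cell size is sketched below. The input shapes are assumptions: coefficients are passed as dictionaries keyed by level (or by gender/race pair for interactions), and baseline levels are simply absent, contributing zero.

```python
from collections import Counter

def intersection_effects(pairs, gender_coef, race_coef, interaction_coef, min_n=10):
    """Combined log-scale effect per (gender, race) cell with at least min_n employees.

    pairs: one (gender, race) tuple per employee.
    """
    effects = {}
    for (gender, race), n in Counter(pairs).items():
        if n < min_n:
            continue  # cells below the reporting threshold are suppressed
        effects[(gender, race)] = (
            gender_coef.get(gender, 0.0)
            + race_coef.get(race, 0.0)
            + interaction_coef.get((gender, race), 0.0)
        )
    return effects
```

The returned values are on the log scale; they would be converted to percentages in the same way as the other pay gaps.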

4.5. At-Risk Employee Identification

Employees are flagged based on studentized residuals (not raw residuals), which account for observation leverage:

  • Primary at-risk: Studentized residual < -2.0

  • Watch list: Protected class members with studentized residual < -1.5

  • 80% prediction intervals provided for each at-risk employee
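
The two flagging thresholds above can be expressed as a simple classification. The sketch assumes the studentized residual has already been computed by the regression step and that a boolean protected-class flag is available per employee; the status labels are illustrative.

```python
def risk_status(studentized_residual: float, protected_class: bool) -> str:
    """Classify an employee using the at-risk and watch-list thresholds."""
    if studentized_residual < -2.0:
        return "primary at-risk"
    if protected_class and studentized_residual < -1.5:
        return "watch list"
    return "not flagged"
```

The primary threshold is checked first, so a protected class member with a residual below -2.0 is reported as primary at-risk rather than on the watch list.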

4.6. Pay Equity Scoring

Score based on the largest statistically significant negative pay gap, measured by magnitude. For conservatism, the gap is taken at the 95% CI bound farthest from zero (the upper bound of the gap's magnitude):

  • No significant gaps → Score = 100

  • Gap 0.1–2.9% → Score 85–99 (Excellent)

  • Gap 3.0–4.9% → Score 70–84 (Fair)

  • Gap ≥5.0% → Score <70 (Needs Improvement)
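
The banding can be sketched as a lookup from gap size to score range. Two points are assumptions rather than requirements: the bands are treated as half-open intervals (below 3.0%, 3.0% up to but not including 5.0%, 5.0% and above) so that no value falls between bands, and the bottom band is taken to run from 0 to 69. How a specific score is chosen within a band is not defined here and is left out.

```python
from typing import Optional

def score_band(gap_pct: Optional[float]) -> tuple[int, int, str]:
    """Return (min score, max score, label) for the largest significant gap.

    gap_pct is the gap magnitude in percent, or None if no gap is significant.
    """
    if gap_pct is None:
        return (100, 100, "No significant gaps")
    if gap_pct < 3.0:   # assumed half-open boundary
        return (85, 99, "Excellent")
    if gap_pct < 5.0:   # assumed half-open boundary
        return (70, 84, "Fair")
    return (0, 69, "Needs Improvement")
```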

4.7. Model Diagnostics

  • R² and Adjusted R² (typical range: 0.60–0.85 for pay equity models)

  • F-statistic and p-value for overall model significance

  • Breusch-Pagan test for heteroscedasticity (checks the constant-variance assumption)

  • Shapiro-Wilk test for normality of residuals

  • Status indicators: ✓ Pass, ⚠ Warning, ✗ Fail

5. Non-Functional Requirements

5.1. Security and Privacy

  • No user data stored after session ends

  • HTTPS enforced in production

  • All inputs validated on backend

⛔ NO PRIVACY GUARANTEE DISCLAIMER

This tool provides NO PRIVACY GUARANTEE. Users must NOT upload employee names. By using this application, users acknowledge that data may be processed through third-party infrastructure, complete data security cannot be guaranteed, and users bear full responsibility for compliance with applicable regulations.

5.2. Small Sample Size Warnings

  • Groups with n < 30: prominent caution banner

  • Groups with n < 10: strong warning that results are not suitable for decision-making
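
A possible way to derive these warnings from group membership is sketched below. The warning labels and the input shape (one group label per employee) are illustrative assumptions.

```python
from collections import Counter

def group_warnings(labels: list[str]) -> dict[str, str]:
    """Map each group with fewer than 30 members to its warning level."""
    warnings = {}
    for group, n in Counter(labels).items():
        if n < 10:
            warnings[group] = "strong warning"  # not suitable for decision-making
        elif n < 30:
            warnings[group] = "caution"         # prominent caution banner
    return warnings
```

Groups with 30 or more members do not appear in the result, so an empty dictionary means no banner is needed.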

5.3. Performance

Analysis should complete in under 30 seconds for 1,000 employees. Frontend remains responsive during analysis with progress indication.

6. Technical Architecture

  • Frontend: React with TypeScript, Vite, TailwindCSS, Recharts

  • Backend: Node.js with Express and TypeScript

  • Statistics: Python 3.11 with statsmodels, pandas, numpy, scipy, scikit-learn

  • Deployment: Docker container on Railway

  • Export: CSV and PDF generation

7. Future Considerations

  • Cohort analysis with filtering by performance, experience, job title, location

  • Google Sheets integration for direct data import

  • Track pay equity over time with historical comparisons

  • "What-If" scenario modeling for salary adjustment impact