Product Requirements Document
Salary Equity Analyzer
Author: Alena Reva
Version: 2.0 (Revised)
Revision summary: statistical methodology enhancements, privacy safeguards, and improved accuracy measures
1. Introduction
1.1. Purpose
This document outlines the product requirements for the Salary Equity Analyzer, a web application designed to help organizations identify and address potential pay disparities based on gender, race, and other demographic factors. The application analyzes employee compensation data uploaded via CSV and provides actionable insights into pay equity, supporting fair and compliant compensation practices.
1.2. Scope
The initial version focuses on providing a robust tool for conducting comprehensive pay equity analysis. Core functionality includes secure CSV data import, statistical analysis of compensation data, visualization of results, and downloadable reports. The application does not retain user data after the session ends, prioritizing data privacy and security.
1.3. Target Audience
The primary users are Human Resources (HR) professionals, compensation analysts, and business leaders responsible for ensuring fair pay practices within their organizations.
2. User Personas
2.1. HR Analyst (Primary)
Background: Responsible for compensation, benefits, data analysis, and HR compliance
Needs: User-friendly tool to conduct complex pay equity analysis without advanced statistical expertise
Frustrations: Current methods are manual, time-consuming, and prone to errors
2.2. C-Level Executive (Secondary)
Needs: Clear, concise summary of pay equity status with key findings and recommendations
Frustrations: Lack of visibility into potential pay disparities and associated risks
3. Features and Functionality
3.1. Data Import
Users import compensation data via CSV upload. Data must follow the required template to ensure quality and consistency.
⚠️ IMPORTANT PRIVACY NOTICE
DO NOT include employee names in your data. Use employee IDs, code names, or other non-identifying unique identifiers only. This tool provides NO PRIVACY GUARANTEE. Data is processed through third-party systems and users are solely responsible for ensuring compliance with their organization’s data protection policies.
Required Data Fields:
Unique Identifier (Employee ID or code name ONLY)
Gender
Race
Job Title
Location (US State or Country)
Overall Years of Experience
Years in Current Role
Performance Rating (Below Midpoint / Midpoint / Above Midpoint)
Base Salary
3.2. Data Validation
Upon import, the application validates data format and required fields. Errors are flagged for correction.
Missing Data Handling
Display records with missing data and which fields are incomplete
User chooses: fix the data or exclude the record
Warning if >20% records excluded: results may not be representative
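The missing-data rules above can be sketched as follows. This is a minimal illustration using only the standard library; the column names in REQUIRED_FIELDS are hypothetical stand-ins for the required template, which this document does not spell out.

```python
import csv
import io

# Hypothetical column names; the real CSV template may use different headers.
REQUIRED_FIELDS = [
    "employee_id", "gender", "race", "job_title", "location",
    "years_experience", "years_in_role", "performance_rating", "base_salary",
]

def find_incomplete_records(rows):
    """Return {row_index: [missing field names]} for rows with blank required fields."""
    incomplete = {}
    for i, row in enumerate(rows):
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            incomplete[i] = missing
    return incomplete

def exclusion_warning(n_excluded, n_total, threshold=0.20):
    """True when more than `threshold` of all records would be excluded."""
    return n_total > 0 and (n_excluded / n_total) > threshold

sample_csv = """employee_id,gender,race,job_title,location,years_experience,years_in_role,performance_rating,base_salary
E001,F,Asian,Engineer,CA,5,2,Midpoint,120000
E002,M,White,Engineer,CA,,2,Midpoint,125000
E003,F,Black,Analyst,NY,3,1,,
E004,M,Hispanic,Analyst,NY,4,2,Above Midpoint,98000
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
incomplete = find_incomplete_records(rows)

# If the user chose to exclude every incomplete record, check the 20% rule.
show_warning = exclusion_warning(len(incomplete), len(rows))
```

Here two of four records are incomplete, so excluding both would trigger the representativeness warning.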
Outlier Detection
Salary outliers identified using IQR method (below Q1 - 1.5×IQR or above Q3 + 1.5×IQR)
Users can include or exclude flagged outliers
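As a rough sketch of the IQR rule above: the quartile convention is an assumption here (Python's "inclusive" method), since different quartile definitions give slightly different fences and this document does not fix one.

```python
import statistics

def iqr_outlier_bounds(salaries):
    """Return (lower_fence, upper_fence) using Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Assumption: "inclusive" quartiles; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(salaries):
    """Return the salaries that fall outside the IQR fences."""
    low, high = iqr_outlier_bounds(salaries)
    return [s for s in salaries if s < low or s > high]

salaries = [50000, 52000, 55000, 58000, 60000, 62000, 65000, 250000]
bounds = iqr_outlier_bounds(salaries)
outliers = flag_outliers(salaries)
```

Flagged values are only candidates; per the requirement above, the user decides whether to include or exclude them.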
3.3. Equity Analysis Engine
The core statistical engine performs regression analysis to identify pay disparities, controlling for legitimate factors (job title, experience, performance rating, and location) to isolate the impact of gender and race on compensation.
3.4. Results Dashboard
Overall pay equity score (0–100)
Progressive pay gap breakdown by gender and race across multiple adjustment levels
95% confidence intervals for all reported pay gaps
Intersectionality analysis (gender × race combinations)
At-risk employee identification
Model diagnostics summary (R², sample sizes, assumption checks)
3.5. Data Export
CSV export for pay gaps, at-risk employees, full model results, and descriptive statistics. PDF report with executive summary, methodology, and disclaimers.
4. Statistical Methodology
4.1. Core: Multiple Linear Regression
The analysis uses multiple linear regression, the industry standard for pay equity audits. The natural logarithm of salary (log(Salary)) is the dependent variable, which normalizes skewed salary distributions and allows results to be interpreted as percentage differences.
Regression Equation:
log(Salary) = β0 + β1(Experience) + β2(RoleExperience) + Σβj(JobTitle_j) + Σβk(Location_k) + Σβm(Performance_m) + γ1(Gender) + Σγn(Race_n) + ε
Categorical variables are dummy-coded. One group per category serves as the baseline. Coefficients are interpreted relative to the baseline.
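The regression equation above might be fit with the statsmodels formula interface roughly as follows. This is a sketch on synthetic data: the column names, the choice of "M" and "White" as baseline groups, and the data-generating process are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only; column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], n),
    "race": rng.choice(["Asian", "Black", "White"], n),
    "job_title": rng.choice(["Analyst", "Engineer"], n),
    "location": rng.choice(["CA", "NY"], n),
    "performance": rng.choice(["Below", "Mid", "Above"], n),
    "years_experience": rng.integers(0, 20, n),
    "years_in_role": rng.integers(0, 6, n),
})
# Salaries are built with a known 0.05 log-point penalty for gender == "F".
log_salary = (
    11.0
    + 0.02 * df["years_experience"]
    + 0.01 * df["years_in_role"]
    + 0.30 * (df["job_title"] == "Engineer")
    - 0.05 * (df["gender"] == "F")
    + rng.normal(0, 0.05, n)
)
df["base_salary"] = np.exp(log_salary)

# C(...) dummy-codes a categorical; Treatment(reference=...) picks the baseline group.
formula = (
    "np.log(base_salary) ~ years_experience + years_in_role"
    " + C(job_title) + C(location) + C(performance)"
    " + C(gender, Treatment(reference='M'))"
    " + C(race, Treatment(reference='White'))"
)
model = smf.ols(formula, data=df).fit()

# The gender coefficient is the log-salary difference relative to the baseline.
gender_term = [name for name in model.params.index if "gender" in name][0]
gender_coef = model.params[gender_term]
```

Because the synthetic salaries were generated with a 0.05 log-point penalty, the fitted gender coefficient should land close to -0.05.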
Multicollinearity Assessment
Variance Inflation Factors (VIF) calculated for all predictors
Variables with VIF > 10 trigger a warning
4.2. Progressive Adjustment Models
Five models are run with progressively more control variables to show how gaps change as legitimate factors are accounted for:
Model 1: Gender + Race only (unadjusted)
Model 2: + Job Title
Model 3: + Experience variables
Model 4: + Performance
Model 5: + Location (fully adjusted)
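One way to sketch the five nested models is to grow the formula one step at a time and record the gender gap at each step. The data below is synthetic, with job title deliberately correlated with gender so that the unadjusted gap overstates the adjusted one; column names and baseline choices are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 600
gender = rng.choice(["F", "M"], n)
# Synthetic assumption: men are more likely to hold the higher-paid title.
p_engineer = np.where(gender == "M", 0.7, 0.3)
df = pd.DataFrame({
    "gender": gender,
    "race": rng.choice(["Asian", "Black", "White"], n),
    "job_title": np.where(rng.random(n) < p_engineer, "Engineer", "Analyst"),
    "location": rng.choice(["CA", "NY"], n),
    "performance": rng.choice(["Below", "Mid", "Above"], n),
    "years_experience": rng.integers(0, 20, n),
    "years_in_role": rng.integers(0, 6, n),
})
df["log_salary"] = (
    11.0
    + 0.30 * (df["job_title"] == "Engineer")
    + 0.02 * df["years_experience"]
    - 0.03 * (df["gender"] == "F")
    + rng.normal(0, 0.05, n)
)

# Each step adds the listed terms to everything before it.
steps = [
    ("Model 1", "C(gender, Treatment(reference='M')) + C(race)"),
    ("Model 2", "C(job_title)"),
    ("Model 3", "years_experience + years_in_role"),
    ("Model 4", "C(performance)"),
    ("Model 5", "C(location)"),
]
gender_gap_pct = {}
terms = []
for name, added in steps:
    terms.append(added)
    fit = smf.ols("log_salary ~ " + " + ".join(terms), data=df).fit()
    coef = next(v for k, v in fit.params.items() if "gender" in k)
    gender_gap_pct[name] = (np.exp(coef) - 1) * 100
```

In this synthetic setup the Model 1 gap is much larger in magnitude than the Model 5 gap, because most of the raw difference is explained by job title.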
4.3. Pay Gap Calculation
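Gender and race coefficients (γ) converted to percentage gaps: (exp(γ) - 1) × 100
95% confidence intervals: computed on the log scale as coefficient ± (1.96 × standard error), with each bound then converted to a percentage using the same formula
Log-scale intervals not containing zero indicate statistical significance at α = 0.05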
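A small worked example of the conversion, assuming the interval is formed on the log scale and its bounds are then transformed the same way as the point estimate (the function and key names are illustrative):

```python
import math

def pay_gap_percent(coef, std_err, z=1.96):
    """Convert a log-salary coefficient and its standard error to a percentage gap with a 95% CI."""
    low, high = coef - z * std_err, coef + z * std_err

    def to_pct(b):
        return (math.exp(b) - 1) * 100

    return {
        "gap_pct": to_pct(coef),
        "ci_low_pct": to_pct(low),
        "ci_high_pct": to_pct(high),
        # Significant at alpha = 0.05 when the log-scale interval excludes zero.
        "significant": not (low <= 0 <= high),
    }

result = pay_gap_percent(coef=-0.05, std_err=0.02)
```

A coefficient of -0.05 corresponds to a gap of about -4.9%, not -5%, which is why the exponential transform is applied rather than reading the coefficient directly.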
4.4. Intersectionality Analysis
Interaction terms created for Gender × Race combinations
Combined effect reported (main effects + interaction)
Only intersections with n ≥ 10 employees are reported
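The combined effect for an intersection can be sketched as the sum of the relevant main-effect and interaction coefficients on the log scale, converted to a percentage relative to the baseline group. The coefficient values and group labels below are made up for illustration, and baseline levels are assumed to have a coefficient of zero.

```python
import math

MIN_GROUP_SIZE = 10  # intersections smaller than this are not reported

def combined_effects(main_gender, main_race, interactions, counts):
    """Return {(gender, race): percentage gap vs. the baseline group} for reportable intersections."""
    reported = {}
    for (gender, race), n in counts.items():
        if n < MIN_GROUP_SIZE:
            continue
        # Baseline levels are absent from the dicts and contribute 0.
        total = (
            main_gender.get(gender, 0.0)
            + main_race.get(race, 0.0)
            + interactions.get((gender, race), 0.0)
        )
        reported[(gender, race)] = (math.exp(total) - 1) * 100
    return reported

effects = combined_effects(
    main_gender={"F": -0.03},
    main_race={"Black": -0.02, "Asian": 0.01},
    interactions={("F", "Black"): -0.01, ("F", "Asian"): 0.0},
    counts={("F", "Black"): 25, ("F", "Asian"): 6, ("M", "Black"): 40},
)
```

A confidence interval for a combined effect would also need the covariances between the summed coefficients; that step is omitted from this sketch.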
4.5. At-Risk Employee Identification
Employees are flagged based on studentized residuals (not raw residuals), which account for observation leverage:
Primary at-risk: Studentized residual < -2.0
Watch list: Protected class members with studentized residual < -1.5
80% prediction intervals provided for each at-risk employee
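The flagging rules above could look roughly like this with statsmodels. Several things are assumptions: externally studentized residuals are used (the document does not say internal or external), the model is reduced to a single predictor to keep the example short, "protected class" is represented by a hypothetical gender == "F" check, and the watch list here excludes anyone already flagged as primary at-risk.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "employee_id": [f"E{i:03d}" for i in range(n)],
    "gender": rng.choice(["F", "M"], n),
    "years_experience": rng.integers(0, 20, n),
})
df["log_salary"] = 11.0 + 0.02 * df["years_experience"] + rng.normal(0, 0.05, n)
df.loc[0, "log_salary"] -= 0.30  # plant one clearly underpaid employee

fit = smf.ols("log_salary ~ years_experience", data=df).fit()

# Studentized residuals scale each residual by its own leverage-adjusted error.
df["stud_resid"] = fit.get_influence().resid_studentized_external

protected = df["gender"] == "F"  # hypothetical stand-in for protected-class membership
df["primary_at_risk"] = df["stud_resid"] < -2.0
df["watch_list"] = protected & (df["stud_resid"] < -1.5) & ~df["primary_at_risk"]

# 80% prediction interval (alpha = 0.20) on the log scale, converted back to salary.
pred = fit.get_prediction().summary_frame(alpha=0.20)
df["expected_low"] = np.exp(pred["obs_ci_lower"].to_numpy())
df["expected_high"] = np.exp(pred["obs_ci_upper"].to_numpy())

at_risk = df.loc[
    df["primary_at_risk"],
    ["employee_id", "stud_resid", "expected_low", "expected_high"],
]
```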
4.6. Pay Equity Scoring
Score based on the largest statistically significant negative pay gap (using upper bound of 95% CI for conservatism):
No significant gaps → Score = 100
Gap below 3.0% → Score 85–99 (Excellent)
Gap from 3.0% to below 5.0% → Score 70–84 (Fair)
Gap ≥5.0% → Score <70 (Needs Improvement)
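The bands above fix the score range for each gap size but not how the score moves within a band. The sketch below assumes linear interpolation inside each band, treats any gap under 3.0% as the first band, and uses an arbitrary 5-points-per-percentage-point decline above 5.0%; all three choices are illustrative assumptions, not part of the specification.

```python
def pay_equity_score(gap_pct):
    """Map the largest significant gap (in percent, sign ignored) to a 0-100 score.

    Pass None when no statistically significant gap exists.
    """
    if gap_pct is None:
        return 100
    gap = abs(gap_pct)
    if gap < 3.0:
        # Assumption: slide linearly from 99 toward 85 across the band.
        return round(99 - (gap / 3.0) * 14)
    if gap < 5.0:
        # Assumption: slide linearly from 84 toward 70 across the band.
        return round(84 - ((gap - 3.0) / 2.0) * 14)
    # Assumption: lose 5 points per additional percentage point, floored at 0.
    return max(0, round(69 - (gap - 5.0) * 5))

def score_label(score):
    """Return the band label; a score of 100 is treated as Excellent here."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Fair"
    return "Needs Improvement"

examples = {gap: pay_equity_score(gap) for gap in (None, 2.0, 4.0, 6.0)}
```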
4.7. Model Diagnostics
R² and Adjusted R² (typical range: 0.60–0.85 for pay equity models)
F-statistic and p-value for overall model significance
Breusch-Pagan test for homoscedasticity
Shapiro-Wilk test for normality of residuals
Status indicators: ✓ Pass, ⚠ Warning, ✗ Fail
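A possible shape for the diagnostics summary, using statsmodels' het_breuschpagan and SciPy's shapiro. The p-value thresholds that map to Pass, Warning, and Fail are assumptions (0.05 and 0.01), since this document does not define them; the data is synthetic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

def status(p_value, warn=0.05, fail=0.01):
    """Assumed thresholds: p < 0.01 fails, p < 0.05 warns, otherwise passes."""
    if p_value < fail:
        return "fail"
    if p_value < warn:
        return "warning"
    return "pass"

def diagnostics(fit):
    """Collect fit statistics and assumption checks for a fitted OLS model."""
    # Breusch-Pagan: a small p-value suggests heteroscedastic residuals.
    _, bp_p, _, _ = het_breuschpagan(fit.resid, fit.model.exog)
    # Shapiro-Wilk: a small p-value suggests non-normal residuals.
    _, sw_p = stats.shapiro(fit.resid)
    return {
        "r_squared": fit.rsquared,
        "adj_r_squared": fit.rsquared_adj,
        "f_statistic": fit.fvalue,
        "f_p_value": fit.f_pvalue,
        "breusch_pagan": status(bp_p),
        "shapiro_wilk": status(sw_p),
    }

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({"years_experience": rng.integers(0, 20, n)})
df["log_salary"] = 11.0 + 0.02 * df["years_experience"] + rng.normal(0, 0.05, n)
summary = diagnostics(smf.ols("log_salary ~ years_experience", data=df).fit())
```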
5. Non-Functional Requirements
5.1. Security and Privacy
No user data stored after session ends
HTTPS enforced in production
All inputs validated on backend
⛔ NO PRIVACY GUARANTEE DISCLAIMER
This tool provides NO PRIVACY GUARANTEE. Users must NOT upload employee names. By using this application, users acknowledge that data may be processed through third-party infrastructure, complete data security cannot be guaranteed, and users bear full responsibility for compliance with applicable regulations.
5.2. Small Sample Size Warnings
Groups with n < 30: prominent caution banner
Groups with n < 10: strong warning that results are not suitable for decision-making
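The two thresholds above reduce to a simple count per group. A minimal sketch, with illustrative warning labels:

```python
from collections import Counter

def sample_size_warnings(group_labels):
    """Return {group: warning level} for groups below the size thresholds."""
    warnings = {}
    for group, n in Counter(group_labels).items():
        if n < 10:
            warnings[group] = "strong_warning"  # not suitable for decision-making
        elif n < 30:
            warnings[group] = "caution"  # prominent caution banner
    return warnings

labels = ["A"] * 50 + ["B"] * 20 + ["C"] * 5
warnings_by_group = sample_size_warnings(labels)
```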
5.3. Performance
Analysis should complete in under 30 seconds for 1,000 employees. Frontend remains responsive during analysis with progress indication.
6. Technical Architecture
Frontend: React with TypeScript, Vite, TailwindCSS, Recharts
Backend: Node.js with Express and TypeScript
Statistics: Python 3.11 with statsmodels, pandas, numpy, scipy, scikit-learn
Deployment: Docker container on Railway
Export: CSV and PDF generation
7. Future Considerations
Cohort analysis with filtering by performance, experience, job title, location
Google Sheets integration for direct data import
Track pay equity over time with historical comparisons
"What-If" scenario modeling for salary adjustment impact