1 Introduction

This report explores a simulated recruitment dataset, with the aim of understanding how candidates move through the hiring funnel and what factors influence their chances of success. The analysis will focus on questions such as:

  • How many candidates reach each stage, and where do the biggest drop-offs occur?
  • Do recruiter performance or source of application affect outcomes?
  • Do higher test scores improve the likelihood of being hired?
  • Are some job roles or locations harder to fill than others?
  • Which factors overall are most predictive of a candidate being hired?

The dataset tracks applicants through five stages: application, screening, interview, offer, and hire. For each candidate it includes:

  • Recruiter rec_id – one of five recruiters.
  • Job role job_role – four possible roles.
  • Source source – LinkedIn, job boards, or the company careers page.
  • Test score score – a 0–100 score mimicking results from application tests (e.g., psychometric assessments).
  • Stage flags (application, screening, interview, offer, hired) – binary indicators of progression.

The simulation includes parameters to reflect real-world variation. Candidate scores differ by source, recruiters vary in closing effectiveness, and progression rates depend on both scores and source.

Although the dataset was created programmatically, the analysis is approached as if it were newly provided, with the aim of uncovering patterns and drivers of hiring success.

2 Data Checks & Preparation

Before beginning the analysis, it is important to confirm that the dataset is properly structured and suitable for use. Although the data were simulated, they are treated as if newly provided, and a set of basic checks is applied to ensure consistency and integrity.

The checks focused on the following points:

  • Candidate IDs are unique
  • Test scores fall within the expected range of 0–100
  • Stage flags contain only values of 0 or 1
  • The number of candidates decreases at each stage of the funnel
  • No missing values are present

The table below summarises the results of these checks.

Summary of Data Quality Checks
Check Summary
Dataset shape 2,500 rows × 10 columns
Unique candidate IDs No duplicates found
Score range within 0–100 Observed range: 21–100
Stage values restricted to {0,1} All valid
Funnel monotonicity (stage counts decrease) All stages monotonic
Missing values None detected
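The checks above can be expressed as simple assertions. The sketch below runs them on a hypothetical list-of-dicts representation of the dataset (the records are illustrative, not the real data; column names follow the data dictionary in the introduction):

```python
# Minimal data-quality checks on a toy sample of the dataset.
# These records are illustrative only, not the real data.
records = [
    {"cand_id": "C0001", "score": 67, "application": 1, "screening": 1, "interview": 0, "offer": 0, "hired": 0},
    {"cand_id": "C0002", "score": 55, "application": 1, "screening": 0, "interview": 0, "offer": 0, "hired": 0},
    {"cand_id": "C0003", "score": 78, "application": 1, "screening": 1, "interview": 0, "offer": 0, "hired": 0},
]
stages = ["application", "screening", "interview", "offer", "hired"]

# 1. Candidate IDs are unique
ids = [r["cand_id"] for r in records]
assert len(ids) == len(set(ids))

# 2. Test scores fall within 0-100
assert all(0 <= r["score"] <= 100 for r in records)

# 3. Stage flags contain only 0 or 1
assert all(r[s] in (0, 1) for r in records for s in stages)

# 4. Funnel monotonicity: stage counts never increase down the funnel
counts = [sum(r[s] for r in records) for s in stages]
assert all(a >= b for a, b in zip(counts, counts[1:]))

# 5. No missing values
assert all(r[s] is not None for r in records for s in stages)
print("all checks passed")
```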

The dataset contains 2,500 rows and 10 columns, with no missing values. Candidate identifiers are unique, scores fall within the expected range, and progression through the funnel follows the correct pattern.

Finally, the first few rows of the dataset are displayed below.

cand_id rec_id source job_role score application screening interview offer hired
C0001 R4 Job board Data Analyst 67 1 1 0 0 0
C0002 R3 Job board Software Engineer 55 1 0 0 0 0
C0003 R4 LinkedIn Software Engineer 78 1 1 0 0 0
C0004 R5 Job board HR Specialist 30 1 0 0 0 0
C0005 R5 Careers page HR Specialist 44 1 1 0 0 0

3 Analysis

This section analyses candidate progression through the recruitment process and explores the key factors influencing hiring outcomes.

It begins with an overview of the funnel and key variables, followed by deeper analysis of how scores, recruiters, sources, roles, and locations relate to hiring success. Together, these analyses provide the foundation for the predictive modelling conducted later on.

3.1 Overall Funnel Counts & Conversion

This subsection provides an overview of how candidates move through the recruitment funnel. It identifies where the main reductions in volume occur and highlights which stages have the lowest conversion.

Candidate volumes decrease steadily at each stage of the funnel, but the most notable reductions occur at two key points. The largest numerical drop-off is between screening and interview, where the pool falls from 1,770 to 876 candidates, meaning just under 50% progress past screening. This indicates that the screening stage functions as the main volume filter in the process. The interview-to-offer stage is even more selective in proportional terms, with only 32.5% of interviewed candidates receiving an offer. Together, these patterns suggest that early screening criteria and later interview evaluations are the primary drivers of progression, shaping how candidates move through the recruitment funnel.
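The conversion figures quoted above follow directly from the stage counts. A quick sketch, using only the transitions reported in the text (offer and hire counts are not stated here, so they are omitted):

```python
# Stage counts reported in the text (application -> screening -> interview).
counts = {"application": 2500, "screening": 1770, "interview": 876}

screening_rate = counts["screening"] / counts["application"]
interview_rate = counts["interview"] / counts["screening"]   # just under 50%

print(f"application -> screening: {screening_rate:.1%}")
print(f"screening -> interview:  {interview_rate:.1%}")
```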

3.2 Assessment Scores

This section explores how candidate assessment scores relate to hiring outcomes. Scores are often used to predict job performance or short-list candidates in early stages, so understanding their connection to progression through the funnel is important. I first examine summary statistics and score distributions, followed by their relationship with hiring probability and progression through each stage. Logistic regression is used to quantify how much scores contribute to hiring likelihood.

3.2.1 Summary and Distributions

I begin by summarising the overall distribution of assessment scores to understand how candidates performed on the test.

Summary Statistics for Candidate Test Scores
Mean SD Median Min Max Range SE
63.19 13.7 63 21 100 79 0.27

The average score of 63 indicates that the assessment is of moderate difficulty, with candidates performing reasonably well. The standard deviation of 13.7, combined with a wide range from 21 to 100, shows strong spread and clear separation in candidate performance. This level of variability indicates that the assessment discriminates effectively between lower and higher ability candidates. Given the substantial drop-offs observed earlier in the funnel, this spread in test performance is likely to play a meaningful role in determining which candidates progress to later stages.

The histogram below shows the distribution of candidate test scores.

The distribution of scores is close to normal, with most candidates clustered around the mid-60s and fewer at the extremes. This pattern indicates a well-behaved assessment without ceiling or floor effects. A distribution like this is ideal for selection purposes, as it provides meaningful variation for differentiating between candidates.

3.2.2 Score and Hiring Success

To understand how strongly assessment scores influence the likelihood of being hired, I use a logistic regression model predicting hiring outcomes from candidate scores. This allows me to quantify how much a higher score increases the probability of being hired while controlling for the binary nature of the outcome. This step builds directly on the earlier distribution analysis by testing whether the observed variation in scores meaningfully translates into differences in hiring success.

Assessment Score Predicting Hiring Success
Predictor Coefficient P-value Odds Ratio Odds Increase (%) Nagelkerke R²
Score 0.051 < .001 1.052 5.2 0.077

The logistic regression shows a clear and meaningful relationship between assessment scores and hiring outcomes. Each one-point increase in score is associated with a 5.2% increase in the odds of being hired, indicating that higher-scoring candidates are consistently more likely to progress through the final selection stage. The coefficient of 0.051 reflects a positive slope, and the p-value (< 0.001) confirms that this relationship is statistically significant. The model’s Nagelkerke R² of 0.077 suggests that scores alone explain a modest but non-trivial proportion of the variation in hiring decisions, which is expected for a single predictor within a multi-factor recruitment process. Overall, these results provide strong evidence that the assessment contributes meaningfully to hiring outcomes.
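The odds-ratio figures in the table follow directly from the coefficient: exponentiating the log-odds slope gives the multiplicative change in odds per score point. A minimal check (the 10-point comparison is an added illustration, not a figure from the report):

```python
import math

coef = 0.051  # logistic regression slope for score (from the table above)
odds_ratio = math.exp(coef)
odds_increase_pct = (odds_ratio - 1) * 100

# Odds compound multiplicatively, so a 10-point gain is exp(10 * coef),
# not 10 * 5.2%.
ten_point_or = math.exp(coef * 10)

print(round(odds_ratio, 3))
print(round(odds_increase_pct, 1))
print(round(ten_point_or, 2))
```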

To visualise this relationship, the plot below shows the predicted probability of being hired across the full range of assessment scores based on the logistic regression model.

The curve shows a steadily increasing hiring probability as scores rise, reflecting the positive relationship identified in the regression model.

To make the relationship between scores and progression easier to interpret, I group candidates into four score bands ranging from low to very high performance. This categorisation allows a clearer comparison of how candidates with different levels of test performance move through the recruitment funnel.

Score Band Interpretation
Score Range Interpretation
0–49 Low
50–64 Moderate
65–79 High
80–100 Very High
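The banding above is a straightforward mapping from score to label:

```python
def score_band(score: int) -> str:
    """Map a 0-100 test score to the bands defined in the table above."""
    if score <= 49:
        return "Low"
    if score <= 64:
        return "Moderate"
    if score <= 79:
        return "High"
    return "Very High"

print(score_band(45), score_band(55), score_band(70), score_band(88))
```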

Across all stages, higher score bands show consistently stronger progression. Low scorers advance at low rates (46% past screening, only 3.5% hired), while Moderate scorers improve only slightly. The largest jump occurs between Moderate and High, where progression nearly doubles at several stages (e.g., 25% → 43% reaching interview). Candidates in the Very High band perform strongest throughout, with 92% reaching screening and nearly 18% ultimately hired.

These patterns reinforce the earlier regression results: higher assessment scores reliably predict stronger movement through the funnel, indicating that the test is effectively distinguishing more competitive candidates.

3.2.3 Score by Recruiter

Next, I examine whether candidate scores vary across recruiters to assess whether differences in recruiter outcomes may reflect differences in the quality of applicants they manage.

Summary Statistics for Candidate Scores by Recruiter
Recruiter n Mean SD Median Min Max Range SE
R1 610 65.99 13.40 66 24 100 76 0.54
R2 481 63.16 14.08 63 21 100 79 0.64
R3 454 63.02 13.71 62 26 99 73 0.64
R4 522 59.86 13.14 59 27 96 69 0.58
R5 433 63.47 13.51 63 29 100 71 0.65

Average scores vary slightly across recruiters. R1 handles the highest-scoring candidates (mean 65.99), while R4 handles the lowest (mean 59.86), with the others clustered around the low-to-mid 60s. Variability is similar across all recruiters, so the main difference is in average candidate quality rather than spread. This suggests recruiters may be sourcing candidates of slightly different ability levels, which could influence their apparent hiring performance.

The boxplot below visualises the full distribution of scores for each recruiter, highlighting differences in medians, spread and any outliers that are not visible from summary statistics alone.

To test whether the observed differences in average scores across recruiters are statistically meaningful, I conduct a one-way ANOVA comparing mean scores between recruiters.

Score by Recruiter ANOVA Results
R1 mean R2 mean R3 mean R4 mean R5 mean F df p
65.99 63.16 63.02 59.86 63.47 14.47 4 < .001

The ANOVA confirms that score differences across recruiters are statistically significant (F = 14.47, p < .001). Although the absolute differences are modest, recruiters do manage applicants with meaningfully different average scores.
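The F statistic in a one-way ANOVA is the ratio of between-group to within-group mean squares. A self-contained sketch of the calculation, run on toy data rather than the recruiter scores themselves:

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA: MS_between / MS_within."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy example: two groups of three observations
print(one_way_f([[1, 2, 3], [2, 3, 4]]))  # 1.5
```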

3.2.4 Score by Applicant Source

Next, I examine whether candidate scores differ by application source. This helps determine whether certain channels tend to attract stronger or weaker applicants.

Summary Statistics for Candidate Scores by Source
Source n Mean SD Median Min Max Range SE
LinkedIn 1251 69.83 12.18 70 24 100 76 0.34
Job board 813 54.73 11.70 55 22 94 72 0.41
Careers page 436 59.92 11.22 59 21 99 78 0.54

Average scores differ clearly by source. LinkedIn delivers the strongest candidates, with a mean score of 69.83, well above the other channels. Careers page applicants sit in the middle, with an average of 59.92, while job board candidates score lowest on average at 54.73. Standard deviations and ranges are similar across sources, so the main difference is in overall level rather than variability. This suggests that LinkedIn is attracting the highest-performing candidates, while job boards contribute a weaker pool on average, which is likely to feed through into differences in progression and hiring outcomes.

The boxplot below provides a visual comparison of score distributions across sources, highlighting differences in medians and spread.

To test whether the observed differences in average scores across applicant sources are statistically meaningful, I conduct a one-way ANOVA comparing mean scores between sources.

Score by Source ANOVA Results
LinkedIn mean Job board mean Careers page mean F df p
69.83 54.73 59.92 419.3 2 < .001

The ANOVA shows a highly significant difference in mean scores across sources (F = 419.3, p < .001). LinkedIn applicants score substantially higher than those from job boards or the careers page, confirming that sourcing channel is an important factor associated with candidate quality in this dataset.

3.2.5 Score by Job Role

Finally, I examine whether candidate scores differ across the four job roles. This helps identify whether certain roles attract stronger applicants or whether score patterns are consistent across the organisation.

Summary Statistics for Candidate Scores by Job Role
Job Role n Mean SD Median Min Max Range SE
Software Engineer 781 63.56 13.78 63.0 24 99 75 0.49
HR Specialist 528 63.41 14.13 63.5 22 100 78 0.61
Data Analyst 731 62.62 13.69 61.0 26 100 74 0.51
Marketing Associate 460 63.22 13.11 63.0 21 100 79 0.61

Average scores are very similar across job roles, all falling between 62.62 and 63.56. The differences are small and not practically meaningful. Standard deviations and ranges are also nearly identical, indicating that roles attract applicants with comparable score profiles. This suggests that assessment performance is not systematically higher or lower for any particular role, and any role-specific differences in hiring outcomes are unlikely to be driven by underlying score differences.

The boxplot below visualises these distributions, showing the similarity in medians and spread across job roles.

To test whether the observed differences in average scores across job roles are statistically meaningful, I conduct a one-way ANOVA comparing mean scores between job roles.

Score by Job Role ANOVA Results
SE mean HR mean DA mean MA mean F df p
63.56 63.41 62.62 63.22 0.66 3 0.577

The ANOVA confirms that score differences across job roles are not statistically significant (F = 0.66, p = .577). This reinforces that candidates perform similarly on the assessment regardless of the role they apply for.

3.2.6 Score Analysis Summary

Overall, assessment scores show meaningful variation across candidates and relate strongly to hiring outcomes. Higher scores are associated with higher hiring probabilities, and score distribution shapes progression through the funnel. Recruiters manage candidates with slightly different average scores, while sourcing channel has a substantial impact on candidate quality, with LinkedIn delivering the highest-scoring applicants. In contrast, score patterns are consistent across job roles.

3.3 Recruiter Performance

This section examines differences in recruiter activity and effectiveness. Recruiters may vary in the number of candidates they manage, the types of applicants they attract and the hiring outcomes they achieve. Understanding these patterns helps identify whether observed performance differences reflect recruiter behaviour, candidate quality or underlying structural factors in the recruitment process.

3.3.1 Overview of Recruiter Activity

I begin by looking at how many candidates each recruiter manages and the proportion they ultimately hire. This provides a baseline view of workload and headline effectiveness before considering differences in candidate mix or sourcing patterns.

The recruiters handle notably different workloads. R1 manages the largest volume of candidates (610), while R3 and R5 manage the fewest (454 and 433 respectively), indicating an uneven distribution of applicant volume across the team.

Recruiter Hiring Outcomes
Recruiter Candidates Hired Hire Rate
R1 610 74 12.1%
R2 481 40 8.3%
R3 454 40 8.8%
R4 522 20 3.8%
R5 433 18 4.2%

Hiring outcomes differ substantially across recruiters. R1 stands out as the strongest recruiter, achieving the highest hire rate at 12.1 percent while also managing the largest applicant volume. R3 and R2 perform moderately well, with hire rates around 8 to 9 percent. In contrast, R4 and R5 convert far fewer candidates, with hire rates below 5 percent, indicating comparatively weaker performance or differences in the types of candidates they handle.

3.3.2 Recruiter by Source

Next, I examine how each recruiter’s applicant pool is distributed across sourcing channels. Since earlier results showed strong differences in candidate quality by source, understanding this distribution is essential for interpreting recruiter performance fairly.

Recruiters work with noticeably different source mixes, which may help explain some of the hiring gaps observed earlier. R1 works with the strongest candidate pool, with 67 percent of their applicants coming from LinkedIn. R2 and R5 also draw heavily from LinkedIn (54 percent and 49 percent respectively). In contrast, R4 relies primarily on job boards, with 58 percent of their applicants coming from this lower-scoring channel and only 31 percent from LinkedIn. R3 has the most balanced distribution, with 44 percent from LinkedIn and 32 percent from job boards. Because LinkedIn was shown to supply the highest-scoring candidates, recruiters with a higher LinkedIn share (such as R1) start with a quality advantage, while those reliant on job boards (such as R4) face a more challenging applicant pool.

To evaluate whether recruiter identity predicts hiring outcomes, I model hire probability using a logistic regression with rec_id as a categorical predictor. Because rec_id represents distinct groups rather than an ordered variable, the model compares each recruiter with a baseline recruiter. The resulting coefficients therefore reflect relative differences in hire likelihood across recruiters after accounting only for recruiter identity.

Recruiter Predicting Hiring Success
Recruiter Coefficient Std. Error Z value P-value Odds Ratio Odds Increase (%)
R2 -0.420 0.207 -2.034 0.042 0.657 -34.302
R3 -0.357 0.207 -1.725 0.084 0.700 -30.017
R4 -1.243 0.260 -4.788 < .001 0.289 -71.142
R5 -1.158 0.271 -4.275 < .001 0.314 -68.584

Using R1 as the baseline, all other recruiters show lower odds of hiring a candidate. R2 and R3 perform moderately worse, with odds of hiring about 30 to 34 percent lower than R1. However, only R2’s difference is statistically significant at the 5 percent level, while R3’s result is not significant, suggesting its effect may be due to sampling variation. The largest differences appear for R4 and R5, whose odds of hiring are around 70 percent lower than R1. Both effects are highly significant, indicating strong evidence that these recruiters perform meaningfully worse in terms of hire conversion within this model.

To examine whether recruiter performance depends on the type of applicants they manage, I extend the logistic regression to include an interaction between recruiter and source. This allows the effect of a recruiter on hiring outcomes to differ across LinkedIn, job boards, and the careers page. Because both predictors are categorical, the model produces a complex set of coefficients that are not straightforward to interpret directly. I therefore use estimated marginal means (EMMs) to convert the model output into adjusted hiring probabilities for every recruiter–source combination. These adjusted probabilities make the interaction clear, showing how each recruiter performs across different sources and how each source performs within recruiters, while holding all other model components constant.
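Converting the model's log-odds scale to the adjusted probabilities reported below is the inverse-logit transform. A minimal sketch (the example log-odds value is illustrative, not taken from the fitted model):

```python
import math

def inv_logit(log_odds: float) -> float:
    """Convert log-odds to a probability (how EMMs are placed on the
    probability scale)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative values only
print(round(inv_logit(0.0), 3))    # 0.5
print(round(inv_logit(-1.69), 3))  # roughly the reported R1 x LinkedIn cell
```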

Model-adjusted hiring probabilities: Recruiter x Source
Recruiter Source Predicted Hire Prob. SE 95% CI Lower 95% CI Upper
R1 LinkedIn 0.156 0.018 0.124 0.194
R2 LinkedIn 0.111 0.019 0.078 0.155
R3 LinkedIn 0.124 0.023 0.085 0.178
R4 LinkedIn 0.043 0.016 0.020 0.087
R5 LinkedIn 0.075 0.018 0.046 0.119
R1 Job board 0.029 0.014 0.011 0.076
R2 Job board 0.033 0.014 0.014 0.076
R3 Job board 0.048 0.018 0.023 0.097
R4 Job board 0.023 0.009 0.011 0.048
R5 Job board 0.026 0.018 0.007 0.098
R1 Careers page 0.095 0.037 0.043 0.196
R2 Careers page 0.090 0.035 0.041 0.185
R3 Careers page 0.075 0.025 0.038 0.142
R4 Careers page 0.105 0.041 0.048 0.215
R5 Careers page 0.000 0.000 0.000 1.000

The estimated marginal means reveal clear differences in hiring probability across recruiter–source combinations, indicating that the effect of recruiter performance depends on the type of applicants they handle. LinkedIn consistently produces the strongest outcomes across most recruiters, with predicted hire probabilities ranging from 11% to 16% for R1 to R3. Job board candidates perform substantially worse, with predicted hire rates generally between 2% and 5% for all recruiters. Careers page outcomes vary more widely: R1 and R4 show comparatively stronger results with these candidates (around 10%), while R5 has a predicted hire probability of 0%. These patterns suggest that some recruiters are more effective with certain sources than others, rather than demonstrating uniformly high or low performance across the board.

After estimating the interaction between recruiter and source, it is useful to examine whether differences between sources are statistically meaningful within each recruiter. To do this, I conduct pairwise comparisons of adjusted hiring probabilities using the estimated marginal means. These comparisons test, for each recruiter individually, whether LinkedIn, job boards, and the careers page differ significantly in their likelihood of producing a hire. This provides a more granular view of how source quality varies within recruiters and helps identify where the largest performance gaps originate.

Pairwise source comparisons within recruiters
Recruiter Comparison Odds Ratio SE 95% CI Lower 95% CI Upper p-value
R1 LinkedIn / Job board 6.086 3.198 1.776 20.854 0.0017
R1 LinkedIn / Careers page 1.752 0.789 0.610 5.033 0.4262
R1 Job board / Careers page 0.288 0.191 0.061 1.367 0.1464
R2 LinkedIn / Job board 3.700 1.833 1.158 11.819 0.0226
R2 LinkedIn / Careers page 1.271 0.599 0.421 3.833 0.8670
R2 Job board / Careers page 0.343 0.214 0.080 1.484 0.2008
R3 LinkedIn / Job board 2.821 1.248 1.000 7.955 0.0500
R3 LinkedIn / Careers page 1.758 0.747 0.649 4.762 0.3803
R3 Job board / Careers page 0.623 0.333 0.178 2.179 0.6494
R4 LinkedIn / Job board 1.873 1.018 0.524 6.695 0.4808
R4 LinkedIn / Careers page 0.379 0.220 0.098 1.473 0.2148
R4 Job board / Careers page 0.202 0.117 0.052 0.782 0.0155
R5 LinkedIn / Job board 3.030 2.310 0.508 18.082 0.3130
R5 LinkedIn / Careers page 3437964.633 1141388374.429 0.000 Inf 0.9989
R5 Job board / Careers page 1134528.329 376658925.248 0.000 Inf 0.9990

The pairwise comparisons confirm that most recruiters show clear and statistically significant differences in hiring probability across sources. For R1 to R3, LinkedIn consistently outperforms both the job board and careers page, with odds ratios well above 1 and p-values indicating meaningful differences. These results reinforce the earlier finding that LinkedIn is the strongest-performing source across the recruiter group. For R4, the comparison between LinkedIn and the careers page shows less separation, reflecting the more mixed performance observed in the EMMs.

A notable edge case occurs for Recruiter R5. Several contrasts involving R5 produce confidence intervals ranging from 0 to infinity. This happens because no applicants from certain sources (particularly the careers page) were hired by R5 in the dataset. With zero observed hires, the model cannot estimate a stable odds ratio, leading to unbounded intervals. Rather than indicating genuine uncertainty, these intervals simply reflect that R5 made no hires from those sources, and therefore the true odds ratio cannot be estimated. Taken together, the pairwise results reinforce the pattern that LinkedIn is consistently the strongest source, while R5’s outcomes vary due to very low hiring counts rather than meaningful source effects.

To visualise the recruiter–source interaction more intuitively, I present a heatmap of the adjusted hiring probabilities for each combination. This plot provides a quick, at-a-glance summary of where hiring probabilities are highest and lowest across the matrix of recruiters and sources, complementing the numerical EMM table.

To formally evaluate whether the recruiter × source interaction improves the model, I compare the recruiter-only logistic regression with the interaction model using a likelihood ratio chi-square test. This comparison assesses whether allowing recruiter effects to vary across sources provides a statistically better fit to the data than treating recruiters and sources as independent predictors.

The model comparison showed that adding the recruiter × source interaction significantly improved model fit, χ²(10) = 59.68, p < .001. In practical terms, recruiter performance is not uniform across sources, and the interaction captures real differences in how effective each recruiter is with candidates from different channels.

3.3.3 Recruiter by Role

As an additional descriptive check, I also look at how job roles are distributed across recruiters. This helps provide context for their workloads but is not intended as a deeper analysis of role effects.

The role mix varies noticeably across recruiters. R1 and R5 focus more on technical and analytical roles, with around 35 to 40 percent of their workload in Software Engineer and a similar share in Data Analyst. R2 handles a larger share of HR Specialist roles (38 percent), while R3 and R4 have a relatively balanced mix across all four roles. These differences show that recruiters are not working with identical role portfolios, although all still manage candidates across the full set of roles.

3.4 Source performance

This section examines how applicant sources differ in volume and hiring success. Earlier analyses showed that candidate quality varies notably by source, with LinkedIn producing higher-scoring applicants, and that source also moderates recruiter performance. Assessing overall source performance here helps connect those patterns to actual hiring outcomes and clarifies which channels contribute most effectively to the recruitment process.

First, I look at how many candidates come from each application source.

LinkedIn accounts for 50% of the candidates in the dataset. Job boards provide the next largest share, at about 33% of the applicant pool, while careers pages supply the fewest candidates, accounting for the remaining 17%.

I next compare hiring rates across sources to see how differences in applicant source translate into final outcomes.

Source Hiring Outcomes
Source Candidates Hired Hire Rate
LinkedIn 1251 141 11.3%
Job board 813 25 3.1%
Careers page 436 26 6.0%

The pattern aligns closely with earlier findings. LinkedIn delivers both the largest volume of applicants and the highest hire rate (11.3%), reflecting its stronger candidate quality. Job boards produce a moderate number of applicants but a much lower hire rate (3.1%), while the Careers Page sits in between at 6%. These results reinforce the conclusion that LinkedIn is the most effective sourcing channel overall.

To test whether the differences in hire rates across sources are statistically meaningful, I conduct a chi-square test of independence on the source-by-hire contingency table.

Chi-Square Test for Differences in Hire Rates by Source
Statistic Degrees of Freedom P-value
48.88 2 < .001

The test shows a highly significant association between source and hiring outcome, indicating that the likelihood of being hired differs across applicant sources and is not due to random variation. This confirms that source quality meaningfully influences hiring success.
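The chi-square statistic can be reproduced directly from the hire counts in the Source Hiring Outcomes table (non-hires are simply the remaining candidates per source):

```python
# Observed counts from the Source Hiring Outcomes table: (hired, not hired)
observed = {
    "LinkedIn":     (141, 1251 - 141),
    "Job board":    (25,  813 - 25),
    "Careers page": (26,  436 - 26),
}

total = sum(h + nh for h, nh in observed.values())   # 2500 candidates
hired_total = sum(h for h, _ in observed.values())   # 192 hires

# Pearson chi-square: sum of (observed - expected)^2 / expected
chi2 = 0.0
for h, nh in observed.values():
    n = h + nh
    exp_hired = n * hired_total / total
    exp_not = n - exp_hired
    chi2 += (h - exp_hired) ** 2 / exp_hired + (nh - exp_not) ** 2 / exp_not

print(round(chi2, 2))  # 48.88, matching the reported statistic
```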

To identify which sources differ from each other specifically, I run pairwise proportion tests comparing hire rates between each pair of sources, with Bonferroni adjustment applied to control for multiple comparisons.

Pairwise Proportion Tests for Source (Bonferroni-adjusted P-values)
Source 1 Source 2 P-value
Job board LinkedIn < .0001
Careers page LinkedIn 0.0058
Careers page Job board 0.0629

The results show clear differences. LinkedIn has a significantly higher hire rate than both the Job Board and Careers Page (p < 0.01 for both comparisons). The comparison between the Careers Page and Job Board is not statistically significant after adjustment (p = 0.0629), suggesting their hire rates are more similar. Overall, the pairwise tests reinforce LinkedIn as the strongest-performing source.
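These pairwise results are consistent with two-proportion z-tests using a continuity correction and a Bonferroni multiplier (×3 for three comparisons). A sketch under that assumption (this mirrors the default behaviour of R's pairwise.prop.test, which the report does not explicitly name):

```python
import math

def prop_test_p(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with Yates continuity correction."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    correction = (1 / n1 + 1 / n2) / 2
    z = max(abs(p1 - p2) - correction, 0) / se
    # Two-sided p-value from the standard normal CDF via math.erf
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Careers page (26/436) vs Job board (25/813), Bonferroni-adjusted for 3 tests
p_adj = min(prop_test_p(26, 436, 25, 813) * 3, 1.0)
print(round(p_adj, 4))  # ~0.0629, matching the table
```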

3.5 Job Role Performance

This section examines how hiring outcomes differ across job roles. While earlier analyses focused on candidate quality and recruiter performance, looking at outcomes by role helps identify whether certain positions are inherently harder to fill.

The number of applicants differs noticeably across roles. Software Engineer and Data Analyst positions receive the largest volumes (781 and 731 candidates), while Marketing Associate attracts the fewest (460), with HR Specialist in the middle at 528. These differences reflect variation in applicant interest and labour-market supply rather than performance outcomes.

Job Role Hiring Outcomes
Role Candidates Hired Hire Rate
Software Engineer 781 54 6.9%
HR Specialist 528 41 7.8%
Data Analyst 731 65 8.9%
Marketing Associate 460 32 7.0%

Hire rates differ only modestly across roles. Data Analyst has the highest rate at 8.9 percent, followed by HR Specialist at 7.8 percent. Software Engineer and Marketing Associate roles show slightly lower rates at around 7 percent. These differences are relatively small, suggesting that no role is substantially harder or easier to convert into hires within this dataset.

To assess whether these differences in hire rates across roles are statistically meaningful, I run a chi-square test of independence.

Chi-Square Test for Differences in Hire Rates by Job Role
Statistic Degrees of Freedom P-value
2.505 3 0.4743

The test shows no significant association between job role and hiring outcome, indicating that the small differences in hire rates are consistent with random variation. In practical terms, roles differ in applicant volume but not in their likelihood of producing a hire.

3.6 Predictive Modelling

The final stage of the analysis uses predictive modelling to assess how well the variables examined so far explain hiring outcomes and to identify which factors contribute most strongly to predicting a hire. I begin with a hierarchical logistic regression model, which quantifies the incremental variance explained by scores, recruiters, sources, and job roles. This provides a clear measure of the relative contribution of each factor to hiring probability.

I then apply two machine-learning approaches, a decision tree and a random forest, to explore potential nonlinear patterns and interactions that logistic regression may not capture. These models offer an alternative view of variable importance and allow visual interpretation of decision pathways and partial dependence effects.

The section concludes with a comparison of model accuracy on a held-out test set, providing a practical assessment of how well each method generalises to unseen candidates.

3.6.1 Hierarchical Logistic Regression

To quantify how much each predictor contributes to explaining hiring outcomes, I use hierarchical logistic regression. Predictors are added in steps, beginning with assessment score and then incrementally adding recruiter, source, and job role. This approach shows how much additional variance each factor explains beyond the variables already included, providing a clear view of their relative importance in predicting the likelihood of being hired.

Hierarchical Logistic Regression
Step Model R² ΔR² Δχ² df P-value
Score 0.077 0.077 81.400 1 < .001
Recruiter 0.102 0.025 27.250 4 < .001
Source 0.109 0.008 8.331 2 0.0155
Job Role 0.112 0.002 2.462 3 0.4822

The hierarchical model shows that assessment score is the strongest single predictor, explaining 7.7% of the variance in hiring outcomes. Adding recruiter identity provides a further 2.5% improvement, indicating meaningful differences in recruiter effectiveness. Source contributes an additional 0.8%, a smaller but still statistically significant improvement. Job role adds very little and is not significant, suggesting that roles differ in volume but not in their ability to predict hiring. Overall, most of the model’s explanatory power comes from candidate score and recruiter effects.
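The fit statistics in this table can be derived from each model's log-likelihood. Assuming McFadden's pseudo-R² (the report does not state which pseudo-R² is used), R² is the proportional reduction in negative log-likelihood relative to the null model, and Δχ² is twice the log-likelihood gain between successive nested models. A sketch with illustrative log-likelihood values (not the report's actual values):

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo-R^2: proportional reduction in negative log-likelihood."""
    return 1.0 - ll_model / ll_null

def lr_chisq(ll_full, ll_reduced):
    """Likelihood-ratio chi-square statistic for two nested models."""
    return 2.0 * (ll_full - ll_reduced)

# Illustrative log-likelihoods for the nested model sequence (hypothetical values)
ll = {"null": -530.0, "score": -489.3, "recruiter": -475.7,
      "source": -471.5, "job_role": -470.3}

print(round(mcfadden_r2(ll["score"], ll["null"]), 3))    # R^2 after adding score
print(round(lr_chisq(ll["recruiter"], ll["score"]), 2))  # delta chi-square for the recruiter step
```

Each Δχ² is tested against a chi-square distribution whose degrees of freedom equal the number of parameters added at that step (e.g., four dummy variables for the five recruiters).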

3.6.2 Decision Tree

To explore potential nonlinear relationships and interaction effects that logistic regression may not capture, I fit a decision tree model to predict hiring outcomes. Decision trees split the data into progressively more homogeneous groups based on predictor variables, providing an intuitive, visual representation of how different factors combine to influence hiring decisions. This approach helps identify the key variables driving the model’s decisions and highlights any threshold effects in candidate score or differences across recruiters, sources, or roles.

The decision tree shows that candidate score is the primary driver of hiring predictions. The very first split occurs at a score of 71, indicating that candidates scoring below this threshold are overwhelmingly predicted not to be hired. This aligns with earlier findings that score is the strongest predictor of hiring outcomes.

Among higher-scoring candidates, the next important factor is recruiter. Candidates scoring 71 or above but handled by R2, R4, or R5 are generally less likely to be hired than those handled by R1 or R3. This reflects the recruiter differences observed in the logistic regression models.

Within these subsets, the model uses additional score thresholds and job role to refine classifications. For instance, among higher-scoring candidates assigned to stronger-performing recruiters (such as R1 and R3), another key split occurs around a score of 87, where only the highest-scoring candidates are predicted to be hired. Job role plays a smaller but still visible role, appearing only in downstream splits when the model fine-tunes predictions for borderline cases.

Notably, only a few terminal nodes predict “Hired,” which reflects the low base rate of hiring in the dataset. The tree shows that being hired typically requires meeting multiple favourable conditions: a sufficiently high score, being handled by certain recruiters, and in some paths belonging to certain roles.

Overall, the tree confirms three patterns:

  • Score is by far the most influential variable, with clear threshold effects.
  • Recruiter differences matter, consistent with earlier regression findings.
  • Job role contributes modestly, mostly as a fine adjustment after score and recruiter effects have been accounted for.

Together, this provides a visual summary of how the model combines these factors to classify hiring outcomes.

To better understand how the decision tree makes its classifications, I examine the variable importance measures generated by the model. In a decision tree, importance reflects how much each predictor contributes to reducing impurity (e.g., Gini impurity) across the splits where it is used. Variables that produce larger reductions in impurity, especially near the top of the tree, are considered more influential in determining candidate progression or hiring outcomes.
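The impurity-reduction idea can be sketched directly. Below, a hypothetical split at score ≥ 71 (echoing the tree's first split, with made-up node counts) is scored by how much it reduces weighted Gini impurity:

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_gain(parent, left, right):
    """Weighted impurity reduction achieved by splitting parent into left/right."""
    n = sum(parent)
    return (gini(parent)
            - sum(left) / n * gini(left)
            - sum(right) / n * gini(right))

# Hypothetical split at score >= 71; counts are [not hired, hired]
parent = [900, 100]
left = [700, 10]    # score < 71
right = [200, 90]   # score >= 71
print(round(split_gain(parent, left, right), 4))
```

A variable's overall importance in the tree is, roughly, the sum of such gains over every split that uses it, which is why a variable used high in the tree (where nodes are large) tends to rank highest.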

Assessing variable importance helps clarify not only which factors the model prioritises when forming decisions, but also whether these align with operational expectations. This provides an interpretable link between the model structure and the underlying drivers shaping recruitment outcomes.

Decision Tree Variable Importance
Predictor Overall
score 15.600
rec_id 4.916
job_role 4.378
source 0.551

The decision tree indicates that score is by far the most influential predictor in determining progression or hiring outcomes. With an importance value of 15.600, it contributes the greatest reduction in impurity across the tree’s splits, meaning candidate scores are the primary factor driving the model’s decisions.

3.6.3 Random Forest

To complement the decision tree and obtain a more robust estimate of variable importance, I also fit a random forest model. Random forests average predictions over many decision trees built on bootstrapped samples, which reduces overfitting and provides more stable importance rankings. This method captures nonlinear relationships and interactions while offering a clearer view of which predictors consistently contribute to accurate hiring predictions.
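The bagging idea behind a random forest can be sketched in miniature. In this stdlib-only toy (not the fitted model), each "tree" is reduced to a one-split stump on a synthetic score variable, trained on its own bootstrap replicate, and the forest classifies by majority vote over the stumps:

```python
import random

random.seed(0)

def bootstrap_sample(data):
    """Resample the data with replacement: one bootstrap replicate."""
    return [random.choice(data) for _ in data]

def stump_predict(threshold, score):
    """A trivial stand-in 'tree': predict hire iff score >= threshold."""
    return int(score >= threshold)

# Toy (score, hired) pairs, perfectly separable at score 71
data = [(s, int(s >= 71)) for s in range(40, 100)]

# Fit many stumps, each choosing its threshold on its own bootstrap replicate
thresholds = []
for _ in range(100):
    sample = bootstrap_sample(data)
    best = max(range(40, 100),
               key=lambda t: sum(stump_predict(t, s) == y for s, y in sample))
    thresholds.append(best)

def forest_predict(score):
    """Majority vote over the ensemble of stumps."""
    votes = sum(stump_predict(t, score) for t in thresholds)
    return int(votes > len(thresholds) / 2)

print(forest_predict(65), forest_predict(90))
```

Because each stump sees a slightly different sample, the learned thresholds vary, and averaging the votes smooths out the instability of any single tree; the same mechanism underlies the more stable importance rankings reported below.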

Random Forest Variable Importance
Predictor Importance
score 57.920
rec_id 16.578
job_role 13.771
source 8.128

The random forest results again highlight assessment score as the dominant predictor, with an importance value far exceeding the other variables. Recruiter and job role follow at a much lower level, contributing moderate predictive value, while source ranks lowest with relatively limited influence.

Partial dependence plots show how a single predictor influences model predictions while averaging over all other variables in the dataset. In this analysis, I computed the partial dependence of assessment score using the trained random forest model. This was done by generating a grid of score values, repeatedly predicting hiring probability while holding rec_id, source, and job_role at their observed values, and then averaging the predicted probabilities at each score level. This approach isolates the model’s marginal effect of score on hiring likelihood, independent of other factors.
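The computation described above can be expressed compactly. This sketch uses a hypothetical stand-in predictor (a step function of score with a small recruiter adjustment, purely illustrative) in place of the trained forest:

```python
def partial_dependence(predict, rows, feature, grid):
    """For each grid value, force `feature` to that value in every row,
    predict, and average: the marginal effect of `feature`."""
    curve = []
    for g in grid:
        preds = [predict({**row, feature: g}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

# Illustrative stand-in for the trained model: hire probability rises in
# steps with score, plus a small bump for recruiters R1 and R3.
def toy_predict(row):
    base = 0.02 if row["score"] < 65 else 0.15 if row["score"] < 85 else 0.40
    return base + (0.05 if row["rec_id"] in ("R1", "R3") else 0.0)

rows = [{"score": 50, "rec_id": "R1"}, {"score": 72, "rec_id": "R2"},
        {"score": 90, "rec_id": "R4"}]
pd_curve = partial_dependence(toy_predict, rows, "score", [60, 70, 80, 90])
print(pd_curve)
```

Note that other features keep their observed values in each row; only the feature of interest is overridden, which is what makes the averaged curve a marginal effect rather than a single candidate's profile.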

The resulting plot indicates that score has a generally positive relationship with predicted hire probability, although the increase is not perfectly smooth. Random forests model effects using threshold-based splits, so partial dependence often displays small fluctuations rather than a monotonic curve. Despite these local variations, the overall pattern is clear: predicted probabilities remain low and flat at the lower end of the score range, begin to rise from around the mid-60s, and reach their highest levels for candidates scoring above 85. This reinforces that score is the model’s strongest predictor of hiring outcomes, even though the effect is expressed in a step-wise rather than linear form.

3.6.4 Model Comparison

To evaluate the predictive performance of the two machine learning models used in this analysis, I compare their classification accuracy on the same held-out test set. The goal is not to determine an absolute “best” model, but to assess whether the more complex random forest meaningfully improves predictive accuracy over the simpler decision tree. This comparison provides a practical check on whether increased model complexity results in better generalisation and helps guide which model is more suitable for interpreting hiring outcomes in this dataset.

Model Comparison, Predictive Accuracy (Test Set)
Model Accuracy
Decision Tree 0.916
Random Forest 0.927

The results show that both models achieve high accuracy, with the decision tree correctly predicting 91.6 percent of hiring outcomes and the random forest improving slightly to 92.7 percent. The difference is small, but the random forest performs better, reflecting its ability to capture more nuanced patterns by averaging across many trees. In practice, this suggests that while the decision tree provides clearer interpretability, the random forest offers marginally stronger predictive power. Both models perform well overall, indicating that the key predictors identified earlier meaningfully explain hiring outcomes.
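The accuracy metric used for this comparison is simply the share of test-set candidates whose outcome the model predicts correctly. A minimal sketch with made-up labels (not the report's test set):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy held-out labels and predictions from two hypothetical models
y_true      = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
tree_pred   = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]
forest_pred = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
print(accuracy(y_true, tree_pred), accuracy(y_true, forest_pred))
```

With a low hire base rate, plain accuracy can flatter a model that rarely predicts "Hired", so the comparison here is best read as relative (forest vs. tree on the same split) rather than as an absolute measure of quality.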

4 Summary

This analysis examined every stage of the recruitment funnel to identify where candidates were most likely to drop out and which factors most strongly influenced hiring outcomes. Assessment scores emerged as a meaningful predictor of progression, while recruiter differences and applicant source played significant roles in shaping hiring probabilities. Job role, in contrast, explained relatively little variation. The hierarchical logistic regression quantified these contributions, showing that score and recruiter effects accounted for most of the explained variance. Predictive modelling using decision trees and random forests confirmed these findings, with both models performing well and the random forest offering slightly higher accuracy. Taken together, the results provide a coherent picture of a selection process driven primarily by candidate quality and recruiter effectiveness, moderated by differences in applicant source.