How Google Proved Managers Matter by Measuring Situations, Not Personalities

Google tried to prove managers do not matter. Instead, they discovered that technical expertise - the trait most companies hire for - ranked dead last.

In 2002, Larry Page and Sergey Brin ran an experiment. They eliminated all engineering managers at Google. Every engineer would report directly to a VP. No middle management. No hierarchy. Pure engineering meritocracy. The experiment collapsed within months.

Seven years later, Google's People Analytics team asked the question again - not as an ideological experiment this time, but as a data problem. The research project they launched, called Project Oxygen, would fundamentally challenge how organisations think about what makes someone effective in a role. And the finding that mattered most was the one nobody expected.

Deliberately trying to prove managers do not matter

This is the detail that makes Project Oxygen credible in a way that most corporate research is not. The team - led by Prasad Setty, VP of People Analytics, and researchers Neal Patel and Michelle Donovan - did not set out to identify what makes a good manager. They set out to prove that managers are unnecessary.

They knew Google's culture. Engineers distrusted management on principle. If the research team had started from the premise that managers matter, the results would have been dismissed as self-serving HR propaganda. So they began with the null hypothesis: managers do not affect team outcomes. Prove it with data.

They gathered over 10,000 observations across more than 100 variables - performance reviews, employee surveys, 360-degree feedback, nominations for top-manager awards, and over 400 pages of double-blind qualitative interview notes. The analysis was rigorous enough to satisfy a company where statistical literacy is a baseline expectation.

The null hypothesis failed. Managers mattered enormously. Teams with great managers consistently outperformed teams with poor managers on every metric Google tracked - productivity, retention, satisfaction, and innovation. But what made a manager great was where the real insight lived.

Technical expertise ranked last

Project Oxygen identified eight key behaviours that distinguished the best managers from the worst. The research team ranked them by impact. At the bottom of the list - position number eight out of eight - was “has key technical skills that help advise the team.”

At Google. The company that had built its entire identity on technical excellence. The company where engineers evaluated each other on the quality of their code. The company that used to require candidates to solve algorithmic brainteasers in interviews. At that company, the data said technical skill was the least important predictor of management effectiveness.

What ranked at the top? Being a good coach. Empowering the team rather than micromanaging. Expressing genuine interest in team members' well-being and success. Being a good communicator who listens. These are not personality traits. They are observable behaviours that occur in specific situations - in one-on-one meetings, during project reviews, when a team member is struggling, when priorities shift.

As Laszlo Bock, then SVP of People Operations, explained in the Harvard Business Review, the approach worked because the attributes were about them, created by them, and designed for their specific context - not imported from a generic leadership model.

The death of the brainteaser

Project Oxygen was not Google's only revolution in people assessment. Around the same period, the company overhauled its hiring process after discovering that its famous brainteaser interviews - questions like “How many golf balls fit in a school bus?” - had zero statistical correlation with job performance.

Google had been using these questions for years, believing they measured creative problem-solving and cognitive ability. When Bock's team actually tested that assumption against performance data, the correlation was not weak. It was nonexistent. The brainteasers were measuring a candidate's ability to answer brainteasers - nothing more.

Google replaced them with structured interviewing: identical questions for every candidate, graded on standardised rubrics by trained interviewers. They discovered what they called the Rule of Four - that four structured interviews predicted hiring success with 86% confidence. This eliminated the marathon interview loops of 15 to 25 sessions that had previously been standard.

The pattern is the same in both cases. Generic, decontextualised assessment (personality labels, brainteaser performance, trait inventories) was replaced by structured, situational evaluation (how does this person behave in the specific situations this role requires?). And the situational approach won decisively.

What this reveals about traditional psychometrics

Google's findings map directly onto the broader research on personality-based assessment. The landmark 2022 meta-analysis by Sackett, Zhang, Berry, and Lievens found that the predictive validity of personality tests for job performance had been overestimated for decades. Even Conscientiousness - the single strongest personality predictor - explains roughly 3.6% of variance in actual performance.
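That 3.6% figure follows directly from the arithmetic of validity coefficients: variance explained is the square of the predictor-outcome correlation. A minimal sketch, assuming the roughly 0.19 corrected validity for Conscientiousness reported in the Sackett et al. meta-analysis:

```python
# Variance explained by a predictor is the square of its
# validity coefficient (the correlation with the outcome): r^2.
r_conscientiousness = 0.19  # approx. corrected validity, Sackett et al. (2022)

variance_explained = r_conscientiousness ** 2
print(f"{variance_explained:.1%}")  # → 3.6%
```

In other words, even the best personality predictor leaves more than 96% of performance variance unexplained - which is why behaviour-in-context outperformed trait labels in Google's data.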

Google proved this empirically within their own organisation. Their best engineers were not necessarily their best managers. Their most cognitively gifted hires were not necessarily their highest performers. The traits that conventional assessment elevated - cognitive ability, technical expertise, confidence - were either weakly predictive or actively misleading when applied to the wrong role context.

What predicted effectiveness was behavioural. How someone actually responded when a team member needed coaching. How they communicated when priorities changed. Whether they created psychological safety or demanded compliance. These are not fixed personality dimensions. They are situational responses that vary based on the pressures of the specific role.

Google did not discover new traits that predict success. They discovered that traits are the wrong unit of measurement. What matters is behaviour in context.

The results were measurable

Project Oxygen was not a theoretical exercise. Google used the findings to redesign manager training, feedback systems, and promotion criteria. By 2012, median manager favourability scores had risen from 83% to 88%. More importantly, the company achieved statistically significant improvement in 75% of their worst-performing managers - the population where impact matters most.

They later expanded the framework to Project Aristotle, which studied team effectiveness and found that psychological safety - not talent composition, not cognitive ability, not personality mix - was the single strongest predictor of team performance. Again: a situational condition, not a trait profile.

Why this matters for every organisation

Most companies are not Google. They do not have a dedicated People Analytics team with data scientists and a decade of performance data. But the principle Google validated is universal: what predicts success in a role is how someone behaves under the specific pressures of that role, not what personality label they carry into it.

The implication for assessment is profound. If you are evaluating a candidate for a management role and your primary tools measure personality traits and cognitive ability, you are measuring the two things Google's data showed matter least. If your assessment does not test how the candidate responds to realistic scenarios drawn from the actual pressures of the role - team conflict, priority shifts, underperformance, ambiguous authority - you are making the decision with the least predictive information available.

Google invested millions of dollars and years of research to arrive at a conclusion that is simple to state and difficult to act on: stop labelling people and start testing how they respond to the situations they will actually face.

Further Reading

How Google Sold Its Engineers on Management - Harvard Business Review

This analysis is part of our ongoing research into defensible people decisions. PERSONA is the assessment platform we built from six years of this research to help organizations test how people respond to the specific pressures of a role, not label them with fixed traits.