Your Most Confident Teams Are Your Biggest Risk
Research from the Collective Intelligence Labs at the Stockholm School of Economics shows that without structured reflection, team coordination doesn’t improve with experience — it erodes. And the teams most at risk can’t see it happening.
There’s a common assumption in how organizations think about teams: that working together makes teams better at working together. Experience builds intuition. Repetition builds cohesion. Time together, over enough iterations, produces something greater than the sum of individual capability.
Research from the Collective Intelligence Labs at the Stockholm School of Economics suggests this assumption is wrong — and that the failure mode it creates is more dangerous than most leaders realize.
The study
Researchers Philip Runsten and Andreas Werr studied 50 knowledge-intensive teams across 22 international and Swedish organizations — both public and private — over approximately two months. Half the teams received a structured digital intervention, a self-guided team debrief designed to make coordination behaviors visible and reflectable. The other half served as controls. They just kept working.
Both groups were assessed on performance (self-rated and observer-rated) and on a set of knowledge integration variables: how well team members understood their task, how effectively they brought diverse expertise into discussions, and whether the psychological safety existed to surface disagreements and learn from mistakes.
Knowledge integration: three layers
The framework Runsten and Werr used breaks knowledge integration into three layers. Representation: does the team share a clear, complete understanding of the task and who knows what? Integration: do members actually bring their expertise to bear, coach each other, challenge assumptions? Reflection: does the team learn from what happened — and is it safe enough to be honest about what went wrong?
Finding one: teams degrade without structured reflection
The control teams — the ones that simply continued working together without any structured reflection on their coordination — didn't hold steady. They declined. Observer-rated performance dropped by roughly 6.5% over the roughly two-month study period. Self-rated performance declined by about 4.6%.
This isn’t noise. This is measurable degradation in teams that, by the organization’s standards, were functioning normally. No intervention. No disruption. Just the passage of time and the quiet erosion of coordination quality that nobody was watching.
The intervention teams, by contrast, improved by approximately 15% (self-rated) to 21.5% (observer-rated). And the mechanism wasn’t additional training, new processes, or leadership coaching. It was a structured digital debrief — a tool that made coordination behaviors visible so teams could reflect on them.
The gap between the two groups isn’t subtle. It suggests that the default trajectory for teams is not improvement through experience. It’s decay through inattention.
Finding two: the “second best” phenomenon
This is where the research becomes unsettling.
When Runsten and Werr segmented the intervention teams by performance, an unexpected pattern emerged. The top-performing teams improved. The lower-performing teams improved. But a cluster of teams ranked just below the top — the “second best” — actually got worse after the intervention.
Their profile before the intervention is telling. These teams had weaker understanding of their own task than both the top performers and the broader group. They scored lower on integration behaviors — meaning less peer coaching, less expertise surfacing, fewer substantive challenges to each other’s thinking. They had lower psychological safety. And critically, they consistently rated their own performance higher than outside observers did — by a margin of 0.36 units that held constant before and after the intervention.
Despite all of this, they were producing decent output. From the outside, these teams looked fine. They weren’t flagged. They weren’t struggling in any visible way. They were the teams leadership would have the least reason to worry about.
But underneath the surface, their coordination was thinner than it appeared. Individual competence — or familiar task patterns, or organizational momentum — was carrying the performance, not the quality of the team’s actual collaboration.
The catch-22
Here’s what makes the “second best” finding genuinely concerning: when the intervention made coordination behaviors visible, these teams didn’t improve. They resisted.
The study references Chris Argyris’s work on defensive routines in organizations. Teams composed of high-performing individuals who are unaccustomed to examining their own coordination develop a particular kind of fragility. They avoid sensitive topics. They maintain polite surfaces. They hold individual interpretations intact rather than truly integrating knowledge. Reflection feels threatening because what it reveals contradicts their self-image.
One member of a “second best” team summarized their experience: “My strongest impression is that we as a team already are very good at many of the things this study wants to demonstrate, which is positive!” Another reported that the intervention “created some discontent and irritation during the boost-sessions.” The self-overrating — that persistent 0.36-unit gap between how they scored themselves and how observers scored them — didn’t change at all. Before the intervention: overrated. After: still overrated by exactly the same margin.
Compare this to the top-performing teams, who actually rated themselves lower than observers did — and whose self-assessment accuracy improved over the study period. The best teams were humble about their coordination. The “second best” teams were confident about theirs. The confidence was unfounded.
What this means for organizations
The immediate implication is that the teams most at risk of invisible coordination failure are the ones least likely to seek help, least likely to accept feedback, and least likely to believe they have a problem. They’re performing well enough that nobody intervenes. They’re confident enough that they resist when someone does.
But there’s a deeper risk here.
These teams are succeeding on familiar terrain. Their output is adequate because the tasks are within range — routine enough that individual competence and established patterns can compensate for weak integration. The coordination debt doesn’t show up in the output. Not yet.
The moment the task becomes genuinely ambiguous — a novel strategic decision, a cross-functional tradeoff with no clear precedent, a situation where no single person holds enough context to decide alone — those teams have nothing to fall back on. The integration behaviors aren’t practiced. The psychological safety to say “I don’t understand” or “I disagree” hasn’t been built. The shared understanding of the task is thinner than anyone on the team realizes.
And the self-assessment blindness means the team won’t see the problem coming. They’ll enter the high-stakes moment with the same confidence they’ve always carried, unaware that their coordination has been quietly degrading underneath performance that looked fine.
This is how organizations produce decisions that, in retrospect, seem inexplicable. Not from obviously dysfunctional teams, but from capable teams whose coordination was never measured, never made visible, and never structurally supported.
Coordination is observable
What the Runsten and Werr study demonstrates — across all team segments, not just the “second best” — is that coordination quality is not a byproduct of time together or individual talent. It’s a structural property with identifiable components: shared task representation, integration behaviors, reflective capacity, psychological safety. These components can be observed, measured, and developed.
But only if something makes them visible.
The control teams degraded because nothing was watching. The “second best” teams resisted because what they saw was uncomfortable. The top teams improved because they were already inclined to look honestly at their own coordination — and the structured reflection gave them a mechanism to do so.
The study also offers a striking validation of digital tooling: the self-guided digital debrief achieved performance improvements comparable to facilitator-led interventions, which typically show 20-25% gains. You don't need a consultant in the room. You need a reliable mechanism for making coordination quality visible to the team itself.
At Growth Wise, this is the problem we are working on. We call it decision reliability infrastructure — instrumenting the coordination layer so that the structural quality of how teams decide together becomes observable, measurable, and improvable. Not replacing frameworks or processes, but showing whether the coordination those frameworks depend on is actually happening.
The research suggests this isn’t optional. Without it, teams degrade by default. And the ones most confident they don’t need it may be the ones who need it most.
Reference
Runsten, P. and Werr, A. (2020). Knowledge Integration and Team Performance — The Effect of a Digitally Supported Team Debrief. Collective Intelligence Labs, Stockholm School of Economics.
Frequently Asked Questions
Do teams improve by working together over time?
Research from the Collective Intelligence Labs at the Stockholm School of Economics found that teams without structured reflection on their coordination actually declined in performance over approximately two months. Observer-rated performance dropped by roughly 6.5%. The assumption that experience alone builds better teamwork was not supported.
What is the “second best” team phenomenon?
The “second best” phenomenon describes teams that perform just below top performers while carrying significant hidden coordination weaknesses. These teams had weaker task understanding, lower integration behaviors, and lower psychological safety — but produced decent output. They consistently overrated their own performance relative to outside observers and resisted interventions designed to improve coordination.
Why are confident teams a coordination risk?
Teams that overrate their coordination quality are unlikely to seek help, unlikely to accept feedback, and unlikely to believe they have a problem. Their decent output masks structural weaknesses in how they integrate knowledge and make decisions together. When faced with genuinely ambiguous or high-stakes decisions, they lack the integration behaviors and psychological safety needed for effective participatory decision-making.
How can organizations detect invisible coordination problems?
Coordination quality has identifiable structural components — shared task representation, integration behaviors, reflective capacity, and psychological safety — that can be observed and measured. The research shows that digital instrumentation can achieve performance improvements in the range of 15-21.5% by making these coordination behaviors visible to teams. Without external measurement, teams default to self-assessment, which the research shows is unreliable for the teams most at risk.
What is decision reliability infrastructure?
Decision reliability infrastructure instruments the coordination layer so that the structural quality of how teams decide together becomes observable, measurable, and improvable. Rather than replacing frameworks or processes, it shows whether the coordination those frameworks depend on is actually happening — addressing the gap between documented process and actual team behavior that research identifies as a key risk factor.