Paper Critique: Towards a Theory of Software Development Expertise

Towards a Theory of Software Development Expertise. Sebastian Baltes and Stephan Diehl.
Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018).

I remember when I was just starting out as a software developer. I had moments of complete certainty about my utter expertise in the profession. I also had moments of complete despair that I had any clue at all about what I was doing. Reading books like The Pragmatic Programmer helped me get a sense of what it meant to grow in expertise. This paper by Baltes and Diehl tackles this same question with a Grounded Theory approach.


Adrian Colyer wrote his own summary on his excellent blog, The Morning Paper, which I suggest you go and read after you make it through mine. I considered simply linking to his summary and skipping straight to the critique, but I want to provide a summary that fits directly with my critique.

What we have here is a mixed methods study (there is qualitative as well as some quantitative analysis of the data) on the topic of software development expertise. Specifically, the authors are interested in how expertise in software development is built.

With this paper, we contribute a theory that describes central properties of SDExp and important factors influencing its formation. Our goal was to develop a process theory, that is a theory intended to explain and understand “how an entity changes and develops” over time. In our theory, the entities are individual software developers working on different software development tasks, with the long-term goal of becoming experts in those tasks.

Grounded Theory

They investigate this question by conducting an online survey of StackOverflow and Github users, coding the responses to open-ended questions, extracting categories, building a theory of those categories, and then augmenting this theory with theories from research into expertise and expert performance.

Three different online surveys were conducted to collect data, each targeting a different group. The first survey (122 responses, 12% response rate) provided the data that resulted in the first version of the theory. The latter two surveys were designed as follow-ups to two different groups: active Java developers (127 responses, 8% response rate) and very experienced developers (86 responses, 10% response rate).

The coding and categorising exercise for the first survey resulted in the following initial theory.

The next step of the authors’ research was to augment the theory that they had constructed with existing theories and concepts from expertise research. That resulted in the much larger model below.

The theory that they are putting together is a process theory. It describes how expertise develops over time rather than explaining a particular outcome as a connection between independent and dependent variables (which would be a variance theory).

…our goal is not to treat performance as a dependent variable that we try to explain for individual tasks, we rather consider different performance monitoring approaches to be a means for feedback and self-reflection.

Other concepts from the original theory were modified in the following ways:

  • Behavior => Individual differences + Behavior. Individual differences capture the personality and cognitive traits of the person who is acquiring expertise.
  • Work context => Task + Task Context. This captures that individual behaviours are specific to a particular task and the context of that task rather than to all work that could be done.
  • Knowledge => Task specific knowledge + General knowledge
  • Experience => Task specific experience + General experience

At this point, I believe that the theory has become ungrounded in the data. The new theory, in the words of the authors, “instead of focusing on source code, it introduces the general concept of performance as a result of having a certain level of expertise.” This and other new concepts do not seem to be tied back to the open coding of the survey responses.

The next phase was meant to validate and expand the theory further. For this, they surveyed two further groups: active Java developers and very experienced developers. In these new surveys, they asked what character traits people associated with being experts (“In your opinion, what characterizes a software development expert?”) and what supports becoming an expert (“What character traits or behaviors do you think are supportive for becoming a software development expert?”). Note that the questions are not reported in the article but can be found in the supplementary materials available online. The new surveys also expanded on the respondents’ self-assessment of expertise. I’ll discuss the self-assessment results in the next section.

The result of this phase was to elaborate on the makeup of the various concepts. For instance, the Personality concept consists of Openness, Agreeableness, and Conscientiousness. The coding of the responses appears to re-ground the theory in the data as, for example, the Performance concept is linked again in the responses to code quality. The authors nicely provide quotes to support their coding of the responses. This brings their coding to life.

Experience and Expertise

This analysis investigated how years of experience, an unguided self-assessment of expertise, and a self-assessment of expertise guided by descriptions of the levels relate to one another.

From the observed correlations, we cannot draw consistent conclusions that are valid for all three samples and for both types of experience (general and Java). Our interpretation of these results is that, depending on the background of the participants, experience in years can or cannot be a valid proxy for (self-assessed) programming expertise.

Which is to say that experience in years is not generally a valid proxy: it is only dependable if your sample clearly comes from a specific group, in this case the less experienced group.
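To make the kind of correlation check described above concrete, here is a minimal sketch of a Spearman rank correlation between years of experience and self-rated expertise. The data points are invented for illustration, and a hand-rolled stdlib implementation stands in for whatever statistics package the authors actually used.

```python
from statistics import mean
from typing import List, Sequence


def ranks(xs: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical data: years of experience vs. self-rating on a 7-point scale.
years = [1, 3, 5, 8, 12, 20, 25]
rating = [2, 3, 5, 5, 6, 6, 5]
print(f"Spearman rho = {spearman(years, rating):.2f}")
```

Note how the made-up data illustrates the paper's point: the rating plateaus (and even dips) for the most experienced respondents, so the monotonic relationship is strong among juniors but weaker overall.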

The next question is whether people’s self-assessments are influenced by the guidance of the question. Participants were asked for their self-assessment at two different points in the survey. The first one was on a simple 7-point Likert scale from novice to expert. The second was on a 5-point scale with descriptions based on the Dreyfus model.

The result of these different questions was that the less experienced (by years) developers judged their expertise to be the same (statistically) across the two rating scales, while the more experienced developers increased their self-assessed expertise when the Dreyfus model descriptions of expertise were given.
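The comparison above can be sketched as a paired test on the two rating scales. The numbers below are invented, the scale normalisation is my own choice, and an exact sign test stands in for whatever paired test the authors actually ran; the point is only to show the shape of the analysis.

```python
from math import comb
from typing import List


def normalize(rating: int, scale_max: int) -> float:
    """Map a 1..scale_max rating onto [0, 1] so different scales are comparable."""
    return (rating - 1) / (scale_max - 1)


def sign_test(deltas: List[float]) -> float:
    """Exact two-sided sign test: p-value for the null hypothesis that
    positive and negative shifts are equally likely (ties are dropped)."""
    pos = sum(1 for d in deltas if d > 0)
    neg = sum(1 for d in deltas if d < 0)
    n, k = pos + neg, min(pos, neg)
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired self-assessments:
# (7-point Likert rating, 5-point Dreyfus-style rating) per respondent.
ratings = [(5, 4), (6, 5), (4, 4), (6, 5), (5, 5), (7, 5), (4, 3), (6, 5)]
deltas = [normalize(d, 5) - normalize(l, 7) for l, d in ratings]
p = sign_test(deltas)
print(f"two-sided sign test p-value: {p:.3f}")
```

With this invented sample, most respondents shift upward on the described scale, so the test flags a systematic difference; in the paper, that pattern held only for the more experienced group.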


The authors lay out a clear goal for their research and follow through on it with their construction of a model of interacting concepts to describe the process of growing software development expertise. I’m going to stick to analysing the qualitative parts of this paper (the theory building) and leave aside the quantitative parts (the investigation of self-assessment), as I don’t want to get bogged down in too many different ways of looking at the paper.

Rigour and Quality

I do not believe that what the authors performed was actually Grounded Theory. The result is a model that does not succinctly explain a phenomenon and may not be well grounded in the data. They also provide little evidence of searching for theoretical saturation (the point at which the theory’s proposed and implied concepts are all covered and new data adds nothing new) as the stopping criterion for their theoretical sampling (looking for cases that can modify the theory as it is being built).

The first stumbling block for the researchers is their reliance on one-off online surveys to collect data. The survey format allowed for a large number of responses but not for a deep investigation into the way that each participant thought about expertise. The depth is limited to whatever level the individuals wanted to reach, and the format removed any ability of the researchers to probe further. I believe that this shows up in how anaemic the phase 1 model was.

The next departure was how radically different the phase 2 model was from the phase 1 model. The result appears to be a grounded theory that has been more overrun by the literature review and integration of existing theories than one that has been enriched by it.

The final departure is on the principle of parsimony. The final theory has 10 high-level concepts and 49 sub-concepts, plus a number of connections, annotations, and clarifications. I would like to have seen more distillation to reach a single central category and the small number of connections that explain it.

I think what might have happened is that in an effort to represent all of the data available to them, the authors shied away from the distillation and interpretation that Grounded Theory asks for. The model and article became a recitation of categories (at one point during my reading I noted “stop listing so much and tell the story” in the margin). I would love to see the authors continue this line of inquiry by refining the model through interviews and observations with software professionals and reducing it to its essential kernel.

Wider Potential

The authors entered this research with two questions:

RQ1: Which characteristics do developers assign to novices and which to experts?

RQ2: Which challenges do developers face in their daily work?

Both of these questions are asked again and again in the software development world. We confront them whenever a junior developer asks what she needs to do to advance to senior level or an engineering manager works out what criteria a software engineering department should have as part of their professional development ladder (just google “software career ladder” and see the number of questions and posts).

I don’t believe that there is a single ladder that works for all people or all groups, but having some solid research into how people construct the differences and the challenges would help guide professionals as well as researchers. Professionals would get guidance beyond their own sphere of knowledge. Researchers would get guidance about how to interpret the social dynamics that they find in software organisations.


I like the idea of this research and I like the approach that the authors laid out. The questions they pose are relevant to both research and industry and the approach they chose could create interesting findings.

The execution and reporting of the findings seem only half-complete. The theory is not clear. The reporting doesn’t leave me with a sense of what the participants were thinking and feeling. The sampling seemed incomplete.

That said, even in its current state the paper offers food for thought. If I were currently working on refining a career ladder for my teams, I would find its list of concepts and categories useful.
