At the university where I have been working for almost seven years, improvement efforts recently turned to performance evaluations, following earlier improvement initiatives. I found the new performance evaluation method as ineffective and biased as the previous one. In this text, I suggest a method with better properties for accounting for performance in research, service and teaching – the three domains university professors are expected to perform in.
Performance in any area can be measured against a reference. Performance can be thought of as a measure of improvement, e.g., the distance covered toward an objective with respect to a starting point, within a fixed period of time. Alternatively, performance can be thought of as a measure of growth, e.g., the time taken to reach an objective from a starting point in time, for a fixed objective distance.
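The two performance types defined above can be contrasted in a minimal sketch (my own illustration, not from the text; the function names and example numbers are hypothetical). Improvement fixes the time and measures distance covered; growth fixes the distance and measures time taken. Expressed as rates, the two are duals of one another:

```python
def performance_as_improvement(distance_covered: float, fixed_time: float) -> float:
    """Improvement: distance achieved toward an objective within a fixed time.
    Example: goals scored within a 90-minute game."""
    return distance_covered / fixed_time


def performance_as_growth(fixed_distance: float, time_taken: float) -> float:
    """Growth: a fixed objective distance divided by the time needed to cover it,
    so that less time taken yields a higher score.
    Example: a set race distance completed in the least time."""
    return fixed_distance / time_taken


# A team scoring 3 goals in a 90-minute game (improvement),
# versus a driver completing a 300 km race distance in 1.5 hours (growth).
soccer = performance_as_improvement(3, 90)   # goals per minute
racing = performance_as_growth(300, 1.5)     # km per hour
```

Writing both as rates makes the duality visible: improvement holds the denominator (time) fixed, while growth holds the numerator (distance) fixed.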
In a Formula One race, performance is a measure of growth: the driver (and their team) who completes a set distance on a track in the least time wins the race, displaying the highest performance. In a soccer game, performance is a measure of improvement: the team that scores the most goals against the opposing team within 90 minutes wins the game. Essentially all games, from board games and chess to online games, share one of these two basic types of performance measure. Performance is either a measure of improvement or a measure of growth. Perhaps as a consequence, games are loved at all ages, and universally. In some games, such as those played with LEGO bricks, the performance type and its characteristics can be decided by the player. A six-year-old may envision making a robot using LEGO bricks from a set designed to build a LEGO castle. In building either the robot or the castle, performance is a measure of improvement: how well the result fits what our creative player envisions for the robot, or what the designers at LEGO made as the castle prototype.
There are several sources of complexity in performance measurement. Imagine a LEGO set that comes with a sketch of the castle instead of the exact picture. The majority of customers would be frustrated. Naturally, not everyone in a society has the set of attitudes and skills needed to appreciate building a castle from a sketch; that is why an architect is not an engineer or a builder. Similarly, imagine the same LEGO set comes with a perfect picture of the castle, yet with a description of how to make the bricks from packages of polymer pellets of different colors. If that description is not communicated in the most basic terms, frustration is even more likely. What is a polymer? How about polypropylene? Furthermore, even a well-communicated description may not be well-defined. Imagine that our castle is displayed from a more aesthetic angle that obscures the facades of the building, e.g., the main door is never shown. Where to place the gates of the castle then becomes an ambiguous objective, subject to interpretation and open to debate.
In settings beyond games, objectives are typically not well-defined, due to various sources of complexity. Even in organizations with a high level of management quality, objectives cannot be exact. They can rather be quantified approximations, supported with qualitative elaborations. For example, a revenue target of two hundred million Euros for next year is an approximation, not an exact target. A result of one hundred and ninety-three million Euros would please all stakeholders, unless the missing seven million Euros is critical for paying employee bonuses and dividends. If that is the case, however, then paying bonuses and dividends should be an objective in itself – and one that is still not exact. Many employees or shareholders would be upset not to be paid bonuses or dividends if the firm earns six and a half million Euros toward that goal, falling short by half a million. Hence exact targets are best reserved as a mechanism for settings where contracts must be specific, as in crowdsourcing campaigns. In repeated settings, as between labor-providers (people and robots) and employment-providers (organizations, including firms and universities), exact objectives without an acceptable range below and above would make any collaboration impossible beyond a single term. Thus, good objectives are quantified to some extent, yet even then only within somewhat well-defined ranges. The next question then becomes whether organizations, and the labor-providers within them, have the ability to set these quantities so that they are valid – conforming to real performance in both their short- and long-term effects – and reliable – providing a measure that is context-independent. Unfortunately, this is reminiscent of Gödel's incompleteness theorem (e.g., Raatikainen, 2013) at the limit where we assume a perfectly capable organization with zero-error labor-providers.
That is, one cannot rely on setting well-defined objectives and expect to measure performance – and thereby manage – through quantified clauses alone.
In academia, these limitations resulted in a declaration on research assessment, DORA, at the 2012 Annual Meeting of the American Society for Cell Biology in San Francisco. More recently, in 2024, a project for improved assessment quality, TARA, announced tools that can be used instead of journal scores as indicators of the research quality of an article – scores that are typically, and very bluntly, also used as indicators of the quality of the labor-provider, i.e., the researcher. Certainly, an educator would recognize the problem of ascribing performance in a particular project to an individual, e.g., an "A-student", rather than to the task, e.g., an "A-paper". This is the very distinction between a fixed mindset – thinking that an A-student produces an A-paper – and a growth mindset – recognizing that an A-paper can be produced by anyone once they learn how to produce A-papers; see Dweck and Yeager (2019) on the related implications of "attribution bias" and "learned helplessness".