Fairer Citation Based Metrics

I describe a simple modification which can be applied to any citation count based index (e.g. Hirsch’s h-index) quantifying a researcher’s publication output. The key idea behind the proposed approach is that the merit for the citations of a paper should be distributed amongst its authors according to their relative contributions. In addition to producing inherently fairer metrics, I show that the proposed modification has the potential to partially normalize for the unfair effects of honorary authorship and thus discourage this practice.


Introduction
The concept of quantification is intrinsic to the scientific method. Considering the central and pervasive role that quantification plays in science, it should come as no surprise that the magnifying glass would be turned back on itself and that scientists would want to quantify aspects of their own work. In particular, in this paper I consider various indexes which have become commonplace metrics of a researcher's output.
Broadly speaking, the intended purpose of the indexes discussed herein is to "quantify the cumulative impact and relevance of an individual's scientific research output", to quote Hirsch, the author of one of the most widely used indexes [1]. What is more, they aim to achieve this quantification using citation statistics of the individual's publications as the observable input measurements. This very idea has produced much controversy [2][3][4]. I too argue that the subjective understanding of what 'impact' means in this context inherently makes the very aim of its objective quantification a non-scientific proposition. There is no objective basis, no ground truth if you will, for assessing the performance of a particular index. Hence, unlike previous authors who have described and argued in favour of different indexes [9,10] (e.g. see h-index [1], e-index [5], g-index [6], z-index [7], i10-index [8]), in this paper I do not propose a novel index per se. Rather, accepting the pragmatic standpoint that for better or worse citation indexes are being increasingly used in academia [4], I show how a simple modification, applicable to any citation count based index, can make it ipso facto fairer.

Contribution Weighted Citations
As the starting point to motivate the key idea, contemplate the following thought experiment and the question which naturally emerges from it. Consider a particular publication and two alternative scenarios: in one scenario the entire work is performed by a single author, in the other by two or more authors. The question I ask is: Is the contribution to the paper's impact of the sole author in the former scenario equal to the contributions of each of the authors of the latter scenario? Given that the totality of the work is the same and that in the latter case it is produced by a joint effort, it seems clear that the answer is no. What is more, in the latter scenario the claim by each of the authors to the total impact of the work should not be equal but portioned according to the authors' relative contributions. Therefore I propose the following. Let us express a specific citation based index as a function $f(\mathrm{citations}_1, \ldots, \mathrm{citations}_n)$ where $\mathrm{citations}_i$ is the number of citations of a person's i-th publication (of n in total). I argue that regardless of the index used, i.e. regardless of the form of $f$, a fairer quantification using the same baseline idea can be achieved by evaluating:
$$f_o(\mathrm{citations}_1, \ldots, \mathrm{citations}_n) = f\left(\mathrm{citations}_1 \times \mathrm{auth\_rank}_1^{-1}, \ldots, \mathrm{citations}_n \times \mathrm{auth\_rank}_n^{-1}\right) \qquad (1)$$
where $\mathrm{auth\_rank}_i$ is the rank of the researcher in the list of the i-th paper's authors. The expression in Eq. (1) exploits the observation that the order of individuals in the list of a paper's authors conveys information about their relative contributions: the first author contributed at least as much as the second, the second at least as much as the third, and so on. This allows us to derive the upper bound of the researcher's relative contribution to the i-th paper as $\mathrm{auth\_rank}_i^{-1}$: the researcher and every author listed earlier each contributed at least that share, and the shares cannot sum to more than the whole. It is simple to see that this upper bound is achieved when the first $\mathrm{auth\_rank}_i$ authors contribute equal amounts and the remaining authors nothing at all.
It is important to recognize the crucial difference between what I propose and the previous work on research output quantification. In particular, I am referring to the nature of the sole assumption I make: that the ordering of authors reflects their relative contributions. Its validity is virtually ensured by the competing interests of authors; for one author to be promoted to a higher rank in the list of the paper's authors, another one must be demoted. This is in stark contrast to previous ideas which align the interests of all authors of a single paper and thus provide incentive to researchers to act in ways other than in "the best spirit" of academic publishing (e.g. by adding to the list of authors individuals who had not contributed to the work; I will come back to this shortly).
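The weighting in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`h_index`, `contribution_weighted`) and the five-paper record are hypothetical and chosen only for the example. Any index that maps a list of citation counts to a number can be passed in as `index`.

```python
from typing import Callable, Sequence


def h_index(citations: Sequence[float]) -> int:
    """Hirsch's h-index: the largest h such that h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    # The condition holds for a prefix of the ranked list, so counting suffices.
    return sum(1 for position, count in enumerate(ranked, start=1) if count >= position)


def contribution_weighted(
    index: Callable[[Sequence[float]], float],
    citations: Sequence[int],
    author_ranks: Sequence[int],
) -> float:
    """Evaluate `index` on citations scaled by 1 / auth_rank, as in Eq. (1)."""
    if len(citations) != len(author_ranks):
        raise ValueError("need one author rank per publication")
    if any(rank < 1 for rank in author_ranks):
        raise ValueError("author ranks start at 1 (first author)")
    weighted = [count / rank for count, rank in zip(citations, author_ranks)]
    return index(weighted)


# Hypothetical record: made-up citation counts and author positions for five papers.
citations = [40, 30, 12, 9, 3]
author_ranks = [1, 3, 4, 1, 4]

plain_h = h_index(citations)
modified_h = contribution_weighted(h_index, citations, author_ranks)
modified_count = contribution_weighted(sum, citations, author_ranks)
print(plain_h, modified_h, modified_count)
```

For this made-up record the h-index drops from 4 to 3 and the plain citation count from 94 to 62.75, because the second and third papers are discounted by the researcher's third and fourth position in their author lists.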

Analysis and Discussion
Recall that one of the key ideas motivating the proposed modification is that the total merit for the paper's impact should be unaffected by the number of researchers that authored the work. Consider the simple citation count quantification of output, the c-index for short. If a particular paper was authored by n authors and cited c times, the totality of the merit for the paper's impact is $n \times c$, since the citation count c contributes to all of the authors' c-indexes. Clearly, this is a linear function of n. If the proposed modification is introduced, the total merit, per citation, becomes $\sum_{i=1}^{n} i^{-1}$. While this is still a function of n (the ideal characteristic would be a horizontal line with the ordinate value of 1), the growth is very much sublinear, as illustrated in Fig. 1. In addition to the fundamental argument laid out previously, this is important because it disincentivizes dishonest addition of non-contributing persons to the list of a paper's authors. Without the proposed modification, the incentive is high because all individuals involved stand to benefit: the person added as an author gets additional merit from all the citations of the paper, while the actual authors of the paper gain by the expected reciprocal behaviour (i.e. by being themselves added as authors to papers that they have not contributed to) [11]. While the situation remains a positive sum game, with the proposed modification the incentive for such behaviour is much reduced by the quickly diminishing benefit to lower ranking authors. This remains the case when the modification is applied to other indexes too. For example, consider the h-index. For a paper to increase a researcher's h-index h, it is necessary (but not sufficient) that it receives at least h citations; in contrast, when the proposed modification is applied, the required number of citations becomes $h \times \mathrm{auth\_rank}$. Let us consider in some more detail how the proposed modification affects Hirsch's h-index.
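Before turning to the h-index in detail, the merit totals discussed above can be checked numerically. The sketch below uses hypothetical helper names and compares the total merit handed out for one paper with and without the weighting; the modified total is c times the n-th harmonic number, which grows roughly like the logarithm of n. Exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def total_merit_unmodified(n_authors: int, citations: int) -> int:
    """Every author is credited with all c citations: n * c in total."""
    return n_authors * citations


def total_merit_modified(n_authors: int, citations: int) -> Fraction:
    """The author at rank i is credited with c / i: c * (1 + 1/2 + ... + 1/n) in total."""
    harmonic = sum(Fraction(1, i) for i in range(1, n_authors + 1))
    return citations * harmonic


def citations_needed(h: int, author_rank: int) -> int:
    """Citations a paper needs before its weighted count reaches h."""
    return h * author_rank


# Total merit per citation (c = 1) for a growing author list.
for n_authors in (1, 2, 5, 10, 50):
    unmodified = total_merit_unmodified(n_authors, 1)
    modified = float(total_merit_modified(n_authors, 1))
    print(n_authors, unmodified, round(modified, 3))
```

With 50 authors the unmodified total is 50 per citation, while the modified total stays below 5. Likewise, a third author whose h-index stands at 20 needs a paper to reach 60 citations before it can count towards the modified index.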
Using a simple publishing model in which a researcher publishes p papers per year, each of which gets cited c times every subsequent year, Hirsch showed that the corresponding h-index is a linear function of the researcher's publishing age n:
$$h = \frac{c}{1 + c/p}\, n$$
Using the same publishing model, a similar derivation can be used to show that the relationship between a researcher's h-index calculated using only those papers published in the first y years (but all citations to date), and y is also linear. However, I find that this is seldom the case in practice. This may not be particularly surprising considering the limitations of the simple publishing model used; however, what does need further examination is the observation that nearly universally the actual relationship is superlinear. An example, using a successful researcher at a leading university, is shown in Fig. 2a (solid line). The significantly superlinear increase is readily apparent (the final leveling off being caused by the limited time that the recent publications have been available for citation), with the h-index increasing approximately six-fold in the second half of the researcher's career. There may be numerous factors involved in this: one's growing academic reputation increases the awareness of the person's research and with it the overall citation rate (creating a positive feedback loop), in some fields accumulated experience plays a role in increasing the quality of published work, and so on. However, further analysis suggests a more worrying dominant factor. The plot in Fig. 2b shows the number of papers published per year by the same researcher. Not only is the publication rate not constant across the researcher's publishing career, as assumed in Hirsch's simple model, but it is steadily increasing. It is remarkable to notice that the number of papers authored by this researcher in the peak publishing year is 117, a rate of roughly one paper every 3 days.
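The linear behaviour predicted by the simple publishing model at the start of this passage can be reproduced with a small simulation. The sketch below makes one discretisation assumption that is not in the text: at publishing age n, a paper published in year k (k = 1, ..., n) has accumulated c (n - k) citations. Function names are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for position, count in enumerate(ranked, start=1) if count >= position)


def simulated_h(p, c, n, first_years=None):
    """h-index at publishing age n under the simple model.

    Assumed discretisation: a paper from year k has c * (n - k) citations.
    If `first_years` is given, only papers from years 1..first_years count.
    """
    last_year = n if first_years is None else first_years
    citations = [c * (n - k) for k in range(1, last_year + 1) for _ in range(p)]
    return h_index(citations)


def closed_form_h(p, c, n):
    """Hirsch's closed form for the same model: h = c * n / (1 + c / p)."""
    return c * n / (1 + c / p)


p, c = 5, 10  # illustrative values, not taken from any real record
for n in (15, 30):
    print(n, simulated_h(p, c, n), closed_form_h(p, c, n))

# h-index restricted to papers from the first y years, all citations to date (n = 30).
for y in (4, 10):
    print(y, simulated_h(p, c, 30, first_years=y))
```

For these illustrative values the simulated and closed-form h-index agree at n = 15 and n = 30 (50 and 100). The restricted index equals p y for small y, which is the linear dependence on y that the model predicts, in contrast to the superlinear growth seen in the empirical example.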
I would suggest that it is most unlikely that a single individual could have contributed to 117 publications in one year to a sufficient degree to meet the authorship threshold for all of them. Further insight is provided by the data in Fig. 2c, which shows the average rank of the researcher in the list of authors across all authored papers for each year of the researcher's career.
Here too the trend is clearly evident: the researcher has steadily been moving down the list of authors, often publishing papers as the leading author during the first 15 years of the career, and most often as the third author in recent years. In and of itself this is not a problem; indeed this trend is typical in most fields of research and can be a reflection of a shift in the nature of the person's contributions. Nevertheless, taken in the context of the previously presented data, namely the extraordinary publication rate and the associated rapid increase in the researcher's h-index, the totality of evidence suggests an increasing amount of so-called honorary authorship: the practice of a senior research member (such as the head of a laboratory or a research group) being included as an author on all publications produced by the lab without actually contributing to the work itself [11,12]. Such practice contravenes the norms of "best academic practice"; to quote the uniform requirements of the International Committee of Medical Journal Editors for manuscripts submitted to biomedical journals [13]: "Authorship credit should be based on (1) substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data; (2) drafting the article or revising it critically for important intellectual content; and (3) final approval of the version to be published. Authors should meet conditions 1, 2, and 3."
Now let us consider the temporal behaviour of the h-index when the modification I propose is applied. Adopting the simple publication model of Hirsch, it is easy to see that if the authorship rank of a particular researcher is the same in all publications, the modified h-index $h_o$ also grows linearly with y, albeit at a slower rate, by a factor of at most $\mathrm{auth\_rank}$ (clearly if $\mathrm{auth\_rank} = 1$ then $h_o$ becomes equal to $h$).
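How much slower the modified index grows under a constant authorship rank can be explored with a short simulation. The sketch below rests on two assumptions that are not in the text: the discretisation (a paper from year k has c (n - k) citations at publishing age n) and a closed form, c n / (auth_rank + c / p), obtained by repeating Hirsch's argument with c replaced by c / auth_rank. Function names are hypothetical.

```python
def h_index(values):
    """Largest h such that h papers have (weighted) citation counts of at least h."""
    ranked = sorted(values, reverse=True)
    return sum(1 for position, value in enumerate(ranked, start=1) if value >= position)


def modified_h(p, c, n, auth_rank):
    """Modified h-index at publishing age n with the same author rank on every paper.

    Assumed discretisation: a paper from year k has c * (n - k) citations,
    each weighted by 1 / auth_rank.
    """
    weighted = [c * (n - k) / auth_rank for k in range(1, n + 1) for _ in range(p)]
    return h_index(weighted)


def closed_form_modified_h(p, c, n, auth_rank):
    """Assumed closed form: Hirsch's argument with c replaced by c / auth_rank."""
    return c * n / (auth_rank + c / p)


p, c, n = 5, 10, 30  # illustrative values only
for auth_rank in (1, 2, 3):
    simulated = modified_h(p, c, n, auth_rank)
    predicted = closed_form_modified_h(p, c, n, auth_rank)
    print(auth_rank, simulated, predicted)
```

For these illustrative values the modified index is 100, 75 and 60 for ranks 1, 2 and 3. Growth remains linear in n, and the slowdown relative to rank 1 depends on c / p as well as on the rank; in this sketch it does not exceed auth_rank.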
However, this is a rather unrealistic assumption; in most cases a researcher publishes as the leading author in the early stages of the career and, with increasing seniority, contributes to research in a more supervisory fashion (a valuable contribution entirely in accordance with authorship credit requirements; not to be confused with honorary authorship, which by definition is neither). This shift has two effects on $h_o$. The first acts to reduce it, because citations are weighted by $\mathrm{auth\_rank}^{-1}$. On the other hand, the more abstracted nature of contributions typical for senior researchers allows a person to contribute to a greater number of papers, thereby acting to increase $h_o$. Considering that the portioning of merit described in Eq. (1) allocates the upper bound of possible merit to each author, in most cases it can be expected that the latter of the two forces would prevail and that $h_o$ would exhibit superlinear growth. The example given in …

Table 1 Examples of citation based impact metrics for computer scientists with an h-index of at least 110, without and with the proposed modification

Conclusion
In this paper I described a general modification which can be applied to any citation based metric of an individual's research output. The key idea was to distribute the merit for the citations of a paper amongst its authors according to their relative contributions inferred from the authorship order. I argued that the validity of this approach is ensured by the competing interests of different authors. Using both theoretical arguments and empirical examples, I showed that the proposed modification has the potential to normalize partially for the unfair effects of honorary authorship and thus discourage this practice. Lastly, it should be noted that the proposed modification ceases to be useful when a researcher has publications in venues which use alphabetical ordering of authors. Today this is rare: a recent survey estimates that this practice is maintained by fewer than 4% of academic journals, with a decreasing trend [14].