ICI Co-Director Mark Frazier interviewed Arunabh Ghosh, Associate Professor of History at Harvard University and author of Making it Count: Statistics and Statecraft in the Early People’s Republic of China (Princeton University Press, 2020).

Selected chapters are also available at Project Muse.

Making it Count: Statistics and Statecraft in the Early People’s Republic of China (Princeton University Press, 2020)


Mark Frazier: Congratulations on the book. It offers fascinating insights by connecting the fields of China Studies and Science and Technology Studies. How did you first become interested in the study of statistics in the field of PRC (post-1949) history?

Arunabh Ghosh: Thank you, Mark, for your kind words about the book. I am afraid this may require a slightly involved response, since my interest in statistics in the early PRC is the outcome of various strands intersecting over time. I would like to think I have long had an unarticulated interest in histories of science and quantification.

In college, I studied history, economics, and mathematics, and wrote papers on debates in 1850s Britain surrounding the adoption of the metric system (for the record, the UK didn’t formally adopt the metric system until 1965) and on the (rather ugly) rivalry between Newton and Leibniz as each sought recognition as the inventor of integral calculus. After college, I spent two years at the Urban Institute (UI), a social and economic policy think tank in Washington, D.C., working primarily on health policy in the United States. UI is known for its heavily empirical approach to policy research and much of my time was spent wrangling with massively large datasets. I imagine that a general interest in the history of data and how they are used was likely seeded at that time.

Later, as I changed careers and began doctoral work in Chinese history at Columbia, I explored some of these themes in papers on late Qing ideas about quantification and accuracy, the abortive census of 1908, and the 1953 census. As my interests gravitated towards post-1949 history, the 1953 census, in particular, became the kernel of my dissertation prospectus. My plan at that time was to use the census as a case study of a large-scale statistical exercise at a time of nascent state-building.

The final push, however, came in archives and libraries in Beijing, where I confronted a relative absence of records on the 1953 census, while also encountering a notion of statistics that was far more all-encompassing than merely the collection of demographic data. Even more decisive was the contemporaneous discovery of materials on much more fundamental questions: Is there a correct way to count and ascertain social fact, is statistics a social science or a natural science, and so on. Chinese statisticians were deeply invested in these questions. Following their deliberations allowed me to explore their answers and, in turn, to track how those answers affected their statistical work, their ability to know the country quantitatively, the discipline of statistics, and how all of this was connected to global histories of statistics and data.

Frazier: Your book arrives at a time of global pandemic, when China’s reporting of Covid-19 cases, mortality rates, and so forth is being criticized by the American and other governments as underestimated for political purposes. As you’ve witnessed the crisis unfold, are there insights from the book that can help us better understand these debates over “cooked” official data from China, which long precede the current pandemic?

Ghosh: I think it is important to make two general points with regard to Covid-19 before I address the question of Chinese statistics. First, I think underestimation is a global problem right now. Nate Silver in a recent blog post (“Coronavirus Case Counts Are Meaningless”) offers a systematic explanation of the various ways in which we may be underestimating, and, therefore, how dangerous it is to rely on these numbers to formulate policy and inform politics. The second, and related point, is that we need greater transparency within and across nation states about data practices. A commentary (“COVID-19: The Devil in the Data”) I recently wrote offers a brief discussion of why this is important and how elusive it remains.

Now, coming to China. There is broad scholarly consensus that the Chinese state often, though certainly not always, “jukes” or “cooks” the stats. Much of the contemporary debate focuses on GDP, the predominant but also deeply problematic index for measuring economic development. A range of scholars have investigated this question, as have a whole host of journalists and commentators. Their analyses, for the most part, focus on what I label “post-hoc manipulation”: the massaging or manipulation of data after it has been generated, so that it conforms to specific (political) requirements. This manipulation can occur within states (a local bureau or private actor may report fudged numbers to superiors at the provincial or national level) or between states (where a nation may alter numbers post-fact to look better on the international stage). Jeremy Wallace and Jessica Chen Weiss and Bill Hurst recently wrote excellent op-eds exploring some of these issues.

In Making it Count I demonstrate that we need to be equally attentive to other processes that may skew the data in specific, sometimes unpredictable, ways. These processes have to do with what we might call “first principles” or “starting assumptions”. They could be something as simple as our assumptions about what needs to be counted—in the context of Covid-19, should we only count as confirmed infections those that we hospitalize or include all who test positive? These assumptions need not always be overtly political; rather they may stem from some banal bureaucratic logic. Alternatively, first principles may affect our choice of statistical methodology. For instance, for much of the 1950s, Chinese statisticians chose to rely almost exclusively on exhaustive enumeration, rejecting any kind of randomized sampling technique as theoretically inappropriate (for why, you’ll have to read the book!). This generated a whole range of problems from the over-production of data, to data irreconcilability, to an inability to assess agricultural production in an accurate and timely fashion.

For our current moment and Covid-19, the lesson is that we need to be careful as we disentangle how data gets produced, separating outcomes that can be traced to “first principles” from those that are a result of “post-hoc manipulation”. All data are biased, but not all in the same way. This is especially important when it comes to the politics surrounding data sharing and data shaming.

Frazier: You discovered some fascinating connections and exchanges during the 1950s between the leaders of the PRC State Statistics Bureau and the Indian Statistical Institute, in which Indian approaches to random sampling were influential (though abruptly rejected in favor of Maoist-inspired enumeration methods by 1959). When and how did you come across these connections? Elsewhere, in your Journal of Asian Studies article, you’ve cataloged a wide range of Sino-Indian exchanges during the 1950s. What’s the significance of these exchanges for understanding how both states seek to make policy by learning from external models?

Ghosh: Every historian has their own set of archive stories, those rare moments in a sea of toil, when you stumble upon a document and recognize it immediately as treasure. The discovery of a letter by the Indian statistician P.C. Mahalanobis that offered a detailed description of Zhou Enlai’s visit to the Indian Statistical Institute (ISI) was one such moment for me. That story now opens Chapter 7 in the book. My first clue to the statistical exchanges, however, was a chance encounter with a People’s Daily story from June 1957, which reported that Zhou had hosted Mahalanobis for dinner in Beijing. What I initially dismissed as a mere courtesy call, it turns out, was instead a dinner to celebrate the end of a three-week visit. Mahalanobis, along with a colleague, had been invited by the SSB to consult on statistical systems and methods; in particular, the technique of large-scale random sampling. The visit itself was one among a series of exchanges between Chinese and Indian statisticians. These exchanges helped me appreciate the nature of Soviet statistical aid to China and how disaffected the Chinese had become with exhaustive enumeration. Without this perspective, the story of statistics in 1950s China would have remained incomplete.

At a broader level, the exchanges are also a compelling instance of South-South technological knowledge sharing. For a long time, our approach to Cold War science has been dominated by the assumption that scientific and technical knowledge flowed outward (and downward!) from two nodes dominated by the United States and the Soviet Union. As the China-India statistical exchanges show, this model comes with significant blind spots. I wager we will find many fascinating stories of scientific and technological exchanges across the Global South, if we look for them! To cite but one example, my colleague in the History of Science Department at Harvard, Gabriela Soto Laveaga, is currently working on a revisionist history of agricultural development aid in the twentieth century, focusing on links between India and Mexico. It was this insight, that connections between China and India can help us understand aspects of early PRC, Indian, and Cold War history, which also inspired the Journal of Asian Studies article and which continues to inform a couple of my ongoing projects. As I note in the article, such an approach requires us to expand our vision beyond, following Prasenjit Duara’s formulation, the two traditional frameworks that still dominate China-India research: civilization/culture and geopolitics.

It is therefore encouraging to see the wider interest—institutionally, disciplinarily, and topically—that China-India studies has begun to enjoy in recent years. The ICI is at the forefront of these trends. I am myself currently involved in two China-India projects. The first is a collaborative project (with the historian Tansen Sen and the literary scholar Adhira Mangalagiri), which examines archival materials pertaining to China and India from within the recently declassified Jawaharlal Nehru Papers (ca. 1947-1964). The second is an in-progress collection of essays on China-India networks of science, ca. 1920s to 1980s.

Frazier: One of the enduring challenges from imperial times to the present has been the clash of interests between local officialdom and the central government over access to local information, often in the form of numbers. In your archival research and reading of other official sources, how did the central government in the 1950s discuss ways of overcoming barriers to data collection at local levels?

Ghosh: This is indeed an enduring problem (incidentally, Kyle Jaros has an excellent article on how this applies to Wuhan and Hubei), and I can only provide a rather general response here. The 1950s witnessed successive waves of centralization and decentralization, each wave generating its own tensions. In statistics, much of the decade was dominated by a reliance on exhaustive enumeration using a periodic reporting system, which required a large statistical apparatus stretching from Beijing all the way down to 2,200 counties, and through them to 750,000 villages. By 1956, the system employed as many as 200,000 cadres. Centralization was essential to this system. All data had to filter up to Beijing, where the State Statistics Bureau (SSB) would compile it and make it available to the State Planning Commission (SPC) to draw up various plans. These would then be communicated back down to provincial and municipal planning committees, which would, in turn, communicate with their corresponding statistics committees, offices, and sections. Overproduction of reports quickly became a problem for the SSB, which had little capacity to meaningfully process the excess tables and numbers being generated.

Two other trends also had an influence on center-local tensions: increasing complexity of the economy as new factories and industries were set up and agriculture was reorganized and eventually nationalized; and a growing demand for the SSB to not just collect data, but also offer analyses. One outcome of these developments, acknowledged by the director of the SSB himself (Xue Muqiao), was the prioritization of timeliness over correctness. This, in turn, led to an ever-increasing disjuncture between local data and regional or national data. Local data were typically of reasonably good quality, but the pressures of timely reporting meant that estimation was permitted. As a consequence, the higher up the data traveled, the more estimation was likely used, causing provincial and eventually national data to be subject to ever larger margins of error. Of course, variations based on region and especially sector (e.g. agriculture was much worse off compared to industry) were also present.

In the final analysis, as I note in the book’s conclusion, “in spite of generating copious amounts of facts, [the Chinese state] remained poorly informed.”

After 1956, two solutions were attempted. The first was the series of exchanges with India, noted above. Large-scale random sampling offered an enticing, cheaper, and faster way to collect grassroots data. The second solution gained dominance during the Great Leap Forward and called for the dismantling of the periodic reporting system in favor of an ethnographic mode of social surveying. Drawing inspiration from Mao’s 1927 Report on an Investigation of the Peasant Movement in Hunan, this method championed intensive knowledge about a prototypical case at the expense of a much larger and exhaustive survey. This is a classic tension in social science, between in-depth knowledge and generalizable claims. In the context of the late 1950s, it neatly side-stepped the challenges of the periodic reporting system and random sampling and contributed to the state’s reduced capacity to ascertain accurately the extent of deprivation in the countryside in 1959 and 1960, with devastating effects.

Frazier: Your book also speaks to the current era of “big data” – the enumeration of all things, and the enterprise of data-driven policymaking. As you point out, the early PRC was part of a mid-century global wave of zealous data collection and faith that numbers would improve policy choices and better regulate human behavior. How much difference was there across the Cold War divide between communist and capitalist states when it came to their approach to data and governance?

Ghosh: This is a great question. I think the Cold War at times imposes overdetermined categories and blinds us to a history that is full of interesting convergences and divergences. Statistics is a good example. As I point out in Making it Count, the postwar years were witness to a seemingly irresistible drive toward the modernization of statecraft. Any problem, no matter how complex, could be addressed and resolved if only sufficient data were collected. A desire to collect and collate numerical data about an ever-expanding range of activities was a natural corollary. And running parallel were efforts to develop a vast array of statistical tools. Together, these signaled modernity, progress, and good governance.

This belief in the transformative power of numbers transcended ideological divides—real and manufactured—between capitalism and communism and can be found on either side of the Iron Curtain and beyond. At the same time, the Cold War did place particular ideological and practical constraints on scientific activity. In some cases, including in the People’s Republic, applied research was prioritized at the expense of basic research. Put differently, the desire of modern states for ever-increasing control generated organizational and institutional imperatives that were fairly universal, but the specific forms they took could vary significantly, to the point of being labeled as each other’s correctives.

In the case of statistics in China, I’ll mention three key distinctive features. The first of these was a matter of policy (and also true in the Soviet Union): Statistics was a state secret. Accordingly, data were tightly controlled. Only highly curated and summary level data were made publicly available. For instance, results of the 1953 census were released in 1954 but restricted to four highly aggregated categories of data: population by province, age group, gender, and ethnicity. Another major release of data was the 1959 publication Ten Great Years, a statistical celebration of the achievements of the first ten years of the People’s Republic. The Indian statistician Mahalanobis made several requests (some even delivered by the Indian Prime Minister, Jawaharlal Nehru) for disaggregated data on the Chinese economy, but was politely rebuffed.

The other two differences relate more directly to questions of ideology and theory and had significant consequences. First, statistics was defined as a social science, contra a natural science or a universal science, both of which were dubbed capitalist/bourgeois conceits. This entailed the rejection of all probabilistic methods (such as random sampling). The result was an almost exclusive reliance on exhaustive enumeration, which came with considerable costs. The second, a direct consequence of the first, was the separation of (the social science of) statistics from mathematical statistics, with the latter banished to math departments. As a result, there was little communication between practitioners of statistics and those engaged in more theoretical work, leaving each side ignorant of developments in the other field, unable to benefit from any kind of mutual cross-pollination.


Arunabh Ghosh | Harvard University | History Department

Arunabh Ghosh is a historian of modern China, with research and teaching interests in social and economic history, history of science and statecraft, transnational history, and China-India history.

Ghosh’s first book, Making it Count: Statistics and Statecraft in the Early People’s Republic of China (Princeton University Press, 2020), investigates how the early PRC state built statistical capacity to know the nation through numbers. He has conducted research for the book in Beijing, Guangzhou, New Delhi, and Kolkata, and his work has been supported by grants and fellowships from the Andrew W. Mellon Foundation, the American Council of Learned Societies, the Social Science Research Council, and Columbia University. His work has appeared in the Journal of Asian Studies, Osiris, BJHS Themes, EASTS, and the PRC History Review.