Automating Society Report 2020


Between care and control: 200 years of health data in France

by Nicolas Kayser-Bril

The French “Health Data Hub” will soon offer health data on all French citizens to AI startups that request it. It is the latest step in a project that began 200 years ago, aimed at centralizing health data. It is a project that has oscillated between care and control, but which has veered mostly towards control.

Coming back from the 1876 World Hygiene Congress, French doctor Olivier du Mesnil was in awe of Brussels’ health information system. There, he wrote, doctors immediately sent a copy of death certificates to the central hygiene bureau. Every week, bureau personnel produced a map of the city on which each death was marked with a pin, with different colors for each contagious disease, and showed it to the mayor. This technology, Mr. du Mesnil said, allowed even “the person least familiar with scientific research to exert non-stop control over the health status of the population.”

Mr. du Mesnil used this Belgian example to underline how much France had to catch up. Like its neighbor to the north, France was a rapidly industrializing country in the 19th century. Railroads sped up transport and factories brought together large numbers of workers packed in unhealthy, precarious dwellings.

Controlling cholera 

The main beneficiaries of this combination were bacteria that could jump from host to host rapidly over long distances. Two cholera epidemics, in 1832 and 1854, claimed over 100,000 lives each. The bacterium responsible for the illness was not identified until 1884, but governments across Europe understood the need to collect data to follow the spread of cholera and impose quarantines in time to stop its progression. After the first cholera epidemic, the French government began gathering information throughout the country on what were thought to be the causes of the sickness: bad air, rotten food, and even “ignorance”.

Despite its lethality, cholera was not the deadliest disease of the 19th century. Dysentery and tuberculosis were far more dangerous but were limited to the poorest in society. That the government only took action regarding cholera – which killed people from all social classes – was taken by many as proof that the health of the poor was only a concern when it impacted the rich. Distrust of government measures to fight cholera, including data gathering, ran high.

Health police

Until well into the 20th century, health was a matter of public order rather than well-being. Until the First World War, health was the purview of the ministry of the interior. To monitor and predict the spread of tuberculosis in large cities, authorities built a “health record” for each building, modeled on the criminal record of each individual. Workers would need to present both to obtain a job. (The health record was discontinued around 1908 after hospital personnel pointed out that building-level data was not of much use.)

A change of perspective occurred in the early part of the 20th century. First, warfare now required conscripting not just part of an age cohort, but all of it. The general health level of the population acquired a new military importance. Second, eugenics, a pseudo-scientific craft that claimed to improve a population by rooting out its unhealthy members, increased in popularity. Health and hygiene became political goals in themselves and gained their own ministry in 1920.

Knowledge monopoly

Health statistics, once concerned only with epidemics and controlling the poor, started to record well-being. The League of Nations, a newly created and optimistic international organization, led the movement and commissioned health surveys across countries.

Not all French doctors were enthusiastic about the change. They complained that such data collection would endanger doctor-patient confidentiality, but their main concern may well have been the loss of status. At the time, doctors were the ultimate repository of medical knowledge. Passing on information to the state was seen as a devolution of power. Because doctors were almost entirely financed by their patients, they had little incentive to cooperate in systems they disliked.

In any case, the collection of health data for the well-being of the population was limited to a fraction of French taxpayers. In the colonies, health was still seen as a production factor, to be optimized only insofar as it made plantations and mines more profitable. Until 1946, all French colonies together employed, at most, four statisticians, for whom health data was probably not a priority.

Knitting needles and data processing

Some in the medical sciences saw an opportunity in structuring data. In 1929, Louis Bazy, a surgeon consulting for the Paris-Orléans railway company, had the idea of using his employer’s “statistics machines” to aggregate data on the health of the company’s workforce. He designed a system whereby each employee’s illness was coded on a punch card, along with her or his personal data. The statistics machine could process 400 cards a minute and, with the help of the tabulating machine, provide information on the spread of a disease or correlate variables. The scope of applications for medicine and research was endless, he wrote.

Not every doctor had access to statistics machines, so a professional magazine for progressive doctors explained how to process punch cards with knitting needles (Le Mouvement Sanitaire, 1933, p. 121-122). Despite such efforts, there is no evidence that these early attempts at computerized medicine gained many followers.

The drive towards centralization

During the Second World War, the French government made great strides in implementing eugenics. A side effect of this policy was the creation of a National hygiene institute (Institut national d’hygiène, INH). From 1942, it conducted large-scale data collection to track the effects of the government’s crackdown on alcoholism and venereal disease. It also built a central repository of information on 35,000 cancer patients.

After the war, INH expanded and kept monitoring the nation’s health (it became the French National Institute of Health and Medical Research, Inserm, in 1964). Meanwhile, the post-war government offered social insurance to all its citizens. With it came a social security number, which is given at birth and remains unchanged until death. Having a unique identifier for every citizen revived the old dream of governance through numbers, where decisions could be taken purely on the basis of data.

In France, as in other countries of the Western bloc, central planning was considered a necessity. The government felt it had to collect comprehensive data on morbidity (that is, on the illnesses affecting the population). A first attempt, in 1945, to force hospital doctors to fill out forms after each procedure, to be sent to a central authority, failed. Further attempts were made in 1958 and in 1972. As in the 1930s, doctors did not comply with their new obligations. They criticized the methodology, complained about the added workload, and failed to see any benefits for themselves.


This changed in the 1980s. A new attempt at centralizing morbidity data started in 1982 and, by the beginning of the next decade, all hospitals were feeding data to a central authority.

This success – for the state – might have to do with the economic environment of that decade. The slow economic growth provided an incentive for the government to cut costs in healthcare. Despite initial reluctance from doctors, several health ministers pushed it through and made the new system mandatory in 1991.

The data-gathering effort was, first and foremost, a cost-control mechanism. Knowing how many procedures each hospital carried out, and how much money each hospital received, the health ministry could rank their performance, in financial terms at least. Hospitals that overspent were called to order until, in the 2000s, the government introduced a pay-per-procedure system. A level 1 heart attack is worth €1,205.57, increased to €1,346.85 if the patient dies within two days. Each procedure that doctors perform is coded, following a strict classification, and hospitals are paid by social security accordingly.

To navigate the list of over 6,000 procedures, hospitals hire external consultants to “optimize” their coding practices. As AlgorithmWatch reported in May 2019, code optimization is nothing less than “generalized cheating” to maximize revenue, according to a health sociologist at the University of Lille.

Quality concerns

Because France has mandatory national health insurance with a single payer, the morbidity data can be linked to medication usage as soon as a drug is reimbursed by social security. For about 99% of the population, the French national health insurer has comprehensive information on hospital procedures and drug intake since the early 1990s.

This unique data set allowed researchers to find hidden correlations. This is how Benfluorex (sold under the name Mediator) was linked to heart valve disease, leading to the withdrawal of the drug in 2009.

However, all information on hospital procedures is collected for accounting purposes, not medical ones. The optimization of procedure encoding does a great disservice to data quality, but no one knows exactly how bad the situation is, as very few studies have been conducted. One 2011 study showed that, for one specific procedure, the false positive rate was close to 80%, while false negatives reached 35%.


Despite this abysmal performance, in 2019, the French government pushed to build an even bigger database, called the “Health Data Hub”. Cédric Villani, a mathematician who spearheaded the Artificial Intelligence strategy of president Emmanuel Macron, wrote in a parliamentary report that the real risk of AI in health would be “not to welcome it”. The Hub aims at providing any health-related data to AI projects that request it.

Since 2004, the government has pushed for all French residents to open an electronic health record (EHR). After a slow start, the centralized EHR will be made opt-out in 2021, and should, in time, be fed into the Health Data Hub.

The French data protection authority criticized the project because of its broad aims. Data from the Hub can be used for any “public interest” goal, potentially opening the door to any commercial application. Critics also pointed out that personal data in the Hub is pseudonymized but not aggregated, so individuals can easily be re-identified.

Toxic relationships

A doctor, who wished to be identified only as Gilles, started a “data strike” when the Health Data Hub was officially launched in December 2019. He and others called on colleagues to stop filling out the forms that feed the Hub. Since the 1980s, he said, France has moved from “a healthcare that cures to a healthcare that counts,” pointing to the cost management systems. He saw no benefits in the new database, saying that research does not need it. “Software only robs time that should be spent on caring for patients,” he added.

Even if the success of the strike cannot be quantified, Gilles’ anger is widely shared. In January 2020, over 1,000 doctors resigned from their administrative duties, citing the pay-per-procedure system as one of the largest problems.

It was also revealed that the designer of the Health Data Hub quit his job to work for a private sector firm specialized in selling health data. He saw no conflict of interest.

Health data shrug

The main breakthrough of the Health Data Hub is that, for the first time, a French government decided to use an English name for an official project. The rationale that led to its creation is a direct continuation of 200 years of efforts by the French government to gather data on its citizens, to make its population more legible, and more governable.

No one knows what the Health Data Hub will bring, but history offers some insights. The information system that Brussels set up in the 1870s, which Mr. du Mesnil so admired, might have worked. The city was spared any large epidemic until the Spanish flu of 1918. But then again, so were all large cities in France. On the other hand, life expectancy in Brussels, relative to the Belgian countryside and other large cities, decreased between 1885 and 1910.

It could be that health data and actual health do not always go hand in hand.


Nicolas Kayser-Bril

Nicolas Kayser-Bril is a data journalist, and he works for AlgorithmWatch as a reporter. He pioneered new forms of journalism in France and Europe and is one of the leading experts on data journalism. He regularly speaks at international conferences, teaches journalism in French journalism schools, and gives training sessions in newsrooms. A self-taught journalist and developer (and a graduate in Economics), he started by developing small interactive, data-driven applications for Le Monde in Paris in 2009. He then built the data journalism team at OWNI in 2010 before co-founding and managing Journalism++ from 2011 to 2017. Nicolas is also one of the main contributors to the Data Journalism Handbook, the reference book for the popularization of data journalism worldwide.