Copyright ©ERS Journals Ltd 2001
Clusters, classification and epidemiology of interstitial lung diseases: concepts, methods and critical reflections
1 Respiratory and Environmental Health Research Unit, Institut Municipal d'Investigació Medica (IMIM), Barcelona, Spain. 2 Dept of Experimental and Health Science, Universitat Pompeu Fabra (UPF), Barcelona, Spain. 3 Dept of Occupational and Environmental Medicine, National Heart and Lung Institute, London, UK
CORRESPONDENCE: J.M. Antó, Carrer del Doctor Aiguader 80, E-08003 Barcelona, Spain. Fax: 34 932216448
Keywords: epidemics, epidemiological techniques, cluster, classification of interstitial lung disease, cryptogenic fibrosing alveolitis, sarcoidosis
The present article reports on two conceptual and methodological issues concerning interstitial lung disease (ILD) about which there is much misunderstanding and contradiction: the investigation of epidemics and clusters and the classification and epidemiology of ILD. In general, an epidemic is the occurrence of cases of an illness in excess of normal expectancy. The investigation of an epidemic often demands a number of sequential studies: first descriptive, then aetiological. Clusters consist of an increase in incidence of much smaller magnitude, perhaps excluding the possibility that this is merely a result of chance. In recent years, more valid statistical methods for the assessment of clusters have been developed. For interstitial lung disease in particular, different classifications exist that are sometimes inconsistent and may change with time. All diseases require a process of ascertainment, whereby they are identified, classified and perhaps registered. The two most employed epidemiological techniques for testing aetiological hypotheses are the cohort and case-referent approaches.
The aim of epidemiology is to understand the causes of disease and, by inference, develop approaches to aid prevention. Classically, its method begins with the description of the distribution of a disease within or between populations, and proceeds with analyses of the factors that determine this distribution. In this way, aetiological hypotheses are generated by observation and tested by formal study. Such neat distinctions conceal the muddle which is more often characteristic of the process by which a disease's aetiology is uncovered. However, the framework is a useful one, particularly in understanding where epidemiology has failed. Despite several decades of effort, the aetiologies of most interstitial lung diseases (ILDs) remain a mystery.
Arguably, this is a reflection of a poor understanding of the basic distributions of these diseases and the consequent paucity of aetiological hypotheses for formal analysis. In this context, the identification of clusters and small epidemics may eventually allow relevant risk factors to be identified. Consequently, several clusters of ILD have been reported in the literature. This article examines some of the difficulties facing epidemiologists in this area, focusing on the difficulties of establishing a clear classification of ILD, as well as the investigation of clusters.
Definition of an epidemic
The term epidemic refers to the occurrence, in a community or region, of cases of an illness in excess of normal expectancy. Usually, such excess is so sharp and of such magnitude that chance fluctuation is a very unlikely explanation and an immediate response from the public health services is frequently necessary. However, the concept of epidemicity is relative to the usual frequency of the disease in the same area, among the specified population, and during the same season of the year [1]. From this point of view, even a small number of cases of a very rare disease may constitute an epidemic. Epidemics of respiratory disease have been reported both in the general population and in occupational groups. The former is illustrated by urban epidemic asthma [2]. Unfortunately, there are many examples of occupational outbreaks that have involved many different respiratory diseases, including silicosis, asthma, byssinosis, hypersensitivity pneumonitis and even malignant diseases, such as lung cancer.
Sequential studies in an epidemic
The first stage consists of descriptive studies aimed at defining the problem, describing the existing data and formulating aetiological hypotheses. This step involves a case definition, the identification of cases and the description of the distribution of cases in time and space. Some characteristic time-space models have been described in the form of a line, an area, or a point source, whose identification may help to establish aetiological hypotheses. The distribution of cases in a closed community, such as a factory or a school, may also be relevant to the generation of aetiological clues. The physiopathogenic mechanism of the disease may also help to formulate a hypothesis. However, experience has shown that epidemiological research often allows the aetiology of diseases to be identified in the absence of a well-established biological mechanism. A relevant lesson, learnt from investigation of the environmental origins of many epidemics, is that gathering clear-cut aetiological evidence may be extremely difficult and involves long-term intensive research. For example, in the case of toxic-oil syndrome, final characterization of the responsible chemical compounds and their toxicological effects has not yet been possible [4].
The second stage consists of an aetiological investigation, which is, at the start, retrospective by definition. In the general population, where the enumeration of persons at risk is not usually possible, cross-sectional and case-control studies are the appropriate designs. By contrast, in studies on industrial outbreaks, retrospective cohort studies are sometimes possible. The aims and characteristics of the case-control study may vary in parallel with the elaboration of more refined hypotheses about the particular circumstances of the causative exposure. This is well illustrated by the investigation of the eosinophilia-myalgia syndrome [5, 6].
In the first step of this investigation, the initial cases of eosinophilia-myalgia were defined. The second step involved a case-control study, which identified an increased risk in those who reported the ingestion of tryptophan supplements. Next, two sequential case-control comparisons were conducted among subjects who reported the ingestion of tryptophan. These two analyses permitted identification of both the manufacturer and the specific period of production that caused the contamination of the tryptophan supplements. This approach of narrowing the reference group is also applicable in the investigation of outbreaks in closed populations, such as factories. In the study of an outbreak of organizing pneumonia, the case analysis according to period of employment facilitated the identification of the probable causative product [7]. The use of the case-control design in the investigation of epidemics has been extensively reviewed by Dwyer et al. [8]. An advantage of case-control studies is that the investigation of an epidemic is often facilitated upon reaching an odds ratio or relative risk >10. This is due to a high prevalence of exposure to the putative risk factor in the cases, and a low prevalence of exposure in the reference population. In the case of the asthma outbreaks identified in Barcelona during 1981–1987, the association between the presence of immunoglobulin (Ig)E antibodies against soybean dust extract in serum and having suffered epidemic asthma resulted in an odds ratio of 80 in a case-control study [9]. However, the identification of a necessary cause should not exclude the analysis of other necessary causal factors that are often involved in the causal chain.
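The link between exposure prevalence and the size of the odds ratio can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming an unmatched 2x2 table and Woolf's approximate confidence interval; the counts are hypothetical and are not taken from any of the studies cited above.

```python
import math


def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio for an unmatched 2x2 case-control table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def woolf_ci(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Approximate confidence interval computed on the log-odds scale."""
    log_or = math.log(
        odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls)
    )
    se = math.sqrt(
        1 / exp_cases + 1 / unexp_cases + 1 / exp_controls + 1 / unexp_controls
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 80% of 50 cases exposed, 5% of 100 referents exposed.
table = (40, 10, 5, 95)
or_estimate = odds_ratio(*table)
ci_low, ci_high = woolf_ci(*table)
print(f"OR = {or_estimate:.0f}, 95% CI {ci_low:.1f} to {ci_high:.1f}")
```

With exposure this common among cases and this rare among referents, even a modest number of subjects yields an odds ratio far above 10 and a lower confidence limit well clear of unity, which is why small outbreak studies can still be informative.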
Definition of cluster and of clustering
However, when the distribution of a disease is analysed over time and space, often only the incidence and population density are known, and areas are usually found where there are more cases than expected, according to a Poisson model. Most of these agglomerations can be explained by time-space variations in the risk factors, which cannot normally be determined as the necessary information is lacking. Knox [10] called this tendency towards grouping many diseases in time and space "clustering". Thus, clustering is the regular tendency of many diseases to present themselves irregularly in space and time, once population density and chance are accounted for. In relation to geographical coordinates, these patterns are usually called regional trends, while in relation to time, they are termed secular or seasonal trends. Irrespective of cluster occurrence, the identification of patterns of clustering can facilitate the establishment of original aetiological hypotheses.
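The role of chance under a Poisson model can be made concrete with a short calculation. The sketch below, in Python, uses purely hypothetical values (1,000 areas, two expected cases per area, and a threshold of six cases) to show how many areas would exceed the threshold even if risk were uniform everywhere.

```python
import math


def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), via the complement of the lower sum."""
    below = sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)


# Hypothetical surveillance setting (illustrative values only).
n_areas = 1000          # number of areas examined
expected_cases = 2.0    # cases expected per area from population density
threshold = 6           # observed count regarded as a possible cluster

p_exceed = poisson_tail(threshold, expected_cases)
areas_by_chance = n_areas * p_exceed
print(f"P(X >= {threshold}) = {p_exceed:.4f}")
print(f"areas expected to exceed the threshold by chance: {areas_by_chance:.1f}")
```

Although a count three times the expectation is unlikely in any single area, a number of such areas would be expected across the whole map. A cluster identified after inspecting many areas therefore needs to be judged against the number of comparisons made.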
Assessment of clusters
The investigation of clusters in respiratory diseases may be difficult. One illustration of this is seen in the study of sarcoidosis in fire-fighters. Kern et al. [17] reported a unique time-space cluster of three cases of sarcoidosis among 10 fire-fighters who had trained together. The investigators put forward the hypothesis that fire-fighters may have an increased risk of sarcoidosis, and undertook a study of 1,282 active and retired fire-fighters and police officers in search of common exposures to smoke and infectious agents. The study did not show any increased risk relating to the suspected exposure. Obviously, three cases of sarcoidosis in this large population represented such a small number that it was difficult to exclude clustering or chance as alternative explanations. Another type of clustering is seasonality, which has recently been reported in sarcoidosis presenting with erythema nodosum [18].
The possible difficulties in the investigation of clusters are further illustrated by two outbreaks of ILD that have been reported in textile workers. Lougheed et al. [19] described the occurrence of five cases of ILD among 88 workers at a Canadian textile factory and attributed the outbreak to the inhalation of aflatoxins. Kern et al. [20] reported eight cases of chronic ILD among 165 workers in a US plant of the same Canadian company. In the latter study, it was possible to fully enumerate the cohort of workers, which included 162 males and three females, who accounted for 660 and nine person-yrs at risk, respectively. Through retrospective cohort analysis, the authors estimated an incidence of 1,495 cases per 100,000 person-yrs, as compared to an expected incidence of 31.5 cases per 100,000 person-yrs calculated from a population-based register in New Mexico [21].
These two studies also illustrate the difficulties of performing aetiological case-control or cohort analyses in order to elucidate the potentially causative exposures in a small number of cases. Ad hoc environmental measures did not find elevated levels of aflatoxins in the US plant and suggested that the disease could have been caused by the inhalation of respirable-size nylon fragments.
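The scale of the excess reported for the US plant can be worked through from the figures given above. The following Python sketch takes the two published rates and the person-years as given, rather than recomputing the observed rate from the case count. The Poisson tail probability is an added assumption for illustration, not the method of the original authors.

```python
import math


def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), via the complement of the lower sum."""
    below = sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)


# Figures quoted in the text.
person_years = 660 + 9      # males plus females
observed_rate = 1495.0      # cases per 100,000 person-yrs, as reported
reference_rate = 31.5       # cases per 100,000 person-yrs, register-based
reported_cases = 8

# Ratio of the reported to the expected incidence.
rate_ratio = observed_rate / reference_rate

# Cases expected in the cohort if the reference rate applied to it.
expected_cases = reference_rate / 100_000 * person_years

# Assumption: cases arise as a Poisson process at the reference rate.
p_chance = poisson_tail(reported_cases, expected_cases)

print(f"rate ratio: {rate_ratio:.1f}")
print(f"expected cases in cohort: {expected_cases:.2f}")
print(f"P(at least {reported_cases} cases by chance): {p_chance:.1e}")
```

Under these assumptions, well under one case would be expected in a cohort of this size, so chance alone is an implausible explanation for the excess. Establishing that an excess exists is, however, a separate matter from identifying the causative exposure.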
Conclusions
Classifications
ILDs are classified by their aetiology, pathology, radiology, associations with other diseases, responses to therapy, or even by the absence of an alternative diagnosis (fig. 1).
In broad, aetiological terms, ILDs may be classified into three categories of understanding.
1) There are several instances where close associations between ILDs, particularly fibrotic ILDs, and exposures to a variety of specific environmental stimuli are seen. The clearest relationships are seen after occupational exposures, presumably through inhalation, to both inorganic and organic agents. Several inorganic fibrogens (particularly asbestos and beryllium) have clear exposure-response relationships, at least under conditions of high exposure. As with virtually all diseases, these relationships are modified by individual susceptibility, an area exemplified by berylliosis [22]. Environmental factors may also dominate the aetiology of extrinsic allergic alveolitides, although evidence of this is less clear. Therapeutic drugs may also cause ILDs, presumably reaching the lung through the circulation. In some cases, especially anticancer and anti-arrhythmic therapies, where doses are often high and treatments prolonged, dose-response relationships may be discernible [23]. Often, however, pulmonary disease only develops in a small proportion of treated patients, in which case the abnormal response appears to be idiosyncratic and determined by host susceptibility factors. In many cases the pathology resolves once treatment is stopped.
2) Beyond these disease states lie a wide range of clinical and/or pathological states, the aetiologies of which remain either largely or wholly unknown. Most are uncommon, some excessively rare, but two (cryptogenic fibrosing alveolitis (CFA), synonymous with idiopathic pulmonary fibrosis (IPF), and sarcoidosis) are less so.
3) Occupying an uneasy (to the epidemiologist) position somewhere between these two classes of understanding are the collagen-vascular associated ILDs, the causes of which are presumed to be the same as those underlying the related connective-tissue condition.
Alternative, pathological classifications of these diseases do not neatly coincide with this aetiological grouping (fig. 1). Such overlaps presumably reflect the limited and stereotyped responses of the lung to toxic stimuli. If so, there are at least three important implications for the epidemiologist. First, observations of disease distribution that depend entirely on pathological patterns may be misleading; grouping conditions by their homogeneous, microscopic pathology alone may obscure important differences in distribution. Thus, Löfgren's syndrome may be aetiologically quite distinct from other forms of pulmonary sarcoidosis. Second, the extent to which the aetiological understanding of diseases in category 1 can be translated to those of similar pathology in categories 2 and 3 is debatable. Third, the current pathological classifications can lead to logical absurdities, such as the difficulty in making a diagnosis of CFA in a male with a history of occupational asbestos exposure.
Ascertainment
Sarcoidosis provides a reasonable example of some of these difficulties. The disease is commonly without clinical manifestations [25] and thus may only be detectable by chest radiography. Outside campaigns of mass radiography (which are themselves selective), recognition of the disease is partly a reflection of the probability of having a chest radiograph. For example, in Rochester, MN, USA, sarcoidosis was more commonly identified as "asymptomatic" among recent immigrants and healthcare professionals, presumably because these groups undergo chest radiography more frequently [26]. Similarly, part of the apparent clustering of the disease on the Isle of Man [27] may be a reflection of the high number of cases among nurses working at the island's main hospital. For the same reasons, temporal patterns apparent in the frequencies of these diseases should be viewed with caution. The increase in prevalence of sarcoidosis on the Isle of Man between 1977–1983 [28] almost certainly reflects better case ascertainment and, similarly, the reported increases in male mortality from CFA in England and Wales [29] could be accounted for by improved diagnostic methods.
Distributions: age and sex
Sarcoidosis has a peak incidence in young adulthood. In Scandinavian and British populations, a second peak is apparent among 50–60-yr-old females. The disease is rare in childhood and among the elderly. Its sex distribution is almost equal, although certain manifestations may be more common among females. These features would be unusual in a disease that is the result of dust inhalation, particularly occupational dust inhalation. They are more typical of an infective aetiology, albeit one with an unusual distribution and perhaps a strong immunological component. Epidemiological and microbiological attempts to support this have not yet been successful. The search for epidemiological markers of infection, essentially the demonstration of disease clustering in time and space (including within families), has been unsuccessful and, as argued above, is open to the vagaries of differential ascertainment. The pathological similarities between sarcoidosis and tuberculosis (and the similarities in their age and sex distributions, where the latter is endemic) have led to many attempts to identify mycobacterial antigens in sarcoid tissues, a quest fuelled by the availability of polymerase chain reaction and similar techniques. Overall, the results have been inconsistent and ultimately disappointing, which may be due to the extreme sensitivity of the detection techniques and the near-ubiquity of nontuberculous mycobacteria. Recent reports of propionibacterial deoxyribonucleic acid (DNA) in sarcoid tissue may, however, provide an avenue for further, more successful investigation [30].
Conversely, CFA is largely a disease of later life, and is probably more common among males than females. A number of occupational aetiologies for the disease have been proposed, and some formally tested. For example, associations with wood- and metal-work have been reported [31], although only within a relatively confined geographical area.
The size of the resultant odds ratios suggests that only a small proportion of the disease can be attributed to these occupations. Disease misclassification must also be considered, particularly since these occupations are also strongly linked with mesothelioma. More broadly, however, if occupational exposures were responsible for all or a substantial proportion of cases of the disease, then an increasing male:female incidence ratio with age would be anticipated. Available evidence from death certificates in England and Wales suggests that this pattern is only weak. Conversely, infective aetiologies for the disease would need to invoke some form of delayed response to explain the disease's age distribution [32].
Other distributions
Epidemiological techniques
Conclusions