How can I download histological slide from Cancer Digital Slide Archive

Could you tell me whether it is possible to download image data from the archive and, if so, how?

Thank you.

The Future of Digital Pathology

Michael Schubert, Luke Turner | 10/22/2020 | Longer Read

Although some doubters remain, the value of digital pathology is increasingly recognized around the world – especially in this post-pandemic era. Digital enthusiasts Jonhan Ho and Sylvain Mailhot review the current state of play and how digital adoption can be encouraged.

Digital pathology is, by now, indisputably the future of the field. Although some are still reluctant to accept its place in the lab – and despite its slow adoption in many institutions – the digital slide image is slowly replacing the glass slide on the microscope’s stage. The flexibility, accuracy, and potential to deploy ever-improving informatics and image analysis tools make digital pathology an attractive option. But significant hurdles remain nonetheless – the initial expense, disrupted workflows, and altered infrastructure required for implementation prevent many labs from accessing digital pathology’s full potential.

However, the onset of the COVID-19 crisis at the start of 2020 only served to highlight some of the key benefits of digital systems. Those institutions with digital pathology enabled their pathologists to better manage workloads and diagnose from the comfort – and safety – of their own home. Regulatory issues can hamper digital adoption – but, as we adapt to the practical consequences of the pandemic, medical and regulatory communities are increasingly analyzing ways to combine slide digitization and the benefits of remote diagnosis. To learn more about the current state of play and where things are headed, we spoke to Jonhan Ho and Sylvain Mailhot, two pathologists with digital experience.

Time for change
When Jonhan Ho first saw a whole-slide image during his pathology residency, he was blown away. “I knew that one day we would all be making diagnoses routinely with digital pathology,” he says. At the same time, though, he was growing frustrated with the cumbersome medical software on offer. “I was incredibly frustrated with how much time we wasted clicking unnecessary buttons. I felt at the time that we had a unique chance to influence the future of pathology software for decades to come.” One vision for the future led to imperfect pathology workflows forced by bad software; the other led to happy and inspired pathologists at work, and Ho wanted to do everything in his power to make the latter a reality.

With the rise of the pandemic, his aspirations were timely. “COVID-19 has really hastened the need for remote learning, and we have shifted all of our teaching to whole-slide images,” Ho explains. But, as education becomes increasingly democratized, he noticed a glaring absence – a platform for doctors, especially pathologists, to share their knowledge. To solve that problem, Ho created KiKo – an acronym for “knowledge in, knowledge out.” There, pathologists share collections of whole-slide image cases paired with videos and other forms of content. And KiKo is taking off. Recently, the platform hosted its first global digital dermatopathology grand rounds, in which dermatopathologists worldwide shared cases and traded their best tips and tricks.

“Digital pathology empowers the pathologist,” says Ho. “We are no longer chained to the location of the histology lab. On top of that, digital pathology opens up a whole new set of informatics and image analysis tools that we are just now starting to create.” He cautions, however, that not all digital pathology software offers a good user experience – and that it’s important to impress upon vendors the importance of having dedicated user experience designers on their team.

But is now the right time for a move to digital? Ho thinks so. “Hospital systems will save money with digital pathology by decreasing errors and dynamically distributing workloads,” he says. “I am hopeful that, because the pandemic has highlighted the need, regulatory agencies will be more willing to allow pathologists to adopt digital pathology.”

Ho’s Top Tip: “Garbage in = garbage out. To get good, clean images, the histology lab must put out good, clean slides.”

Demonstrating digital desirability
After Sylvain Mailhot first experienced digital slides during the Laval University virtual slide telepathology project, he realized its unlimited potential and was sure that the field would be fully digital within five years. Why? “You have immediate access to colleagues anywhere on the planet. You can transfer them the files and quickly obtain a second opinion,” explains Mailhot. “It can also be difficult to physically locate slides when required, so being able to access them immediately on a digital system saves a lot of time.” Mailhot also believes that digital pathology is more ergonomic than the traditional setup, with digital slide viewers offering a more comfortable working experience than sitting hunched over a microscope.

But why has digital pathology not been embraced fully across the field as Mailhot anticipated a decade ago? “Based on my experience working with digital slides and discussions with others across Canada and the US, the major obstacle to digital pathology is the resistance of the pathologist,” he says. Mailhot believes that, for digital adoption to succeed, the number of pathologists who are interested in, and enthusiastic about, digital pathology must reach a critical threshold. He acknowledges that there can be stumbling blocks when it comes to digital implementation – but none are impossible to overcome. “There is a lag between the production of the slides and the digital image when scanning your own slides, so it’s important to devise a system for prioritizing important slides,” Mailhot explains. “There will also be times when you are disappointed with the quality of the digital image, which is why we manually pre-scan all slides to detect any out-of-focus areas.”

It’s partly thanks to digital pathology that Mailhot’s lab has successfully adapted to the challenges of COVID-19. With no on-site work permitted during lockdown, the ability to diagnose cases digitally meant that pathologists could operate from the safety of their homes. Mailhot also highlights the workflow benefits in the event of illness. “If one pathologist cannot work, I can immediately resend the slides to a different pathologist on the system. Before, I would have had to go into their office to physically retrieve and redistribute the slides.”

Given the current rate of adoption, how does Mailhot propose to convince the doubters that digital pathology is the way forward? “Some pathologists have the misconception that digital slides will never look as good as glass under a microscope – and that they take more time to navigate. To prove that this isn’t the case, it’s important to show these people digital slides on a high-quality screen using an advanced system to demonstrate image quality and ease of use.” Once people are on board, Mailhot believes the second step to successful implementation is to work closely with the IT department. It’s not only a question of having a great image, he says, but also of having sufficient storage and network facilities to support a seamless digital system.

Although the transition to routine digital diagnosis can be costly for institutions at the outset, Mailhot believes that the long-term savings make it worthwhile. Consolidating digital slide production to one site, distributing slides digitally, and improving slide organization will all save money and time in the long run. “Looking further into the future, I think that sophisticated artificial intelligence (AI) tools will force laboratories to go digital,” says Mailhot. “Once the early promise of AI is realized, pre-screening slides into categories will save both time and money.”

After scanning slides and diagnosing them digitally for many years, Mailhot believes that institutions like his have a key role to play. “Labs that have already made the transition will build up experience and expert knowledge – and they need to make sure they pass it on to others to increase digital pathology’s value and to help with the transition.”

Mailhot’s Top Tip: “I think people have misconceptions about digital pathology. I would advise everyone to try it. Look at slides using a high-quality screen, computer, and scanner, and I think you’ll love it!”

Jonhan Ho is Assistant Professor of Dermatology and Pathology and Director of the Dermatopathology Unit at the University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA.

Sylvain Mailhot is Medical Director at PathAssistant Laboratory, Moncton, New Brunswick, Canada.



For virtually every patient with colorectal cancer (CRC), hematoxylin–eosin (HE)–stained tissue slides are available. These images contain quantitative information, which is not routinely used to objectively extract prognostic biomarkers. In the present study, we investigated whether deep convolutional neural networks (CNNs) can extract prognosticators directly from these widely available images.

Methods and findings

We hand-delineated single-tissue regions in 86 CRC tissue slides, yielding more than 100,000 HE image patches, and used these to train a CNN by transfer learning, reaching a nine-class accuracy of >94% in an independent data set of 7,180 images from 25 CRC patients. With this tool, we performed automated tissue decomposition of representative multitissue HE images from 862 HE slides in 500 stage I–IV CRC patients in The Cancer Genome Atlas (TCGA) cohort, a large international multicenter collection of CRC tissue. Based on the output neuron activations in the CNN, we calculated a “deep stroma score,” which was an independent prognostic factor for overall survival (OS) in a multivariable Cox proportional hazard model (hazard ratio [HR] with 95% confidence interval [CI]: 1.99 [1.27–3.12], p = 0.0028), while in the same cohort, manual quantification of stromal areas and a gene expression signature of cancer-associated fibroblasts (CAFs) were only prognostic in specific tumor stages. We validated these findings in an independent cohort of 409 stage I–IV CRC patients from the “Darmkrebs: Chancen der Verhütung durch Screening” (DACHS) study who were recruited between 2003 and 2007 in multiple institutions in Germany. Again, the score was an independent prognostic factor for OS (HR 1.63 [1.14–2.33], p = 0.008), CRC-specific OS (HR 2.29 [1.5–3.48], p = 0.0004), and relapse-free survival (RFS; HR 1.92 [1.34–2.76], p = 0.0004). A prospective validation is required before this biomarker can be implemented in clinical workflows.


In our retrospective study, we show that a CNN can assess the human tumor microenvironment and predict prognosis directly from histopathological images.


Learning Patient Outcomes with Deep Survival Convolutional Neural Networks.

The SCNN model architecture is depicted in Fig. 1 (Fig. S1 shows a detailed diagram). H&E-stained tissue sections are first digitized to whole-slide images. These images are reviewed using a web-based platform to identify regions of interest (ROIs) that contain viable tumor with representative histologic characteristics and that are free of artifacts (Methods) (34, 35). High-power fields (HPFs) from these ROIs are then used to train a deep convolutional network that is seamlessly integrated with a Cox proportional hazards model to predict patient outcomes. The network is composed of interconnected layers of image processing operations and nonlinear functions that sequentially transform the HPF image into highly predictive prognostic features. Convolutional layers first extract visual features from the HPF at multiple scales using convolutional kernels and pooling operations. These image-derived features feed into fully connected layers that perform additional transformations, and then, a final Cox model layer outputs a prediction of patient risk. The interconnection weights and convolutional kernels are trained by comparing risk predicted by the network with survival or other time-to-event outcomes using a backpropagation technique to optimize the statistical likelihood of the network (Methods).

The SCNN model. The SCNN combines deep learning CNNs with traditional survival models to learn survival-related patterns from histology images. (A) Large whole-slide images are generated by digitizing H&E-stained glass slides. (B) A web-based viewer is used to manually identify representative ROIs in the image. (C) HPFs are sampled from these regions and used to train a neural network to predict patient survival. The SCNN consists of (i) convolutional layers that learn visual patterns related to survival using convolution and pooling operations, (ii) fully connected layers that provide additional nonlinear transformations of extracted image features, and (iii) a Cox proportional hazards layer that models time-to-event data, like overall survival or time to progression. Predictions are compared with patient outcomes to adaptively train the network weights that interconnect the layers.
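The training objective of a Cox proportional hazards layer can be illustrated with a small sketch. The function below computes the negative log partial likelihood from predicted risks, follow-up times, and event indicators. It is a minimal illustration, not the authors' implementation: the function name is hypothetical, ties in event times are not treated specially, and a real network would compute this with a framework that supports backpropagation.

```python
import math

def cox_neg_log_partial_likelihood(risks, times, events):
    """Negative log partial likelihood of a Cox model (illustrative sketch).

    risks  -- predicted log-risk per patient (the output of the Cox layer)
    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            # Censored patients contribute only through other patients' risk sets.
            continue
        # Risk set: every patient still under observation at time t_i.
        denominator = sum(math.exp(r) for r, t in zip(risks, times) if t >= t_i)
        total += risks[i] - math.log(denominator)
    return -total
```

A lower value means the predicted risks order the patients more consistently with their observed outcomes, which is what training tries to achieve.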

To improve the performance of SCNN models, we developed a sampling and risk filtering technique to address intratumoral heterogeneity and the limited availability of training samples (Fig. 2). In training, new HPFs are randomly sampled from each ROI at the start of each training iteration, providing the SCNN model with a fresh look at each patient’s histology and capturing heterogeneity within the ROI. Each HPF is processed using standard data augmentation techniques that randomly transform the field to reinforce network robustness to tissue orientation and variations in staining (33). The SCNN is trained using multiple transformed HPFs for each patient (one for each ROI) to further account for intratumoral heterogeneity across ROIs. For prospective prediction, we first sample multiple HPFs within each ROI to generate a representative collection of fields for the patient. The median risk is calculated within each ROI, and then, these median risks are sorted and filtered to predict a robust patient-level risk that reflects the aggressiveness of their disease while rejecting any outlying risk predictions. These sampling and filtering procedures are described in detail in Methods.

SCNN uses image sampling and filtering to improve the robustness of training and prediction. (A) During training, a single 256 × 256-pixel HPF is sampled from each region, producing multiple HPFs per patient. Each HPF is subjected to a series of random transformations and is then used as an independent sample to update the network weights. New HPFs are sampled at each training epoch (one training pass through all patients). (B) When predicting the outcome of a newly diagnosed patient, nine HPFs are sampled from each ROI, and a risk is predicted for each field. The median HPF risk is calculated in each region, these median risks are then sorted, and the second highest value is selected as the patient risk. This sampling and filtering framework was designed to deal with tissue heterogeneity by emulating manual histologic evaluation, where prognostication is typically based on the most malignant region observed within a heterogeneous sample. Predictions based on the highest risk and the second highest risk had equal performance on average in our experiments, but the maximum risk produced some outliers with poor prediction accuracy.
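The prediction-time filtering described in this caption (median risk per ROI, then the second highest median) can be sketched in a few lines. The function name and the fallback for patients with a single ROI are assumptions made for illustration; the source does not say how that case is handled.

```python
from statistics import median

def patient_risk(roi_field_risks):
    """Aggregate HPF-level risks into one patient-level risk.

    roi_field_risks -- one list of predicted HPF risks per ROI
                       (nine fields per ROI in the setup described above)
    """
    # Median within each ROI, then sort the ROI medians from highest to lowest.
    roi_medians = sorted((median(fields) for fields in roi_field_risks), reverse=True)
    if len(roi_medians) == 1:
        # Assumption: with a single ROI there is no second value to pick.
        return roi_medians[0]
    # The second highest median rejects a single outlying high-risk region.
    return roi_medians[1]
```

For example, `patient_risk([[1, 2, 3], [10, 11, 12], [4, 5, 6]])` takes the ROI medians 2, 11, and 5 and returns 5.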

Assessing the Prognostic Accuracy of SCNN.

To assess the prognostic accuracy of SCNN, we assembled whole-slide image tissue sections from formalin-fixed, paraffin-embedded specimens and clinical follow-up for 769 gliomas from the TCGA (Dataset S1). This dataset comprises lower-grade gliomas (WHO grades II and III) and glioblastomas (WHO grade IV), contains both astrocytomas and oligodendrogliomas, and has overall survivals ranging from less than 1 to 14 y or more. A summary of demographics, grades, survival, and molecular subtypes for this cohort is presented in Table S1. The Digital Slide Archive was used to identify ROIs in 1,061 H&E-stained whole-slide images from these tumors.

The prognostic accuracy of SCNN models was assessed using Monte Carlo cross-validation. We randomly split our cohort into paired training (80%) and testing (20%) sets to generate 15 training/testing set pairs. We trained an SCNN model using each training set and then, evaluated the prognostic accuracy of these models on the paired testing sets, generating a total of 15 accuracy measurements (Methods and Dataset S1). Accuracy was measured using Harrell’s c index, a nonparametric statistic that measures concordance between predicted risks and actual survival (36). A c index of 1 indicates perfect concordance between predicted risk and overall survival, and a c index of 0.5 corresponds to random concordance.
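As an illustration of the evaluation protocol, the sketch below computes Harrell's c index from predicted risks and generates repeated random 80/20 splits. Function names, the seed, and the handling of tied risks (counted as half concordant) are illustrative choices, not details taken from the study.

```python
import random

def harrell_c_index(risks, times, events):
    """Fraction of comparable patient pairs whose predicted risks are ordered correctly."""
    concordant = 0.0
    comparable = 0
    n = len(risks)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def monte_carlo_splits(patient_ids, n_splits=15, train_fraction=0.8, seed=0):
    """Yield (train, test) lists from repeated random splits of the cohort."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    n_train = int(round(train_fraction * len(ids)))
    for _ in range(n_splits):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]
```

Splitting by patient rather than by image keeps all fields from one patient on the same side of the split, which avoids leaking information from training into testing.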

For comparison, we also assessed the prognostic accuracy of baseline linear Cox models generated using the genomic biomarkers and manual histologic grades from the WHO classification of gliomas (Fig. 3A). The WHO assigns the diffuse gliomas to three genomic subtypes defined by mutations in the isocitrate dehydrogenase (IDH) genes (IDH1/IDH2) and codeletion of chromosomes 1p and 19q. Within these molecular subtypes, gliomas are further assigned a histologic grade based on criteria that vary depending on cell of origin (either astrocytic or oligodendroglial). These criteria include mitotic activity, nuclear atypia, the presence of necrosis, and the characteristics of microvascular structures (microvascular proliferation). Histologic grade remains a significant determinant in planning treatment for gliomas, with grades III and IV typically being treated aggressively with radiation and concomitant chemotherapy.

Prognostication criteria for diffuse gliomas. (A) Prognosis in the diffuse gliomas is determined by genomic classification and manual histologic grading. Diffuse gliomas are first classified into one of three molecular subtypes based on IDH1/IDH2 mutations and the codeletion of chromosomes 1p and 19q. Grade is then determined within each subtype using histologic characteristics. Subtypes with an astrocytic lineage are split by IDH mutation status, and the combination of 1p/19q codeletion and IDH mutation defines an oligodendroglioma. These lineages have histologic differences; however, histologic evaluation is not a reliable predictor of molecular subtype (37). Histologic criteria used for grading range from nuclear morphology to higher-level patterns, like necrosis or the presence of abnormal microvascular structures. (B) Comparison of the prognostic accuracy of SCNN models with that of baseline models based on molecular subtype or molecular subtype and histologic grade. Models were evaluated over 15 independent training/testing sets with randomized patient assignments and with/without training and testing sampling. (C) The risks predicted by the SCNN models correlate with both histologic grade and molecular subtype, increasing with grade and generally trending with the clinical aggressiveness of genomic subtypes. (D) Kaplan–Meier plots comparing manual histologic grading and SCNN predictions. Risk categories (low, intermediate, high) were generated by thresholding SCNN risks. N/A, not applicable.

SCNN models showed substantial prognostic power, achieving a median c index of 0.754 (Fig. 3B). SCNN models also performed comparably with manual histologic-grade baseline models (median c index 0.745, P = 0.307) and with molecular subtype baseline models (median c index 0.746, P = 4.68e-2). Baseline models representing WHO classification that integrate both molecular subtype and manual histologic grade performed slightly better than SCNN, with a median c index of 0.774 (Wilcoxon signed rank P = 2.61e-3).

We also evaluated the impact of the sampling and ranking procedures shown in Fig. 2 in improving the performance of SCNN models. Repeating the SCNN experiments without these sampling techniques reduced the median c index of SCNN models to 0.696, significantly worse than for models where sampling was used (P = 6.55e-4).

SCNN Predictions Correlate with Molecular Subtypes and Manual Histologic Grade.

To further investigate the relationship between SCNN predictions and the WHO paradigm, we visualized how risks predicted by SCNN are distributed across molecular subtype and histologic grade (Fig. 3C). SCNN predictions were highly correlated with both molecular subtype and grade and were consistent with expected patient outcomes. First, within each molecular subtype, the risks predicted by SCNN increase with histologic grade. Second, predicted risks are consistent with the published expected overall survivals associated with molecular subtypes (37). IDH WT astrocytomas are, for the most part, highly aggressive, having a median survival of 18 mo, and the collective predicted risks for these patients are higher than for patients from other subtypes. IDH mutant astrocytomas are another subtype with considerably better overall survival ranging from 3 to 8 y, and the predicted risks for patients in this subtype are more moderate. Notably, SCNN risks for IDH mutant astrocytomas are not well-separated for grades II and III, consistent with reports of histologic grade being an inadequate predictor of outcome in this subtype (38). Infiltrating gliomas with the combination of IDH mutations and codeletion of chromosomes 1p/19q are classified as oligodendrogliomas in the current WHO schema, and these have the lowest overall predicted risks consistent with overall survivals of 10+ y (37, 39). Finally, we noted a significant difference in predicted risks when comparing the IDH mutant and IDH WT grade III astrocytomas (rank sum P = 6.56e-20). These subtypes share an astrocytic lineage and are graded using identical histologic criteria. Although some histologic features are more prevalent in IDH mutant astrocytomas, these features are not highly specific or sensitive to IDH mutant tumors and cannot be used to reliably predict IDH mutation status (40). Risks predicted by SCNN are consistent with worse outcomes for IDH WT astrocytomas in this case (median survival 1.7 vs. 6.3 y in the IDH mutant counterparts), suggesting that SCNN models can detect histologic differences associated with IDH mutations in astrocytomas.

We also performed a Kaplan–Meier analysis to compare manual histologic grading with “digital grades” based on SCNN risk predictions (Fig. 3D). Low-, intermediate-, and high-risk categories were established by setting thresholds on SCNN predictions to reflect the proportions of manual histologic grades in each molecular subtype (Methods). We observed that, within each subtype, the differences in survival captured by SCNN risk categories are highly similar to manual histologic grading. SCNN risk categories and manual histologic grades have similar prognostic power in IDH WT astrocytomas (log rank P = 1.23e-12 vs. P = 7.56e-11, respectively). In IDH mutant astrocytomas, both SCNN risk categories and manual histologic grades have difficulty separating Kaplan–Meier curves for grades II and III, but both clearly distinguish grade IV as being associated with worse outcomes. Discrimination for oligodendroglioma survival is also similar between SCNN risk categories and manual histologic grades (log rank P = 9.73e-7 vs. P = 8.63e-4, respectively).
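One way to set thresholds that "reflect the proportions of manual histologic grades" is to rank patients by predicted risk and cut the ranking so that each risk category has as many patients as the corresponding grade. The sketch below assumes that reading; the exact procedure is given in the paper's Methods, which are not reproduced here, and the function name is hypothetical.

```python
def risk_categories(risks, grade_counts):
    """Label each patient low/intermediate/high by rank of predicted risk.

    risks        -- predicted risk per patient within one molecular subtype
    grade_counts -- (n_lowest_grade, n_middle_grade, n_highest_grade);
                    must sum to len(risks)
    """
    if sum(grade_counts) != len(risks):
        raise ValueError("grade counts must sum to the number of patients")
    n_low, n_mid, _ = grade_counts
    # Patient indices ordered from lowest to highest predicted risk.
    order = sorted(range(len(risks)), key=lambda k: risks[k])
    labels = [None] * len(risks)
    for rank, k in enumerate(order):
        if rank < n_low:
            labels[k] = "low"
        elif rank < n_low + n_mid:
            labels[k] = "intermediate"
        else:
            labels[k] = "high"
    return labels
```

Matching category sizes to grade sizes keeps the Kaplan–Meier comparison fair, because each "digital grade" then contains the same number of patients as the manual grade it is compared with.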

Improving Prognostic Accuracy by Integrating Genomic Biomarkers.

To integrate both histologic and genomic data into a single unified prediction framework, we developed a genomic survival convolutional neural network (GSCNN model). The GSCNN learns from genomics and histology simultaneously by incorporating genomic data into the fully connected layers of the SCNN (Fig. 4). Both data are presented to the network during training, enabling genomic variables to influence the patterns learned by the SCNN by providing molecular subtype information.

GSCNN models integrate genomic and imaging data for improved performance. (A) A hybrid architecture was developed to combine histology image and genomic data to make integrated predictions of patient survival. These models incorporate genomic variables as inputs to their fully connected layers. Here, we show the incorporation of genomic variables for gliomas; however, any number of genomic or proteomic measurements can be similarly used. (B) The GSCNN models significantly outperform SCNN models as well as the WHO paradigm based on genomic subtype and histologic grading.
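The idea of feeding genomic variables into the fully connected layers can be sketched as a simple concatenation followed by a dense layer and a linear Cox output. Everything below is hypothetical and for illustration only: the layer sizes, the ReLU nonlinearity, the weight values, and the function names are assumptions, and the actual architecture is the one shown in Fig. 4 of the paper.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Elementwise rectified linear unit (an assumed nonlinearity)."""
    return [max(0.0, v) for v in values]

def gscnn_risk(image_features, idh_mutant, codeleted_1p19q,
               hidden_weights, hidden_biases, cox_weights):
    """Predict a risk from image-derived features plus two genomic variables."""
    # Genomic variables are appended to the features produced by the
    # convolutional layers before the fully connected transformation.
    combined = list(image_features) + [float(idh_mutant), float(codeleted_1p19q)]
    hidden = relu(dense(combined, hidden_weights, hidden_biases))
    # The Cox layer is a linear combination of the last hidden activations.
    return sum(w * h for w, h in zip(cox_weights, hidden))
```

Because the genomic inputs pass through the same trainable layers as the image features, the network can learn interactions between them instead of only adding their effects afterwards.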

We repeated our experiments using GSCNN models with histology images, IDH mutation status, and 1p/19q codeletion as inputs and found that the median c index improved from 0.754 to 0.801. The addition of genomic variables improved performance by 5% on average, and GSCNN models significantly outperform the baseline WHO subtype-grade model trained on equivalent data (signed rank P = 1.06e-2). To assess the value of integrating genomic variables directly into the network during training, we compared GSCNN with a more superficial integration approach, where an SCNN model was first trained using histology images, and then, the risks from this model were combined with IDH and 1p/19q variables in a simple three-variable Cox model (Fig. S2). Processing genomic variables in the fully connected layers and including them in training provided a statistically significant benefit: models trained using the superficial approach performed worse than GSCNN models, with median c index decreasing to 0.785 (signed rank P = 4.68e-2).

To evaluate the independent prognostic power of risks predicted by SCNN and GSCNN, we performed a multivariable Cox regression analysis (Table 1). In a multivariable regression that included SCNN risks, subtype, grade, age, and sex, SCNN risks had a hazard ratio of 3.05 and were prognostic when correcting for all other features, including manual grade and molecular subtype (P = 2.71e-12). Molecular subtype was also significant in the SCNN multivariable regression model, but histologic grade was not. We also performed a multivariable regression with GSCNN risks and found GSCNN to be significant (P = 9.69e-12) with a hazard ratio of 8.83. In the GSCNN multivariable regression model, molecular subtype was not significant, but histologic grade was marginally significant. We also used Kaplan–Meier analysis to compare risk categories generated from SCNN and GSCNN (Fig. S3). Survival curves for SCNN and GSCNN were very similar when evaluated on the entire cohort. In contrast, their abilities to discriminate survival within molecular subtypes were notably different.

Table 1. Hazard ratios for single- and multiple-variable Cox regression models

Visualizing Histologic Patterns Associated with Prognosis.

Deep learning networks are often criticized for being black box approaches that do not reveal insights into their prediction mechanisms. To investigate the visual patterns that SCNN models associate with poor outcomes, we used heat map visualizations to display the risks predicted by our network in different regions of whole-slide images. Transparent heat map overlays are frequently used for visualization in digital pathology, and in our study, these overlays enable pathologists to correlate the predictions of highly accurate survival models with the underlying histology over the expanse of a whole-slide image. Heat maps were generated using a trained SCNN model to predict the risk for each nonoverlapping HPF in a whole-slide image. The predicted risks were used to generate a color-coded transparent overlay, where red and blue indicate higher and lower SCNN risk, respectively.
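The heat map construction described above can be sketched as tiling the slide into nonoverlapping fields, predicting a risk for each, and mapping risks to a blue-to-red overlay on a per-slide scale. The 256-pixel tile size, the skipping of partial edge tiles, the linear color ramp, and the `predict` callback are assumptions for illustration; a real pipeline would read pixels with a whole-slide image library and call the trained model.

```python
def nonoverlapping_tiles(width, height, tile=256):
    """Yield (x, y) top-left corners of full tiles; partial edge tiles are skipped."""
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield x, y

def risk_to_rgba(risk, lo, hi, alpha=0.4):
    """Map a risk to a transparent color: blue for the lowest, red for the highest."""
    if hi == lo:
        frac = 0.5  # flat slide: use the midpoint color
    else:
        frac = min(1.0, max(0.0, (risk - lo) / (hi - lo)))
    return (frac, 0.0, 1.0 - frac, alpha)

def heat_map(width, height, predict, tile=256):
    """Return {(x, y): rgba} for every tile, scaled to this slide's risk range.

    predict -- hypothetical callback returning the model's risk for the tile at (x, y)
    """
    risks = {(x, y): predict(x, y) for x, y in nonoverlapping_tiles(width, height, tile)}
    if not risks:
        return {}  # slide smaller than one tile
    lo, hi = min(risks.values()), max(risks.values())
    return {xy: risk_to_rgba(r, lo, hi) for xy, r in risks.items()}
```

Scaling colors within each slide makes local contrasts visible, at the cost that colors are not comparable between slides.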

A selection of risk heat maps from three patients is presented in Fig. 5, with inlays showing how SCNNs associate risk with important pathologic phenomena. For TCGA-DB-5273 (WHO grade III, IDH mutant astrocytoma), the SCNN heat map clearly and specifically highlights regions of early microvascular proliferation, an advanced form of angiogenesis that is a hallmark of malignant progression, as being associated with high risk. Risk in this heat map also increases with cellularity, heterogeneity in nuclear shape and size (pleomorphism), and the presence of abnormal microvascular structures. Regions in TCGA-S9-A7J0 have varying extents of tumor infiltration ranging from normal brain to sparsely infiltrated adjacent normal regions exhibiting satellitosis (where neoplastic cells cluster around neurons) to moderately and highly infiltrated regions. This heat map correctly associates the lowest risks with normal brain regions and can distinguish normal brain from adjacent regions that are sparsely infiltrated. Interestingly, higher risks are assigned to sparsely infiltrated regions (region 1, Top) than to regions containing relatively more tumor infiltration (region 2, Top). We observed a similar pattern in TCGA-TM-A84G, where edematous regions (region 1, Bottom) adjacent to moderately cellular tumor regions (region 1, Top) are also assigned higher risks. These latter examples provide risk features embedded within histologic sections that have been previously unrecognized and could inform and improve pathology practice.

Visualizing risk with whole-slide SCNN heat maps. We performed SCNN predictions exhaustively within whole-slide images to generate heat map overlays of the risks that SCNN associates with different histologic patterns. Red indicates relatively higher risk, and blue indicates lower risk (the scale for each slide is different). (Top) In TCGA-DB-5273, SCNN clearly and specifically predicts high risks for regions of early microvascular proliferation (region 1) and also, higher risks with increasing tumor infiltration and cell density (region 2 vs. 3). (Middle) In TCGA-S9-A7J0, SCNN can appropriately discriminate between normal cortex (region 1 in Bottom) and adjacent regions infiltrated by tumor (region 1 in Top). Highly cellular regions containing prominent microvascular structures (region 3) are again assigned higher risks than lower-density regions of tumor (region 2). Interestingly, low-density infiltrate in the cortex was associated with high risk (region 1 in Top). (Bottom) In TCGA-TM-A84G, SCNN assigns high risks to edematous regions (region 1 in Bottom) that are adjacent to tumor (region 1 in Top).

Microscopy Solutions for Histology and Histopathology

Pathology, histopathology or histology aims to study the manifestation of disease by microscopic examination of tissue morphology. In pathology, the sample to be examined under the microscope usually is the result of a surgery, biopsy or autopsy after fixation, clearing/embedding and sectioning of the tissue specimen. Alternatively, frozen section processing with a cryostat is done when rapid results are required (e.g. during surgery) or fixation would be detrimental to target structures such as lipids or certain antigens. The tissue sections after fixation and wax embedding are typically cut into two to five micron thin slices with a microtome before staining and transfer to a glass slide for examination with a light microscope. Typical specimens in pathology are colon, kidney, pancreas, cervix, lung, breast, prostate, or connective tissue.

Although various staining procedures for human, animal, and plant tissues were developed as early as the 17th century, the German physician Rudolf Virchow is considered the father of modern histopathology. Virchow realized the potential of the emerging microscope techniques of the 19th century for his groundbreaking research, published a vast amount of scientific writing, and created an impressive collection of thousands of histopathological sample slides, thus building the foundation of modern histology and cancer research.

Histology slide preparation begins with fixation of the tissue specimen. This is a crucial step in tissue preparation, and its purpose is to prevent tissue autolysis and putrefaction. For best results, the biological tissue samples should be transferred into fixative immediately after collection, usually in 10% neutral buffered formalin for 24 to 48 hours. After fixation, specimens are trimmed using a scalpel to enable them to fit into an appropriately labelled tissue cassette that is stored in formalin until processing begins.

The first step of processing is dehydration, which involves immersing the specimen in increasing concentrations of alcohol to remove the water and formalin from the tissue. Clearing is the next step, in which an organic solvent such as xylene is used to remove the alcohol and allow infiltration with paraffin wax. Embedding is the final step, where specimens are infiltrated with the embedding agent, usually paraffin wax, which provides a support matrix that allows for very thin sectioning. A microtome is used to slice extremely thin tissue sections off the block in the form of a ribbon, followed by histochemical staining (typically haematoxylin and eosin, “HE stain”) to provide contrast, making tissue structures more visible and easier to evaluate. In certain cases, immunohistochemical (IHC) stains, such as HER2 or Ki-67, are required for further analysis.

Digital Image Analyses on Whole-Lung Slides in Mouse Models of Acute Pneumonia

Descriptive histopathology of mouse models of pneumonia is essential in assessing the outcome of infections, molecular manipulations, or therapies in the context of whole lungs. Quantitative comparisons between experimental groups, however, have been limited to laborious stereology or ill-defined scoring systems that depend on the subjectivity of a more or less experienced observer. Here, we introduce self-learning digital image analyses that allow us to transform optical information from whole mouse lung sections into statistically testable data. A pattern-recognition–based software and a nuclear count algorithm were adopted to quantify user-defined pathologies from whole slide scans of lungs infected with Streptococcus pneumoniae or influenza A virus compared with PBS-challenged lungs. The readout parameters “relative area affected” and “nuclear counts per area” are proposed as relevant criteria for the quantification of lesions from hematoxylin and eosin–stained sections, also allowing for the generation of a heat map of, for example, immune cell infiltrates with anatomical assignments across entire lung sections. Moreover, when combined with immunohistochemical labeling of marker proteins, both approaches are useful for the identification and counting of, for example, immune cell populations, as validated here by direct comparisons with flow cytometry data. The solutions can easily and flexibly be adjusted to specificities of different models or pathogens. Automated digital analyses of whole mouse lung sections may set a new standard for the user-defined, high-throughput comparative quantification of histological and immunohistochemical images. Still, our algorithms established here are only a start, and need to be tested in additional studies and other applications in the future.

For many decades, classical morphologic histopathology has served as an essential readout tool for the evaluation of tissue lesions and immune cell infiltrations in animal lung infection experiments (e.g., in the assessment of treatment effects or in the discovery of molecular signaling pathways). For example, several defined histological lesion patterns are considered main diagnostic features in the assessment of experimental acute lung injury in mice (1). To further capitalize on this primarily descriptive information, scoring systems have been established to allow for a first semiquantitative assessment of experimentally induced pathologies and a rough comparison to controls (1, 2). In the lung, such scoring systems usually include several parameters reflecting characteristic tissue lesions and cell infiltrations that are semiquantified by subjective estimations and grading of lesion distribution and severity (3). To add a more quantitative approach, single scoring parameters may be quantified by manual counting or measuring of selected events or representative areas. However, these approaches are time consuming, imprecise, and inherently subjective, and therefore problematic for the analysis of large datasets or the reliable detection of small differences between groups (4, 5). Clearly, a more accurate, reproducible, and more sensitive yet practical approach for the quantification of lesional areas or events amenable to high-throughput studies would be desirable for the collection, analysis, interpretation, and communication of histopathological results, particularly for direct comparisons among experimental groups or with biochemical or molecular data (6). Today, stereology is regarded as the gold standard for the absolute estimation of numerical endpoints, such as cell numbers, surfaces, or volumes. However, several limitations apply to stereology, including high cost and time requirements, as well as its general unsuitability for archival material in retrospective studies (5).

Recent developments in whole-slide digital imaging offer promising potential for the optimization and more computerized examination of histopathological specimens, including the retrospective examination of nonrandomly embedded samples (7, 8). This whole-slide scanning (WSS) technology permits an entire histological slide to be optically scanned into a binary image data file, which can be visualized, evaluated, and processed on a computer screen without a microscope (8, 9). In addition, rapidly developing two-dimensional (2D) morphometric digital image analysis (DIA) tools accelerate and simplify the accurate conversion of descriptive or semiquantitative histopathology data into numerical data amenable to statistical tests (7, 10).

Computer-assisted pattern-recognition image analysis software offers a new dimension in the automated and self-learning identification and quantification of regions of interest within digitized histological images (11). Specifically in pneumonia, inhomogeneously distributed areas of inflammation characterized by various parameters, such as infiltration of immune cells, edema, structural changes, such as collapse or emphysema, or areas of necrosis could be considered potential regions of interest. The self-learning pattern-recognition image analysis software, GENIE, was designed for high-throughput analyses, enabling the investigation of large batches of digitally scanned histoslides in a practical time frame (7, 10). A second recent development in the computerized analysis of scanned histoslides, termed “whole-cell quantification,” allows for the identification and enumeration of a wide spectrum of histological patterns or events, such as hematoxylin and eosin (H&E)–stained cells or nuclei or immunohistochemical signals (9). Instead of yielding absolute estimations as obtained by stereology (12, 13), however, such quantifications of patterns and events on 2D tissue sections are particularly suitable for the generation of ratios or densities, such as cell numbers per defined anatomical structure or square millimeter.

To date, such developments in automated DIA, specifically the GENIE and whole cell quantification software systems, have only sporadically been used for the examination of mouse lungs, such as in oncology (14), but not in experimental pneumonia. Here, we introduce the pattern-recognition–based GENIE algorithm and the v9 nuclear count (v9 nc) algorithm for the relative morphometrical quantification of two-dimensional optical information obtained by digital scanning of whole-organ H&E-stained sections from PBS-challenged lungs or lungs infected with Streptococcus pneumoniae or influenza A virus (IAV). In our comparisons with data obtained by flow cytometry (FC), the nuclear count algorithm also proved operational for the relative quantification of immunohistochemically labeled subtypes of immune cells, expanding its applicability to the level of in situ quantification of molecular markers of disease.

Details on mice, infection procedures, processing, histology, immunohistochemistry (IHC), FC, and digital image analyses are in the Methods in the data supplement.

All lung tissues were derived from experiments primarily conducted for purposes other than this study and published elsewhere (15–18). All animal procedures were approved by institutional ethics committees of Charité-Universitätsmedizin Berlin, Justus-Liebig University, Gießen, University Hospital of Jena, and local governmental authorities (LaGeSo Berlin, RP-Gießen, TLV-Thüringen approval ID: A-0050/15, G 0139/14, 02-067/11, 02-043/15 [S. pneumoniae], G 0358/11 [PBS], G 0044/11 [IAV]). Animal studies were conducted in accordance with the Federation of European Laboratory Animal Science Associations guidelines and recommendations for the care and use of laboratory animals, and all efforts were made to minimize animal discomfort and suffering.

Mice infected with 5 × 10⁶ cfu S. pneumoniae PN36 (serotype 3), 5 × 10⁷ cfu S. pneumoniae D39 (serotype 2), or 100 pfu PR8 influenza virus, as well as PBS-challenged mice, were humanely killed at 24 or 48 hours (S. pneumoniae, PBS) or 7 days (IAV), as described previously (15–19). Lungs were carefully removed after tracheal ligation to prevent alveolar collapse, immersion fixed in formalin pH 7.0 for 24–48 hours, embedded in paraffin, and cut into 2-μm sections.

Whole-lung sections (n = 4–8 mice per group) were stained with H&E or processed for IHC. For the detection of CD68 (macrophages/monocytes), neutrophil elastase (neutrophils), CD3 (T lymphocytes) or CD45R (B lymphocytes), polyclonal rabbit antibodies, or a monoclonal rat antibody were used, respectively. IHC slides were counterstained with hemalaun. All slides were dehydrated through graded ethanols, cleared in xylene, and coverslipped.

Lung leukocytes were isolated and stained with anti-CD11c (N418 ATCC), anti-CD11b (M1/70 eBioscience), anti-F4/80 (BM8 eBioscience), anti-Ly6G (1A8 BD), anti-CD3 (17A2 eBioscience), anti-CD4 (RM4–5 BD), anti-B220 (RA3–6B2 eBioscience), and anti-CD19 (1D3 BD) antibodies. All stained cells were acquired using a BD FACS Canto II. Cells were analyzed with BD FACSDiva software.

Stained slides were automatically digitized (n = 4–8 per group) using the Aperio CS2 scanner (Leica Biosystems Imaging Inc.) at 400× magnification (0.25 μm/pixel resolution). For pattern recognition of affected versus unaffected lung tissue, the Aperio GENIE histology pattern recognition software (Leica Biosystems Imaging Inc.) was used. Representative areas of each tissue class of interest (Table 1) were identified by a trained experimental veterinary pathologist based upon histological features, and compiled in a digital montage at 100× magnification to establish a GENIE classifier. This classifier was trained on various lung slides of one batch to distinguish affected from unaffected areas, background, and glass, allowing for model-specific characterization and adaption (Table 1). The Aperio v9 nc algorithm was employed for the quantification of total numbers of nuclei or cells stained with hemalaun or by IHC with specific settings for each staining and antigen used (see Table E1 in the data supplement).
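The stated scan resolution fixes the conversion between pixel counts and physical area, which the area and density readouts depend on. A minimal sketch of that arithmetic, assuming square pixels; the function name `pixels_to_mm2` is illustrative and not part of the Aperio software:

```python
def pixels_to_mm2(n_pixels: int, um_per_pixel: float = 0.25) -> float:
    """Convert a pixel count to an area in mm^2, assuming square pixels."""
    um2_per_pixel = um_per_pixel ** 2      # 0.25 um/pixel -> 0.0625 um^2 per pixel
    return n_pixels * um2_per_pixel / 1e6  # 1 mm^2 = 1,000,000 um^2


# At 0.25 um/pixel, 16 million pixels cover exactly 1 mm^2.
print(pixels_to_mm2(16_000_000))  # 1.0
```

By the same arithmetic, a lung section of 80 mm² corresponds to roughly 1.28 billion pixels at full resolution, which is why whole-slide analyses are typically run in batches rather than interactively.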

Table 1. Definition of Tissue Classes in the Models Analyzed

Data are expressed as mean (±SEM). Statistical analyses were performed using one-way ANOVA with Sidak’s multiple comparison test and two-way ANOVA with Tukey’s multiple comparison test. P values less than 0.05 were considered significant (GraphPad Prism 7; GraphPad Software Inc.).
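For readers who want to see what these summary statistics involve, the following is a minimal standard-library sketch of the mean (±SEM) and of the one-way ANOVA F statistic. It is not the GraphPad Prism procedure used in the study, it does not compute P values or the Sidak and Tukey post-tests, and the input numbers are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    sem = statistics.stdev(values) / math.sqrt(len(values))  # sample SD / sqrt(n)
    return statistics.mean(values), sem


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA across the given groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical affected-area percentages for three groups (not study data).
pbs, spn, iav = [10.0, 12.0, 14.0], [40.0, 45.0, 50.0], [26.0, 29.0, 32.0]
print(mean_sem(pbs))
print(one_way_anova_f([pbs, spn, iav]))
```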

One of the most commonly used quantitative readout parameters in lung histology is the estimation of affected tissue area (16, 17). However, both subjective estimations and simple measurements using maximal sizes in two dimensions are prone to yield imprecise data with limited reproducibility when performed by a human observer. Here, horizontal whole-lung slides of consecutive planes of S. pneumoniae–infected lungs were digitized by a pathology histoslide scanner to create an image data file that can be used for further DIA (Figure 1A; e.g., using the GENIE tool). The establishment of such a new GENIE algorithm starts with the definition of different classes of lesions, including affected lung tissue (Figure 1B, left panel, labeled in red), unaffected lung tissue (Figure 1B, central panel, labeled in green), background (Figure 1B, right panel, labeled in gray), and glass only (Figure 1B, right panel, labeled in beige). The software offers different modalities of annotations, such as rectangular or circular shapes or a freehand tool. To this end, a montage was created (Figure 1C) that included the four differentially characterized tissue classes (Table 1) that subsequently were to be automatically identified by the software (Figure 1C), followed by precise quantification. The GENIE algorithm predominantly extracts multispectral information from images with additional image processing performed by spatial, logical, and threshold operators. The specific GENIE algorithm was refined during evolutionary computational learning by an iterative learning process, using a minimum of 500 iterations, to identify the unique spatial-spectral features for the discrimination of each class of target tissue. A final mean training accuracy of 95% or greater was predefined as sufficient. The final algorithm, termed “classifier” in the GENIE software, was applied to whole-lung slide scans (Figure 1D) after manual annotation (green line) to exclude nonlung tissues, such as adipose and lymphoid tissues. All four classes specified were accurately identified throughout the entire lung sections (Figures 1D and 1E) by the specific GENIE classifier, as expected by the board-certified pathologists (Figure 1F). A separate GENIE classifier was generated for each infection model to capture its unique, pathogen-specific features.
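Once a classifier has assigned every pixel inside the annotation to one of the four classes, the "relative area affected" readout reduces to a ratio of pixel counts. The sketch below assumes that background and glass are excluded from the denominator; the text does not state this explicitly, and the dictionary keys and counts are hypothetical rather than GENIE output fields.

```python
def relative_area_affected(class_pixels: dict) -> float:
    """Percentage of lung tissue classified as affected.

    Assumption: background and glass are excluded from the denominator.
    """
    affected = class_pixels["affected"]
    lung = affected + class_pixels["unaffected"]
    if lung == 0:
        raise ValueError("no lung tissue pixels in annotation")
    return 100.0 * affected / lung


# Hypothetical per-class pixel counts for one annotated section.
counts = {"affected": 300, "unaffected": 700, "background": 500, "glass": 1000}
print(relative_area_affected(counts))  # 30.0
```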

Figure 1. Digital image analysis (DIA): GENIE and v9 nuclear count (v9 nc). (A–F) DIA of Streptococcus pneumoniae–infected lungs using the GENIE algorithm. (A) Lungs were visualized by whole-slide scanning (WSS) technology. (B) The determination of different classes of lung tissue by creating a training set resulted in a montage (C), which allows for differentiation between affected (B, left panel, red) and unaffected (B, central panel, green) lung areas, background (B, right panel, gray), and glass (B, right panel, beige). (D) After manual annotation of lungs (green line) and based on the previously generated training set and montage, the GENIE classifier for the discrimination of affected and unaffected lung areas was generated. (E) All of the defined classes, such as areas of inflammation (e.g., characterized by infiltration of immune cells into alveolar spaces; arrowhead), were accurately identified by the newly generated GENIE algorithm. (G–I) DIA of S. pneumoniae–infected lungs using the v9 nc algorithm. (G) After manual annotation of lungs (green line), the adapted and modified v9 algorithm for the quantification of nuclei was applied on whole-lung scans. (H and I) Total numbers of hemalaun-stained nuclei were accurately (insets, arrowhead) and reliably quantified. Scale bar (A): 1 mm, also applies to D and G; scale bar (E): 100 μm, also applies to F; scale bars (B and H): 50 μm, also applies to I; and scale bar (insets in H): 10 μm, also applies to inset in I.

The determination of total or relative numbers of cells or specific subsets of cells such as neutrophils by manual counting or FC analyses is a typical parameter in the outcome assessment of experimental pneumonia (16, 17). To further improve the readout options for histological slides from lung infection models, the v9 nc algorithm for the quantification of nuclei or cells was adapted here to meet the specific conditions (Table E1) in H&E-stained WSS of S. pneumoniae–infected lungs (Figure 1G). After manual annotation of lung tissues (Figure 1G, green line), H&E-stained nuclei (Figure 1H, inset) were quantified by the modified algorithm (Figure 1I, inset).

The precise digital quantification of affected lung areas, as well as the counting of cells or events per area in histological sections, could represent a powerful, objective, reliable, and highly reproducible yet practical tool for the relative comparison of experimental groups (e.g., treated versus untreated groups) or for the assessments of effects of genetic modifications of pathogens or hosts. WSS of lungs from PBS-challenged control mice and mice infected with S. pneumoniae or IAV (Figure 2A) were compared using specifically modified GENIE classifiers for the determination of affected lung areas and specifically adapted v9 nc algorithms for the enumeration of total nuclei per area within whole-lung sections (Figure 2A). When ratios or densities, such as cells per area, are quantified on 2D tissue sections containing inhomogeneously distributed lesions, it is imperative to equalize preparation, embedding, and sectioning procedures to minimize technical artifacts. In addition, unavoidable tissue shrinkage during paraffin embedding may be a source of error in group comparisons when shrinkage artifacts differ between experimental groups (5). To this end, total lung areas of equally processed WSS were measured before digital analyses and compared between experimental groups (Figure 2B), assuming equal initial sizes of lungs from mice of the same strain, age, and sex. No differences were recorded between whole-lung sectional areas of mice infected with S. pneumoniae (80.13 ± 21.89 mm²), IAV (75.82 ± 16.20 mm²), and challenged with PBS (77.96 ± 14.40 mm²; Figure 2B). In contrast, the areas of affected lung parenchyma were significantly increased in infected mice compared with PBS-challenged controls (PBS, 12.80 ± 5.53%; S. pneumoniae, 44.53 ± 10.58%; IAV, 28.97 ± 4.90%; Figure 2C). Consistently, numbers of total nuclei per entire lung section were also significantly increased after S. pneumoniae (750,404 ± 181,110 nuclei) or IAV (541,023 ± 157,223 nuclei) infection compared with PBS-challenged lungs (400,395 ± 51,519 nuclei; Figure 2D). Similar differences were recorded for the overall density of nuclei per square millimeter lung area (PBS, 5,282 ± 878 nuclei/mm²; S. pneumoniae, 10,342 ± 1,163 nuclei/mm²; and IAV, 7,733 ± 575 nuclei/mm² lung section; Figure 2E). Of note, lungs infected with S. pneumoniae were significantly more severely affected when compared with IAV-infected lungs in terms of both higher total cell counts and larger areas affected (Figures 2A, 2C, and 2E).
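The density readout is a per-section ratio of nuclei to area. Averaging those per-animal ratios is not the same as dividing the group-mean nuclei count by the group-mean area, which is presumably why the reported means do not divide out exactly (for example, 750,404 / 80.13 is about 9,365, not 10,342). The sketch below illustrates the difference with hypothetical values; the function name is illustrative only.

```python
from statistics import mean


def nuclei_density(total_nuclei: int, area_mm2: float) -> float:
    """Nuclei per mm^2 for a single lung section."""
    return total_nuclei / area_mm2


# Hypothetical (total nuclei, section area in mm^2) pairs, one per animal.
sections = [(400_000, 80.0), (300_000, 75.0)]
densities = [nuclei_density(n, a) for n, a in sections]

mean_of_ratios = mean(densities)
ratio_of_means = mean([n for n, _ in sections]) / mean([a for _, a in sections])

print(densities)                 # [5000.0, 4000.0]
print(mean_of_ratios)            # 4500.0
print(round(ratio_of_means, 1))  # 4516.1
```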

Figure 2. Quantitative comparisons between lung infection models and reproducibility of digital image analyses. (A) WSS of lungs of PBS-challenged and infected mice (Streptococcus pneumoniae [Spn] and influenza A virus [IAV]) were compared in terms of affected lung area and total nuclei count using the GENIE and v9 nuclear count (v9 nc) algorithms. (B) Total cross-sectional areas of all lungs were measured before digital image analyses. (C) The percentages of affected lung areas as well as (D) total nuclei per entire lung area were determined and (E) normalized to the cross-sectional area to assess total nuclei per area (mm²). (F and G) Identical analyses of lungs infected with two different serotypes (ST) of pneumococci (ST2, D39; ST3, PN36) or PBS-challenged controls were performed at two different time points, 24 h and 48 h post-infection. Values are given as mean (±SEM; n = 4–8 each group). ##P < 0.01, ###P < 0.001, ####P < 0.0001 versus PBS-controls. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001 as indicated using one-way ANOVA/Sidak’s multiple comparison test (B and E) and two-way ANOVA/Tukey’s multiple comparison test (F and G). Scale bar (WSS): 1 mm. (H) The entire procedure of quantification including the scanning of histoslides was repeated three times using four differently affected whole-lung slides from pneumococcal pneumonia and the v9 nc and GENIE algorithms. Values are given as mean (±SEM; n = 4 samples with three repetitions per sample).

Digital image analyses of lungs infected with two different serotypes of pneumococci, serotypes 2 and 3, or PBS-challenged controls were performed at two different time points, 24 and 48 hours postinfection, to quantify and compare different severities of pneumonia in a strain- and time-dependent fashion. Severity parameters measured were the percentages of affected lung areas as well as absolute and relative numbers of cells in the lungs. The data indeed revealed a time- and strain-dependent increase in the percentages of affected areas (Figure 2F) as well as in the number of cells (Figure 2G). When compared with PBS controls, serotype 3 strain PN36 induced significantly larger areas of pneumonia in a timepoint-dependent manner (24 h: PBS, 9.80 ± 3.69%; D39, 18.34 ± 10.38%; PN36, 23.97 ± 8.21%; 48 h: PBS, 10.36 ± 6.74%; D39, 34.23 ± 3.56%; PN36, 46.52 ± 9.52%; Figure 2F). A very similar effect was observed for the number of cells (24 h: PBS, 4,560 ± 378 nuclei/mm²; D39, 5,779 ± 1,167 nuclei/mm²; PN36, 7,584 ± 972 nuclei/mm² lung section; 48 h: PBS, 5,147 ± 1,205 nuclei/mm²; D39, 7,574 ± 848 nuclei/mm²; PN36, 10,186 ± 995 nuclei/mm² lung section; Figure 2G). Moreover, consistent with its known lower capability of inducing pneumonia (20), serotype 2 strain D39 induced lower increases in both parameters than the PN36 strain at 24 and 48 hours, again in a time-dependent fashion (Figures 2F and 2G).

To scrutinize the precision in terms of reproducibility of both applications (GENIE and v9 nc), the entire procedures were repeated three times using four differently affected pneumococcal pneumonia samples as templates. To this end, the same four lung slides were rescanned, manually reannotated, and reanalyzed by the v9 nc and the GENIE classifier three times (Figure 2H). Accordance between the final data obtained along the entire procedure reached means of 99.70 (±0.19)% for the v9 nc algorithm and 99.03 (±0.63)% for the GENIE classifier, suggesting high reproducibility of the results obtained by both applications (Figure 2H).

We hypothesized that immunohistochemical signals may serve as countable events in the v9 nc algorithm, similar to hemalaun-stained cellular nuclei. For the counting of various subsets of immune cells, IHC was performed on S. pneumoniae–infected lungs using antibodies against CD68 (macrophages/monocytes, Figure 3A), neutrophil elastase (neutrophils, Figure 3B), CD3 (T cells, Figure 3C), or CD45R (B cells, Figure 3D). The v9 nc algorithm was adapted to each staining protocol (Table E1) and allowed for the reliable identification and counting of immunohistochemically labeled cells (Figures 3E–3H). Output data included the total and relative numbers of positive cells per square millimeter of lung area as well as different staining intensities, here defined as unstained (0+), weak (1+), moderate (2+), or strong (3+) signals, the average nuclear red, green, and blue intensities, the average nuclear sizes in pixels and square millimeters, and the area of analysis in pixels or square millimeters (Figures 3E–3H). Comparisons of relative immune cell populations in S. pneumoniae– versus IAV-infected lungs revealed that all cell types analyzed were significantly, but mostly oppositely, recruited in the two infection models (Figure 3I). Although neutrophils represented the dominating leukocytes in S. pneumoniae infection (S. pneumoniae, 3,627 ± 228 labeled cells/mm²; IAV, 885 ± 126 labeled cells/mm²), T cells (S. pneumoniae, 1,164 ± 282 labeled cells/mm²; IAV, 2,583 ± 235 labeled cells/mm²) and macrophages (S. pneumoniae, 905 ± 93 labeled cells/mm²; IAV, 1,770 ± 273 labeled cells/mm²) were the major cell types in IAV-infected lungs. B cells (S. pneumoniae, 220 ± 88 labeled cells/mm²; IAV, 370 ± 66 labeled cells/mm²) were less recruited to S. pneumoniae– and IAV-infected lungs when compared with PBS controls and represented the lowest numbers of immune cells analyzed in both infection models (Figure 3I). T cells dominated in PBS controls (1,494 ± 179 labeled cells/mm²), with similar numbers in S. pneumoniae–infected lungs, but significantly higher numbers after IAV infection (Figure 3I). Macrophages were slightly lower in number than T cells in PBS-challenged lungs (880 ± 79 labeled cells/mm²), but also significantly increased in IAV-infected lungs. B cells (725 ± 90 labeled cells/mm²), the third most common population in PBS-challenged lungs, were significantly decreased after infection with both pathogens, whereas the number of neutrophils (512 ± 233 labeled cells/mm²) was significantly lower in PBS lungs when compared with bacterial infection, as expected (Figure 3I).
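The 0+ to 3+ output described above amounts to binning each detected cell by stain intensity and counting the positive bins per unit area. The sketch below shows one way such binning could work. The cutoffs, the intensity scale (0 to 255, with lower values meaning darker staining), and the function names are all assumptions for illustration; the study's actual settings are listed in its Table E1 and are not reproduced here.

```python
from collections import Counter

# Hypothetical cutoffs on a 0-255 scale where lower values mean darker
# (stronger) staining; the study's actual settings are in its Table E1.
STRONG_MAX, MODERATE_MAX, WEAK_MAX = 100, 175, 220


def intensity_class(mean_intensity: float) -> str:
    """Map a cell's mean stain intensity to a 0+ to 3+ class."""
    if mean_intensity <= STRONG_MAX:
        return "3+"
    if mean_intensity <= MODERATE_MAX:
        return "2+"
    if mean_intensity <= WEAK_MAX:
        return "1+"
    return "0+"


def labeled_cells_per_mm2(intensities, area_mm2: float) -> float:
    """Density of positive (1+ to 3+) cells in the analyzed area."""
    classes = Counter(intensity_class(i) for i in intensities)
    positive = classes["1+"] + classes["2+"] + classes["3+"]
    return positive / area_mm2


# Hypothetical per-cell intensities within a 0.002 mm^2 region.
cells = [90, 150, 200, 240, 250]
print([intensity_class(i) for i in cells])  # ['3+', '2+', '1+', '0+', '0+']
print(labeled_cells_per_mm2(cells, 0.002))
```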

Figure 3. Immunophenotyping and quantification of cells and validation by comparison with flow cytometry (FC) data. (A–D) Immunohistochemical staining of Streptococcus pneumoniae–infected lung sections for detection of CD68+ (A, macrophages/monocytes), neutrophil elastase–positive (B, neutrophils), CD3+ (C, T cells), and CD45R+ (D, B cells) cells. (E–H) DIA of immunohistochemically stained lungs using the adapted v9 nc algorithm for each antigen (Table E1). Output data included several types of information, such as number and intensities of stained cells (insets). (I) Quantification results of labeled cells per square millimeter lung area in S. pneumoniae– or IAV-infected lungs and PBS-challenged controls. (J and K) Comparison of relative immune cell populations determined by DIA and FC analyses in PBS-challenged (J) and S. pneumoniae–infected lungs (K). Values are given as mean (±SEM; n = 4–5 each group). ####P < 0.0001 versus PBS-controls. ****P < 0.0001 as indicated using one-way ANOVA/Sidak’s multiple comparison test. Scale bar (H): 50 μm, also applies to A–G. Scale bar inset (H): 20 μm, also applies to insets in E–G.

For the validation of digitally obtained data from 2D whole-lung sections, relative immune cell populations extracted from entire lungs after PBS challenge or S. pneumoniae infection were determined by FC (Figure E1) and calculated similarly for direct comparison with the ratios obtained by DIA. In PBS-challenged lungs, relative numbers of macrophages (DIA, 24.05 ± 1.83%; FC, 27.58 ± 3.09%), neutrophils (DIA, 13.40 ± 4.83%; FC, 10.50 ± 1.72%), T cells (DIA, 40.78 ± 4.13%; FC, 31.72 ± 2.29%), and B cells (DIA, 21.77 ± 1.89%; FC, 30.20 ± 2.11%) were similar between the two methods (Figure 3J), which was also true for the numbers of macrophages (DIA, 14.93 ± 1.17%; FC, 14.57 ± 1.47%), neutrophils (DIA, 60.27 ± 7.12%; FC, 72.75 ± 1.50%), T cells (DIA, 19.02 ± 3.82%; FC, 8.83 ± 0.86%), and B cells (DIA, 5.79 ± 3.77%; FC, 3.85 ± 0.50%) in S. pneumoniae–infected lungs (Figure 3K). In fact, high concordance was observed for most data obtained by these two approaches (Figures 3J and 3K), suggesting that the results from 2D DIA may be valid and resemble the situation in whole lungs.
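The relative populations compared here can be approximated from the labeled-cell densities by expressing each cell type as a share of the four populations counted. The sketch below uses the group-mean densities reported earlier for PBS-challenged lungs; the function name is illustrative, and computing from group means rather than per animal is a simplification.

```python
def relative_populations(densities: dict) -> dict:
    """Express each cell type as a percentage of all labeled cells counted."""
    total = sum(densities.values())
    return {cell: 100.0 * d / total for cell, d in densities.items()}


# Group-mean densities (labeled cells/mm^2) reported for PBS-challenged lungs.
pbs = {"macrophages": 880, "neutrophils": 512, "T cells": 1494, "B cells": 725}

for cell, pct in relative_populations(pbs).items():
    print(f"{cell}: {pct:.2f}%")
# macrophages: 24.37%
# neutrophils: 14.18%
# T cells: 41.37%
# B cells: 20.08%
```

These values land close to, but not exactly on, the reported DIA percentages (24.05%, 13.40%, 40.78%, and 21.77%), presumably because the study calculated ratios per animal before averaging.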

The two different approaches of automated digital pattern recognition established here allow for the quantification of two universally relevant aspects of the histopathology of pneumonia on 2D whole-lung slides, including the relative and absolute area of inflamed lung tissue and the number and density of cells. Moreover, the v9 nc algorithm was successfully used to quantify immunohistochemically labeled subsets of immune cells. Clearly, the parameters quantified here are only a start, and many other applications and target parameters are conceivable. The pattern-recognition software, GENIE, and the v9 nc algorithm thus provide self-learning tools to obtain high-throughput, automated quantification of user-defined, complex histological lesions from standard glass slides that are commonly used for descriptive histopathology. With this development, computerized image analysis on scanned whole-lung sections represents both technical and logistical advances in the precise acquisition of numerical data and statistical evaluation of morphological changes in mouse pneumonia. Previous studies on lung cancer and brain injury have shown that quantifications obtained by these technologies are more reliable, reproducible, and time efficient (6, 11, 21, 22) when compared with manual quantification of selected parameters. Thus, in addition to standard descriptive histopathology, semiquantitative scoring systems, and stereology, DIA on whole-organ sections is now available for lung research, and can be expected to widely broaden the readout options in mouse pneumonia studies, further strengthening the value of lung histology (Figure 4).

Figure 4. Analytic options for whole-lung section histopathology. 2D = two-dimensional; 3D = three-dimensional.

Here, we applied DIA to bacterial and viral lung infection models and compared the data to PBS-challenged controls. Separate and model-specific GENIE algorithms were generated for the objective determination and comparative quantification of affected lung areas, which, in the past, could only be roughly estimated on 2D sections (16, 17). Due to the extensive discrepancies between the histopathologies of different infection models and pathogens (3), however, specific GENIE algorithms need to be established for each model or pathogen to capture the unique model- or pathogen-specific features to be quantified. The model-specific algorithms generated here precisely identified the annotated tissue classes, which allowed us to distinguish between and quantify affected versus unaffected lung areas, background, and glass. Similarly, the number of nuclei or cells was quantified by v9 nc algorithms, enabling the precise quantification of both hemalaun-stained nuclei and immunohistochemical marker signals within nuclei or cells.

The high precision of both systems, GENIE and v9 nc algorithms, as assessed here via reproducibility of results over the entire procedures, including the scanning of slides and manual exclusions of adjacent adipose and lymphoid tissues indicate high data reliability in terms of reflection of the actual morphological information on the slides. Moreover, when we compared percentages of immune cell subtypes between DIA from 2D tissue sections of central lung planes and FC data obtained from whole-lung extracts, overall close similarities imply that relative data, such as cell ratios measured on representative whole-lung sections, may represent a useful and practical compromise when compared with more absolute numerical approaches, such as stereology or FC (12, 13, 23). Obvious advantages of DIA from 2D whole-lung sections include, once the required investments have been made, its relatively low time and cost expenses, high practicability for high-throughput analyses of large sample numbers, and its usefulness for retrospective studies with archival material that is usually nonrandomly embedded, as would be required for stereology (12). However, several critical technical prerequisites and limitations on data use have to be considered. First, very careful and highly standardized preparation of lungs appears critical to exclude artifacts due to compression atelectasis or uneven inflation between animals or experimental groups. To achieve this level of homogeneity, we here used the complete tracheal ligation method immediately after terminal exhalation to fully preserve the original lung volumes. Homogeneity of lung sizes was controlled by comparing the cross-sectional areas of embedded lungs, which failed to yield any evidence of variable degrees of deflation or atelectasis. An even more standardized approach would be the postmortal inflation of lungs to the same inflation pressure of 15–25 cm H2O as recommended by the American Thoracic Society (24). 
Second, the quality of the tissue sections to be scanned after manual cutting from paraffin blocks has a tremendous impact on the applicability of DIA. Tissue sections thicker than approximately 2 μm, sections with fissures or folds, compressed sections, or those that were overstretched on glass slides clearly yielded erroneous results in our hands. Ironically, manual skills become a critical factor for the success and trustworthiness of the data generated by the software. Third, the intensity and homogeneity of tissue staining may also affect data acquisition from the scanned slides, making highly standardized staining procedures imperative. Importantly, batch processing and analyses encompassing all experimental animals and groups in the same maximally standardized workflow are recommended. To this end, appropriate controls should be included for intraexperimental variations and differences in cutting or staining procedures. Furthermore, the resolution of approximately 0.25 μm/pixel achieved by average histoslide scanners using a 40× objective is insufficient for a purely morphological differentiation of specific immune cell subtypes (e.g., neutrophils versus lymphocytes) without specific immunolabeling of cell markers. In that regard, the eye of the trained pathologist still seems superior in the recognition of complex optical patterns of small size, possibly also due to the z plane available in real microscopy, which was not scanned here. In addition, the reliable identification and counting of individual bacterial structures is clearly beyond the resolution and optical sensitivity of the currently available technology, requiring approaches where pathogens are labeled immunohistochemically (16). Clearly, technical shortcomings, such as limited optical sensitivity and artificial effects due to manual variations in tissue processing, represent future challenges for further improvements of this methodology.

On the other hand, data obtained by DIA from 2D whole-lung sections are insufficient when absolute figures are required for the entire lungs. The value of a single plane clearly depends on how representative it is of the entire organ, which is limited when lesions are inhomogeneously distributed, such as in pneumonia induced by most bacteria or viruses (16). We found a single central plane across the diameters of the main bronchi quite useful when all lungs to be compared were embedded in a highly standardized, nonrandom fashion. Obviously, the representativeness for the entire lung and the information on the actual distribution of lesions can be improved by increasing the number of serial planes analyzed, with predefined distances in microns or anatomical assignments. Even if several additional planes are analyzed per lung to increase precision, however, the data obtainable by DIA from whole-lung slides remain subject to the general limitations that apply to 2D morphometry, including directional bias and sampling bias (5).

Although this technology is only just emerging, DIA of whole histopathology lung sections will substantially improve and broaden the readout options and scientific value of experimental pneumonia studies in mice and likely other species.

Accessing Digital Pathology Images for TCGA subjects in TCIA

Digitized TCGA pathology images can be found in the Genomic Data Commons — GDC Legacy Archive. GDC is the official source of TCGA genomic, clinical and molecular data.

Please note that where The Cancer Imaging Archive's collection pages read:

Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes.

this means that patient identifiers are matched across the GDC website (slides, genotype, clinical information) and TCIA (radiological imaging, clinical data). It does NOT mean case/control matching. Only patient data were included.

If you have downloaded radiology data for some TCGA subjects and want to access the corresponding pathology images, follow these steps:

  1. Collect the Patient ID (e.g. TCGA-A6-2672) and download the radiology data from TCIA
  2. Go to the GDC Legacy Archive
  3. Confirm that you see something similar to the following screen:

    Data Type IS Tissue slide image or Diagnostic or both
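The same lookup can also be scripted instead of clicked through. The following is a minimal sketch, assuming the public GDC REST API at api.gdc.cancer.gov and its `files` and `data` endpoints; the field names (`cases.submitter_id`, `data_type`) and the value `Slide Image` are assumptions based on the current GDC data model rather than on the Legacy Archive screen described above, so check them against the GDC documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public GDC API endpoints (not taken from the text above).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def build_slide_filter(patient_ids):
    """Build a GDC filter selecting slide images for the given patient IDs."""
    return {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.submitter_id",
                         "value": list(patient_ids)}},
            # "Slide Image" is assumed to cover tissue and diagnostic slides.
            {"op": "in",
             "content": {"field": "data_type", "value": ["Slide Image"]}},
        ],
    }


def build_query_url(patient_ids, size=100):
    """Return the full query URL for the files endpoint."""
    params = {
        "filters": json.dumps(build_slide_filter(patient_ids)),
        "fields": "file_id,file_name,experimental_strategy",
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def list_slides(patient_ids):
    """Query the API (network access required) and return (file_id, file_name) pairs."""
    with urllib.request.urlopen(build_query_url(patient_ids)) as resp:
        hits = json.load(resp)["data"]["hits"]
    return [(hit["file_id"], hit["file_name"]) for hit in hits]


def download_slide(file_id, dest_path):
    """Download one open-access file by its GDC file ID."""
    urllib.request.urlretrieve(GDC_DATA_ENDPOINT + "/" + file_id, dest_path)
```

A typical use would be to call `list_slides(["TCGA-A6-2672"])` and pass each returned file ID to `download_slide`. Whole slide images are large, so for many files a dedicated bulk-transfer tool is likely the better choice.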


Institutional review board approval was obtained for this study (University of Pittsburgh, Pennsylvania, STUDY18100084, PRO17120392; University of Witwatersrand, Johannesburg, South Africa, clearance certificate M191003).

Tissue Floater Slide Creation

For the purposes of this experiment, 3 glass slides were prepared using freshly discarded adult human tumor tissue (Figure 1). These fabricated tissue floater slides contained (1) a large central portion of tissue obtained from a type 1 (basophilic) papillary renal cell carcinoma, and (2) 2 separate, smaller portions of tissue (“tissue floaters”), placed toward the edge of the slide, obtained from a moderately differentiated colon adenocarcinoma and a urinary bladder high-grade papillary urothelial carcinoma. To avoid having these slides stand out as extraordinary in the study, the pathology cases selected were relatively easy to diagnose and typical of what would be encountered in routine practice. The prepared slides were stained with hematoxylin and eosin (H&E) according to routine staining protocols. All 3 slides were then entirely digitized at ×40 magnification using an Aperio AT2 whole slide scanner (Leica Biosystems). The quality of these digital slides was checked to avoid inclusion of unique identifiers and/or artifacts.

Fabricated slide containing a section of renal cell carcinoma (A) and 2 adjacent separate colon cancer (B) and bladder cancer (C) tissue floaters (hematoxylin and eosin stain, insets shown at ×20 magnification).

Pathology Digital Slide Datasets

The aforementioned WSIs were embedded into 2 datasets of digital slides. The first dataset was established by randomly selecting 300 de-identified WSIs (.svs file format) of H&E-stained surgical pathology cases from the teaching files at the University of Pittsburgh Medical Center, Pittsburgh, PA (“UPMC dataset”). These archival slides were scanned at ×40 magnification using an Aperio ScanScope XT instrument (Leica Biosystems). These WSIs included cases from a wide variety of anatomic sites (eg, colon, brain, thyroid, prostate, breast, kidney, salivary gland, skin, soft tissue, etc) exhibiting varied diagnostic pathologic entities (ie, reactive, inflammatory, benign neoplasms, and malignancies). The other dataset employed in this study was obtained from the publicly available digital pathology slide archive offered by The Cancer Genome Atlas (TCGA) program. A total of 2025 WSIs were randomly selected and downloaded from the TCGA (“TCGA dataset”). An average WSI was approximately 45 000 × 45 000 pixels. Digital slides of low quality (eg, very poor staining, low resolution, large regions out of focus) were eliminated: magnifications less than ×20 and blurry patches understandably prevent the image search from performing well, and poor staining negatively affects the extraction of deep features. The TCGA dataset incorporated at least 33 different diagnostic entities from 25 anatomic locations. TCGA slides of frozen sections and those slides with manual annotations (pen markings) present were included. All digital slides were labeled with both the type of malignancy (primary diagnosis) and the affected organ (primary site). This label was assigned to the entire WSI and no individual region was delineated. Table 1 shows the top 20 primary sites with the highest number of WSIs in the combined dataset (ie, UPMC + TCGA datasets).

Top 20 Primary Sites With the Highest Number of Whole Slide Imaging (WSI) in the Dataset

Computing Platform

All experiments were performed on a Dell EdgeServer Ra with 2x Intel(R) Xeon(R) Gold 5118 (12 cores, 2.30 GHz), 4x Tesla V100 (v-RAM 32 GB each; only 2 graphics processing units [GPUs] were used), and 394-GB random access memory. The code for indexing was written in C/C++. The user interface components were written in multiple languages, but mostly in Python and JavaScript. The high-end GPU power was necessary for indexing the large archives, which was a 1-time task for existing repositories. For daily usage of the image search, ordinary (low-cost) central processing unit/GPU power will suffice, as the barcodes (see below) enable efficient search in large archives.

Image Search Tool

Through an ensemble approach (using a cohort of different algorithms) a reliable search engine prototype was developed that exploited the strengths of both supervised (trained deep networks) and unsupervised (clustering and search) computational methods for image processing (23). This image search tool thus included segmentation and clustering algorithms, deep networks, and distance metrics for search and retrieval. Whereas deep networks are supervised methods and require extensive training with labeled data, the search itself is unsupervised with no prior training. Using a pretrained deep network without fine-tuning does not constitute direct supervision because it is used for feature extraction without any adjustment. This allows our approach to be independent of manually delineated WSIs. We used DenseNet-121, which is publicly available. Image segmentation was used to distinguish tissue from white background. WSIs were broken into patches or tiles at fixed sizes (eg, 500 × 500 μm² at ×20) with no overlap. The patches were grouped into categories via clustering methods (eg, k-means algorithm) and passed through pretrained artificial neural networks for feature mining. Each feature vector was converted into a linear barcode (Figure 2) and this “bunch of barcodes” indexing process was used to accelerate the search retrieval process (24). The barcode generation consisted only of binarizing the gradient change of the deep features. Multiple similarity measures were examined to further increase the matching rate when comparing images. Retrieved image patches were ranked from most to least likely to be similar to the queried image (Figure 3) and these results were displayed in a gallery format for the end user to review and interpret. Rank (defined as the rating of a suitable match) was determined by the similarity of the suspected patch with all other patches in the archive, measured through distance calculation between their corresponding barcodes. The more similar a patch is to the query, the smaller the difference between their barcodes and the better its rank. All matched patches were sorted based on this difference (least different is ranked first and so on).
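The barcode idea can be illustrated with a minimal sketch. It assumes that "binarization of gradient change" means recording whether each feature value rises relative to its predecessor, and that the difference between barcodes is measured as a Hamming distance; the text does not specify either detail, and the function names and toy feature vectors are hypothetical.

```python
def to_barcode(features):
    """Binarize the change between consecutive feature values.

    A bit is 1 where the next value is larger than the current one, else 0.
    (Assumed reading of "binarization of gradient change".)
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(features, features[1:])]


def hamming_distance(barcode_a, barcode_b):
    """Count the positions at which two equal-length barcodes differ."""
    if len(barcode_a) != len(barcode_b):
        raise ValueError("barcodes must have the same length")
    return sum(a != b for a, b in zip(barcode_a, barcode_b))


def rank_patches(query_features, archive):
    """Return archive patch IDs sorted from least to most different."""
    query_code = to_barcode(query_features)
    scored = [
        (hamming_distance(query_code, to_barcode(features)), patch_id)
        for patch_id, features in archive.items()
    ]
    return [patch_id for _, patch_id in sorted(scored)]


# Toy example with 4-dimensional "deep features" (real vectors are much longer).
archive = {
    "patch_a": [0.2, 0.6, 0.1, 0.8],
    "patch_b": [0.9, 0.5, 0.7, 0.2],
    "patch_c": [0.1, 0.2, 0.3, 0.4],
}
ranking = rank_patches([0.1, 0.5, 0.3, 0.9], archive)
```

Because barcodes are short binary strings, comparing them is far cheaper than comparing the underlying floating-point feature vectors, which is presumably why the indexing step needs GPUs while the search itself does not.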

Schematic illustration of the general idea of using barcodes for image representation: whole slide image indexed by converting separate patches into barcodes.

Schematic diagram showing how the origin of a suspected floater gets detected. The process starts with locating the suspicious tissue fragment. A selected patch from the fragment is then fed into a pretrained deep network to extract features. The search engine then receives a generated barcode to search within the “Yottixel Index” that contains barcodes of many patches of many whole slide images (WSIs). Finally, the origin of the floater is recognized by investigating the top ranked patches.

Any tissue fragment can potentially be selected by the end user (pathologist) for searching the archive. As such, a search is manually triggered by the pathologist. The smallest patch size that can be indexed and searched is 500 × 500 μm (∼1000 × 1000 pixels at ×20); any floater smaller than that may not be detectable.
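The relation between physical patch size and pixel size can be checked with a short sketch. It assumes a scanner resolution of about 0.5 μm/pixel at ×20, which is consistent with the ∼1000 × 1000 pixel figure quoted above but is not stated explicitly; the function names and the slide dimensions in the example are hypothetical.

```python
def patch_size_px(patch_um, um_per_pixel):
    """Convert a patch edge length in micrometers to pixels."""
    return round(patch_um / um_per_pixel)


def tile_origins(width_px, height_px, patch_px):
    """Top-left corners of non-overlapping patches that fit fully in the image."""
    return [
        (x, y)
        for y in range(0, height_px - patch_px + 1, patch_px)
        for x in range(0, width_px - patch_px + 1, patch_px)
    ]


# 500 um at an assumed 0.5 um/pixel (x20) gives a 1000-pixel patch edge.
edge = patch_size_px(500, 0.5)

# Hypothetical small scan of 4500 x 3200 pixels: 4 columns x 3 rows of patches.
origins = tile_origins(4500, 3200, edge)
```

A tissue fragment that does not fill at least one such patch cannot be represented in the index, which is why very small floaters may go undetected.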

Search Tool Evaluation

After the aforementioned 3 fabricated slides were scanned, indexed, and mixed among millions of image patches from the 2 datasets, the number of patches was reduced to approximately 16 000 through clustering (empirically set to 9 groups), with only 5% of each cluster selected to represent a WSI. The search tool was then used to try to identify the matching slide that belonged to each tissue floater (Figure 4). The search was conducted using variable percentages of floater sampling (ie, 5%–100% of the tissue floater region selected). The detection accuracy was measured by running each sample 100 times (manually and by automation) and calculating the median rank of a correct detection among search results, as well as the best and the worst rank of the detected floater among the search results using a 95% CI.
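The reduction step (keep 5% of each of 9 clusters) can be sketched as follows. The sketch assumes that cluster labels have already been assigned to every patch, for example by a k-means implementation, and that the 5% are drawn at random within each cluster; the text does not say how the patches within a cluster are chosen, so the sampling rule and all names here are illustrative only.

```python
import random


def build_mosaic(labels, percent=5, seed=0):
    """Pick about `percent`% of the patch indices from every cluster.

    `labels[i]` is the cluster label of patch i. At least one patch is kept
    per cluster so that small clusters stay represented (an assumption).
    """
    rng = random.Random(seed)
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    mosaic = []
    for label in sorted(clusters):
        members = clusters[label]
        # Integer ceiling of len(members) * percent / 100, minimum 1.
        keep = max(1, (len(members) * percent + 99) // 100)
        mosaic.extend(sorted(rng.sample(members, keep)))
    return mosaic


# Toy slide: 900 patches spread evenly over 9 clusters (100 patches each).
labels = [i % 9 for i in range(900)]
mosaic = build_mosaic(labels)
```

Here each cluster contributes 5 of its 100 patches, so the slide is represented by 45 patches instead of 900, which mirrors the scale of reduction described above.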

(A) Indexing of a sample whole slide image (scan with bladder tumor) (B) yielding 33 patches to build a mosaic. (C) Corresponding barcodes of the mosaic can be generated using a MinMax algorithm. The 3 barcodes that match the highlighted patches are shown.

Prostate Fused-MRI-Pathology

Data collection and analysis was provided by Anant Madabhushi, PhD, Case Western Reserve University and Michael D. Feldman, MD, PhD, Hospital at the University of Pennsylvania. This work was supported by NIH R01CA136535.


Data Access

Click the Download button to save a ".tcia" manifest file to your computer, which you must open with the NBIA Data Retriever. Click the Search button to open our Data Portal, where you can browse the data collection and/or download a subset of its contents.

Annotated Whole Slide Pathology Images & Annotations (Tiff, XML 76.8 GB)
