by AK and the research community

Jun 13

Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

X-ray images play a vital role in intraoperative procedures owing to their high resolution and fast imaging speed, and they greatly benefit subsequent segmentation, registration and reconstruction. However, excessive X-ray exposure poses potential risks to human health. Data-driven algorithms that map volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume data. Existing methods are mainly realized by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN to synthesize the X-ray images in an end-to-end manner using the content and style disentanglement from three different image domains. Our method decouples the anatomical structure information from CT scans and style information from unpaired real X-ray images and digitally reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized X-ray images and real X-ray images. Meanwhile, we also impose a supervised process by computing the similarity between computed real DRR images and synthesized DRR images. We further develop a pose attention module to fully strengthen the comprehensive information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower 2D space. Extensive experiments were conducted on the publicly available CTSpine1K dataset and achieved 97.8350, 0.0842 and 3.0938 in terms of FID, KID and defined user-scored X-ray similarity, respectively. In comparison with 3D-aware methods (pi-GAN, EG3D), CT2X-GAN achieves higher synthesis quality and produces images that more closely resemble real X-ray images.

CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challenging due to the (1) limited availability of large-scale vision-language datasets in the medical image domain, (2) lack of vision and language encoders that can capture the complexities of medical data, and (3) absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing CheXinstruct - a large-scale instruction-tuning dataset curated from 28 publicly-available datasets. We then present CheXagent - an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network to bridge the vision and language modalities. Finally, we introduce CheXbench - a novel benchmark designed to systematically evaluate FMs across 8 clinically-relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously-developed general- and medical-domain FMs on CheXbench tasks. Furthermore, in an effort to improve model transparency, we perform a fairness evaluation across factors of sex, race and age to highlight potential performance disparities. Our project is at https://stanford-aimi.github.io/chexagent.html.

An X-ray Significantly Variable, Luminous, Type 2 Quasar at z = 2.99 with a Massive Host Galaxy

We present a comprehensive X-ray analysis and spectral energy distribution (SED) fitting of WISEA J171419.96+602724.6, an extremely luminous type 2 quasar at z = 2.99. The source was suggested as a candidate Compton-thick (column density $N_{\rm H} > 1.5 \times 10^{24}$ cm$^{-2}$) quasar by a short XMM-Newton observation in 2011. We recently observed the source with deep NuSTAR and XMM-Newton exposures in 2021 and found that the source has a lower obscuration of $N_{\rm H} \sim 5 \times 10^{22}$ cm$^{-2}$ with an about four times lower flux. The two epochs of observations suggested that the source was significantly variable in X-ray obscuration, flux, and intrinsic luminosity at 2-3$\sigma$ in less than 2.5 years (in the source rest frame). We performed SED fitting of this source using CIGALE thanks to its great availability of multiwavelength data (from hard X-rays to radio). The source is very luminous with a bolometric luminosity of $L_{\rm BOL} \sim 2.5 \times 10^{47}$ erg s$^{-1}$. Its host galaxy has a huge star formation rate (SFR) of $\sim 1280\ M_\odot$ yr$^{-1}$ and a huge stellar mass of $\sim 1.1 \times 10^{12}\ M_\odot$. The correlation between the SFR and stellar mass of this source is consistent with what was measured in the high-z quasars. It is also consistent with what was measured in the main-sequence star-forming galaxies, suggesting that the presence of the active nucleus in our target does not enhance or suppress the SFR of its host galaxy. The source is an Infrared hyper-luminous, obscured galaxy with significant amount of hot dust in its torus and shares many similar properties with hot, dust obscured galaxies.

The X-ray Integral Field Unit at the end of the Athena reformulation phase

The Athena mission entered a redefinition phase in July 2022, driven by the imperative to reduce the mission cost at completion for the European Space Agency below an acceptable target, while maintaining the flagship nature of its science return. This notably called for a complete redesign of the X-ray Integral Field Unit (X-IFU) cryogenic architecture towards a simpler active cooling chain. Passive cooling via successive radiative panels at spacecraft level is now used to provide a 50 K thermal environment to an X-IFU owned cryostat. 4.5 K cooling is achieved via a single remote active cryocooler unit, while a multi-stage Adiabatic Demagnetization Refrigerator ensures heat lift down to the 50 mK required by the detectors. Amidst these changes, the core concept of the readout chain remains robust, employing Transition Edge Sensor microcalorimeters and a SQUID-based Time-Division Multiplexing scheme. Noteworthy is the introduction of a slower pixel. This enables an increase in the multiplexing factor (from 34 to 48) without compromising the instrument energy resolution, hence keeping significant system margins to the new 4 eV resolution requirement. This allows reducing the number of channels by more than a factor two, and thus the resource demands on the system, while keeping a 4' field of view (compared to 5' before). In this article, we will give an overview of this new architecture, before detailing its anticipated performances. Finally, we will present the new X-IFU schedule, with its short term focus on demonstration activities towards a mission adoption in early 2027.

IXPE Observation of the Low-Synchrotron Peaked Blazar S4 0954+65 During An Optical-X-ray Flare

The X-ray polarization observations made possible with the Imaging X-ray Polarimetry Explorer (IXPE) offer new ways of probing high-energy emission processes in astrophysical jets from blazars. Here we report on the first X-ray polarization observation of the blazar S4 0954+65 in a high optical and X-ray state. During our multi-wavelength campaign on the source, we detected an optical flare whose peak coincided with the peak of an X-ray flare. This optical-X-ray flare most likely took place in a feature moving along the parsec-scale jet, imaged at 43 GHz by the Very Long Baseline Array. The 43 GHz polarization angle of the moving component underwent a rotation near the time of the flare. In the optical band, prior to the IXPE observation, we measured the polarization angle to be aligned with the jet axis. In contrast, during the optical flare the optical polarization angle was perpendicular to the jet axis; after the flare, it reverted to being parallel to the jet axis. Due to the smooth behavior of the optical polarization angle during the flare, we favor shocks as the main acceleration mechanism. We also infer that the ambient magnetic field lines in the jet were parallel to the jet position angle. The average degree of optical polarization during the IXPE observation was $(14.3 \pm 4.1)$%. Despite the flare, we only detected an upper limit of 14% (at 3$\sigma$ level) on the X-ray polarization degree; although a reasonable assumption on the X-ray polarization angle results in an upper limit of 8.8% (3$\sigma$). We model the spectral energy distribution (SED) and spectral polarization distribution (SPD) of S4 0954+65 with leptonic (synchrotron self-Compton) and hadronic (proton and pair synchrotron) models. The constraints we obtain with our combined multi-wavelength polarization observations and SED modeling tentatively disfavor hadronic models for the X-ray emission in S4 0954+65.

Probing X-ray Timing and Spectral Variability in the Blazar PKS 2155-304 Over a Decade of XMM-Newton Observations

Blazars, a class of active galactic nuclei (AGN) powered by supermassive black holes, are known for their remarkable variability across multiple timescales and wavelengths. With advancements in both ground- and space-based telescopes, our understanding of AGN central engines has significantly improved. However, the mechanisms driving this variability remain elusive, and continue to fascinate both theorists and observers alike. The primary objective of this study is to constrain the X-ray variability properties of the TeV blazar PKS 2155-304. We conduct a comprehensive X-ray spectral and timing analysis, focusing on both long-term and intra-day variability. This analysis uses data from 22 epochs of XMM-Newton EPIC-pn observations, collected over 15 years (2000-2014). To investigate the variability of the source, we applied both timing and spectral analyses. For the timing analysis, we estimated fractional variability, variability amplitude, minimum variability timescales, flux distribution, and power spectral density (PSD). In the spectral analysis, we fitted the X-ray spectra using power-law, log-parabola, and broken power-law (BPL) models to determine the best-fitting parameters. Additionally, we studied the hardness ratio (HR). We observed moderate intra-day variability in most of the light curves. Seven out of the twenty-two observations showed a clear bimodal flux distribution, indicating the presence of two distinct flux states. Our analysis revealed a variable power-law PSD slope. Most HR plots did not show significant variation with flux, except for one observation (OBSID 0124930501), where HR increased with flux (Count/s). The fitted X-ray spectra favored the BPL model for the majority of observations. The findings of this work shed light on the intraday variability of blazars, providing insights into the non-thermal jet processes that drive the observed flux variations.
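
The fractional variability mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which estimator the authors used, so the code below assumes the widely used excess-variance definition, F_var = sqrt((S^2 - mean(err^2)) / mean(flux)^2); the function name and the toy light curve are hypothetical.

```python
import math


def fractional_variability(flux, err):
    """Excess-variance estimate of fractional variability (assumed definition)."""
    n = len(flux)
    if n < 2:
        raise ValueError("need at least two flux measurements")
    mean_flux = sum(flux) / n
    # Sample variance of the light curve (n - 1 in the denominator).
    sample_var = sum((x - mean_flux) ** 2 for x in flux) / (n - 1)
    # Mean square of the measurement errors.
    mean_sq_err = sum(e ** 2 for e in err) / n
    excess = sample_var - mean_sq_err
    if excess <= 0:
        # Scatter is consistent with noise: no intrinsic variability detected.
        return 0.0
    return math.sqrt(excess) / mean_flux


# Toy light curve in counts/s with zero measurement error.
f_var = fractional_variability([10.0, 12.0, 14.0], [0.0, 0.0, 0.0])
```

Returning 0.0 when the excess variance is negative is a design choice of this sketch; some analyses instead report an upper limit in that case.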

BS-Diff: Effective Bone Suppression Using Conditional Diffusion Models from Chest X-Ray Images

Chest X-rays (CXRs) are commonly utilized as a low-dose modality for lung screening. Nonetheless, the efficacy of CXRs is somewhat impeded, given that approximately 75% of the lung area overlaps with bone, which in turn hampers the detection and diagnosis of diseases. As a remedial measure, bone suppression techniques have been introduced. The dual-energy subtraction imaging technique currently used in the clinic requires costly equipment and exposes subjects to high radiation. To circumvent these issues, deep learning-based image generation algorithms have been proposed. However, existing methods fall short in terms of producing high-quality images and capturing texture details, particularly with pulmonary vessels. To address these issues, this paper proposes a new bone suppression framework, termed BS-Diff, that comprises a conditional diffusion model equipped with a U-Net architecture and a simple enhancement module to incorporate an autoencoder. Our proposed network can not only generate soft tissue images with a high bone suppression rate but also capture fine image details. Additionally, we compiled the largest dataset since 2010, including data from 120 patients with high-definition, high-resolution paired CXRs and soft tissue images collected by our affiliated hospital. Extensive experiments, comparative analyses, ablation studies, and clinical evaluations indicate that the proposed BS-Diff outperforms several bone-suppression models across multiple metrics. Our code can be accessed at https://github.com/Benny0323/BS-Diff.

Deep reproductive feature generation framework for the diagnosis of COVID-19 and viral pneumonia using chest X-ray images

The rapid and accurate detection of COVID-19 cases is critical for timely treatment and preventing the spread of the disease. In this study, a two-stage feature extraction framework using eight state-of-the-art pre-trained deep Convolutional Neural Networks (CNNs) and an autoencoder is proposed to determine the health conditions of patients (COVID-19, Normal, Viral Pneumonia) based on chest X-rays. The X-ray scans are divided into four equally sized sections and analyzed by deep pre-trained CNNs. Subsequently, an autoencoder with three hidden layers is trained to extract reproductive features from the concatenated output of the CNNs. To evaluate the performance of the proposed framework, three different classifiers are used: a single-layer perceptron (SLP), a multi-layer perceptron (MLP), and a support vector machine (SVM). Furthermore, the deep CNN architectures are used to create benchmark models and trained on the same dataset for comparison. The proposed framework outperforms other frameworks with pre-trained feature extractors in binary classification and shows competitive results in three-class classification. The proposed methodology is task-independent and suitable for addressing various problems. The results show that the discriminative features are a subset of the reproductive features, suggesting that extracting task-independent features is superior to extracting only task-based features. The flexibility and task-independence of the reproductive features make the conceptive information approach more favorable. The proposed methodology is novel and shows promising results for analyzing medical image data.
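
The first step described above, dividing a scan into four equally sized sections, might look like the sketch below. The abstract does not specify the layout, so a 2x2 grid is assumed here; the image is represented as a plain list of pixel rows, and the function name is hypothetical.

```python
def split_into_quadrants(image):
    """Split a 2D image (list of rows) into four equal quadrants.

    Assumes a 2x2 grid layout. If a dimension is odd, the last row or
    column is dropped so that all four sections have the same size.
    """
    rows = len(image)
    cols = len(image[0])
    half_r, half_c = rows // 2, cols // 2
    top = image[:half_r]
    bottom = image[half_r:2 * half_r]
    return [
        [row[:half_c] for row in top],               # upper left
        [row[half_c:2 * half_c] for row in top],     # upper right
        [row[:half_c] for row in bottom],            # lower left
        [row[half_c:2 * half_c] for row in bottom],  # lower right
    ]


# Toy 4x4 "scan" whose pixel value encodes its position.
scan = [[r * 4 + c for c in range(4)] for r in range(4)]
sections = split_into_quadrants(scan)
```

In the framework described, each section would then be passed to the pre-trained CNNs; that stage is not sketched here.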

ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases

The chest X-ray is one of the most commonly accessible radiological examinations for screening and diagnosis of many lung diseases. A tremendous number of X-ray imaging studies accompanied by radiological reports are accumulated and stored in many modern hospitals' Picture Archiving and Communication Systems (PACS). On the other side, it is still an open question how this type of hospital-size knowledge database containing invaluable imaging informatics (i.e., loosely labeled) can be used to facilitate the data-hungry deep learning paradigms in building truly large-scale high precision computer-aided diagnosis (CAD) systems. In this paper, we present a new chest X-ray database, namely "ChestX-ray8", which comprises 108,948 frontal-view X-ray images of 32,717 unique patients with the text-mined eight disease image labels (where each image can have multi-labels), from the associated radiological reports using natural language processing. Importantly, we demonstrate that these commonly occurring thoracic diseases can be detected and even spatially-located via a unified weakly-supervised multi-label image classification and disease localization framework, which is validated using our proposed dataset. Although the initial quantitative results are promising as reported, deep convolutional neural network based "reading chest X-rays" (i.e., recognizing and locating the common disease patterns trained with only image-level labels) remains a strenuous task for fully-automated high precision CAD systems. Data download link: https://nihcc.app.box.com/v/ChestXray-NIHCC

DENTEX: An Abnormal Tooth Detection with Dental Enumeration and Diagnosis Benchmark for Panoramic X-rays

Panoramic X-rays are frequently used in dentistry for treatment planning, but their interpretation can be both time-consuming and prone to error. Artificial intelligence (AI) has the potential to aid in the analysis of these X-rays, thereby improving the accuracy of dental diagnoses and treatment plans. Nevertheless, designing automated algorithms for this purpose poses significant challenges, mainly due to the scarcity of annotated data and variations in anatomical structure. To address these issues, the Dental Enumeration and Diagnosis on Panoramic X-rays Challenge (DENTEX) has been organized in association with the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2023. This challenge aims to promote the development of algorithms for multi-label detection of abnormal teeth, using three types of hierarchically annotated data: partially annotated quadrant data, partially annotated quadrant-enumeration data, and fully annotated quadrant-enumeration-diagnosis data, inclusive of four different diagnoses. In this paper, we present the results of evaluating participant algorithms on the fully annotated data, additionally investigating performance variation for quadrant, enumeration, and diagnosis labels in the detection of abnormal teeth. The provision of this annotated dataset, alongside the results of this challenge, may lay the groundwork for the creation of AI-powered tools that can offer more precise and efficient diagnosis and treatment planning in the field of dentistry. The evaluation code and datasets can be accessed at https://github.com/ibrahimethemhamamci/DENTEX

Deep Learning Based Defect Detection for Solder Joints on Industrial X-Ray Circuit Board Images

Quality control is of vital importance during electronics production. As the methods of producing electronic circuits improve, there is an increasing chance of solder defects during assembly of the printed circuit board (PCB). Many technologies have been incorporated for inspecting failed soldering, such as X-ray imaging, optical imaging, and thermal imaging. With some advanced algorithms, the new technologies are expected to control the production quality based on the digital images. However, current algorithms are sometimes not accurate enough to meet quality control requirements, and specialists are needed to perform follow-up checks. For automated X-ray inspection, the joint of interest on the X-ray image is located by a region of interest (ROI) and inspected by some algorithms. Some incorrect ROIs degrade the inspection algorithm. The high dimension of X-ray images and the varying sizes of image dimensions also challenge the inspection algorithms. On the other hand, recent advances in deep learning shed light on image-based tasks and are competitive with human-level performance. In this paper, deep learning is incorporated in X-ray imaging based quality control during PCB quality inspection. Two artificial intelligence (AI) based models are proposed and compared for joint defect detection. The noisy ROI problem and the problem of varying image dimensions are addressed. The efficacy of the proposed methods is verified through experiments on a real-world 3D X-ray dataset. By incorporating the proposed methods, the specialist inspection workload is substantially reduced.

The GRACE project: Hard X-ray giant radio galaxies and their duty cycle

The advent of new generation radio telescopes is opening new possibilities for the classification and study of extragalactic high-energy sources, especially underrepresented ones like radio galaxies. Among these, Giant Radio Galaxies (GRG, larger than 0.7 Mpc) are among the most extreme manifestations of the accretion/ejection processes on supermassive black holes. Our recent studies have shown that GRG can be up to four times more abundant in hard X-ray selected (i.e. from INTEGRAL/IBIS and Swift/BAT at >20 keV) samples and, most interestingly, the majority of them present signs of restarted radio activity. This makes them the ideal test-bed to study the so far unknown duty cycle of jets in active galactic nuclei. Open questions in the field include: How and when are jets restarted? How do jets evolve, and what are their dynamics? What is the jets' duty cycle, and what triggers them? Our group has recently collected a wealth of radio data on these high-energy selected GRGs, allowing us to study their jet formation and evolution from the pc to kpc scales, across different activity epochs. In particular, thanks to our EVN large programme, we were able to probe the new radio phase in the core of these giants. Furthermore, we are devoting an effort to the exploitation of new radio survey data for the discovery of new classes of counterparts of Fermi/LAT catalogues. In particular, we are unveiling the hidden population of radio galaxies associated with gamma-ray sources.

Is planetary inward migration responsible for GJ 504's fast rotation and bright X-ray luminosity? New constraints from eROSITA

The discovery of an increasing variety of exoplanets in very close orbits around their host stars raised many questions about how stars and planets interact, and to which extent host stars' properties may be influenced by the presence of close-by companions. Understanding how the evolution of stars is impacted by the interactions with their planets is fundamental to disentangle their intrinsic evolution from Star-Planet Interactions (SPI)-induced phenomena. GJ 504 is a promising candidate for a star that underwent strong SPI. Its unusually short rotational period (3.4 days), while being in contrast with what is expected by single-star models, could result from the inward migration of a close-by, massive companion, pushed starward by tides. Moreover, its brighter X-ray luminosity may hint at a rejuvenation of the dynamo process sustaining the stellar magnetic field, consequent to the SPI-induced spin-up. We aim to study the evolution of GJ 504 and establish whether by invoking the engulfment of a planetary companion we can better reproduce its rotational period and X-ray luminosity. We simulate the past evolution assuming two different scenarios: 'Star without close-by planet', 'Star with close-by planet'. In the second scenario, we investigate how inward migration and planetary engulfment driven by tides spin up the stellar surface and rejuvenate its dynamo. We compare our tracks with rotational period and X-ray data collected from the all-sky surveys of the ROentgen Survey with an Imaging Telescope Array (eROSITA) on board the Russian Spektrum-Roentgen-Gamma mission (SRG). Despite the very uncertain stellar age, we found that the second evolutionary scenario is in better agreement with the short rotational period and the bright X-ray luminosity of GJ 504, thus strongly favouring the inward migration scenario over the one in which close-by planets have no tidal impact on the star.

Swin-X2S: Reconstructing 3D Shape from 2D Biplanar X-ray with Swin Transformers

The conversion from 2D X-ray to 3D shape holds significant potential for improving diagnostic efficiency and safety. However, existing reconstruction methods often rely on hand-crafted features, manual intervention, and prior knowledge, resulting in unstable shape errors and additional processing costs. In this paper, we introduce Swin-X2S, an end-to-end deep learning method for directly reconstructing 3D segmentation and labeling from 2D biplanar orthogonal X-ray images. Swin-X2S employs an encoder-decoder architecture: the encoder leverages 2D Swin Transformer for X-ray information extraction, while the decoder employs 3D convolution with cross-attention to integrate structural features from orthogonal views. A dimension-expanding module is introduced to bridge the encoder and decoder, ensuring a smooth conversion from 2D pixels to 3D voxels. We evaluate proposed method through extensive qualitative and quantitative experiments across nine publicly available datasets covering four anatomies (femur, hip, spine, and rib), with a total of 54 categories. Significant improvements over previous methods have been observed not only in the segmentation and labeling metrics but also in the clinically relevant parameters that are of primary concern in practical applications, which demonstrates the promise of Swin-X2S to provide an effective option for anatomical shape reconstruction in clinical scenarios. Code implementation is available at: https://github.com/liukuan5625/Swin-X2S.

A UV to X-ray view of soft excess in type 1 AGNs: I. sample selection and spectral profile

A core sample of 59 unobscured type 1 AGNs with simultaneous XMM-Newton X-ray and UV observations is compiled from archive to probe the nature of soft X-ray excess (SE). In the first paper of this series, our focus centers on scrutinizing the spectral profile of the soft excess. Of the sources, $\approx 71$% (42/59) exhibit powerlaw-like (po-like) soft excess, while $\approx 29$% (17/59) exhibit blackbody-like (bb-like) soft excess. We show a cut-off powerlaw could uniformly characterize both types of soft excesses, with median Ecut of 1.40 keV for po-like and 0.14 keV for bb-like. For the first time, we report a robust and quantitative correlation between the SE profile and SE strength (the ratio of SE luminosity to that of the primary powerlaw continuum in 0.5 - 2.0 keV), indicating that stronger soft excess is more likely to be po-like, or effectively has a higher Ecut. This correlation cannot be explained by ionized disk reflection alone, which produces mostly bb-like soft excess (Ecut $\sim$ 0.1 keV) as revealed by relxilllp simulation. Remarkably, we show with simulations that a toy hybrid scenario, where both ionized disk reflection (relxilllp, with all reflection parameters fixed at default values except for ionization of the disk) and warm corona (compTT, with temperature fixed at 1 keV) contribute to the observed soft excess, can successfully reproduce the observed correlation. This highlights the ubiquitous hybrid nature of the soft X-ray excess in AGNs, and underscores the importance of considering both components while fitting the spectra of soft excess.

Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation

The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable reports generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI) for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces the noise in the following cross-modal alignment module by aligning X-ray images with factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients. This, in turn, assists in triggering the text decoder to produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.

Automated Chest X-Ray Report Generator Using Multi-Model Deep Learning Approach

Reading and interpreting chest X-ray images is one of the most common routines for radiologists. However, it can still be challenging, even for the most experienced ones. Therefore, we propose a multi-model deep learning-based automated chest X-ray report generator system designed to assist radiologists in their work. The basic idea of the proposed system is to use multiple binary-classification models to detect multiple abnormalities in a single image, with each model responsible for detecting one abnormality. In this study, we limited the radiology abnormalities detection to only cardiomegaly, lung effusion, and consolidation. The system generates a radiology report by performing the following three steps: image pre-processing, utilizing deep learning models to detect abnormalities, and producing a report. The aim of the image pre-processing step is to standardize the input by scaling it to 128x128 pixels and slicing it into three segments, which cover the upper, middle, and lower parts of the lung. After pre-processing, each corresponding model classifies the image, resulting in a 0 (zero) for no abnormality detected and a 1 (one) for the presence of an abnormality. The prediction outputs of each model are then concatenated to form a 'result code'. The 'result code' is used to construct a report by selecting the appropriate pre-determined sentence for each detected abnormality in the report generation step. The proposed system is expected to reduce the workload of radiologists and increase the accuracy of chest X-ray diagnosis.
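
The 'result code' step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the order of abnormalities in the code, the sentence templates, and the function names are all assumptions, and the binary classifiers are replaced by hard-coded predictions.

```python
# Abnormalities named in the abstract; the order within the code is assumed.
ABNORMALITIES = ("cardiomegaly", "lung effusion", "consolidation")

# Hypothetical pre-determined sentences, one per abnormality.
SENTENCES = {
    "cardiomegaly": "The cardiac silhouette is enlarged.",
    "lung effusion": "Findings suggest effusion.",
    "consolidation": "There is evidence of consolidation.",
}
NORMAL_SENTENCE = "No abnormality detected."


def build_result_code(predictions):
    """Concatenate each model's 0/1 output into a result code string."""
    return "".join(str(int(predictions[name])) for name in ABNORMALITIES)


def generate_report(result_code):
    """Select the pre-determined sentence for every bit set to 1."""
    lines = [
        SENTENCES[name]
        for name, bit in zip(ABNORMALITIES, result_code)
        if bit == "1"
    ]
    return " ".join(lines) if lines else NORMAL_SENTENCE


# Stand-in for the outputs of three binary classifiers on one image.
code = build_result_code(
    {"cardiomegaly": 1, "lung effusion": 0, "consolidation": 1}
)
report = generate_report(code)
```

Keeping one independent bit per abnormality mirrors the one-model-per-abnormality design, so adding a fourth finding would only require extending the tuple and the sentence table.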

Xplainer: From X-Ray Observations to Explainable Zero-Shot Diagnosis

Automated diagnosis prediction from medical images is a valuable resource to support clinical decision-making. However, such systems usually need to be trained on large amounts of annotated data, which often is scarce in the medical domain. Zero-shot methods address this challenge by allowing a flexible adaption to new settings with different clinical findings without relying on labeled data. Further, to integrate automated diagnosis in the clinical workflow, methods should be transparent and explainable, increasing medical professionals' trust and facilitating correctness verification. In this work, we introduce Xplainer, a novel framework for explainable zero-shot diagnosis in the clinical setting. Xplainer adapts the classification-by-description approach of contrastive vision-language models to the multi-label medical diagnosis task. Specifically, instead of directly predicting a diagnosis, we prompt the model to classify the existence of descriptive observations, which a radiologist would look for on an X-Ray scan, and use the descriptor probabilities to estimate the likelihood of a diagnosis. Our model is explainable by design, as the final diagnosis prediction is directly based on the prediction of the underlying descriptors. We evaluate Xplainer on two chest X-ray datasets, CheXpert and ChestX-ray14, and demonstrate its effectiveness in improving the performance and explainability of zero-shot diagnosis. Our results suggest that Xplainer provides a more detailed understanding of the decision-making process and can be a valuable tool for clinical diagnosis.
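
The idea of estimating a diagnosis from descriptor probabilities can be sketched in a few lines. The abstract does not state the aggregation rule, so the geometric mean used below (the mean of log-probabilities, exponentiated) is an assumption, as are the function name and the example values; in practice the descriptor probabilities would come from a contrastive vision-language model.

```python
import math


def diagnosis_probability(descriptor_probs):
    """Aggregate descriptor probabilities into one diagnosis score.

    Assumed rule: geometric mean, computed in log space for stability.
    """
    if not descriptor_probs:
        raise ValueError("at least one descriptor probability is required")
    eps = 1e-12  # guards against log(0)
    log_sum = sum(math.log(max(p, eps)) for p in descriptor_probs)
    return math.exp(log_sum / len(descriptor_probs))


# Hypothetical probabilities for two descriptive observations of one finding.
score = diagnosis_probability([0.25, 1.0])
```

Because the score is built only from the descriptor probabilities, each prediction can be traced back to the observations that drove it, which is the sense in which such an approach is explainable by design.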

Detailed Annotations of Chest X-Rays via CT Projection for Report Understanding

In clinical radiology reports, doctors capture important information about the patient's health status. They convey their observations from raw medical imaging data about the inner structures of a patient. As such, formulating reports requires medical experts to possess wide-ranging knowledge about anatomical regions with their normal, healthy appearance as well as the ability to recognize abnormalities. This explicit grasp on both the patient's anatomy and their appearance is missing in current medical image-processing systems as annotations are especially difficult to gather. This renders the models to be narrow experts e.g. for identifying specific diseases. In this work, we recover this missing link by adding human anatomy into the mix and enable the association of content in medical reports to their occurrence in associated imagery (medical phrase grounding). To exploit anatomical structures in this scenario, we present a sophisticated automatic pipeline to gather and integrate human bodily structures from computed tomography datasets, which we incorporate in our PAXRay: A Projected dataset for the segmentation of Anatomical structures in X-Ray data. Our evaluation shows that methods that take advantage of anatomical information benefit heavily in visually grounding radiologists' findings, as our anatomical segmentations allow for up to absolute 50% better grounding results on the OpenI dataset as compared to commonly used region proposals. The PAXRay dataset is available at https://constantinseibold.github.io/paxray/.

Jet-ISM Interaction in the Radio Galaxy 3C293: Jet-driven Shocks Heat ISM to Power X-ray and Molecular H2 emission

We present a 70ks Chandra observation of the radio galaxy 3C293. This galaxy belongs to the class of molecular hydrogen emission galaxies (MOHEGs) that have very luminous emission from warm molecular hydrogen. In radio galaxies, the molecular gas appears to be heated by jet-driven shocks, but exactly how this mechanism works is still poorly understood. With Chandra, we observe X-ray emission from the jets within the host galaxy and along the 100 kpc radio jets. We model the X-ray spectra of the nucleus, the inner jets, and the X-ray features along the extended radio jets. Both the nucleus and the inner jets show evidence of 10^7 K shock-heated gas. The kinetic power of the jets is more than sufficient to heat the X-ray emitting gas within the host galaxy. The thermal X-ray and warm H2 luminosities of 3C293 are similar, indicating similar masses of X-ray hot gas and warm molecular gas. This is consistent with a picture where both derive from a multiphase, shocked interstellar medium (ISM). We find that radio-loud MOHEGs that are not brightest cluster galaxies (BCGs), like 3C293, typically have LH2/LX~1 and MH2/MX~1, whereas MOHEGs that are BCGs have LH2/LX~0.01 and MH2/MX~0.01. The more massive, virialized, hot atmosphere in BCGs overwhelms any direct X-ray emission from current jet-ISM interaction. On the other hand, LH2/LX~1 in the Spiderweb BCG at z=2, which resides in an unvirialized protocluster and hosts a powerful radio source. Over time, jet-ISM interaction may contribute to the establishment of a hot atmosphere in BCGs and other massive elliptical galaxies.

Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation

Radiology reports convey detailed clinical observations and capture diagnostic reasoning that evolves over time. However, existing evaluation methods are limited to single-report settings and rely on coarse metrics that fail to capture fine-grained clinical semantics and temporal dependencies. We introduce LUNGUAGE, a benchmark dataset for structured radiology report generation that supports both single-report evaluation and longitudinal patient-level assessment across multiple studies. It contains 1,473 annotated chest X-ray reports, each reviewed by experts, and 80 of them contain longitudinal annotations to capture disease progression and inter-study intervals, also reviewed by experts. Using this benchmark, we develop a two-stage framework that transforms generated reports into fine-grained, schema-aligned structured representations, enabling longitudinal interpretation. We also propose LUNGUAGESCORE, an interpretable metric that compares structured outputs at the entity, relation, and attribute level while modeling temporal consistency across patient timelines. These contributions establish the first benchmark dataset, structuring framework, and evaluation metric for sequential radiology reporting, with empirical results demonstrating that LUNGUAGESCORE effectively supports structured report evaluation. The code is available at: https://github.com/SuperSupermoon/Lunguage

RoentGen: Vision-Language Foundation Model for Chest X-ray Generation

Multimodal models trained on large natural image-text pair datasets have exhibited astounding abilities in generating high-quality images. Medical imaging data is fundamentally different to natural images, and the language used to succinctly capture relevant details in medical data uses a different, narrow but semantically rich, domain-specific vocabulary. Not surprisingly, multi-modal models trained on natural image-text pairs do not tend to generalize well to the medical domain. Developing generative imaging models faithfully representing medical concepts while providing compositional diversity could mitigate the existing paucity of high-quality, annotated medical imaging datasets. In this work, we develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest x-rays (CXR) and their corresponding radiology (text) reports. We investigate the model's ability to generate high-fidelity, diverse synthetic CXR conditioned on text prompts. We assess the model outputs quantitatively using image quality metrics, and evaluate image quality and text-image alignment by human domain experts. We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images, and that the output can be controlled to a new extent by using free-form text prompts including radiology-specific language. Fine-tuning this model on a fixed training set and using it as a data augmentation method, we measure a 5% improvement of a classifier trained jointly on synthetic and real images, and a 3% improvement when trained on a larger but purely synthetic training set. Finally, we observe that this fine-tuning distills in-domain knowledge in the text-encoder and can improve its representation capabilities of certain diseases like pneumothorax by 25%.

CAvity DEtection Tool (CADET): Pipeline for automatic detection of X-ray cavities in hot galactic and cluster atmospheres

The study of jet-inflated X-ray cavities provides a powerful insight into the energetics of hot galactic atmospheres and radio-mechanical AGN feedback. By estimating the volumes of X-ray cavities, the total energy and thus also the corresponding mechanical jet power required for their inflation can be derived. Properly estimating their total extent is, however, non-trivial, prone to biases, nearly impossible for poor-quality data, and so far has been done manually by scientists. We present a novel and automated machine-learning pipeline called Cavity Detection Tool (CADET), developed to detect and estimate the sizes of X-ray cavities from raw Chandra images. The pipeline consists of a convolutional neural network trained for producing pixel-wise cavity predictions and a DBSCAN clustering algorithm, which decomposes the predictions into individual cavities. The convolutional network was trained using mock observations of early-type galaxies simulated to resemble real noisy Chandra-like images. The network's performance has been tested on simulated data obtaining an average cavity volume error of 14 % at an 89 % true-positive rate. For simulated images without any X-ray cavities inserted, we obtain a 5 % false-positive rate. When applied to real Chandra images, the pipeline recovered 91 out of 100 previously known X-ray cavities in nearby early-type galaxies and all 14 cavities in chosen galaxy clusters. Besides that, the CADET pipeline discovered 8 new cavity pairs in atmospheres of early-type galaxies and galaxy clusters (IC4765, NGC533, NGC2300, NGC3091, NGC4073, NGC4125, NGC4472, NGC5129) and a number of potential cavity candidates.
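The second stage of the pipeline described above, in which DBSCAN splits a pixel-wise prediction map into individual cavities, can be sketched as follows. This is not the CADET code: the threshold, `eps`, and `min_samples` values and the toy probability map are assumptions chosen for illustration, and DBSCAN is written out in plain Python so the example has no dependencies.

```python
def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point: expand
                queue.extend(reachable)
    return labels


def split_cavities(prob_map, threshold=0.5, eps=1.5, min_samples=3):
    """Group above-threshold pixels into cavities; parameters are assumed."""
    pixels = [(r, c) for r, row in enumerate(prob_map)
              for c, p in enumerate(row) if p > threshold]
    cavities = {}
    for pixel, label in zip(pixels, dbscan(pixels, eps, min_samples)):
        if label != -1:
            cavities.setdefault(label, []).append(pixel)
    return cavities


# Toy stand-in for a CNN output: two separate blobs of high probability.
prob_map = [
    [0.9, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.9, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.9],
]
cavities = split_cavities(prob_map)
print({label: len(pixels) for label, pixels in cavities.items()})
```

A density-based method suits this step because the number of cavities is not known in advance and isolated false-positive pixels are discarded as noise.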

Reliable Tuberculosis Detection using Chest X-ray with Deep Learning, Segmentation and Visualization

Tuberculosis (TB) is a chronic lung disease that occurs due to bacterial infection and is one of the top 10 leading causes of death. Accurate and early detection of TB is very important; otherwise, it could be life-threatening. In this work, we have detected TB reliably from the chest X-ray images using image pre-processing, data augmentation, image segmentation, and deep-learning classification techniques. Several public databases were used to create a database of 700 TB infected and 3500 normal chest X-ray images for this study. Nine different deep CNNs (ResNet18, ResNet50, ResNet101, ChexNet, InceptionV3, Vgg19, DenseNet201, SqueezeNet, and MobileNet) were used for transfer learning from their pre-trained initial weights and were trained, validated and tested for classifying TB and non-TB normal cases. Three different experiments were carried out in this work: segmentation of X-ray images using two different U-net models, classification using X-ray images, and classification using segmented lung images. The accuracy, precision, sensitivity, F1-score, and specificity in the detection of tuberculosis using X-ray images were 97.07 %, 97.34 %, 97.07 %, 97.14 % and 97.36 % respectively. However, classification using segmented lungs outperformed whole X-ray image-based classification, with accuracy, precision, sensitivity, F1-score, and specificity of 99.9 %, 99.91 %, 99.9 %, 99.9 %, and 99.52 % respectively. The paper also used a visualization technique to confirm that the CNN learns dominantly from the segmented lung regions, which results in higher detection accuracy. The proposed method with state-of-the-art performance can be useful in the computer-aided faster diagnosis of tuberculosis.
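For reference, the five reported metrics can all be derived from a binary confusion matrix, as in the sketch below. The counts are hypothetical and are not taken from the paper, which may also have averaged its metrics differently (for example, weighted across classes).

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics, with TB as the positive class."""
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = binary_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With a class ratio like 700 TB to 3500 normal images, accuracy alone can look high even when sensitivity is poor, which is one reason to report all five values.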

Rapid patient-specific neural networks for intraoperative X-ray to volume registration

The integration of artificial intelligence in image-guided interventions holds transformative potential, promising to extract 3D geometric and quantitative information from conventional 2D imaging modalities during complex procedures. Achieving this requires the rapid and precise alignment of 2D intraoperative images (e.g., X-ray) with 3D preoperative volumes (e.g., CT, MRI). However, current 2D/3D registration methods fail across the broad spectrum of procedures dependent on X-ray guidance: traditional optimization techniques require custom parameter tuning for each subject, whereas neural networks trained on small datasets do not generalize to new patients or require labor-intensive manual annotations, increasing clinical burden and precluding application to new anatomical targets. To address these challenges, we present xvr, a fully automated framework for training patient-specific neural networks for 2D/3D registration. xvr uses physics-based simulation to generate abundant high-quality training data from a patient's own preoperative volumetric imaging, thereby overcoming the inherently limited ability of supervised models to generalize to new patients and procedures. Furthermore, xvr requires only 5 minutes of training per patient, making it suitable for emergency interventions as well as planned procedures. We perform the largest evaluation of a 2D/3D registration algorithm on real X-ray data to date and find that xvr robustly generalizes across a diverse dataset comprising multiple anatomical structures, imaging modalities, and hospitals. Across surgical tasks, xvr achieves submillimeter-accurate registration at intraoperative speeds, improving upon existing methods by an order of magnitude. xvr is released as open-source software freely available at https://github.com/eigenvivek/xvr.

MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression

Large vision-language models (LVLMs) have shown great promise in medical applications, particularly in visual question answering (MedVQA) and diagnosis from medical images. However, existing datasets and models often fail to consider critical aspects of medical diagnostics, such as the integration of historical records and the analysis of disease progression over time. In this paper, we introduce MMXU (Multimodal and MultiX-ray Understanding), a novel dataset for MedVQA that focuses on identifying changes in specific regions between two patient visits. Unlike previous datasets that primarily address single-image questions, MMXU enables multi-image questions, incorporating both current and historical patient data. We demonstrate the limitations of current LVLMs in identifying disease progression on MMXU-test, even those that perform well on traditional benchmarks. To address this, we propose a MedRecord-Augmented Generation (MAG) approach, incorporating both global and regional historical records. Our experiments show that integrating historical records significantly enhances diagnostic accuracy by at least 20%, bridging the gap between current LVLMs and human expert performance. Additionally, we fine-tune models with MAG on MMXU-dev, which demonstrates notable improvements. We hope this work could illuminate the avenue of advancing the use of LVLMs in medical diagnostics by emphasizing the importance of historical context in interpreting medical images. Our dataset is released at https://github.com/linjiemu/MMXU.

Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback

Radiologists play a crucial role by translating medical images into medical reports. However, the field faces staffing shortages and increasing workloads. While automated approaches using vision-language models (VLMs) show promise as assistants, they require exceptionally high accuracy. Most current VLMs in radiology rely solely on supervised fine-tuning (SFT). Meanwhile, in the general domain, additional preference fine-tuning has become standard practice. The challenge in radiology lies in the prohibitive cost of obtaining radiologist feedback. We propose a scalable automated preference alignment technique for VLMs in radiology, focusing on chest X-ray (CXR) report generation. Our method leverages publicly available datasets with an LLM-as-a-Judge mechanism, eliminating the need for additional expert radiologist feedback. We evaluate and benchmark five direct alignment algorithms (DAAs). Our results show up to a 57.4% improvement in average GREEN scores, an LLM-based metric for evaluating CXR reports, and a 9.2% increase in an average across six metrics (domain specific and general), compared to the SFT baseline. We study reward overoptimization via length exploitation, with reports lengthening by up to 3.2x. To assess a potential alignment tax, we benchmark on six additional diverse tasks, finding no significant degradations. A reader study involving four board-certified radiologists indicates win rates of up to 0.62 over the SFT baseline, while significantly penalizing verbosity. Our analysis provides actionable insights for the development of VLMs in high-stakes fields like radiology.

CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats

Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care. Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus

CXR-LLaVA: Multimodal Large Language Model for Interpreting Chest X-ray Images

Purpose: Recent advancements in large language models (LLMs) have expanded their capabilities in a multimodal fashion, potentially replicating the image interpretation of human radiologists. This study aimed to develop an open-source multimodal large language model for interpreting chest X-ray images (CXR-LLaVA). We also examined the effect of prompt engineering and model parameters such as temperature and nucleus sampling. Materials and Methods: For training, we collected 659,287 publicly available CXRs: 417,336 CXRs had labels for certain radiographic abnormalities (dataset 1); 241,951 CXRs provided free-text radiology reports (dataset 2). After pre-training the Resnet50 as an image encoder, the contrastive language-image pre-training was used to align CXRs and corresponding radiographic abnormalities. Then, the Large Language Model Meta AI-2 was fine-tuned using dataset 2, which was refined using GPT-4 to generate various question answering scenarios. The code can be found at https://github.com/ECOFRI/CXR_LLaVA. Results: In the test set, we observed that the model's performance fluctuated based on its parameters. On average, it achieved an F1 score of 0.34 for five pathologic findings (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), which was improved to 0.46 through prompt engineering. In the independent set, the model achieved an average F1 score of 0.30 for the same pathologic findings. Notably, for the pediatric chest radiograph dataset, which was unseen during training, the model differentiated abnormal radiographs with an F1 score ranging from 0.84 to 0.85. Conclusion: CXR-LLaVA demonstrates promising potential in CXR interpretation. Both prompt engineering and model parameter adjustments can play pivotal roles in interpreting CXRs.

A systematic analysis of the radio properties of 22 X-ray selected tidal disruption event candidates with the Australia Telescope Compact Array

We present a systematic analysis of the radio properties of an X-ray selected sample of tidal disruption event (TDE) candidates discovered by the eROSITA telescope. We find radio sources coincident with half of the transient events (11 TDEs), with 8 radio sources showing statistically significant variability over a 6-month period. We model the radio spectra of 6 sources with sufficiently bright radio emission and find the sources show radio spectra consistent with optically thin synchrotron emission and radio outflow minimum radii of 10^16 to 10^17 cm, velocities of 0.01 to 0.05 c, and energies of 10^48 to 10^51 erg. On comparison with the radio properties of an optically-selected TDE sample at similar late times, we find no significant difference in the radio luminosity range or radio detection rate. We find a tentative positive trend with peak radio and X-ray luminosity, but require further observations to determine if this is real or due to observational bias due to the large range in distances of the events. Interestingly, none of the X-ray selected events show late rising radio emission, compared to 45% of radio-detected sources of an optically-selected sample that showed late rising radio emission. We propose that this may indicate that many TDEs launch radio outflows at or near peak X-ray luminosity, which can be significantly delayed from peak optical luminosity. This study presents the first systematic analysis of the radio properties of an X-ray selected sample of TDEs, and gives insight into the possible link between the physical processes that power X-ray and radio emission in TDEs.

Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

Automated radiology report generation offers an effective solution to alleviate radiologists' workload. However, most existing methods focus primarily on single or fixed-view images to model current disease conditions, which limits diagnostic accuracy and overlooks disease progression. Although some approaches utilize longitudinal data to track disease progression, they still rely on single images to analyze current visits. To address these issues, we propose enhanced contrastive learning with Multi-view Longitudinal data to facilitate chest X-ray Report Generation, named MLRG. Specifically, we introduce a multi-view longitudinal contrastive learning method that integrates spatial information from current multi-view images and temporal information from longitudinal data. This method also utilizes the inherent spatiotemporal information of radiology reports to supervise the pre-training of visual and textual representations. Subsequently, we present a tokenized absence encoding technique to flexibly handle missing patient-specific prior knowledge, allowing the model to produce more accurate radiology reports based on available prior knowledge. Extensive experiments on MIMIC-CXR, MIMIC-ABN, and Two-view CXR datasets demonstrate that our MLRG outperforms recent state-of-the-art methods, achieving a 2.3% BLEU-4 improvement on MIMIC-CXR, a 5.5% F1 score improvement on MIMIC-ABN, and a 2.7% F1 RadGraph improvement on Two-view CXR.

BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images with Conditional Latent Diffusion Models

Lung diseases represent a significant global health challenge, with Chest X-Ray (CXR) being a key diagnostic tool due to their accessibility and affordability. Nonetheless, the detection of pulmonary lesions is often hindered by overlapping bone structures in CXR images, leading to potential misdiagnoses. To address this issue, we developed an end-to-end framework called BS-LDM, designed to effectively suppress bone in high-resolution CXR images. This framework is based on conditional latent diffusion models and incorporates a multi-level hybrid loss-constrained vector-quantized generative adversarial network which is crafted for perceptual compression, ensuring the preservation of details. To further enhance the framework's performance, we introduce offset noise and a temporal adaptive thresholding strategy. These additions help minimize discrepancies in generating low-frequency information, thereby improving the clarity of the generated soft tissue images. Additionally, we have compiled a high-quality bone suppression dataset named SZCH-X-Rays. This dataset includes 818 pairs of high-resolution CXR and dual-energy subtraction soft tissue images collected from a partner hospital. Moreover, we processed 241 data pairs from the JSRT dataset into negative images, which are more commonly used in clinical practice. Our comprehensive experimental and clinical evaluations reveal that BS-LDM excels in bone suppression, underscoring its significant clinical value.

MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report

In this paper, we introduce a novel Multi-Modal Contrastive Pre-training Framework that synergistically combines X-rays, electrocardiograms (ECGs), and radiology/cardiology reports. Our approach leverages transformers to encode these diverse modalities into a unified representation space, aiming to enhance diagnostic accuracy and facilitate comprehensive patient assessments. We utilize LoRA-Peft to significantly reduce trainable parameters in the LLM and incorporate a recent linear attention dropping strategy in the Vision Transformer (ViT) for smoother attention. Furthermore, we provide novel multimodal attention explanations and retrieval for our model. To the best of our knowledge, we are the first to propose an integrated model that combines X-ray, ECG, and Radiology/Cardiology Report with this approach. By utilizing contrastive loss, MoRE effectively aligns modality-specific features into a coherent embedding, which supports various downstream tasks such as zero-shot classification and multimodal retrieval. Employing our proposed methodology, we achieve state-of-the-art (SOTA) on the Mimic-IV, CheXpert, Edema Severity, and PtbXl downstream datasets, surpassing existing multimodal approaches. Our proposed framework shows significant improvements in capturing intricate inter-modal relationships and in robustness for medical diagnosis, establishing a framework for future research in multimodal learning in the healthcare sector.

Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation

This paper investigates the use of deep learning approaches to estimate the femur caput-collum-diaphyseal (CCD) angle from X-ray images. The CCD angle is an important measurement in the diagnosis of hip problems, and correct prediction can help in the planning of surgical procedures. Manual measurement of this angle, on the other hand, can be time-intensive and vulnerable to inter-observer variability. In this paper, we present a deep-learning algorithm that can reliably estimate the femur CCD angle from X-ray images. To train and test the performance of our model, we employed an X-ray image dataset with associated femur CCD angle measurements. Furthermore, we built a prototype to display the resulting predictions and to allow the user to interact with the predictions. As this is happening in a sterile setting during surgery, we expanded our interface to the possibility of being used only by voice commands. Our results show that our deep learning model predicts the femur CCD angle on X-ray images with great accuracy, with a mean absolute error of 4.3 degrees on the left femur and 4.9 degrees on the right femur on the test dataset. Our results suggest that deep learning has the potential to give a more efficient and accurate technique for predicting the femur CCD angle, which might have substantial therapeutic implications for the diagnosis and management of hip problems.
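The geometry behind the measurement can be sketched as the angle between the femoral neck axis and the femoral shaft axis, together with the mean absolute error used for evaluation. The abstract does not say how the axes are derived from the segmentation, so the three landmark points below are hypothetical inputs and the function names are illustrative.

```python
import math


def angle_between(u, v):
    """Unsigned angle in degrees between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))


def ccd_angle(head_center, neck_base, shaft_distal):
    """Angle between the neck axis and the shaft axis.

    Assumption: both axes start at `neck_base`, the point where the neck
    meets the shaft; the neck axis runs to the femoral head center and
    the shaft axis runs to a distal point on the shaft. Points are (x, y)
    image coordinates with y increasing downward.
    """
    neck = (head_center[0] - neck_base[0], head_center[1] - neck_base[1])
    shaft = (shaft_distal[0] - neck_base[0], shaft_distal[1] - neck_base[1])
    return angle_between(neck, shaft)


def mean_absolute_error(predicted, measured):
    """Average absolute difference between predicted and reference angles."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


# Hypothetical landmarks in pixel coordinates, for illustration only.
angle = ccd_angle(head_center=(60, 60), neck_base=(100, 100),
                  shaft_distal=(100, 300))
print(f"CCD angle: {angle:.1f} degrees")
print(mean_absolute_error([130.0, 125.0], [126.0, 130.0]))
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` shows for nearly parallel axes.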

Vision-Language Generative Model for View-Specific Chest X-ray Generation

Synthetic medical data generation has opened up new possibilities in the healthcare domain, offering a powerful tool for simulating clinical scenarios, enhancing diagnostic and treatment quality, gaining granular medical knowledge, and accelerating the development of unbiased algorithms. In this context, we present a novel approach called ViewXGen, designed to overcome the limitations of existing methods that rely on general domain pipelines using only radiology reports to generate frontal-view chest X-rays. Our approach takes into consideration the diverse view positions found in the dataset, enabling the generation of chest X-rays with specific views, which marks a significant advancement in the field. To achieve this, we introduce a set of specially designed tokens for each view position, tailoring the generation process to the user's preferences. Furthermore, we leverage multi-view chest X-rays as input, incorporating valuable information from different views within the same study. This integration rectifies potential errors and contributes to faithfully capturing abnormal findings in chest X-ray generation. To validate the effectiveness of our approach, we conducted statistical analyses, evaluating its performance in a clinical efficacy metric on the MIMIC-CXR dataset. Also, human evaluation demonstrates the remarkable capabilities of ViewXGen, particularly in producing realistic view-specific X-rays that closely resemble the original images.

Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark

Automatic security inspection using computer vision technology is a challenging task in real-world scenarios due to various factors, including intra-class variance, class imbalance, and occlusion. Owing to the lack of large-scale datasets, most previous methods rarely address cases in which prohibited items are deliberately hidden among cluttered objects, which restricts their application in real-world scenarios. Towards real-world prohibited item detection, we collect a large-scale dataset, named PIDray, which covers various cases in real-world scenarios for prohibited item detection, especially for deliberately hidden items. With an intensive amount of effort, our dataset contains 12 categories of prohibited items in 47,677 X-ray images with high-quality annotated segmentation masks and bounding boxes. To the best of our knowledge, it is the largest prohibited items detection dataset to date. Meanwhile, we design the selective dense attention network (SDANet) to construct a strong baseline, which consists of the dense attention module and the dependency refinement module. The dense attention module, formed by spatial and channel-wise dense attentions, is designed to learn discriminative features to boost performance. The dependency refinement module is used to exploit the dependencies of multi-scale features. Extensive experiments conducted on the collected PIDray dataset demonstrate that the proposed method performs favorably against the state-of-the-art methods, especially for detecting the deliberately hidden items.

Exploration of Interpretability Techniques for Deep COVID-19 Classification using Chest X-ray Images

The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging, such as X-ray and Computed Tomography (CT), combined with the potential of Artificial Intelligence (AI), plays an essential role in supporting medical personnel in the diagnosis process. Thus, in this article, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2 and DenseNet161) and their ensemble using majority voting were used to classify COVID-19, pneumoniae and healthy subjects using chest X-ray images. Multilabel classification was performed to predict multiple pathologies for each patient, if present. Firstly, the interpretability of each of the networks was thoroughly studied using local interpretability methods (occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT) and using a global technique (neuron activation profiles). The mean Micro-F1 score of the models for COVID-19 classifications ranges from 0.66 to 0.875, and is 0.89 for the ensemble of the network models. The qualitative results showed that the ResNets were the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before making a decision regarding the best performing model.
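Majority voting over multilabel predictions and the Micro-F1 score mentioned above can be sketched as follows. The per-label voting rule (a label is set when more than half of the models predict it) is an assumption about how the ensemble was formed, and the predictions are invented for illustration.

```python
def majority_vote(model_predictions):
    """Combine binary multilabel predictions from several models.

    `model_predictions` holds one list of 0/1 labels per model. A label is
    set in the ensemble output when more than half of the models set it.
    """
    n_models = len(model_predictions)
    return [1 if 2 * sum(votes) > n_models else 0
            for votes in zip(*model_predictions)]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all samples and labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += 1 if t == 1 and p == 1 else 0
            fp += 1 if t == 0 and p == 1 else 0
            fn += 1 if t == 1 and p == 0 else 0
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical outputs of five models for one image; the label order
# (COVID-19, pneumonia, healthy) is assumed for illustration.
votes = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
ensemble = majority_vote(votes)
print(ensemble)

score = micro_f1(y_true=[[1, 0, 1], [0, 1, 0]],
                 y_pred=[[1, 0, 0], [0, 1, 1]])
print(f"{score:.4f}")
```

An odd number of models, as in the five-model ensemble here, guarantees that no label ends in a tie.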

Diffusion-Based Hierarchical Multi-Label Object Detection to Analyze Panoramic Dental X-rays

Due to the necessity for precise treatment planning, the use of panoramic X-rays to identify different dental diseases has tremendously increased. Although numerous ML models have been developed for the interpretation of panoramic X-rays, there has not been an end-to-end model developed that can identify problematic teeth with dental enumeration and associated diagnoses at the same time. To develop such a model, we structure the three distinct types of annotated data hierarchically following the FDI system, the first labeled with only quadrant, the second labeled with quadrant-enumeration, and the third fully labeled with quadrant-enumeration-diagnosis. To learn from all three hierarchies jointly, we introduce a novel diffusion-based hierarchical multi-label object detection framework by adapting a diffusion-based method that formulates object detection as a denoising diffusion process from noisy boxes to object boxes. Specifically, to take advantage of the hierarchically annotated data, our method utilizes a novel noisy box manipulation technique by adapting the denoising process in the diffusion network with the inference from the previously trained model in hierarchical order. We also utilize a multi-label object detection method to learn efficiently from partial annotations and to give all the needed information about each abnormal tooth for treatment planning. Experimental results show that our method significantly outperforms state-of-the-art object detection methods, including RetinaNet, Faster R-CNN, DETR, and DiffusionDet for the analysis of panoramic X-rays, demonstrating the great potential of our method for hierarchically and partially annotated datasets. The code and the data are available at: https://github.com/ibrahimethemhamamci/HierarchicalDet.