current status - Global News One

Summary

Machine learning and deep learning, integral components of artificial intelligence, involve the process of training computers to learn and make informed decisions based on diverse datasets. Notably, recent strides in artificial intelligence predominantly emanate from the realm of deep learning, proving transformative across various domains, ranging from computer vision to health sciences. The impact of deep learning on medical practices has been particularly groundbreaking, reshaping traditional approaches to clinical applications. While certain medical subfields, such as pediatrics, initially lagged in harnessing the substantial advantages of deep learning, there is a noteworthy accumulation of related research in pediatric applications.

This paper undertakes a comprehensive review of newly developed machine learning and deep learning solutions tailored for neonatology applications, conducted in accordance with PRISMA 2020 guidelines. The assessment systematically explores the roles played by both classical machine learning and deep learning in neonatology, elucidating methodologies, including algorithmic advancements. Furthermore, it delineates the persisting challenges in evaluating neonatal diseases. The predominant areas of focus within neonatology’s AI applications encompass survival analysis, neuroimaging, analysis of vital parameters and biosignals, and the diagnosis of retinopathy of prematurity.

Drawing on 106 research articles published between 1996 and 2022, this systematic review categorizes the studies and discusses their respective merits and drawbacks. The overarching goal is to provide a comprehensive overview, explore potential directions for novel AI models, and envision the future landscape of neonatology as AI continues to assert its influence. The review concludes by proposing roadmaps for the integration of AI into neonatal intensive care units, with potentially transformative implications for the field.

Introduction

The ongoing surge of artificial intelligence (AI) is reshaping diverse sectors, healthcare included, in an ever-evolving landscape. The dynamic nature of AI applications makes it challenging to keep pace with the constant innovations. Despite the profound impact of AI on daily life, many healthcare practitioners, especially in less-explored domains like neonatology, may not fully grasp the extent of AI integration into contemporary healthcare systems. This review aims to bridge this awareness gap, specifically targeting physicians navigating the intricate intersection of AI and neonatology.

The roots of AI, particularly within machine learning (ML), can be traced back to the 1950s, when Alan Turing conceptualized the “learning machine” alongside early military applications of rudimentary AI. During this era, computers were colossal, and the costs associated with expanding storage were exorbitant, limiting their capabilities. Over the ensuing decades, incremental advancements in both theoretical frameworks and technological infrastructure progressively enhanced the potency and adaptability of ML.

Understanding the mechanics of ML and its subset, deep learning (DL), is fundamental. ML, as a subset of AI, garnered attention due to its adeptness in handling data. ML algorithms and models possess the ability to learn from data, scrutinize, assess, and formulate predictions or decisions based on acquired insights. DL, a specialized form of ML, draws inspiration from the human brain’s neural networks, replicating their functionality through artificial neurons in computer neural networks. The distinctive feature of DL lies in its hierarchical architecture, facilitating the autonomous extraction of features from data, a departure from the reliance on human-engineered features in conventional ML.

Distinguishing ML from DL hinges on the complexity of models and the scale of datasets they can manage. ML algorithms prove effective across a spectrum of tasks, exhibiting simplicity in training and deployment. Conversely, DL algorithms necessitate larger datasets and intricate models but excel in tasks involving high-dimensional, intricate data, automatically identifying significant aspects without predefined elements of interest. The non-linear activation functions within DL architectures, such as artificial neural networks (ANN), contribute to the learning of complex features representative of provided data samples.

ML and DL fall into categories such as supervised, unsupervised, or reinforcement learning based on the nature of the input-output relationship. Their applications span classification, regression, and clustering tasks. DL’s success is contingent on large-scale data availability, innovative optimization algorithms, and the accessibility of Graphics Processing Units (GPUs). These autonomous learning algorithms, mirroring human learning processes, position DL as a pioneering ML method, catalyzing substantial transformations across medical and technological domains, marking it as the driving force propelling contemporary AI advancements.

Figure caption: Hierarchical representation of AI. Machine learning (ML) is a subset of AI, and deep learning (DL) is, in turn, a subset of ML.

Figure caption: Persistent challenges in applying AI to healthcare. Key concerns affecting AI outcomes in neonatology include challenges in clinical interpretability, knowledge gaps in decision-making mechanisms that necessitate human-in-the-loop systems, ethical considerations, limitations in data and annotations, and the absence of secure Cloud systems for data sharing and privacy.

In the field of deep learning (DL) applied to medical imaging, three primary problem categories exist: image segmentation, object detection (identifying organs or other anatomical/pathological entities), and image classification (e.g., diagnosis, prognosis, therapy response assessment)[^3^]. Numerous DL algorithms are commonly utilized in medical research, falling into specific algorithmic families:

Convolutional Neural Networks (CNNs): Mainly applied in computer vision and signal processing tasks, CNNs excel in tasks involving fixed spatial relationships, such as imaging data. The architecture comprises phases (layers) that learn hierarchical features. Initial phases extract local features (corners, edges, lines), while subsequent phases extract more global features. Feature propagation involves nonlinearities and regularizations, enriching feature representation. Pooling operations reduce feature size, and the resulting features are employed for predictions (segmentation, detection, or classification)[^3,16^].
Recurrent Neural Networks (RNNs): Tailored for sequential data such as text, speech, and time-series data (e.g., clinical or electronic health records). RNNs capture temporal relationships, aiding in predicting disease progression or treatment outcomes[^11,17,18^]. Long Short-Term Memory (LSTM) models, a subtype of RNNs, address the limitations of standard RNNs by learning long-term dependencies more effectively. LSTMs use a gated memory cell to store information, allowing them to learn complex patterns, which is particularly useful in audio classification[^17,19^].
Generative Adversarial Networks (GANs): A DL model class used for generating new data resembling existing datasets. In healthcare, GANs create synthetic medical images using two CNNs (generator and discriminator). The generator produces synthetic images mimicking real ones, while the discriminator tries to identify the artificially generated images. Adversarial training pushes the generator toward realistic data. GANs are versatile, employed for tasks like image enhancement, signal reconstruction, classification, and segmentation[^20,21,22^].
Transfer Learning (TL): Rooted in cognitive science, TL minimizes annotation needs by transferring knowledge from pretrained models. However, using ImageNet pre-trained models for medical image classification can be inefficient due to differences between natural images and medical images[^25,26^]. Fine-tuning more layers in CNNs may improve accuracy, as the initial layers of ImageNet-pretrained networks may not efficiently detect low-level characteristics in medical images[^25,26^].
Advanced DL Algorithms: The field evolves rapidly, and newer methods such as Capsule Networks, Attention Mechanisms, and Graph Neural Networks (GNNs)[^27,28,29,30^] are employed for imaging and non-imaging data analysis. Capsule Networks address CNN shortcomings, Attention Mechanisms enhance context understanding, and GNNs exhibit potential in both imaging and non-imaging data analysis[^28,34^].
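
The CNN pipeline described in the list above (convolution, nonlinearity, pooling) can be sketched in a few lines of plain Python. This is only a minimal illustration, not code from any of the reviewed studies: the function names `conv2d_valid`, `relu`, and `max_pool2x2`, the toy 5×5 image, and the hand-picked edge kernel are all assumptions made for the example. In a real CNN the kernel weights are learned from data rather than fixed by hand.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Nonlinearity: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Keep the largest value in each non-overlapping 2x2 window.

    A trailing odd row or column is dropped.
    """
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical toy image: dark on the left (0), bright on the right (1).
IMAGE: Matrix = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked kernel that responds to dark-to-bright vertical edges
# (an assumption for illustration; real CNN kernels are learned).
VERTICAL_EDGE: Matrix = [[-1.0, 1.0], [-1.0, 1.0]]

# Local feature extraction, then nonlinearity, then size reduction.
features = max_pool2x2(relu(conv2d_valid(IMAGE, VERTICAL_EDGE)))
print(features)
```

The pooled map is smaller than the input but still marks where the edge sits, which is the sense in which early layers extract local features and pooling reduces feature size.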

In the imaging field, both data-driven and physics-driven systems are essential. While DL methods show effectiveness, challenges persist, particularly in MRI reconstruction, where data-driven and physics-driven algorithms are both employed because acquiring fully sampled datasets is impractical.

The concept of Hybrid Intelligence, combining AI with human intellect, offers a promising collaborative approach. AI processes extensive data rapidly, while human expertise contributes context and intuition, leading to more precise decision-making. Hybrid intelligence systems hold promise for time-consuming tasks in healthcare and neonatology.

In the current landscape, AI in medicine has been in use for over a decade, with advancements in algorithms and hardware technologies contributing to its potential. Challenges remain as healthcare utilization increases, particularly in integrating AI into daily clinical practice.

Clinicians seek enhanced diagnostic tools from AI, anticipating reduced invasive tests and increased diagnostic accuracy. This systematic review aims to explore AI’s potential role in neonatology, envisioning a future where hybrid intelligence optimizes neonatal care. The study outlines AI applications, evaluation metrics, and challenges in neonatology, providing a comprehensive overview. The paper’s objectives encompass thorough explanations of AI models, categorization of neonatology-related applications, and discussions of challenges and future research directions.

To elucidate a comprehensive understanding, the objectives are:

Provide a detailed explanation of various AI models and thoroughly articulate evaluation metrics, elucidating the key features inherent in these models.
Categorize AI applications pertinent to neonatology into overarching macro-domains, expounding on their respective sub-domains and highlighting crucial aspects of the applicable AI models.
Scrutinize the contemporary landscape of studies, especially those in recent years, with a specific focus on the widespread utilization of machine learning (ML) across the entire spectrum of neonatology.
Furnish an exhaustive and well-structured overview, including a classification of Deep Learning (DL) applications deployed within the realm of neonatology.
Assess and engage in a discourse concerning prevailing challenges linked to AI implementation in neonatology, along with contemplating future avenues for research. This aims to provide clinicians with a comprehensive perspective on the current scenario.

Presented here is an outline of our paper’s structure and goals:
Elaborating on AI Models and Evaluation Metrics
Assessing Studies Applying ML in Neonatology
Assessing Studies Applying DL in Neonatology
Scrutinizing Challenges and Mapping Future Directions.

AI is a comprehensive concept that involves the application of computational algorithms capable of categorizing, predicting, or deriving valuable conclusions from vast datasets. Over the past three decades, various algorithms, including Naive Bayes, Genetic Algorithms, Fuzzy Logic, Clustering, Neural Networks (NN), Support Vector Machines (SVM), Decision Trees, and Random Forests (RF), have been utilized for tasks such as detection, diagnosis, classification, and risk assessment in the field of medicine. Traditional machine learning (ML) methods often incorporate hand-engineered features, which consist of visual descriptions and annotations learned from radiologists and are encoded into algorithms for image classification.

Medical data encompasses diverse unstructured sources such as images, signals, genetic expressions, electronic health records (EHR), and vital signs (refer to Fig. 3). The intricate structures of these data types allow deep learning (DL) frameworks to leverage their heterogeneity, achieving high levels of abstraction in data analysis.

Diverse medical information, encompassing unstructured data like medical images, vital signals, genetic expressions, electronic health records (EHRs), and signal data, contributes to a broad spectrum of healthcare insights. Effectively analyzing and interpreting various data streams in neonatology demands a comprehensive strategy, given the distinctive characteristics and complexities inherent in each type of data.

While machine learning (ML) necessitates manual selection and crafted transformation of information from incoming data, deep learning (DL) executes these tasks more efficiently and with heightened efficacy. DL achieves this by autonomously uncovering components through the analysis of a large number of samples, a process that is highly automated. The literature extensively covers ML approaches predating the advent of DL.

For clinicians, understanding how the recommended ML model enhances patient care is crucial. Given that a single metric cannot encapsulate all desirable attributes, it is customary to describe a model’s performance using various metrics. Unfortunately, end-users often struggle to comprehend these measurements, making it challenging to objectively compare models across different research endeavors. Currently, there is no available method or tool to compare models based on the same performance measures. This section elucidates common ML and DL evaluation metrics to empower neonatologists in adapting them to their research and understanding upcoming articles and research design.

The widespread application of artificial intelligence (AI) spans daily life to high-risk medical scenarios. In neonatology, the incorporation of AI has been a gradual process, with numerous studies emerging in the literature. These studies leverage various imaging modalities, electronic health records, and ML algorithms, some of which are still in the early stages of integration into clinical workflows. Despite the absence of specific systematic reviews and future discussions in this field, several studies have focused on introducing AI systems to neonatology. However, their success has been limited. Recent advancements in DL have, however, shifted research in this field in a more promising direction. Evaluation metrics in these studies commonly include standard measures such as sensitivity (true-positive rate), specificity (true-negative rate), false-positive rate, false-negative rate, receiver operating characteristics (ROC), area under the ROC curves (AUC), and accuracy (Table 1).

Term	Definition
True Positive (TP)	The number of positive samples correctly identified.
True Negative (TN)	The number of samples accurately identified as negative.
False Positive (FP)	The number of samples incorrectly identified as positive.
False Negative (FN)	The number of samples incorrectly identified as negative.
Accuracy (ACC)	The proportion of correctly identified samples to the total sample count in the assessment dataset. Accuracy lies in the range [0, 1], where 1 means every positive and negative sample is predicted correctly and 0 means none are.
Recall (REC)	Also known as sensitivity or True Positive Rate (TPR): the proportion of correctly classified positive samples to all samples that truly belong to the positive class, TP / (TP + FN).
Specificity (SPEC)	The negative-class counterpart of recall (sensitivity): the proportion of correctly classified negative samples to all samples that truly belong to the negative class, TN / (TN + FP).
Precision (PREC)	The ratio of samples correctly assigned to a class to all samples assigned to that class.
Positive Predictive Value (PPV)	The proportion of correctly classified positive samples to all samples classified as positive, TP / (TP + FP); equivalent to precision for the positive class.
Negative Predictive Value (NPV)	The ratio of samples accurately identified as negative to all samples classified as negative, TN / (TN + FN).
F1 score (F1)	The harmonic mean of precision and recall, which penalizes an extreme value of either.
Cross Validation	A validation technique often employed during the training phase of modeling, in which the data are split into validation folds that do not overlap.
AUROC (Area under ROC curve – AUC)	The area under the ROC curve, summarizing the trade-off between sensitivity (true-positive rate) and false-positive rate across decision thresholds. Limited to the range [0, 1], where 1 represents ranking every positive case above every negative case and 0.5 corresponds to chance-level discrimination.
ROC	A curve showing how sensitivity varies against the false-positive rate (1 − specificity) across decision thresholds, illustrating the performance of a particular predictive algorithm.
Overfitting	Modeling failure in which the model fits the training data too closely and performs poorly on test data.
Underfitting	Modeling failure in which inadequate training leads to inadequate test performance.
Dice Similarity Coefficient	Used for image analysis, typically to score segmentations. Limited to the range [0, 1], where 1 represents perfect overlap between the predicted and reference segmentation and 0 represents no overlap.
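
The definitions in Table 1 can be made concrete with a short sketch. The code below is an illustrative assumption, not the evaluation code of any reviewed study: the function names (`confusion_metrics`, `dice`, `auc`, `k_fold_indices`) and the example counts are hypothetical, and the functions assume non-empty denominators (for example, at least one positive and one negative sample).

```python
from typing import Dict, List, Sequence, Tuple


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Metrics derived from the four confusion-matrix counts."""
    precision = tp / (tp + fp)          # same as PPV for the positive class
    recall = tp / (tp + fn)             # sensitivity / true-positive rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),  # true-negative rate
        "precision": precision,
        "npv": tn / (tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
    }


def dice(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Dice overlap of two binary masks given as flat 0/1 sequences."""
    overlap = sum(p and r for p, r in zip(pred, ref))
    return 2 * overlap / (sum(pred) + sum(ref))


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, validation) pairs.

    Validation folds do not overlap; k == n gives leave-one-out.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for j in range(n) if j not in fold], fold)
        for fold in folds
    ]


# Hypothetical counts: 50 truly positive and 50 truly negative samples.
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting several of these values together, rather than accuracy alone, matters most when classes are imbalanced, as is common for rare neonatal conditions.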

Results

This systematic review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol[^56^]. The search process was concluded on July 11, 2022. Initially, a substantial number of articles (approximately 9000) were identified, and a systematic approach was employed to screen and select pertinent articles based on their alignment with the research focus, study design, and relevance to the topic. After reviewing article abstracts, 987 studies were identified. Ultimately, our search resulted in the inclusion of 106 research articles spanning the years 1996 to 2022 (Fig. 4). The risk of bias was assessed using the QUADAS-2 tool.

The preliminary research, conducted on July 11, 2022, identified a total of 9000 articles, out of which 987 article abstracts underwent screening. Following this screening process, 106 research articles published between 1996 and 2022 met the criteria for inclusion in this systematic review. For a more detailed depiction of the study selection process, refer to the PRISMA flow diagram.

The QUADAS-2 tool was utilized for a comprehensive analysis of bias risk.

Our findings are summarized in two sets of tables: Tables 2–5 outline AI techniques from the pre-deep learning era (“Pre-DL Era”) in neonatal intensive care units, categorized by the type of data and applications. Tables 6 and 7 cover studies from the DL Era, focusing on applications such as classification (prediction and diagnosis), detection (localization), and segmentation (pixel-level classification in medical images).

Study	Approach	Purpose	Dataset	Type of Data	Performance	Pros(+)	Cons(-)
Hoshino et al., 2017[^194^]	CLAFIC, logistic regression analysis	To determine optimal color parameters predicting Biliary atresia (BA)	Stools	50 neonates	30 BA and 34 non-BA images	100% (AUC)	+ Effective and convenient modality for early detection of BA, and potentially for other related diseases
Dong et al., 2021[^195^]	Level Set algorithm	To evaluate postoperative enteral nutrition of neonatal high intestinal obstruction and analyze clinical treatment effect	60 neonates	CT images	84.7% (accuracy)	+ Segmentation algorithm can accurately segment the CT image, displaying the disease location and its contour more clearly.	– EHR (not included AI analysis) – Small sample size – Retrospective design
Ball et al., 2015[^90^]	Random Forest (RF)	To compare whole-brain functional connectivity in preterm newborns with healthy term-born neonates	105 preterm infants and 26 term controls	Resting state functional MRI and T2-weighted Brain MRI	80% (accuracy)	+ Prospective + Connectivity differences between term and preterm brain	– Not well-established model
Smyser et al., 2016[^88^]	Support vector machine (SVM)-multivariate pattern analysis (MVPA)	To compare resting state-activity of preterm-born infants to term infants	50 preterm infants and 50 term-born control infants	Functional MRI data + Clinical variables	84% (accuracy)	+ Prospective GA at birth used as an indicator of the degree of disruption of brain development + Optimal methods for rs-fMRI data acquisition and preprocessing not rigorously defined	– Small sample size
Zimmer et al., 2017[^93^]	NAF: Neighborhood approximation forest classifier of forests	To reduce the complexity of heterogeneous data population, manifold learning techniques are applied	111 infants (NC, 70 subjects), affected by IUGR (27 subjects) or VM (14 subjects)	3 T brain MRI	80% (accuracy)	+ Combining multiple distances related to the condition improves overall characterization and classification of the three clinical groups (Normal, IUGR, Ventriculomegaly)	– Lack of neonatal data due to challenges during acquisition and data accessibility – Small sample size
Krishnan et al., 2017[^100^]	Unsupervised machine learning: Sparse Reduced Rank Regression (sRRR)	Variability in the Peroxisome Proliferator Activated Receptor (PPAR) pathway related to brain development	272 infants born at less than 33 wk gestational age (GA)	Diffusion MR Imaging + Diffusion Tractography + Genome-wide Genotyping	63% (AUC)	+ Inhibited brain development controlled by genetic variables, and PPARG signaling plays a previously unknown cerebral function	– Further work required to characterize the exact relationship between PPARG and preterm brain development
Chiarelli et al., 2019[^91^]	Multivariate statistical analysis	To better understand the effect of prematurity on brain structure and function	88 newborns	3 Tesla BOLD and anatomical brain MRI + Few clinical variables	– Multivariate analysis using motion information could not significantly infer GA at birth + Prematurity associated with bidirectional alterations of functional connectivity and regional volume	– Retrospective design – Small sample size
Song et al., 2017[^94^]	Fuzzy nonlinear support vector machines (SVM)	Neonatal brain tissue segmentation in clinical magnetic resonance (MR) images	10 term neonates	Brain MRI T1 and T2 weighted	70%–80% (dice score-gray matter) 65%–80% (dice score-white matter)	+ Nonparametric modeling adapts to spatial variability in intensity statistics arising from variations in brain structure and image inhomogeneity + Produces reasonable segmentations even in the absence of atlas prior	– Small sample size
Taylor et al., 2017[^137^]	Machine Learning	Technology that uses a smartphone application for effectively screening newborns for jaundice	530 newborns	Paired BiliCam images + Total serum bilirubin (TSB) levels	High-risk zone TSB level was 95% for BiliCam and 92% for TcB (P = 0.30); for identifying newborns with a TSB level of ≥17.0, AUCs were 99% and 95%, respectively (P =0.09).	+ Inexpensive technology that uses commodity smartphones for effective jaundice screening + Multicenter data + Prospective design	– Method and algorithm name not explained
Ataer-Cansizoglu et al., 2015[^134^]	Gaussian Mixture Models	i-ROP To develop a novel computer-based image analysis system for grading plus diseases in ROP	77 wide-angle retinal images	95% (accuracy)	+ Arterial and venous tortuosity (combined), and a large circular cropped image provided the highest diagnostic accuracy + Comparable to the performance of individual experts	– Used manually segmented images with a tracing algorithm to avoid possible noise and bias – Low clinical applicability
Rani et al., 2016[^133^]	Back Propagation Neural Networks	To classify ROP	64 RGB images of these stages taken by RetCam with 120 degrees field of view and size of 640 × 480 pixels	90.6% (accuracy)	– No clinical information – Requires better segmentation – Clinical adaptation
Karayiannis et al., 2006[^101^]	Artificial Neural Networks (ANN)	To develop a seizure-detection system	54 patients + 240 video segments	Each of the training and testing sets contained 120 video segments (40 segments of myoclonic seizures, 40 segments of focal clonic seizures, and 40 segments of random movements)	96.8% (sensitivity) 97.8% (specificity)	+ Video analysis	– Not capable of detecting neonatal seizures with subtle clinical manifestations (subclinical seizures) or neonatal seizures with no clinical manifestations (electrical-only seizures) – No EEG analysis – Small sample size – No additional clinical information

[^194^]: Hoshino et al., 2017
[^195^]: Dong et al., 2021
[^90^]: Ball et al., 2015
[^88^]: Smyser et al., 2016
[^93^]: Zimmer et al., 2017
[^100^]: Krishnan et al., 2017
[^91^]: Chiarelli et al., 2019
[^94^]: Song et al., 2017
[^137^]: Taylor et al., 2017
[^134^]: Ataer-Cansizoglu et al., 2015
[^133^]: Rani et al., 2016
[^101^]: Karayiannis et al., 2006

Study	Approach	Purpose	Dataset	Type of data	Performance	Pros(+)	Cons(-)
Reed et al., 1996[^135^]	Recognition-based reasoning	Diagnosis of congenital heart defects	53 patients	Patient history, physical exam, blood tests, cardiac auscultation, X-ray, and EKG data	+ Useful in multiple defects	– Small sample size, Not real AI implementation
Aucouturier et al., 2011[^148^]	Hidden Markov model architecture (SVM, GMM)	Identify expiratory and inspiration phases from the audio recording of human baby cries	14 infants, spanning four vocalization contexts in their first 12 months	Voice record	86%-95% (accuracy)	+ Quantify expiration duration, crying rate, and other time-related characteristics for screening, diagnosis, and research	– More data needed, No clinical explanation, Small sample size, Required preprocessing
Cano Ortiz et al., 2004[^149^]	Artificial neural networks (ANN)	Detect CNS diseases in infant cry	35 neonates, nineteen healthy cases and sixteen sick neonates	Voice record (187 patterns)	85% (accuracy)	+ Preliminary result	– More data needed for correct classification
Hsu et al., 2010[^151^]	Support Vector Machine (SVM) Service-Oriented Architecture (SOA)	Diagnose Methylmalonic Acidemia (MMA)	360 newborn samples	Metabolic substances data collected from tandem mass spectrometry (MS/MS)	96.8% (accuracy)	+ Better sensitivity than classical screening methods	– Small sample size, SVM pilot stage education not integrated
Baumgartner et al., 2004[^152^]	Logistic regression analysis (LRA), Support vector machines (SVM), Artificial neural networks (ANN), Decision trees (DT), k-nearest neighbor classifier (k-NN)	Focus on phenylketonuria (PKU), medium chain acyl-CoA dehydrogenase deficiency (MCADD)	Bavarian newborn screening program all newborns	Metabolic substances data collected from tandem mass spectrometry (MS/MS)	99.5% (accuracy)	+ ML techniques delivered high predictive power	– Lacking direct interpretation of knowledge representation
Chen et al., 2013[^153^]	Support vector machine (SVM)	Diagnose phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency	347,312 infants (220 metabolic disease suspect)	Newborn dried blood samples	99.9% (accuracy) for each condition	+ Reduced false positive cases	– Feature selection strategies did not include total features
Temko et al., 2011[^105^]	Support Vector Machine (SVM) classifier leave-one-out (LOO) cross-validation method	Measure system performance for neonatal seizure detection using EEG	17 newborns	267 hours clinical dataset	89% (AUC)	+ SVM-based system assists clinical staff in interpreting EEG	– No clinical variable, Difficult to obtain large datasets
Temko et al., 2012[^104^]	SVM	Use recent advances in clinical understanding of seizure burden in neonates with hypoxic ischemic encephalopathy to improve automated detection	17 HIE patients	816.7 hours EEG recordings	96.7% (AUC)	+ Improved seizure detection	– Small sample size, No clinical information
Temko et al., 2013[^115^]	Support Vector Machine (SVM) classifier leave-one-out (LOO) cross-validation method	Validate robustness of Temko 2011[^105^]	Trained in 38 term neonates, Tested in 51 neonates	Trained in 479 hours EEG recording, Tested in 2540 hours	96.1% (AUC), Correct detection of seizure burden 70%	– Small sample size, No clinical information
Stevenson et al., 2013[^116^]	Multiclass linear classifier	Automatically grade one-hour EEG epoch	54 full term neonates	One-hour-long EEG recordings	77.8% (accuracy)	+ Involvement of clinical expert, Method explained in detail	– Retrospective design
Ahmed et al., 2016[^114^]	Gaussian mixture model, Universal Background Model (UBM), SVM	Grade hypoxic–ischemic encephalopathy (HIE) severity using EEG	54 full term neonates	One-hour-long EEG recordings	87% (accuracy)	+ Significant assistance to healthcare professionals	– Retrospective design
Mathieson et al., 2016[^103^]	Robust Support Vector Machine (SVM) classifier leave-one-out (LOO) cross-validation method[^115^]	Validate Temko 2013[^115^]	70 babies from 2 centers (35 Seizure, 35 Non-Seizure)	Seizure detection algorithm thresholds clinically acceptable range	Detection rates 52.5%–75%	+ Clinical information and Cohen score added	– Retrospective design
Mathieson et al., 2016[^198^]	Support Vector Machine (SVM) classifier leave-one-out (LOO) cross-validation method[^105^]	Analyze Seizure detection Algorithm and characterize false negative seizures	20 babies (10 seizure -10 non-seizure) (20 of 70 babies)[^103^]	Seizure detections evaluated sensitivity threshold	+ Clinical information and Cohen score added	– Retrospective design
Yassin et al., 2017[^150^]	Locally linear embedding (LLE)	Explore autoencoders for diagnosing infant asphyxia from infant cry	600 segmented signals (284 normal cries, 316 asphyxiated cries)	100% (accuracy)	+ 600 MFCC features distinguish normal and asphyxiated newborns	– No clinical information
Li et al., 2011[^136^]	Fuzzy backpropagation neural networks	Establish early diagnostic system for hypoxic ischemic encephalopathy (HIE)	140 cases (90 patients, 50 controls)	Medical records of newborns with HIE	100% correct recognition rate for training samples, 95% for test samples	+ High accuracy in early diagnosis of HIE	– Small sample size
Zernikow et al., 1998[^84^]	ANN	Detect early occurrence of severe IVH in individual patient	890 preterm neonates (50%, 50%)	Validation and training EHR	93.5% (AUC)	+ Observational study	– No image, Skipped variables during ANN training
Ferreira et al., 2012[^138^]	Decision trees and neural networks	Identify neonatal jaundice	227 healthy newborns	70 variables analyzed	89% (accuracy), 84% (AUC)	+ Predicting subsequent hyperbilirubinemia with high accuracy	– Not all factors contributing to hyperbilirubinemia included
Porcelli et al., 2010[^228^]	Artificial neural network (ANN)	Compare accuracy of birth weight–based weight curves with


Study	Approach	Purpose	Dataset	Type of Data (Image/Non-Image)	Performance	Pros(+)	Cons(-)
Hauptmann et al., 2019187	3D (2D plus time) CNN architecture	Reconstruction of highly accelerated radial real-time data in patients with congenital heart disease	250 CHD patients. Cardiovascular MRI with cine images	Image	+Potential use of a CNN for reconstructing real-time radial data	–	–
Lei et al., 2022158	MobileNet-V2 CNN	Detect PDA with AI	300 patients 461 echocardiograms	Image	88% (AUC)	+Diagnosis of PDA with AI	–
Ornek et al., 2021189	VGG16 (CNN)	Monitoring neonates’ health status (healthy/unhealthy) by focusing on dedicated regions	38 neonates 3800 Neonatal thermograms	Image	95% (accuracy)	+Understanding how VGG16 decides on neonatal thermograms	–
Ervural et al., 2021190	Data Augmentation and CNN	Detect health status of neonates	44 neonates 880 images Neonatal thermograms	Image	62.2% to 94.5% (accuracy)	+Significant results with data augmentation	-Less clinically applicable -Small dataset
Ervural et al., 2021191	Deep Siamese Neural Network (D-SNN)	Prediagnosis to experts in disease detection in neonates	67 neonates, 1340 images Neonatal thermograms	Image	99.4% (infection diseases accuracy), 96.4% (oesophageal atresia accuracy), 97.4% (intestinal atresia accuracy), 94.02% (necrotizing enterocolitis accuracy)	+D-SNN is effective in the classification of neonatal diseases with limited data	-Small sample size
Ceschin et al., 2018188	3D CNNs	Automated classification of brain dysmaturation from neonatal MRI in CHD	90 term-born neonates with congenital heart disease and 40 term-born healthy controls	Image	98.5% (accuracy)	+3D CNN on a small sample size showing excellent performance using cross-validation +Cerebellar dysplasia in CHD patients	-Small sample size
Ding et al., 2020169	HyperDense-Net and LiviaNET	Neonatal brain segmentation	40 neonates 24 for training 16 for experiment 3T Brain MRI T1 and T2	Image	94%, 95%, 92% (Dice Score) 90%, 90%, 88% (Dice Score)	+Both neural networks can segment neonatal brains, achieving previously reported performance	-Small sample size
Liu et al., 202099	Graph Convolutional Network (GCN)	Brain age prediction from MRI	137 preterm 1.5-Tesla MRI Bayley-III Scales of Toddler Development at 3 years	Image	Show the GCN’s superior prediction accuracy compared to state-of-the-art methods	+The first study that uses GCN on brain surface meshes to predict neonatal brain age	-No clinical information
Hyun et al., 2016155	NLP and CNN (AlexNet and VGG16)	Classifying and annotating neonatal brain ultrasound scans using NLP and CNN	2372 de-identified NS reports 11,205 NS head images	Image	87% (AUC)	+Automated labeling	-No clinical variable
Kim et al., 2022157	CNN (VGG16) with transfer learning	Assess whether a CNN can be trained via transfer learning to diagnose germinal matrix hemorrhage on head ultrasound	400 head ultrasounds (200 with GMH, 200 without hemorrhage)	Image	92% (AUC)	+First study to evaluate GMH with grade and saliency map	-Not confirmed with MRI or labeling by radiologists
Li et al., 2021159	ResU-Net	Diffuse white matter abnormality (DWMA) on VPI’s MR images at term-equivalent age	98 VPI 28 VPI 3 Tesla Brain MRI T1 and T2 weighted	Image	87.7% (Dice Score), 92.3% (accuracy)	+Developed to segment diffuse white matter abnormality on T2-weighted brain MR images of very preterm infants +3D ResU-Net model achieved better DWMA segmentation performance than multiple peer deep learning models	-Small sample size -Limited clinical information
Greenbury et al., 2021170	Agnostic, unsupervised ML Dirichlet Process Gaussian Mixture Model (DPGMM)	Understanding nutritional practice in neonatal intensive care	45,679 patients over a six-year period in the UK National Neonatal Research Database (NNRD) EHR	Non-Image	Clustering on time analysis on daily nutritional intakes for extremely preterm infants born <32 weeks gestation	+Identifying relationships between nutritional practice and exploring associations with outcomes +Large national multi-center dataset	-Strong likelihood of multiple interactions between nutritional components that could be utilized in records
Ervural et al., 2021192	CNN Data augmentation	Detect respiratory abnormalities of neonates using limited thermal images	34 neonates 680 images 2060 thermal images (11 testing, 23 training) Thermal camera image	Image	85% (accuracy)	+CNN model and data enhancement methods used to determine respiratory system anomalies in neonates	-Small sample size -No follow-up and no clinical information
Wang et al., 2018174	DCNN	Classify and grade retinal hemorrhage automatically	3770 newborns with retinal hemorrhage and normal controls 48,996 digital fundus images	Image	97.85% to 99.96% (accuracy), 98.9% to 100% (AUC)	+First study to show that a DCNN can detect and grade neonatal retinal hemorrhage	–
Brown et al., 2018171	DCNN	Develop and test an algorithm to diagnose plus disease from retinal photographs	5511 retinal photographs (trained) and independent set of 100 images Retinal images	Image	94% (AUC), 98% (AUC)	+Outperforming 6 of 8 ROP experts +Completely automated algorithm detected plus disease in ROP with the same or greater accuracy as human doctors	-Disease detection, monitoring, and prognosis in ROP-prone neonates -No clinical information and no clinical variables
Wang et al.,

Machine Learning Utilizations in Neonatal Mortality: A Comprehensive Overview

Neonatal mortality is a major contributor to overall child mortality, accounting for 47 percent of deaths in children under the age of five, according to the World Health Organization60. The 2030 target for reducing global infant mortality61 underscores the urgency of addressing this issue.

Machine Learning (ML) has been applied to investigate infant mortality, its determinants, and predictive modeling62,63,64,65,66,67,68. A recent study enrolled 1.26 million infants, predicting mortality as early as 5 minutes and as late as 7 days using an array of models, predominantly neural networks, random forests, and logistic regression (58.3%)67. While several studies reported favorable results, including AUC ranging from 58.3% to 97.0%, challenges such as small sample sizes and lack of dynamic parameter representation hindered broader clinical applicability67. Notably, gestational age, birth weight, and APGAR scores emerged as pivotal variables64,72. Future research recommendations emphasize external validation, calibration, and integration into healthcare practices67.
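Because AUC is the headline metric in many of the studies above, a small illustration may help readers interpret the reported ranges. The sketch below computes AUC as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from any of the reviewed studies.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = died, 0 = survived; scores are model risk estimates.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is one way to read the 58.3% to 97.0% range reported above.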

Neonatal sepsis, encompassing early and late onset, remains a formidable challenge in neonatal care, prompting ML applications for early detection. Studies predicted early sepsis using heart rate variability and clinical biomarkers with accuracies ranging from 64% to 94%74,75.

Advances in neonatal healthcare have reduced the incidence of severe prenatal brain injury but have underscored the need to predict neurodevelopmental outcomes. ML methods, including brain segmentation, connectivity analysis, and neurocognitive evaluations, have been employed to address this. ML also aids neuromonitoring, for example automatic seizure detection from EEG and analysis of EEG biosignals in infants with hypoxic-ischemic encephalopathy (HIE)104,105,106,107,108.

ML applications extend to predicting complications in preterm infants, including Patent Ductus Arteriosus (PDA), Bronchopulmonary Dysplasia (BPD), and Retinopathy of Prematurity (ROP). PDA detection from electronic health records (EHR) and auscultation records demonstrated accuracies of 76% and 74%, respectively123,124. ML studies predicted BPD with accuracies up to 86%, and other research aimed to predict complications related to long-term invasive ventilation128,129,130.

ROP, a leading cause of childhood blindness, has seen ML applied to diagnosing and classifying the disease from retinal fundus images, with systems achieving up to 95% accuracy132,133,134.

ML also finds utility in the diagnosis of various neonatal diseases, utilizing EHR and medical records for conditions like congenital heart defects, HIE, IVH, neonatal jaundice, NEC, and predicting rehospitalization135,136,84,85,137,138,139,142,143.

Furthermore, ML has been applied to analyze electronically captured physiologic data for artifact detection, late-onset sepsis prediction, and overall morbidity evaluation144,145,146.

In addressing metabolic disorders of newborns, ML methods, especially Support Vector Machines (SVM), have been employed for conditions like methylmalonic acidemia (MMA), phenylketonuria (PKU), and medium-chain acyl CoA dehydrogenase deficiency (MCADD)151,152,153. Notably, ML contributes to improving the positive predictive value in newborn screening programs for these disorders152.
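To make the positive predictive value (PPV) point concrete, the following sketch shows how PPV changes when a second-tier classifier removes false positives while keeping the true cases. All counts are hypothetical and chosen only for illustration; they are not results from the cited studies.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the share of flagged newborns who truly have the disorder."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no newborn is flagged")
    return true_positives / flagged


# Hypothetical screening round: 10 affected newborns, all detected.
ppv_baseline = positive_predictive_value(true_positives=10, false_positives=190)

# Hypothetical second-tier ML filter that dismisses 150 of the false alarms
# without losing any true case.
ppv_with_ml = positive_predictive_value(true_positives=10, false_positives=40)

print(f"baseline PPV = {ppv_baseline:.2f}, with ML filter = {ppv_with_ml:.2f}")
```

In this illustrative scenario the same ten affected newborns are found, but far fewer families receive a false alarm.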

In summary, ML applications span a wide spectrum in neonatal care, from predicting mortality and sepsis to assessing neurodevelopmental outcomes and complications in preterm infants, showcasing its diverse and impactful role in improving neonatal healthcare.

Deep Learning Advancements in Neonatology

Deep Learning in clinical image analysis serves three primary purposes: classification, detection, and segmentation. Classification focuses on identifying specific features in an image, detection involves locating multiple features within an image, and segmentation entails dividing an image into multiple parts7,9,154,155,156,157,158,159,160.

AI-Enhanced Neuroradiological Assessment in Neonatology

Neonatal neuroimaging plays a crucial role in identifying early signs of neurodevelopmental abnormalities, enabling timely intervention during a period of heightened neuroplasticity and rapid cognitive and motor development. The application of Deep Learning (DL) methods enhances the diagnostic process, providing earlier insights than traditional clinical signs would indicate.

However, imaging an infant’s brain using Magnetic Resonance Imaging (MRI) poses challenges due to lower tissue contrast, regional heterogeneity, age-related intensity variations, and the impact of partial volume effects. To address these issues, specialized computational neuroanatomy tools tailored for infant-specific MRI data are under development. The typical pipeline for predicting neurodevelopmental disorders from infant structural MRI involves image preprocessing, tissue segmentation, surface reconstruction, and feature extraction, followed by AI model training and prediction.

Segmenting a newborn’s brain is particularly challenging due to decreased signal-to-noise ratio, motion restrictions, and the smaller brain size. Various non-DL-based approaches, including parametric, classification, multi-atlas fusion, and deformable models, have been proposed for newborn brain segmentation. Segmentation accuracy is typically reported with the Dice Similarity Coefficient, which measures the overlap between a predicted segmentation and a reference segmentation.
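The Dice Similarity Coefficient mentioned above is defined for two binary masks A and B as 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming each mask is given as a flat sequence of 0/1 voxel labels. The function name and the toy masks are hypothetical, and treating two empty masks as perfect agreement is a convention chosen here for illustration.

```python
def dice_coefficient(predicted, reference):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks of equal length."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # both masks empty: counted as perfect agreement (assumed convention)
    return 2.0 * overlap / total


# Hypothetical 8-voxel masks: 1 = tissue of interest, 0 = background.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
dice = dice_coefficient(predicted, reference)
print(f"Dice = {dice:.2f}")  # 3 shared voxels, 4 + 4 labeled voxels
```

A score of 1.0 means the predicted and reference masks coincide exactly, while 0.0 means they share no voxels.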

The future of neonatal brain segmentation research involves developing more sophisticated neural segmentation networks. Despite advancements, the slow progress in the field of artificial intelligence in neonatology is attributed to a lack of open-source algorithms and limited datasets.

Further research should focus on enhancing DL accuracy in diagnosing conditions such as germinal matrix hemorrhage. Comparisons between DL and sonographers in identifying suspicious studies, grading hemorrhages accurately, and improving diagnostic capabilities of head ultrasound in diverse clinical scenarios warrant attention.

The evaluation of prematurity complications using DL in neonatology encompasses various applications, including disease prediction, MR image analysis, combined EHR data analysis, and predicting neurocognitive outcomes and mortality. DL proves effective in detecting conditions like PDA, IVH, BPD, ROP, and retinal hemorrhage. Additionally, DL contributes to treatment planning, NICU discharge, personalized medicine, and follow-up care.

DL’s potential in ROP screening programs is notable, offering cost-effective solutions for detecting severe cases that require therapy. Studies show DL outperforming experts in diagnosing plus disease and quantifying the clinical progression of ROP. DL applications extend to sleep protection in the NICU, real-time evaluation of cardiac MRI for congenital heart disease, classification of brain dysmaturation from neonatal brain MRI, and disease classification from thermal images.

Two groundbreaking studies emphasize the impact of DL on nutrition practices in the NICU and on the use of wireless sensors. Unbiased, data-driven ML techniques show the potential to change clinical practice and to improve monitoring, helping to prevent iatrogenic injuries in neonatal care.

Discussion

The studies in neonatology involving AI were systematically categorized based on three primary criteria:

(i) Whether the studies utilized Machine Learning (ML) or Deep Learning (DL) methods,
(ii) Whether imaging data or non-imaging data were employed in the studies, and
(iii) According to the primary aim of the study, whether it focused on diagnosis or other predictive aspects.

In the pre-Deep Learning era, the majority of neonatology studies were conducted using ML methods. Specifically, we identified 12 studies that utilized ML along with imaging data for diagnostic purposes. Furthermore, 33 studies employed non-imaging data for diagnostic applications. The spectrum of imaging data studies covered diverse areas such as biliary atresia (BA) diagnosis based on stool color, postoperative enteral nutrition for neonatal high intestinal obstruction, functional brain connectivity in preterm infants, retinopathy of prematurity (ROP) diagnosis, neonatal seizure detection from video records, and newborn jaundice screening. Non-imaging studies for diagnosis included congenital heart defects diagnosis, baby cry analysis, inborn metabolic disorder diagnosis and screening, hypoxic-ischemic encephalopathy (HIE) grading, EEG analysis, patent ductus arteriosus (PDA) diagnosis, vital sign analysis, artifact detection, extubation and weaning analysis, and bronchopulmonary dysplasia (BPD) diagnosis.

In contrast, studies involving Deep Learning applications were less prevalent than those using Machine Learning. DL studies focused on brain segmentation, intraventricular hemorrhage (IVH) diagnosis, EEG analysis, neurocognitive outcome prediction, and PDA and ROP diagnosis. Although DL research in neonatology is expected to grow, many of the existing articles on AI in neonatology lack sufficient detail, which makes them difficult to evaluate and compare comprehensively and limits their utility for clinicians.

Several limitations exist in the application of AI in neonatology, including the absence of prospective design, challenges in clinical integration, small sample sizes, and evaluations limited to single centers. DL has demonstrated potential in extracting information from clinical images, bioscience, and biosignals, as well as integrating unstructured and structured data in Electronic Health Records (EHR). However, key concerns related to DL in medicine include difficulties in clinical integration, the need for expertise in decision mechanisms, lack of data and annotations, insufficient explanations and reasoning capabilities, limited collaboration efforts across institutions, and ethical considerations. These challenges collectively impact the success of DL in the medical domain and are categorized into six components for further examination.

Challenges in Integrating Clinical Practices

Challenges in Integrating AI into Neonatal Healthcare: A Perspective on Clinical Trials

Despite the significant advancements in AI accuracy within the healthcare domain, translating these achievements into practical treatment pathways faces multiple hurdles. One major concern among physicians is the lack of well-established randomized clinical trials, particularly in pediatrics, demonstrating that AI systems are reliable and more effective than traditional methods for diagnosing neonatal diseases and recommending suitable therapies. The pros and cons of such studies are discussed in the tables and relevant sections. Current research predominantly focuses on imaging-based or signal-based investigations, often centered around a specific variable or disease. Neonatologists and pediatricians express the need for evidence-based algorithms with proven efficacy.

Remarkably, there are only six prospective clinical trials in neonatology involving AI. One notable trial, supported by the European Union Cost Program, explores the detection of neonatal seizures using conventional EEG in the NICU197. Another study investigates the physiological effects of music in premature infants208, though it does not employ AI analysis. A recent trial, “Rebooting Infant Pain Assessment: Using Machine Learning to Exponentially Improve Neonatal Intensive Care Unit Practice (BabyAI),” is currently recruiting209. Another ongoing study aims to collect real-time data on pain signals in non-verbal infants using sensor-fusion and machine learning algorithms210; no results have been submitted yet. Similarly, the “Prediction of Extubation Readiness in Extreme Preterm Infants by the Automated Analysis of Cardiorespiratory Behavior: APEX study” completed recruitment with 266 infants, but results are pending211.

In summary, there is a notable scarcity of prospective multicenter randomized AI studies with published results in the neonatology field. Addressing this gap requires planning clinically integrated prospective studies that incorporate real-time data collection, considering the rapidly changing clinical circumstances of infants and the inclusion of multimodal data with both imaging and non-imaging components.

Requisite Expertise in Decision-Making Mechanisms

In neonatology, clinicians deciding whether to follow a system’s recommendation may require corroborating evidence95,96,125,202. Many proposed AI solutions in the medical domain are not meant to supplant the decision-making or expertise of physicians but rather to serve as valuable aids. In the challenging landscape of neonatal survival without sequelae, AI could revolutionize neonatology. The diverse spectrum of neonatal diseases, coupled with clinical presentations that vary with gestational age and postnatal age, complicates accurate diagnosis for neonatologists. AI holds the potential for early disease detection, offering crucial assistance to clinicians for prompt responses and favorable therapeutic outcomes.

Neonatology involves collaborative efforts across multiple disciplines in patient management, presenting an opportunity for AI to achieve unprecedented levels of efficacy. With increased resources and support from physicians, AI could make significant contributions to neonatology. Collaboration extends to various pediatric specialties, such as perinatology, pediatric surgery, radiology, pediatric cardiology, pediatric neurology, pediatric infectious disease, neurosurgery, cardiovascular surgery, and other subspecialties. These multidisciplinary workflows, involving patient follow-up and family engagement, could benefit from AI-based predictive analysis tools to address potential risks and neurological issues. AI-supported monitoring systems, capable of analyzing real-time data from monitors and detecting changes simultaneously, would be valuable not only for routine Neonatal Intensive Care Unit (NICU) care but also for fostering “family-centered care”212,213 initiatives. While neonatologists remain central to decision-making and communication with parents, AI could actively contribute to NICU practices. Hybrid intelligence offers a platform for monitoring both abrupt and subtle clinical changes in infants’ conditions.

The limited understanding of Deep Learning (DL) among many medical professionals poses a challenge in establishing effective communication between data scientists and medical specialists. A considerable number of medical professionals, including pediatricians and neonatologists, lack familiarity with AI and its applications due to limited exposure to the field. However, efforts are underway to bridge this gap, with clinicians taking the lead in AI initiatives through conferences, workshops, courses, and even coding schools214,215,216,217,218.

Looking ahead, neonatal critical conditions are likely to be monitored by human-in-the-loop systems, and AI-empowered risk classification systems may assist clinicians in prioritizing critical care and allocating resources precisely. While AI cannot replace neonatologists, it can serve as a clinical decision support system in the dynamic and urgent environment of the NICU, which calls for prompt responses.

Challenges Stemming from Insufficient Imaging Data, Annotations, and Reproducibility Issues

There is a growing interest in leveraging deep learning methodologies for predicting neurological abnormalities using connectome data; however, their application in preterm populations has been restricted. Similar to most deep learning (DL) applications, these models often necessitate extensive datasets. Yet, obtaining large neuroimaging datasets, especially in pediatric settings, remains challenging and costly. DL’s success depends on high-capacity models trained with numerous well-labeled examples, posing a significant challenge in the realm of neonatal AI applications.

Accurate labeling demands considerable physician effort and time, exacerbating the existing challenges. Unfortunately, there is a lack of large-scale collaboration between physicians and data scientists that could streamline data gathering, sharing, and labeling processes. Overcoming these challenges holds the promise of utilizing DL in prevention and diagnosis programs, fundamentally transforming clinical practice. Here, we explore the potential of DL in revolutionizing various imaging modalities within neonatology and child health.

The need for a massive volume of data is a significant impediment, especially with the increasing sophistication of DL architectures. However, collecting a substantial amount of clean, verified, and diverse data for various neonatal applications is challenging. Data augmentation techniques and building models with shallow networks are proposed solutions, though they may not be universally applicable. Additionally, issues arise in generalizing models to new data, especially considering variations in MRI contrasts, scanners, and sequences between institutions. Continuous learning strategies are suggested to address this challenge.

Most studies lack open-source algorithms and fail to clarify validation methods, introducing methodological bias. Reproducibility becomes a crucial concern for comparing algorithm success. Furthermore, the lack of explanations and reasoning in widely used DL models poses a risk, especially in high-stakes medical settings. Trustworthiness is imperative for the widespread adoption of AI in neonatology.

Collaboration efforts across multiple institutions face privacy concerns related to cross-site sharing of imaging data. Federated learning has been proposed to address privacy issues, but data heterogeneity may impact model efficacy. Ethical concerns, including informed consent, bias, safety, transparency, patient privacy, and allocation, add complexity to health AI. The implementation of an ethics framework in neonatology AI is yet to be reported.

Despite these challenges, the potential benefits of AI in healthcare are substantial, including increased speed, cost reduction, improved diagnostic accuracy, enhanced efficiency, and increased access to clinical information. AI’s impact on neonatal intensive care units and healthcare, while promising, requires clinicians’ support, emphasizing the need for collaboration between AI researchers and clinicians for successful implementation in neonatal care.
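As a rough illustration of the federated learning idea raised in this section, the sketch below shows a single aggregation round in the spirit of federated averaging: each site trains locally and shares only model parameters, which a coordinator averages in proportion to local sample counts. This is an assumed, simplified setup rather than a method used in the reviewed studies; the function name, the three sites, and all numbers are hypothetical.

```python
def federated_average(site_weights, site_sizes):
    """Average per-site parameter vectors, weighting each site by its sample count.
    Only parameters are exchanged; patient-level data never leave a site."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        if len(weights) != n_params:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Hypothetical example: three NICUs, each holding a locally trained 2-parameter model.
site_weights = [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]]
site_sizes = [100, 100, 200]  # number of local training cases per site
global_weights = federated_average(site_weights, site_sizes)
print(global_weights)
```

Weighting by sample count gives larger sites more influence, which is also where the data heterogeneity concern noted above arises: if sites differ in scanners or patient mix, the averaged model may fit none of them well.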

Approaches Employed

Review of Literature and Search Methodology
We conducted a comprehensive literature search using databases such as PubMed™, IEEEXplore™, Google Scholar™, and ScienceDirect™ to identify publications related to the applications of AI, ML, and DL in neonatology. Our search strategy involved various combinations of keywords, including technical terms (AI, DL, ML, CNN) and clinical terms (infant, neonate, prematurity, preterm infant, hypoxic ischemic encephalopathy, neonatology, intraventricular hemorrhage, infant brain segmentation, NICU mortality, infant morbidity, bronchopulmonary dysplasia, retinopathy of prematurity). The inclusion criteria were publications dated between 1996 and 2022, focusing on AI in neonatology, written in English, published in peer-reviewed journals, and objectively assessing AI applications in neonatology. Exclusions comprised review papers, commentaries, letters to the editor, purely technical studies without clinical context, animal studies, statistical models like linear regression, non-English language studies, dissertation theses, posters, biomarker prediction studies, simulation-based studies, studies involving infants older than 28 days, perinatal death studies, and obstetric care studies. The initial search yielded around 9000 articles, from which 987 relevant research papers were identified through careful abstract examination (Fig. 4). Ultimately, 106 studies meeting our criteria were selected for inclusion in this systematic review (see Supplementary file). The evaluation covered diverse aspects, including sample size, methodology, data types, evaluation metrics, as well as the strengths and limitations of the studies (Tables 2–7).

Availability of Data
Dr. E. Keles and Dr. U. Bagci have complete access to all study data, ensuring data integrity and accuracy. All study materials are accessible upon reasonable request from the corresponding author.

A comprehensive review exploring the historical developments, current status, and future prospects of neonatal intensive care units leveraging artificial intelligence: A systematic examination.