In the last two decades, technology has revolutionized every corner of business and daily life. What once required human effort (entering data, copying files, verifying reports) can now be handled automatically by software bots. These digital workers can run 24/7 without fatigue or complaints and, when configured correctly, with far fewer errors than manual work.
This revolution is driven by Robotic Process Automation (RPA), a technology that lets computers mimic human actions on digital systems. From banks processing thousands of transactions to hospitals handling patient records, RPA has become a foundation of modern business efficiency.
However, until recently, most RPA platforms were proprietary, expensive, and closed-source. Tools like UiPath, Blue Prism, and Automation Anywhere dominated the market but limited developer flexibility. That changed when Robocorp entered the scene, offering an open-source, Python-based RPA platform that combines the power of traditional automation with the flexibility of programming.
This article explores that world: what RPA is, how Robocorp works, how to build bots, and why open-source automation is shaping the future of intelligent work.
2. What Is RPA (Robotic Process Automation)?
Robotic Process Automation (RPA) refers to using software robots ("bots") to automate repetitive, rule-based tasks that humans perform on computers.
Imagine you have an employee who logs into a website daily, downloads an Excel file, copies some numbers into another system, and sends a report by email. With RPA, you can configure a bot to do the same work faster and with fewer manual errors.
Key Concept: Mimicking Human Actions
Unlike traditional automation (which connects systems via APIs or backend logic), RPA operates at the user interface (UI) level, just like a human would. Bots can:
Click buttons
Type into forms
Read data from PDFs
Copy and paste between applications
Interact with browsers, Excel, databases, and more
This makes RPA extremely valuable for legacy systems that lack APIs or modern integration options.
RPA reduces human error, speeds up processes, and allows employees to focus on creative or analytical work instead of repetitive tasks.
3. The Evolution of RPA: From Legacy Tools to Open Source
RPA started in the early 2000s with enterprise tools like Blue Prism, which offered a visual drag-and-drop interface to automate workflows without coding. Later came UiPath and Automation Anywhere, which added more sophistication, AI integrations, and enterprise-level orchestration.
However, these tools shared common limitations:
High licensing costs: often priced per bot or per user.
Closed ecosystems: developers couldn't freely extend or integrate custom Python or JavaScript code.
Complex deployment: requiring enterprise servers and administrators.
As Python became one of the most popular languages for automation, the developer community started exploring open-source RPA: automation that's scriptable, portable, and transparent.
That's where Robocorp came in.
4. What Is Robocorp?
Robocorp is a company and open-source platform that provides the tools to build, run, and manage software robots using Python and the Robot Framework ecosystem.
It bridges the gap between low-code RPA and traditional scripting, giving developers a structured yet flexible environment for building robust automations.
A Brief History
Robocorp was founded in 2019 by Antti Karjalainen in Finland. The company's mission is simple:
"To democratize RPA by making it open, developer-friendly, and affordable for everyone."
Since its launch, Robocorp has gained attention from both enterprises and developers because it combines:
Open-source tools (no licensing costs)
Python flexibility (build custom logic)
Cloud orchestration (Control Room for deployment)
5. Robocorp's Ecosystem: The Core Components
Robocorp isn't a single tool; it's a complete ecosystem of components that work together for end-to-end RPA development.
1. Robocorp Code (VS Code Extension)
This is the official VS Code plugin that allows you to create, test, and debug robots directly in Visual Studio Code. It provides templates, syntax highlighting, and integrated Control Room connections.
2. Robocorp Lab (Legacy)
Previously, Robocorp offered a standalone desktop IDE called Robocorp Lab. It has since been replaced by the more flexible Robocorp Code extension.
3. Control Room
This is Robocorp's cloud orchestration platform. It lets you:
Deploy and schedule robots
Manage credentials and environments
Track logs and bot performance
Trigger bots via API or webhook
Think of it as the "mission control" for your digital workforce.
4. RPA Framework Libraries
These are open-source Python libraries built specifically for automation. Examples include:
RPA.Browser.Playwright - for browser automation
RPA.Excel.Files - for Excel automation
RPA.PDF - for extracting data from PDFs
RPA.Email.ImapSmtp - for handling emails
Each library is modular and can be used directly in Python scripts or Robot Framework tasks.
6. Robocorp, Robot Framework, and RPA Framework: How They Connect
One key strength of Robocorp is how it builds upon Robot Framework, an established open-source tool originally developed for test automation.
Robot Framework
A generic automation framework written in Python.
Uses human-readable syntax like:
*** Settings ***
Library    RPA.Browser.Selenium

*** Tasks ***
Open Website
    Open Browser    https://example.com    Chrome
    Input Text    username    admin
    Input Text    password    1234
    Click Button    login
This readable syntax also makes it well suited to business process automation.
RPA Framework
Robocorp extends Robot Framework with the RPA Framework, a collection of Python libraries and tools built for RPA-specific tasks: file handling, email, browser control, Excel, PDF, etc.
A typical robot project contains the following files and folders:
tasks.robot - main entry file (contains automation steps)
conda.yaml - defines Python dependencies
robot.yaml - describes task metadata for Control Room
libraries/ - your custom Python code
output/ - logs and reports
Execution Flow
Define the automation logic in .robot or .py files.
Run locally using Robocorp Code or the CLI.
Test and debug logs in output/log.html.
Upload to Control Room for scheduling and remote execution.
8. Setting Up Robocorp (Step-by-Step Guide for Beginners)
Let's go through the setup process.
Step 1: Install VS Code and Robocorp Extension
Download Visual Studio Code and install the Robocorp Code extension from the marketplace.
Step 2: Install Python
Ensure Python 3.8+ is installed and added to your PATH.
Step 3: Create a New Robot Project
Open the Command Palette in VS Code, choose "Robocorp: Create Robot", and select a template such as "Browser Automation."
This creates a folder with a pre-configured robot.yaml, conda.yaml, and task file.
Step 4: Run Your Robot Locally
Use the VS Code "Run Robot" button; the output log appears in the terminal or browser.
Step 5: Connect to Control Room
Sign up at robocorp.com, create a workspace, and link your robot for cloud execution.
You can now schedule, trigger via API, and monitor execution remotely.
9. Example: Building Your First Robocorp RPA Bot
Let's automate a simple business process: extracting invoice data from emails and storing it in Excel.
Step 1: Define the Workflow
Connect to an email inbox.
Download PDF attachments.
Extract key data (invoice number, date, total).
Append the data to an Excel file.
Step 2: Example Code (Python)
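import os

from RPA.Email.ImapSmtp import ImapSmtp
from RPA.Excel.Files import Files
from RPA.PDF import PDF

# Connect to email. Credentials are read from environment variables
# (EMAIL_ACCOUNT and EMAIL_PASSWORD are placeholder names), not hard-coded.
# authorize_imap and save_attachment are assumed from the rpaframework
# ImapSmtp library; verify them against your installed version.
email = ImapSmtp()
email.authorize_imap(
    account=os.environ["EMAIL_ACCOUNT"],
    password=os.environ["EMAIL_PASSWORD"],
    imap_server="imap.gmail.com",
)

# Search unread messages and download their attachments
os.makedirs("attachments", exist_ok=True)
pdf = PDF()
for msg in email.list_messages("UNSEEN"):
    attachments = email.save_attachment(msg, target_folder="attachments", overwrite=True)
    for file in attachments:
        if not str(file).lower().endswith(".pdf"):
            continue  # skip non-PDF attachments
        data = pdf.get_text_from_pdf(file)  # dict: page number -> page text
        # extract fields from text (simplified)
        print("Extracted:", data)

# Save to Excel (a fixed sample row stands in for the extracted fields)
excel = Files()
excel.create_workbook("invoices.xlsx")
excel.append_rows_to_worksheet([["Invoice001", "2025-10-01", "$300"]])
excel.save_workbook()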
Step 3: Schedule in Control Room
Upload your bot to Control Room, create a process, set it to run daily, and monitor output and logs online.
10. Why Developers Love Robocorp
Python-based: Build custom logic easily.
Free and open-source core: No license fees for the framework and libraries.
Modular libraries: Reusable across projects.
Cloud orchestration: Built-in scheduling and monitoring.
Integrates with AI: You can use Python ML/AI libraries for intelligent automation.
Robocorp bridges the gap between low-code RPA tools and true developer-grade automation frameworks.
11. Comparing Robocorp with Traditional RPA Tools
Feature | Robocorp | UiPath | Blue Prism | Automation Anywhere
Language | Python | Visual | Visual | Visual
Pricing | Free / Open Source | Expensive | Enterprise only | Enterprise only
Flexibility | Very High | Medium | Low | Medium
Deployment | Cloud or Local | Cloud/On-Prem | On-Prem | Cloud
Custom Code | Full Python Support | Limited | None | Partial
Community | Growing Fast | Mature | Moderate | Moderate
Robocorp stands out because it gives developers full freedom and businesses a cost-effective path to automation.
12. Advanced Capabilities in Robocorp RPA
After building your first robot, you'll quickly discover that Robocorp is not just about automating clicks and keystrokes. Its strength lies in deep Python integration and rich libraries that let you handle complex end-to-end processes.
12.1 Browser Automation with Playwright and Selenium
Robocorp supports both Playwright and Selenium for browser control. Playwright is faster, headless-friendly, and works across Chromium, Firefox, and WebKit. Example (Playwright):
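A minimal headless-browser sketch, assuming the upstream playwright Python package is installed and used directly (Robocorp's own wrappers, such as RPA.Browser.Playwright, expose different keyword names). The function name page_title is illustrative. Playwright is imported inside the function so the file can still be loaded on machines where it is not installed.

```python
def page_title(url: str) -> str:
    """Open `url` in headless Chromium and return the page title."""
    # Imported lazily: the rest of the module works without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            # Always release the browser process, even if navigation fails.
            browser.close()
```

Calling page_title("https://example.com") would launch Chromium without a visible window, load the page, and return its title; swapping p.chromium for p.firefox or p.webkit targets the other engines mentioned above.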
12.2 Excel and Database Automation
Automation often involves manipulating spreadsheets or databases.
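from RPA.Database import Database
from RPA.Excel.Files import Files

# Read the "Q1" worksheet, treating the first row as column headers
excel = Files()
excel.open_workbook("sales.xlsx")
table = excel.read_worksheet_as_table("Q1", header=True)
excel.close_workbook()

# Insert each row with a parameterized query (assumes three columns per row).
# "sqlite3" is the Python module name the library loads; the data= argument
# is assumed from recent rpaframework versions, so check your installed one.
db = Database()
db.connect_to_database("sqlite3", database="data.db")
for row in table:
    db.query("INSERT INTO sales VALUES (?, ?, ?)", data=tuple(row.values()))
db.disconnect_from_database()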
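To try the insert step without the RPA libraries, the same idea can be sketched with only Python's standard sqlite3 module. The function name insert_sales and the three-column sales schema are assumptions made for illustration; they are not taken from Robocorp's documentation.

```python
import sqlite3
from contextlib import closing


def insert_sales(db_path, rows):
    """Insert (region, product, amount) rows and return the table's row count."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back if an insert fails
            conn.execute(
                "CREATE TABLE IF NOT EXISTS sales "
                "(region TEXT, product TEXT, amount REAL)"
            )
            # Placeholders keep spreadsheet values out of the SQL string itself.
            conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Using executemany inside a single transaction means either every spreadsheet row is stored or none is, which is usually what a bot should guarantee when it is retried after a failure.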
12.3 PDF Processing and Document Automation
Using RPA.PDF, you can extract structured data from invoices, contracts, or reports.
from RPA.PDF import PDF
pdf = PDF()
content = pdf.get_text_from_pdf("invoice.pdf")
print(content)
Combine this with regular expressions or AI OCR (Tesseract, Azure Vision) for document understanding.
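As a rough illustration of the regular-expression route, the sketch below pulls three fields out of invoice text. The patterns, the field labels ("Invoice No", "Date", "Total"), and the function name extract_invoice_fields are assumptions about one possible invoice layout; real documents will need their own patterns. Note that get_text_from_pdf returns text per page, so the page texts would be joined into one string first.

```python
import re

# One pattern per field; \b stops "Total" from matching inside "Subtotal".
INVOICE_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"\bTotal\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_invoice_fields(text):
    """Return a dict of field name -> captured string (None when not found)."""
    fields = {}
    for name, pattern in INVOICE_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


sample = "Invoice No: INV-001\nDate: 2025-10-01\nSubtotal: $280.00\nTotal: $300.00"
print(extract_invoice_fields(sample))
```

Returning None for a missing field, rather than raising, lets the bot decide whether to skip the invoice or route it to a person for review.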
12.4 Email and API Automation
Bots can read emails, send notifications, or call REST APIs.
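import os

from RPA.Email.ImapSmtp import ImapSmtp
from RPA.HTTP import HTTP

# Credentials come from environment variables (placeholder names), not source code
email = ImapSmtp()
email.authorize_imap(
    account=os.environ["EMAIL_ACCOUNT"],
    password=os.environ["EMAIL_PASSWORD"],
    imap_server="imap.gmail.com",
)
messages = email.list_messages("UNSEEN")

# Forward each unread message body to a ticketing API (example URL)
api = HTTP()
for msg in messages:
    body = msg["Body"]  # assumes list_messages returns dicts with a "Body" key
    # session_less_post is assumed to be the Python method behind the POST keyword
    api.session_less_post("https://api.company.com/tickets", json={"message": body})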
12.5 Integrating AI and Machine Learning
Because Robocorp runs pure Python, you can import transformers, scikit-learn, OpenAI, or TensorFlow directly. For example, sentiment analysis on incoming customer emails before routing them to agents.
from transformers import pipeline
sentiment = pipeline("sentiment-analysis")
result = sentiment("The delivery was late and support was unhelpful.")
print(result)
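The routing step that follows the sentiment call can be sketched as a small pure function. It assumes the result has the shape the default sentiment-analysis pipeline returns (a list with one dict per input, holding "label" and "score"); the queue names and the 0.8 threshold are hypothetical.

```python
def route_email(result, threshold=0.8):
    """Pick a support queue from a sentiment-analysis pipeline result."""
    top = result[0]  # one dict per input text
    # Only confidently negative messages jump the queue.
    if top["label"] == "NEGATIVE" and top["score"] >= threshold:
        return "priority-support"
    return "standard-support"


print(route_email([{"label": "NEGATIVE", "score": 0.99}]))
```

Keeping the routing rule separate from the model call makes it easy to unit test and to adjust the threshold without touching the ML code.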
13. Real-World Use Cases of Robocorp RPA
13.1 Finance and Accounting
Invoice digitization and posting to ERP.
Bank statement reconciliation.
Automated expense approvals.
13.2 Human Resources and Payroll
Onboarding employees by creating accounts and sending welcome emails.
Generating monthly payslips from HR databases.
Updating compliance forms and tracking training completion.
13.3 Healthcare and Insurance
Extracting patient data from forms and uploading to EHR systems.
Automating claims validation.
Scheduling appointment reminders via email or SMS.
13.4 Customer Support Automation
Reading incoming emails and auto-creating tickets in Zendesk or Freshdesk.
Summarizing ticket content using AI and tagging priority levels.
13.5 IT and Operations
Monitoring server logs and creating alerts.
Resetting passwords or creating user accounts via API calls.
Regular back-ups and system health reports.
14. Best Practices for Developing Robocorp RPA Bots
14.1 Write Reusable Code
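Organize your robots into modules and libraries, for example by keeping shared Python helpers in the project's libraries/ folder so that several tasks can import them.
14.2 Version Control
Integrate Git and GitHub or GitLab for tracking changes and collaboration. Each robot should have its own repository and CI/CD workflow.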
14.3 Credentials and Security
Never hard-code passwords. Use Control Room's Vault or environment variables. Encrypt sensitive files and audit access regularly.
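The environment-variable option can be sketched with the standard library alone. The variable names (EMAIL_ACCOUNT, EMAIL_PASSWORD), the helper name load_credentials, and the error class are assumptions for illustration; Control Room's Vault has its own API, which is not shown here.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential variable is unset or empty."""


def load_credentials(prefix="EMAIL", env=None):
    """Read <PREFIX>_ACCOUNT and <PREFIX>_PASSWORD from the environment."""
    env = os.environ if env is None else env
    names = {"account": f"{prefix}_ACCOUNT", "password": f"{prefix}_PASSWORD"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        # Fail early with the variable names, never with secret values.
        raise MissingCredentialError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {key: env[var] for key, var in names.items()}
```

Failing at startup with a clear message is generally preferable to a bot that logs in with an empty password halfway through a run.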
14.4 Error Handling and Logging
Every bot should log its actions and recover from failures.
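import logging

logger = logging.getLogger(__name__)

try:
    run_main_process()  # placeholder for your robot's entry point
except Exception as e:
    logger.error(f"Process failed: {e}")
    send_alert_email(str(e))  # placeholder for your notification helper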
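Recovery from transient failures (a slow page, a dropped connection) is often handled with retries. The helper below is a minimal sketch of retrying with exponential backoff; the name run_with_retries and its parameters are illustrative, and the sleep function is injectable so the logic can be tested without waiting.

```python
import logging
import time

logger = logging.getLogger("robot")


def run_with_retries(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of retries: let the caller log and alert
            sleep(base_delay * 2 ** (attempt - 1))
```

Wrapping only the flaky step (for example, a single page load) rather than the whole process keeps retries cheap and avoids repeating work that already succeeded.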
14.5 Testing and Debugging
Use unit tests for Python components.
Run robots locally before uploading to Control Room.
Review output/log.html for detailed traces.
14.6 Scalability and Performance
Schedule robots in parallel or on multiple workers to handle large volumes. Leverage Robocorp's API to trigger bots on events rather than timers.
15. Challenges When Starting with Robocorp RPA
Learning curve: Requires basic Python knowledge.
UI changes: Bots can break if web elements change frequently.
Process selection: Not every task is worth automating; focus on rule-based, repetitive ones.
Maintenance: Bots need monitoring and updates as business rules evolve.
Overcoming these challenges means building a culture of continuous automation and documentation.
16. Future of Robocorp and the RPA Industry
16.1 The Rise of Hyperautomation
RPA is no longer limited to rule-based tasks. The new wave, hyperautomation, combines RPA with AI, machine learning, process mining, and analytics.
16.2 Robocorp's AI Vision
Robocorp is working toward integrating AI assistants that help bots make decisions autonomously (e.g., classifying emails, understanding documents, detecting anomalies).
16.3 Open Source as the Standard
As enterprises seek transparency and cost control, open-source RPA like Robocorp will become mainstream. It gives developers freedom and companies ownership of their code.
16.4 Integration with Low-Code Platforms
Expect hybrid environments in which business users create workflows visually while developers extend them in Python for complex logic.
17. Educational Path for Learners and Teams
Start with Python fundamentals (variables, loops, modules).
Learn Robot Framework syntax for readable task files.
Certifications like Robocorp Developer Level I and II can help validate skills professionally.
18. How Businesses Can Adopt Robocorp Strategically
Identify process candidates: Look for high-volume, repetitive tasks.
Start small: Build a pilot automation to prove ROI.
Train staff: Empower developers with Python and RPA knowledge.
Scale gradually: Deploy Control Room and introduce governance.
Measure impact: Track time saved, error reduction, and cost efficiency.
19. Why Open-Source RPA Is the Future
Transparency: You own the code and data.
Cost Efficiency: No per-bot licenses.
Community Support: Continuous innovation through open libraries.
Integration: Easily connect to modern AI and API services.
Robocorp proves that automation can be both powerful and accessible.
20. Conclusion: Empowering the Next Generation of Digital Workers
Robocorp represents the future of automation: an ecosystem where developers, business leaders, and students collaborate to build a digital workforce. By combining Python's flexibility, open-source philosophy, and cloud orchestration, Robocorp enables organizations to create smart bots that scale with their growth.
Whether you're a developer seeking technical depth, a manager seeking ROI, or a student exploring automation careers, Robocorp RPA is your gateway to the automation revolution.
"The real power of RPA is not replacing humans; it's freeing them to focus on what humans do best: thinking, creating, and innovating."
Machine learning and deep learning, integral components of artificial intelligence, involve the process of training computers to learn and make informed decisions based on diverse datasets. Notably, recent strides in artificial intelligence predominantly emanate from the realm of deep learning, proving transformative across various domains, ranging from computer vision to health sciences. The impact of deep learning on medical practices has been particularly groundbreaking, reshaping traditional approaches to clinical applications. While certain medical subfields, such as pediatrics, initially lagged in harnessing the substantial advantages of deep learning, there is a noteworthy accumulation of related research in pediatric applications.
This paper undertakes a comprehensive review of newly developed machine learning and deep learning solutions tailored for neonatology applications. Conducted in accordance with PRISMA 2020 guidelines, the assessment systematically explores the roles played by both classical machine learning and deep learning in neonatology, elucidating methodologies, including algorithmic advancements. Furthermore, it delineates the persisting challenges in evaluating neonatal diseases. The predominant areas of focus within neonatology’s AI applications encompass survival analysis, neuroimaging, scrutiny of vital parameters and biosignals, and the diagnosis of retinopathy of prematurity.
Drawing on 106 research articles spanning from 1996 to 2022, this systematic review meticulously categorizes and discusses their respective merits and drawbacks. The overarching goal of this study is to augment comprehensiveness, delving into potential directions for novel AI models and envisioning the future landscape of neonatology as AI continues to assert its influence. The narrative concludes by proposing roadmaps for the seamless integration of AI into neonatal intensive care units, foreseeing transformative implications for the field.
Introduction
The ongoing surge of artificial intelligence (AI) is reshaping diverse sectors, healthcare included, in an ever-evolving landscape. The dynamic nature of AI applications makes it challenging to keep pace with the constant innovations. Despite the profound impact of AI on daily life, many healthcare practitioners, especially in less-explored domains like neonatology, may not fully grasp the extent of AI integration into contemporary healthcare systems. This review aims to bridge this awareness gap, specifically targeting physicians navigating the intricate intersection of AI and neonatology.
The roots of AI, particularly within machine learning (ML), can be traced back to the 1950s, when Alan Turing conceptualized the “learning machine” alongside early military applications of rudimentary AI. During this era, computers were colossal, and the costs associated with expanding storage were exorbitant, limiting their capabilities. Over the ensuing decades, incremental advancements in both theoretical frameworks and technological infrastructure progressively enhanced the potency and adaptability of ML.
Understanding the mechanics of ML and its subset, deep learning (DL), is fundamental. ML, as a subset of AI, garnered attention due to its adeptness in handling data. ML algorithms and models possess the ability to learn from data, scrutinize, assess, and formulate predictions or decisions based on acquired insights. DL, a specialized form of ML, draws inspiration from the human brain’s neural networks, replicating their functionality through artificial neurons in computer neural networks. The distinctive feature of DL lies in its hierarchical architecture, facilitating the autonomous extraction of features from data, a departure from the reliance on human-engineered features in conventional ML.
Distinguishing ML from DL hinges on the complexity of models and the scale of datasets they can manage. ML algorithms prove effective across a spectrum of tasks, exhibiting simplicity in training and deployment. Conversely, DL algorithms necessitate larger datasets and intricate models but excel in tasks involving high-dimensional, intricate data, automatically identifying significant aspects without predefined elements of interest. The non-linear activation functions within DL architectures, such as artificial neural networks (ANN), contribute to the learning of complex features representative of provided data samples.
ML and DL fall into categories such as supervised, unsupervised, or reinforcement learning based on the nature of the input-output relationship. Their applications span classification, regression, and clustering tasks. DL’s success is contingent on large-scale data availability, innovative optimization algorithms, and the accessibility of Graphics Processing Units (GPUs). These autonomous learning algorithms, mirroring human learning processes, position DL as a pioneering ML method, catalyzing substantial transformations across medical and technological domains, marking it as the driving force propelling contemporary AI advancements.
Figure: Hierarchical representation of AI. The illustration depicts the hierarchical structure of artificial intelligence: ML is a subset of AI, while DL, in turn, is a subset of ML. It also summarizes the ongoing obstacles AI faces in healthcare applications and the key concerns affecting AI outcomes in neonatology: challenges in clinical interpretability, knowledge gaps in decision-making mechanisms that necessitate human-in-the-loop systems, ethical considerations, limitations in data and annotations, and the absence of secure cloud systems for data sharing and privacy.
In the field of deep learning (DL) applied to medical imaging, three primary problem categories exist: image segmentation, object detection (identifying organs or other anatomical/pathological entities), and image classification (e.g., diagnosis, prognosis, therapy response assessment) [3]. Numerous DL algorithms are commonly utilized in medical research, falling into specific algorithmic families:
Convolutional Neural Networks (CNNs): Mainly applied in computer vision and signal processing, CNNs excel at data with fixed spatial relationships, such as imaging data. The architecture comprises phases (layers) that learn hierarchical features. Initial phases extract local features (corners, edges, lines), while subsequent phases extract more global features. Feature propagation involves nonlinearities and regularizations, enriching feature representation. Pooling operations reduce feature size, and the resulting features are employed for predictions (segmentation, detection, or classification) [3,16].
Recurrent Neural Networks (RNNs): Tailored for sequential data like text, speech, and time-series data (e.g., clinical or electronic health records). RNNs capture temporal relationships, aiding in predicting disease progression or treatment outcomes [11,17,18]. Long Short-Term Memory (LSTM) models, a subtype of RNNs, address the difficulty standard RNNs have with long sequences by learning long-term dependencies more effectively. LSTMs use a gated memory cell to store information, allowing them to learn complex patterns, which is particularly useful in audio classification [17,19].
Generative Adversarial Networks (GANs): A DL model class used for generating new data resembling existing datasets. In healthcare, GANs create synthetic medical images using two CNNs (generator and discriminator). The generator produces synthetic images mimicking real ones, while the discriminator tries to identify artificially generated images. Adversarial training pushes the generator toward realistic data. GANs are versatile, employed for tasks like image enhancement, signal reconstruction, classification, and segmentation [20,21,22].
Transfer Learning (TL): Rooted in cognitive science, TL minimizes annotation needs by transferring knowledge from pretrained models. However, using ImageNet pre-trained models for medical image classification can be inefficient due to differences between natural images and medical images [25,26]. Fine-tuning more layers in CNNs may improve accuracy, as the initial layers of ImageNet-pretrained networks may not efficiently detect low-level characteristics in medical images [25,26].
Advanced DL Algorithms: New methods continue to emerge, such as Capsule Networks, Attention Mechanisms, and Graph Neural Networks (GNNs) [27,28,29,30], and are employed for imaging and non-imaging data analysis. Capsule Networks address CNN shortcomings, Attention Mechanisms enhance context understanding, and GNNs exhibit potential in both imaging and non-imaging data analysis [28,34].
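The CNN phases described in the list above (local feature extraction, a nonlinearity, then pooling) can be illustrated with a toy example in plain Python. This is a didactic sketch, not a trainable network: the 5x5 image, the hand-written vertical-edge kernel, and the function names are all assumptions chosen for illustration, whereas a real CNN learns its kernels from data.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the 'convolution' used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Nonlinearity: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Keep the strongest response in each non-overlapping 2x2 block."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Dark left half, bright right half: a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright transitions

features = relu(conv2d(image, edge_kernel))
pooled = max_pool2x2(features)
print(pooled)  # [[2, 0], [2, 0]]
```

The pooled map is a quarter of the size of the feature map but still records where the edge is, which is the size reduction the pooling step is meant to provide.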
In the imaging field, both data-driven and physics-driven systems are essential. While DL methods show effectiveness, challenges persist, particularly in MRI reconstruction, where data-driven and physics-driven algorithms are combined because acquiring fully sampled datasets is impractical.
The concept of Hybrid Intelligence, combining AI with human intellect, offers a promising collaborative approach. AI processes extensive data rapidly, while human expertise contributes context and intuition, leading to more precise decision-making. Hybrid intelligence systems hold promise for time-consuming tasks in healthcare and neonatology.
In the current landscape, AI in medicine has been in use for over a decade, with advancements in algorithms and hardware technologies contributing to its potential. Challenges remain as healthcare utilization increases, particularly in integrating AI into daily clinical practice.
Clinicians seek enhanced diagnostic tools from AI, anticipating reduced invasive tests and increased diagnostic accuracy. This systematic review aims to explore AI’s potential role in neonatology, envisioning a future where hybrid intelligence optimizes neonatal care. The study outlines AI applications, evaluation metrics, and challenges in neonatology, providing a comprehensive overview. The paper’s objectives encompass thorough explanations of AI models, categorization of neonatology-related applications, and discussions of challenges and future research directions.
To elucidate a comprehensive understanding, the objectives are:
Provide a detailed explanation of various AI models and thoroughly articulate evaluation metrics, elucidating the key features inherent in these models.
Categorize AI applications pertinent to neonatology into overarching macro-domains, expounding on their respective sub-domains and highlighting crucial aspects of the applicable AI models.
Scrutinize the contemporary landscape of studies, especially those in recent years, with a specific focus on the widespread utilization of machine learning (ML) across the entire spectrum of neonatology.
Furnish an exhaustive and well-structured overview, including a classification of Deep Learning (DL) applications deployed within the realm of neonatology.
Assess and engage in a discourse concerning prevailing challenges linked to AI implementation in neonatology, along with contemplating future avenues for research. This aims to provide clinicians with a comprehensive perspective on the current scenario.
Presented here is an outline of our paper’s structure and goals: (1) elaborating on AI models and evaluation metrics; (2) assessing studies applying ML in neonatology; (3) assessing studies applying DL in neonatology; and (4) scrutinizing challenges and mapping future directions.
AI is a broad concept that covers the application of computational algorithms capable of categorizing, predicting, or deriving valuable conclusions from vast datasets. Over the past three decades, various algorithms, including Naive Bayes, Genetic Algorithms, Fuzzy Logic, Clustering, Neural Networks (NN), Support Vector Machines (SVM), Decision Trees, and Random Forests (RF), have been utilized for tasks such as detection, diagnosis, classification, and risk assessment in the field of medicine. Traditional machine learning (ML) methods often incorporate hand-engineered features, which consist of visual descriptions and annotations learned from radiologists and are encoded into algorithms for image classification.
Medical data encompasses diverse unstructured sources such as images, signals, genetic expressions, electronic health records (EHR), and vital signs (refer to Fig. 3). The intricate structures of these data types allow deep learning (DL) frameworks to leverage their heterogeneity, achieving high levels of abstraction in data analysis.
Diverse medical information, encompassing unstructured data like medical images, vital signals, genetic expressions, electronic health records (EHRs), and signal data, contributes to a broad spectrum of healthcare insights. Effectively analyzing and interpreting various data streams in neonatology demands a comprehensive strategy, given the distinctive characteristics and complexities inherent in each type of data.
While machine learning (ML) necessitates manual selection and crafted transformation of information from incoming data, deep learning (DL) executes these tasks more efficiently and with heightened efficacy. DL achieves this by autonomously uncovering components through the analysis of a large number of samples, a process that is highly automated. The literature extensively covers ML approaches predating the advent of DL.
For clinicians, understanding how the recommended ML model enhances patient care is crucial. Given that a single metric cannot encapsulate all desirable attributes, it is customary to describe a model’s performance using various metrics. Unfortunately, end-users often struggle to comprehend these measurements, making it challenging to objectively compare models across different research endeavors. Currently, there is no available method or tool to compare models based on the same performance measures. This section elucidates common ML and DL evaluation metrics to empower neonatologists in adapting them to their research and understanding upcoming articles and research design.
The application of artificial intelligence (AI) now spans everything from daily life to high-risk medical scenarios. In neonatology, the incorporation of AI has been a gradual process, with numerous studies emerging in the literature. These studies leverage various imaging modalities, electronic health records, and ML algorithms, some of which are still in the early stages of integration into clinical workflows. Although specific systematic reviews and discussions of future directions are lacking in this field, several studies have focused on introducing AI systems to neonatology, with limited success. Recent advancements in DL have, however, shifted research in this field in a more promising direction. Evaluation metrics in these studies commonly include standard measures such as sensitivity (true-positive rate), specificity (true-negative rate), false-positive rate, false-negative rate, receiver operating characteristics (ROC), area under the ROC curves (AUC), and accuracy (Table 1).
| Term | Definition |
| --- | --- |
| True Positive (TP) | The number of positive samples correctly identified. |
| True Negative (TN) | The number of samples accurately identified as negative. |
| False Positive (FP) | The number of samples incorrectly identified as positive. |
| False Negative (FN) | The number of samples incorrectly identified as negative. |
| Accuracy (ACC) | The proportion of correctly identified samples to the total sample count in the assessment dataset. Limited to the range [0, 1], where 1 represents correctly predicting all positive and negative samples and 0 represents correctly predicting none of them. |
| Recall (REC) | Also known as sensitivity or True Positive Rate (TPR): the proportion of correctly classified positive samples to all samples that are actually positive, TP / (TP + FN). |
| Specificity (SPEC) | The negative-class form of recall (sensitivity): the proportion of correctly classified negative samples to all samples that are actually negative, TN / (TN + FP). |
| Precision (PREC) | The ratio of correctly classified samples to all samples assigned to the class. |
| Positive Predictive Value (PPV) | The proportion of correctly classified positive samples to all samples classified as positive, TP / (TP + FP). |
| Negative Predictive Value (NPV) | The ratio of samples accurately identified as negative to all samples classified as negative, TN / (TN + FN). |
| F1 score (F1) | The harmonic mean of precision and recall, which penalizes an extreme value of either. |
| Cross Validation | A validation technique often employed during the training phase of modeling, in which the data are split into validation folds with no overlap between them. |
| AUROC (Area under ROC curve – AUC) | The area under the ROC curve, summarizing how sensitivity (true-positive rate) trades off against the false-positive rate. Limited to the range [0, 1], where 1 represents correctly predicting all cases and 0 represents correctly predicting none of the cases. |
| ROC | A curve illustrating the performance of a particular predictive algorithm by displaying how specificity changes across varying levels of sensitivity. |
| Overfitting | A modeling failure in which extensive fitting to the training data is accompanied by poor performance on test data. |
| Underfitting | A modeling failure in which inadequate training leads to inadequate test performance. |
| Dice Similarity Coefficient | Used for image analysis. Limited to the range [0, 1], where 1 represents correct segmentation of all images and 0 represents correct segmentation of none of the images. |
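To make the definitions in Table 1 concrete, the following is a minimal Python sketch of how these measures follow from the four confusion-matrix counts. It is not taken from any of the reviewed studies: the names `ConfusionCounts`, `summarize`, and `auroc` are illustrative, and the AUROC function assumes the standard rank-based formulation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half).

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary confusion matrix (illustrative container)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def safe_div(numerator, denominator):
    """Return NaN instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summarize(c):
    """Compute the Table 1 measures from confusion-matrix counts."""
    precision = safe_div(c.tp, c.tp + c.fp)  # identical to PPV
    recall = safe_div(c.tp, c.tp + c.fn)     # sensitivity / TPR
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "recall": recall,
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "precision": precision,
        "npv": safe_div(c.tn, c.tn + c.fn),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def auroc(labels, scores):
    """Rank-based AUROC for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined without both classes
    # Count positive/negative pairs ranked correctly; ties count as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with hypothetical counts of 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives, `summarize` gives an accuracy of 0.85, a recall of 0.80, and a specificity of 0.90, which shows how a single model can look different depending on the metric reported.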
Results
This systematic review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol56. The search process was concluded on July 11, 2022. Initially, a substantial number of articles (approximately 9000) were identified, and a systematic approach was employed to screen and select pertinent articles based on their alignment with the research focus, study design, and relevance to the topic. After reviewing article abstracts, 987 studies were identified. Ultimately, our search resulted in the inclusion of 106 research articles spanning the years 1996 to 2022 (Fig. 4). The risk of bias was assessed using the QUADAS-2 tool.
The preliminary research, conducted on July 11, 2022, identified a total of 9000 articles, out of which 987 article abstracts underwent screening. Following this screening process, 106 research articles published between 1996 and 2022 met the criteria for inclusion in this systematic review. For a more detailed depiction of the study selection process, refer to the PRISMA flow diagram. The QUADAS-2 tool was utilized for a comprehensive analysis of bias risk.
Our findings are summarized in two sets of tables: Tables 2–5 outline AI techniques from the pre-deep learning era (“Pre-DL Era”) in neonatal intensive care units, categorized by the type of data and applications. Tables 6 and 7 cover studies from the DL Era, focusing on applications such as classification (prediction and diagnosis), detection (localization), and segmentation (pixel-level classification in medical images).
Study
Approach
Purpose
Dataset
Type of Data
Performance
Pros(+)
Cons(-)
Hoshino et al., 2017[^194^]
CLAFIC, logistic regression analysis
To determine optimal color parameters predicting Biliary atresia (BA)
Stools
50 neonates
30 BA and 34 non-BA images
100% (AUC)
+ Effective and convenient modality for early detection of BA, and potentially for other related diseases
Dong et al., 2021[^195^]
Level Set algorithm
To evaluate postoperative enteral nutrition of neonatal high intestinal obstruction and analyze clinical treatment effect
60 neonates
CT images
84.7% (accuracy)
+ Segmentation algorithm can accurately segment the CT image, displaying the disease location and its contour more clearly.
– EHR (not included AI analysis) – Small sample size – Retrospective design
Ball et al., 2015[^90^]
Random Forest (RF)
To compare whole-brain functional connectivity in preterm newborns with healthy term-born neonates
105 preterm infants and 26 term controls
Resting state functional MRI and T2-weighted Brain MRI
80% (accuracy)
+ Prospective + Connectivity differences between term and preterm brain
– Not well-established model
Smyser et al., 2016[^88^]
Support vector machine (SVM)-multivariate pattern analysis (MVPA)
To compare resting state-activity of preterm-born infants to term infants
50 preterm infants and 50 term-born control infants
Functional MRI data + Clinical variables
84% (accuracy)
+ Prospective GA at birth used as an indicator of the degree of disruption of brain development + Optimal methods for rs-fMRI data acquisition and preprocessing not rigorously defined
– Small sample size
Zimmer et al., 2017[^93^]
NAF: Neighborhood approximation forest classifier of forests
To reduce the complexity of heterogeneous data population, manifold learning techniques are applied
111 infants (NC, 70 subjects), affected by IUGR (27 subjects) or VM (14 subjects)
3 T brain MRI
80% (accuracy)
+ Combining multiple distances related to the condition improves overall characterization and classification of the three clinical groups (Normal, IUGR, Ventriculomegaly)
– Lack of neonatal data due to challenges during acquisition and data accessibility – Small sample size
+ Inhibited brain development controlled by genetic variables, and PPARG signaling plays a previously unknown cerebral function
– Further work required to characterize the exact relationship between PPARG and preterm brain development
Chiarelli et al., 2019[^91^]
Multivariate statistical analysis
To better understand the effect of prematurity on brain structure and function
88 newborns
3 Tesla BOLD and anatomical brain MRI + Few clinical variables
– Multivariate analysis using motion information could not significantly infer GA at birth + Prematurity associated with bidirectional alterations of functional connectivity and regional volume
– Retrospective design – Small sample size
Song et al., 2017[^94^]
Fuzzy nonlinear support vector machines (SVM)
Neonatal brain tissue segmentation in clinical magnetic resonance (MR) images
+ Nonparametric modeling adapts to spatial variability in intensity statistics arising from variations in brain structure and image inhomogeneity + Produces reasonable segmentations even in the absence of atlas prior
– Small sample size
Taylor et al., 2017[^137^]
Machine Learning
Technology that uses a smartphone application for effectively screening newborns for jaundice
530 newborns
Paired BiliCam images + Total serum bilirubin (TSB) levels
High-risk zone TSB level was 95% for BiliCam and 92% for TcB (P = 0.30); for identifying newborns with a TSB level of ≥17.0, AUCs were 99% and 95%, respectively (P = 0.09).
+ Inexpensive technology that uses commodity smartphones for effective jaundice screening + Multicenter data + Prospective design
– Method and algorithm name not explained
Ataer-Cansizoglu et al., 2015[^134^]
Gaussian Mixture Models
i-ROP To develop a novel computer-based image analysis system for grading plus diseases in ROP
77 wide-angle retinal images
95% (accuracy)
+ Arterial and venous tortuosity (combined), and a large circular cropped image provided the highest diagnostic accuracy + Comparable to the performance of individual experts
– Used manually segmented images with a tracing algorithm to avoid possible noise and bias – Low clinical applicability
Rani et al., 2016[^133^]
Back Propagation Neural Networks
To classify ROP
64 RGB images of these stages taken by RetCam with 120 degrees field of view and size of 640 × 480 pixels
90.6% (accuracy)
– No clinical information – Requires better segmentation – Clinical adaptation
Karayiannis et al., 2006[^101^]
Artificial Neural Networks (ANN)
To aim at the development of a seizure-detection system
54 patients + 240 video segments
Each of the training and testing sets contained 120 video segments (40 segments of myoclonic seizures, 40 segments of focal clonic seizures, and 40 segments of random movements)
96.8% (sensitivity) 97.8% (specificity)
+ Video analysis
– Not capable of detecting neonatal seizures with subtle clinical manifestations (subclinical seizures) or neonatal seizures with no clinical manifestations (electrical-only seizures) – No EEG analysis – Small sample size – No additional clinical information
[^194^]: Hoshino et al., 2017
[^195^]: Dong et al., 2021
[^90^]: Ball et al., 2015
[^88^]: Smyser et al., 2016
[^93^]: Zimmer et al., 2017
[^100^]: Krishnan et al., 2017
[^91^]: Chiarelli et al., 2019
[^94^]: Song et al., 2017
[^137^]: Taylor et al., 2017
[^134^]: Ataer-Cansizoglu et al., 2015
[^133^]: Rani et al
| Study | Approach | Purpose | Dataset | Type of data | Performance | Pros (+) | Cons (-) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reed et al., 1996[^135^] | Recognition-based reasoning | Diagnosis of congenital heart defects | 53 patients | Patient history, physical exam, blood tests, cardiac auscultation, X-ray, and EKG data | | + Useful in multiple defects | – Small sample size, Not real AI implementation |
| Aucouturier et al., 2011[^148^] | Hidden Markov model architecture (SVM, GMM) | Identify expiratory and inspiration phases from the audio recording of human baby cries | 14 infants, spanning four vocalization contexts in their first 12 months | Voice record | 86%-95% (accuracy) | + Quantify expiration duration, crying rate, and other time-related characteristics for screening, diagnosis, and research | – More data needed, No clinical explanation, Small sample size, Required preprocessing |
| Cano Ortiz et al., 2004[^149^] | Artificial neural networks (ANN) | Detect CNS diseases in infant cry | 35 neonates, nineteen healthy cases and sixteen sick neonates | Voice record (187 patterns) | 85% (accuracy) | + Preliminary result | – More data needed for correct classification |
| Hsu et al., 2010[^151^] | Support Vector Machine (SVM) Service-Oriented Architecture (SOA) | Diagnose Methylmalonic Acidemia (MMA) | 360 newborn samples | Metabolic substances data collected from tandem mass spectrometry (MS/MS) | 96.8% (accuracy) | + Better sensitivity than classical screening methods | – Small sample size, SVM pilot stage education not integrated |
| Baumgartner et al., 2004[^152^] | Logistic regression analysis (LRA), Support vector machines (SVM), Artificial neural networks (ANN), Decision trees (DT), k-nearest neighbor classifier (k-NN) | Focus on phenylketonuria (PKU), medium chain acyl-CoA dehydrogenase deficiency (MCADD) | Bavarian newborn screening program all newborns | Metabolic substances data collected from tandem mass spectrometry (MS/MS) | 99.5% (accuracy) | + ML techniques delivered high predictive power | – Lacking direct interpretation of knowledge representation |
| Chen et al., 2013[^153^] | Support vector machine (SVM) | Diagnose phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency | 347,312 infants (220 metabolic disease suspect) | Newborn dried blood samples | 99.9% (accuracy) for each condition | + Reduced false positive cases | – Feature selection strategies did not include total features |
| Temko et al., 2011[^105^] | Support Vector Machine (SVM) classifier leave-one-out (LOO) cross-validation method | Measure system performance for neonatal seizure detection using EEG | 17 newborns | 267 hours clinical dataset | 89% (AUC) | + SVM-based system assists clinical staff in interpreting EEG | – No clinical variable, Difficult to obtain large datasets |
| Temko et al., 2012[^104^] | SVM | Use recent advances in clinical understanding of seizure burden in neonates with hypoxic ischemic encephalopathy to improve automated detection | 17 HIE patients | 816.7 hours EEG recordings | 96.7% (AUC) | + Improved seizure detection | – Small sample size, No clinical information |
| Temko et al., 2013[^115^] | Support Vector Machine (SVM) classifier leave-one-out (LOO) cross-validation method | Validate robustness of Temko 2011[^105^] | Trained in 38 term neonates, Tested in 51 neonates | Trained in 479 hours EEG recording, Tested in 2540 hours | 96.1% (AUC), Correct detection of seizure burden 70% | | – Small sample size, No clinical information |
| Stevenson et al., 2013[^116^] | Multiclass linear classifier | Automatically grade one-hour EEG epoch | 54 full term neonates | One-hour-long EEG recordings | 77.8% (accuracy) | + Involvement of clinical expert, Method explained in detail | – Retrospective design |
| Ahmed et al., 2016[^114^] | Gaussian mixture model, Universal Background Model (UBM), SVM | Grade hypoxic–ischemic encephalopathy (HIE) severity using EEG | 54 full term neonates | One-hour-long EEG recordings | 87% (accuracy) | + Significant assistance to healthcare professionals | – Retrospective design |
| Mathieson et al., 2016[^103^] | Robusted Support Vector Machine (SVM) classifier leave-one-out (LOO) cross-validation method[^115^] | Validate Temko 2013[^115^] | 70 babies from 2 centers (35 Seizure, 35 Non-Seizure) | Seizure detection algorithm thresholds clinically acceptable range | Detection rates 52.5%–75% | + Clinical information and Cohen score added | – Retrospective design |
| Mathieson et al., 2016[^198^] | Support Vector Machine (SVM) classifier leave-one-out (LOO) cross-validation method[^105^] | Analyze Seizure detection Algorithm and characterize false negative seizures | 20 babies (10 seizure -10 non-seizure) (20 of 70 babies)[^103^] | | | | |
137 preterm 1.5-Tesla MRI Bayley-III Scales of Toddler Development at 3 years
Image
Show the GCN's superior prediction accuracy compared to state-of-the-art methods
+The first study that uses GCN on brain surface meshes to predict neonatal brain age
-No clinical information
Hyun et al., 2016155
NLP and CNN (AlexNet and VGG16)
Classifying and annotating neonatal brain ultrasound scans using NLP and CNN
2372 de-identified NS reports 11,205 NS head images
Image
87% (AUC)
+Automated labeling
-No clinical variable
Kim et al., 2022157
CNN (VGG16)
Transfer learning
Assess whether a CNN can be trained via transfer learning to diagnose germinal matrix hemorrhage on head ultrasound
400 head ultrasounds (200 with GMH, 200 without hemorrhage)
Image
92% (AUC)
+First study to evaluate GMH with grade and saliency map +Not confirmed with MRI or labeling by radiologists
Li et al., 2021159
ResU-Net
Diffuse white matter abnormality (DWMA) on VPI's MR images at term-equivalent age
98 VPI 28 VPI 3 Tesla Brain MRI T1 and T2 weighted
Image
87.7% (Dice Score), 92.3% (accuracy)
+Developed to segment diffuse white matter abnormality on T2-weighted brain MR images of very preterm infants +3D ResU-Net model achieved better DWMA segmentation performance than multiple peer deep learning models
-Small sample size -Limited clinical information
Greenbury et al., 2021170
Agnostic, unsupervised ML Dirichlet Process Gaussian Mixture Model (DPGMM)
Understanding nutritional practice in neonatal intensive care
45,679 patients over a six-year period in the UK National Neonatal Research Database (NNRD) EHR
Non-Image
Clustering on time analysis on daily nutritional intakes for extremely preterm infants born <32 weeks gestation
+Identifying relationships between nutritional practice and exploring associations with outcomes +Large national multi-center dataset
-Strong likelihood of multiple interactions between nutritional components that could be utilized in records
Ervural et al., 2021192
CNN Data augmentation
Detect respiratory abnormalities of neonates using limited thermal images
+CNN model and data enhancement methods used to determine respiratory system anomalies in neonates
-Small sample size -No follow-up and no clinical information
Wang et al., 2018174
DCNN
Classify and grade retinal hemorrhage automatically
3770 newborns with retinal hemorrhage and normal controls 48,996 digital fundus images
Image
97.85% to 99.96% (accuracy), 98.9% to 100% (AUC)
+First study to show that a DCNN can detect and grade neonatal retinal hemorrhage
–
Brown et al., 2018171
DCNN
Develop and test an algorithm to diagnose plus disease from retinal photographs
5511 retinal photographs (trained) and independent set of 100 images Retinal images
Image
94% (AUC), 98% (AUC)
+Outperforming 6 of 8 ROP experts +Completely automated algorithm detected plus disease in ROP with the same or greater accuracy as human doctors
-Disease detection, monitoring, and prognosis in ROP-prone neonates -No clinical information and no clinical variables
Wang et al.,
Machine Learning Utilizations in Neonatal Mortality: A Comprehensive Overview
Neonatal mortality stands as a significant contributor to overall child mortality, representing 47 percent of deaths in children under the age of five, according to the World Health Organization60. The imperative to reduce global infant mortality by 203061 underscores the urgency of addressing this issue.
Machine Learning (ML) has been applied to investigate infant mortality, its determinants, and predictive modeling62,63,64,65,66,67,68. A recent study enrolled 1.26 million infants, predicting mortality as early as 5 minutes and as late as 7 days using an array of models, predominantly neural networks, random forests, and logistic regression (58.3%)67. While several studies reported favorable results, including AUC ranging from 58.3% to 97.0%, challenges such as small sample sizes and lack of dynamic parameter representation hindered broader clinical applicability67. Notably, gestational age, birth weight, and APGAR scores emerged as pivotal variables64,72. Future research recommendations emphasize external validation, calibration, and integration into healthcare practices67.
Neonatal sepsis, encompassing early and late onset, remains a formidable challenge in neonatal care, prompting ML applications for early detection. Studies predicted early sepsis using heart rate variability and clinical biomarkers with accuracies ranging from 64% to 94%74,75.
Advancements in neonatal healthcare have reduced severe prenatal brain injury incidence but underscored the need to predict neurodevelopmental outcomes. ML methods, including brain segmentation, connectivity analysis, and neurocognitive evaluations, have been employed to address this. Additionally, ML aids in neuromonitoring, such as automatic seizure detection from EEG and analyzing EEG biosignals in infants with hypoxic-ischemic encephalopathy (HIE)104,105,106,107,108.
ML applications extend to predicting complications in preterm infants, including Patent Ductus Arteriosus (PDA), Bronchopulmonary Dysplasia (BPD), and Retinopathy of Prematurity (ROP). PDA detection from electronic health records (EHR) and auscultation records demonstrated accuracies of 76% and 74%, respectively123,124. ML studies predicted BPD with accuracies up to 86%, and other research aimed to predict complications related to long-term invasive ventilation128,129,130.
ROP, a leading cause of childhood blindness, has seen ML applications in diagnosing and classifying from retinal fundus images, with systems achieving up to 95% accuracy132,133,134.
ML also finds utility in the diagnosis of various neonatal diseases, utilizing EHR and medical records for conditions like congenital heart defects, HIE, IVH, neonatal jaundice, NEC, and predicting rehospitalization135,136,84,85,137,138,139,142,143.
Furthermore, ML has been applied to analyze electronically captured physiologic data for artifact detection, late-onset sepsis prediction, and overall morbidity evaluation144,145,146.
In addressing metabolic disorders of newborns, ML methods, especially Support Vector Machines (SVM), have been employed for conditions like methylmalonic acidemia (MMA), phenylketonuria (PKU), and medium-chain acyl CoA dehydrogenase deficiency (MCADD)151,152,153. Notably, ML contributes to improving the positive predictive value in newborn screening programs for these disorders152.
In summary, ML applications span a wide spectrum in neonatal care, from predicting mortality and sepsis to assessing neurodevelopmental outcomes and complications in preterm infants, showcasing its diverse and impactful role in improving neonatal healthcare.
Deep Learning Advancements in Neonatology
Deep Learning in clinical image analysis serves three primary purposes: classification, detection, and segmentation. Classification focuses on identifying specific features in an image, detection involves locating multiple features within an image, and segmentation entails dividing an image into multiple parts7,9,154,155,156,157,158,159,160.
AI-Enhanced Neuroradiological Assessment in Neonatology
Neonatal neuroimaging plays a crucial role in identifying early signs of neurodevelopmental abnormalities, enabling timely intervention during a period of heightened neuroplasticity and rapid cognitive and motor development. The application of Deep Learning (DL) methods enhances the diagnostic process, providing earlier insights than traditional clinical signs would indicate.
However, imaging an infant’s brain using Magnetic Resonance Imaging (MRI) poses challenges due to lower tissue contrast, regional heterogeneity, age-related intensity variations, and the impact of partial volume effects. To address these issues, specialized computational neuroanatomy tools tailored for infant-specific MRI data are under development. The typical pipeline for predicting neurodevelopmental disorders from infant structural MRI involves image preprocessing, tissue segmentation, surface reconstruction, and feature extraction, followed by AI model training and prediction.
Segmenting a newborn’s brain is particularly challenging due to decreased signal-to-noise ratio, motion restrictions, and the smaller brain size. Various non-DL-based approaches, including parametric, classification, multi-atlas fusion, and deformable models, have been proposed for newborn brain segmentation. The evaluation metric, Dice Similarity Coefficient, measures segmentation accuracy.
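As an illustration of how the Dice Similarity Coefficient quantifies segmentation accuracy, the sketch below compares a predicted binary mask with a reference mask. It is a simplified example under stated assumptions: masks are small nested lists of 0/1 values rather than real MRI volumes, the function name `dice_coefficient` is hypothetical, and treating two empty masks as perfect agreement is a convention chosen here, not one stated by the reviewed studies.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal size."""
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labeled as foreground in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_truth))
    total = sum(flat_pred) + sum(flat_truth)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 2 x 3 masks: 3 foreground pixels each, 2 of them shared.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)
```

A score of 1 means the two masks coincide exactly and a score of 0 means they share no foreground pixels, which matches the [0, 1] range given in Table 1.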
The future of neonatal brain segmentation research involves developing more sophisticated neural segmentation networks. Despite advancements, the slow progress in the field of artificial intelligence in neonatology is attributed to a lack of open-source algorithms and limited datasets.
Further research should focus on enhancing DL accuracy in diagnosing conditions such as germinal matrix hemorrhage. Comparisons between DL and sonographers in identifying suspicious studies, grading hemorrhages accurately, and improving diagnostic capabilities of head ultrasound in diverse clinical scenarios warrant attention.
The evaluation of prematurity complications using DL in neonatology encompasses various applications, including disease prediction, MR image analysis, combined EHR data analysis, and predicting neurocognitive outcomes and mortality. DL proves effective in detecting conditions like PDA, IVH, BPD, ROP, and retinal hemorrhage. Additionally, DL contributes to treatment planning, NICU discharge, personalized medicine, and follow-up care.
DL’s potential in ROP screening programs is notable, offering cost-effective solutions for detecting severe cases that require therapy. Studies show DL outperforming experts in diagnosing plus disease and quantifying the clinical progression of ROP. DL applications extend to sleep protection in the NICU, real-time evaluation of cardiac MRI for congenital heart disease, classification of brain dysmaturation from neonatal brain MRI, and disease classification from thermal images.
Two groundbreaking studies emphasize the impact of DL on nutrition practices in NICU and the use of wireless sensors. ML techniques, unbiased and data-driven, showcase the potential to bring about clinical practice changes and improve monitoring, preventing iatrogenic injuries in neonatal care.
Discussion
The studies in neonatology involving AI were systematically categorized based on three primary criteria:
(i) Whether the studies utilized Machine Learning (ML) or Deep Learning (DL) methods, (ii) Whether imaging data or non-imaging data were employed in the studies, and (iii) According to the primary aim of the study, whether it focused on diagnosis or other predictive aspects.
In the pre-Deep Learning era, the majority of neonatology studies were conducted using ML methods. Specifically, we identified 12 studies that utilized ML along with imaging data for diagnostic purposes. Furthermore, 33 studies employed non-imaging data for diagnostic applications. The spectrum of imaging data studies covered diverse areas such as biliary atresia (BA) diagnosis based on stool color, postoperative enteral nutrition for neonatal high intestinal obstruction, functional brain connectivity in preterm infants, retinopathy of prematurity (ROP) diagnosis, neonatal seizure detection from video records, and newborn jaundice screening. Non-imaging studies for diagnosis included congenital heart defects diagnosis, baby cry analysis, inborn metabolic disorder diagnosis and screening, hypoxic-ischemic encephalopathy (HIE) grading, EEG analysis, patent ductus arteriosus (PDA) diagnosis, vital sign analysis, artifact detection, extubation and weaning analysis, and bronchopulmonary dysplasia (BPD) diagnosis.
In contrast, studies involving Deep Learning applications were less prevalent compared to Machine Learning. DL studies focused on brain segmentation, intraventricular hemorrhage (IVH) diagnosis, EEG analysis, neurocognitive outcome prediction, and PDA and ROP diagnosis. While the DL field is expected to witness increased research in upcoming articles, it is noteworthy that there have been several articles and studies on the application of AI in neonatology. However, many of these lack sufficient details, making it challenging to evaluate and compare them comprehensively, thus limiting their utility for clinicians.
Several limitations exist in the application of AI in neonatology, including the absence of prospective design, challenges in clinical integration, small sample sizes, and evaluations limited to single centers. DL has demonstrated potential in extracting information from clinical images, bioscience, and biosignals, as well as integrating unstructured and structured data in Electronic Health Records (EHR). However, key concerns related to DL in medicine include difficulties in clinical integration, the need for expertise in decision mechanisms, lack of data and annotations, insufficient explanations and reasoning capabilities, limited collaboration efforts across institutions, and ethical considerations. These challenges collectively impact the success of DL in the medical domain and are categorized into six components for further examination.
Challenges in Integrating Clinical Practices
Challenges in Integrating AI into Neonatal Healthcare: A Perspective on Clinical Trials
Despite the significant advancements in AI accuracy within the healthcare domain, translating these achievements into practical treatment pathways faces multiple hurdles. One major concern among physicians is the lack of well-established randomized clinical trials, particularly in pediatrics, demonstrating the reliability and enhanced effectiveness of AI systems compared to traditional methods for diagnosing neonatal diseases and recommending suitable therapies. Comprehensive discussions on the pros and cons of such studies are presented in tables and relevant sections. Current research predominantly focuses on imaging-based or signal-based investigations, often centered around a specific variable or disease. Neonatologists and pediatricians express the need for evidence-based algorithms with proven efficacy. Remarkably, there are only six prospective clinical trials in neonatology involving AI. One notable trial, supported by the European Union Cost Program, explores the detection of neonatal seizures using conventional EEG in the NICU197. Another study investigates the physiological effects of music in premature infants208, though it does not employ AI analysis. A recent trial, “Rebooting Infant Pain Assessment: Using Machine Learning to Exponentially Improve Neonatal Intensive Care Unit Practice (BabyAI),” is currently recruiting209. Another ongoing study aims to collect real-time data on pain signals in non-verbal infants using sensor-fusion and machine learning algorithms210. However, no results have been submitted yet. Similarly, the “Prediction of Extubation Readiness in Extreme Preterm Infants by the Automated Analysis of Cardiorespiratory Behavior: APEX study” completed recruitment with 266 infants, but results are pending211. In summary, there is a notable scarcity of prospective multicenter randomized AI studies with published results in the neonatology field.
Addressing this gap requires planning clinically integrated prospective studies that incorporate real-time data collection, considering the rapidly changing clinical circumstances of infants and the inclusion of multimodal data with both imaging and non-imaging components.
Requisite Expertise in Decision-Making Mechanisms
In the realm of neonatology, considering whether to heed a system’s recommendation may necessitate the presentation of corroborative evidence95,96,125,202. Many proposed AI solutions in the medical domain are not meant to supplant the decision-making or expertise of physicians but rather serve as valuable aids. In the challenging landscape of neonatal survival without sequelae, AI could revolutionize neonatology. The diverse spectrum of neonatal diseases, coupled with varying clinical presentations based on gestational age and postnatal age, complicates accurate diagnoses for neonatologists. AI holds the potential for early disease detection, offering crucial assistance to clinicians for prompt responses and favorable therapeutic outcomes.
Neonatology involves collaborative efforts across multiple disciplines in patient management, presenting an opportunity for AI to achieve unprecedented levels of efficacy. With increased resources and support from physicians, AI could make significant contributions to neonatology. Collaboration extends to various pediatric specialties, such as perinatology, pediatric surgery, radiology, pediatric cardiology, pediatric neurology, pediatric infectious disease, neurosurgery, cardiovascular surgery, and other subspecialties. These multidisciplinary workflows, involving patient follow-up and family engagement, could benefit from AI-based predictive analysis tools to address potential risks and neurological issues. AI-supported monitoring systems, capable of analyzing real-time data from monitors and detecting changes simultaneously, would be valuable not only for routine Neonatal Intensive Care Unit (NICU) care but also for fostering “family-centered care”212,213 initiatives. While neonatologists remain central to decision-making and communication with parents, AI could actively contribute to NICU practices. Hybrid intelligence offers a platform for monitoring both abrupt and subtle clinical changes in infants’ conditions.
The limited understanding of Deep Learning (DL) among many medical professionals poses a challenge in establishing effective communication between data scientists and medical specialists. A considerable number of medical professionals, including pediatricians and neonatologists, lack familiarity with AI and its applications due to limited exposure to the field. However, efforts are underway to bridge this gap, with clinicians taking the lead in AI initiatives through conferences, workshops, courses, and even coding schools214,215,216,217,218.
Looking ahead, neonatal critical conditions are likely to be monitored by human-in-the-loop systems, and AI-empowered risk classification systems may assist clinicians in prioritizing critical care and allocating resources precisely. While AI cannot replace neonatologists, it can serve as a clinical decision support system in the dynamic and urgent environment of the NICU, calling for prompt responses.
Challenges Stemming from Insufficient Imaging Data, Annotations, and Reproducibility Issues
“There is a growing interest in leveraging deep learning methodologies for predicting neurological abnormalities using connectome data; however, their application in preterm populations has been restricted. Similar to most deep learning (DL) applications, these models often necessitate extensive datasets. Yet, obtaining large neuroimaging datasets, especially in pediatric settings, remains challenging and costly. DL’s success depends on well-labeled, high-capacity models trained with numerous examples, posing a significant challenge in the realm of neonatal AI applications.
Accurate labeling demands considerable physician effort and time, exacerbating the existing challenges. Unfortunately, there is a lack of large-scale collaboration between physicians and data scientists that could streamline data gathering, sharing, and labeling processes. Overcoming these challenges holds the promise of utilizing DL in prevention and diagnosis programs, fundamentally transforming clinical practice. Here, we explore the potential of DL in revolutionizing various imaging modalities within neonatology and child health.
The need for a massive volume of data is a significant impediment, especially with the increasing sophistication of DL architectures. However, collecting a substantial amount of clean, verified, and diverse data for various neonatal applications is challenging. Data augmentation techniques and building models with shallow networks are proposed solutions, though they may not be universally applicable. Additionally, issues arise in generalizing models to new data, especially considering variations in MRI contrasts, scanners, and sequences between institutions. Continuous learning strategies are suggested to address this challenge.
Most studies lack open-source algorithms and fail to clarify validation methods, introducing methodological bias. Reproducibility becomes a crucial concern for comparing algorithm success. Furthermore, the lack of explanations and reasoning in widely used DL models poses a risk, especially in high-stakes medical settings. Trustworthiness is imperative for the widespread adoption of AI in neonatology.
Collaboration efforts across multiple institutions face privacy concerns related to cross-site sharing of imaging data. Federated learning has been proposed to address privacy issues, but data heterogeneity may impact model efficacy. Ethical concerns, including informed consent, bias, safety, transparency, patient privacy, and allocation, add complexity to health AI. The implementation of an ethics framework in neonatology AI is yet to be reported.
Despite these challenges, the potential benefits of AI in healthcare are substantial, including increased speed, cost reduction, improved diagnostic accuracy, enhanced efficiency, and increased access to clinical information. AI’s impact on neonatal intensive care units and healthcare, while promising, requires clinicians’ support, emphasizing the need for collaboration between AI researchers and clinicians for successful implementation in neonatal care.”
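The federated learning idea mentioned above can be made concrete with a small sketch. The excerpt does not say which federated scheme it has in mind, so this example assumes the common federated averaging approach: each hospital trains locally and shares only its model parameters, which a coordinator averages in proportion to each site's sample count. The function name, the two-parameter "models", and the site sizes are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Average per-site model parameters, weighted by each site's sample count.

    Only parameters leave each site; the imaging data itself is never shared.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("provide one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    n_params = len(site_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        if len(weights) != n_params:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Hypothetical locally trained parameters from two hospitals.
hospital_a = [0.2, 1.0]   # trained on 100 scans
hospital_b = [0.6, 3.0]   # trained on 300 scans
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
print(global_weights)  # approximately [0.5, 2.5]
```

Because the larger site dominates the average, a site whose scanner or patient mix differs from the rest can pull the shared model away from what works locally, which is one way the data heterogeneity concern shows up in practice.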
Approaches Employed
“Review of Literature and Search Methodology: We conducted a comprehensive literature search using databases such as PubMed™, IEEE Xplore™, Google Scholar™, and ScienceDirect™ to identify publications related to the applications of AI, ML, and DL in neonatology. Our search strategy involved various combinations of keywords, including technical terms (AI, DL, ML, CNN) and clinical terms (infant, neonate, prematurity, preterm infant, hypoxic ischemic encephalopathy, neonatology, intraventricular hemorrhage, infant brain segmentation, NICU mortality, infant morbidity, bronchopulmonary dysplasia, retinopathy of prematurity). The inclusion criteria were publications dated between 1996 and 2022, focusing on AI in neonatology, written in English, published in peer-reviewed journals, and objectively assessing AI applications in neonatology. Exclusions comprised review papers, commentaries, letters to the editor, purely technical studies without clinical context, animal studies, statistical models like linear regression, non-English language studies, dissertation theses, posters, biomarker prediction studies, simulation-based studies, studies involving infants older than 28 days, perinatal death studies, and obstetric care studies. The initial search yielded around 9000 articles, from which 987 relevant research papers were identified through careful abstract examination (Fig. 4). Ultimately, 106 studies meeting our criteria were selected for inclusion in this systematic review (see Supplementary file). The evaluation covered diverse aspects, including sample size, methodology, data types, evaluation metrics, as well as the strengths and limitations of the studies (Tables 2–7).
Availability of Data: Dr. E. Keles and Dr. U. Bagci have complete access to all study data, ensuring data integrity and accuracy. All study materials are accessible upon reasonable request from the corresponding author.”
“Elon Musk Introduces His Own AI Chatbot to Challenge ChatGPT, Asserting Prototype’s Superiority Over ChatGPT 3.5 in Multiple Benchmarks.
Named ‘Grok,’ this AI bot is the inaugural creation of Musk’s xAI company and is currently undergoing testing with a limited group of users in the United States. Grok is being developed using data from Musk’s X, formerly known as Twitter, which enables it to stay better informed about the latest developments compared to other bots relying on static datasets, as mentioned on the company’s website. Additionally, Grok is designed to respond with a touch of humor and a hint of rebellion, as stated in the official announcement.”
“Earlier this year, Musk joined a group of signatories in a petition urging a temporary halt to the progression of AI models, in favor of creating collaborative safety protocols.
“I added my signature to that letter fully aware of its limited impact,” shared the billionaire who owns X and serves as the CEO of Tesla Inc. in a recent Sunday post. “I merely wanted to formally express my support for a temporary pause.”
President Joe Biden of the United States has recently issued an executive order concerning AI oversight, with the aim of establishing standards for security and privacy safeguards. Simultaneously, technology leaders and academics engaged in discussions about the risks associated with this technology at the AI Safety Summit in the United Kingdom last week.”
“Grok was created within a two-month development timeframe, as stated in the xAI announcement. Once it completes the testing phase, it will be accessible to all X Premium+ users. Elon Musk has articulated his aspiration to expand X beyond its initial role as a social platform and transform it into a versatile application, similar to Tencent Holdings Ltd.’s WeChat in China. Grok will play a crucial role in this endeavor. Although xAI is an independent company, it has expressed its intention to collaborate closely with X, Tesla, and other enterprises.”
Artificial Intelligence (AI) has made significant advancements in recent years, transforming various industries and revolutionizing the way we live and work. From virtual assistants to autonomous vehicles, AI has undoubtedly improved efficiency and convenience in many aspects of our lives. However, amidst the remarkable progress, we must also analyze and understand the potential dangers that AI poses. In this article, we will explore some of the ethical concerns, privacy risks, job displacement, and issues of bias and discrimination associated with AI technology.
As AI becomes increasingly capable of making independent decisions, ethical concerns arise regarding the use and deployment of such technology. One major concern is the potential for AI to be weaponized or used for malicious purposes. For example, autonomous drones equipped with AI algorithms could be used to carry out targeted attacks or surveillance, raising concerns about the lack of human control and accountability.
Additionally, AI algorithms can amplify existing social biases, leading to unethical decisions and actions. Biased training data or biased algorithm designs can result in discriminatory outcomes, particularly in areas such as criminal justice, hiring processes, and financial lending. The responsibility lies with developers and policymakers to ensure fairness and transparency in AI systems.
The widespread adoption of AI often involves collecting vast amounts of personal data. This data can range from personal preferences and browsing habits to highly sensitive information. Privacy risks arise when this data is exploited or mishandled.
AI algorithms rely on massive datasets to make accurate predictions or decisions. While data anonymization techniques are typically employed, there is always a potential risk of re-identification. If an individual’s identity can be linked to their data, it can be used for targeted advertising, manipulation, or even identity theft. The increasing integration of AI into everyday devices and services further amplifies the potential privacy risks, emphasizing the need for robust data protection and privacy regulations.
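The re-identification risk described above can be checked in a simple way. The following is a minimal sketch, assuming a k-anonymity style test: count how many records share each combination of seemingly harmless attributes (quasi-identifiers). Any combination that appears only once points to a single person, even with names removed. The field names and records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, object]


def k_anonymity(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def unique_records(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> List[Record]:
    """Return records whose quasi-identifier combination appears exactly once."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return [r for r in records if groups[tuple(r[q] for q in quasi_identifiers)] == 1]


# Hypothetical "anonymized" dataset: no names, but still linkable attributes.
records = [
    {"zip": "10001", "birth_year": 1990, "sex": "F"},
    {"zip": "10001", "birth_year": 1990, "sex": "F"},
    {"zip": "10002", "birth_year": 1985, "sex": "M"},
    {"zip": "10002", "birth_year": 1986, "sex": "M"},
]

print(k_anonymity(records, ["zip", "birth_year", "sex"]))  # 1: some rows are unique
print(k_anonymity(records, ["zip", "sex"]))                # 2: dropping birth year helps
```

In this toy data, releasing the birth year makes two people uniquely identifiable, while withholding it leaves every record indistinguishable from at least one other.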
One of the major concerns regarding AI is its impact on the labor market. As AI technologies continue to advance, there is a fear that automation will lead to significant job displacement across various industries.
Tasks that were once performed by humans can now increasingly be handled by AI-powered systems. Jobs in sectors such as manufacturing, transportation, and customer service are already witnessing the effects of automation. While new jobs may be created in AI development and maintenance, the transition period can be challenging for those whose jobs are at risk of being replaced.
Efforts to address job displacement involve upskilling and reskilling programs to equip individuals with the skills needed for the jobs of the future. Furthermore, policymakers must consider measures such as universal basic income to support individuals who may find it difficult to adapt to the changing job landscape.
AI algorithms are trained on historical data, which often reflects the biases and prejudices present in society. Consequently, AI systems can perpetuate and even amplify these biases, leading to discriminatory outcomes.
One prominent example is the use of AI in predictive policing. If historical crime data is biased against certain demographic groups, the AI system will learn and reinforce these biases, leading to racial profiling and unjust outcomes. Similarly, biased algorithms in hiring processes can result in discrimination against certain individuals or groups.
Addressing bias and discrimination in AI systems requires careful attention to the data used for training and the design of the algorithms. Regular audits, diversity in AI development teams, and input from affected communities are crucial in mitigating these issues and ensuring fairness in decision-making processes.
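One form such a regular audit might take is a comparison of outcome rates across groups. The sketch below is an assumption about how an audit could be set up, not a prescribed method: it computes the selection rate per group from a log of decisions and then the ratio of the lowest rate to the highest. A ratio below 0.8 is one widely used rule of thumb (the "four-fifths rule") for flagging a system for closer review. The group labels and decisions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of positive decisions per group, from (group, selected) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, selected in decisions:
        totals[group] += 1
        if selected:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: Dict[str, float]) -> float:
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    if not rates:
        raise ValueError("no decisions to audit")
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so rates are equal
    return min(rates.values()) / highest


# Hypothetical hiring decisions logged from an AI screening tool.
decisions = (
    [("A", True)] * 4 + [("A", False)] * 1 +
    [("B", True)] * 2 + [("B", False)] * 3
)
rates = selection_rates(decisions)
ratio = disparate_impact_ratio(rates)
print(rates, ratio)
if ratio < 0.8:
    print("Selection rates differ enough to warrant a closer review.")
```

A low ratio does not prove discrimination on its own, but it tells auditors where to look at the training data and the model's design.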
Conclusion
While AI has immense potential for positive impact, it is essential to acknowledge and address the potential risks and dangers it brings. Ethical concerns, privacy risks, job displacement, and issues of bias and discrimination are key areas that require continual evaluation and regulation to ensure the responsible development and deployment of AI systems. By understanding and actively working to mitigate these risks, we can harness the benefits of AI while safeguarding against its darker implications.