What drives the world more than data?


Before we get into the nitty-gritty of data, let's look at a few famous quotes that capture what data is all about and why it matters so much:


"Data is the new oil."

- Clive Humby, Data Science Entrepreneur

"Data is the currency of the 21st century."

- Thomas Redman, Data Management Expert

"Data is a precious thing and will last longer than the systems themselves."

- Tim Berners-Lee, Inventor of the World Wide Web

Data Influences Decisions

Data is the lifeblood of today's world. Whether you realize it or not, data is constantly shaping your experiences, from the ads you see online to how businesses customize their services to meet your needs.

For businesses, data means smarter decisions. By analyzing trends, companies can refine their products and reduce costs. Retailers, for example, can optimize inventory management and minimize waste—saving money while enhancing customer satisfaction. According to a McKinsey report, data-driven organizations are 23 times more likely to acquire customers and 19 times more likely to be profitable.

In the software world, data helps developers spot patterns, write better code, and fix bugs faster. It's like having a map when you're lost—it just makes everything easier. GitHub’s use of data analytics, for instance, has significantly improved code review processes and collaboration among developers.

Similarly, in healthcare, data can literally save lives. Access to accurate data ensures that resources are allocated efficiently, trends are identified, and targeted interventions are developed. This data-driven approach empowers policymakers and healthcare professionals to make informed decisions that can significantly improve patient outcomes.

In short, data is not just a resource; it’s a powerful tool that drives progress across all sectors of society.

Existing Chest X-ray Datasets

There are several popular chest X-ray datasets, including the NIH Chest X-ray dataset (ChestX-ray14), its predecessor ChestX-ray8, MIMIC-CXR, and CheXpert. These have been valuable for developing machine learning algorithms for cardiorespiratory diseases, but they come with some drawbacks.

The NIH Chest X-ray dataset includes over 100,000 images labeled for 14 diseases, but it excludes patients under 18, offers limited demographic information, and consists mostly of U.S. images, all of which limit how well models trained on it generalize. The ChestX-ray8 dataset, with 108,948 images labeled for eight common diseases, was collected from a single hospital, which may introduce bias, and it does not represent other diseases or more diverse populations, particularly those in low- and middle-income countries. The MIMIC-CXR dataset links chest X-rays to free-text radiology reports for richer analyses; however, it is drawn from a single institution in a high-income setting, which can affect its global applicability. Lastly, the CheXpert dataset contains over 200,000 labeled images, including details about medical devices, but it is limited to adult patients and is primarily sourced from Stanford, so it may not reflect a broader demographic.

On top of that, some datasets face selection bias because the images are chosen based on specific criteria like image quality or how common a disease is. This can result in certain diseases being over-represented while others are left out, which limits the dataset's usefulness for creating models that can accurately diagnose a full range of diseases.
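One simple way to spot this kind of imbalance is to count how often each label appears. The snippet below is only a minimal sketch: the function names, the toy records, and the 8% cutoff are all hypothetical and are not taken from any of the datasets above. It assumes each image comes with a list of finding labels.

```python
from collections import Counter


def label_prevalence(records):
    """Return the fraction of images that carry each label.

    `records` is a list of label lists, one per image (an assumed format).
    """
    total = len(records)
    if total == 0:
        return {}
    counts = Counter()
    for labels in records:
        # Use a set so a label repeated on one image is counted once.
        counts.update(set(labels))
    return {label: n / total for label, n in counts.items()}


def underrepresented(prevalence, threshold=0.08):
    """List labels whose prevalence falls below an illustrative threshold."""
    return sorted(label for label, p in prevalence.items() if p < threshold)


# Hypothetical toy data: 20 images with made-up labels.
records = (
    [["No Finding"]] * 12
    + [["Effusion"]] * 5
    + [["Effusion", "Edema"]] * 2
    + [["Pneumothorax"]] * 1
)

prevalence = label_prevalence(records)
for label, share in sorted(prevalence.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {share:.0%}")
print("Under-represented:", underrepresented(prevalence))
```

A check like this does not fix selection bias, but it makes it visible early, before a model quietly learns to ignore the rare conditions.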


As part of our effort to address these gaps, we are expanding the Tikur Chest X-ray Image Dataset to give researchers and healthcare providers access to high-quality medical imaging data for building accurate diagnostic tools for cardiovascular and respiratory diseases. The expanded dataset will include images from various sources to help detect conditions such as pneumothorax, pleural effusion, pulmonary edema, and congestive heart failure. We hope it will help address the challenges faced by healthcare providers in low- and middle-income countries and improve health outcomes for individuals and communities affected by these diseases. The Tikur Chest X-ray Image Dataset is available at https://github.com/kibromhft/Tikur_ChestX-ray14.



We are grateful to Lacuna Fund for initiating our collaborative efforts, which help us double our impact in medical imaging research. We look forward to making an even greater difference in this important field.

Stay tuned for our upcoming large representative dataset that will make a significant impact in medical imaging.