In 2024, dengue fever cases across Central and South America surged to 12.6 million, nearly triple the previous year’s total. Since 2000, Brazil alone has documented 18 million cases of dengue.
Faced with growing concerns, researchers in Brazil asked themselves: how might data be used to help improve the health of a society? They began to explore new frontiers of data, particularly “dark data” that had previously been left unused and unanalyzed.
In two studies, researchers combined data from non-traditional streams, such as social media, with more traditional government data to create time-sensitive nowcast reports. These reports identified likely outbreak locations so that governments could respond faster. A separate study of the early 2024 outbreaks in Brazil used a neural network to forecast outbreaks, combining historical data on dengue outbreaks, climate information, spatial effects, and cyclic patterns in a scalable manner.
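To make the idea of combining these inputs concrete, here is a minimal sketch of how one week of data for one city might be turned into a single row of model inputs. It is an illustration only, not the method used in the studies above: the function names, the 52-week cycle, and the choice of inputs (recent case counts, rainfall, temperature, neighboring-city counts) are all assumptions made for this example.

```python
import math

WEEKS_PER_YEAR = 52  # assumption: weekly reporting on a 52-week cycle


def encode_week(week):
    """Place a week number (1..52) on a circle so week 52 sits next to week 1."""
    angle = 2 * math.pi * (week - 1) / WEEKS_PER_YEAR
    return math.sin(angle), math.cos(angle)


def build_features(week, past_cases, rainfall_mm, mean_temp_c, neighbor_cases):
    """Combine hypothetical inputs into one feature row for a forecasting model.

    past_cases and neighbor_cases are assumed to be non-empty lists of counts.
    """
    sin_week, cos_week = encode_week(week)
    return [
        sin_week,                                   # cyclic pattern (seasonality)
        cos_week,
        float(past_cases[-1]),                      # most recent weekly case count
        sum(past_cases) / len(past_cases),          # average of recent history
        float(rainfall_mm),                         # climate information
        float(mean_temp_c),
        sum(neighbor_cases) / len(neighbor_cases),  # simple spatial effect
    ]


row = build_features(
    week=8,
    past_cases=[120, 180, 260],
    rainfall_mm=95.0,
    mean_temp_c=28.5,
    neighbor_cases=[200, 310],
)
print(row)
```

The sine and cosine pair is one common way to represent a seasonal cycle: it keeps late December and early January close together, which a plain week number would not.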
Clearly, dark data can be useful. But what is it?
Dark data refers to information that has been collected and stored, but has never been used. Gartner originally defined it as “information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes.” This can include paper records, legacy systems, or even unstructured digital files that sit unanalyzed.
And it’s not rare. According to a policy brief from Digital Decarbonization, an estimated 65% of all data collected is dark. It exists across all government departments, from labor to health to defense.
When governments responsibly explore and analyze this underused information, with the right safeguards in place, they can uncover insights that save lives, improve services, strengthen economies, and restore public trust.
When correctly harnessed, dark data can fuel AI systems to provide valuable insights.
To help uncover patterns hidden in their dark data, governments are increasingly turning to artificial intelligence (AI). AI can point out trends and gaps we didn’t even know to look for. But for AI to be effective, it needs to consume a lot of data, and that data needs rules and structure. AI is only as good as the data it’s trained on.
That’s why investing in dark data infrastructure through digitizing legacy systems, improving data quality, and building cross-agency access is essential. Without a strong data foundation, even the most advanced AI tools will fall short of finding the solutions they are asked to look for. Governments don’t just need smarter technology; they need more visible, and cleaner, data to feed it.
Dark data is full of untapped value for all government sectors. One corporate analysis pegged the global data analytics market at $65 billion in 2024, with projections exceeding $400 billion by 2032. This valuation should matter to the public sector just as much as it does to the private sector, because it underscores how better data use, including dark data, can fuel innovation, efficiency, and economic resilience.
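To put that projection in perspective, the two figures imply a yearly growth rate that is easy to work out. The sketch below assumes growth compounds annually and that the market reaches exactly $400 billion in 2032; since the projection says "exceeding," the result is a lower bound. The function name is made up for this example.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate implied by two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the text, in billions of dollars: 65 in 2024, 400 in 2032.
rate = implied_annual_growth(65, 400, 2032 - 2024)
print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions, the market would need to grow by roughly a quarter every year to hit the projection.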
But, to maximize this value, effective safeguards are needed.
The promise of dark data must be balanced with responsible governance. Otherwise, systems could fail to achieve their intended goals. In 2022, for example, the Indian Railways Catering and Tourism Corporation (IRCTC) attempted to monetize customer and vendor data to improve services. But without strong data protection laws in place, the move sparked public backlash and was ultimately abandoned.
Responsible exploration is important for governments to earn and maintain public trust. That means building safeguards alongside innovation: privacy protections, transparency, and human-centered data policies.
To fully unlock the promise of dark data, governments must build strong data foundations that enable responsible, impactful use of artificial intelligence in the public sector, including:
- Digitizing and modernizing legacy data systems
- Ensuring interoperability between agencies
- Implementing strong data protection and governance frameworks
- Designing AI systems with transparency and accountability in mind
Without these foundations, even well-intentioned initiatives may fall short, or worse, reinforce existing biases and blind spots. And for low- and middle-income countries, the gap in resources must be addressed through international cooperation and investment. For example, the entire continent of Africa has roughly the same number of data centers as the Netherlands. Many of these have been developed with international funding support, which can ebb and flow even though the need remains consistent.
Dark data has the potential to benefit people, communities, and governments – it’s time to start uncovering it.
Governments can’t afford to ignore what they can’t see. Dark data may be hidden, but its consequences and opportunities are already shaping lives, economies, and institutions. Responsible exploration isn’t just a technical issue; it’s a governance imperative.
We’ll explore these complexities further in future articles, from implementation strategies to data equity issues. But for now, the message is simple:
Invest in the systems, safeguards, and skills needed to illuminate dark data, because what you don’t know can’t help you.