Like archeological exploration, the pursuit of dark data dives into the unknown and the long forgotten. Yet, unlike archeology, dark data doesn’t lie dormant for free. Any sustainable expedition into the unknown needs to scale and finance the proper infrastructure to accomplish its mission. This requires ambitious financing from those who understand the value of such ventures.
In my first piece, I delved into dark data – a.k.a. information that has been collected and stored, but never used – and the benefits of processing it. For example, cleaning and analyzing dark data can enhance government services and address critical social issues related to every aspect of governance. Addressing dark data collection is part of good digital hygiene for the strength of a government’s digital ecosystem. After all, poor data leads to poor results, which lead to poorer organizations. This is why the cost of illuminating dark data should be a financial priority for governments.
Globally, dark data is costing billions annually.
The cost of data to governments across the global minority can be hard to pin down, partly due to a lack of accurate or transparent reporting. However, based on several available sources, including the Federal IT budget of the United States and available US state, local, and education (SLED) IT budgets, it can be safely estimated that data storage costs for governments globally are in the billions, if not more.
The cost of data storage varies geographically and by provider, whether on the cloud or not. Networking, storage, compute costs, database and security services, hidden costs, and more mean that even small collections of dark data can compound into budgetary burdens. Not to mention the opportunity costs of choosing not to analyze potentially useful data.
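To make that compounding concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure in it is a hypothetical placeholder rather than a quoted price from any provider or budget: the per-terabyte monthly rate, the overhead multiplier standing in for networking, security, and other hidden costs, and the yearly data growth rate are all assumptions chosen only for illustration.

```python
def yearly_storage_costs(start_tb, usd_per_tb_month, overhead_factor, annual_growth, years):
    """Estimate the yearly cost (USD) of holding a growing data collection.

    All inputs are illustrative assumptions, not real prices.
    """
    costs = []
    volume_tb = start_tb
    for _ in range(years):
        # Raw storage bill for the year at the current volume
        base = volume_tb * usd_per_tb_month * 12
        # Overhead factor stands in for networking, security, and hidden costs
        costs.append(round(base * overhead_factor, 2))
        # The collection keeps growing if nothing is cleaned up
        volume_tb *= 1 + annual_growth
    return costs


# Hypothetical agency: 500 TB of unused data, $20 per TB per month,
# 50% overhead, and 25% yearly growth in volume, over five years.
costs = yearly_storage_costs(500, 20.0, 1.5, 0.25, 5)
for year, cost in enumerate(costs, start=1):
    print(f"Year {year}: ${cost:,.2f}")
print(f"Five-year total: ${sum(costs):,.2f}")
```

Under these made-up inputs, the first-year bill is $180,000 and it rises every year after that, without a single record having been analyzed.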
Cost to governments includes both quantitative and qualitative components. For example: trust. Specifically, trust that governing bodies will honor and protect citizen data. Trust is a valuable ingredient for reducing volatility, easing citizen anxiety, and creating stability in any society. As the amount of data grows, so too do cost, risk, and the potential to erode citizen trust, as in the case of cyber breaches. The global cost of cybercrime is predicted to reach $10.5 trillion annually by the end of 2025. From a risk assessment (VUCA) standpoint, dark data adds complexity when dealing with cybersecurity costs from data breaches.
Additional costs, such as compliance with privacy regulations, are also key to maintaining trust. Lack of trust has led to costly and time-consuming challenges, as in Kenya’s digital ID Huduma Card case, which raised concerns over the constitutionally protected right to privacy in the collection and processing of personal data.* While these examples are not comprehensive, they serve as a sampling of potential costs compounded by holding unaddressed dark data. To mitigate such costs, robust financing in two key areas – skills training and infrastructure – is essential for effectively managing and harnessing dark data.
Skills training is needed to both manage dark data and address pressing needs.
As countries across the world increasingly integrate digital tools and technology into daily life, data production will only continue to grow. In Africa, for example, the digital divide is (slowly) shrinking, with the continent experiencing the fastest growth in internet penetration rates worldwide. That’s a lot of new, and likely dark, data being collected that’s going to need cleaning, storing, and protecting for both enterprise and government. Closing the digital gap means there’s a need to fund digital skills training, from basic internet use all the way up to technical specialists; from training those who are sharing their data to training those who can then process it.
Trained individuals can design best practices for efficient data collection moving forward, as well as address the current collection of historical dark data within government information storage systems.** Digital infrastructure requires data engineers, data analysts, AI training specialists, and privacy officers, just to name a few. In addition, records managers are essential for digitizing physical legacy data, which is a formidable issue in the realm of dark data for every government. If we imagine a digital play, these workers represent the behind-the-scenes crew who ensure the whole production goes off smoothly, safely, and according to the script.
This skilled workforce must implement and enforce a data lifecycle calendar that reduces dark data retention backlogs from the beginning. They decide which information needs to be held, for how long, and how this fits into the budget. Reducing dark data in one area may free up the money to store essential data, like health data, in another area.
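One way to picture such a data lifecycle calendar is as a small table of retention rules checked against each stored record. The sketch below is only an illustration under stated assumptions: the categories, the retention periods, and the `review_record` helper are all hypothetical, and real retention periods are set by each government's own laws and policies.

```python
from datetime import date

# Hypothetical retention periods in years; real values depend on local law and policy.
RETENTION_YEARS = {
    "health": 25,
    "legal": 15,
    "web_logs": 1,
}


def review_record(category, collected_on, today):
    """Return 'keep', 'review', or 'classify' for one stored record."""
    years = RETENTION_YEARS.get(category)
    if years is None:
        # No rule exists for this category, so a person must classify it first
        return "classify"
    try:
        expires_on = collected_on.replace(year=collected_on.year + years)
    except ValueError:
        # 29 Feb landing in a non-leap year rolls back to 28 Feb
        expires_on = collected_on.replace(year=collected_on.year + years, day=28)
    # Past its retention period: flag for a delete-or-archive decision
    return "keep" if today < expires_on else "review"


today = date(2025, 1, 1)
print(review_record("health", date(2010, 6, 1), today))       # within its period
print(review_record("web_logs", date(2022, 3, 15), today))    # past its period
print(review_record("sensor_dump", date(2019, 1, 1), today))  # no rule defined
```

The point of the "classify" outcome is that data with no rule at all is exactly the kind that goes dark, so the calendar should surface it rather than silently keep it.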
While AI is increasingly capable of processing and analyzing data, there is still, and will continue to be for many years, the need for human intelligence. Ensuring the establishment of a dedicated dark data force made up of these qualified professionals, using AI tools, should be a priority for any government.
To promote this capacity strengthening, governments can join public-private partnerships that invest in the training needed to close the digital skills divide, such as Nigeria’s Edo Innovation Hub. Other public-private partnerships for technical skills training across Africa include Hitachi’s investment in the African Cyber Security Innovation Center (ACSIC) in Côte d’Ivoire and the International Finance Corporation (IFC)’s equity partnership with Andela.
Developing physical infrastructure is crucial for long-term data efficiency.
Investment in digital infrastructure helps to further close the digital divide, unlocking new data-driven economic opportunities for citizens and governments alike. Dark data will take time to analyze, and some unanalyzed data should be held for even longer periods of time, such as in the cases of legal and health data. Local data centers reduce the cost of storing data over the long term, making it less financially burdensome to hold valuable dark data.
Additionally, with the increase of regional data collection, and soon-to-be digitized dark legacy data, there’s a need to bolster digital capacity closer to home. Local data centers also improve services, which assists in providing greater access for citizens while allowing for more government oversight for the cyber protection of their data. Plus, investing in local data centers and other necessary digital infrastructure provides more opportunities to develop a localized skilled labor force to manage these systems into the future.
As it stands, the global majority is underrepresented in data infrastructure and capacity. Again, using Africa as an example, there are only around 200 data centers operating across the vast continent, including Nigeria’s recently unveiled first state-owned data center in Benin City. That is about as many as operate in the Netherlands alone, a very small European country.
Unfortunately, developing digital infrastructure isn’t cheap. The cost of building data centers has been increasing over the past couple of years, partly due to supply chain bottlenecks, as well as increasing IT and staffing costs. HVAC and electrical construction can be up to 65% of a project’s cost. This doesn’t even take into account potential costs from building fiber optics to connect a region to a data center.
Additionally, rising energy prices are one of the largest factors in ongoing data center costs. The current power capacity of data centers in Africa is 400 MW and is expected to double by the end of 2025, with growth predicted to hit less than 2 GW by 2027 (as compared to US predicted growth from 25 GW to 55 GW).***
While African governments account for “40% of the current $80 billion annual investment” in Africa, there is still a projected need for an additional $170 billion annually. This is where public-private partnerships can fill the gap. Microsoft and G42 are investing $1 billion in Kenya’s data ecosystem, which includes a geothermal-powered green data center. Recently, the World Bank announced a $100 million deal with data center developer Raxio for development across the continent.
In short, dark data analysis presents an enormous opportunity, but it must be sufficiently financed.
Dark data will remain a costly, missed opportunity unless governments continue to invest in infrastructure, skills, and strategic partnerships. The benefits of dark data management include stronger public trust, less surface area for cyber exploitation, better public outcomes through intentional analysis, more efficient governance, and new avenues for economic growth.
An expedition into the unknown needs strong financing to build sustainable infrastructure with a skilled team running the operations. This requires a thoughtfully developed strategy to overcome any challenges faced during such a mission. In my next article, I will explore how digital public infrastructure (DPI) relates directly to each of these issues and can offer solutions to some of the challenges in organizing dark data expeditions.
*Please refer to DIAL fellow Risper Onyango’s work on the importance of involving CSOs in the development of national digital initiatives, such as digital ID implementation and adoption.
*Another concern in this case was the use of third-party vendors to process sensitive citizen data. If you’re curious about vendor relationships to government data collection, please see DIAL fellow Manuel Aguilera’s work.
**If you’d like to learn more about design in the data collection process, take a look at DIAL fellow Kassim Vera’s work.
***And, for more information on addressing the sustainability of data center energy usage, I recommend following the work of my peer and current DIAL fellow Arjun Gargeyas.