
Billions of information assets lie dormant on servers and in data centres. The financial and environmental cost of this so-called “dark data” has become a problem that can no longer be ignored.

According to a study carried out by the research firm Enterprise Strategy Group for software publisher MEGA International, the amount of data held by the average organisation doubles every two years. Businesses worldwide currently generate 1.3 billion gigabytes of data every day.

A large proportion of this information is dark or cold data, also called “dormant data”: information that is rarely, if ever, accessed or used. It is generated by the countless interactions between users and the information systems of businesses and organisations. Server log files, geolocation data, emails and attachments are some examples.

The expansion of the cloud and the growing use of the Internet of Things (IoT) will only accelerate this mass production of cold data. Globally, The State of Dark Data 2019 report, published by TRUE Global Intelligence for the software company Splunk, estimated that dark data accounted for 52% of all data stored worldwide.

A hefty bill

As dormant data accumulate on business servers and in data centres, they create a considerable financial burden. A study by the American firm International Data Corporation (IDC) puts the cost at about €2 billion a month worldwide. On top of this comes a high and growing environmental cost: according to a Veritas study, dark data were responsible for the emission of 6.4 million tonnes of CO2 in 2020, equivalent to the carbon footprint of a car travelling around the world 575,000 times.

The data sector already accounts for 4% of greenhouse gas emissions. Data centres alone have a larger carbon footprint than that of the aviation industry (2.5% of CO2 emissions compared with 2.1%).

A third aspect that senior executives should consider is security: large volumes of stored data multiply the potential vulnerabilities, which could threaten the security of business information systems.

Waking up to the problem

The issue is clearly not a priority for businesses. “Who is going to take responsibility for deleting this data? No one even wants to half-open the door to clean it all up. It’s often easier to keep the data,” says Cor Bonda, data & analytics lead consultant at Axians Netherlands.


It is time, however, to give the matter serious thought, now that energy prices and data-centre storage costs are soaring. What’s more, a growing number of regulations, such as the GDPR on the management of personal data, require that information not be retained indefinitely.

But businesses, and especially SMEs with limited resources, are often left wondering how to find this data. “In many cases, organisations don’t even know they have dark data! So the first thing to do is to identify those assets,” notes Cor Bonda. The next step is to classify the cold datasets: which need to be kept in cold storage, which can be put to use and which should be permanently deleted.
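For a file share, the identify-then-classify step described above can be roughly approximated by scanning for files that have not been touched in a long time. The sketch below is illustrative only and rests on several assumptions: the thresholds (`COLD_AFTER_DAYS`, `DELETE_AFTER_DAYS`), the tier names and the helper functions `scan` and `summarise` are hypothetical, and file timestamps are only a weak signal (many Linux systems mount disks with `relatime` or `noatime`, so access times may be stale). It flags candidates for human review rather than deleting anything.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical thresholds, chosen for illustration only.
COLD_AFTER_DAYS = 180         # idle this long: candidate for cold storage
DELETE_AFTER_DAYS = 3 * 365   # idle this long: candidate for deletion review
SECONDS_PER_DAY = 86_400


@dataclass
class FileReport:
    path: Path
    size_bytes: int
    idle_days: float
    tier: str  # "active", "cold" or "review_for_deletion"


def classify(idle_days: float) -> str:
    """Map how long a file has been idle to a hypothetical storage tier."""
    if idle_days >= DELETE_AFTER_DAYS:
        return "review_for_deletion"
    if idle_days >= COLD_AFTER_DAYS:
        return "cold"
    return "active"


def scan(root: str, now: Optional[float] = None) -> List[FileReport]:
    """Walk `root` and report each file's idle time and suggested tier."""
    now = time.time() if now is None else now
    reports: List[FileReport] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or permission denied: skip it
            # Use the most recent of last access and last modification.
            last_used = max(st.st_atime, st.st_mtime)
            idle_days = (now - last_used) / SECONDS_PER_DAY
            reports.append(FileReport(path, st.st_size, idle_days, classify(idle_days)))
    return reports


def summarise(reports: List[FileReport]) -> Dict[str, int]:
    """Total bytes per tier, to size the potential clean-up."""
    totals: Dict[str, int] = {}
    for r in reports:
        totals[r.tier] = totals.get(r.tier, 0) + r.size_bytes
    return totals
```

Calling `summarise(scan("/srv/shared"))` on a (hypothetical) shared drive returns the number of bytes in each tier, which gives a first estimate of how much storage a clean-up could free before anyone decides what to archive or delete.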

What solutions are there?

“To do this, you should use your needs as a starting point, rather than the data. And establish a data management policy that is shared by everyone in the business,” recommends Axians’ data & analytics lead consultant. He adds, however, that “first and foremost, you should focus on generating less data; then you have less dark data.”

Several measures can help: training teams on these issues, carrying out regular audits to identify and eliminate dark data, and mapping personal data and keeping records of its processing to monitor the life cycle of each asset. Artificial intelligence (AI) can streamline this work of locating, identifying and classifying data. AI may also be an attractive tool for unlocking the value of dark data: by exploiting previously scattered client-related data, it can significantly improve client knowledge and relationships.