MirrorWeb awarded contract to archive the history of Wales

Manchester’s MirrorWeb has been awarded a three-year contract by the Welsh government to digitally archive the nation’s online presence, preserving the government’s web-published content in both English and Welsh on all websites of historic and national significance, including Twitter accounts.

This will give the Welsh Government’s Information and Archive Services team full access to the archive.

MirrorWeb is a cloud-native web and social media archiving company. It has developed robust, highly scalable cloud-based archiving and monitoring tools that enable frequent archiving of web and social media assets for private sector businesses and public sector bodies.

Its platform allows billions of documents to be indexed at unprecedented speed, making archives fully usable and searchable.

Philip Clegg, chief technical officer at MirrorWeb, said: “Our website and social media archiving and monitoring platform is built on the cloud and provides the essential infrastructure and capacity to meet the size and complexity of the Welsh government’s archive.

“This will modernise how the content is captured and stored, and provide them with a reliable, comprehensive and intuitive search service of Wales’ digital history.”

National archives around the world have been collecting data for decades, and are only now beginning to realise that the archives of the future will be born out of web and social media content.

Safely capturing and storing this information is the only way to prevent it being lost completely, and modern Big Data tools and the emergence of cloud computing now enable archives to index the data and derive a real return on investment from it.

The project is currently in its early stages. MirrorWeb has received the existing historical archive – around 1.8 million pages, amounting to 20TB of data – which has now been transferred seamlessly and cost-efficiently to the cloud.

The millions of web pages were harvested over the last three years, but have now been captured and indexed by MirrorWeb’s proprietary platform in a matter of hours.

MirrorWeb will now perform crawls of the sites and social media channels, harvesting and preserving the up-to-date data and publishing it using its state-of-the-art technology, providing a comprehensive and complete archive for future generations to access and use.
