Archiving and Preservation

Expertise and Technology

Through years of experience and project work, the research group has built up specialist expertise in digital archiving and preservation. Digital preservation encompasses the standards, best practices, and technologies used to ensure long-term access to digital information. The group offers a range of services in this area, based on independent consulting and implementation. In the short term, we can benefit customers by detecting weaknesses in existing or planned systems and by recommending practical improvements to them. In the long term, this reduces risk and saves the customer money by preventing information loss. We also offer storage optimisation services that can yield immediate returns through lower storage costs.

Our experts can support the further development of existing content management systems by integrating scalable preservation technologies. The research group offers expertise in data-intensive computing (using MapReduce and Apache Hadoop), distributed bit preservation (using the LOCKSS platform), and automated quality assurance for content digitisation and migration processes.
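As a rough illustration of how a map/reduce pattern can serve bit preservation, the following minimal sketch computes a checksum for every stored copy of an object (the map step) and then flags copies that disagree with the majority (the reduce step). It is not the LOCKSS protocol and not the group's implementation; the node names, file names, and the functions map_digest and reduce_votes are hypothetical and chosen only for this example.

```python
import hashlib
from collections import Counter, defaultdict


def map_digest(replica, name, data):
    """Map step: emit (object name, (replica, SHA-256 digest of the content))."""
    return name, (replica, hashlib.sha256(data).hexdigest())


def reduce_votes(pairs):
    """Reduce step: group digests per object and list replicas that
    disagree with the most common digest for that object."""
    by_name = defaultdict(list)
    for name, vote in pairs:
        by_name[name].append(vote)

    damaged = {}
    for name, votes in by_name.items():
        majority, _ = Counter(digest for _, digest in votes).most_common(1)[0]
        bad = sorted(rep for rep, digest in votes if digest != majority)
        if bad:
            damaged[name] = bad
    return damaged


# Hypothetical in-memory stand-in for three storage nodes.
replicas = {
    "node-a": {"page1.tif": b"scan-1", "page2.tif": b"scan-2"},
    "node-b": {"page1.tif": b"scan-1", "page2.tif": b"scan-2"},
    "node-c": {"page1.tif": b"scan-1", "page2.tif": b"scan-X"},  # damaged copy
}

pairs = [
    map_digest(replica, name, data)
    for replica, files in replicas.items()
    for name, data in files.items()
]
report = reduce_votes(pairs)
print(report)
```

In a real deployment the map step would run where the data is stored, so that only digests travel over the network rather than the files themselves.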

Big Data

Big Data refers to collections of data that are too large to be stored in traditional databases or handled by traditional applications. Big Data poses many challenges, including ingest, management, preservation, storage, analysis, and visualisation. The group is researching various approaches to these challenges using open-source software such as MongoDB, Apache Hadoop, and NGDATA Lily. We can apply these solutions in diverse domains such as web archiving and law enforcement.

Quality Assurance and Recommender Systems

The state of the art in digitisation quality assurance relies on statistical sampling methods. These methods fail for two reasons. First, at very large scales, the human effort needed to review sub-samples remains expensive. Second, real errors are correlated rather than random, so statistical sampling is unlikely to discover them. The research group has developed tools for automated quality assurance in the digitisation process, such as matchbox. The output of matchbox feeds a decision support system for human operators, with high detection efficiency and few false positives, so that quality assurance remains economically scalable.
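The effect of correlated errors on sampling can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that reviewers inspect whole batches drawn at random without replacement; the collection size, batch count, and number of faulty pages are hypothetical and do not come from the group's projects. It compares the chance of finding at least one faulty page when errors are scattered across many batches with the chance when all errors sit in a single batch (for example, one faulty scanning session).

```python
from math import comb


def p_detect(units, bad_units, sampled):
    """Probability that a simple random sample of `sampled` units, drawn
    without replacement, contains at least one bad unit (hypergeometric)."""
    return 1 - comb(units - bad_units, sampled) / comb(units, sampled)


# Hypothetical collection: 1,000 batches, of which 50 are reviewed in full.
batches, reviewed = 1000, 50

# Scattered errors: 100 faulty pages, each in a different batch.
scattered = p_detect(batches, 100, reviewed)

# Correlated errors: the same 100 faulty pages, all in one batch.
clustered = p_detect(batches, 1, reviewed)

print(f"scattered: {scattered:.3f}, clustered: {clustered:.3f}")
```

Under these assumptions the number of faulty pages is the same in both cases, yet the clustered case is detected with a probability of only 50/1000, whereas the scattered case is detected almost certainly. Raising the detection rate for clustered errors means reviewing far more batches, which is exactly the human effort that sampling was meant to avoid.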