Data Science

Digital Insight Lab

The research group Digital Insight Lab has evolved significantly since it was founded in 2003. Originally known as Digital Memory Engineering (DME), the group carried out “domain-driven research,” which is to say, it operated in the digital library domain and developed technical solutions based on the requirements of digital libraries.

In the early years of the group, activities centred on the development of new methods of accessing digital cultural heritage. These were followed by domain challenges in digital preservation, that is, storing and providing access to digital information over very long periods (centuries). The group subsequently completed a number of research projects, including its flagship digital preservation project SCAPE, which was coordinated by AIT. The portfolio of project results from SCAPE includes scalable workflows, characterization tools, and quality assurance tools.

Over time it has become clear that "content management" alone no longer describes the scope of research in this area. Partners and customers still face the problems of an exponentially growing volume of digital assets, but the more pressing research questions now concern how to extract knowledge from these assets and valorise them. Combining this requirement with its goal of becoming technology-driven rather than domain-driven, the group concluded that its activities are best described by the emerging field of Data Science.

Data Science focuses on gaining insight from data by applying quantitative methods and techniques on scalable data processing and analytics infrastructures. Its research is applied and data-centred, spanning the entire data life cycle: from problem formulation, through data aggregation, analytics, and visualization, to the publication of data sets for reuse and reproducibility.

Data Science research within the group faces several challenges that call for a combination of skills drawn from computer science, statistics, and user experience design:

  • the ability to run algorithms over huge volumes of data by operating high-performance computing clusters and cloud-based infrastructures
  • the know-how required to apply, tune, and evaluate predictive analytics techniques
  • the creativity needed to design powerful visualizations and user interactions
  • the expertise to apply modern data publication and preservation strategies that enable efficient reuse of data sets, including over the long term