Partner in EHRI: Ontotext - Semantic Integration of Archives

Thursday, 6 October, 2016

Vladimir Alexiev

Ontotext is a Bulgarian software company working in the areas of Semantic Web, Linked Open Data, Semantic Text Analysis and Conceptual Search. We have worked on many EC research projects and commercial projects involving semantic repositories (databases) and semantic text enrichment (concept extraction and named entity recognition). Much of our business is with mass-media and publishing companies, such as the BBC, Euromoney, Press Association, Financial Times, IET, Oxford University Press and Wiley. We have also worked with GLAM (galleries, libraries, archives and museums) institutions and initiatives, such as the UK National Archives, the British Museum, one of the most important vocabulary institutions in Switzerland, Europeana, and DBpedia.

Holocaust research domain

We are very excited to work in the Holocaust research domain, since we believe our experience and technologies can be usefully applied to interconnect institutions, documents and subjects on the one hand, and people, organizations, places and historic events on the other.

EHRI involvement

We are involved in several EHRI working groups (or Work Packages):

  • In the working group on Resource Identification and Integration Workflows (WP10) we will work on a tool to convert various archive formats to EAD (Encoded Archival Description), the XML standard for archival descriptions, and perhaps to extract person information as EAC (Encoded Archival Context). We may also get involved in improving the processes for ingesting EAD into the portal, to enable use cases such as synchronization.
  • In the working group on Users and Standards (WP11) we will work on semantic (LOD) publication according to established standards, on elaborating the EHRI vocabularies (authorities), and on connecting text-only access points in ingested archival descriptions to authority access points. Currently fewer than 10% of access points link to authorities, and there is 10-20x duplication due to variations in spelling and punctuation (e.g. “Lodz” is spelt in at least 23 different ways!). We are planning to co-reference places to Geonames, so we can leverage the Geonames place hierarchy and geographic coordinates. This should also enable semantic (conceptual) search, to improve the discoverability and interconnectivity of archival descriptions.
  • In the working group on Research Data Infrastructures for Holocaust Material (WP13) we will work on building up domain knowledge bases by using semantic data and text integration, e.g. to provide the foundation for the research use case on building up Social Networks of people.
  • In the working group on Digital Historiography of the Holocaust (WP14) we will work on semantic text analysis (semantic enrichment) and researcher tools using Digital Historiography approaches, including Prosopographical approaches.
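
To illustrate the duplication problem mentioned under WP11, here is a minimal Python sketch of how text-only access points that differ only in case, diacritics or punctuation could be grouped under one key. It is an illustration under stated assumptions, not the actual EHRI or Ontotext method: the function names `normalize_key` and `group_variants` and the sample labels are hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize_key(label: str) -> str:
    """Build a comparison key that ignores case, diacritics and punctuation."""
    # Decompose accented characters (e.g. "ó" becomes "o" + combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The Polish "ł" has no Unicode decomposition, so map it explicitly.
    stripped = stripped.replace("ł", "l").replace("Ł", "L")
    # Lowercase and keep only letters and digits (drops spaces and punctuation).
    return "".join(ch for ch in stripped.casefold() if ch.isalnum())


def group_variants(labels):
    """Group labels that share the same normalized key."""
    groups = defaultdict(list)
    for label in labels:
        groups[normalize_key(label)].append(label)
    return dict(groups)


# Hypothetical sample of text-only access points (not real EHRI data).
sample = ["Lodz", "Łódź", "LODZ.", "Lódz", "Lodz,"]
print(group_variants(sample))
```

A key like this only catches surface variation. Alternative names for the same place (for example the German name Litzmannstadt for Łódź) would still need an authority such as Geonames to be recognized as the same entity.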

The image with this article symbolizes integrating databases, free text, and metadata of archival materials to interconnect objects and obtain networks of people, events, places, etc.

Image: ©Ontotext