"If this Portal could talk..." | EHRI's Veerle and Herminio Talk About Data Identification and Integration
The EHRI Portal offers access to information on Holocaust-related archival material held in institutions across Europe and beyond. It is one of the main achievements of the European Holocaust Research Infrastructure (EHRI). The EHRI Portal, an ever-growing resource, now enables you to browse 418,164 archival descriptions in 807 institutions. It also has information on 2,234 archival institutions in 60 countries, and 63 national reports provide an overview of the history of the Second World War and the Holocaust, as well as of the archival situation, in the countries covered. This makes it hard to believe that it all started in 2010 with an Excel spreadsheet. Veerle Vanden Daelen and Herminio García-González lead the identification of data and its integration into the EHRI Portal. Time to ask them how this works.
Veerle and Herminio, you work on Data Identification and Integration within EHRI. Could you explain what that means and what you do, also for people who may have no idea about it?
Herminio: As your question suggests, incorporating new archival descriptions into the EHRI Portal is a two-step process. We start by identifying institutions that hold Holocaust-relevant collections (data identification). Then we contact them to explain what the EHRI project is, what the goals of the EHRI Portal are, and how they benefit from integrating their collections into it. Afterwards, the conversation moves to a more technical level (data integration): we decide how their descriptions can be imported into the EHRI Portal as automatically as possible. Finally, once all these details are settled, we design the data mapping (the translation from their format to ours) and ingest the descriptions into a testing environment. There, the institution’s representatives can request changes and approve the final shape of their data within the EHRI Portal. Once both sides agree on the results, everything is moved to the EHRI Portal and made publicly available.
Veerle: What also happens, and that is really nice, is that institutions contact EHRI asking how they can become part of the EHRI Portal. The procedure is the same as the one Herminio described, but working with a partner who is already motivated and interested is always very rewarding. With institutions that EHRI contacts, it is equally fulfilling to see them get to know the project and our work, and to see them happy with the results.
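To give a more concrete feel for the data mapping step Herminio describes, here is a minimal sketch in Python. It assumes a provider exports each collection as a flat record (for example, one spreadsheet row) with hypothetical fields `ref`, `title`, `dates` and `scope`, and translates it into a bare-bones document in EAD (Encoded Archival Description, the standard the EHRI Portal uses) with the standard library's `xml.etree.ElementTree`. The field names, the mapping and the `record_to_ead` function are illustrative assumptions, not EHRI's actual tooling, and the output is not validated against the EAD schema.

```python
import xml.etree.ElementTree as ET

# EAD 2002 namespace; the output here is a minimal, unvalidated skeleton.
EAD_NS = "urn:isbn:1-931666-22-9"
ET.register_namespace("", EAD_NS)

# Hypothetical column names in a provider's flat export, mapped to EAD <did> children.
# This mapping is an illustrative assumption, not EHRI's actual mapping.
FIELD_MAP = {
    "ref": "unitid",
    "title": "unittitle",
    "dates": "unitdate",
}


def _q(tag: str) -> str:
    """Qualify a tag name with the EAD namespace."""
    return f"{{{EAD_NS}}}{tag}"


def record_to_ead(record: dict, repository: str) -> ET.Element:
    """Translate one flat source record into a minimal EAD document."""
    ead = ET.Element(_q("ead"))

    # <eadheader> carries an identifier and a title statement.
    header = ET.SubElement(ead, _q("eadheader"))
    ET.SubElement(header, _q("eadid")).text = record["ref"]
    filedesc = ET.SubElement(header, _q("filedesc"))
    titlestmt = ET.SubElement(filedesc, _q("titlestmt"))
    ET.SubElement(titlestmt, _q("titleproper")).text = record["title"]

    # <archdesc> holds the description; <did> holds the core identification.
    archdesc = ET.SubElement(ead, _q("archdesc"), level="fonds")
    did = ET.SubElement(archdesc, _q("did"))
    ET.SubElement(did, _q("repository")).text = repository
    for source_field, ead_tag in FIELD_MAP.items():
        value = record.get(source_field)
        if value:  # skip fields the provider left empty
            ET.SubElement(did, _q(ead_tag)).text = value

    if record.get("scope"):
        scopecontent = ET.SubElement(archdesc, _q("scopecontent"))
        ET.SubElement(scopecontent, _q("p")).text = record["scope"]
    return ead


if __name__ == "__main__":
    # Entirely made-up example record.
    example = {"ref": "EX-001", "title": "Example fonds", "dates": "1940-1945",
               "scope": "Correspondence and lists (fictitious example)."}
    print(ET.tostring(record_to_ead(example, "Example Archive"), encoding="unicode"))
```

In a real workflow, a file like this would then be ingested into the testing environment for the institution to review before anything is made public.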
The EHRI Portal gives access to Holocaust-related sources. What is the result of your work? How does it change the Portal, and what is its impact?
Veerle: Well, first and foremost, the EHRI Portal aims to provide as much information on Holocaust-related archives as possible, so that researchers, be they professional academics, teachers, family members of victims, or the broader public, can find the most complete information possible on their topic in one central place. As Holocaust sources are scattered all over the world and have not always ended up where one would expect, bringing all this information together is invaluable to everyone researching the Holocaust.
Herminio: Indeed, the main result is that after each data identification and integration process, we can offer more archival descriptions to EHRI Portal users. However, there are other effects that may not be so evident at first sight: 1) contributing institutions gain visibility through the EHRI Portal, as users can find their collections more easily and are then redirected to the institution’s own web catalogue, or even to its contact details to visit in person; 2) collections are contextualised by being accessible alongside other Holocaust sources, or even interlinked with them (e.g., copy linking, which connects descriptions of originals and their copies); 3) collections can be enriched in various ways, for example by linking their access points to the EHRI vocabularies, which makes them more searchable in different languages.
The EHRI Portal now has 418,164 archival descriptions in 807 institutions. Have you been in touch personally with each of these 807 institutions? How has this been achieved?
Herminio: Well, that would be almost impossible! This is an ongoing effort that started in EHRI’s first phase of funding (2010-2015) and has involved many different people. While we always try to give preference to automatic data integration methods, in many cases this is not possible, so a lot of manual work is still needed. Some institutions, for instance, decide to enter their data manually and may be in touch with different people within the project. We also rely on local experts, working closely with regional hub leaders, to incorporate archives that are not easily reachable.
Veerle: Indeed, having been involved in EHRI’s data identification and integration almost from the very start, I am familiar with many of the names and institutions that have contributed to the EHRI Portal. And as Herminio mentions, it is not always one person or one institution providing the information for a given institution: besides the local experts we work with, there are also aggregators, often at a national level but sometimes focused on a specific topic, with whom we work to integrate collection descriptions from various institutions into the EHRI Portal. For Belgium and the Netherlands, for example, much of the work has been done via such national aggregators. As long as the aggregators keep their information up to date, this is also a feasible way to integrate the information we wish to see in the EHRI Portal.
Veerle, as you mentioned, you have been working for EHRI Data Identification and Integration since the start. Could you share a little about the history of this process?
Veerle: Sure! When I started in April 2011, EHRI was only a couple of months old. There was no EHRI Portal yet. All I received on my desk was an Excel overview, provided to us by the Claims Conference, with names and contact details of Holocaust-related archives. Within an international consortium of historians, archivists, digital humanists and IT specialists, we discussed how an online search tool could provide information on all these archives and their holdings. The result of many lively and very interesting discussions is what is now the EHRI Portal. The Portal was launched and very proudly presented in Berlin at the event concluding EHRI’s first phase of funding (2010-2015). In its second phase of funding (2015-2019), tools were developed to enable institutions to create valid EAD (Encoded Archival Description, the standard we use within the EHRI Portal and to which we map all collection descriptions that reach us) and to publish their descriptions in the EHRI Portal and wherever else they wish. We also worked on connecting related collections to each other in the Portal. Holocaust archives are among the most copied archives in the world, so connecting different descriptions of the same materials, whether originals or copies, can help both researchers and archivists tremendously to understand their context and content. This became all the more important as we grew increasingly aware of how these descriptions are shaped by the time and place in which they were written. All of these tools and methodologies have since been fine-tuned and continue to be developed in our current work, and all this time the Portal has been growing significantly, both in numbers and in the quality and sustainability of the information we provide. It is really quite amazing to see where we started and where we stand today.
It makes me very eager to continue the work and curious to see where we will stand in a couple of years from now and beyond!
Why is your work important, both for users of the Portal (for example, Holocaust researchers) and for data providers such as archives?
Herminio: As mentioned before, the EHRI Portal gives archives the opportunity to gain visibility and to contextualise their collections within a large network of Holocaust sources. In addition, they can join a large network of archives working on Holocaust material. For Portal users, it greatly eases the exploration phase when looking for relevant archival material. Instead of searching across different archives, web catalogues and systems (each of which you need to learn how to use), you get a unified view and can search through the EHRI Portal, which speeds up this process considerably. Of course, we do not have everything in the EHRI Portal, but it is always a good entry point before moving on to more specific searches.
Veerle: For the data providers, most often archivists, I think it is also really rewarding to see their meticulous work receiving more attention and visibility. At Kazerne Dossin, the EHRI partner institution where Herminio and I both work, we too notice that our holdings are now discovered by many more people than before. Moreover, it is a very clear reward for all the effort that goes into moulding the information on historical records and objects into standardised formats. By merging information from so many different institutions in the EHRI Portal, the data providers can really feel the benefits of this unified approach, which allows the holdings they care for to be in dialogue with others. It really creates a community and a great many connections that would otherwise not be possible.
Do you travel a lot for this work, or is it mostly done behind a computer?
Herminio: At the beginning, the EHRI Data Integration Lab was conceived as the EHRI Mobile Data Integration Lab, just as the COVID-19 pandemic was at its peak. Perhaps because of this mishap, we realised that much of the work and many of the contacts were possible from behind a computer, and a great number of data integration cases were completed in this manner. Nevertheless, the lab is still a mobile one, meaning that we can still go in situ to advance data integration processes if this is deemed necessary. For example, in our regional hub, the migration countries hub, we have travelled to Israel and Canada to make first contact with archives, present the benefits of the EHRI Portal and try to incorporate their holdings into it.
Veerle: Indeed, a lot is possible from a distance, but the magic happens most when people really meet. It is actually one of the paradoxes of virtual teams described in management literature: virtual teams work best if they also meet in person. This is equally true for the EHRI Portal and the people we work with. The value of an in-person introduction, or of a face-to-face meeting for a more complex step in the process, cannot be overstated, especially since this is what often leads to very effective and efficient online exchanges afterwards.
What have you been working on most recently?
Veerle: Well, speaking of travelling, we have worked intensively on Canadian and Israeli archives during in-person meetings at the archives themselves, and that has been very special and successful. Another focus has been to ensure that the whole EHRI consortium leads by example and works with us to get their metadata into the Portal in as sustainable a way as possible. We strongly believe in leading by example while continuously reaching out to new people and institutions.
Herminio: It is difficult to highlight just one thing, as many things run in parallel. However, following on from the last question, there are some nice recent results from the Canada trip in the form of new archival descriptions ingested into the EHRI Portal for institutions such as the Alex Dworkin Canadian Jewish Archives and the Ottawa Jewish Archives. In Israel we had some legal difficulties that are now starting to be resolved, and we will soon see some collections from The Wiener Library for the Study of the Nazi Era and the Holocaust at Tel Aviv University integrated into the EHRI Portal.
What have been (or still are) the major challenges?
Herminio: Among all the challenges we face, I can identify two main ones. The first is communication: how to promote the EHRI Portal and convince new institutions to participate in it. This is difficult not only because of the promotion itself, but also because for many institutions this is not their core business, and they collaborate with us in their spare time.
Veerle: Indeed, a national archive, for example, holds an enormous variety of sources, and the Holocaust is just one of the very many topics it covers. Such archives are overloaded with requests from all sides and did not have this as a specific “to do” in their own planning before EHRI knocked on their door, so it is a bit more of a challenge for them to find staff available to work with us on this topic when we ask.
Herminio: The second challenge is more technical. We are dealing with many different systems with different capabilities, and in many cases with functionalities that archivists do not use on a daily basis (e.g., export options) or technical aspects of their system that they are not familiar with (e.g., endpoints, APIs, etc.). It is therefore always difficult to communicate about these aspects and reach a common understanding.
Veerle: And I guess a third challenge would be the expectations of the users. EHRI authors some of the information in the Portal, but mostly brings together the information it receives. This means that the metadata provided are very diverse and differ considerably from one another. This is not always easily understood by users of the Portal, but at the same time I think it is also a very good learning opportunity for researchers who are not archivists: they come to understand that metadata are, in their own way, research results as well, created in their own historical and institutional context. So, let’s turn that last challenge into a learning experience.
What have been your greatest achievements?
Herminio: I would not like to single out a specific case, as all the cases we have handled are equally important. Instead, I would like to highlight that since the EHRI Data Integration Lab started operating (almost two years ago), we have integrated data from fifteen data providers in an automatic way, and we still have more cases under testing. All in all, I think this number is in itself a great achievement.
Veerle: The Portal and its community. If this Portal could talk and tell us about how it has brought knowledge and expertise to both data providers and researchers and how it has created connections between information and people, that would be a wonderful story to hear!
What do you enjoy most about your work? And what is your least favourite aspect?
Herminio: I find it particularly enjoyable when people think that integrating their data will be extremely difficult, but we then design a data integration workflow that requires very little effort on their side. They are very happy and grateful (and sometimes even surprised) with the solution we deliver, but that is what technology should be about: making people’s lives easier. As for my least favourite aspect, I would point to the number of messages that need to be exchanged to introduce the project, explain all aspects of the data integration and polish the different details. Sometimes it is hard to keep track of everything.
Veerle: My least favourite aspect had nothing to do with the project itself, but was without a doubt the period of the pandemic. While we have always done, and always will do, the bulk of our work from a distance and online, the “online only” mode and the lack of in-person contact was really draining. So it surely comes as no surprise that the most enjoyable part of the work is meeting people and getting to know their work, their institutions and the sources they preserve and open up for research, and then, after going through the data integration process, seeing all this reflected in the EHRI Portal. When we met with several Canadian archives last year, it was their first introduction to the European Holocaust Research Infrastructure. We were overwhelmed by the work they do and by their readiness to cooperate with us, so I am very happy that six visits last year have already resulted in four archives sharing their metadata in the EHRI Portal, with the other two likely to follow soon.
For less professional users of the Portal, terms such as LOD (Linked Open Data) or VMT (Vocabularies Matching Tool) are quite a mystery. Could you explain what they mean in layman’s terms and what their benefits are?
Herminio: Both refer to new things we are introducing as part of EHRI Work Package 9 on Data Identification and Integration. The VMT is a tool for matching a list of access points (i.e., keywords in archival descriptions) to terms in the EHRI vocabularies. The tool presents the user with a list of possible matches, so that the user makes the final decision on which links are created. The main aim is to increase the number of access points linked to EHRI vocabulary terms while reducing the effort needed on the user’s side. Linked Open Data (LOD) refers to a way of representing data that allows other data providers to link to and from our data, with the ultimate goal of making data navigable and interoperable no matter where it is located. This is quite appealing because it mirrors the way we, as humans, navigate the web, which, unfortunately, machines cannot do in the same way. Using Linked Open Data, we can offer the same experience not only to humans but also to machines. This would enable new use cases, such as expanding the information in the EHRI Portal with information from other data providers without having to host that data in our database. As you can imagine, this has the potential to change how we understand and run data integration activities, and that is exactly what we are exploring in the EHRI Portal.
Veerle: I would first like to change “less professional users” into “otherwise professional users”! A Portal that connects so many pieces of information wants to ensure that everything is as connected as possible. During my years in the project, I have become a sort of interpreter between various jargons, some of them very job-specific. Every time I hear something I do not know yet, another new term or abbreviation, I try to repeat in my own words what I understood from the expert’s explanation, asking them to correct me if I get it wrong. So for me, these magic words stand for making sure that everything is as interconnected and shareable as possible, so that the information reaches its target audiences in the best possible way.
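The core idea behind the VMT as Herminio describes it, proposing candidate vocabulary terms for each access point and leaving the final decision to a person, can be sketched roughly as follows. This is a hypothetical illustration using simple string similarity from Python's standard `difflib` module; the actual VMT may use entirely different matching techniques, and the function names, term identifiers and multilingual labels below are made up.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(label: str) -> str:
    """Strip accents and case so 'Déportations' and 'deportations' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold().strip()


def suggest_matches(access_point: str,
                    vocabulary: dict[str, list[str]],
                    threshold: float = 0.75,
                    limit: int = 3) -> list[tuple[str, str, float]]:
    """Rank vocabulary terms whose labels resemble an access point.

    `vocabulary` maps a (hypothetical) term id to its labels in several
    languages. Returns up to `limit` (term_id, best_label, score) tuples,
    best first; a human reviewer decides which link, if any, to create.
    """
    target = normalise(access_point)
    best: dict[str, tuple[str, float]] = {}
    for term_id, labels in vocabulary.items():
        for label in labels:
            score = SequenceMatcher(None, target, normalise(label)).ratio()
            # Keep only the best-scoring label per term, above the threshold.
            if score >= threshold and score > best.get(term_id, ("", 0.0))[1]:
                best[term_id] = (label, score)
    ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    return [(term_id, label, round(score, 2)) for term_id, (label, score) in ranked[:limit]]


if __name__ == "__main__":
    # Made-up vocabulary entries with multilingual labels.
    vocabulary = {
        "term-001": ["Deportations", "Déportations", "Deportaties"],
        "term-002": ["Ghettos", "Getto's"],
        "term-003": ["Children", "Enfants", "Kinderen"],
    }
    for access_point in ["deportation", "Ghetto", "Kinder"]:
        print(access_point, "->", suggest_matches(access_point, vocabulary))
```

Because labels in several languages are compared, an access point written in Dutch or French can be linked to the same term as an English one, which is what makes linked descriptions more searchable across languages.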
What is in store for the future? Will data integration ever be finished?
Veerle: No, never. But that is not a problem, on the contrary. This is a process, a process that never ends, and always keeps improving and evolving. That is actually the exciting part about it, we cannot foresee now what the possibilities will be in a year or a decade or a century from now…
Herminio: We will continue reaching out to more institutions and, hopefully, integrating their data into the EHRI Portal. Most probably, in the next EHRI phases, and once EHRI becomes a permanent organisation, we will see a paradigm shift with the introduction of the EHRI National Nodes, which will somewhat decentralise our data identification and integration endeavours and potentially cover more archival holdings. Nevertheless, this task is by nature a never-ending one, as finishing it would mean that we have nothing new to discover.
Is there anything else you would like to add or share? Perhaps a nice anecdote?
Veerle: When my colleague Laurence Schram and I had our appointment at the Ontario Jewish Archives in Toronto, a fire alarm went off at the very start of our meeting. Once outside, following the instructions for this fire drill (luckily it was just an exercise), we started talking with more staff of the UJA (United Jewish Appeal) Federation of Greater Toronto, where the archives are located. To our great surprise, we met Joshua (Josh) Otis there, the grandnephew of Natan Ramet, the founder of our institution in Belgium, Kazerne Dossin. Josh’s grandfather, Ben Otis, had met Natan’s sister as an Allied Canadian soldier in Antwerp after the liberation, and the couple had settled in Canada. So the attempt to connect collections is really also connecting people, beyond anything we anticipated!