
Czech Digital Library and the Kramerius Open Source System


Kramerius is the name of an open source database application which could be described as a content management system (CMS). The application is designed to provide access to digital documents either on a local network or over the Internet.

Kramerius is primarily intended for digitized library collections, monographs, and periodicals. However, it can also be used for other types of documents, such as maps, musical scores, and illustrations, and parts of documents, such as articles and chapters. The system is also suitable for documents that were created in electronic form [1].

Kramerius is one of several components in an unusual open source software infrastructure used in the Czech Republic for tasks related to digitizing, archiving, and making library content accessible. It is an example of the implementation of an important and extensive open source software solution funded from a national budget.

Digitization in the Czech Republic

One of the impulses that led to the creation of the Kramerius system was the floods of 2002, which damaged a large number of books in Czech archives and libraries. In the process of recovering the damaged documents it was decided to make digital copies publicly accessible.

This digitization process is now well under way in Czech libraries, the chief motivation being to protect historic documents and provide comfortable access for users. A number of libraries at various institutions in the Czech Republic are carrying out their own digitization projects, often with the help of the VISK grant program administered by the Czech Ministry of Culture. The main projects include the National Digital Library and the Czech Digital Library.

Within the Czech Digital Library project, Kramerius is supplemented by two other open source components:

  • RDflow
  • ProArc

Like the Kramerius system itself, both sub-projects are licensed under the GPL (see below). New features for all three components are planned and developed together.

Václav Matěj Kramerius [sidebar]

Václav Matěj Kramerius (1753–1808), after whom the Kramerius system is named, was a Czech writer, journalist, and publisher. At a time when there was only a single Czech newspaper in print, Kramerius started his own paper and, following its commercial success, established a printing shop and publishing house for Czech-language works. The majority of Czech-language books were published by his publishing house at that time.

RDflow

RDflow is a digitization workflow management program linked to the Digitization Registry, which is a system for keeping records of digitized documents and monitoring the digitization process. The Registry records all digitization activities related to library collections in the Czech Republic, helping to prevent duplication and aiding collaboration with other library information systems. A central installation is operated by the National Library of the Czech Republic at registrdigitalizace.cz [2]. RDflow communicates automatically with the Registry, and allows users to keep working on occasions when the Registry is unavailable.
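The article does not describe the interface between RDflow and the Registry. With that caveat, the following is a minimal sketch of the two behaviors it mentions. First, the client checks the Registry before scheduling work so that the same document is not digitized twice. Second, it queues registrations locally while the Registry is unreachable and sends them once the connection returns. All class and method names (`InMemoryRegistry`, `WorkflowClient`, `plan_digitization`) are hypothetical and do not come from RDflow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class RegistryUnavailable(Exception):
    """Raised when the (simulated) central registry cannot be reached."""


@dataclass
class InMemoryRegistry:
    """Hypothetical stand-in for the central Digitization Registry."""

    online: bool = True
    records: Dict[str, str] = field(default_factory=dict)  # document id -> library

    def _check(self) -> None:
        if not self.online:
            raise RegistryUnavailable("registry is offline")

    def is_registered(self, doc_id: str) -> bool:
        self._check()
        return doc_id in self.records

    def register(self, doc_id: str, library: str) -> None:
        self._check()
        # The first library to register a document keeps the record.
        self.records.setdefault(doc_id, library)


@dataclass
class WorkflowClient:
    """Hypothetical RDflow-like client that keeps working while the registry is down."""

    registry: InMemoryRegistry
    library: str
    pending: List[str] = field(default_factory=list)

    def flush(self) -> None:
        """Send queued registrations; stops (keeping the queue) if the registry is offline."""
        while self.pending:
            self.registry.register(self.pending[0], self.library)  # may raise
            self.pending.pop(0)

    def plan_digitization(self, doc_id: str) -> str:
        """Return 'registered', 'duplicate', or 'queued'."""
        try:
            self.flush()
            if self.registry.is_registered(doc_id):
                return "duplicate"  # already digitized or planned elsewhere
            self.registry.register(doc_id, self.library)
            return "registered"
        except RegistryUnavailable:
            self.pending.append(doc_id)  # record locally and sync later
            return "queued"
```

In this sketch, queued items are sent without a duplicate check. If another library registered the same document in the meantime, the earlier registration wins. A production system would presumably need an explicit step to reconcile such conflicts.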

ProArc

ProArc is a production and archiving system that has been developed with the support of the Ministry of Culture since 2011. The program is used to create digital documents in compliance with standards set by the National Library of the Czech Republic. ProArc handles the archiving of documents for long-term preservation (LTP) using accepted standards such as the OAIS (Open Archival Information System) reference model, published as ISO 14721.

Digitization Registry [sidebar]

The Digitization Registry is a joint project of the National Library of the Czech Republic, the Library of the Academy of Sciences and the Czech IT company INCAD. The aim of the initial project was to create a national registry of digitized documents which would serve to control the digitization workflows of individual institutions, avoid duplication and facilitate the sharing of the results.

With the scheduled transition to mass digitization using a robotic scanner system, the registry will be expanded to include automated data harvesting and subsequent processing and recording. The system is based on an RIII (J2EE) application framework, and the data is stored in an Oracle relational database. End users obtain the latest information about the state of digitization through the Fast search engine. All user access is through a web application.

When creating the system, the emphasis was on robustness and ease of extensibility. The solution builds on other systems used in libraries: Aleph, Kramerius, and Sirius. The implementation team is currently working on version R4, which will be available as open source. The solution is compatible with other Central European registries which are cooperating on the digitization of documents [3].

National Digital Library

The National Digital Library is a common term for the activities of the National Library of the Czech Republic that focus on digitization and on subsequent projects to make the treasures of the nation's libraries accessible. The most important of these activities are included in a project called Creating a National Digital Library, which is funded, among other sources, from the Structural Funds of the European Union.

The goal of this project is the digitization, long-term preservation, and accessibility of most of the holdings of the National Library of the Czech Republic and the Moravian Library. With the help of robotic scanners, over 50 million pages, or approximately 300 thousand volumes, should be digitized by 2019.
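As a quick consistency check, the two targets quoted for the project imply an average volume length of roughly 167 pages:

```latex
\frac{50\,000\,000\ \text{pages}}{300\,000\ \text{volumes}} \approx 166.7\ \text{pages per volume}
```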

The conditions of the project require a major change in the digitization procedures and metadata formats used to date. The existing, proprietary, closed document type definitions (DTD) will be replaced by new, standardized formats compatible with those used in other libraries and similar projects concerning digitization and long-term preservation of digital data. The deployment of the new version 4 of Kramerius will accompany this change-over.

Czech Digital Library

The Czech Digital Library is an umbrella project whose main goal is to aggregate digital (and especially digitized) content from all kinds of digital libraries in the Czech Republic. It is not concerned directly with the process of digitization; instead its purpose is to pool digital documents and provide an infrastructure for accessing the digital resources of all Czech libraries, including data acquired as a part of the National Digital Library project.

The resulting system will collect metadata and make it available through a unified full-text search interface. Because the system will allow complete text sources to be loaded dynamically, it will not be limited to indexing metadata. The Czech Digital Library project will also serve as the Czech Republic's aggregator for the Europeana portal.
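The article does not say how the Czech Digital Library implements its unified search. As a generic illustration of searching records pooled from several libraries, the following sketch builds a minimal inverted index. In the index, each word maps to the set of records that contain it, and each record is identified by its contributing library plus that library's local record ID. The library codes and record IDs in the usage example are illustrative only.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A record is identified across all contributing libraries by this pair.
RecordKey = Tuple[str, str]  # (library code, local record id)

TOKEN = re.compile(r"\w+")  # in Python 3, \w also matches Czech letters such as "ř"


def tokenize(text: str) -> List[str]:
    """Split text into lower-cased word tokens."""
    return [t.lower() for t in TOKEN.findall(text)]


class AggregatedIndex:
    """Minimal inverted index over records pooled from several libraries."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[RecordKey]] = defaultdict(set)

    def add(self, library: str, record_id: str, text: str) -> None:
        """Index one record's text under its (library, record id) key."""
        for token in tokenize(text):
            self.postings[token].add((library, record_id))

    def search(self, query: str) -> Set[RecordKey]:
        """Return the records that contain every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

For example, after indexing `("NKP", "001", "Pražské noviny")` and `("MZK", "042", "Brněnské noviny")`, the query `"noviny"` returns both records, while `"brněnské noviny"` returns only the second.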

EU perspective [sidebar]

The European Commission promotes a program called the Digital Libraries Initiative. This aims to bring together all digital library resources from member states for the purposes of research and study [4].

Europeana.eu is a web portal which provides a single access point to digitized records from numerous European cultural and scientific institutions. The database includes millions of books, paintings, films, museum artifacts, and archive documents. The project, which is funded through various programs of the European Commission and by ministries of education and culture of individual EU states, is available now, though it is still in development. The Czech content will be provided by the Czech Digital Library.

Development and Future of the Kramerius System

The Kramerius development team comprises employees of:

  • the Library of the Academy of Sciences of the Czech Republic (ASCR),
  • the National Library of the Czech Republic, and
  • the Moravian Library in Brno.

The project's technology partner, chosen by public tender, is INCAD, s. r. o., led by Pavel Kocourek. The project's guarantor is the Library of the Academy of Sciences of the Czech Republic, represented by its director Martin Lhoták.

The Kramerius system, including RDflow and ProArc, is distributed under the GPL (GNU General Public License). The GPL was chosen because it is recognized as a de facto standard among free-software licenses and because the creators of the software wanted to avoid any ambiguity over its free and open nature.

According to Martin Lhoták, the decision to use an open source license was a good choice, especially considering that the software will be used in libraries: "Kramerius was created to fit our own needs right from the start, as no other suitable solution was available at the time, either commercial or freely available. The project was founded with funding for an indeterminate period of time. When this system for making digital and digitized documents accessible was created, other libraries took into consideration the fact that it had been created by the Library of the Academy of Sciences of the Czech Republic and that the same institution was using it for its own documents. They considered this a good reason to start adopting the software for their own needs. Trust in open source products is based on who develops them and how many users already use them."

Development was started by the National Library of the Czech Republic in 2003. The Library of the ASCR has contributed to the project right from the start, and took over as the main development coordinator in the project's first year. Another landmark in 2003 was the selection of the open source Fedora repository as the basis of the system. INCAD was subsequently chosen to continue the development. The current version of the system is Kramerius 4.

Pavel Kocourek of INCAD explains: "We have been following the digital library community, and out of the several possible solutions we believed at the time that a system based on Fedora was the most suitable for our purposes. We still believe that it was a good choice."

Fedora Repository Project [sidebar]

Fedora (Flexible Extensible Digital Object Repository Architecture) was developed by researchers at Cornell University as a flexible, extensible architecture for storing, managing, and accessing digital content in the form of digital objects. Fedora defines a set of abstractions for expressing digital objects, asserting relationships between them, and linking "behaviors" (i.e. services) to digital objects.
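To make the three abstractions concrete, the following toy model is a sketch only and is not Fedora's API. In it, a digital object is a persistent identifier plus named datastreams. Relationships are stored as (subject, predicate, object) triples, in the spirit of the RDF support described in this sidebar. A "behavior" is a named function that produces output from an object. The identifiers `demo:book`, `hasPage`, `TEXT`, and `fullText` in the usage example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject PID, predicate, object PID)


@dataclass
class DigitalObject:
    """A digital object: a persistent identifier plus named datastreams."""

    pid: str
    datastreams: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Repository:
    """Toy repository illustrating objects, relationships, and behaviors."""

    objects: Dict[str, DigitalObject] = field(default_factory=dict)
    relations: Set[Triple] = field(default_factory=set)
    behaviors: Dict[str, Callable[[DigitalObject], bytes]] = field(default_factory=dict)

    def ingest(self, obj: DigitalObject) -> None:
        self.objects[obj.pid] = obj

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        """Assert a relationship between two objects that are already ingested."""
        if subject not in self.objects or obj not in self.objects:
            raise KeyError("both objects must be ingested first")
        self.relations.add((subject, predicate, obj))

    def related(self, subject: str, predicate: str) -> List[str]:
        """Objects linked from `subject` by `predicate`, in sorted order."""
        return sorted(o for s, p, o in self.relations if s == subject and p == predicate)

    def disseminate(self, pid: str, behavior: str) -> bytes:
        """Run a named behavior (service) against an object."""
        return self.behaviors[behavior](self.objects[pid])
```

For instance, the following steps model a monograph linked to its pages and a full-text service:

1. Ingest a book object and two page objects.
2. Relate the book to each page with `hasPage`.
3. Register `fullText` as `lambda o: o.datastreams.get("TEXT", b"")`.

`related("demo:book", "hasPage")` then lists the pages, and `disseminate("demo:p1", "fullText")` returns that page's text.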

The Fedora Repository Project implements the Fedora abstractions in a robust open source system. Fedora provides a core repository service that is exposed as web-based services with well-defined APIs. In addition, Fedora provides an array of supporting services and applications including search, OAI-PMH, messaging, administrative clients, and more. Fedora provides RDF support and the repository software is integrated with semantic triple store technology, including the Mulgara RDF database. Fedora includes features which help to ensure the durability of digital content [5].
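One of the supporting services listed above is OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting, which aggregators use to collect metadata from repositories. The following sketch relies only on the generic OAI-PMH 2.0 protocol. It builds `ListRecords` requests for Dublin Core (`oai_dc`) metadata and extracts record identifiers, titles, and the `resumptionToken` used for paging. It does not describe any specific Kramerius endpoint. The base URL in the usage note is a hypothetical placeholder, and network access is left out so that the parsing can be tried on a saved response.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core metadata."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: no metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, List[str]]], Optional[str]]:
    """Return (identifier, titles) pairs and the resumption token, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in record.iterfind(".//dc:title", NS)]
        records.append((identifier, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token marks the last page
```

A harvester would request `list_records_url("https://repo.example.org/oai")` (hypothetical endpoint), parse the response, and keep requesting with the returned token until `parse_list_records` reports `None`.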

As of 2012, the development of Kramerius has funding through to 2015 thanks to the Czech Digital Library and Tools for Complex Digitization Processes project, which is part of the Program of Applied Research and Development for National and Cultural Identity operated by the Czech Ministry of Culture.

The pre-approved amount to be spent on further development within the Czech Digital Library project (which includes the development of Kramerius, RDflow, and ProArc) is CZK 20 million (approximately €800,000) [6]. This investment enables libraries, archives, and universities to manage the entire digitization process, from start to finish, for the majority of the important printed documents held by Czech institutions. Given that scope, it is a remarkably economical achievement.

Features

The simple set of features originally requested has been extended in recent years through several development projects. Kramerius is continuously fine-tuned to ensure that the structure of the metadata used for digital documents corresponds with the standards published by the National Library of the Czech Republic [7].

The system's web interface lets users perform metadata and full-text searches, generate multi-page PDF documents from selected pages, create virtual collections, and carry out other operations on top of the stored collection of digital documents.

Who Uses Kramerius

The Kramerius system is currently the main and most frequently used application solution for providing access to digital documents in large Czech libraries and archives. Besides the Library of the Academy of Sciences of the Czech Republic and the National Library of the Czech Republic, Kramerius is used by a number of public libraries and archives, including the Moravian Library in Brno, the Municipal Library in Prague, the National Film Archive and many other important institutions [8].

Even though libraries do not, in general, oppose open solutions, Kramerius has the additional advantage of being backed by the Library of the Academy of Sciences of the Czech Republic and the National Library of the Czech Republic. The fact that these well-known organizations stand behind the system helps persuade other libraries and organizations to try it.

Martin Lhoták summarizes the decision-making process for (open source) software: "When deploying an open source solution, you have two choices: start developing the software for your own needs, set aside some money from the budget for the project, and later offer the product to others, or use an existing project that fits your needs. When considering the latter option, you need to see a lively community supporting the project and a critical mass of users – a sufficiently broad user base. We would hesitate to use an open source project for a critical application if we knew that it was being developed by a single organization and not used by anyone else. In that case we would require a long period of testing and an assessment of whether the product would be further developed in the future. Unless a system has enough users, there is always the risk of the product losing funding and becoming a dead end."

Selected users of the Kramerius system in the Czech Republic

  • Academy of Sciences
  • Moravian-Silesian Research Library, Ostrava
  • Moravian Library
  • Research Library, Olomouc
  • State Technical Library
  • František Bartoš Regional Library Zlín
  • Mendel University of Agriculture and Forestry, Brno
  • The Research Library of South Bohemia, České Budějovice
  • National Film Archive
  • National Medical Library
  • Municipal Library, Prague
  • The North Bohemian Research Library
  • Library of the Theatre Institute

Open Source and Digitization in the Czech Republic

The massive, coordinated digitizing effort currently under way in the Czech Republic relies on a set of sophisticated open source tools which help to control and administer the entire workflow, starting with process management and ending with the presentation of the final product. All the components of the system have been developed by the public-sector organizations involved in the project.

The institutions which create the tools used for digitization choose open source licensing for their products mainly on account of the well-known characteristics of the open development and distribution model: low acquisition costs for individual institutions and organizations, prevention of vendor lock-in, flexibility arising from the openness of the source code, and a guarantee of continuity.

The fact that the entire infrastructure of the extensive software ecosystem developed and used in this endeavor was conceived as an open source environment means that all the components, as well as the entire system, can be reused in similar or different scenarios, or modified as necessary to fit future requirements. The open source development model is the only method of achieving such results.

Other notable Czech projects related to open source in libraries [sidebar]

Opensource.knihovna.cz: The main aim of this portal is to promote expert discussion of open source software as an alternative to proprietary solutions in libraries. The founders of the project believe that the ongoing economic recession and the consequent cuts in public library budgets present an opportunity for open source to prove its worth by helping to maintain or even enhance the quality of services while reducing costs. To that end, the portal seeks out, catalogues, and introduces quality applications, solutions, and tools with the potential to aid librarians in their mission to provide access to knowledge and information [9].

 

Sources

[1] Kramerius 4. In: Kramerius system [online]. c2012 [cit. 2012-08-24]. Available from: http://code.google.com/p/kramerius/.
[2] RegistrDigitalizace.CZ. In: RegistrDigitalizace.CZ [online]. c2012 [cit. 2012-08-24]. Available from: http://www.registrdigitalizace.cz/rdcz/.
[3] VANDASOVA, Anna. About digitalization registry. In: Kramerius Information Portal [online]. Praha: National Library of the Czech Republic. c2008–2012, upd. 2012-11-06 [cit. 2012-11-07]. Available from: http://kramerius-info.nkp.cz/digitalization-in-czech-republic/references-to-kramerius-systems/view?set_language=en.
[4] Digital Agenda for Europe: Digital Libraries Initiative. In: ec.europa.eu [online]. Luxembourg: European Commission, Information Society and Media DG, Access to Information Unit. c2012 [cit. 2012-12-23]. Available from: http://ec.europa.eu/information_society/activities/digital_libraries/index_en.htm.
[5] Fedora Commons Repository Service. In: fedora-commons.org [online]. Winchester: Fedora Commons, Inc. c2012 [cit. 2012-08-24]. Available from: http://www.fedora-commons.org/.
[6] LHOTÁK, Martin. Systém Kramerius jako řešení pro Českou digitální knihovnu. In: skipcr.cz [online]. Praha: SKIP. c2012, upd. 2012-11-29 [cit. 2012-12-23]. Available from: http://www.skipcr.cz/dokumenty/akm-2012/Lhotak.pdf.
[7] HUTAŘ, Jan. Nové standardy digitalizace (od roku 2012). In: Národní digitální knihovna [online]. Praha: Národní knihovna ČR. c2001–2012, upd. 2012-06-01 [cit. 2012-08-24]. Available from: http://ndk.cz/digitalizace/nove-standardy-digitalizace-od-roku-2011.
[8] Links to the Kramerius System. In: Kramerius Information Portal [online]. Praha: National Library of the Czech Republic. c2008–2012 [cit. 2009-08-24]. Available from: http://kramerius-info.nkp.cz/digitalization-in-czech-republic/references-to-kramerius-systems/view?set_language=en.
[9] DENÁR, Michal. O projektu. In: Opensource.knihovna.cz [online]. Brno: Opensource.knihovna.cz. c2012, upd. 2012-09-05 [cit. 2012-12-23]. Available from: http://opensource.knihovna.cz/index.php/o-projektu.

Information

Case type:
Open source case study
Themes:
Business and Competition, Communications, Finance, Free Movement of Capital, Labour Market, Marketing, Monetary Economics, Prices, Social Questions, Trade Policy