
CERMINE is a comprehensive open-source system for extracting structured metadata from scientific articles in a born-digital form. The system is based on a modular workflow, whose loosely coupled architecture allows for individual component evaluation and adjustment, enables effortless improvement and replacement of independent parts of the algorithm, and facilitates future expansion of the architecture. The implementations of most steps are based on supervised and unsupervised machine learning techniques, which simplifies the procedure of adapting the system to new document layouts and styles. The evaluation of the extraction workflow, carried out on a large dataset, showed good performance for most metadata types, with an average F score of 77.5%. The CERMINE system is available under an open-source licence and can be accessed at. In this paper, we outline the overall workflow architecture and provide details about the implementations of individual steps. We also thoroughly compare CERMINE to similar solutions, describe the evaluation methodology and finally report its results.

Academic literature is a very important communication channel in the scientific world. Keeping track of the latest scientific findings and achievements, typically published in journals or conference proceedings, is a crucial aspect of research work. Ignoring this task can result in gaps in knowledge of the latest discoveries and trends, which in turn can lower the quality of the research, make the assessment of results much harder and significantly limit the possibility of finding new and interesting research areas and challenges. Unfortunately, studying scientific literature, and in particular staying up to date with the latest publications, is difficult and extremely time-consuming. The main reason for this is the huge and constantly growing volume of scientific literature, and also the fact that publications are mostly available in the form of unstructured text.

Modern digital libraries support the process of studying the literature by providing intelligent search tools, proposing similar and related documents, building citation and author networks, and so on. In order to provide such high-quality services, the library requires access not only to the sources of stored documents, but also to their metadata, including information such as title, authors, keywords, abstract or bibliographic references. Unfortunately, in practice good-quality metadata is not always available; sometimes it is missing, full of errors or fragmentary.
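The idea of a modular, loosely coupled workflow whose steps can be evaluated and replaced independently can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Workflow` class, the toy steps `split_lines`, `first_line_title` and `keywords_from_marker`, and the `f_score` helper are hypothetical names and do not reflect CERMINE's actual API. The toy steps are simple rules rather than the machine learning models the system uses, and the F score here is the usual harmonic mean of precision and recall computed over sets of items.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A document is modelled as a plain dict that each step enriches.
Document = Dict[str, object]
Step = Callable[[Document], Document]


@dataclass
class Workflow:
    """Ordered list of named steps; each one can be swapped on its own."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, step: Step) -> "Workflow":
        self.steps.append((name, step))
        return self

    def replace(self, name: str, step: Step) -> None:
        # Swap a single step without touching the rest of the workflow.
        for i, (existing, _) in enumerate(self.steps):
            if existing == name:
                self.steps[i] = (name, step)
                return
        raise KeyError(name)

    def run(self, doc: Document) -> Document:
        for _, step in self.steps:
            doc = step(doc)
        return doc


def f_score(predicted: set, expected: set) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy steps (rule-based stand-ins for real extractors).
def split_lines(doc: Document) -> Document:
    lines = [ln.strip() for ln in str(doc["text"]).splitlines() if ln.strip()]
    return {**doc, "lines": lines}


def first_line_title(doc: Document) -> Document:
    return {**doc, "title": doc["lines"][0]}


def keywords_from_marker(doc: Document) -> Document:
    prefix = "Keywords:"
    keywords = set()
    for line in doc["lines"]:
        if line.startswith(prefix):
            parts = line[len(prefix):].split(",")
            keywords = {p.strip().lower() for p in parts if p.strip()}
    return {**doc, "keywords": keywords}


workflow = (
    Workflow()
    .add("segment", split_lines)
    .add("title", first_line_title)
    .add("keywords", keywords_from_marker)
)

sample = {"text": "A Toy Paper\nKeywords: metadata, extraction, PDF\nBody text."}
result = workflow.run(sample)

# Evaluate one metadata type in isolation against a hand-made ground truth.
score = f_score(result["keywords"], {"metadata", "extraction", "parsing"})
```

Because every step only reads from and returns a document, a single component can be exchanged with `workflow.replace("title", some_other_step)` and re-scored without changing the others, which is the property the loosely coupled design is meant to provide.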
