Over the past two decades, various integrated frameworks and toolkits for assembling natural language text processing chains have emerged. For this report, we selected a subset of these frameworks in order to examine the meta-data standards for processing open source information. We used the following criteria in the selection process:
Coverage: The range of processing components covered.
Popularity: The number of users, initiatives and projects that use the framework in research or in operational settings.
Project Health: How frequently new upgrades and features are released.
Multilinguality: The range of languages and language-specific resources covered.
Portability: How usable the software is in different environments.
Modularity and Workflow Management: How modular the software is and how well it supports assembling dedicated text processing chains. The ability to combine core processing components that have clearly defined I/O specifications is especially important.
Two frameworks specialised for natural language processing stand out against the criteria above: Apache UIMA (Unstructured Information Management Architecture) and GATE (General Architecture for Text Engineering). We briefly describe both frameworks in the following chapters. In addition, we describe the KNIME Analytics Platform, a framework aimed at data processing more broadly rather than at natural language processing alone.