Several technologies will make up the expected solution for the project implementation. Wherever possible, the priority is to choose open-source technologies. These technologies are widely supported by software communities on the Internet.
The technologies to use in each of the blocks of the architecture are explained below.
- ETL processing
- Core Solution Component
- Visualisation Component
- Database Management System
- AWS Services
![talend-logo](/sites/default/files/styles/wysiwyg_one_third_width/public/inline-images/talend-logo.png?itok=DpY0ZGi_)
It is open-source, multiplatform software (Windows and macOS only) licensed under one of two licenses depending on the edition: Apache License 2.0 for the free edition and a proprietary license for the paid editions.
Python
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components together.
Its philosophy emphasizes code readability and its syntax allows users to express complex concepts in fewer lines of code than in other languages, such as Java or C.
The Python interpreter and its extensive standard library are available in source or binary form free of charge for all major platforms and can also be freely distributed.
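As a small illustration of the built-in data structures and compact syntax described above, the following sketch counts the most frequent words in a text using only the standard library. The function name `top_words` and the sample sentence are hypothetical and chosen only for this example.

```python
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words in text, ignoring case and punctuation."""
    # Split on whitespace, then strip surrounding punctuation and lowercase.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    # Counter is a dict subclass that tallies hashable items.
    counts = Counter(w for w in words if w)
    return counts.most_common(n)


sample = "Open data is open to all. Data drives decisions."
result = top_words(sample, 2)
print(result)
```

The whole task fits in a few lines because list comprehensions, string methods and `Counter` are available without any third-party dependency.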
Selenium
![selenium](/sites/default/files/styles/wysiwyg_one_third_width/public/inline-images/selenium.png?itok=-27k8-3i)
Selenium is a framework for testing web applications. Its main use is automating the execution of web application tests, but its utilities can also simulate human-like web navigation, giving the user full interaction with the web page.
It can be run against most modern web browsers and it is multiplatform.
It is open-source software under an Apache 2.0 license.
Beautiful Soup
Beautiful Soup is a Python library for parsing HTML and XML documents. It builds a parse tree that can be used to extract data from HTML, which makes it very useful for web scraping. Its main capabilities are:
- It provides a few simple methods for navigating, searching, and modifying a parse tree.
- It automatically converts incoming documents to Unicode and outgoing documents to UTF-8, so there is no need to worry about encodings.
- It works on top of popular Python parsers such as lxml and html5lib, so a parser can be chosen to suit the task.
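The capabilities listed above can be sketched with a short scraping example. The HTML snippet, the link targets and the function name `extract_links` are hypothetical. The sketch assumes the `beautifulsoup4` package is installed and uses Python's built-in `html.parser` rather than lxml or html5lib, so that no additional parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment used only for illustration.
HTML = """
<html><body>
  <h1>Suppliers</h1>
  <ul>
    <li><a href="/supplier/1">Acme</a></li>
    <li><a href="/supplier/2">Globex</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every anchor that has an href attribute."""
    # Build the parse tree with the standard-library parser.
    soup = BeautifulSoup(html, "html.parser")
    # find_all searches the whole tree; href=True keeps only anchors with an href.
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


links = extract_links(HTML)
heading = BeautifulSoup(HTML, "html.parser").h1.get_text(strip=True)
print(heading, links)
```

Switching to another parser would only mean changing the second argument of `BeautifulSoup`, for example to `"lxml"` if that package is available.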
Pandas
![pandas](/sites/default/files/styles/wysiwyg_half_width/public/inline-images/pandas.png?itok=pFUB-yW3)
Google Custom Search
![Google Custom Search](/sites/default/files/styles/wysiwyg_one_third_width/public/inline-images/googleCustomSearch_0.png?itok=RRNxiABs)
It is a private commercial product.
Logstash
![logstash](/sites/default/files/styles/wysiwyg_one_third_width/public/inline-images/logstash.png?itok=X4Xaiyl2)
It is open-source software licensed under Apache License 2.0.
Elasticsearch
![elasticsearch](/sites/default/files/styles/wysiwyg_one_third_width/public/inline-images/elasticksearch.png?itok=yg1owaxv)
Some parts of the software are open source, mostly under the Apache License, while other parts are commercial.
It is part of the Amazon Elasticsearch Service used in this project.
Kibana
![kibana](/sites/default/files/styles/wysiwyg_one_third_width/public/inline-images/kibana.png?itok=D4-jh9Mr)
It is open-source software licensed under Apache License 2.0.
It is part of the Amazon Elasticsearch Service used in this project.
4. Database Management System
PostgreSQL
PostgreSQL is a relational database management system. It is developed by a worldwide team of volunteers, no private organization controls it, and its source code is available free of charge.
It is a very popular database that supports text, images, sounds, and video.
PostgreSQL is open-source, multiplatform software released under its own license (the PostgreSQL License), a permissive open-source license similar to the MIT and BSD licenses.
pgAdmin
pgAdmin is a visual management tool for PostgreSQL and derivative relational databases. It may be run either as a web or desktop application.
It is the most used solution for managing PostgreSQL databases. pgAdmin has a desktop mode and a server mode: the first is focused on local use, and the second allows multiple users to access it over the web.
It is multiplatform (Windows, Linux and macOS) open-source software licensed under the PostgreSQL License.
5. AWS Services
Amazon Cognito
Amazon Cognito is a scalable access control service that provides the authentication, authorization and user management tools for web applications, simplifying the integration with the services used in the project (it is the recommended user service to use along with Kibana).
Amazon Elasticsearch
Amazon Elasticsearch Service (Amazon ES) is a managed service that eases the deployment, operation and scaling of Elasticsearch clusters in the AWS Cloud. Amazon ES is Amazon's own implementation of the Elasticsearch software, offering automated cluster scaling and other infrastructure options that the service manages itself. This service includes the Elasticsearch and Kibana software: