The purpose of this document is to provide guidelines on how to use the module developed to extract products from the Rapid Exchange of Information System (hereinafter referred to as RAPEX) database, search for them and identify them, using the Internet as a data source.
The main objective of this application is to gather every product identified by RAPEX and search for each of them across the Internet, in order to find web pages that sell or promote these articles online.
This solution has three main components:
- Alerts ETL (Extract, Transform and Load) process, implemented with Talend software.
- Core component, composed of:
  - Alerts searching component, which searches for every alert on the Internet using Google, based on user-defined criteria.
  - Scraping component, which extracts the information from the web pages obtained by the searching component.
  - Seller detection component, which applies text mining techniques to the scraped content to determine which of those sites are selling RAPEX products.
- Results KPI visualisation, using the ELK (Elasticsearch, Logstash, Kibana) stack, through which the obtained results can be visualised and analysed.
The documentation is structured around two different roles:
- RAPEX Administrator: an IT specialist profile responsible for managing the configuration of each component.
- Business User: any end user who wants to access the final information provided by the system.