DG JUST is responsible for the rapid exchange of information (RAPEX) between the Member States, the EFTA/EEA countries and the European Commission on measures and actions taken at national level against products posing a serious risk to consumers:
- Member States notify the measures they have taken against products identified as posing a risk to consumers (withdrawing them from the market, recalling them from consumers, etc.) and alert all other Member States, which in turn check their own markets to see whether the product is also available in their territory.
- The products notified by Member States are, for the most part, found in physical shops and markets.
- Likewise, other Member States search for the notified products mostly in physical shops and only inconsistently in online web stores, where they typically look for specific products on a case-by-case basis.
The objective of the RAPEX Searcher, a solution developed by DIGIT, is to provide a system that searches online web stores for reported RAPEX products automatically and systematically.
This solution helps market surveillance authorities, including DG JUST and the Member States concerned, to carry out online market surveillance tasks in a harmonized, automatic and systematic way, using a common system developed for their specific needs.
The RAPEX Searcher consists of three components: an ETL process; a searching, scraping and text mining engine; and a dashboard.
ETL
The ETL (Extract, Transform and Load) process downloads, processes and stores the XML source files in a database. The steps are:
- Download and store the list of weekly alert reports published by RAPEX.
- Download and store the alerts contained in each published report.
The ETL is built with Talend Open Studio.
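The actual ETL is a Talend Open Studio job; the Python sketch below only illustrates the same extract, transform and load steps under stated assumptions. The XML element and attribute names (`reports`, `report/@url`, `report/@week`, `alert`, `reference`, `product`, `brand`, `category`), the `alerts` table and all sample values are hypothetical and do not reflect the real RAPEX schema. SQLite stands in for whatever database the project uses.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

# NOTE: hypothetical XML layout used only for this sketch; the real RAPEX files differ.
SAMPLE_REPORT = """<report week="2019-01">
  <alert>
    <reference>EXAMPLE-0001</reference>
    <category>Toys</category>
    <product>Plush teddy bear</product>
    <brand>ExampleBrand</brand>
  </alert>
</report>"""


def extract(url, timeout=30):
    """Extract: download one XML file (the report list or a single report)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def report_urls(list_xml):
    """Step 1: read the URLs of the weekly reports from the (hypothetical) report list."""
    return [r.get("url") for r in ET.fromstring(list_xml).iter("report") if r.get("url")]


def transform(report_xml):
    """Step 2: flatten every alert of one weekly report into a table row."""
    root = ET.fromstring(report_xml)
    week = root.get("week")
    return [
        (
            week,
            alert.findtext("reference", default=""),
            alert.findtext("category", default=""),
            alert.findtext("product", default=""),
            alert.findtext("brand", default=""),
        )
        for alert in root.iter("alert")
    ]


def load(conn, rows):
    """Load: store the rows; alerts already stored (same reference) are skipped."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS alerts (
               week TEXT, reference TEXT PRIMARY KEY,
               category TEXT, product TEXT, brand TEXT)"""
    )
    conn.executemany("INSERT OR IGNORE INTO alerts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


# Usage with the in-memory sample instead of a real download:
conn = sqlite3.connect(":memory:")
load(conn, transform(SAMPLE_REPORT))
print(conn.execute("SELECT reference, product FROM alerts").fetchall())
# [('EXAMPLE-0001', 'Plush teddy bear')]
```

Using the alert reference as primary key with `INSERT OR IGNORE` makes the load idempotent, so re-running the weekly job does not duplicate alerts.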
Searching, scraping and text mining engine
This component:
- Searches the Internet based on the alerts obtained from the ETL process.
- Scrapes the URLs found to obtain their HTML code.
- Uses text mining to detect whether the web page offers for sale products reported in the alerts.
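The scraping and text-mining steps can be sketched with the Python standard library alone. This is an assumption-laden illustration, not the engine's actual method: the simple token-overlap score below stands in for whatever text-mining technique is really used, and the names `fetch_html`, `page_text` and `match_score` as well as the User-Agent string are hypothetical.

```python
import re
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def fetch_html(url, timeout=10):
    """Scraping step: download the raw HTML of one search result."""
    req = urllib.request.Request(url, headers={"User-Agent": "rapex-searcher-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def page_text(html):
    """Reduce an HTML document to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokens(text):
    """Lower-cased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def match_score(alert_terms, html):
    """Text-mining step (simplified): share of the alert's terms found on the page."""
    wanted = tokens(" ".join(alert_terms))
    if not wanted:
        return 0.0
    return len(wanted & tokens(page_text(html))) / len(wanted)


sample = (
    "<html><body><h1>ExampleBrand plush teddy bear</h1>"
    "<script>var x = 1;</script><p>Add to cart</p></body></html>"
)
print(match_score(["Plush teddy bear", "ExampleBrand"], sample))  # 1.0
```

A page could then be flagged when its score exceeds a chosen threshold; any such threshold value would be a tuning decision, not something stated in this document.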
This component is built with Python 3.7 and uses the Google Custom Search API.
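A minimal sketch of the search step against the Google Custom Search JSON API, assuming the standard `customsearch/v1` endpoint with its `key`, `cx`, `q` and `num` parameters and the `items[].link` field of the response. How the real engine builds queries from alerts is not described here, so `build_query` (brand plus product name) is a hypothetical strategy, and the key and engine ID are placeholders.

```python
import json
import urllib.parse
import urllib.request

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_query(alert):
    """Hypothetical query strategy: brand followed by product name, when present."""
    return " ".join(part for part in (alert.get("brand"), alert.get("product")) if part)


def search_url(api_key, engine_id, query, num=10):
    """Request URL for one page of results (the API returns at most 10 per call)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return CSE_ENDPOINT + "?" + urllib.parse.urlencode(params)


def result_links(response_json):
    """URLs to scrape; the 'items' field is absent when nothing matches."""
    return [item["link"] for item in response_json.get("items", []) if "link" in item]


def search(api_key, engine_id, alert, timeout=10):
    """Run one query for an alert and return the result URLs."""
    url = search_url(api_key, engine_id, build_query(alert))
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return result_links(json.load(resp))


alert = {"brand": "ExampleBrand", "product": "Plush teddy bear"}
print(search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", build_query(alert)))
# https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_ENGINE_ID&q=ExampleBrand+Plush+teddy+bear&num=10
```

Keeping URL building and response parsing separate from the network call makes both easy to test without an API key.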
Dashboard
A dashboard component visualises the results of the RAPEX Searcher analysis. The visualisation is done in Kibana, on top of specific data indexed into Elasticsearch by Logstash.
These three applications are part of the ELK Stack developed by Elastic.
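In the project, Logstash performs the indexing. Purely to show what the indexed result documents could look like, the sketch below instead builds a payload for Elasticsearch's `_bulk` endpoint directly (newline-delimited JSON, one action line before each document, trailing newline required). The index name `rapex-results`, the document fields and the `localhost:9200` address are hypothetical.

```python
import json
import urllib.request


def bulk_payload(results, index="rapex-results"):
    """Serialise match results as NDJSON for the Elasticsearch _bulk API.

    Each document is preceded by an 'index' action line and the body ends
    with a newline. The index name is a hypothetical default.
    """
    if not results:
        return ""  # an empty bulk request would be rejected, so send nothing
    lines = []
    for doc in results:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def send_bulk(payload, es_url="http://localhost:9200", timeout=10):
    """POST the payload to a (hypothetical) local Elasticsearch node."""
    req = urllib.request.Request(
        es_url + "/_bulk",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


results = [{"alert_reference": "EXAMPLE-0001", "url": "https://shop.example/1", "score": 1.0}]
print(bulk_payload(results), end="")
```

Kibana visualisations would then be built on whatever index pattern matches these documents.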