doris+ 1.0

DORIS is a corporate tool for the DGs of the Commission that aims to provide an accurate analysis and a more tailored visualisation of the results of Open Public Consultations. It can process survey results coming from EUSurvey and the Better Regulation Portal. It lets users analyse data from open and closed questions, and offers a dashboard through which users can visualise the results of the analysis.

This distribution is the new version of DORIS, which is hosted in the cloud (AWS).

In the doris-python-code main folder you can find:

  • app.py: this script is called by the orchestration functions with a command (on the command line) similar to:
    • python app.py -t reporting -e deloitte_dev -q FREE_TEXT -p 1 -i 0
  • Lambda folder: contains all the AWS Lambda function scripts.
  • Tests folder: contains all the integration and unit tests, as well as the required input and output files.
  • Requirements.txt: lists all the Python libraries that need to be installed for the code to work. Every Docker container running this code uses this file to install them.
  • Shared folder: contains the logic used by app.py and has the following folders:
    • Config folder contains:
      • A resources folder, which stores the config variables. These are centralized variables that are passed to connectors or functions across several scripts.
        • config_deloitte_dev.yaml is the main config file, which contains:
          • URLs for eTranslation, Elasticsearch, etc.
          • Languages supported by eTranslation
          • Passwords, indirectly (through an AWS secret)
          • ...
      • Application.py, which controls, among other things, all the connections to AWS services at initialization. It passes along the config defined in the YAML file (i.e. config_deloitte_dev.yaml).
      • Config.py
    • Connectors folder: has all the scripts, with the functionality grouped per connector/service, that are used by the processors and ETL scripts. These services are AWS Comprehend, AWS Translate, AWS DocumentDB, AWS Elasticsearch, eTranslation and AWS S3.
    • Etl folder: contains the scripts that read from and write to the DocumentDB and Elasticsearch databases and pass the data to the processors for processing.
      • Batch_feedbacks_etl.py: is called directly from app.py and directs the actual processing, such as text analysis. It reads from the DocumentDB ‘feedbacks’ and ‘consultations’ collections and stores the processed results in the ‘feedbacks_processed’ collection.
      • Batch_reporting.py: fetches the feedbacks from ‘feedbacks_processed’ and directs the processors to transform and filter the fields so that they are suitable for reporting in the visualization tool (Kibana). It therefore stores them in Elasticsearch after processing.
    • Processors folder: has three processors that handle the feedback and consultation documents. They are split up according to question type and are called by the ETL scripts. Batch_feedback_processor.py contains the main (parent) class, while the other two contain child classes that inherit its functions. As a result, all three can be called directly from the ETL scripts.
      • Batch_feedback_processor.py handles closed questions (SINGLE_CHOICE and MULTIPLE_CHOICE) and also centralises the functionality that all question types have in common, such as preparing general fields that hold information about the consultation or the user. Some text-related functions (such as language detection) are also centralised here, since both child processors use them.
      • Batch_feedback_processor_file_upload.py has attachment-specific functions, such as mapping and filtering the appropriate fields, connecting to S3 to retrieve the attachments, and handling topic detection for these attachments.
      • Batch_feedback_processor_free_text.py handles free-text-specific analysis, such as mapping and filtering the appropriate ‘freetext’ fields, and text analysis methods that use AWS Comprehend to detect key phrases, sentiment and entities.
    • Schemas > elastic >
      • Feedbacks.json: contains the schema for the index in Elasticsearch and Kibana. It defines the names and types of the fields.
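
The parent/child split among the three processors can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the class names, method names, feedback field names and the FILE_UPLOAD type label are hypothetical. Only the SINGLE_CHOICE, MULTIPLE_CHOICE and FREE_TEXT labels come from the description above, and the calls to AWS Comprehend and S3 are deliberately left out.

```python
class BatchFeedbackProcessor:
    """Parent processor: closed questions plus helpers shared by all types.

    Hypothetical sketch; names and fields are assumptions.
    """

    question_types = ("SINGLE_CHOICE", "MULTIPLE_CHOICE")

    def prepare_general_fields(self, feedback):
        # Shared step: copy the consultation-level info every type needs.
        return {
            "consultation_id": feedback.get("consultation_id"),
            "question_type": feedback.get("question_type"),
        }

    def process(self, feedback):
        doc = self.prepare_general_fields(feedback)
        doc["answers"] = list(feedback.get("answers", []))
        return doc


class BatchFeedbackProcessorFreeText(BatchFeedbackProcessor):
    """Child processor for free-text answers."""

    question_types = ("FREE_TEXT",)

    def process(self, feedback):
        doc = self.prepare_general_fields(feedback)  # inherited helper
        # The real processor would run text analysis here; omitted.
        doc["text"] = feedback.get("text", "")
        return doc


class BatchFeedbackProcessorFileUpload(BatchFeedbackProcessor):
    """Child processor for attachments."""

    question_types = ("FILE_UPLOAD",)  # assumed label, not from the source

    def process(self, feedback):
        doc = self.prepare_general_fields(feedback)  # inherited helper
        # The real processor would fetch the attachment from S3; omitted.
        doc["attachment_key"] = feedback.get("attachment_key")
        return doc


def select_processor(question_type):
    """Return a processor instance for the given question type."""
    candidates = (
        BatchFeedbackProcessor,
        BatchFeedbackProcessorFreeText,
        BatchFeedbackProcessorFileUpload,
    )
    for cls in candidates:
        if question_type in cls.question_types:
            return cls()
    raise ValueError(f"Unsupported question type: {question_type!r}")
```

Because both child classes inherit from the parent, an ETL script written this way could call process() on whichever processor it receives without caring which question type is behind it.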

In the doris-general-code folder you can find:

  • api_mapping folder: contains the mapping template for the POST method of API Gateway. This template maps the API call to the JSON used by the Lambda function.
  • cloud9 folder: contains a set of scripts used to deploy a Cloud9 environment similar to the one used to develop DORIS.
  • docker folder: contains all the scripts to deploy a container to AWS ECS. The container created will be executed by AWS Batch.
    • To generate a container, run ‘create_docker_image.sh’.
    • Requirement.txt contains all the Python dependencies used by the container.
    • The container will create a copy of the master branch from AWS CodeCommit.
  • Step functions folder: contains, for reference, all the Step Functions code used to orchestrate the flow.
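
Inside the container, app.py is started with a command line like the one shown in the doris-python-code section. A hedged sketch of how an orchestration step might assemble that command as an argument list is given below. The function and parameter names are hypothetical, and the source does not explain what the -p and -i flags mean, so they are passed through unchanged.

```python
def build_app_command(task, environment, question_type, p, i):
    """Build the argument list used to launch app.py.

    Hypothetical helper: only the flags (-t, -e, -q, -p, -i) come from
    the example command; the parameter names are assumptions.
    """
    return [
        "python", "app.py",
        "-t", str(task),
        "-e", str(environment),
        "-q", str(question_type),
        "-p", str(p),  # meaning not documented in the source
        "-i", str(i),  # meaning not documented in the source
    ]


# Reproduces the example command from the description above.
command = build_app_command("reporting", "deloitte_dev", "FREE_TEXT", 1, 0)
print(" ".join(command))
```

Keeping the command as a list of separate strings, rather than one shell string, avoids quoting problems when values are supplied by an upstream step.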