Orchestration Components

Two main components are used for orchestration: AWS Lambda and AWS Step Functions. Both are key building blocks of a serverless application. Serverless computing lets you run applications and services without managing the underlying servers. One of its main advantages is that compute is only used, and paid for, while the application is running. It is "serverless" in the sense that the application does not require a constantly running server managed by the user.

AWS Step Functions

AWS Step Functions coordinates multiple AWS services into serverless workflows. It executes the steps of the ETL workflow in parallel or sequentially and tracks the input, output and state passed between the services in use.

AWS Step Functions is based on the concepts of tasks and state machines. Each step in a workflow corresponds to a state. A state can call an AWS resource identified by its ARN (a Task state) or perform a built-in logical step such as waiting (Wait), branching on a condition (Choice), failing (Fail), and so on. State machines are defined in JSON using the Amazon States Language, so the workflow of an entire state machine can easily be replicated from one environment to another. Although the workflow is defined as code, the Step Functions console displays a graphical view of it that shows the different states and tasks and how they are connected. This graphical view not only helps to understand the workflow but also lets you track the progress of the ETL during execution and identify successful or failed steps. In addition, by integrating Amazon Simple Notification Service (SNS) into the workflow, the state machine automatically notifies the main user of the success or failure of the flow.
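To illustrate the Amazon States Language, the sketch below builds a minimal state machine definition as a Python dictionary and serializes it to JSON. It shows a Task state invoking a Lambda function, a Catch clause that routes any error to an SNS notification followed by a Fail state, and an SNS notification on success. The ARNs and state names are hypothetical placeholders, not the actual Doris+ definition.

```python
import json

# Hypothetical ARNs: placeholders, not the actual Doris+ resources.
PROCESS_FN_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:example-process"
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:example-etl-status"


def build_definition() -> dict:
    """Return a minimal Amazon States Language definition as a Python dict."""
    return {
        "Comment": "Sketch: one Lambda task with SNS success/failure notifications",
        "StartAt": "ProcessFeedbacks",
        "States": {
            # Task state: calls the AWS resource identified by its ARN.
            "ProcessFeedbacks": {
                "Type": "Task",
                "Resource": PROCESS_FN_ARN,
                # Catch any error, keep it under $.error and go notify the user.
                "Catch": [
                    {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "NotifyFailure"}
                ],
                "Next": "NotifySuccess",
            },
            # SNS service integration: publish the state data as a JSON string.
            "NotifySuccess": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {"TopicArn": TOPIC_ARN, "Message.$": "States.JsonToString($)"},
                "End": True,
            },
            "NotifyFailure": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {"TopicArn": TOPIC_ARN, "Message.$": "States.JsonToString($.error)"},
                "Next": "Failed",
            },
            # Fail state: ends the execution in a failed status after notifying.
            "Failed": {"Type": "Fail", "Error": "EtlFailed", "Cause": "See the SNS notification"},
        },
    }


if __name__ == "__main__":
    # This JSON is what would be deployed as the state machine definition.
    print(json.dumps(build_definition(), indent=2))
```

The resulting JSON could then be deployed, for example with boto3's `create_state_machine(name=..., definition=..., roleArn=...)`, which is what makes the workflow easy to replicate across environments.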

One of the shortcomings of AWS Step Functions is that a failed execution cannot be intelligently restarted from the point of failure: the ETL must be run again from the start. Since the complete workflow requires a lot of computation, it is divided into 7 step functions: the master flow, part 1 of the open questions, part 1 of the file upload, part 1 of the closed questions, part 2 of the open questions, part 2 of the file upload and part 2 of the closed questions. The master flow triggers the other step functions in parallel (one per question type) and sequentially (part 1, then part 2).
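The master flow can be sketched as a Parallel state with one branch per question type, where each branch starts the part 1 state machine as a nested execution, waits for it, and then starts part 2. The region, account ID and state names below are hypothetical, the state machine names are taken from the table of step functions, and running each part 2 right after its own part 1 (rather than after all part 1 runs) is an assumption.

```python
import json
from typing import Optional

# Hypothetical region/account; state machine names follow the table of step functions.
ARN_PREFIX = "arn:aws:states:eu-west-1:123456789012:stateMachine:"
PARTS = {
    "FreeText": ("DorisStateMachineDev-FREE_TEXT", "DorisStateMachineDev-FREE_TEXT-P2"),
    "FileUpload": ("DorisStateMachineDev-FILE-UPLOAD", "DorisStateMachineDev-FILE-UPLOAD-P2"),
    "Closed": ("DorisStateMachineDev-CLOSED", "DorisStateMachineDev-CLOSED-P2"),
}


def nested_execution(name: str, next_state: Optional[str]) -> dict:
    """Task that starts another state machine and waits for it to finish."""
    task = {
        "Type": "Task",
        # ".sync:2" waits for the child execution and returns its output as JSON.
        "Resource": "arn:aws:states:::states:startExecution.sync:2",
        "Parameters": {"StateMachineArn": ARN_PREFIX + name, "Input.$": "$"},
        # Discard the child's result so the next part receives the original input.
        "ResultPath": None,
    }
    if next_state is None:
        task["End"] = True
    else:
        task["Next"] = next_state
    return task


def build_master() -> dict:
    branches = []
    for qtype, (part1, part2) in PARTS.items():
        # Within a branch, part 2 (loading) starts only after part 1 (analytics) succeeds.
        branches.append({
            "StartAt": f"{qtype}-P1",
            "States": {
                f"{qtype}-P1": nested_execution(part1, f"{qtype}-P2"),
                f"{qtype}-P2": nested_execution(part2, None),
            },
        })
    # A single Parallel state runs the question types side by side.
    return {
        "Comment": "Sketch of the master flow",
        "StartAt": "RunQuestionTypes",
        "States": {"RunQuestionTypes": {"Type": "Parallel", "Branches": branches, "End": True}},
    }


if __name__ == "__main__":
    print(json.dumps(build_master(), indent=2))
```

Because each part is a separate state machine, a failed part can be started again on its own instead of rerunning the whole ETL.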

Part 1 is the analytical part of the workflow. It extracts the feedbacks from DocumentDB and enriches them: it detects their language, translates them into English, detects topics and performs sentiment analysis. It then stores the enriched feedbacks in DocumentDB. Part 2 extracts the transformed feedbacks, organizes them by consultation and loads them into AWS Elasticsearch. A schematic overview is shown in the figure below.
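As a rough illustration of the enrichment calls, the helpers below detect the language of a feedback and run sentiment analysis on its English translation through an Amazon Comprehend client (for example `boto3.client("comprehend")`) passed in as a parameter. Using Comprehend for language detection is an assumption, the function names are hypothetical, and topic detection is left out because the document does not say which Comprehend operation is used.

```python
from typing import Any, Dict


def detect_language(text: str, comprehend: Any) -> str:
    """Return the most likely language code of the original feedback."""
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    # Comprehend returns candidate languages with confidence scores; keep the best one.
    return max(languages, key=lambda lang: lang["Score"])["LanguageCode"]


def analyze_sentiment(text_en: str, comprehend: Any) -> Dict[str, Any]:
    """Run sentiment analysis on the English translation of a feedback."""
    result = comprehend.detect_sentiment(Text=text_en, LanguageCode="en")
    return {"sentiment": result["Sentiment"], "scores": result["SentimentScore"]}
```

Passing the client in keeps the functions testable with a stub and lets the caller decide on the region and credentials.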

Figure: Schematic overview of the orchestration

AWS Step Functions overview

The step functions are described as follows:

| Step function | Question type | Description |
|---|---|---|
| DorisStateMachineDev-MASTER | All | Triggers the step functions of the different question types in parallel and of the different parts sequentially. |
| DorisStateMachineDev-FREE_TEXT | Free text | Processes and analyzes all new free text feedbacks since the last successful run. The feedbacks are translated into English and enriched with sentiment analysis and topics. Finally, the processed feedbacks are stored in DocumentDB. |
| DorisStateMachineDev-FREE_TEXT-P2 | Free text | Loads the processed free text feedbacks into Elasticsearch. |
| DorisStateMachineDev-CLOSED | Closed question | Processes and analyzes all new closed question answers since the last successful run. At the end, the processed feedbacks are stored in DocumentDB. |
| DorisStateMachineDev-CLOSED-P2 | Closed question | Loads the processed closed question feedbacks into Elasticsearch. |
| DorisStateMachineDev-FILE-UPLOAD | File upload | Processes and analyzes all attachments uploaded for file upload questions since the last successful run. The attachments are translated into English and enriched with sentiment analysis and topics. The uploaded files are retrieved and encoded in base64. Finally, the processed feedbacks are stored in DocumentDB. |
| DorisStateMachineDev-FILE-UPLOAD-P2 | File upload | Loads the processed file upload feedbacks into Elasticsearch. |

AWS Step Functions graphical view and explanation

The workflows of the open and file upload questions are explained together, as their orchestration is similar. The closed question orchestration is simpler, as no translation is required. As a reminder, the step functions are split by question type and between an analytics part and a loading part.

Part 1: Analytics

A. Open question & attachment

Figure: Open question step function analysis

The following state machine can be divided into 6 parts:

  1. The step function triggers the Lambda function that orchestrates the feedbacks to be processed. This function retrieves from DocumentDB the identifiers of the feedbacks loaded since the last ETL run for the step function's question type. Doris+ then splits the feedbacks into two categories: feedbacks with text to be processed and empty feedbacks. The function then divides each category into batches of 1000 identifiers (or 10 consultations for attachments) and writes the lists of identifiers to multiple S3 files. Finally, it monitors the start of the execution. As output, it returns the number of batches created per category.
  2. For feedbacks containing text, the step function triggers AWS Batch jobs. AWS Batch jobs are of two types: single jobs and array jobs. An array job runs multiple single jobs in parallel; the step function triggers an array job if at least two batch files of the question type need to be processed. At this point, a container image running in AWS Batch sends all documents to the eTranslation module for translation and stores these feedbacks in DocumentDB with the status untranslated. eTranslation then sends the translations asynchronously to the Doris API Gateway, where a Lambda function adds the translated text and updates the status in DocumentDB.
  3. For empty feedbacks, the step function likewise selects whether AWS Batch needs an array job or a single job. The AWS Batch job processes the feedbacks and stores them in DocumentDB. Finally, it monitors the end of the workflow for the empty-feedback job.
  4. The step function waits for eTranslation to translate all feedbacks, then checks whether all translations have been received asynchronously. If they have, it moves on to the second processing stage (step 5). If not, it waits another three minutes and checks again, up to 10 times, so the total wait ranges from 3 to 30 minutes. If translations are still missing, a Lambda function resends the untranslated feedbacks, waits and checks again whether they are all translated. The step function then proceeds to step 5 as soon as it receives the complete translation, or ends in a failure status otherwise.
  5. The second processing stage uses the feedbacks, now fully translated into English, to detect the topics in the text and perform sentiment analysis with Amazon Comprehend. The transformed feedbacks are then stored in DocumentDB, and the end of the workflow for translated feedbacks is monitored.
  6. Finally, the step function monitors the successful completion of the flow or catches any error within it. In both cases, it can notify users of the status by email.
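The wait-and-check logic of step 4 can be modeled as a plain Python function. In the actual step function this is expressed with Wait and Choice states and Lambda tasks (eTranslation_validation.py and eTranslation_resent_translation.py in the Lambda table); here the checks are injected as callables so the control flow can be tested without AWS. The function name, the single resend and the 180-second interval are assumptions based on the description above.

```python
import time
from typing import Callable


def wait_for_translations(
    all_translated: Callable[[], bool],
    resend_untranslated: Callable[[], None],
    max_checks: int = 10,
    interval_s: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once every translation is received, False if still incomplete after a resend."""
    # Wait, then check; repeat up to max_checks times (3 to 30 minutes with the defaults).
    for _ in range(max_checks):
        sleep(interval_s)
        if all_translated():
            return True
    # Still incomplete: resend the untranslated feedbacks once, wait, and check a final time.
    resend_untranslated()
    sleep(interval_s)
    return all_translated()
```

A True result corresponds to moving on to step 5, and False to the failure status.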

B. Closed question

Figure: Closed question step function analysis

The following state machine can be divided into 3 parts:

  1. The step function triggers the Lambda function that orchestrates the feedbacks to be processed. This function retrieves from DocumentDB the identifiers of the feedbacks loaded since the last execution of the closed question step function. It then divides them into batches of 1000 identifiers and writes the lists of identifiers to multiple S3 files. Finally, it monitors the start of the execution. As output, it returns the number of batch files created.
  2. The step function selects whether AWS Batch needs an array job or a single job. The AWS Batch job processes the feedbacks and stores them in DocumentDB. Finally, it monitors the end of the workflow. If no feedbacks were found since the last load, it records the load as successful.
  3. Finally, the step function monitors the successful completion of the flow or catches any error within it. In both cases, it can notify users of the status by email.
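The batching done by the orchestrator Lambda can be sketched as follows: the identifiers are cut into batches of 1000, each batch is written as a JSON list to its own S3 object, and the function returns the number of batch files. The key layout and the injected `put_object` callable are hypothetical; with boto3 it could wrap `s3.put_object(Bucket=..., Key=..., Body=...)`.

```python
import json
from typing import Callable, List

BATCH_SIZE = 1000  # identifiers per batch file, as described above


def write_batches(
    ids: List[str],
    category: str,
    put_object: Callable[[str, str], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Split ids into batches, write one JSON file per batch, return the number of batches."""
    n_batches = 0
    for start in range(0, len(ids), batch_size):
        # Hypothetical key layout: one object per batch, numbered from 0.
        key = f"batches/{category}/batch_{n_batches}.json"
        put_object(key, json.dumps(ids[start:start + batch_size]))
        n_batches += 1
    return n_batches
```

Numbering the files from 0 is one possible design: it would let each child of an AWS Batch array job pick its own file from the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable, which AWS Batch sets from 0 to size - 1.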

Part 2: Loading

A. Open question & attachment

Figure: Open question step function loading

The following step function can be divided into 4 parts:

  1. The step function triggers the Lambda function that orchestrates the feedbacks to process in the loading part. This function retrieves the feedback identifier files written to S3 during the last execution of the analysis part. As output, it returns the number of batches per category (empty or not) created during the analysis part.
  2. The step function selects whether AWS Batch needs an array job or a single job. The AWS Batch job extracts the feedbacks, renames columns, aggregates the feedbacks into consultations and stores them in AWS Elasticsearch. Finally, it monitors the end of the workflow.
  3. The empty feedbacks are inserted into Elasticsearch in the same way.
  4. Finally, the step function monitors the successful completion of the flow or catches any error within it. In both cases, it can notify users of the status by email.
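The renaming and aggregation step of the loading job could look like the sketch below, which groups renamed feedbacks by consultation and produces the newline-delimited body expected by the Elasticsearch `_bulk` API (one action line, then one document line, with a trailing newline). The column mapping, index name and document shape are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical column mapping: source field name -> Elasticsearch field name.
RENAME = {"fb_id": "feedback_id", "cons_id": "consultation_id", "text_en": "translated_text"}


def to_bulk_body(feedbacks: Iterable[Dict], index: str) -> str:
    """Rename columns, group feedbacks by consultation, return an NDJSON bulk request body."""
    by_consultation: Dict[str, List[Dict]] = defaultdict(list)
    for fb in feedbacks:
        renamed = {RENAME.get(k, k): v for k, v in fb.items()}
        by_consultation[renamed["consultation_id"]].append(renamed)
    lines = []
    for cons_id, items in by_consultation.items():
        # Action line first, then the document itself, as the _bulk API expects.
        lines.append(json.dumps({"index": {"_index": index, "_id": cons_id}}))
        lines.append(json.dumps({"consultation_id": cons_id, "feedbacks": items}))
    # The bulk body must end with a newline; return nothing when there is nothing to load.
    return "\n".join(lines) + "\n" if lines else ""
```

The body could then be sent with the official Elasticsearch client's bulk helper, or as a POST to `/_bulk` with `Content-Type: application/x-ndjson`.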


B. Closed question

Figure: Closed question step function analysis

The following step function can be divided into 3 parts:

  1. The step function triggers the Lambda function that orchestrates the feedbacks to process in the loading part. This function retrieves the number of feedback identifier files written to S3 during the last execution of the analysis part for closed questions. As output, it returns the number of batches created during the analysis part.
  2. The step function selects whether AWS Batch needs an array job or a single job. The AWS Batch job extracts the feedbacks, renames a column, aggregates the feedbacks into consultations and stores them in AWS Elasticsearch. Finally, it monitors the end of the workflow. Documents are inserted into Elasticsearch at the granularity of the feedback identifier and consultation identifier.
  3. Finally, the step function monitors the successful completion of the flow or catches any error within it. In both cases, it can notify users of the status by email.
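The choice between an array job and a single job maps naturally onto a Choice state followed by AWS Batch `submitJob.sync` tasks. AWS Batch requires an array job to have at least two child jobs, which is consistent with triggering an array job only when there are at least two batch files. In the sketch below, the job queue, job definition, job name and the `nb_batches` input field are hypothetical; zero batches ends in a Succeed state, mirroring the rule that an empty load is recorded as successful.

```python
import json

# Hypothetical AWS Batch resources.
JOB_QUEUE = "arn:aws:batch:eu-west-1:123456789012:job-queue/example-queue"
JOB_DEFINITION = "arn:aws:batch:eu-west-1:123456789012:job-definition/example-load:1"


def batch_task(array: bool) -> dict:
    """Task that submits an AWS Batch job and waits for it to complete."""
    params = {"JobName": "load-closed-questions", "JobQueue": JOB_QUEUE, "JobDefinition": JOB_DEFINITION}
    if array:
        # One child job per batch file; the size is read from the state input.
        params["ArrayProperties"] = {"Size.$": "$.nb_batches"}
    return {"Type": "Task", "Resource": "arn:aws:states:::batch:submitJob.sync", "Parameters": params, "End": True}


def build_choice() -> dict:
    return {
        "StartAt": "HowManyBatches",
        "States": {
            "HowManyBatches": {
                "Type": "Choice",
                "Choices": [
                    # Array jobs need a size of at least 2.
                    {"Variable": "$.nb_batches", "NumericGreaterThanEquals": 2, "Next": "ArrayJob"},
                    {"Variable": "$.nb_batches", "NumericEquals": 1, "Next": "SingleJob"},
                ],
                # No batch file: nothing to load, the run is still successful.
                "Default": "NothingToLoad",
            },
            "ArrayJob": batch_task(array=True),
            "SingleJob": batch_task(array=False),
            "NothingToLoad": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_choice(), indent=2))
```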

AWS Lambda Functions

AWS Lambda runs code without requiring you to provision or manage servers, and you pay only for the compute time you consume. A function runs in response to triggers called events. In this ETL, the event sources range from Step Functions tasks orchestrating the flow to REST calls received by API Gateway.

| Lambda function | Part of the process | Description |
|---|---|---|
| lambda_feedbacks_orchestrator.py | Orchestration | Retrieves from DocumentDB the identifiers of the feedbacks loaded since the last ETL run for the step function's question type. DORIS divides the feedbacks into two categories: with text to be processed, or empty. The function then splits them into batches of 1000 identifiers (or 10 consultations for attachments) and writes the lists of identifiers to multiple S3 files. Finally, it monitors the start of the step function execution. As output, it returns the number of batches created per category. |
| lambda_feedbacks_orchestrator_p2.py | Orchestration | Retrieves the number of feedback identifier files written to S3 during the last execution of the analysis part. As output, it returns the number of batches per category (empty or not) created during the analysis part. |
| lambda_process_done.py | Monitoring | Monitors the success or failure of the ETL. It retrieves the run ID of the step function and monitors the full run across all steps of the ETL. |
| eTranslation_to_mongo.py | Translation | Triggered when the API Gateway endpoint receives a translated feedback from eTranslation. It converts the urlencoded payload to JSON, updates the status of the feedback to ‘translated’ and stores the translation in DocumentDB. |
| eTranslation_to_mongo_attachment.py | Translation | Triggered when the API Gateway endpoint receives a translated document from eTranslation. It converts the base64 payload to JSON, updates the status of the feedback to ‘translated’ in DocumentDB and stores the translation in S3. |
| eTranslation_resent_translation.py | Translation | Retrieves all feedbacks to be retranslated and sends them to eTranslation again, then waits and checks for the full reception of the translations. It returns a success or failure status depending on whether all translations were received. |
| eTranslation_translation_error.py | Translation | Processes the errors received from eTranslation. |
| eTranslation_validation.py | Translation | Checks whether all feedback translations have been received asynchronously from eTranslation. It retrieves all feedbacks still awaiting translation (status ‘NOT TRANSLATED’), which are to be resent to eTranslation later; if any exist, it returns false. |
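As an illustration of what eTranslation_to_mongo.py has to do, the sketch below decodes an API Gateway proxy event whose body is urlencoded and builds a MongoDB-style update that stores the translation and sets the status to ‘translated’. The callback field names (`translated-text`, `external-reference`) and the target document fields are assumptions, not the documented eTranslation contract.

```python
import base64
from typing import Any, Dict
from urllib.parse import parse_qs


def parse_callback(event: Dict[str, Any]) -> Dict[str, str]:
    """Turn an API Gateway proxy event with a urlencoded body into a flat dict."""
    body = event.get("body") or ""
    # API Gateway may deliver the body base64-encoded.
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    # parse_qs returns a list of values per field; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(body).items()}


def build_update(fields: Dict[str, str]) -> Dict[str, Any]:
    """Build a MongoDB-style update that stores the translation and marks it translated."""
    # Hypothetical field names: adapt them to the actual eTranslation callback.
    return {"$set": {"translated_text": fields["translated-text"], "status": "translated"}}
```

With pymongo connected to DocumentDB, the update could be applied as `collection.update_one({"_id": fields["external-reference"]}, build_update(fields))`.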