Dedupe
Introduction
Feature(s)
The Dedupe node removes duplicate records from a form so that the same data is not counted more than once in calculations.
A maximum of ten Dedupe nodes can be added to one data stream in the data factory.
Application Scenario(s)
- When the input form contains subforms, data from the main form will be duplicated for each subform record. To calculate the main form data accurately, you can remove duplicate data through the Dedupe node before calculating.
- In some business scenarios, the same data is kept in multiple copies, for example several records for the same customer. Without a Dedupe node, that data is counted more than once in calculations.
Preview
If a purchase request form contains subforms, the total amount in the main form is repeated for each subform record and is therefore counted several times in calculations. With the Dedupe node, you can deduplicate the data by order number, so that only one record is kept for each order and the duplicates are removed.
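The effect can be sketched in Python outside the product. The rows, the field names (`order_no`, `total_amount`, `item`), and the `dedupe` helper below are hypothetical; they only mimic the behavior described here and are not the data factory's implementation.

```python
# Hypothetical flattened rows: each subform line repeats the main-form fields.
rows = [
    {"order_no": "PO-001", "total_amount": 300, "item": "Pen"},
    {"order_no": "PO-001", "total_amount": 300, "item": "Paper"},
    {"order_no": "PO-002", "total_amount": 150, "item": "Ink"},
    {"order_no": "PO-002", "total_amount": 150, "item": "Toner"},
    {"order_no": "PO-002", "total_amount": 150, "item": "Stapler"},
    {"order_no": "PO-003", "total_amount": 80, "item": "Tape"},
    {"order_no": "PO-003", "total_amount": 80, "item": "Glue"},
]


def dedupe(records, fields):
    """Keep one record per distinct combination of the given fields."""
    kept = {}
    for record in records:
        key = tuple(record[f] for f in fields)
        # This sketch keeps the first record it sees for each key;
        # the Dedupe node itself makes no such ordering promise.
        kept.setdefault(key, record)
    return list(kept.values())


# Summing before deduplication counts each order's total once per subform line.
naive_total = sum(r["total_amount"] for r in rows)

# Deduplicating by order number leaves one record per order.
deduped = dedupe(rows, ["order_no"])
correct_total = sum(r["total_amount"] for r in deduped)

print(naive_total)    # 1210
print(len(deduped))   # 3
print(correct_total)  # 530
```

In this sample, summing without deduplication overstates the total because each order's amount is counted once per subform line.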
Setting Procedure
Creating a Data Stream
Go to App Management > Data Factory, and click New Data Stream.
Adding Input Source
Click the Input node, then select the input form and the fields in that form that need to be deduplicated.
Deduplicating Data
1. Add a Dedupe node.
Drag a Dedupe node from the left panel and connect the Input node to the Dedupe node.
2. Set Dedupe fields.
Dedupe fields are the criteria for deduplication. You can add multiple Dedupe fields; they are combined with AND. This means that records are treated as duplicates only when they have the same values in all of the Dedupe fields. For each set of duplicates, one record is kept and the others are removed.
For example, if you set the Order Number as the Dedupe field, only one record is kept for each order. This prevents the total amount from being counted repeatedly.
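A minimal Python sketch of how combining Dedupe fields with AND changes the result. The records, the field names (`customer`, `region`, `contact`), and the `dedupe` helper are hypothetical and only illustrate the matching rule, not how the node is built.

```python
def dedupe(records, fields):
    """Keep one record per distinct combination of the given fields."""
    kept = {}
    for record in records:
        # Records count as duplicates only if they match on ALL listed fields.
        key = tuple(record[f] for f in fields)
        kept.setdefault(key, record)
    return list(kept.values())


records = [
    {"customer": "Acme", "region": "East", "contact": "Ann"},
    {"customer": "Acme", "region": "East", "contact": "Bob"},   # same customer and region
    {"customer": "Acme", "region": "West", "contact": "Cara"},  # same customer, other region
    {"customer": "Birch", "region": "East", "contact": "Dev"},
]

by_customer = dedupe(records, ["customer"])
by_customer_and_region = dedupe(records, ["customer", "region"])

print(len(by_customer))             # 2
print(len(by_customer_and_region))  # 3
```

Adding a second Dedupe field makes the match stricter, so fewer records are treated as duplicates and more records survive.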
Demonstration
After removing the duplicate data based on the order number, you can see that there are only three records in the form, compared to the original seven records.
Note(s)
1. For each set of duplicates, the Dedupe node retains one record at random for use in calculations.
2. If you add new records to the data being deduplicated, a different record may be retained.
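Because the retained record is chosen at random, fields whose values differ among duplicates may yield different results from run to run. One way to check this in advance is sketched below in Python; the `unstable_fields` helper and the sample rows are hypothetical, and the sketch assumes every record has the same fields with hashable values.

```python
from collections import defaultdict


def unstable_fields(records, dedupe_fields):
    """Return fields whose values differ within at least one duplicate group."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[f] for f in dedupe_fields)].append(record)

    unstable = set()
    for group in groups.values():
        for field in group[0]:
            # More than one distinct value means the result depends on
            # which record of the group happens to be retained.
            if len({r[field] for r in group}) > 1:
                unstable.add(field)
    return sorted(unstable)


rows = [
    {"order_no": "PO-001", "total_amount": 300, "item": "Pen"},
    {"order_no": "PO-001", "total_amount": 300, "item": "Paper"},
    {"order_no": "PO-002", "total_amount": 150, "item": "Ink"},
]

print(unstable_fields(rows, ["order_no"]))  # ['item']
```

In this sample, `total_amount` is the same in every duplicate of an order, so it is safe to calculate on after deduplication, while `item` depends on which record is kept.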