SQL Server Integration Services Video
19 page views
1536 video views
2024, February 24, Saturday

#068 How to Remove Duplicate Records in SQL Server Integration Services

Apresentamos nesse vídeo como remover os registros duplicados de uma tabela utilizando o SQL Server Integration Services.

We'll get to know the techniques:

1. Identify if there are duplicate records in the table (DISTINCT):

SELECT DISTINCT column1, column2
FROM your_table;

2. Display the execution plan by the global definition (SET STATISTICS XML):

set statistics xml on;
-- Your enquiry here
SET STATISTICS XML OFF;

3. Calculate values by using SQL Server (SELECT, OPERATIONS):SELECT

column1, column2

, column1 + column2 AS sum_result
FROM your_table;

4. Add Data Flow Task in Integration Services Project (DATA FLOW TASK):

In SQL Server Data Tools (SSDT), drag the "Data Flow Task" task to the flow control.

5. Get the data from the table in SQL Server (OLE DB SOURCE, DATA CONNECTION, DATA ACCESS MODE):In

the data

flow, use the "OLE DB Source" source and configure the connection and data access mode.

6. Sort the

data and enable the removal of all duplicate records (SORT, DUPLICATE SORT VALUES):In

the

dataflow, add a "Sort" transformation and check the "Remove rows with duplicate sort values" option.

7. Identify duplicate records by grouping the columns (AGGREGATE, GROUP BY, COUNT ALL):

Use the "Aggregate" transformation to count the records grouped by columns.

8. Select only duplicate records by CONDITIONAL SPLIT (OUTPUT): Use

the "Conditional Split" transformation to route records based on a condition (duplicate or not).

9. Remove duplicate records by Dynamic SQL Code (OLE DB COMMAND; SQL COMMAND, PARAMS, COLUMN MAPPINGS, DELETE FROM):

Use the "OLE DB Command" transformation with a dynamic SQL command to delete duplicates.

10. Insert the distinct records by Dynamic SQL Code (OLE DB COMMAND; SQL COMMAND, PARAMS, COLUMN MAPPINGS, INSERT INTO):-

Use the "OLE DB Command" transformation with a dynamic SQL command to insert distinct records.

11. Enable data visualization between tasks (DATA VIEWER):

- Use the "Data Viewer" tool to visualize data between tasks in debug mode.

12. Run the Integration Services package in the project (START, DESIGN):

- On SSDT, click "Start Debugging" to run the Integration Services package.

13. Display in-memory records (BUFFER) in data view:-

Use tools such as "Data Viewers" or "Derived Columns" to visualize data in the pipeline.

14. Identify if there are duplicate records in the table (DISTINCT, EXECUTION PLAN):-

Run a query using 'DISTINCT' and analyze the execution plan for optimization.

These are general guidelines and details may vary based on the specific version of SQL Server Integration Services and environment settings. Be sure to refer to the official documentation for more detailed information.

This content contains

Content Video
Language Portuguese
Duration 10m 52s
Subtitles Não
Reading time 2 min 5 seg

Fabio Santos

Data Scientist and Consultant for Digital and Analytics Solutions

Youtube Channel

@fabioms

#098 Como executar Pipeline em Loop no Azure Data Factory

2023, abril 30, domingo
Vídeo

Como criar e quanto custa o Azure Purview?

2023, março 21, terça
Vídeo