Free cookie consent management tool by TermsFeed Policy Generator

How to Remove Duplicate Records in SQL Server Integration Services

Apresentamos nesse vídeo como remover os registros duplicados de uma tabela utilizando o SQL Server Integration Services.

We'll get to know the techniques:

1. Identify if there are duplicate records in the table (DISTINCT):

SELECT DISTINCT column1, column2
FROM your_table;

2. Display the execution plan by the global definition (SET STATISTICS XML):

set statistics xml on;
-- Your enquiry here
SET STATISTICS XML OFF;

3. Calculate values by using SQL Server (SELECT, OPERATIONS):SELECT

column1, column2
, column1 + column2 AS sum_result
FROM your_table;

4. Add Data Flow Task in Integration Services Project (DATA FLOW TASK):

  • In SQL Server Data Tools (SSDT), drag the "Data Flow Task" task to the flow control.

5. Get the data from the table in SQL Server (OLE DB SOURCE, DATA CONNECTION, DATA ACCESS MODE):In

the data
  • flow, use the "OLE DB Source" source and configure the connection and data access mode.
6. Sort the

data and enable the removal of all duplicate records (SORT, DUPLICATE SORT VALUES):In

the
  • dataflow, add a "Sort" transformation and check the "Remove rows with duplicate sort values" option.

7. Identify duplicate records by grouping the columns (AGGREGATE, GROUP BY, COUNT ALL):

  • Use the "Aggregate" transformation to count the records grouped by columns.

8. Select only duplicate records by CONDITIONAL SPLIT (OUTPUT): Use

  • the "Conditional Split" transformation to route records based on a condition (duplicate or not).

9. Remove duplicate records by Dynamic SQL Code (OLE DB COMMAND; SQL COMMAND, PARAMS, COLUMN MAPPINGS, DELETE FROM):

  • Use the "OLE DB Command" transformation with a dynamic SQL command to delete duplicates.

10. Insert the distinct records by Dynamic SQL Code (OLE DB COMMAND; SQL COMMAND, PARAMS, COLUMN MAPPINGS, INSERT INTO):-

Use the "OLE DB Command" transformation with a dynamic SQL command to insert distinct records.

11. Enable data visualization between tasks (DATA VIEWER):

- Use the "Data Viewer" tool to visualize data between tasks in debug mode.

12. Run the Integration Services package in the project (START, DESIGN):

- On SSDT, click "Start Debugging" to run the Integration Services package.

13. Display in-memory records (BUFFER) in data view:-

Use tools such as "Data Viewers" or "Derived Columns" to visualize data in the pipeline.

14. Identify if there are duplicate records in the table (DISTINCT, EXECUTION PLAN):-

Run a query using 'DISTINCT' and analyze the execution plan for optimization.

These are general guidelines and details may vary based on the specific version of SQL Server Integration Services and environment settings. Be sure to refer to the official documentation for more detailed information.

This content contains
  • Content Video
  • Language Portuguese
  • Duration 10m 52s
  • Subtitles Não

  • Reading time 2 min 5 seg

avatar
Fabio Santos

Data Scientist and Consultant for Digital and Analytics Solutions


  • Share

Youtube Channel

@fabioms

Subscribe now