Free cookie consent management tool by TermsFeed Policy Generator
  • SQL Server Video
  •    26 page views
  •   1884 video views
  • 2024, August 03, Saturday

#055 How to Identify Data Outliers in SQL Server

Apresentamos nesse vídeo com reduzir o impacto dos valores extremos em uma tendência central identificando os outliers dos dados utilizando o SQL Server.

We'll get to know the techniques:

1.

Common Table Expression (CTE):
  • A Common Table Expression (CTE) is a named temporary query that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It makes it easy to write and read complex queries. Example:

    WITH CTEExample AS (
     SELECT Col1, Col2
     FROM YourTable
     WHERE SomeCondition
    )
    SELECT*
    FROM CTEExample;
    

2. Query Table Records (SELECT):

  • SELECT is used to retrieve data from one or more tables in a database. Basic example:

    SELECT Column1, Column2
    FROM YourTable
    WHERE SomeCondition;
    

3. Calculate Quartile, Median, Interquartile, and Outliers (PERCENTILE_CONT):

  • The PERCENTILE_CONT is used to calculate percentile values, useful for statistical analyses. Example:

    SELECT
     PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY YourColumn) AS Quartile1,
     PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY YourColumn) AS Median,
     PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY YourColumn) AS Quartile3
    FROM YourTable;
    

4. Calculate upper and lower fences:

  • Upper and lower fences are often used to identify outliers. They can be calculated using the quartiles obtained earlier.
5. Get and add new Box Plot visual and

configure in Power BI:

  • In Power BI, go to the "Visualizations" tab and add the "Box plot" visual to your page. Configure it by selecting the appropriate columns for the axes.
6. Combine two tables

(CROSS APPLY):

  • CROSS APPLY is used to combine two tables, similar to INNER JOIN, but with some differences. Example:

    SELECT *
    FROM Table1
    CROSS APPLY Table2
    WHERE SomeCondition;
    

7. Conditional column (CASE, WHEN, ELSE):

  • The CASE statement is used to create conditional columns. Example:

    SELECT
     Column1,
     CASE
        WHEN Column2 > 10 THEN 'High'
        WHEN Column2 > 5 THEN 'Medium'
        ELSE 'Low'
     END AS Priority
    FROM YourTable;
    

8. Identify the outliers of the records (Power BI):

  • Use the Power BI graphical tools to visualize and identify the outliers in the Box Plot you added earlier. This will allow for easy visual analysis.

   

This content contains
  • Content Video
  • Language Portuguese
  • Duration 11m 5s
  • Subtitles Não

  • Reading time 1 min 30 seg

avatar
Fabio Santos

Data Scientist and Consultant for Digital and Analytics Solutions


  • Share

Youtube Channel

@fabioms

Subscribe now