
#142 How to map dynamic columns in Azure Data Factory

How to dynamically map columns in Azure Data Factory so that source columns are correctly matched to target columns at runtime

Azure Data Factory (ADF) is a cloud data orchestration service that allows you to create, schedule, and manage data pipelines. Let's walk through each step of the process:

1. Store files in Azure Data Lake:

  • Walkthrough:
    1. Access Azure Data Lake Storage Gen2 in the Azure portal.
    2. Create or select a container to store the files.
    3. Upload the files to the container.

  • Data source: https://dadosabertos.mec.gov.br/pnp?start=0

  • Example: You can upload CSV, JSON, Parquet, and other files to Azure Data Lake Storage Gen2.

2. Create a pipeline variable of type Array:

  • Step-by-step:

    1. In the Azure Data Factory pipeline editor, open the "Variables" tab below the canvas.
    2. Create a new variable and set the type to "Array".
    3. Enter the desired values for the array.
  • Example:

    "variables": {
        "ArrayOfValues": {
            "type": "Array",
            "defaultValue": [
                "Value1",
                "Value2",
                "Value3"
            ]
        }
    }
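
  • This variable can also be filled at runtime with a Set Variable activity. Below is a minimal sketch, assuming the array comes from the output of the Lookup activity described in step 5 (the activity name ReadJSONActivity and the property "mappings" are illustrative placeholders):

    {
        "name": "SetMappingArray",
        "type": "SetVariable",
        "typeProperties": {
            "variableName": "ArrayOfValues",
            "value": {
                "value": "@activity('ReadJSONActivity').output.firstRow.mappings",
                "type": "Expression"
            }
        }
    }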
    

3. Set parameters in the dataset:

  • Step-by-step:

    1. In the dataset editor, click "Parameters".
    2. Add the necessary parameters such as file name, path, etc.
  • Example:

    "parameters": {
        "fileName": {
            "type": "String"
        }
    }
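
  • The parameter is then referenced inside the dataset definition through dynamic content. A minimal sketch for an ADLS Gen2 location (the container name "raw" is a placeholder):

    "typeProperties": {
        "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "raw",
            "fileName": {
                "value": "@dataset().fileName",
                "type": "Expression"
            }
        }
    }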
    

4. Extract data from Azure Data Lake:

  • Walkthrough:

    1. In the pipeline editor, add a copy activity.
    2. Configure the source as Azure Data Lake and the destination as desired.
  • Example:

    "source": {
        "type": "AzureDataLakeStoreSource",
        "storeSettings": {
            "type": "AzureDataLakeStoreReadSettings",
            "recursive": true
        }
    }
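
  • When the source dataset is parameterized (as in step 3), the copy activity supplies a value for that parameter through its inputs. A minimal sketch, where the dataset name SourceDataLakeDataset and the file name are placeholders only:

    "inputs": [
        {
            "referenceName": "SourceDataLakeDataset",
            "type": "DatasetReference",
            "parameters": {
                "fileName": "data.csv"
            }
        }
    ]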
    

5. Use the Lookup activity with a JSON file:

  • Step-by-step:

    1. Add a Lookup activity to your pipeline in Azure Data Factory.
    2. Configure its source dataset to point to the Azure Data Lake Storage or Blob Storage location of the JSON file.
    3. Specify the file format as JSON and provide the path to the JSON file.
    4. Configure the activity settings as needed, for example whether only the first row should be returned.
  • Example:

    "activities": [
        {
            "name": "ReadJSONActivity",
            "type": "Copy",
            "inputs": [
                {
                    "referenceName": "AzureDataLakeStorageLinkedService",
                    "type": "DatasetReference",
                    "parameters": {
                        "fileName": "example.json"
                    }
                }
            ],
            "outputs": [
                {
                    "referenceName": "OutputDataset",
                    "type": "DatasetReference"
                }
            ],
            "typeProperties": {
                "source": {
                    "type": "JsonSource"
                },
                "sink": {
                    "type": "AzureBlobSink"
                },
                "enableStaging": false
            }
        }
    ]
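
  • The Lookup result can then be referenced from later activities through dynamic content, for example (assuming the activity name above):

    @activity('ReadJSONActivity').output.firstRow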
    

6. Store data in Azure SQL Database:

  • Walkthrough:

    1. Add a copy activity to the pipeline.
    2. Configure the source as desired and the target as Azure SQL Database.
  • Example:

    "sink": {
        "type": "SqlServerSink",
        "writeBatchSize": 10000,
        "writeBatchTimeout": "00:05:00"
    }
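
  • For completeness, the copy activity's output references an Azure SQL dataset. A minimal sketch, where the dataset name AzureSqlTableDataset, its tableName parameter, and the table name are placeholders:

    "outputs": [
        {
            "referenceName": "AzureSqlTableDataset",
            "type": "DatasetReference",
            "parameters": {
                "tableName": "Expenses"
            }
        }
    ]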
    

7. Map columns using dynamic content:

  • Step-by-step:

    1. In the copy activity's Mapping tab, switch to "Add dynamic content" and supply the mapping as an expression (a TabularTranslator object, as in the example below and the sketch that follows it).
  • Example:

    {"type": "TabularTranslator",
    "mappings": [
      {"source": {"name": "Year"},"sink": {"name": "Year"}},
      {"source": {"name": "OU"},"sink": {"name": "OU"}},
      {"source": {"name": "Action"},"sink": {"name": "Government Action"}},
      {"source": {"name": "GND"},"sink": {"name": "GND"}},
      {"source": {"name": "Settled Expenses"},"sink": {"name": "Total Settlements"}}
    ]}
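
  • Instead of hard-coding this object, the copy activity's translator property can take an expression. A minimal sketch, assuming the mapping JSON above is the content returned by the Lookup activity (ReadJSONActivity) from step 5:

    "translator": {
        "value": "@activity('ReadJSONActivity').output.firstRow",
        "type": "Expression"
    }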
    

These steps and examples illustrate how to perform various operations with Azure Data Factory, from storing and extracting data to mapping columns and using specific activities.

This content includes
  • Content: Video
  • Language: Portuguese
  • Duration: 20m 41s
  • Subtitles: No
  • Reading time: 2 min 49 s

Fabio Santos

Data Scientist and Consultant for Digital and Analytics Solutions



YouTube channel: @fabioms
