Azure Data Factory Video
1 page views
426 video views
2025, February 19, Wednesday

#142 How to map dynamic columns in Azure Data Factory

How to dynamically map columns in Azure Data Factory to adjust the match between source and target columns

Azure Data Factory (ADF) is a cloud data orchestration service that allows you to create, schedule, and manage data pipelines. Let's cover each of the mentioned topics:

1. Store files in Azure Data Lake:

Walkthrough:
1. Access Azure Data Lake Storage Gen2 in the Azure portal.
2. Create or select a container to store the files.
3. Upload the files to the container.

https://dadosabertos.mec.gov.br/pnp?start=0

Example: You can upload CSV, JSON, Parquet, and other files to Azure Data Lake Storage Gen2.

2. Set Data Type Variable Array:

Step-by-step:
1. In the Azure Data Factory pipeline editor, click "Variables" in the side menu.
2. Create a new variable and set the type to "Array".
3. Enter the desired values for the array.

Example:

{
    "name": "ArrayOfValues",
    "type": "Array",
    "value": [
        "Value1",
        "Value2",
        "Value3"
    ]
}

3. Set parameters in dataset:

Step-by-step:
1. In the dataset editor, click "Parameters".
2. Add the necessary parameters such as file name, path, etc.

Example:

{
    "name": "fileName",
    "type": "String"
}

4. Extract data from Azure Data Lake:

Walkthrough:
1. In the pipeline editor, add a copy activity.
2. Configure the source as Azure Data Lake and the destination as desired.

Example:

"source": {
    "type": "AzureDataLakeStoreSource",
    "storeSettings": {
        "type": "AzureDataLakeStoreReadSettings",
        "recursive": true
    }
}

5. Use JSON file lookup activity:

Step-by-step:
1. Add a data read activity to your pipeline in Azure Data Factory.
2. Configure the source as Azure Data Lake Storage or Blob Storage where the JSON file is located.
3. Specify the file format as JSON and provide the path to the JSON file.
4. Configure the activity properties according to your needs, such as data filtering, column projection, etc.

Example:

"activities": [
    {
        "name": "ReadJSONActivity",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "AzureDataLakeStorageLinkedService",
                "type": "DatasetReference",
                "parameters": {
                    "fileName": "example.json"
                }
            }
        ],
        "outputs": [
            {
                "referenceName": "OutputDataset",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "JsonSource"
            },
            "sink": {
                "type": "AzureBlobSink"
            },
            "enableStaging": false
        }
    }
]

6. Store data in Azure SQL Database:

Walkthrough:
1. Add a copy activity to the pipeline.
2. Configure the source as desired and the target as Azure SQL Database.

Example:

"sink": {
    "type": "SqlServerSink",
    "writeBatchSize": 10000,
    "writeBatchTimeout": "00:05:00"
}

7. Map columns using dynamic content:

Step-by-step:
1. During copy activity setup, map columns using dynamic expressions.

Example:

{"type": "TabularTranslator",
"mappings": [
  {"source": {"name": "Year"},"sink": {"name": "Year"}},
  {"source": {"name": "OU"},"sink": {"name": "OU"}},
  {"source": {"name": "Action"},"sink": {"name": "Government Action"}},
  {"source": {"name": "GND"},"sink": {"name": "GND"}},
  {"source": {"name": "Settled Expenses"},"sink": {"name": "Total Settlements"}}
]}

These steps and examples illustrate how to perform various operations with Azure Data Factory, from storing and extracting data to mapping columns and using specific activities.