The following steps cover the main options for configuring and using an Apache Spark instance in Azure Synapse Analytics:
1. Enable isolated compute:
- The isolated compute option in Azure Synapse Analytics runs the Apache Spark instance on physical compute resources dedicated to a single customer, isolating critical or regulated analytics workloads from other customers' workloads.
2. Define the node family and node size of the instance (NODE SIZE, MEMORY OPTIMIZED, HARDWARE ACCELERATED):
- The node size and node family of the instance determine how much compute and memory are available for queries and operations in Synapse Analytics. Families include Memory Optimized and Hardware Accelerated nodes.
3. Set autoscale (AUTOSCALE):
- Autoscale dynamically adjusts the number of nodes in the Spark instance between a configured minimum and maximum based on the workload, providing flexibility and cost optimization.
4. Set the number of nodes (EXECUTORS):
- When you configure the number of nodes (executors) in an Apache Spark instance, you determine the distributed processing capacity for big data workloads.
5. Set automatic pausing (AUTOMATIC PAUSING):
- Automatically pausing the instance after a configurable number of idle minutes helps optimize costs, because billing stops while no resources are needed.
6. Set the Apache Spark instance version (VERSION, PREVIEW):
- Specify the Apache Spark version when creating an instance to ensure compatibility and access to the latest features.
7. Enable Installation of Session-Level Packages (ALLOW SESSION LEVEL PACKAGES):
- Allows you to install Python packages at the session level to customize the runtime environment in Spark.
8. Define Tags (TAGS):
- Adding tags helps with organization and resource management, allowing you to classify and identify instances more efficiently.
9. View and change the configuration of the created instance (SCALE SETTINGS):
- Open the scale settings to view and change the configuration of the instance.
10. Create a new notebook attached to the Apache Spark instance (ATTACH TO):
- Create a new notebook and attach it to the Apache Spark instance so that you can run Spark code interactively.
11. Define the notebook programming language (LANGUAGE):
- Specify the notebook programming language, such as Scala, Python, or SQL, depending on your preference and the task.
12. Create Code and Rich Text Cells (CODE, MARKDOWN):
- Organize notebook content into cells, where code cells contain executable instructions and Markdown cells allow text formatting.
13. Run the notebook (RUN ALL, SESSION START):
- Run all cells in the notebook, or start a new session to run Spark code.
14. Monitor the Apache Spark instance (ALLOCATED vCores, Memory, ACTIVE APPLICATIONS):
- Monitor the vCores and memory allocated to the Spark instance and the applications currently active on it.
15. Stop session execution:
- Stop the current session in the Apache Spark environment to release its resources.
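The pool settings from the instance configuration steps (isolated compute, node family and size, autoscale, node count, automatic pausing, version, session-level packages, tags) map onto properties of a single request body. The following Python sketch assembles such a body, assuming the property names of the `Microsoft.Synapse/workspaces/bigDataPools` Azure Resource Manager resource (`nodeSize`, `nodeSizeFamily`, `autoScale`, `autoPause`, and so on). The helper `build_spark_pool_body`, its parameters, and its default values are hypothetical, so check them against the current REST API reference before sending the body in a PUT request to the pool resource.

```python
import json
from typing import Dict, Optional

# Hypothetical helper. Property names are assumed from the ARM schema for
# Microsoft.Synapse/workspaces/bigDataPools; verify against the API reference.
NODE_FAMILIES = {"MemoryOptimized", "HardwareAcceleratedGPU"}
NODE_SIZES = {"Small", "Medium", "Large", "XLarge", "XXLarge", "XXXLarge"}
MIN_NODES = 3  # smallest node count accepted for a Synapse Spark pool


def build_spark_pool_body(
    location: str,
    node_size: str = "Small",
    node_family: str = "MemoryOptimized",
    node_count: int = 3,
    max_nodes: Optional[int] = None,
    auto_pause_minutes: Optional[int] = 15,
    spark_version: str = "3.4",
    session_packages: bool = True,
    isolated_compute: bool = False,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a request body for creating an Apache Spark pool.

    Autoscale is enabled when max_nodes is given (node_count becomes the
    minimum); auto-pause is disabled when auto_pause_minutes is None.
    """
    if node_size not in NODE_SIZES:
        raise ValueError(f"unknown node size: {node_size}")
    if node_family not in NODE_FAMILIES:
        raise ValueError(f"unknown node family: {node_family}")
    if node_count < MIN_NODES:
        raise ValueError(f"node_count must be at least {MIN_NODES}")
    if max_nodes is not None and max_nodes < node_count:
        raise ValueError("max_nodes must be >= node_count")

    # AUTOSCALE: only send min/max limits when autoscale is switched on.
    auto_scale: dict = {"enabled": max_nodes is not None}
    if max_nodes is not None:
        auto_scale.update(minNodeCount=node_count, maxNodeCount=max_nodes)

    # AUTOMATIC PAUSING: pause after the given number of idle minutes.
    auto_pause: dict = {"enabled": auto_pause_minutes is not None}
    if auto_pause_minutes is not None:
        auto_pause["delayInMinutes"] = auto_pause_minutes

    properties = {
        "isComputeIsolationEnabled": isolated_compute,    # isolated compute
        "nodeSizeFamily": node_family,                    # node family
        "nodeSize": node_size,                            # NODE SIZE
        "autoScale": auto_scale,                          # AUTOSCALE
        "nodeCount": node_count,                          # number of nodes
        "autoPause": auto_pause,                          # AUTOMATIC PAUSING
        "sparkVersion": spark_version,                    # VERSION
        "sessionLevelPackagesEnabled": session_packages,  # session-level packages
    }
    # TAGS live at the resource level, next to location.
    return {"location": location, "tags": dict(tags or {}), "properties": properties}


if __name__ == "__main__":
    body = build_spark_pool_body(
        "eastus", node_size="Medium", node_count=3, max_nodes=10, tags={"env": "dev"}
    )
    print(json.dumps(body, indent=2))
```

Deriving autoscale from whether `max_nodes` is given keeps the fixed-size and autoscaling cases in one function, and the minimum of three nodes reflects the smallest Spark pool that Synapse allows.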
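The node size (step 2) and the node count or autoscale limits (steps 3 and 4) together fix how much capacity the instance can reach, which is the ceiling to compare against the ALLOCATED vCores and Memory figures when monitoring (step 14). The Python sketch below does that arithmetic. It assumes the per-node vCores and memory of the Memory Optimized sizes as listed in the Azure Synapse documentation at the time of writing, and `NODE_SPECS`, `pool_capacity`, and `autoscale_range` are hypothetical names.

```python
from typing import Dict, Tuple

# (vCores, memory in GB) per node for the Memory Optimized family, as listed in
# the Azure Synapse documentation at the time of writing; verify current values.
NODE_SPECS: Dict[str, Tuple[int, int]] = {
    "Small": (4, 32),
    "Medium": (8, 64),
    "Large": (16, 128),
    "XLarge": (32, 256),
    "XXLarge": (64, 432),
    "XXXLarge": (80, 504),
}


def pool_capacity(node_size: str, node_count: int) -> Tuple[int, int]:
    """Return (total vCores, total memory in GB) for node_count nodes of node_size."""
    vcores, memory_gb = NODE_SPECS[node_size]
    return vcores * node_count, memory_gb * node_count


def autoscale_range(
    node_size: str, min_nodes: int, max_nodes: int
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the capacity at the autoscale minimum and at the maximum."""
    return pool_capacity(node_size, min_nodes), pool_capacity(node_size, max_nodes)


if __name__ == "__main__":
    low, high = autoscale_range("Medium", 3, 10)
    print(f"min: {low[0]} vCores / {low[1]} GB, max: {high[0]} vCores / {high[1]} GB")
```

For a Medium instance that autoscales between 3 and 10 nodes, the script prints `min: 24 vCores / 192 GB, max: 80 vCores / 640 GB`.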
These steps provide an overview of how to set up and use Azure Synapse Analytics with an Apache Spark instance for big data analytics. Be sure to refer to the official documentation for specific details and updates.