Comment
Author: Admin | 2025-04-28
Run a sample Spark application in all four languages supported inside Synapse: PySpark (Python), Spark (Scala), SparkSQL, and .NET for Spark (C#).

1. Create a GPU-enabled pool.
2. Create a notebook and attach it to the GPU-enabled pool you created in the first step.
3. Set the configurations as explained in the previous section (see the configuration sketch below).
4. Create a sample dataframe by copying the code below into the first cell of your notebook.

Scala:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.Row
import scala.collection.JavaConversions._

val schema = StructType(Array(
  StructField("emp_id", IntegerType),
  StructField("name", StringType),
  StructField("emp_dept_id", IntegerType),
  StructField("salary", IntegerType)
))

val emp = Seq(
  Row(1, "Smith", 10, 100000),
  Row(2, "Rose", 20, 97600),
  Row(3, "Williams", 20, 110000),
  Row(4, "Jones", 10, 80000),
  Row(5, "Brown", 40, 60000),
  Row(6, "Brown", 30, 78000)
)

val empDF = spark.createDataFrame(emp, schema)
```

Python:

```python
emp = [(1, "Smith", 10, 100000),
       (2, "Rose", 20, 97600),
       (3, "Williams", 20, 110000),
       (4, "Jones", 10, 80000),
       (5, "Brown", 40, 60000),
       (6, "Brown", 30, 78000)]

empColumns = ["emp_id", "name", "emp_dept_id", "salary"]

empDF = spark.createDataFrame(data=emp, schema=empColumns)
```

C#:

```csharp
using System.Collections.Generic;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;

var emp = new List<GenericRow>
{
    new GenericRow(new object[] { 1, "Smith", 10, 100000 }),
    new GenericRow(new object[] { 2, "Rose", 20, 97600 }),
    new GenericRow(new object[] { 3, "Williams", 20, 110000 }),
    new GenericRow(new object[] { 4, "Jones", 10, 80000 }),
    new GenericRow(new object[] { 5, "Brown", 40, 60000 }),
    new GenericRow(new object[] { 6, "Brown", 30, 78000 })
};

var schema = new StructType(new List<StructField>()
{
    new StructField("emp_id", new IntegerType()),
    new StructField("name", new StringType()),
    new StructField("emp_dept_id", new IntegerType()),
    new StructField("salary", new IntegerType())
});

DataFrame empDF = spark.CreateDataFrame(emp, schema);
```

Now let's run an aggregation that gets the maximum salary per department ID and display the result (a SparkSQL version of the same query is sketched below).

Scala:

```scala
val resultDF = empDF.groupBy("emp_dept_id").max("salary")
resultDF.show()
```

Python:

```python
resultDF = empDF.groupBy("emp_dept_id").max("salary")
resultDF.show()
```

C#:

```csharp
DataFrame resultDF = empDF.GroupBy("emp_dept_id").Max("salary");
resultDF.Show();
```

You can see which operations in your query ran on GPUs by inspecting the SQL plan in the Spark History Server.

How to tune your application for GPUs

Most Spark jobs can see improved performance through tuning configuration settings from defaults, and the same holds true for jobs leveraging GPU-enabled pools.
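Step 3 refers to configurations "explained in the previous section," which isn't included in this comment, and the tuning paragraph above is where the text leaves off. Purely as an illustration of the kind of settings involved, here is a minimal PySpark sketch assuming the pool uses the NVIDIA RAPIDS Accelerator for Apache Spark; spark.rapids.sql.enabled and spark.rapids.sql.concurrentGpuTasks are standard RAPIDS configuration keys, but whether these are the exact settings the referenced section describes is an assumption:

```python
# Sketch (assumption): enable the RAPIDS SQL plugin so eligible operators run on the GPU.
# spark.rapids.sql.enabled is a standard RAPIDS Accelerator runtime setting.
spark.conf.set("spark.rapids.sql.enabled", "true")

# Sketch (assumption): a common RAPIDS tuning knob controlling how many tasks share the
# GPU concurrently; the best value depends on the workload and available GPU memory.
spark.conf.set("spark.rapids.sql.concurrentGpuTasks", "2")
```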
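The walkthrough lists SparkSQL among the four supported languages, but the snippets above only cover Scala, Python, and C#. As a minimal sketch, the same maximum-salary-per-department aggregation can be expressed in Spark SQL from PySpark by registering the dataframe as a temporary view; the view name "emp" and the alias "max_salary" are arbitrary choices, not taken from the original article:

```python
# Register the dataframe created above as a temporary view so it can be queried with SQL.
empDF.createOrReplaceTempView("emp")

# Same aggregation as above, expressed in Spark SQL: maximum salary per department ID.
sqlResultDF = spark.sql("""
    SELECT emp_dept_id, MAX(salary) AS max_salary
    FROM emp
    GROUP BY emp_dept_id
""")
sqlResultDF.show()
```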