build: Drop Spark 3.2 support by huaxingao · Pull Request #581 · apache/datafusion-comet

huaxingao · 2024-06-17T15:17:05Z

Which issue does this PR close?

Closes #565.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

huaxingao · 2024-06-17T16:36:24Z

cc @andygrove @viirya @kazuyukitanimura @parthchandra
Could you please take a look when you have a moment? Thanks!

kazuyukitanimura

It looks like installation.md and overview.md still mention 3.2.

We can also remove the spark-3.2 shims.

Additionally, we can remove a few more things, e.g. ShimCometParquetUtils, the GitHub Actions jobs, etc.

spark/src/test/scala/org/apache/comet/CometCastSuite.scala

spark/src/main/scala/org/apache/spark/sql/comet/CometBatchScanExec.scala

kazuyukitanimura

Looking good, but a few more things.

The GitHub Actions CI for 3.2 should be dropped.
ShimCometBatchScanExec can also be cleaned up, i.e. by moving keyGroupedPartitioning and inputPartitions to CometBatchScanExec.

spark/src/main/spark-3.x/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala

.github/workflows/pr_build.yml

kazuyukitanimura

LGTM, pending CI.

viirya · 2024-06-18T18:28:35Z

spark/src/main/spark-3.x/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala

-  }
-
-  // TODO: remove after dropping Spark 3.2 support and directly call new FileScanRDD
+  // TODO: remove after dropping Spark 3.4 support and directly call new FileScanRDD


3.4 or 3.3? I don't see 3.4 explicitly mentioned in other places.

It should be 3.4, because FileScanRDD has a different signature in 4.0.
Here is the 4.0 signature:

class FileScanRDD(
    @transient private val sparkSession: SparkSession,
    readFunction: (PartitionedFile) => Iterator[InternalRow],
    @transient val filePartitions: Seq[FilePartition],
    val readSchema: StructType,
    val metadataColumns: Seq[AttributeReference] = Seq.empty,
    metadataExtractors: Map[String, PartitionedFile => Any] = Map.empty,
    options: FileSourceOptions = new FileSourceOptions(CaseInsensitiveMap(Map.empty)))

Here is the 3.4 signature:

class FileScanRDD(
    @transient private val sparkSession: SparkSession,
    readFunction: (PartitionedFile) => Iterator[InternalRow],
    @transient val filePartitions: Seq[FilePartition],
    val readSchema: StructType,
    val metadataColumns: Seq[AttributeReference] = Seq.empty,
    options: FileSourceOptions = new FileSourceOptions(CaseInsensitiveMap(Map.empty)))

I see. Thanks.

How about 3.3? Is it also different from Spark 3.4?

Yes, 3.3 is also different from 3.4. Here is the 3.3 signature:

class FileScanRDD(
    @transient private val sparkSession: SparkSession,
    readFunction: (PartitionedFile) => Iterator[InternalRow],
    @transient val filePartitions: Seq[FilePartition],
    val readSchema: StructType,
    val metadataColumns: Seq[AttributeReference] = Seq.empty)

Spark 3.5 has the same signature as Spark 4.0
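
For readers following the thread, here is a minimal sketch of one way a shim could cope with constructors whose parameter lists differ across versions: look up the constructor reflectively and choose the arguments by parameter count. The classes `ScanV33`, `ScanV34`, and `ScanV35` and the helper `newScan` are hypothetical stand-ins written for this example, not Spark or Comet code, and they keep only the trailing parameters that differ in the signatures quoted above. Whether the real shim works this way is not stated in this conversation.

```scala
// A sketch under stated assumptions: stand-in classes, not Spark's FileScanRDD.
object FileScanRddShimSketch {
  // Hypothetical stand-in for the 3.3 shape: no options parameter.
  final class ScanV33(val readSchema: String, val metadataColumns: Seq[String])

  // Hypothetical stand-in for the 3.4 shape: adds options.
  final class ScanV34(
      val readSchema: String,
      val metadataColumns: Seq[String],
      val options: Map[String, String])

  // Hypothetical stand-in for the 3.5/4.0 shape: adds metadataExtractors before options.
  final class ScanV35(
      val readSchema: String,
      val metadataColumns: Seq[String],
      val metadataExtractors: Map[String, String => Any],
      val options: Map[String, String])

  // Builds an instance of whichever class is passed in, choosing the
  // argument list from the number of constructor parameters.
  def newScan(cls: Class[_], readSchema: String, options: Map[String, String]): AnyRef = {
    val ctor = cls.getConstructors.head
    val args: Seq[AnyRef] = ctor.getParameterCount match {
      case 2 => Seq(readSchema, Seq.empty[String])
      case 3 => Seq(readSchema, Seq.empty[String], options)
      case 4 => Seq(readSchema, Seq.empty[String], Map.empty[String, String => Any], options)
      case n => throw new IllegalStateException(s"Unsupported constructor arity: $n")
    }
    ctor.newInstance(args: _*).asInstanceOf[AnyRef]
  }
}
```

The trade-off of this approach is that a signature mismatch surfaces at runtime rather than at compile time, which is why a TODO to call the constructor directly once older versions are dropped is a natural follow-up.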

Yeah, that is why I asked about:

remove after dropping Spark 3.4 support ...

Shouldn't it say Spark 3.3/Spark 3.4?

OK, let me rewrite this to make it clearer.

huaxingao · 2024-06-18T23:13:06Z

Thanks, everyone!

* build: Drop Spark 3.2 support
* remove un-used import
* fix BloomFilterMightContain
* revert the changes for TimestampNTZType and PartitionIdPassthrough
* address comments and remove more 3.2 related code
* remove un-used import
* put back newDataSourceRDD
* remove un-used import and put back lazy val partitions
* address comments
* Trigger Build
* remove the missed 3.2 pipeline
* address comments

huaxingao added 4 commits June 16, 2024 10:45

build: Drop Spark 3.2 support

b715ae8

remove un-used import

77a9a9b

fix BloomFilterMightContain

9cc0b3e

revert the changes for TimestampNTZType and PartitionIdPassthrough

99f2d5c

kazuyukitanimura reviewed Jun 17, 2024

spark/src/test/scala/org/apache/comet/CometCastSuite.scala

eejbyfeldt reviewed Jun 17, 2024

spark/src/main/scala/org/apache/spark/sql/comet/CometBatchScanExec.scala

huaxingao added 4 commits June 17, 2024 15:26

address comments and remove more 3.2 related code

0e936a8

remove un-used import

ce1538c

put back newDataSourceRDD

1c0ba78

remove un-used import and put back lazy val partitions

d3c5fca

kazuyukitanimura reviewed Jun 18, 2024

huaxingao added 2 commits June 17, 2024 20:17

address comments

1380e8f

Trigger Build

818a629

kazuyukitanimura reviewed Jun 18, 2024

.github/workflows/pr_build.yml

remove the missed 3.2 pipeline

b9602e0

kazuyukitanimura approved these changes Jun 18, 2024

kazuyukitanimura mentioned this pull request Jun 18, 2024

test: Enable Spark 4.0 tests #537

Merged

viirya reviewed Jun 18, 2024

viirya approved these changes Jun 18, 2024

address comments

edb55e3

kazuyukitanimura mentioned this pull request Jun 18, 2024

feat: Add experimental support for Apache Spark 3.5.1 #587

Merged

andygrove merged commit d584229 into apache:main Jun 18, 2024

huaxingao deleted the drop_3.2 branch June 18, 2024 23:13


5 participants
