cc @andygrove @viirya @kazuyukitanimura @parthchandra
spark/src/main/scala/org/apache/spark/sql/comet/CometBatchScanExec.scala
kazuyukitanimura left a comment
Looking good, but there are a few more things:
The GitHub Actions CI pipeline for Spark 3.2 should be dropped.
ShimCometBatchScanExec can also be cleaned up, i.e., by moving keyGroupedPartitioning and inputPartitions into CometBatchScanExec.
spark/src/main/spark-3.x/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala
Outdated
spark/src/main/spark-3.x/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala
spark/src/main/spark-3.x/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala
Outdated
kazuyukitanimura left a comment
LGTM, pending CI.
}

-// TODO: remove after dropping Spark 3.2 support and directly call new FileScanRDD
+// TODO: remove after dropping Spark 3.4 support and directly call new FileScanRDD
3.4 or 3.3? I don't see 3.4 explicitly mentioned anywhere else.
It should be 3.4, because FileScanRDD has a different signature in Spark 4.0.
Here is the Spark 4.0 signature:
class FileScanRDD(
@transient private val sparkSession: SparkSession,
readFunction: (PartitionedFile) => Iterator[InternalRow],
@transient val filePartitions: Seq[FilePartition],
val readSchema: StructType,
val metadataColumns: Seq[AttributeReference] = Seq.empty,
metadataExtractors: Map[String, PartitionedFile => Any] = Map.empty,
options: FileSourceOptions = new FileSourceOptions(CaseInsensitiveMap(Map.empty)))
Here is the Spark 3.4 signature:
class FileScanRDD(
@transient private val sparkSession: SparkSession,
readFunction: (PartitionedFile) => Iterator[InternalRow],
@transient val filePartitions: Seq[FilePartition],
val readSchema: StructType,
val metadataColumns: Seq[AttributeReference] = Seq.empty,
options: FileSourceOptions = new FileSourceOptions(CaseInsensitiveMap(Map.empty)))
What about 3.3? Is it also different from Spark 3.4?
Yes, 3.3 is also different from 3.4. Here is the Spark 3.3 signature:
class FileScanRDD(
@transient private val sparkSession: SparkSession,
readFunction: (PartitionedFile) => Iterator[InternalRow],
@transient val filePartitions: Seq[FilePartition],
val readSchema: StructType,
val metadataColumns: Seq[AttributeReference] = Seq.empty)
Spark 3.5 has the same signature as Spark 4.0.
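To make the version split above concrete, here is a toy Scala model, not Comet's actual shim code. It records the FileScanRDD constructor parameter names quoted above for each Spark line and lists the versions whose signature differs from a chosen target version. The object `FileScanRddSignatures` and the method `versionsNeedingShim` are hypothetical names used only for illustration; no Spark classes are involved.

```scala
// Toy model (hypothetical, no Spark dependency): FileScanRDD constructor
// parameter names per Spark line, copied from the signatures quoted above.
object FileScanRddSignatures {
  // Parameters present in every signature from Spark 3.3 onward.
  val base: Seq[String] =
    Seq("sparkSession", "readFunction", "filePartitions", "readSchema", "metadataColumns")

  // Constructor parameter lists keyed by Spark minor version.
  val byVersion: Map[String, Seq[String]] = Map(
    "3.3" -> base,                                          // no options, no metadataExtractors
    "3.4" -> (base :+ "options"),                           // adds options
    "3.5" -> (base :+ "metadataExtractors" :+ "options"),   // same as 4.0
    "4.0" -> (base :+ "metadataExtractors" :+ "options")
  )

  /** Versions whose parameter list differs from the target version's, in sorted order. */
  def versionsNeedingShim(target: String): Seq[String] =
    byVersion.keys.toSeq.sorted.filter(v => byVersion(v) != byVersion(target))

  def main(args: Array[String]): Unit = {
    // Prints "3.3, 3.4": only these two lines differ from the Spark 4.0 signature.
    println(versionsNeedingShim("4.0").mkString(", "))
  }
}
```

With Spark 4.0 as the target, the model reports both 3.3 and 3.4, which matches the point raised in the thread: a direct `new FileScanRDD(...)` call only becomes possible once both of those versions are dropped.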
Yeah, that is why I asked about
"remove after dropping Spark 3.4 support ..."
Isn't it Spark 3.3/Spark 3.4?
OK, let me rewrite this to make it clearer.
Thanks, everyone!
* build: Drop Spark 3.2 support
* remove un-used import
* fix BloomFilterMightContain
* revert the changes for TimestampNTZType and PartitionIdPassthrough
* address comments and remove more 3.2 related code
* remove un-used import
* put back newDataSourceRDD
* remove un-used import and put back lazy val partitions
* address comments
* Trigger Build
* remove the missed 3.2 pipeline
* address comments
Which issue does this PR close?
Closes 565.
Rationale for this change
What changes are included in this PR?
How are these changes tested?