[lake/lance] Lance writer should emit Arrow FixedSizeList for array columns to enable native vector search #2706

@leekeiabstraction

Description

Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

main (development)

Please describe the bug 🐞

When tiering tables with ARRAY columns to Lance, Fluss always writes them as Arrow's variable-length List type. This prevents Lance's native vector search (nearest / ANN index) from working, because Lance requires FixedSizeList(n) for vector columns.

This appears to come from LanceArrowUtils.toArrowType(), where ArrayType unconditionally maps to ArrowType.List.INSTANCE:

  // fluss-lake/fluss-lake-lance/src/main/java/org/apache/fluss/lake/lance/utils/LanceArrowUtils.java:140
  } else if (dataType instanceof ArrayType) {
      return ArrowType.List.INSTANCE;
  }
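
One possible direction, sketched under assumptions rather than taken from the Fluss codebase: let a per-column table property declare the vector dimension, and have the ArrayType branch pick a fixed-size list only when that property is set. The property suffix `.arrow.fixed-size-list.size` and the helper names below are hypothetical. The sketch uses only the JDK and returns a description string; in the real writer the fixed-size case would presumably return `new ArrowType.FixedSizeList(size)` instead of `ArrowType.List.INSTANCE`.

```java
import java.util.HashMap;
import java.util.Map;

final class ArrayTypeMappingSketch {

    // Hypothetical property suffix; not an existing Fluss option.
    static final String FIXED_SIZE_SUFFIX = ".arrow.fixed-size-list.size";

    /**
     * Returns the configured list size for the column, or -1 when none is set.
     * A result of -1 means the writer keeps the variable-length List mapping.
     */
    static int fixedListSize(String columnName, Map<String, String> tableProperties) {
        String raw = tableProperties.get(columnName + FIXED_SIZE_SUFFIX);
        if (raw == null) {
            return -1;
        }
        int size = Integer.parseInt(raw.trim());
        if (size <= 0) {
            throw new IllegalArgumentException(
                    "List size for column '" + columnName + "' must be positive, got " + size);
        }
        return size;
    }

    /** Describes which Arrow type an ARRAY column would map to under this sketch. */
    static String describeArrowType(String columnName, Map<String, String> tableProperties) {
        int size = fixedListSize(columnName, tableProperties);
        // Real code would return new ArrowType.FixedSizeList(size) or ArrowType.List.INSTANCE here.
        return size > 0 ? "FixedSizeList(" + size + ")" : "List";
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("embedding" + FIXED_SIZE_SUFFIX, "128");

        // "embedding" has a declared dimension; "tags" does not.
        System.out.println("embedding -> " + describeArrowType("embedding", props));
        System.out.println("tags      -> " + describeArrowType("tags", props));
    }
}
```

Keeping variable-length List as the default would leave existing ARRAY columns unchanged, so only columns explicitly marked as vectors would switch to the fixed-size layout.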

Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
