Skip to content

Fix null conversion#151

Merged
alxmrs merged 6 commits intoalxmrs:mainfrom
ahuang11:fix_null_conversion
Mar 27, 2026
Merged

Fix null conversion#151
alxmrs merged 6 commits intoalxmrs:mainfrom
ahuang11:fix_null_conversion

Conversation

@ahuang11
Copy link
Copy Markdown
Collaborator

I noticed that the values were all NaN. This fixes it. Used Claude Opus during planning and execution.

@ahuang11 ahuang11 mentioned this pull request Mar 19, 2026
Copy link
Copy Markdown
Owner

@alxmrs alxmrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests need some improvements. In general, thank you for making this fix!

Comment on lines +215 to +233
def test_max_ignores_nan(self, nan_ds):
ctx = XarrayContext()
ctx.from_dataset("data", nan_ds)
result = ctx.sql("SELECT MAX(temp) AS mx FROM data").to_pandas()
assert result["mx"].iloc[0] == 8.0

def test_min_ignores_nan(self, nan_ds):
ctx = XarrayContext()
ctx.from_dataset("data", nan_ds)
result = ctx.sql("SELECT MIN(temp) AS mn FROM data").to_pandas()
assert result["mn"].iloc[0] == 1.0

def test_avg_ignores_nan(self, nan_ds):
import numpy as np
ctx = XarrayContext()
ctx.from_dataset("data", nan_ds)
result = ctx.sql("SELECT AVG(temp) AS avg FROM data").to_pandas()
expected = np.nanmean([1.0, 2.0, 4.0, 5.0, 7.0, 8.0])
assert abs(result["avg"].iloc[0] - expected) < 1e-6
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these could be melded into one test -- or better yet, only one of these should imply the others.

@ahuang11
Copy link
Copy Markdown
Collaborator Author

Made changes!

I'm not sure I agree with this interface, as it breaks the conventions in datafusion IUUC.

Playing devil's advocate, I'm used to having pandas df and xarray ds be returned, but this can be discussed more in #148

ahuang11 and others added 5 commits March 27, 2026 12:00
@ahuang11 ahuang11 force-pushed the fix_null_conversion branch from 715b610 to e78cc76 Compare March 27, 2026 19:02
@ahuang11 ahuang11 requested a review from alxmrs March 27, 2026 19:09
Copy link
Copy Markdown
Owner

@alxmrs alxmrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank for this critical fix!!

@alxmrs alxmrs merged commit 5176113 into alxmrs:main Mar 27, 2026
12 checks passed
alxmrs added a commit that referenced this pull request Mar 30, 2026
Closes #144

After attempting this, I realized it did not feel like a good first
issue~ Note I used Claude Opus to plan and write this.

Two tiers based on calendar type:
1. Gregorian-like (standard, noleap, all_leap) → pa.timestamp('us').
String SQL filters work naturally (WHERE time > '1980-01-01').
Microsecond resolution covers ±292k years, avoiding the nanosecond
1678–2262 overflow.
2. Non-Gregorian (360_day, julian) → pa.int64() with
xarray:units/xarray:calendar metadata on the Arrow field. Lossless — a
360-day "Feb 30th" isn't forced into Gregorian. Auto-registered cftime()
UDF provides ergonomic filtering: WHERE time > cftime('2000-07-01').

Now this runs:
```
import xarray as xr
from metpy.cbook import get_test_data
from xarray_sql import XarrayContext

ds = xr.tutorial.open_dataset("rasm")
context = XarrayContext().from_dataset("rasm", ds, chunks={"time": 2})
context.sql("SELECT * FROM rasm LIMIT 1000")
```

<img width="230" height="279" alt="image"
src="https://github.com/user-attachments/assets/5f7ff224-d4b1-429f-a3b3-51e017475c21"
/>

However as you can see, there are a lot of NaNs; to be fixed in a
separate PR in #151

---------

Co-authored-by: Alex Merose <al@merose.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants