Skip to content

fix: maxDistance cast to string#698

Merged
dluc merged 4 commits intomicrosoft:mainfrom
koteus:fix/minSimilarity-string-cast
Jul 16, 2024
Merged

fix: maxDistance cast to string#698
dluc merged 4 commits intomicrosoft:mainfrom
koteus:fix/minSimilarity-string-cast

Conversation

@koteus
Copy link
Contributor

@koteus koteus commented Jul 12, 2024

Motivation and Context (Why the change? What's the scenario?)

During string interpolation if dotnet culture is set to French it used comma as the decimal separator instead of period, which cased SQL error "unexpected ,"

High level description (Approach, Design)

Forcing culture-insensitive cast using maxDistance.ToString(CultureInfo.InvariantCulture)

during string interpolation if dotnet culture is set to French it used comma as the decimal separator instead of period, which cased SQL error "unexpected ,"
@koteus koteus requested a review from dluc as a code owner July 12, 2024 18:55
@dluc
Copy link
Collaborator

dluc commented Jul 12, 2024

Interesting, also considering that one could run Postgres with a French locale, so there's no straightforward fix. This issue probably affects other parts of the code

What about changing the .NET locale instead of changing the code? is it possible?

@koteus
Copy link
Contributor Author

koteus commented Jul 13, 2024

In my current project it is possible to change locale of .NET, but I can imagine that it won't be possible for many other teams/projects.

Changing the locale in PostgreSQL to French will not change the way it handles decimal delimiters, it follows the SQL standard that always uses "." periods. It only affects parsing and formatting, there are some functions like to_char and to_number that take into account the locale setting lc_numeric:

SELECT to_char(1234.5678, '9G999D99');

here G is thousands separator and D is decimal separator, they are locale-aware, so for en-US the result will be 1,234.57 and for fr-FR it will be 1.234,57, the same way we can parse string to numbers:

SELECT TO_NUMBER('123,456.78', '999G999D99');

but internal representation of numbers always uses periods.

So, we may use to_number in this case:

filterSql += $" AND {this._colEmbedding} <=> @embedding < to_number('{maxDistance}', '9D99')";

but that would work only if the database and the .NET app both use the same locale, and in my case .NET is 'fr-FR' and database is 'en-US' and it gives error

ERROR:  numeric field overflow
DETAIL:  A field with precision 1, scale 0 must round to an absolute value less than 10^1.

because in my case if let's say maxDistance is 0.82 the SQL would be to_number('0,82', '9D99') and my database would not be able to parse it with 'en-US' setting.

That's why I think the solution here is to explicitly convert number to string maxDistance.ToString(CultureInfo.InvariantCulture) this way the final SQL will have 0.82 always with the period delimiter.

@dluc
Copy link
Collaborator

dluc commented Jul 15, 2024

what about using a parameter instead?

- filterSql += $" AND {this._colEmbedding} <=> @embedding < {maxDistance.ToString(CultureInfo.InvariantCulture)}";
+ filterSql += $" AND {this._colEmbedding} <=> @embedding < @maxDistance";
  cmd.Parameters.AddWithValue("@embedding", target);
+ cmd.Parameters.AddWithValue("@maxDistance", maxDistance);

@koteus koteus force-pushed the fix/minSimilarity-string-cast branch from 5ac1d50 to 68e0a9f Compare July 16, 2024 07:31
@koteus
Copy link
Contributor Author

koteus commented Jul 16, 2024

what about using a parameter instead?

- filterSql += $" AND {this._colEmbedding} <=> @embedding < {maxDistance.ToString(CultureInfo.InvariantCulture)}";
+ filterSql += $" AND {this._colEmbedding} <=> @embedding < @maxDistance";
  cmd.Parameters.AddWithValue("@embedding", target);
+ cmd.Parameters.AddWithValue("@maxDistance", maxDistance);

Yes this way it's much cleaner (and it works as expected) ✅

@dluc
Copy link
Collaborator

dluc commented Jul 16, 2024

For the record, this fixes a regression introduced in #684

@dluc dluc merged commit a01b58f into microsoft:main Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments