Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-1466] Support for UNSIGNED types of TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT in type system #275

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

strongduanmu
Copy link
Member

@strongduanmu strongduanmu changed the title [CALCITE-1466] Support for UNSIGNED types of int,long,float etc. in type system [CALCITE-1466] Support for UNSIGNED types of TINYINT, SMALLINT, MEDIUMINT, INT BIGINT in type system Feb 2, 2025
@strongduanmu strongduanmu changed the title [CALCITE-1466] Support for UNSIGNED types of TINYINT, SMALLINT, MEDIUMINT, INT BIGINT in type system [CALCITE-1466] Support for UNSIGNED types of TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT in type system Feb 2, 2025
Copy link
Contributor

@mihaibudiu mihaibudiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look fine, but it's hard to tell whether all code paths that may be relevant have been handled until we have some end-to-end tests.

@strongduanmu
Copy link
Member Author

The changes look fine, but it's hard to tell whether all code paths that may be relevant have been handled until we have some end-to-end tests.

Yes, I agree with you, end-to-end testing is very important. Currently, I have tested different unsigned types with calcite avatica 1.26.0-SNAPSHOT and they all work fine. When this PR is merged, I will try to support unsigned types in calcite and add more end-to-end test cases.

@mihaibudiu
Copy link
Contributor

I am wondering whether it will pay off to actually build a set of classes UnsignedInt, UnsignedShort, etc, which would encapsulate these abstractions in a clean way. I am not sure that this is the right place for them - BTW. But I think you will need them eventually in the runtime of Calcite to properly implement the semantics of unsigned arithmetic.

@mihaibudiu
Copy link
Contributor

I notice that Guava has some of these.

@strongduanmu
Copy link
Member Author

@mihaibudiu Thank you for your remind, I will try to implement this feature using a class like Guava UnsignedInt.

@strongduanmu
Copy link
Member Author

Hi @mihaibudiu, I spent some time looking at Guava. Currently, it only provides UnsignedInteger and UnsignedLong. Other UnsignedByte and UnsignedShort are not provided by Guava.

In addition, I found that MySQL provides another idea, we can handle them by mapping a wider range of Java types. For example, the BIGINT type is usually mapped to java.lang.Long for processing, while BIGINT UNSIGNED needs to be mapped to java.math.BigInteger for processing. For more type mappings, please refer to the document: https://dev.mysql.com/doc/connector-j/en/connector-j-reference-type-conversions.html

Through this mapping conversion method, we do not need to add any classes, and the Calcite runtime can handle UNSIGNED types. What do you think of this approach? I will keep trying if you have any suggestions.

Copy link
Contributor

@ILuffZhe ILuffZhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

@mihaibudiu
Copy link
Contributor

From my experience, in the long term it pays off to implement the right abstractions.
I expect there will be a lot of work to be done using unsigned types, including in the type checker, expression simplification, and runtime system. In particular, at runtime the enumerable implementation looks at the runtime types to make some decisions (maybe that's a bug in the implementation). So if the runtime will see a Long it won't know whether it's a real Long or an UnsignedInt in disguise.

@strongduanmu
Copy link
Member Author

From my experience, in the long term it pays off to implement the right abstractions. I expect there will be a lot of work to be done using unsigned types, including in the type checker, expression simplification, and runtime system. In particular, at runtime the enumerable implementation looks at the runtime types to make some decisions (maybe that's a bug in the implementation). So if the runtime will see a Long it won't know whether it's a real Long or an UnsignedInt in disguise.

Hi @mihaibudiu, thank you for sharing these valuable experiences. These are issues I had not considered before. Since I am still a novice, I will spend some time to understand type checking, expression simplification, etc., and then try to add new UnsignedInt, UnsignedLong and other types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants