Skip to content

Certain operations lack consideration for data types #166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
DrXiao opened this issue Nov 12, 2024 · 6 comments
Open

Certain operations lack consideration for data types #166

DrXiao opened this issue Nov 12, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@DrXiao
Copy link
Collaborator

DrXiao commented Nov 12, 2024

[ Edit: 2025/01/21 19:15 ]
Recently, I used the following tricky code to test shecc:

/* test.c */
int main() {
    char a = 0x11, b = 0x22, c = 0x33, d = 0x44;
    char *aa = &a, *bb = &b, *cc = &c, *dd = &d;
    int *aaa = &a;
    char arr[4];
    arr[0] = 0x11;
    arr[1] = 0x22;
    arr[2] = 0x33;
    arr[3] = 0x44;

    /* 1 */
    printf("== 1 ==\n");
    printf("%d %d %d %d\n", a, b, c, d);
    printf("%x %x %x %x\n", aa, bb, cc, dd);
    printf("%d %d %d\n", bb - aa, cc - bb, dd - cc);
    printf("%x %x\n\n", *aaa, aaa[0]);

    /* 2 */
    aaa = arr;

    printf("== 2 ==\n");
    for (int i = 0; i < 4; i++)
	    printf("arr[%d] = %x\n", i, arr[i]);

    printf("aaa = %x, arr = %x\n", aaa, arr);
    printf("*aaa = %x, aaa[0] = %x\n\n", *aaa aaa[0]);
    
    /* 3 */
    printf("== 3 ==\n");
    a += 6000;
    b += 400;
    c -= 400;
    d -= 6000;
    aaa = &a;
    printf("a = %d, b = %d, c = %d, d = %d\n", a, b, c, d);

    return 0;
}

When the code is compiled using GCC with no optimization, all outputs match the expected results:

$ gcc -o test test.c
$ ./test
== 1 ==
17 34 51 68
1759c720 1759c721 1759c722 1759c723
1 1 1
44332211 44332211

== 2 ==
arr[0] = 11
arr[1] = 22
arr[2] = 33
arr[3] = 44
aaa = 1759c754, arr = 1759c754
*aaa = 44332211, aaa[0] = 44332211

== 3 ==
a = -127, b = -78, c = -93, d = -44

However, if using shecc to compile the code, some outputs seem to be wrong:

$ qemu-arm out/shecc-stage2.elf -o test test.c
$ qemu-arm test
== 1 ==
17 34 51 68
407ffd8c 407ffd90 407ffd94 407ffd98
4 4 4
11 11

== 2 ==
arr[0] = 11
arr[1] = 22
arr[2] = 33
arr[3] = 44
aaa = 407ffda8, arr = 407ffda8
*aaa = 44332211, aaa[0] = 44332211

== 3 ==
a = 6017, b = 434, c = -349, d = -5932

By the code and the comment in test.c, it is obvious that the code is divided to 3 snippets. Next, I use these snippets to explain their behaviors and the bug what I found.


snippet 1

After introducing the pull request (#171), every stack allocation increment had been ensured to be properly aligned. Due to stack alignment requirements, each single character variable occupies 4 bytes, despite a character's actual size being 1 byte.

Thus, the following results are correct.

  1. the pointer arithmetic results like bb - aa, cc - bb and dd - cc are 4.
  2. Because aaa points to the address of a, the result of the dereference through aaa is equaivalent to the value of a.

snippet 2

- printf("*aaa = %x, aaa[0] = %x\n\n", *arr, aaa[0]);    /* 2024/12/15 22:56 */
+ printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);    /* 2025/01/21 19:15 */

(Previously, I made a typo so that I misunderstood pointer dereferencing also had an error. After fixing it, the result is correct.)


snippet 3 - (bug)

Because the variables (a ~ d) are signed characters, their results should be overflowed after executing additions and subtractions.

The correct results should be a = -127, b = -78, c = -93 and d = -44, but the results of the executable compiled by shecc produces error outputs.

Conclusion

According to the result of the snippet 3, the current backends may lack consideration of the data type when generating load/store instructions.

For example, the Arm backend may generates a ldr instruction for loading a character (1 byte), regardless of the data type. But, For character variables, it must generate ldrsb instructions instead.

Since these problems also occur in the RISC-V backend, both the Arm and RISC-V backends require improvements to generate data-type-specific instructions, such as ldrsb for signed characters.

@DrXiao DrXiao added the bug Something isn't working label Nov 12, 2024
@DrXiao
Copy link
Collaborator Author

DrXiao commented Dec 15, 2024

Due to the introduction of some recent pull requests, I fetched the latest changes, refactored the test code, and re-described this issue.

(I will gradually improve the description about this issue in the future.)

@DrXiao DrXiao changed the title Error behaviors for the load/store operations on a character variable Error behaviors of load/store operations and pointer dereferencing Dec 15, 2024
@ChAoSUnItY
Copy link
Collaborator

I'm currently testing snippet 2, but is the last printing statement supposed to be printing *arr? Because I found that your formatting string is actually asserting *aaa instead.

@DrXiao
Copy link
Collaborator Author

DrXiao commented Jan 20, 2025

printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);

Sorry, it should test *aaa in the snippet 2.

Due to personal reason, I will modify the code and description tomorrow.

@ChAoSUnItY
Copy link
Collaborator

After testing snippet 3, I found that it's quite hard to handle data type conversion more conventionally, due to lack of data size information, but also because of peephole optimization, we'll need to also resize it in arithmetic operations instead of just in OP_load and OP_store (I'm not sure if OP_assign is needed to be carefully handled too since it's meant to move value between registers). Maybe there's a concise way to handle this? Correct me if I'm wrong.

@DrXiao
Copy link
Collaborator Author

DrXiao commented Jan 21, 2025

printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);

Sorry, it should test *aaa in the snippet 2.

Due to personal reason, I will modify the code and description tomorrow.

After fixing the typo and testing the code again, the result shows that the behavior of pointer dereferencing is correct.

The code and the description have been updated. I think the main title of this issue should be Certain operations lack consideration for data types.

@DrXiao DrXiao changed the title Error behaviors of load/store operations and pointer dereferencing Certain operations lack consideration for data types Jan 21, 2025
@ChAoSUnItY
Copy link
Collaborator

I've attempted to fix this on local side, however, I've also noticed that this involves lots of things to be fixed in order to make things work correctly as gcc does:

  1. Currently, OP_write and OP_read correctly handle char and int types, so only OP_assign is required to changed. We may want to consider to introduce some kind of additional opcode like OP_cast to handle it in an additional step (not sure which option affects SSA optimization more)? I'm currently using OP_assign and using and opcode with intermediate to implement this behavior.
  2. Changing OP_assign's logic isn't enough, as constant folding optimization in SSA optimization pass doesn't also respect data type, we'll have to also consider var_t's actual data size.
  3. By implementing above 2 fixes, this will produce correct byte value, but outputs incorrect value in printf because of nature of variadic function (printf to be specific), every arguments are considered as int and byte should be promoted to int by sign extension. Thus, we should also consider function arguments' type and add OP_sign_ext to fix this.

Aside from above issue, currently modifying code to get var_t's type info is quite some pain and lack of efficiency, since type_t is not cached in var_t but lookup with type_name (which is linear time). I'm thinking of storing index of type in TYPES to make access to type easier.

In addition to point 3, there's no information for a function pointer, which makes compiler unable to determine argument type and sign extends argument if need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants