Skip to content

list.c: improve code comments to point to official documentation about problems which may cause code to get stuck inside of list.c #1051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 13, 2024

Conversation

ElectricRCAircraftGuy
Copy link
Contributor

@ElectricRCAircraftGuy ElectricRCAircraftGuy commented May 8, 2024

Description

Improve comments to help people when debugging. This would have saved me weeks of time. An uninitialized binary semaphore (ie: one which has not been given yet) will result in getting stuck inside of vListInsert(). You must give it before taking it.

Test Steps

Create a semaphore, start the scheduler, try to take it. You'll get stuck in a debugger inside the for loop in VListInsert().

Now, right after creating it, but before starting the scheduler, give it. Start the scheduler. Now, when it is taken it works fine.

Checklist:

  • I have tested my changes. No regression in existing tests.
  • I have modified and/or added unit-tests to cover the code changes in this Pull Request.

Can someone please run necessary tests on your end to ensure this is behavior you see too?

Related Issue

NA

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ElectricRCAircraftGuy ElectricRCAircraftGuy requested a review from a team as a code owner May 8, 2024 22:31
list.c Outdated
Comment on lines 187 to 195
* - This includes initializing binary semaphores before taking them. If
* you create one with `xSemaphoreCreateBinary()` or
* `xSemaphoreCreateBinaryStatic()`, you must call `xSemaphoreGive()`
* before calling `xSemaphoreTake(). See:
* https://freertos.org/xSemaphoreCreateBinaryStatic.html:
* > The semaphore is created in the 'empty' state, meaning the
* > semaphore must first be given using the xSemaphoreGive() API
* > function before it can subsequently be taken (obtained) using the
* > xSemaphoreTake() function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to expand this comment so much. Maybe point 4 could be changed to...

        *   4) Using a queue or semaphore before it has been initialised or
        *      before the scheduler has been started (are interrupts firing
        *      before vTaskStartScheduler() has been called?). See
        *      respective object documentation for more.

Copy link
Contributor Author

@ElectricRCAircraftGuy ElectricRCAircraftGuy May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the only part you recommend adding then is this:

See respective object documentation for more.

That doesn't add anything at all in my opinion. I've already been reading the respective object documentation for years, and actively reading it recently for months.


The scenario is this:

I'm in a several million line codebase.
I did a 1.2 million line (400+ files changed) merge, where there were dozens of files with significant conflicts and in which it took me days to resolve them.

I have no idea why the code no longer runs, but I keep getting stuck in this for loop now and don't know why. What "respective object documentation" should I read in such a situation!? We are using half of the FreeRTOS objects in the kernel and I don't know why my code won't run after a 1.2 million line merge, and which of these objects is now being misused.

By adding my two comments here, when debugging, I'd see this in my debugger, where the green line indicates where the debugger is stuck:

image

Except in the yellow area I'd then see this:

IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.

...and then I'd read above. And when I read above, I'd then see this:

    *     - This includes initializing binary semaphores before taking them. If
    *       you create one with `xSemaphoreCreateBinary()` or
    *       `xSemaphoreCreateBinaryStatic()`, you must call `xSemaphoreGive()`
    *       before calling `xSemaphoreTake(). See:
    *       https://freertos.org/xSemaphoreCreateBinaryStatic.html:
    *       > The semaphore is created in the 'empty' state, meaning the
    *       > semaphore must first be given using the xSemaphoreGive() API
    *       > function before it can subsequently be taken (obtained) using the
    *       > xSemaphoreTake() function.

and then I'd go search my merge commit for changes to semaphores, see we converted a mutex to a binary semaphore, and add the xSemaphoreGive() call after creating it (which calls got botched and removed in the merge). Boom done. Would have saved me 2 weeks of non-stop debugging.

list.c Outdated
Comment on lines 204 to 206
* insertion position.
* IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this. It's 10 lines from the comment block above explaining common causes.

Copy link
Contributor Author

@ElectricRCAircraftGuy ElectricRCAircraftGuy May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my logic:

Here's the view from the MPLAB X IDE while debugging and halting on this for loop, on which the code is stuck. My eyes naturally focus on the contents of the for loop, in yellow:

image

The purpose of this line:

IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.

Is to tell me to look at the notes above, or else I won't, because I'm looking inside the for loop in which the debugger is stuck, instead.

@kstribrnAmzn
Copy link
Member

Hello @ElectricRCAircraftGuy! Thanks for submitting the PR. I like the idea of improving the documentation but I believe this case is well covered by both our website documentation and in the semaphore header. I'd like to leave the documentation there to keep a single authoritative source. Updating the list source code comment to point the authoritative documentation will ensure comments stay up to date with the behavior.

Please see my suggestions about how to rework the comment.

Copy link
Contributor Author

@ElectricRCAircraftGuy ElectricRCAircraftGuy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at where I write "The scenario is this" for my use-case and why I think this would be a useful addition.

list.c Outdated
Comment on lines 204 to 206
* insertion position.
* IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.
*/
Copy link
Contributor Author

@ElectricRCAircraftGuy ElectricRCAircraftGuy May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my logic:

Here's the view from the MPLAB X IDE while debugging and halting on this for loop, on which the code is stuck. My eyes naturally focus on the contents of the for loop, in yellow:

image

The purpose of this line:

IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.

Is to tell me to look at the notes above, or else I won't, because I'm looking inside the for loop in which the debugger is stuck, instead.

list.c Outdated
Comment on lines 187 to 195
* - This includes initializing binary semaphores before taking them. If
* you create one with `xSemaphoreCreateBinary()` or
* `xSemaphoreCreateBinaryStatic()`, you must call `xSemaphoreGive()`
* before calling `xSemaphoreTake(). See:
* https://freertos.org/xSemaphoreCreateBinaryStatic.html:
* > The semaphore is created in the 'empty' state, meaning the
* > semaphore must first be given using the xSemaphoreGive() API
* > function before it can subsequently be taken (obtained) using the
* > xSemaphoreTake() function.
Copy link
Contributor Author

@ElectricRCAircraftGuy ElectricRCAircraftGuy May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the only part you recommend adding then is this:

See respective object documentation for more.

That doesn't add anything at all in my opinion. I've already been reading the respective object documentation for years, and actively reading it recently for months.


The scenario is this:

I'm in a several million line codebase.
I did a 1.2 million line (400+ files changed) merge, where there were dozens of files with significant conflicts and in which it took me days to resolve them.

I have no idea why the code no longer runs, but I keep getting stuck in this for loop now and don't know why. What "respective object documentation" should I read in such a situation!? We are using half of the FreeRTOS objects in the kernel and I don't know why my code won't run after a 1.2 million line merge, and which of these objects is now being misused.

By adding my two comments here, when debugging, I'd see this in my debugger, where the green line indicates where the debugger is stuck:

image

Except in the yellow area I'd then see this:

IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.

...and then I'd read above. And when I read above, I'd then see this:

    *     - This includes initializing binary semaphores before taking them. If
    *       you create one with `xSemaphoreCreateBinary()` or
    *       `xSemaphoreCreateBinaryStatic()`, you must call `xSemaphoreGive()`
    *       before calling `xSemaphoreTake(). See:
    *       https://freertos.org/xSemaphoreCreateBinaryStatic.html:
    *       > The semaphore is created in the 'empty' state, meaning the
    *       > semaphore must first be given using the xSemaphoreGive() API
    *       > function before it can subsequently be taken (obtained) using the
    *       > xSemaphoreTake() function.

and then I'd go search my merge commit for changes to semaphores, see we converted a mutex to a binary semaphore, and add the xSemaphoreGive() call after creating it (which calls got botched and removed in the merge). Boom done. Would have saved me 2 weeks of non-stop debugging.

@ElectricRCAircraftGuy ElectricRCAircraftGuy changed the title list.c: improve documentation about initializing binary semaphores list.c: improve code comments to point to formal documentation about initializing binary semaphores and other problems which could cause code to get stuck inside of lists.c May 9, 2024
@ElectricRCAircraftGuy ElectricRCAircraftGuy changed the title list.c: improve code comments to point to formal documentation about initializing binary semaphores and other problems which could cause code to get stuck inside of lists.c list.c: improve code comments to point to formal documentation about initializing binary semaphores and other problems which could cause code to get stuck inside of list.c May 9, 2024
@ElectricRCAircraftGuy ElectricRCAircraftGuy changed the title list.c: improve code comments to point to formal documentation about initializing binary semaphores and other problems which could cause code to get stuck inside of list.c list.c: improve code comments to point to official documentation about problems which may cause code to get stuck inside of list.c May 9, 2024
Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
Copy link

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@aggarg aggarg merged commit 29b202a into FreeRTOS:main May 13, 2024
16 checks passed
@ElectricRCAircraftGuy
Copy link
Contributor Author

ElectricRCAircraftGuy commented May 13, 2024

Thanks @kar-rahul-aws and @aggarg for incorporating the intent of my changes! I appreciate it. Looks good to me.

@ElectricRCAircraftGuy ElectricRCAircraftGuy deleted the gstaples_update_list.c branch May 13, 2024 15:05
Comment on lines +187 to +193
* 5) Attempting to 'take' binary semaphores created using
* `xSemaphoreCreateBinary()` or `xSemaphoreCreateBinaryStatic()`
* APIs, before 'giving' them. Binary semaphores created using
* `xSemaphoreCreateBinary()` or `xSemaphoreCreateBinaryStatic()`,
* are created in a state such that the semaphore must first be
* 'given' using xSemaphoreGive() API before it can be 'taken' using
* xSemaphoreTake() API.
Copy link
Contributor

@jefftenney jefftenney May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new statement seems misleading. Taking a binary semaphore that has not yet been given results in a yield, not getting stuck in this loop. However, taking a binary semaphore before it has been created does end up this way. Item 4 above covers that case already.

There's nothing wrong with attempting to take a binary semaphore that has not yet been given -- in fact that is the most common use of a binary semaphore -- so maybe we're better off without this new statement in the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are exactly right @jefftenney. I verified that with the following code on a Cortex-M7:

void Task( void * param )
{
    SemaphoreHandle_t xBinarySem;

    ( void ) param;

    xBinarySem = xSemaphoreCreateBinary();

    xSemaphoreTake( xBinarySem, portMAX_DELAY );

    for( ;; )
    {
        HAL_GPIO_TogglePin( LD1_GPIO_Port, LD1_Pin );
        vTaskDelay( pdMS_TO_TICKS( 1000 ) );
    }
}
void app_main( void )
{
    xTaskCreate( Task,
                 "Task",
                 configMINIMAL_STACK_SIZE,
                 NULL,
                 tskIDLE_PRIORITY | portPRIVILEGE_BIT,
                 NULL );

    vTaskStartScheduler();

    for( ;; );
}

I'll revert it in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in this PR - #1056. Thanks again @jefftenney !

Copy link
Contributor Author

@ElectricRCAircraftGuy ElectricRCAircraftGuy May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jefftenney and @aggarg , thank you both for taking a look at this. Is there an explanation for the behavior I was seeing, then? I have a TaskSetup() function inside main() which gets called before we call vTaskStartScheduler(); at the end of main(). TaskSetup() contains, among other things, this:

xI2C1mutex = xSemaphoreCreateBinaryStatic(&xI2C1mutexBuffer);	// I2C1 bus
xI2C2mutex = xSemaphoreCreateBinaryStatic(&xI2C2mutexBuffer);	// I2C2 bus
xI2C4mutex = xSemaphoreCreateBinaryStatic(&xI2C4mutexBuffer);	// I2C4 bus

Like that, my debugger gets stuck forever inside of vListInsert() in list.c.

If I change it to this, however, it works fine:

xI2C1mutex = xSemaphoreCreateBinaryStatic(&xI2C1mutexBuffer);	// I2C1 bus
xI2C2mutex = xSemaphoreCreateBinaryStatic(&xI2C2mutexBuffer);	// I2C2 bus
xI2C4mutex = xSemaphoreCreateBinaryStatic(&xI2C4mutexBuffer);	// I2C4 bus

xSemaphoreGive(xI2C1mutex);
xSemaphoreGive(xI2C2mutex);
xSemaphoreGive(xI2C4mutex);

Furthermore, I think bullet 4 still needs something extra. It says:

*   4) Using a queue or semaphore before it has been initialised or
*      before the scheduler has been started (are interrupts firing
*      before vTaskStartScheduler() has been called?).

So, does this include regular mutexes, recursive mutexes, and binary semaphores? Or, just binary semaphores? I mean, a mutex is a type of semaphore.

Also, what does it mean to "be initialised"?

@jefftenney , your wording above, "taking a binary semaphore before it has been created", makes more sense than using the word "initialised", unless initialized means something different, such as "created and taken", as seems to be implied by the official documentation:

The semaphore is created in the 'empty' [I read this as "uninitialized"] state, meaning the semaphore must first be given using the xSemaphoreGive() API function before it can subsequently be taken (obtained) using the xSemaphoreTake() function.

So, something in this documentation needs to be improved and or clarified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ElectricRCAircraftGuy The term "create" and "initialize" are equivalent here. One possible improvement to the official docs would be to use "create" exclusively, because that term matches the API function names.

The term "empty" does not refer to the created/uncreated state. Instead it describes one of the two states of a semaphore after it has been created. It is either empty or full. Some people say taken or given. Some people say unsignaled or signaled. Application usage of the semaphore changes it back and forth between these two states.

Another improvement could be to add clarity as you suggested. Something like this (probably needs work):

The semaphore is created in the 'empty' state, meaning the semaphore must first be given using the xSemaphoreGive() API function before xSemaphoreTake() can successfully take the semaphore.

All of this still leaves you with your mystery. Do any of the 5 items listed in the code comment apply? Something might be corrupting kernel data structures and your startup calls to xSemaphoreGive() might be hiding the issue.

aggarg added a commit to aggarg/FreeRTOS-Kernel that referenced this pull request May 14, 2024
As pointed out by Jeff Tenney, the comment introduced in the PR is not
accurate.

Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
aggarg added a commit that referenced this pull request May 14, 2024
As pointed out by Jeff Tenney, the comment introduced in the PR is not
accurate.

Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
schilkp pushed a commit to schilkp/FreeRTOS-Kernel that referenced this pull request May 17, 2024
…t problems which may cause code to get stuck inside of list.c (FreeRTOS#1051)

list.c: improve documentation about initializing binary semaphores
schilkp pushed a commit to schilkp/FreeRTOS-Kernel that referenced this pull request May 17, 2024
As pointed out by Jeff Tenney, the comment introduced in the PR is not
accurate.

Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants