-
Notifications
You must be signed in to change notification settings - Fork 1.3k
list.c: improve code comments to point to official documentation about problems which may cause code to get stuck inside of list.c #1051
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
list.c: improve code comments to point to official documentation about problems which may cause code to get stuck inside of list.c #1051
Conversation
list.c
Outdated
* - This includes initializing binary semaphores before taking them. If | ||
* you create one with `xSemaphoreCreateBinary()` or | ||
* `xSemaphoreCreateBinaryStatic()`, you must call `xSemaphoreGive()` | ||
* before calling `xSemaphoreTake(). See: | ||
* https://freertos.org/xSemaphoreCreateBinaryStatic.html: | ||
* > The semaphore is created in the 'empty' state, meaning the | ||
* > semaphore must first be given using the xSemaphoreGive() API | ||
* > function before it can subsequently be taken (obtained) using the | ||
* > xSemaphoreTake() function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to expand this comment so much. Maybe point 4 could be changed to...
* 4) Using a queue or semaphore before it has been initialised or
* before the scheduler has been started (are interrupts firing
* before vTaskStartScheduler() has been called?). See
* respective object documentation for more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the only part you recommend adding then is this:
See respective object documentation for more.
That doesn't add anything at all in my opinion. I've already been reading the respective object documentation for years, and actively reading it recently for months.
The scenario is this:
I'm in a several million line codebase.
I did a 1.2 million line (400+ files changed) merge, where there were dozens of files with significant conflicts and in which it took me days to resolve them.
I have no idea why the code no longer runs, but I keep getting stuck in this for loop now and don't know why. What "respective object documentation" should I read in such a situation!? We are using half of the FreeRTOS objects in the kernel and I don't know why my code won't run after a 1.2 million line merge, and which of these objects is now being misused.
By adding my two comments here, when debugging, I'd see this in my debugger, where the green line indicates where the debugger is stuck:
Except in the yellow area I'd then see this:
IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.
...and then I'd read above. And when I read above, I'd then see this:
* - This includes initializing binary semaphores before taking them. If * you create one with `xSemaphoreCreateBinary()` or * `xSemaphoreCreateBinaryStatic()`, you must call `xSemaphoreGive()` * before calling `xSemaphoreTake(). See: * https://freertos.org/xSemaphoreCreateBinaryStatic.html: * > The semaphore is created in the 'empty' state, meaning the * > semaphore must first be given using the xSemaphoreGive() API * > function before it can subsequently be taken (obtained) using the * > xSemaphoreTake() function.
and then I'd go search my merge commit for changes to semaphores, see we converted a mutex to a binary semaphore, and add the xSemaphoreGive()
call after creating it (which calls got botched and removed in the merge). Boom done. Would have saved me 2 weeks of non-stop debugging.
list.c
Outdated
* insertion position. | ||
* IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this. It's 10 lines from the comment block above explaining common causes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my logic:
Here's the view from the MPLAB X IDE while debugging and halting on this for loop, on which the code is stuck. My eyes naturally focus on the contents of the for loop, in yellow:
The purpose of this line:
IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.
Is to tell me to look at the notes above, or else I won't, because I'm looking inside the for loop in which the debugger is stuck, instead.
Hello @ElectricRCAircraftGuy! Thanks for submitting the PR. I like the idea of improving the documentation but I believe this case is well covered by both our website documentation and in the semaphore header. I'd like to leave the documentation there to keep a single authoritative source. Updating the list source code comment to point the authoritative documentation will ensure comments stay up to date with the behavior. Please see my suggestions about how to rework the comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at where I write "The scenario is this" for my use-case and why I think this would be a useful addition.
list.c
Outdated
* insertion position. | ||
* IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my logic:
Here's the view from the MPLAB X IDE while debugging and halting on this for loop, on which the code is stuck. My eyes naturally focus on the contents of the for loop, in yellow:
The purpose of this line:
IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.
Is to tell me to look at the notes above, or else I won't, because I'm looking inside the for loop in which the debugger is stuck, instead.
list.c
Outdated
* - This includes initializing binary semaphores before taking them. If | ||
* you create one with `xSemaphoreCreateBinary()` or | ||
* `xSemaphoreCreateBinaryStatic()`, you must call `xSemaphoreGive()` | ||
* before calling `xSemaphoreTake(). See: | ||
* https://freertos.org/xSemaphoreCreateBinaryStatic.html: | ||
* > The semaphore is created in the 'empty' state, meaning the | ||
* > semaphore must first be given using the xSemaphoreGive() API | ||
* > function before it can subsequently be taken (obtained) using the | ||
* > xSemaphoreTake() function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the only part you recommend adding then is this:
See respective object documentation for more.
That doesn't add anything at all in my opinion. I've already been reading the respective object documentation for years, and actively reading it recently for months.
The scenario is this:
I'm in a several million line codebase.
I did a 1.2 million line (400+ files changed) merge, where there were dozens of files with significant conflicts and in which it took me days to resolve them.
I have no idea why the code no longer runs, but I keep getting stuck in this for loop now and don't know why. What "respective object documentation" should I read in such a situation!? We are using half of the FreeRTOS objects in the kernel and I don't know why my code won't run after a 1.2 million line merge, and which of these objects is now being misused.
By adding my two comments here, when debugging, I'd see this in my debugger, where the green line indicates where the debugger is stuck:
Except in the yellow area I'd then see this:
IF YOU FIND YOUR CODE STUCK HERE, SEE THE NOTE JUST ABOVE.
...and then I'd read above. And when I read above, I'd then see this:
* - This includes initializing binary semaphores before taking them. If * you create one with `xSemaphoreCreateBinary()` or * `xSemaphoreCreateBinaryStatic()`, you must call `xSemaphoreGive()` * before calling `xSemaphoreTake(). See: * https://freertos.org/xSemaphoreCreateBinaryStatic.html: * > The semaphore is created in the 'empty' state, meaning the * > semaphore must first be given using the xSemaphoreGive() API * > function before it can subsequently be taken (obtained) using the * > xSemaphoreTake() function.
and then I'd go search my merge commit for changes to semaphores, see we converted a mutex to a binary semaphore, and add the xSemaphoreGive()
call after creating it (which calls got botched and removed in the merge). Boom done. Would have saved me 2 weeks of non-stop debugging.
Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
|
Thanks @kar-rahul-aws and @aggarg for incorporating the intent of my changes! I appreciate it. Looks good to me. |
* 5) Attempting to 'take' binary semaphores created using | ||
* `xSemaphoreCreateBinary()` or `xSemaphoreCreateBinaryStatic()` | ||
* APIs, before 'giving' them. Binary semaphores created using | ||
* `xSemaphoreCreateBinary()` or `xSemaphoreCreateBinaryStatic()`, | ||
* are created in a state such that the semaphore must first be | ||
* 'given' using xSemaphoreGive() API before it can be 'taken' using | ||
* xSemaphoreTake() API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new statement seems misleading. Taking a binary semaphore that has not yet been given results in a yield, not getting stuck in this loop. However, taking a binary semaphore before it has been created does end up this way. Item 4 above covers that case already.
There's nothing wrong with attempting to take a binary semaphore that has not yet been given -- in fact that is the most common use of a binary semaphore -- so maybe we're better off without this new statement in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are exactly right @jefftenney. I verified that with the following code on a Cortex-M7:
void Task( void * param )
{
SemaphoreHandle_t xBinarySem;
( void ) param;
xBinarySem = xSemaphoreCreateBinary();
xSemaphoreTake( xBinarySem, portMAX_DELAY );
for( ;; )
{
HAL_GPIO_TogglePin( LD1_GPIO_Port, LD1_Pin );
vTaskDelay( pdMS_TO_TICKS( 1000 ) );
}
}
void app_main( void )
{
xTaskCreate( Task,
"Task",
configMINIMAL_STACK_SIZE,
NULL,
tskIDLE_PRIORITY | portPRIVILEGE_BIT,
NULL );
vTaskStartScheduler();
for( ;; );
}
I'll revert it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in this PR - #1056. Thanks again @jefftenney !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jefftenney and @aggarg , thank you both for taking a look at this. Is there an explanation for the behavior I was seeing, then? I have a TaskSetup()
function inside main()
which gets called before we call vTaskStartScheduler();
at the end of main()
. TaskSetup()
contains, among other things, this:
xI2C1mutex = xSemaphoreCreateBinaryStatic(&xI2C1mutexBuffer); // I2C1 bus
xI2C2mutex = xSemaphoreCreateBinaryStatic(&xI2C2mutexBuffer); // I2C2 bus
xI2C4mutex = xSemaphoreCreateBinaryStatic(&xI2C4mutexBuffer); // I2C4 bus
Like that, my debugger gets stuck forever inside of vListInsert()
in list.c
.
If I change it to this, however, it works fine:
xI2C1mutex = xSemaphoreCreateBinaryStatic(&xI2C1mutexBuffer); // I2C1 bus
xI2C2mutex = xSemaphoreCreateBinaryStatic(&xI2C2mutexBuffer); // I2C2 bus
xI2C4mutex = xSemaphoreCreateBinaryStatic(&xI2C4mutexBuffer); // I2C4 bus
xSemaphoreGive(xI2C1mutex);
xSemaphoreGive(xI2C2mutex);
xSemaphoreGive(xI2C4mutex);
Furthermore, I think bullet 4 still needs something extra. It says:
* 4) Using a queue or semaphore before it has been initialised or
* before the scheduler has been started (are interrupts firing
* before vTaskStartScheduler() has been called?).
So, does this include regular mutexes, recursive mutexes, and binary semaphores? Or, just binary semaphores? I mean, a mutex is a type of semaphore.
Also, what does it mean to "be initialised"?
@jefftenney , your wording above, "taking a binary semaphore before it has been created", makes more sense than using the word "initialised", unless initialized means something different, such as "created and taken", as seems to be implied by the official documentation:
The semaphore is created in the 'empty' [I read this as "uninitialized"] state, meaning the semaphore must first be given using the xSemaphoreGive() API function before it can subsequently be taken (obtained) using the xSemaphoreTake() function.
So, something in this documentation needs to be improved and or clarified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ElectricRCAircraftGuy The term "create" and "initialize" are equivalent here. One possible improvement to the official docs would be to use "create" exclusively, because that term matches the API function names.
The term "empty" does not refer to the created/uncreated state. Instead it describes one of the two states of a semaphore after it has been created. It is either empty or full. Some people say taken or given. Some people say unsignaled or signaled. Application usage of the semaphore changes it back and forth between these two states.
Another improvement could be to add clarity as you suggested. Something like this (probably needs work):
The semaphore is created in the 'empty' state, meaning the semaphore must first be given using the xSemaphoreGive() API function before xSemaphoreTake() can successfully take the semaphore.
All of this still leaves you with your mystery. Do any of the 5 items listed in the code comment apply? Something might be corrupting kernel data structures and your startup calls to xSemaphoreGive() might be hiding the issue.
As pointed out by Jeff Tenney, the comment introduced in the PR is not accurate. Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
…t problems which may cause code to get stuck inside of list.c (FreeRTOS#1051) list.c: improve documentation about initializing binary semaphores
As pointed out by Jeff Tenney, the comment introduced in the PR is not accurate. Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
Description
Improve comments to help people when debugging. This would have saved me weeks of time. An uninitialized binary semaphore (ie: one which has not been given yet) will result in getting stuck inside of
vListInsert()
. You must give it before taking it.Test Steps
Create a semaphore, start the scheduler, try to take it. You'll get stuck in a debugger inside the for loop in
VListInsert()
.Now, right after creating it, but before starting the scheduler, give it. Start the scheduler. Now, when it is taken it works fine.
Checklist:
Can someone please run necessary tests on your end to ensure this is behavior you see too?
Related Issue
NA
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.