Save emulation state with MCU in binary file crashes due to MMIO callbacks #1553

Open
antcpl opened this issue Mar 25, 2025 · 6 comments

@antcpl

antcpl commented Mar 25, 2025

Describe the bug
I am emulating a Cortex-M3 MCU (on the dev branch). When I try to save a snapshot to a binary file with save(), it crashes because of the MMIO callbacks.

Sample Code

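from qiling import Qiling
from qiling.const import QL_ARCH, QL_OS, QL_VERBOSE
from qiling.extensions.mcu.stm32f1 import stm32f103

def save_callback(ql, *args, **kw):
    # Snapshot registers, memory and hardware state to a file, then stop the emulation.
    dic = ql.save(reg=True, mem=True, hw=True, snapshot="./snapshot.bin")
    ql.stop()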

ql = Qiling(["./toto.elf"],
            archtype=QL_ARCH.CORTEX_M, ostype=QL_OS.MCU, env=stm32f103, verbose=QL_VERBOSE.DISABLED)

ql.hw.create('scb')
ql.hw.create('gpioa')
ql.hw.create('usart2').watch()
ql.hw.create('rcc')
ql.hw.create('afio')
ql.hw.create('exti')
ql.hw.create('gpioc')

ql.hook_address(save_callback,0x8008680)

ql.hw.usart2.send("totototototo".encode())

ql.run(count=1000000)

del ql

Expected behavior
Perform the snapshot without any problem.

Screenshots
Here is the traceback:

Traceback (most recent call last):
  File "qiling/examples/mcu/fuzzing_test/version8/stm32_bo_hook_before_crash.py", line 47, in <module>
    ql.run(count=1000000)
  File "qilingenv/lib/python3.12/site-packages/qiling/core.py", line 588, in run
    self.os.run()
  File "qilingenv/lib/python3.12/site-packages/qiling/os/mcu/mcu.py", line 80, in run
    self.ql.emu_start(current_address, 0, count=1)
  File "qilingenv/lib/python3.12/site-packages/qiling/core.py", line 775, in emu_start
    raise self.internal_exception
  File "qilingenv/lib/python3.12/site-packages/qiling/core_hooks.py", line 141, in wrapper
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "qilingenv/lib/python3.12/site-packages/qiling/core_hooks.py", line 286, in _hook_addr_cb
    ret = hook.call(ql)
          ^^^^^^^^^^^^^
  File "qilingenv/lib/python3.12/site-packages/qiling/core_hooks_types.py", line 25, in call
    return self.callback(ql, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "qiling/examples/mcu/fuzzing_test/version8/stm32_bo_hook_before_crash.py", line 24, in save_callback
    dic = ql.save(reg=True,mem=True,hw=True,cpu_context=True,snapshot="./snapshot.bin")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "qilingenv/lib/python3.12/site-packages/qiling/core.py", line 655, in save
    pickle.dump(saved_states, save_state)
AttributeError: Can't pickle local object 'QlHwManager.setup_mmio.<locals>.mmio_read_cb'

Additional context
I dug into the code to understand what happens here. The problem only occurs when mem=True is passed to save(); without it, the snapshot succeeds.
I identified the following behaviour:

  • in core.py:
    the save() function calls the save method of the QlMemoryManager class when mem=True is passed
  • in memory.py, the save function has this line for MMIO regions:
mem_dict['mmio'].append((lbound, ubound, perm, label, *self.mmio_cbs[(lbound, ubound)]))

self.mmio_cbs refers to the two MMIO callbacks that are defined in the setup_mmio function in hw.py. Thus these two callbacks belong to the QlHwManager instance.

However, these callbacks are defined as local functions inside setup_mmio. Pickle cannot serialize local functions, which is why the bug occurs.
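
A minimal sketch of the failure described above, using only the standard library. The function name make_callbacks is hypothetical and only mimics the pattern of defining callbacks inside another function; it is not Qiling code.

```python
import pickle


def make_callbacks():
    """Mimic the reported pattern: callbacks defined inside a function."""
    def mmio_read_cb(offset, size):
        return 0

    def mmio_write_cb(offset, size, value):
        pass

    return mmio_read_cb, mmio_write_cb


def can_pickle(obj) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        # The exact exception type depends on the Python version.
        return False
    return True


read_cb, write_cb = make_callbacks()

# The nested function's qualified name contains "<locals>", so pickle
# cannot look it up by name when serializing.
local_ok = can_pickle(read_cb)

# A function reachable by its qualified name (here a built-in) pickles fine.
builtin_ok = can_pickle(len)

print(read_cb.__qualname__, local_ok, builtin_ok)
```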

Suggested correction
Honestly, it is quite hard to decide how to fix this. Here are some of my thoughts:
Either use the dill module, which can serialize local functions.
Or change the scope in which these functions are defined.
Or do not save the MMIO callbacks, and expect users to redefine them in the restoring environment. (I am not sure about this suggestion; I cannot tell whether it would work.)

@antcpl
Author

antcpl commented Mar 25, 2025

May I also ask what the goal of saving the MMIO callbacks is? In my case I cannot find a reason for doing it, as they are the same two callbacks (read and write) for all MMIO regions.
Wouldn't it be easier to save only the memory content and expect users to make the same "hardware" setup in the restoring environment?
Thanks in advance for the answer.

@elicn
Member

elicn commented Mar 26, 2025

Hi,
Thanks for analyzing and reporting this.

The reason for saving and restoring the MMIO methods is to allow enough flexibility to define your own pseudo device. I am not familiar with the MCU use case, but this is required for other architectures such as Intel.

I've already thought about redesigning this: instead of specifying read / write methods, use a simple class that defines its own read / write methods in a similar fashion. That would make it easier to maintain a state machine for that pseudo device (and it is also more Pythonic).

Do you think that using a class with read / write methods would solve this problem?
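
A rough sketch of what such a class could look like, using only the standard library. The class name CounterDevice and the read / write signatures are assumptions made for illustration; they are not Qiling's actual interface. The point is only that an instance of a class defined at module level pickles together with its state.

```python
import pickle


class CounterDevice:
    """Hypothetical pseudo device: a read returns how many writes it has seen."""

    def __init__(self):
        self.writes = 0
        self.last_value = 0

    def read(self, offset: int, size: int) -> int:
        # Assumed signature; the device ignores offset and size here.
        return self.writes

    def write(self, offset: int, size: int, value: int) -> None:
        # Keep a tiny bit of state, as a stand-in for a state machine.
        self.writes += 1
        self.last_value = value


dev = CounterDevice()
dev.write(0x00, 4, 0xAB)
dev.write(0x04, 4, 0xCD)

# The instance is pickled by class name plus its attribute dictionary,
# so the device state survives the round trip.
restored = pickle.loads(pickle.dumps(dev))

print(restored.read(0x00, 4), hex(restored.last_value))
```

Unpickling still requires the class to be importable under the same module and name in the restoring script.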

@antcpl
Author

antcpl commented Mar 26, 2025

Hi, thanks for your answer.

OK, I think what you mean by "allow enough flexibility to define your own pseudo device" is what I had in mind. I will only speak about the MCU case because I am not familiar with the rest.

I think the best way to answer your question is to explain my particular case and then try to generalize.
In the MCU case, when you use an extension device such as stm32f1, the MMIO callbacks are "standard" and the provided implementation works perfectly fine. This is what happens when you call this line in your script:

ql.hw.create("usart2")

In the backend, a QlMemoryManager object is created and the two "standard" callbacks are associated with the MMIO region defined for the USART.

Yesterday, when I wrote my messages, I assumed that you save the MMIO callbacks so that people only have to call this line in the restoring script:

ql.restore(snapshot="snapshot.bin")

without redoing all the "hardware setup stuff" (i.e. ql.hw.create("usart2")), as this information would be included in the snapshot.

Actually, when you use the extension MCUs as in my case, saving the callbacks is useless because they are standard. To validate my assumption, I disabled the MMIO callback part in the save and restore functions, made a snapshot into a bin file, restored the state from it in another script, and ran again. It worked perfectly fine, because my restoring script performed the same hardware setup as my saving script (i.e. the same hw.create() calls).
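
The experiment above can be sketched with the standard library alone. TinyMemory and setup_hardware are hypothetical stand-ins, not Qiling's QlMemoryManager or ql.hw.create(), and the address range is only illustrative. The saved state contains plain data, and the restoring side is expected to have registered its callbacks already.

```python
import pickle


class TinyMemory:
    """Hypothetical stand-in for a memory manager (not Qiling's class)."""

    def __init__(self):
        self.mmio_regions = {}  # (lbound, ubound) -> label
        self.mmio_cbs = {}      # (lbound, ubound) -> (read_cb, write_cb)

    def map_mmio(self, lbound, ubound, label, read_cb, write_cb):
        self.mmio_regions[(lbound, ubound)] = label
        self.mmio_cbs[(lbound, ubound)] = (read_cb, write_cb)

    def save(self):
        # Only plain data goes into the snapshot; callbacks are left out.
        return {"mmio": [(lb, ub, label)
                         for (lb, ub), label in self.mmio_regions.items()]}

    def restore(self, state):
        # The restoring script must already have done the same hardware setup.
        for lbound, ubound, label in state["mmio"]:
            if (lbound, ubound) not in self.mmio_cbs:
                raise RuntimeError(f"MMIO region {label!r} is not set up")
            self.mmio_regions[(lbound, ubound)] = label


def setup_hardware(mem):
    """Stand-in for the hw.create() calls; uses local callbacks on purpose."""
    def mmio_read_cb(offset, size):
        return 0

    def mmio_write_cb(offset, size, value):
        pass

    mem.map_mmio(0x40004400, 0x40004800, "usart2", mmio_read_cb, mmio_write_cb)


# Saving script: local callbacks exist, but they never reach pickle.
saver = TinyMemory()
setup_hardware(saver)
blob = pickle.dumps(saver.save())

# Restoring script: same hardware setup first, then restore the plain state.
restorer = TinyMemory()
setup_hardware(restorer)
restorer.restore(pickle.loads(blob))
restored_labels = sorted(restorer.mmio_regions.values())
print(restored_labels)
```

The trade-off is that restoring into an environment without the matching setup has to be detected explicitly, which the sketch does by raising an error.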

Thus, with my quite narrow understanding of Qiling (I have never used it for anything other than emulating MCUs), saving the callbacks is just an entry point for potential errors. If I were to define my own pseudo device in the "MCU environment", I would replicate exactly the setup you made for stm32fx, for example, and the conclusion would still be the same.

However, as there are many use cases that differ a lot from mine, if you want to keep this behaviour, I would say that yes, a class with read/write methods could solve the problem in this particular case. The root cause is just the local scope of these two functions, which the pickle module does not handle.

I hope this was clear and that my answer helps!

@elicn
Member

elicn commented Mar 26, 2025

Got it. What I meant is we can modify the design in a way that it stays functionally equal to the current state, but instead of having two functions which cannot be pickled, we use a class. I understand that in the MCU case it is not very important and this setup happens under the hood anyways, but it will be still beneficial for other archs. My question was essentially whether this change (a class instead of two functions) will change anything regarding the pickling process.

@antcpl
Author

antcpl commented Mar 27, 2025

Understood. If we focus only on the pickling process, here is what I found on the documentation page: "Note that functions (built-in and user-defined) are pickled by fully qualified name, not by value. This means that only the function name is pickled, along with the name of the containing module and classes. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised."

So yes, I think the implementation with a class and two functions would solve the pickling problem. Just be careful with the function definitions: they must be defined using def and not lambda. And of course they must not be local, but that is obvious.
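
A small check of both points, using only the standard library. json.dumps is used here simply as an example of an importable function written in Python; any function reachable by its qualified name would behave the same way.

```python
import json
import pickle

# An importable function is pickled as a reference (module name + function
# name), so the payload contains those names rather than the function's code.
payload = pickle.dumps(json.dumps)
name_only = b"json" in payload and b"dumps" in payload

# Unpickling resolves the name again and returns the very same object.
same_object = pickle.loads(payload) is json.dumps

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda offset, size: 0)
    lambda_ok = True
except (AttributeError, pickle.PicklingError):
    # The exact exception type depends on the Python version.
    lambda_ok = False

print(name_only, same_object, lambda_ok)
```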

@elicn
Member

elicn commented Mar 27, 2025

Thanks for the pointers. I'll see if I can come up with something soon.
