Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are raw pointers and unsafe actually needed? #68

Open
AxFaure opened this issue Jan 7, 2019 · 4 comments
Open

Are raw pointers and unsafe actually needed? #68

AxFaure opened this issue Jan 7, 2019 · 4 comments
Labels

Comments

@AxFaure
Copy link

AxFaure commented Jan 7, 2019

Hello,

This is probably more of an open question than an actual issue.

There seems to be quite a lot of unsafe code in this crate (mostly pointer handling) and there are some issues that mention memory unsafety. So I was wondering: Is this unsafe code actually needed?

Let me know if I am wrong but I guess the use of pointers was preferred over Rc/Weak references to get better performances. But, on the other hand, this crate offers a DOM interface which is expected to be a bit slower and heavier than SAX, and we usually expect memory safety from rust crates.
So, do you think replacing the unsafe code would have so much of an impact on performance that it is worth the potential safety/security issues of dealing with complex pointers relationships?

I am in no way an expert in Rust (or XML parsing) by the way so do not hesitate to let me know if I am totally wrong.

As a completely unrelated side note, you seem to have worked quite a lot on this crate (thank you for sharing it with us!), so I was wondering if you consider the DOM part of the project to be production ready or if you think it still needs more work?

Thanks

@AxFaure AxFaure changed the title Are raw pointers and unsafe actually needed Are raw pointers and unsafe actually needed? Jan 7, 2019
@shepmaster
Copy link
Owner

Let me know if I am wrong but I guess the use of pointers was preferred over Rc/Weak references to get better performances.

That is correct. One of the stated goals is to beat out libxml2. Everything I know about Rust says that this should be possible, or at least get within a very close margin.

To that end, I periodically compare the performance of sxd-document / sxd-xpath against similar libxml2 implementations. I have yet to get as close as I'd like.

slower and heavier than SAX

It's true (and there's a SAX implementation inside of the code, just not exposed as a public API) for this reason. Theoretically, parsing the XML itself should require a constant amount of heap allocation and hopefully be speedy.

replacing the unsafe code would have so much of an impact on performance that it is worth the potential safety/security issues

I think if you look back far enough in the history, the DOM did originally use Rc and IIRC there was a benefit to changing it.

That being said, I'm not against experimenting with alternate ways of implementing the "DOM". There's a secret API I've named the "thin" API, for example. I'm almost certain that the thindom is hopelessly broken, but it's an attempt at a Rust-first thinking about the problem of modeling XML documents. It wouldn't surprise me if there were other, better ones out there.

If someone were motivated to give it a shot, I'd love to help guide them in experimenting with some different implementations.

@shepmaster
Copy link
Owner

consider the DOM part of the project to be production ready

yes

or if you think it still needs more work?

yes 😄

I don't have much of a reason to use SXD in my own day-to-day work, which means that certain pieces are likely lacking. You can see that a recent release finally added the ability to remove children, mostly because I was generally building up documents, not modifying existing ones. It never even occurred to me that these were missing! (Rather embarrassing). Of course, we are happy to get issue reports and PRs improving things!

@AxFaure
Copy link
Author

AxFaure commented Jan 9, 2019

Thank you for the answers. That's very clear.

I will try to look for an alternative solution for thindom and/or potential performance improvement when I have some time.

By the way, I assume you have defined some kind of benchmark you use to compare this crate and libxml2. Maybe you could share it and put something about it in the Readme. It may help bring some attention and help on this issue (for a newcomer, it is not obvious that there is a specific goal which is not reached yet in this area)

@shepmaster
Copy link
Owner

Maybe you could share it and put something about it in the Readme

Not a bad idea, but it involves a 100MB XML file, which isn't really the kind of thing you should put into git. I'll have to figure something out for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants