SumType becomes less memory efficient with non-isbits fields #65

Closed
Tortar opened this issue Jan 21, 2024 · 9 comments

@Tortar
Contributor

Tortar commented Jan 21, 2024

I noticed this while developing https://github.com/JuliaDynamics/MixedStructTypes.jl. For example:

using DynamicSumTypes

@sum_structs :on_types A{X,Y} begin
    @kwdef mutable struct B{X}
        a::Tuple{X, X} = (1,1)
        b::Tuple{Float64, Float64} = (1.0, 1.0)
        const c::Symbol = :s
    end
    @kwdef mutable struct C
        a::Tuple{Int, Int} = (1,1)
        const c::Symbol = :q
        d::Int32 = Int32(2)
        e::Bool = false
    end
    @kwdef struct D{Y}
        a::Tuple{Int, Int} = (1,1)
        c::Symbol = :s
        f::Y = 2
        g::Tuple{Complex, Complex} = (im, im)
    end
end


vec_a = A[rand((B,C,D))() for _ in 1:10^5]

now

Base.summarysize(vec_a) #5053868 (wow!)

let's use instead

using DynamicSumTypes

@sum_structs :on_types A{X,Y} begin
    @kwdef mutable struct B{X}
        a::Tuple{X, X} = (1,1)
        b::Tuple{Float64, Float64} = (1.0, 1.0)
        const c::Symbol = :s
    end
    @kwdef mutable struct C
        a::Tuple{Int, Int} = (1,1)
        const c::Symbol = :q
        d::Int32 = Int32(2)
        e::Bool = false
    end
    @kwdef struct D{Y}
        a::Tuple{Int, Int} = (1,1)
        c::Symbol = :s
        f::Y = "s"  # non-isbits
        g::Tuple{Complex, Complex} = (im, im)
    end
end


vec_a = A[rand((B,C,D))() for _ in 1:10^5]

now

Base.summarysize(vec_a) #8863933

Is there a way to improve this? Or is this expected? Notice that in the second case the memory occupied is very similar to what you would get by simply merging all the fields into a single struct.

@Tortar
Contributor Author

Tortar commented Jan 21, 2024

OK, actually changing struct D{Y} to mutable struct D{Y} gives better memory efficiency: in this case vec_a requires 6933221 bytes. Still, I'm a bit puzzled by these results.

(Anyway, in case it is relevant, what all of that expands to (in the first case here) is just

SumTypes.@sum_type A{X, Y} begin
        B{X}(ht::var"##B#225"{X})
        C(ht::var"##C#227")
        D{Y}(ht::var"##D#228"{Y})
    end

where the inner types are mutable/immutable structs with all fields defined as above.)

@Tortar changed the title from "SumType becomes memory inefficient when any variant contains a non-isbit type" to "SumType becomes less memory efficient when any variant contains a non-isbit type" on Jan 21, 2024
@MasonProtter
Owner

So one thing to notice is that your vec_a is abstractly typed (missing parameters). I suspect you actually wanted to do

vec_a = A{Int, Int}[rand((B,C,D))() for _ in 1:10^5];

which brings it down to 4860060 bytes (48 bytes per element). That seems pretty reasonable: the biggest variant you have is D, and D{Int} should have a footprint of 40 bytes, and then there needs to be a discriminator for the union (i.e. a flag for whether it stores a B, C, or D), which presumably gets bumped to 8 bytes for the purposes of padding.
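The 48-byte estimate above can be reproduced as a rough back-of-the-envelope calculation. This is only a sketch: `DPayload` and `round_up` are hypothetical names, `DPayload` is a hand-written stand-in for the fields of D{Int} rather than the type SumTypes actually generates, and the 1-byte tag padded to 8-byte alignment is an assumption about the layout, not something taken from the SumTypes source.

```julia
# Hypothetical stand-in for the fields of the D{Int} variant
# (NOT the type that SumTypes generates internally).
struct DPayload
    a::Tuple{Int, Int}          # stored inline
    c::Symbol                   # stored as a pointer
    f::Int                      # stored inline
    g::Tuple{Complex, Complex}  # abstract element types, so stored as a pointer
end

# Round `n` up to the next multiple of `align`.
round_up(n, align) = cld(n, align) * align

payload = sizeof(DPayload)
# Assumption: a 1-byte discriminator tag, padded to 8-byte alignment.
with_tag = round_up(payload + 1, 8)

println("payload bytes: ", payload)
println("payload plus padded tag: ", with_tag)
```

On a 64-bit machine the padded total should land in the same range as the per-element figure quoted above, but the real layout may differ between SumTypes and Julia versions.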

And with the non-isbits definition that stored a string, I get

julia> vec_a = A{Int, Int}[rand((B,C,D))() for _ in 1:10^5];

julia> Base.summarysize(vec_a)
4325972

julia> Base.summarysize(vec_a) / 10^5
46.94708

so that's also good.


Regarding the increase in footprint from the non-isbits "a", I'm not totally sure what's causing that, to be honest. My guess is that it is just more pointer indirection: there needs to be a pointer in D, and the actual string also has to live out there somewhere.
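The isbits distinction mentioned above can be checked directly in plain Julia, without SumTypes. The following is a minimal sketch using only Base functions; the vectors `ints` and `strs` are made-up example data and are not meant to reproduce the numbers in this issue.

```julia
# Int is a plain-bits type, String is not.
println(isbitstype(Int))
println(isbitstype(String))

# Made-up example data: same length, isbits vs non-isbits elements.
ints = fill(2, 10^5)
strs = [string("s", i) for i in 1:10^5]

# The Int vector stores its values inline, while the String vector stores
# pointers, and each distinct String is a separate heap object that
# Base.summarysize also counts.
println(Base.summarysize(ints))
println(Base.summarysize(strs))
```

Note that a literal such as "s" used as a default value may be a single shared object, in which case Base.summarysize would count it only once; the sketch uses distinct strings to make the indirection cost visible.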

@Tortar
Contributor Author

Tortar commented Jan 21, 2024

Thanks a lot (I always forget to add parameters :( )

So the last case, where memory increases, only happens when the vector is abstractly typed, right? Then I think this is not really that problematic, so this issue can be closed if you agree. In any case, I will change the title to reflect reality.
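The effect of forgetting the type parameters can be illustrated without SumTypes. This is a minimal sketch in which `Wrapper` is a hypothetical parametric struct standing in for A{X,Y}; the sizes it prints are not comparable to the numbers reported in this thread.

```julia
# Hypothetical parametric struct standing in for A{X,Y}.
struct Wrapper{X, Y}
    x::X
    y::Y
end

# Element type without parameters: the container is abstractly typed.
abstract_vec = Wrapper[Wrapper(i, i) for i in 1:10^5]
# Element type with parameters: the container is concretely typed.
concrete_vec = Wrapper{Int, Int}[Wrapper(i, i) for i in 1:10^5]

println(isconcretetype(eltype(abstract_vec)))
println(isconcretetype(eltype(concrete_vec)))

# With an abstract element type the elements cannot be stored inline,
# so the reported footprint is expected to be larger.
println(Base.summarysize(abstract_vec))
println(Base.summarysize(concrete_vec))
```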

@Tortar changed the title from "SumType becomes less memory efficient when any variant contains a non-isbit type" to "SumType becomes less memory efficient with non-isbits field in an abstract container" on Jan 21, 2024
@Tortar changed the title from "SumType becomes less memory efficient with non-isbits field in an abstract container" to "SumType becomes less memory efficient with non-isbits fields when stored in an abstract container" on Jan 21, 2024
@Tortar
Contributor Author

Tortar commented Jan 21, 2024

> And with the non-isbits definition that stored a string, I get

Hmm, I tried it now. I think you mean the one which stores an Int, right? Then I think I misunderstood you in my last comment above.


Yes, with a String it is less efficient, but strangely Base.summarysize also oscillates:

julia> vec_a = A{Int, String}[rand((B,C,D))() for _ in 1:10^5];

julia> Base.summarysize(vec_a)
6561005

julia> Base.summarysize(vec_a)
6204757

julia> Base.summarysize(vec_a)
6039637

julia> Base.summarysize(vec_a)
5999493

julia> Base.summarysize(vec_a)
6170045

julia> Base.summarysize(vec_a)
6450517
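A quick way to check whether Base.summarysize is stable for a given object is to call it several times and compare the results. This is a minimal sketch with made-up data: `data` is a hypothetical mixed-content vector, not the sum-type vector from this issue, so it may not reproduce the oscillation shown above.

```julia
# Hypothetical mixed-content vector; substitute the object under investigation.
data = Any[isodd(i) ? i : string(i) for i in 1:10^4]

# Measure the same, unmodified object several times.
sizes = [Base.summarysize(data) for _ in 1:6]

println(sizes)
println("distinct results: ", length(unique(sizes)))
```

If more than one distinct result is printed for an object that was not modified between calls, the measurement itself is unreliable on that Julia version.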

@Tortar changed the title from "SumType becomes less memory efficient with non-isbits fields when stored in an abstract container" to "SumType becomes less memory efficient with non-isbits fields" on Jan 21, 2024
@MasonProtter
Owner

MasonProtter commented Jan 21, 2024

> Yes, with a String it is less efficient, but also strangely Base.summarysize oscillates

Yikes, that is weird, I have no idea what could cause that.

> mmh I tried it now, I think that you mean the one which stores an Int right?

No, I meant the String one, I think?

@Tortar
Contributor Author

Tortar commented Jan 22, 2024

> No I meant the String one I think?

Okay, but then I think the memory usage in this case is closer to 6 million bytes than to 4.3 million. I'm not totally sure about the amount, because Base.summarysize is not totally sure either :D (in general it is never consistent on sum types, and in some cases I tested, some runs report almost 2x as much as others).

Do you think this is worth an issue in the Julia repo?

@Tortar
Contributor Author

Tortar commented Jan 26, 2024

Opened an issue about Base.summarysize: JuliaLang/julia#53061. It seems this also happens with simpler structs.

@Tortar
Contributor Author

Tortar commented Jun 14, 2024

Just an update, because the issue was fixed on Julia nightly. On nightly I get:

# first version in the main comment
julia> vec_a = A[rand((B,C,D))() for _ in 1:10^5];

julia> Base.summarysize(vec_a)
4526700

julia> vec_a = A{Int, Int}[rand((B,C,D))() for _ in 1:10^5];

julia> Base.summarysize(vec_a)
3739036

# second version in the main comment
julia> vec_a = A[rand((B,C,D))() for _ in 1:10^5];

julia> Base.summarysize(vec_a)
5866653

julia> vec_a = A{Int, String}[rand((B,C,D))() for _ in 1:10^5];

julia> Base.summarysize(vec_a)
5066685

It seems better now that the numbers are right.

@Tortar
Contributor Author

Tortar commented Jul 2, 2024

Actually, now that I have investigated it a bit more: in Julia 1.11, where the issue with Base.summarysize has been fixed, I very often see sum types outperform Union types by quite a bit. See https://github.com/JuliaDynamics/DynamicSumTypes.jl?tab=readme-ov-file#micro-benchmarks, where a sum type is 3 times more memory efficient than a Union! Looking again at the updated numbers I posted here before, they also seem fine.

@Tortar Tortar closed this as completed Jul 2, 2024