
Rhs assembly alone #40


Merged: 17 commits merged into release-0.2 on Oct 21, 2021

Conversation

amartinhuertas
Member

@amartinhuertas amartinhuertas commented Oct 16, 2021

Hi @fverdugo,

In this PR, I came up with an approach to implement the rhs assembly for SubAssembledRows(), which is the complicated case. Let us decide if this is the way to go, possible improvements, etc. The PR also solves the rhs assembly for the FullyAssembledRows() strategy, although I would say there is not much to discuss there; this strategy is trivial.

The main difficulty comes from the fact that SubAssembledRows() requires the lids of the ghost dofs which are touched during the assembly loop over owned cells. In the assemble_matrix_and_vector case, we are somehow cheating, because this information comes from the matrix assembly process, namely from the row identifiers array I. The PVectorBuilder/PVectorCounter/PVectorAllocation set of assemble interface implementations alone does not seem to serve the purpose of providing this information (see here).

What I did to overcome this limitation is the following (perhaps there are other, cleaner options within the current software design constraints; I am open to reconsideration):

  1. I implemented a symbolic assembly loop to track the ghost dofs touched from owned cells. This required me to overwrite the current implementation of the symbolic assembly loop in Gridap. See here. If we agree this is the way to go, this code should be moved to Gridap. A minimal sketch of the idea appears right after this list.
  2. The symbolic assembly loop is combined with a new set of types that implement the assemble interface abstractions: the ArrayCounterSubAssembledRows{T,A} type for the Gridap assembler, and counterparts of the PVectorBuilder/PVectorCounter/PVectorAllocation types for the GridapDistributed assembler.
  3. In order to leverage the data types implemented in 2., I had to intercept (override) the assemble_vector function in Gridap for the particular case in which we have the SubAssembledRows strategy. Besides, I cannot leverage the input DistributedAssembler for the symbolic stage, as it is by default initialized with the PVectorBuilder type, which does not serve the purpose. As a result, I have to create an ad-hoc DistributedAssembler for the symbolic stage. This is arguably the most controversial part of the PR. See here.
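
For illustration, a minimal, self-contained sketch of the idea behind such a symbolic loop (the data layout and the function name are hypothetical, not the actual Gridap/GridapDistributed code):

# Hypothetical layout: local dof ids 1:ndofs_owned are owned, the rest are ghost.
function touched_ghosts_from_owned_cells(cell_to_dofs,owned_cells,ndofs_owned,ndofs_local)
  touched = fill(false,ndofs_local)
  for cell in owned_cells
    for dof in cell_to_dofs[cell]
      if dof > ndofs_owned      # ghost dof reached from an owned cell
        touched[dof] = true
      end
    end
  end
  findall(touched)              # lids of the ghost dofs touched from owned cells
end

# Toy example: 2 owned cells, 4 owned dofs, 2 ghost dofs (lids 5 and 6)
cell_to_dofs = [[1,2,5],[3,4,6]]
touched_ghosts_from_owned_cells(cell_to_dofs,1:2,4,6)  # -> [5, 6]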

I am far from 100% sure, but I have the impression that having the MatrixBuilder, VectorBuilder, and AssemblyStrategy instances tied to the Assembler, and a unique Assembler in the program, is somehow jeopardizing the flexibility of the current software design. To be general, one might want to select different assembly strategies to assemble the vector in assemble_matrix_vector and assemble_vector. Does it make sense to have SparseMatrixAssembler and ArrayAssembler separately? Anyway, this is not the only issue in this particular scenario, as I needed to use different assembly strategies in the symbolic and numerical loops.

@codecov-commenter

codecov-commenter commented Oct 16, 2021

Codecov Report

Merging #40 (f9fca06) into release-0.2 (bfa427b) will not change coverage.
The diff coverage is 0.00%.


@@             Coverage Diff             @@
##           release-0.2     #40   +/-   ##
===========================================
  Coverage         0.00%   0.00%           
===========================================
  Files               33      33           
  Lines             3135    3205   +70     
===========================================
- Misses            3135    3205   +70     
Impacted Files     Coverage Δ
src/Algebra.jl     0.00% <0.00%> (ø)
src/FESpaces.jl    0.00% <0.00%> (ø)
src/Geometry.jl    0.00% <ø> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bfa427b...f9fca06.

@amartinhuertas
Member Author

amartinhuertas commented Oct 16, 2021

as I needed to use different assembly strategies in the symbolic and numerical loops.

Ok. I just realized that this is just an optimization. We could leverage the same assembly strategy for the symbolic and numeric loops, at the price of touching both owned and ghost dofs during the assembly loop over the owned cells. The need for different DistributedAssemblers persists.

@amartinhuertas
Member Author

Ok. I just realized that this is just an optimization. We could leverage the same assembly strategy for the symbolic and numeric loops, at the price of touching both owned and ghost dofs during the assembly loop over the owned cells. The need for different DistributedAssemblers persists.

Done in ce94cf5

@amartinhuertas
Member Author

perhaps there are other cleaner options within the current software design constraints

After speaking with @fverdugo ... it turns out there is a better solution. I will implement it soon.

My plan for the rhs assembly was to implement a PVectorAllocation with two fields (local vectors with the entries, and local vectors of booleans indicating the touched rows). The PVectorCounter would stay as it is now, without any change. I think we have to "touch" the rows on the allocation itself, not on the counter, and then everything fits.
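
A minimal sketch of what such a two-field allocation could look like (hypothetical names and simplified add_entry! semantics; the actual types in the PR may differ):

# Local allocation that stores the rhs entries and also tracks touched rows.
struct VectorAllocationTrackTouched{V<:AbstractVector}
  values::V               # local entries of the rhs
  touched::Vector{Bool}   # touched[i] == true if local row i was visited
end

function add_entry!(combine,a::VectorAllocationTrackTouched,v,i)
  a.touched[i] = true
  a.values[i] = combine(a.values[i],v)
  a
end

# Single-part usage example
alloc = VectorAllocationTrackTouched(zeros(3),fill(false,3))
add_entry!(+,alloc,1.0,2)
alloc.touched  # -> [false, true, false]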

f8e383d following fverdugo comments

Essentially, a different PVectorAllocation instance is used for SubAssembledRows versus FullyAssembledRows.
@amartinhuertas
Member Author

My plan for the rhs assembly was to implement a PVectorAllocation with two fields (local vectors with the entries, and local vectors of booleans indicating the touched rows). The PVectorCounter would stay as it is now, without any change. I think we have to "touch" the rows on the allocation itself, not on the counter, and then everything fits.

@fverdugo ... I already followed your advice. Now everything fits better. In particular, the following

As a result, I have to create an ad-hoc DistributedAssembler for the symbolic stage.

which was the most critical point, is no longer needed. We no longer need the symbolic assembly loop for dense vectors either.

HOWEVER, there is still room for improvement. The current implementation also keeps track of rows touched from owned cells even when the workflow is called from assemble_matrix_and_vector. As commented above, this is not needed in this particular case, as we can retrieve the info from the I vector (see the small illustration below).
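
As a small illustration of that point (hypothetical data, not code from the PR), the rows touched from owned cells could be recovered as the unique entries of the COO row index array of the matrix allocation:

I = [1,3,3,5,2,5]          # hypothetical row ids collected during matrix assembly
touched_rows = unique(I)   # -> [1, 3, 5, 2]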

A possible solution that comes to mind is to enrich the nz_allocation signature so that it gets a hint about which object's assembly it is participating in, either matrix_vector or vector. Another alternative is to have different assemblers for matrix, vector, and matrix_and_vector (much more work). Any other idea?

@fverdugo
Member

fverdugo commented Oct 18, 2021

A possible solution that comes to mind is to enrich the nz_allocation signature so that it gets a hint about which object's assembly it is participating in, either matrix_vector or vector. Another alternative is to have different assemblers for matrix, vector, and matrix_and_vector (much more work). Any other idea?

We can compute the matrix and vector allocation in the same function:

A_alloc, b_alloc = nz_allocation(A_counter,b_counter)

and by default

function nz_allocation(a,b)
  nz_allocation(a), nz_allocation(b)
end

One can overload this function and return a PVector directly instead of a PVectorAllocation in this case.
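
A self-contained sketch of this dispatch pattern (the counter types and stand-in allocations below are hypothetical, not the actual Gridap/GridapDistributed types):

struct MyMatrixCounter; n::Int; end
struct MyVectorCounter; n::Int; end

nz_allocation(a::MyMatrixCounter) = zeros(a.n,a.n)  # stand-in matrix allocation
nz_allocation(b::MyVectorCounter) = zeros(b.n)      # stand-in vector allocation

# Default fallback: allocate matrix and vector independently.
nz_allocation(a,b) = nz_allocation(a), nz_allocation(b)

# Specialization for this particular pair of counters: build the vector
# directly (think: a PVector) instead of an intermediate allocation object.
nz_allocation(a::MyMatrixCounter,b::MyVectorCounter) = (nz_allocation(a), zeros(b.n))

A_alloc, b_alloc = nz_allocation(MyMatrixCounter(4),MyVectorCounter(4))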

I have used a similar approach for create_from_nz (see branch https://github.com/gridap/Gridap.jl/tree/gridap_distributed/src)

You also need to call the 2-argument version of nz_allocation in the right places. E.g. for create_from_nz now we are doing this:

https://github.com/gridap/Gridap.jl/blob/14baee102e50491d181b10cfbb9d52ea395ebec5/src/FESpaces/SparseMatrixAssemblers.jl#L119

@fverdugo
Member

Feel free to introduce the 2-argument nz_allocation in Gridap#gridap_distributed

Member

@fverdugo fverdugo left a comment


See my comments above

@amartinhuertas
Member Author

Feel free to introduce the 2-argument nz_allocation in Gridap#gridap_distributed

Done in gridap/Gridap.jl#677

* 2-argument nz_allocation signature in Gridap.
* Systematic FESpacesTests with both Distributed Assembly Strategies.
@amartinhuertas
Member Author

FYI @fverdugo still working on this PR ... not ready to merge

* Manifest for PartitionedArrays pointing temporarily to a different branch
@amartinhuertas
Member Author

Hi @fverdugo,

we can compute the matrix and vector allocation in the same function

I have already solved this.

Besides, I have also enhanced the tests in FESpacesTests.jl. Along the way I have found a new issue with the assemble_XXX! functions, which I have added to #39 as a new priority task. We will need this for nonlinear problems. At present, the corresponding tests are broken.

Do we solve these issues here or in a different PR? (I have not investigated the source of the problem much yet.)

src/Algebra.jl Outdated
par_strategy::A
values::B
rows::C
struct PVectorAllocationTrackOnlyValues{A,B}
Member


I think this struct is not needed since PVector itself can play this role.

Member Author

@amartinhuertas amartinhuertas Oct 19, 2021


I tried the following code. It crashed, though (error message below). Not sure why; I do not have all the details in my mind. It seems that the assembly loop uses the local dof ids stored in the FE space, and that these produce an out-of-bounds error when used to address the PVector without ghost dofs. Before I start debugging to understand the problem in detail, any hint? Might this be related to the issues in the assemble_xxx! functions?

function Arrays.nz_allocation(a::PVectorCounter{<:FullyAssembledRows})
  dofs = a.rows
  values = map_parts(nz_allocation,a.counters)
  _create_pvector_wo_ghosts(values,dofs)
  #PVectorAllocationTrackOnlyValues(values,dofs)
end

function _create_pvector_wo_ghosts(values,dofs)
  # Create PRange for the rows of the linear system
  parts = get_part_ids(values)
  ngdofs = length(dofs)
  nodofs = map_parts(num_oids,dofs.partition)
  first_grdof = map_parts(first_gdof_from_ids,dofs.partition)

  # This one has no ghost rows
  rows = PRange(parts,ngdofs,nodofs,first_grdof)

  # The ghost values in b_fespace are aligned with the FESpace
  # but not with the ghost values in the rows of A
  b_fespace = PVector(values,dofs)

  # This one is aligned with the rows of A
  b = similar(b_fespace,eltype(b_fespace),(rows,))
end
julia> FESpacesTests.main(parts)
ERROR: BoundsError: attempt to access 2-element Array{Float64,1} at index [3]
Stacktrace:
 [1] getindex at ./array.jl:809 [inlined]
 [2] add_entry! at /home/amartin/.julia/packages/Gridap/70scN/src/Algebra/AlgebraInterfaces.jl:108 [inlined]
 [3] add_entry! at /home/amartin/.julia/packages/Gridap/70scN/src/Algebra/AlgebraInterfaces.jl:98 [inlined]
 [4] _add_entries! at /home/amartin/.julia/packages/Gridap/70scN/src/Algebra/AlgebraInterfaces.jl:185 [inlined]
 [5] add_entries! at /home/amartin/.julia/packages/Gridap/70scN/src/Algebra/AlgebraInterfaces.jl:169 [inlined]
 [6] evaluate!(::Nothing, ::Gridap.Arrays.AddEntriesMap{typeof(+)}, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Int64,1}) at /home/amartin/.julia/packages/Gridap/70scN/src/Arrays/AlgebraMaps.jl:51
 [7] _numeric_loop_matvec!(::Gridap.Algebra.AllocationCOO{SparseArrays.SparseMatrixCSC{Float64,Int64},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Array{Int64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Tuple{Nothing,Nothing,Tuple{Tuple{Nothing,Gridap.Arrays.CachedArray{Float64,1,Array{Float64,1}},Tuple{Nothing,Tuple{Tuple{Nothing,Gridap.Arrays.CachedArray{Float64,1,Array{Float64,1}},Tuple{Gridap.Arrays.CachedArray{Int32,1,Array{Int32,1}}}},Gridap.Arrays.IndexItemPair{Int64,Array{Float64,1}}},Nothing}},Gridap.Arrays.IndexItemPair{Int64,Tuple{Array{Float64,2},Array{Float64,1}}}},Tuple{Tuple{Nothing,Gridap.Arrays.CachedArray{Int64,1,Array{Int64,1}},Tuple{Gridap.Arrays.CachedArray{Int32,1,Array{Int32,1}}}},Gridap.Arrays.IndexItemPair{Int64,Array{Int64,1}}},Tuple{Tuple{Nothing,Gridap.Arrays.CachedArray{Int64,1,Array{Int64,1}},Tuple{Gridap.Arrays.CachedArray{Int32,1,Array{Int32,1}}}},Gridap.Arrays.IndexItemPair{Int64,Array{Int64,1}}}}, ::Gridap.Arrays.LazyArray{FillArrays.Fill{Gridap.CellData.AttachDirichletMap,1,Tuple{Base.OneTo{Int64}}},Tuple{Array{Float64,2},Array{Float64,1}},1,Tuple{FillArrays.Fill{Tuple{Array{Float64,2},Array{Float64,1}},1,Tuple{Base.OneTo{Int64}}},Gridap.Arrays.LazyArray{FillArrays.Fill{Gridap.Arrays.Broadcasting{Gridap.Arrays.PosNegReindex{Array{Float64,1},Array{Float64,1}}},1,Tuple{Base.OneTo{Int64}}},Array{Float64,1},1,Tuple{Gridap.Arrays.Table{Int32,Array{Int32,1},Array{Int32,1}}}},Array{Bool,1}}}, ::Gridap.Arrays.LazyArray{FillArrays.Fill{Gridap.FESpaces.AssemblyStrategyMap{:rows,Gridap.FESpaces.GenericAssemblyStrategy{typeof(identity),typeof(identity),GridapDistributed.var"#159#161"{Array{Int32,1}},GridapDistributed.var"#160#162"}},1,Tuple{Base.OneTo{Int64}}},Array{Int64,1},1,Tuple{Gridap.Arrays.Table{Int32,Array{Int32,1},Array{Int32,1}}}}, ::Gridap.Arrays.LazyArray{FillArrays.Fill{Gridap.FESpaces.AssemblyStrategyMap{:cols,Gridap.FESpaces.GenericAssemblyStrategy{typeof(identity),typeof(identity),GridapDistributed.var"#159#161"{Array{Int32,1}},GridapDistributed.var"#160#162"}},1,Tuple{Base.OneTo{Int64}}},Array{Int64,1},1,Tuple{Gridap.Arrays.Table{Int32,Array{Int32,1},Array{Int32,1}}}}) at /home/amartin/.julia/packages/Gridap/70scN/src/FESpaces/SparseMatrixAssemblers.jl:341
 [8] numeric_loop_matrix_and_vector!(::Gridap.Algebra.AllocationCOO{SparseArrays.SparseMatrixCSC{Float64,Int64},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Array{Int64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Gridap.FESpaces.GenericSparseMatrixAssembler, ::Tuple{Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1}}}) at /home/amartin/.julia/packages/Gridap/70scN/src/FESpaces/SparseMatrixAssemblers.jl:324
 [9] #3 at ./generator.jl:36 [inlined]
 [10] iterate at ./generator.jl:47 [inlined]
 [11] collect_to!(::Array{Tuple{Gridap.Algebra.AllocationCOO{SparseArrays.SparseMatrixCSC{Float64,Int64},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Array{Int64,1},Array{Float64,1}},Array{Float64,1}},2}, ::Base.Generator{Base.Iterators.Zip{Tuple{Array{Gridap.Algebra.AllocationCOO{SparseArrays.SparseMatrixCSC{Float64,Int64},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Array{Int64,1},Array{Float64,1}},2},Array{Array{Float64,1},2},Array{Gridap.FESpaces.GenericSparseMatrixAssembler,2},Array{Tuple{Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1}}},2}}},Base.var"#3#4"{typeof(Gridap.FESpaces.numeric_loop_matrix_and_vector!)}}, ::Int64, ::NTuple{4,Int64}) at ./array.jl:732
 [12] collect_to_with_first! at ./array.jl:710 [inlined]
 [13] collect(::Base.Generator{Base.Iterators.Zip{Tuple{Array{Gridap.Algebra.AllocationCOO{SparseArrays.SparseMatrixCSC{Float64,Int64},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Array{Int64,1},Array{Float64,1}},2},Array{Array{Float64,1},2},Array{Gridap.FESpaces.GenericSparseMatrixAssembler,2},Array{Tuple{Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1}}},2}}},Base.var"#3#4"{typeof(Gridap.FESpaces.numeric_loop_matrix_and_vector!)}}) at ./array.jl:691
 [14] map at ./abstractarray.jl:2248 [inlined]
 [15] map_parts at /home/amartin/.julia/packages/PartitionedArrays/evV43/src/SequentialBackend.jl:52 [inlined]
 [16] numeric_loop_matrix_and_vector!(::GridapDistributed.DistributedAllocationCOO{GridapDistributed.FullyAssembledRows,SequentialData{Gridap.Algebra.AllocationCOO{SparseArrays.SparseMatrixCSC{Float64,Int64},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},Array{Int64,1},Array{Float64,1}},2},PRange{SequentialData{IndexSet,2},Exchanger{SequentialData{Array{Int32,1},2},SequentialData{Table{Int32},2}},Nothing},PRange{SequentialData{IndexSet,2},Exchanger{SequentialData{Array{Int32,1},2},SequentialData{Table{Int32},2}},Nothing}}, ::PVector{Float64,SequentialData{Array{Float64,1},2},PRange{SequentialData{IndexRange,2},Exchanger{SequentialData{Array{Int32,1},2},SequentialData{Table{Int32},2}},Nothing}}, ::GridapDistributed.DistributedSparseMatrixAssembler{GridapDistributed.FullyAssembledRows,SequentialData{Gridap.FESpaces.GenericSparseMatrixAssembler,2},GridapDistributed.PSparseMatrixBuilderCOO{SparseArrays.SparseMatrixCSC{Float64,Int64},GridapDistributed.FullyAssembledRows},GridapDistributed.PVectorBuilder{Array{Float64,1},GridapDistributed.FullyAssembledRows},PRange{SequentialData{IndexSet,2},Exchanger{SequentialData{Array{Int32,1},2},SequentialData{Table{Int32},2}},Nothing},PRange{SequentialData{IndexSet,2},Exchanger{SequentialData{Array{Int32,1},2},SequentialData{Table{Int32},2}},Nothing}}, ::SequentialData{Tuple{Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1}}},2}) at /home/amartin/git-repos/GridapDistributed.jl/src/FESpaces.jl:407
 [17] assemble_matrix_and_vector(::GridapDistributed.DistributedSparseMatrixAssembler{GridapDistributed.FullyAssembledRows,SequentialData{Gridap.FESpaces.GenericSparseMatrixAssembler,2},GridapDistributed.PSparseMatrixBuilderCOO{SparseArrays.SparseMatrixCSC{Float64,Int64},GridapDistributed.FullyAssembledRows},GridapDistributed.PVectorBuilder{Array{Float64,1},GridapDistributed.FullyAssembledRows},PRange{SequentialData{IndexSet,2},Exchanger{SequentialData{Array{Int32,1},2},SequentialData{Table{Int32},2}},Nothing},PRange{SequentialData{IndexSet,2},Exchanger{SequentialData{Array{Int32,1},2},SequentialData{Table{Int32},2}},Nothing}}, ::SequentialData{Tuple{Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1},Array{Any,1}},Tuple{Array{Any,1},Array{Any,1}}},2}) at /home/amartin/.julia/packages/Gridap/70scN/src/FESpaces/SparseMatrixAssemblers.jl:137
 [18] test_fe_spaces(::SequentialData{Int64,2}, ::WithGhost, ::GridapDistributed.FullyAssembledRows) at /home/amartin/git-repos/GridapDistributed.jl/test/FESpacesTests.jl:53
 [19] main(::SequentialData{Int64,2}) at /home/amartin/git-repos/GridapDistributed.jl/test/FESpacesTests.jl:119
 [20] top-level scope at REPL[2]:1

Member Author

@amartinhuertas amartinhuertas Oct 20, 2021


I think this struct is not needed since PVector itself can play this role.

This is true; one could use PVector itself instead of PVectorAllocationTrackOnlyValues. However, in the final version I have decided to keep PVectorAllocationTrackOnlyValues. The reason is that I expect the latter to be faster, as we avoid the indirection via LocalView in PArrays. In any case, this only applies to the assemble_xxx and allocate_xxx functions. For assemble_xxx! we have to live with this indirection, I guess.
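
A rough, hypothetical illustration of the indirection being avoided (these are not the actual PartitionedArrays/GridapDistributed types):

# A wrapper akin to a local view: every write goes through an index map.
struct IndirectLocalView{V<:AbstractVector}
  values::V
  lid_to_pos::Vector{Int}   # extra index translation on every access
end
Base.setindex!(a::IndirectLocalView,v,i::Integer) = (a.values[a.lid_to_pos[i]] = v)

# Storing the plain local vector in the allocation lets the assembly loop
# write entries directly, with no extra lookup.
direct = zeros(4)
indirect = IndirectLocalView(zeros(4),collect(1:4))

direct[2] = 1.0     # one array store
indirect[2] = 1.0   # index translation + array store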

@fverdugo fverdugo self-requested a review October 19, 2021 05:55
Member

@fverdugo fverdugo left a comment


See extra comments

@fverdugo
Member

It seems that the assembly loop uses the local dof ids stored in the FE space, and that these produce an out-of-bounds error when used to address the PVector without ghost dofs. Before I start debugging to understand the problem in detail, any hint? Might this be related to the issues in the assemble_xxx! functions?

Yes, this is the problem and it's the cause of the issues in assemble_xxx! functions.

I already have a fix on my local machine and the assemble_xxx! functions are working.

@fverdugo
Member

Perhaps we can put this PR on stand-by until I get the nonlinear problem working (since I am fixing a lot of issues in this process).

@amartinhuertas
Member Author

Perhaps we can put this PR on stand-by until I get the nonlinear problem working (since I am fixing a lot of issues in this process).

OK. Let me know if you need help. I will continue tomorrow with other items on the list apart from the changes agreed on Gridap.

@amartinhuertas
Member Author

Perhaps we can put this PR on stand-by until I get the nonlinear problem working (since I am fixing a lot of issues in this process).

I have solved all the issues in this PR. Some caveats, however:

  • I had to get the symbolic_loop_vector! back from a previous commit. We need it for allocate_vector. If we agree this is needed, then we should move this code to Gridap.
  • We are not actually testing assemble_matrix and assemble_matrix! in FESpacesTests.jl. We are just going through all the code and checking that it does not break.

@amartinhuertas
Member Author

I had to get the symbolic_loop_vector! back from a previous commit. We need it for allocate_vector. If we agree this is needed, then we should move this code to Gridap.

Done in gridap/Gridap.jl#681

@amartinhuertas
Member Author

We are not actually testing assemble_matrix and assemble_matrix! in FESpacesTests.jl. We are just going through all the code and checking that it does not break.

Solved in 8317378. @fverdugo ... PR ready for review. Remember to modify Manifest.toml to repoint to Gridap#gridap_distributed once the pending PR is accepted.

@amartinhuertas
Member Author

Solved in 8317378. @fverdugo ... PR ready for review. Remember to modify Manifest.toml to repoint to Gridap#gridap_distributed once the pending PR is accepted.

@fverdugo ... all tests passing. Ready to review. I already updated the Manifest.toml file to repoint to Gridap#gridap_distributed

@fverdugo fverdugo merged commit 73de176 into release-0.2 Oct 21, 2021
@fverdugo fverdugo deleted the rhs_assembly branch October 21, 2021 10:59