
Commit 2e8572c: "added solutions to the last web"
1 parent 21d49ff

1 file changed: +28 -13 lines


Diff for: docs/src/lecture_11/lab.md

````diff
@@ -75,8 +75,8 @@ using Metal
 x = randn(Float32, 60, 60)
 y = randn(Float32, 60, 60)

-mx = MtlArray(x)
-my = MtlArray(y)
+mx = CuArray(x)
+my = CuArray(y)

 @info "" x*y Matrix(mx*my)
````
````diff
@@ -86,7 +86,7 @@ my = MtlArray(y)
 This may not be anything remarkable, as such functionality is available in many other languages
 albeit usually with a less mathematical notation like `x.dot(y)`. With Julia's multiple dispatch, we
 can simply dispatch the multiplication operator/function `*` to a specific method that works on
-`MtlArray` type. You can check with `@code_typed`:
+`CuArray` type. You can check with `@code_typed`:
 ```julia
 julia> @code_typed mx * my
 CodeInfo(
````
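As a side note (not part of the commit), `@which` offers a quicker way than `@code_typed` to confirm which method multiple dispatch selects. A minimal sketch, assuming CUDA.jl and a CUDA-capable GPU:

```julia
using CUDA   # assumes a CUDA-capable GPU is available

mx = CuArray(randn(Float32, 60, 60))
my = CuArray(randn(Float32, 60, 60))

# `@which` reports the specific method `*` dispatches to for CuArray operands,
# i.e. the GPU-backed method rather than the generic Matrix one
@which mx * my
```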
````diff
@@ -124,7 +124,7 @@ Let's now explore what the we can do with this array programming paradigm on som
 # rgb_img = FileIO.load("image.jpeg");
 # gray_img = Float32.(Gray.(rgb_img));
 gray_img = rand(Float32, 10000, 10000)
-cgray_img = MtlArray(gray_img)
+cgray_img = CuArray(gray_img)
 ```

 **HINTS**:
````
````diff
@@ -222,7 +222,7 @@ In the next example we will try to solve a system of linear equations $Ax=b$, wh

 **BONUS 1**: Visualize the solution `x`. What may be the origin of our linear system of equations?

-**BONUS 2**: Use sparse matrix `A` to achieve the same thing. Can we exploit the structure of the matrix for a more effective solution?
+**BONUS 2**: Use sparse matrix `A` to achieve the same thing. Can we exploit the structure of the matrix for a more effective solution? Be aware though that `\` is not implemented for sparse structures by default.

 !!! details "Solution"
     ```julia
````
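Since the new BONUS 2 text warns that `\` is not implemented for sparse structures by default, one workaround is an iterative solver on the device. A minimal sketch, not taken from the commit, assuming CUDA.jl's CUSPARSE wrappers and the Krylov.jl package:

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays
using Krylov   # assumption: provides `cg`, which accepts GPU arrays

n = 1000
# a tridiagonal, symmetric positive definite system (e.g. a 1D Laplacian)
A = spdiagm(-1 => fill(-1.0f0, n - 1), 0 => fill(2.0f0, n), 1 => fill(-1.0f0, n - 1))
b = rand(Float32, n)

dA = CuSparseMatrixCSR(A)   # sparse matrix stored on the device
db = CuArray(b)

# conjugate gradient replaces the missing `\` for sparse GPU matrices
x, stats = cg(dA, db)
```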
````diff
@@ -323,14 +323,28 @@ int main() {
 Compared to CUDA C the code is less bloated, while having the same functionality.[^4]
 ```julia
 function vadd(a, b, c)
-    # CUDA.jl
-    # i = (blockIdx().x-1) * blockDim().x + threadIdx().x
+    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
+    c[i] = a[i] + b[i]
+    return
+end

-    # Metal.jl
+len = 100
+a = rand(Float32, len)
+b = rand(Float32, len)
+d_a = CuArray(a)
+d_b = CuArray(b)
+d_c = similar(d_a)
+@cuda threads = len vadd(d_a, d_b, d_c)
+c = Array(d_c)
+```
+
+In `Metal.jl` for Apple silicon
+```julia
+function vadd(a, b, c)
     i = thread_position_in_grid_1d()
-c[i] = a[i] + b[i]
+    c[i] = a[i] + b[i]

-return
+    return
 end

 len = 100
````
````diff
@@ -451,7 +465,8 @@ It's important to stress that we only schedule the kernel to run, however in ord
 - or a command to copy result to host (`Array(c)`), which always synchronizes kernels beforehand

 !!! warning "Exercise"
-    Fix the `vadd` kernel such that it can work with different launch configurations, such as
+    Fix the `vadd` kernel such that it can work with different launch configurations, i.e. even if the launch configuration does not correspond to the length of arrays, it will not crash.
+
     ```julia
     @cuda threads=64 blocks=2 vadd(d_a, d_b, d_c)
     @cuda threads=32 blocks=4 vadd(d_a, d_b, d_c)
````
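One possible fix for this exercise (a sketch of the usual approach, not necessarily the commit's solution) is to guard the computed index against the array length, so surplus threads simply return:

```julia
using CUDA

function vadd(a, b, c)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    # bounds guard: threads whose index falls past the end do nothing,
    # so any threads/blocks launch configuration is safe
    if i <= length(c)
        c[i] = a[i] + b[i]
    end
    return
end

len = 100
d_a = CuArray(rand(Float32, len))
d_b = CuArray(rand(Float32, len))
d_c = similar(d_a)
@cuda threads=64 blocks=2 vadd(d_a, d_b, d_c)   # 128 threads for 100 elements
```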
````diff
@@ -732,8 +747,8 @@ end
 using Metal
 a = rand(Float32, 1000, 1000)
 b = rand(Float32, 1000, 1000)
-ag = a |> MtlArray
-bg = b |> MtlArray
+ag = a |> CuArray
+bg = b |> CuArray
 c = similar(ag)
 matmul!(ag,bg,c)
````