Execution time speed compared to ArrayFire C++ #8
Comments
I don't think any of us have benchmarked the performance of the .NET wrapper against C++, but I would certainly believe the C++ wrapper is faster. One of the main reasons is that the .NET wrapper uses the unified backend to load DLLs, whereas the native libraries can be called directly and spend no time loading DLLs (not sure if you are counting this towards your time). Ideally, to benchmark, you should run the main part of your code in a loop and then take the average time. Once the wrapper is feature complete, we can look into optimizing it further. |
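As a rough illustration of the "run it in a loop and take the average" advice, here is a minimal C++ sketch. The name `average_ms` and the warm-up count are assumptions made for this example; nothing here comes from ArrayFire or the wrapper.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` a few times untimed (warm-up), then
// returns the mean wall-clock time per call in milliseconds.
double average_ms(const std::function<void()>& work,
                  int iterations,
                  int warmup = 3) {
    assert(iterations > 0);

    // Warm-up calls absorb one-time costs (library loading, first-use setup).
    for (int i = 0; i < warmup; ++i) work();

    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) work();
    const auto stop = clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

When timing GPU code, the callable passed in should also wait for the device to finish (in ArrayFire C++ that would likely mean calling `af::sync()` at the end of the callable), otherwise the loop may only measure how long it takes to queue the work.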
@altaybrusan can you show us the output of af::info() from C++ and the equivalent from .NET? |
First, the project is a WinForms application, and I call the filter function AFTER I call setBackend. The output of the wrapper is the same as the output of the C++ version. Do you need any further details or something else? |
Indeed, the problem I have is how to send a bitmap object to ArrayFire. There is a loadMem function, but I could not find a way to pass images to it. |
I also monitor NVIDIA Control Panel > Manage GPU Utilization > GPU Utilization Graph. |
@altaybrusan is the image loading process part of the loop? What code is inside the timed loop and what code is outside? I just want to make sure we're all on the same page. |
@pavanky
Here is the code inside the filter function
Here is the code outside the filter function
|
@pavanky As you can see, the process of sending an image to ArrayFire from C# is really cumbersome! It would be good to shorten this whole process. |
@pavanky
|
@altaybrusan I am not really familiar with Windows or .NET. I was just trying to make sure the versions being picked are consistent with each other. I'll let @royalstream (who is doing a great job developing this project) and @shehzan10 (who's helping him out) take care of this. |
@altaybrusan is there any reason why you are going from C++ -> .NET? Most people would start with the wrapper (because they are most comfortable with a wrapper language) and then move to C++ to get to higher performance. |
@royalstream @shehzan10 The main body of the project is in C# (major packages, components). Indeed, this project is going to be a real application of ArrayFire in medical image processing (fluoroscopy). Instead of going back to C++, I think finding a way to process the video/image stream using the wrapper would be good evidence that ArrayFire in general, and its .NET wrapper specifically, is applicable to other image/video processing work. |
I noticed the code inside the loop has many operations, some of them involving slicing, some of them involving convolutions, etc. In theory all the .Net to C++ marshalling shouldn't be heavy because all we're passing around are pointers to af_array objects and small objects like af_seq. |
ArrayFire performs copies on assignment if the LHS array of the assignment operation has more than one reference. Because of RAII in C++ this happens fairly rarely and copies are performed only occasionally. For garbage collected languages such as C#, any temporary references are not cleared until the garbage collector is called. As far as arrayfire knows, those references are still in play. So it performs copies every time you do slice + assignment operations. One hack around this would be to call the garbage collector in the wrapper right before any time |
@unbornchikken brought up this issue a few days ago when talking about the |
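To make the reference-count argument concrete, here is a toy C++ model of copy-on-write. It is only a sketch of the idea described above, not ArrayFire's implementation; `CowArray` and its methods are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy copy-on-write array: handles share one buffer until a write happens
// while more than one handle still refers to that buffer.
class CowArray {
public:
    explicit CowArray(std::size_t n)
        : data_(std::make_shared<std::vector<float>>(n, 0.0f)) {}

    // Writes one element. If the buffer is shared, it is copied first.
    // Returns true when a copy was needed.
    bool set(std::size_t i, float value) {
        assert(i < data_->size());
        bool copied = false;
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<float>>(*data_);
            copied = true;
        }
        (*data_)[i] = value;
        return copied;
    }

    float get(std::size_t i) const {
        assert(i < data_->size());
        return (*data_)[i];
    }

private:
    std::shared_ptr<std::vector<float>> data_;  // shared by copied handles
};
```

In this model a second handle that goes out of scope immediately (the RAII case) never forces a copy, while a handle that lingers (the situation described for a garbage-collected runtime) makes the next write pay for a full buffer copy.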
@pavanky could we accomplish the same with a call to af_release_array? If that's the case another option would be calling .Dispose() on every ArrayFire.Array object created inside the loop (which will in turn call af_release_array) or with using blocks (C#'s syntactic sugar to call .Dispose() automatically). |
@royalstream yes, calling On a related note I am assuming the |
@royalstream btw, calling For example when you do
The output of This will not slow down assigns, but if enough of these arrays exist, it increases the number of buffers in ArrayFire's memory manager and can eventually slow down array creation. This may even result in out-of-memory errors (at which point you could call the garbage collector), but I recommend calling the garbage collector from time to time based on some criteria. For arrayfire-r, I just keep track of the amount of memory and buffers allocated and call GC whenever 1GB of memory is allocated or when the number of buffers in ArrayFire's memory manager is greater than 50. These numbers are arbitrary and can be changed to suit the user's needs. |
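The trigger described above can be sketched as a small threshold check. The struct `GcPolicy` and its member names are hypothetical and written for illustration; only the two thresholds (1 GB, 50 buffers) come from the comment, and they are arbitrary there too.

```cpp
#include <cstddef>

// Hypothetical policy object: decides when a wrapper should ask its
// runtime to run the garbage collector.
struct GcPolicy {
    std::size_t max_bytes = std::size_t{1} << 30;  // 1 GB of allocated memory
    std::size_t max_buffers = 50;                  // buffers held by the memory manager

    // True once 1 GB has been allocated or more than 50 buffers exist.
    bool should_collect(std::size_t allocated_bytes,
                        std::size_t buffer_count) const {
        return allocated_bytes >= max_bytes || buffer_count > max_buffers;
    }
};
```

A wrapper would feed this check with whatever counters it tracks (in ArrayFire C++, `af::deviceMemInfo` reports allocated bytes and buffer counts) and call its runtime's collector when the check returns true.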
@pavanky yes, that's exactly what the .net wrapper is doing, I took the R wrapper as a reference. |
@royalstream
I receive this error: |
@royalstream @altaybrusan Should we close this issue and create a new issue for this? @altaybrusan BTW is there any reason you are using doubles for convolution? The performance is going to be fairly bad for double precision on GPUs. |
But in C++ I have excellent performance even with the double data type! |
@altaybrusan I mean single precision performance on GPUs is usually 10x better on newer GPUs. But on older GPUs such as yours it is close to 2.5x better. |
I'll check it. Thanks. |
@altaybrusan I think pavanky is right on the money with his comment regarding the memory allocation/deallocation affecting performance. |
@royalstream
and
|
I just implemented the Convolve method the same way you did for the others. |
@royalstream I think the speed problem is due to the heavy operation of loading images and then converting them into float[,]. Do you have any suggestions on how I can make an array out of a bitmap? |
@altaybrusan if you are doing the same thing in C++ it shouldn't matter. The problem is very likely with GC not killing off temporary references. |
@altaybrusan I agree with @pavanky and @unbornchikken |
Also consider that P/Invoke is a very slow approach to calling native code from .NET. You gotta use a C++/CLI binding to expect the same performance, see: https://msdn.microsoft.com/en-us/library/ky8kkddw.aspx But unfortunately C++/CLI is a barely supported mess that every sane-minded developer avoids at all costs, so I can stand by your decision to go with P/Invoke. Just don't expect too much from it. |
@unbornchikken from personal experience I've never obtained any performance gains that would justify using C++/CLI. I've actually dropped entire implementations done in C++/CLI because it's barely supported and it just wasn't worth it. Data still has to be marshaled from the managed domain into the native heap and if all the parameters are simple, blittable types (like |
Hi,
I have developed an ArrayFire image processing algorithm in C++. The execution time was around 78-80 ms. Then I re-implemented the algorithm via the .NET wrapper. Now the execution time is around 120 ms. Is this normal?
PS: I also noticed that the .NET wrapper uses about 50 percent of the GPU, whereas the original C++ version used 100 percent.
One more thing: the .NET wrapped version has no "warm up" time, whereas the original C++ version needs about 1 min to execute the first time. I am not sure yet whether that's normal or not!