
Workflow update - PART 1 #1416


Merged — 87 commits merged into abetlen:main on Jun 13, 2024

Conversation

@Smartappli (Contributor) commented Apr 30, 2024

• CUDA compiled with AVX
• Remove Python 3.8
• Remove deprecated macos-11
• Add Python 3.9 where missing
• Upgrade macos-13 to macos-latest in tests
• Upgrade ubuntu-20.04 to ubuntu-latest
• Upgrade windows-2019 to windows-latest
• Refactor the Metal build
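For illustration, a hypothetical sketch (not this PR's actual diff) of the kind of runner/Python matrix these changes point toward; the job name and option values are placeholders:

    # Hypothetical matrix; runner labels and Python versions reflect the list above.
    jobs:
      build-wheels:
        runs-on: ${{ matrix.os }}
        strategy:
          matrix:
            os: [ubuntu-latest, windows-latest, macos-latest]   # upgraded runner labels
            python-version: ["3.9", "3.10", "3.11", "3.12"]     # 3.8 dropped, 3.9 added
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: ${{ matrix.python-version }}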

@Smartappli changed the title from "[WIP] Cuda with AVX" to "Cuda with AVX" on Apr 30, 2024
@abetlen (Owner) commented Apr 30, 2024

#1342 (comment)

I'll paste my comment here, and maybe we can open a new discussion. Basically, I'm concerned about the size of releases ballooning with the number of prebuilt wheel variants. I had some suggestions for long-term solutions there, but I'm not sure what the right approach is.

Anecdotally, @oobabooga claims to have run into issues with GitHub throttling his prebuilt wheel repo because of this.

@oobabooga (Contributor)

If you generate too many wheels, there is a 100% chance you will reach a storage quota, and GitHub will ask you to start paying for storage or else your wheels will fail to upload. It's not too expensive (a few $ a month at most), but it's worth keeping in mind.

@Smartappli (Contributor Author)

I avoided the API rate-limit problems by adding a timer step in my YAML:

- name: ⌛ rate 1
  shell: pwsh
  run: |
    # add a random sleep since we run on a fixed schedule
    sleep (Get-Random -Maximum 1200)

    # get the currently authenticated user's rate limit info
    $rate = gh api rate_limit | ConvertFrom-Json | Select-Object -ExpandProperty rate

    # if we don't have at least 400 requests left, wait until the limit resets
    if ($rate.remaining -lt 400) {
        $wait = ($rate.reset - (Get-Date (Get-Date).ToUniversalTime() -UFormat %s))
        echo "Rate limit remaining is $($rate.remaining), waiting for $($wait) seconds to reset"
        sleep $wait
        $rate = gh api rate_limit | ConvertFrom-Json | Select-Object -ExpandProperty rate
        echo "Rate limit has reset to $($rate.remaining) requests"
    }

@Smartappli (Contributor Author)

> #1342 (comment)
>
> I'll paste my comment here, and maybe we can open a new discussion. Basically, I'm concerned about the size of releases ballooning with the number of prebuilt wheel variants. I had some suggestions for long-term solutions there, but I'm not sure what the right approach is.
>
> Anecdotally, @oobabooga claims to have run into issues with GitHub throttling his prebuilt wheel repo because of this.

https://github.com/Smartappli/serge-wheels/actions

@Smartappli (Contributor Author)

Not enabling AVX penalizes llama-cpp-python performance on both CPU and CUDA.
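For context, AVX support in llama-cpp-python wheels is toggled through CMake options passed via CMAKE_ARGS at build time. A minimal sketch of such a build step; the exact option names vary with the vendored llama.cpp version (LLAMA_AVX/LLAMA_AVX2 in older trees, GGML_AVX/GGML_AVX2 in newer ones):

    # Hypothetical build step; option names depend on the llama.cpp version in use.
    - name: Build CPU wheel with AVX/AVX2
      shell: bash
      run: |
        CMAKE_ARGS="-DLLAMA_AVX=on -DLLAMA_AVX2=on" \
          python -m pip wheel . --wheel-dir dist --no-deps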

@gaby commented May 1, 2024

Maybe the list can be shrunk down a bit. For example:

  • Not many people have AVX512; remove it until there's enough demand.
  • Make AVX support the minimum?
  • Remove Python 3.8; it's EOL in a few months.
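A hypothetical sketch of how such trimming could look, assuming the workflow had an `avx` matrix axis (the axis name and its values are made up for illustration):

    # Illustrative only: drop AVX512 variants from a hypothetical "avx" matrix axis.
    strategy:
      matrix:
        avx: ["basic", "avx", "avx2", "avx512"]
        exclude:
          - avx: "avx512"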

@gaby commented May 1, 2024

@Smartappli Your changes are adding AVX for the CUDA wheels, is that needed? At that point the user is using the GPU.

It makes sense for the basic wheels to have AVX and AVX2 variants, not so much for the CUDA ones.

@Smartappli (Contributor Author) commented May 1, 2024

Copy that, thanks @gaby.

In summary: AVX and AVX2 on CPU is enough.

@Smartappli changed the title from "Cuda with AVX" to "CPU with AVX and AVX2" on May 1, 2024
@Smartappli changed the title from "CPU with AVX and AVX2" to "[WIP] CPU with AVX and AVX2" on May 1, 2024
@Smartappli (Contributor Author)

ping @gaby

@Smartappli (Contributor Author)

@abetlen can you review, please?

@abetlen (Owner) commented Jun 4, 2024

Hey @Smartappli, thanks for your patience and the PR. It's been a busy month, so I'm just catching up on open PRs right now. Do you mind splitting this one up into two, with one that includes the following:

• CUDA compiled with AVX
• Remove Python 3.8
• Remove deprecated macos-11
• Add Python 3.9 where missing
• Upgrade macos-13 to macos-latest in tests
• Upgrade ubuntu-20.04 to ubuntu-latest
• Upgrade windows-2019 to windows-latest
• Refactor the Metal build

and another just for the CPU wheels changes?

@Smartappli changed the title from "Workflow update" to "Workflow update - PART 1" on Jun 6, 2024
@Smartappli changed the title from "Workflow update - PART 1" to "[WIP] Workflow update - PART 1" on Jun 6, 2024
@Smartappli changed the title from "[WIP] Workflow update - PART 1" to "Workflow update - PART 1" on Jun 6, 2024
@Smartappli (Contributor Author) commented Jun 6, 2024

> Hey @Smartappli, thanks for your patience and the PR. It's been a busy month, so I'm just catching up on open PRs right now. Do you mind splitting this one up into two, with one that includes the following:
>
> • CUDA compiled with AVX
> • Remove Python 3.8
> • Remove deprecated macos-11
> • Add Python 3.9 where missing
> • Upgrade macos-13 to macos-latest in tests
> • Upgrade ubuntu-20.04 to ubuntu-latest
> • Upgrade windows-2019 to windows-latest
> • Refactor the Metal build
>
> and another just for the CPU wheels changes?

@abetlen Done: #1515

@abetlen merged commit 9e396b3 into abetlen:main on Jun 13, 2024
13 checks passed
@oobabooga (Contributor)

Has anyone managed to fix the CUDA workflows? Mine keep failing with this error:

C:\Miniconda3\envs\build\include\crt/host_config.h(153): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk. [C:\Users\runneradmin\AppData\Local\Temp\tmpwbsbwtdg\build\CMakeFiles\CMakeScratch\TryCompile-uh6ciq\cmTC_cbbed.vcxproj]

See: https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/actions/runs/9457447475/job/26051277254.

I see that @abetlen's workflow also fails with the same error: https://github.com/abetlen/llama-cpp-python/actions/runs/9457182450/job/26051175939
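A workaround commonly mentioned for this MSVC/nvcc version check (not something this thread confirms as the eventual fix) is to forward nvcc's -allow-unsupported-compiler flag through CMake. A rough sketch; the CUDA option name varies by llama.cpp version (LLAMA_CUBLAS, LLAMA_CUDA, or GGML_CUDA):

    # Hypothetical step: forwards -allow-unsupported-compiler to nvcc so the newer
    # MSVC toolset on windows-latest passes CUDA's host-compiler version check.
    - name: Build CUDA wheel
      shell: bash
      run: |
        CMAKE_ARGS="-DLLAMA_CUDA=on -DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler" \
          python -m pip wheel . --wheel-dir dist --no-deps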
