pytorch-labs / attention-gym Public

Notifications You must be signed in to change notification settings
Fork 44
Star 747

Code
Issues 60
Pull requests 3
Actions
Projects
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Actions
Projects
Security
Insights

Issues: pytorch-labs/attention-gym

Beta

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

60 Open 42 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

document masking with long sequences (8192) results in TypeError: cannot determine truth value of Relational

#133 opened Apr 7, 2025 by Lunarmony

[Question] Precision of bfloat16 computations

#131 opened Apr 4, 2025 by lukasHoel

RuntimeError in flex_attention: CUDA Device-Side Assertion Failure

#130 opened Mar 21, 2025 by NuanBaobao

Would FlexAttention be useful over SDPA for masked language modeling?

#129 opened Mar 20, 2025 by abdulfatir

Add Block Diffusion Attention Mask to attn_gym/masks

#126 opened Mar 13, 2025 by Skylion007

Hard-coded max_seq_len in document example

#124 opened Mar 1, 2025 by adamoyoung

BlockMask Value Error

#122 opened Feb 17, 2025 by Leo-T-Zang

Use case against efficient SDPA backend

#121 opened Feb 13, 2025 by TParcollet

FlexAttention customizability for softmax

#117 opened Feb 9, 2025 by veritas9872

Sigmoid attention?

#116 opened Feb 9, 2025 by lhallee

Error when using flex attention and F.sdpa together.

#115 opened Feb 8, 2025 by wuyushuwys

Does Flex attention API accepts a customized attention mask?

#112 opened Feb 6, 2025 by jiagaoxiang

Non-elementwise score_mod

#111 opened Jan 30, 2025 by AmoghDabholkar

Dynamic mask block sizes during inference

#109 opened Jan 30, 2025 by windsornguyen

Can FlexAttention Optimize Masks for Large Table Constraints?

#106 opened Jan 15, 2025 by RaphaelMouravieff

FlexAttention uses much more GPU memory than FlashAttention-2

#101 opened Jan 9, 2025 by ChenlongDeng

Building a composite mask with attention_mask from tokenizers

#98 opened Jan 2, 2025 by lhallee

Illegal memory access on backward when there are unused block masks (nightly build)

#96 opened Dec 28, 2024 by timt51

FlexAttention slower than eager in HF transformers

#95 opened Dec 27, 2024 by staghado

Doc mask returns negative sparsity

#93 opened Dec 21, 2024 by staghado

question about masking

#92 opened Dec 18, 2024 by esason

Short vs long sequences performance question

Further information is requested

#89 opened Dec 12, 2024 by francoishernandez

[Inquiry] Document Masking and Assigning Different Weights

#88 opened Dec 12, 2024 by yeahjack

flexattn with qwen2

#81 opened Nov 18, 2024 by NonvolatileMemory

Flex attention with dropout

#77 opened Nov 13, 2024 by zbh2047

Previous 1 2 3 Next

Previous Next

ProTip! Exclude everything labeled bug with -label:bug.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly