
Releases: macrocosm-os/pretraining

Release v6.0.3

25 Mar 15:24
1e57d0a

Announcing release v6.0.3

Changes

  • Bumped taoverse dependency to 2.0.2.
  • Synced noise generation in the TTS denoiser by using the same seed. This fixes an issue where the same model could score differently in different runs.
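The denoiser code is not shown in these notes. As a minimal sketch, assuming the noise comes from a pseudo-random generator, seeding a dedicated generator once per evaluation makes every run see identical noise. `make_noise` is a hypothetical helper, and the real denoiser most likely uses a tensor library rather than Python's `random` module.

```python
import random


def make_noise(num_samples: int, seed: int) -> list[float]:
    """Hypothetical helper: draw Gaussian noise from a generator seeded once.

    Using a dedicated random.Random instance (instead of the global module
    state) means the same seed always yields the same noise sequence.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


# Two "runs" with the same seed see identical noise, so the same model
# is evaluated on exactly the same noisy inputs.
run_a = make_noise(4, seed=1234)
run_b = make_noise(4, seed=1234)
assert run_a == run_b
```

Keeping the generator local, rather than relying on global seeding, also protects the noise sequence from other code that draws random numbers in between.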

Notes to validators

Since the taoverse dependency has been bumped, please reinstall dependencies:

$ python -m pip install -e .

Release v6.0.2

22 Mar 16:49
992ee4d

Announcing release v6.0.2

This fix release introduces the following changes:

Changes

  • Skip eval tasks whose data fails to load.
  • Renormalize eval task weights when a task fails.
  • Reduce the number of attempts after a failed page load in the dataset loader, and sleep between attempts.
  • Increase the epsilon decay interval to 10 days for the TTS competition.
  • The FineWeb-Edu2 loader is back in both the 3B and 14B competitions.
  • Reduce the minimum validator stake for top-miner selection from 200K to 50K.
  • Bump taoverse dependency to 2.0.1.
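The first three changes above amount to some simple bookkeeping. The sketch below shows one way to do it, assuming task weights are kept in a dictionary and a page load is a zero-argument callable that raises on failure. `renormalize`, `load_with_retries` and their defaults (3 attempts, 1 second of sleep) are hypothetical; the notes give neither the new attempt count nor the sleep duration.

```python
import time
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


def renormalize(weights: Dict[str, float], loaded: Dict[str, bool]) -> Dict[str, float]:
    """Drop eval tasks whose data failed to load and rescale the rest to sum to 1."""
    kept = {name: w for name, w in weights.items() if loaded.get(name, False)}
    total = sum(kept.values())
    if total == 0:
        return {}  # nothing usable was loaded
    return {name: w / total for name, w in kept.items()}


def load_with_retries(load: Callable[[], T], attempts: int = 3, sleep_s: float = 1.0) -> Optional[T]:
    """Call `load` up to `attempts` times, sleeping between failed attempts.

    Returns None if every attempt raised, so the caller can skip the task.
    """
    for attempt in range(attempts):
        try:
            return load()
        except Exception:
            if attempt < attempts - 1:
                time.sleep(sleep_s)  # back off before the next attempt
    return None
```

With this shape, a task whose loader returns None is marked as not loaded, and the surviving weights are rescaled so the final score still sums over a full unit of weight.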

Notes to validators

Some dependencies have changed. Please reinstall:

python -m pip install -e .

Release v6.0.1

21 Mar 14:49
43e2070

Announcing release v6.0.1

Changes

  • We have disabled evaluation on FineWeb-Edu2, since it appears to have been broken for several days and was blocking evaluation for the other competitions. It will be reactivated in a future release once the issue is resolved.

NOTES TO VALIDATORS

- Please also make sure to rerun pip install so that updated dependencies are installed:
python -m pip install -e .

Release v6.0.0

17 Mar 20:00
8f1aff2

Announcing release v6.0.0

This release supports SN9's first multimodal competition.

Changes

NOTES TO VALIDATORS

- Please also make sure to rerun pip install so that updated dependencies are installed:
python -m pip install -e .

Release v5.1.1

17 Feb 18:19
707aeb0

Announcing release v5.1.1

Here are the main changes:

Changes

  • A small fix to the model upload script and related mining utility functions
  • Bumped taoverse dependency to version 1.4.1

NOTES TO VALIDATORS

- Please also make sure to rerun pip install so that updated dependencies are installed:
python -m pip install -e .

Release v5.1.0

14 Feb 14:11
b163027

Announcing release v5.1.0

Here are the main changes:

Changes

  • Updated bittensor dependency to 9.0.0 to support the dTao release.
  • Cleaned out the 14B* (datamix experiment) code.
  • Reduced the subset loaders' request TTL to 15 seconds.

NOTES TO VALIDATORS

- Please also make sure to rerun pip install so that updated dependencies are installed:
python -m pip install -e .

Release v5.0.0

16 Jan 19:11
346e17f

Announcing release v5.0.0

Here are the main changes:

Changes

1. Native support for multi-dataset evaluation
Immediate effect

Implemented support for multi-dataset evaluation for all competitions using EvalTasks from taoverse 1.3.7.

2. Updating the data mix
Activation block: 4_732_978

Both 3B and 14B will now be evaluated on the following mix:

  • HuggingFaceFW/fineweb (30%)
  • HuggingFaceFW/fineweb-edu-score-2 (25%)
  • bigcode/the-stack-v2-dedup (35%)
  • laion/pes2ox-fulltext (5%)
  • HuggingFaceTB/FineMath:finemath-3plus (3%)
  • HuggingFaceTB/FineMath:infiwebmath-3plus (2%)
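If each dataset in the mix is treated as a separate eval task whose loss is weighted by its share (an assumption; the notes only say that taoverse EvalTasks are used), the combined score is a weighted average of per-dataset losses. `mixed_loss` is a hypothetical helper; dividing by the weights actually present also covers the case where some datasets are skipped.

```python
# Data mix from this release, as fractions of the evaluation mix.
DATA_MIX = {
    "HuggingFaceFW/fineweb": 0.30,
    "HuggingFaceFW/fineweb-edu-score-2": 0.25,
    "bigcode/the-stack-v2-dedup": 0.35,
    "laion/pes2ox-fulltext": 0.05,
    "HuggingFaceTB/FineMath:finemath-3plus": 0.03,
    "HuggingFaceTB/FineMath:infiwebmath-3plus": 0.02,
}

# The shares should cover the whole mix.
assert abs(sum(DATA_MIX.values()) - 1.0) < 1e-9


def mixed_loss(per_dataset_loss: dict[str, float], mix: dict[str, float] = DATA_MIX) -> float:
    """Hypothetical: combine per-dataset losses into one weighted-average score.

    Only datasets present in `per_dataset_loss` contribute, and their weights
    are renormalized so a skipped dataset does not shrink the score.
    """
    total_weight = sum(mix[name] for name in per_dataset_loss)
    return sum(mix[name] * loss for name, loss in per_dataset_loss.items()) / total_weight
```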

Note that the 14B* competition will be retired and replaced by the multi-dataset 14B competition with the above mix.

3. New epsilon lower bounds and decay intervals
Activation block: 4_732_978

The epsilon decay interval and bounds will be updated for all competitions as follows:

  • 3B competition:
    Current: decays from 0.005 to 0.0002 over 4 days
    Updated: decays from 0.005 to 0.0005 over 7 days

  • 14B competition:
    Current: decays from 0.005 to 0.0002 over 5 days
    Updated: decays from 0.005 to 0.0005 over 10 days
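Each schedule above is a start value, an end value and a duration. The sketch below assumes the decay is linear in the number of blocks since a model was uploaded, and that a day is about 7,200 blocks (12-second blocks). Neither detail is stated in these notes, and `linear_epsilon` is a hypothetical helper, not the taoverse API.

```python
BLOCKS_PER_DAY = 7200  # assumption: ~12-second blocks


def linear_epsilon(blocks_since_upload: int, start: float, end: float, decay_days: float) -> float:
    """Linearly interpolate epsilon from `start` to `end` over `decay_days`, then hold at `end`."""
    decay_blocks = decay_days * BLOCKS_PER_DAY
    # Clamp progress to [0, 1] so epsilon never overshoots either bound.
    progress = min(max(blocks_since_upload / decay_blocks, 0.0), 1.0)
    return start - (start - end) * progress


# Updated 14B schedule: 0.005 -> 0.0005 over 10 days.
for day in (0, 5, 10, 20):
    print(day, linear_epsilon(day * BLOCKS_PER_DAY, 0.005, 0.0005, 10))
```

Under this reading, lowering the end value or lengthening the interval both keep epsilon larger for longer, which makes it harder for a newer model to displace an older one.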

4. Updated emission distribution for competitions
Activation block: 4_732_978

  • 3B → 30%
  • 14B → 70%
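One way to apply a per-competition emission split is to normalize each competition's miner weights and scale them by that competition's share. This is a minimal sketch under that assumption; `combine_weights`, the UID-keyed dictionaries and the handling of empty competitions are illustrative, not the validator's actual code.

```python
COMPETITION_SHARE = {"3B": 0.30, "14B": 0.70}


def combine_weights(per_competition: dict[str, dict[int, float]],
                    shares: dict[str, float] = COMPETITION_SHARE) -> dict[int, float]:
    """Hypothetical: scale each competition's normalized miner weights by its share."""
    combined: dict[int, float] = {}
    for comp, miner_weights in per_competition.items():
        total = sum(miner_weights.values())
        if total == 0:
            continue  # no scoring miners in this competition
        for uid, w in miner_weights.items():
            combined[uid] = combined.get(uid, 0.0) + shares[comp] * w / total
    return combined


print(combine_weights({"3B": {1: 1.0}, "14B": {2: 0.75, 3: 0.25}}))
```

For example, with a single 3B winner (UID 1) and two 14B miners weighted 3:1 (UIDs 2 and 3), the combined weights come out to 0.30, 0.525 and 0.175.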

Other updates

  • Improved weight setting by spinning out separate threads that use different subtensors.
  • Switched from bt.logging to taoverse.logging.
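The notes do not show how the weight-setting threads are managed. A minimal sketch, assuming `set_weights` is a hypothetical zero-argument callable that opens its own subtensor connection and returns whether the call succeeded, is to run it in a worker thread with a deadline so a stalled connection cannot block the main validator loop.

```python
import concurrent.futures
from typing import Callable


def set_weights_with_timeout(set_weights: Callable[[], bool], timeout_s: float) -> bool:
    """Run a (hypothetical) weight-setting call in a worker thread with a deadline."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(set_weights)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False  # the call took too long; try again on the next cycle
    except Exception:
        return False  # the call itself raised
    finally:
        # Don't wait for a stuck worker; it is left to finish in the background.
        executor.shutdown(wait=False)
```

Because each attempt builds its own connection inside the worker, a connection that hangs in one attempt does not affect the next one.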

NOTES TO VALIDATORS

- Please also make sure to rerun pip install so that updated dependencies are installed:
python -m pip install -e .

Release v4.6.4

19 Dec 15:29
28b37f6

Announcing release v4.6.4

Here are the main changes:

Changes

  • Bittensor dependency version was bumped to 8.5.1.

The only action needed for validators is to pull this new release and reinstall dependencies.

NOTES TO VALIDATORS

- Please also make sure to rerun pip install so that updated dependencies are installed:
python -m pip install -e .

Release v4.6.3

12 Dec 02:27
75356e4

Announcing release v4.6.3

Here are the main changes:

Changes

1. Bittensor version bumped to 8.4.3

We have observed a higher weight-setting success rate with this version than with 6.9.4.

2. Changed the S3 bucket URL for the-stack-v2-dedup dataset

The official documentation (https://docs.softwareheritage.org/user/using_data/index.html#contents-on-s3) lists two URLs for accessing the softwareheritage S3 bucket.

The URL we used before seems to fail in some regions. The other URL, https://softwareheritage.s3.amazonaws.com/content/<sha1>, seems to be more robust, so we updated the dataloader to use it.
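Objects in the bucket are addressed by a SHA-1 hex digest, so the dataloader only needs to substitute the digest into the URL template quoted above. A minimal sketch; `swh_content_url` is a hypothetical helper, and it only builds the URL (fetching the object and any decoding are left out).

```python
import re

# Bucket URL from this release note; contents are addressed by a SHA-1 hex digest.
SWH_CONTENT_BASE = "https://softwareheritage.s3.amazonaws.com/content/"
_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def swh_content_url(sha1: str) -> str:
    """Build the S3 URL for a content object from its 40-character SHA-1 hex digest."""
    sha1 = sha1.lower()
    if not _SHA1_RE.fullmatch(sha1):
        raise ValueError(f"not a SHA-1 hex digest: {sha1!r}")
    return SWH_CONTENT_BASE + sha1
```

Validating the digest up front turns a malformed ID into an immediate error instead of an opaque failed S3 request.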

NOTES TO VALIDATORS

- IMPORTANT: The newly added code dataset, the-stack-v2-dedup, requires a Hugging Face access token as well as S3 access and secret keys. Our validator documentation explains how to obtain and configure them.

- Please also make sure to rerun pip install so that updated dependencies are installed:
python -m pip install -e .

Release v4.6.2

03 Dec 17:03
bc90394

Announcing release v4.6.2

Here are the main changes:

Changes

1. Increasing code proportion in the data mix
Activation block: 4_453_709

We are replacing the-stack-dedup with the-stack-v2-dedup in the 14B-star competition and increasing the code proportion in the validation dataset from ~5% to ~15%.

The datasets used during evaluation for the 14B-star competition are as follows:

  • HuggingFaceFW/fineweb-edu-score-2 (85%)
  • bigcode/the-stack-v2-dedup (15%)

Datasets for the 3B and 14B competitions are left unchanged.

2. New epsilon lower bounds and decay intervals
Activation block: 4_453_709

The epsilon decay interval and bounds will be updated for all competitions as follows:

  • 3B competition:
    Current: decays from 0.005 to 0.0005 over 7 days
    Updated: decays from 0.005 to 0.0002 over 4 days

  • 14B and 14B-star competitions:
    Current: decays from 0.005 to 0.0005 over 7 days
    Updated: decays from 0.005 to 0.0002 over 5 days

3. Updated emission distribution for competitions
Activation block: 4_453_709

  • 3B → 20%
  • 14B → 40%
  • 14B-star → 40%

4. Pinned all package versions in requirements.txt
To avoid installation issues and package incompatibilities, we have pinned all dependency versions in the requirements.txt file. Installation should be smoother now.
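Pinning here means each requirement names an exact version with `==`. A small check along these lines could flag lines that are not pinned. `unpinned_requirements` is a hypothetical helper, and its regex is a simplification of full PEP 508 syntax (it does not handle URL requirements or every specifier form).

```python
import re

# A requirement counts as pinned here only if it uses an exact "==" version.
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._\-\[\],]*\s*==\s*[^\s;#]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines from requirements.txt content that are not pinned with '=='."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -e / -r
            continue
        if not _PINNED.match(line):
            bad.append(line)
    return bad
```

Running such a check in CI would catch a loosened specifier before it reaches validators.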

NOTES TO VALIDATORS

- IMPORTANT: The newly added code dataset, the-stack-v2-dedup, requires a Hugging Face access token as well as S3 access and secret keys. Our validator documentation explains how to obtain and configure them.

- Please also make sure to rerun pip install so that updated dependencies are installed:
python -m pip install -e .