Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hash_structure does not produce the same before & after ASE's write & read to XYZ #118

Open
stenczelt opened this issue Dec 8, 2024 · 2 comments

Comments

@stenczelt
Copy link
Member

One of the derived parameters is hash_structure, which collects structural information and hashes it for matching of structures. I've noticed that this does not product the same for a structure before and after passing through ASE's extxyz write & read.

This seems to make this hash function somewhat useless, since we can reasonably expect use cases where structures are being read/written/calculated a lot of times.

As demonstration, see a failing test:

import io

import ase
import ase.io
import numpy as np
import pytest
from ase.calculators.lj import LennardJones

from abcd.model import AbstractModel


@pytest.fixture
def rng():
    return np.random.default_rng(seed=42)


def test_hash_structure(rng):
    # create atoms & add a calculator
    atoms = ase.Atoms(
        "H3",
        positions=rng.random(size=(3, 3)),
        pbc=True,
        cell=[2, 2, 2],
    )
    atoms.calc = LennardJones()
    atoms.calc.calculate(atoms)

    # dump to XYZ
    buffer = io.StringIO()
    ase.io.write(buffer, atoms, format="extxyz")

    # read back
    buffer.seek(0)
    atoms_read = ase.io.read(buffer, format="extxyz")

    # read in both of them
    abcd_data = AbstractModel.from_atoms(atoms)
    abcd_data_after_read = AbstractModel.from_atoms(atoms_read)

    assert abcd_data["hash_structure"] == abcd_data_after_read["hash_structure"]

which fails with

>       assert abcd_data["hash_structure"] == abcd_data_after_read["hash_structure"]
E       AssertionError: assert '8bb52aa8ab4be61a8ad0f76b32a0c810' == '4cf0d1cbde645832830e6f6d60f86c9a'
E         
E         - 4cf0d1cbde645832830e6f6d60f86c9a
E         + 8bb52aa8ab4be61a8ad0f76b32a0c810
@ElliottKasoar
Copy link
Contributor

I think this is related to the error that appears in #120, which has changed the behaviour of Hasher slightly.

The change I made for ruff was related to https://docs.astral.sh/ruff/rules/function-call-in-default-argument/, so currently we have the issue that something along the lines of

from abcd.model import Hasher

hasher_1 = Hasher()

init_hash = hasher_1()
hasher_1.update(b"Test value")
assert hasher_1() != init_hash

hasher_2 = Hasher()
assert hasher_2() == init_hash

fails, as hasher_2() is instead equal to hasher_1(), since the function isn't reinitialised.

This is fixed by the changes in #120, due to the change

-class Hasher(object):
-    def __init__(self, method=md5()):
-        self.method = method
+class Hasher:
+    def __init__(self, method=md5):
+        self.method = method()

The main other question is whether we actually want to separate hashers when we're saving data, or whether actually the current behaviour (the structure hash is updated with additional keys) is preferable, since otherwise the hash could be the same for two different structures.

@ElliottKasoar
Copy link
Contributor

A few points to note, from talking to @alinelena, @oerc0122 and others:

  • Hashing is never going to be a perfect solution for comparing arrays of floats, similar to why we don't usually compare equality, so perhaps we can come up with better alternatives
  • This issue only arises if the position floats are higher precision than %16.8f, from the format map used when writing out
    • If the extxyz file was written out by ASE, I think this should be quite stable. I think this errors only occur when the file starts with a higher precision than ASE would write
  • It's also unclear more generally how in future ase.io.write and extxyz.write may interact, which could change things

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants