Add gaussian/fchk format support #896
Conversation
📝 Walkthrough

Adds a Gaussian FCHK parser that reads coordinates, energy, gradients (as forces), and lower‑triangular Hessian blocks, reconstructs and converts full symmetric Hessians, adds a Hessian unit conversion, registers a gaussian/fchk format and Hessian data type, and adds FCHK/log fixtures and tests.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor U as Caller
    participant F as GaussianFChkFormat
    participant P as dpdata.gaussian.fchk
    participant R as register_hessian_data
    participant S as LabeledSystem
    U->>F: from_labeled_system(file, has_forces, has_hessian)
    F->>P: to_system_data(file, has_forces, has_hessian)
    P-->>F: data {atoms, coords, energies?, forces?, hessian?}
    alt data includes hessian
        F->>R: register_hessian_data(data)
        R->>S: register DataType(name="hessian", shape=(NFRAMES,NATOMS,3,NATOMS,3))
    end
    F-->>U: data
```

```mermaid
sequenceDiagram
    autonumber
    participant P as fchk.to_system_data
    participant Uc as UnitConversion
    participant H as create_full_hessian
    P->>Uc: convert coords (bohr → Å), energy (hartree → eV), gradients→forces (→ eV/Å)
    opt Hessian present
        P->>H: create_full_hessian(lower-triangular, natoms)
        H-->>P: symmetric full Hessian (3N×3N)
        P->>Uc: convert Hessian (hartree/bohr^2 → eV/Å^2)
    end
    P-->>P: assemble and return system data dict
```
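The Hessian conversion in the diagram composes the energy and length factors. A minimal sketch of the arithmetic, using CODATA-style constants (the constant values below are assumptions for illustration, not necessarily dpdata's exact ones):

```python
# Sketch of the hartree/bohr^2 -> eV/angstrom^2 conversion used for Hessians.
# Constant values are assumed (CODATA-like), not taken from dpdata itself.
HARTREE_TO_EV = 27.211386245988
BOHR_TO_ANGSTROM = 0.529177210903

def hessian_factor(energy_factor: float, length_factor: float) -> float:
    # A Hessian has units of energy/length^2, so divide by the squared
    # length factor after multiplying by the energy factor.
    return energy_factor / length_factor**2

factor = hessian_factor(HARTREE_TO_EV, BOHR_TO_ANGSTROM)
# roughly 97.17 eV/angstrom^2 per hartree/bohr^2
```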
Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Codecov Report❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##            devel     #896      +/-   ##
==========================================
+ Coverage   85.94%   86.06%   +0.11%
==========================================
  Files          82       83       +1
  Lines        7758     7886     +128
==========================================
+ Hits         6668     6787     +119
- Misses       1090     1099       +9
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🧹 Nitpick comments (13)
tests/gaussian/waterfreq.gaussianfchk (1)
1-1293: Consider storing the large fixture via Git LFS (or compressing it) to keep the repo lean. A 1.3K-line numeric fixture can bloat the repo history. Using Git LFS (or gzipping it and auto-unzipping in tests) will reduce clone size.
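If gzipping is preferred over LFS, the decompress-on-the-fly idea could look like this sketch (the fixture bytes here are a stand-in, not the real file contents):

```python
import gzip
import io

# Stand-in for the fixture content; in practice the file would be
# committed as tests/gaussian/*.gz and opened the same way.
raw = b"Number of atoms                            I                3\n"
blob = gzip.compress(raw)

# Tests can open the gzipped fixture transparently in text mode.
with gzip.open(io.BytesIO(blob), "rt") as fh:
    first_line = fh.readline()
```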
dpdata/unit.py (1)
155-176: Validate the “^2” suffix and fail fast on malformed Hessian units. Currently we assume unit strings end with “^2” (e.g., bohr^2). Add a guard to avoid silent misparsing like “bohr2”.
Apply this diff:
```diff
 class HessianConversion(Conversion):
     def __init__(self, unitA, unitB):
@@
-        eunitA, lunitA = self._split_unit(unitA)
-        eunitB, lunitB = self._split_unit(unitB)
+        eunitA, lunitA = self._split_unit(unitA)
+        eunitB, lunitB = self._split_unit(unitB)
         econv = EnergyConversion(eunitA, eunitB).value()
         lconv = LengthConversion(lunitA, lunitB).value()
         self.set_value(econv / lconv**2)
@@
-    def _split_unit(self, unit):
-        eunit = unit.split("/")[0]
-        lunit = unit.split("/")[1][:-2]
+    def _split_unit(self, unit):
+        eunit, lpart = unit.split("/")
+        if not lpart.endswith("^2"):
+            raise RuntimeError(f"Invalid Hessian unit (missing '^2'): {unit}")
+        lunit = lpart[:-2]
         return eunit, lunit
```

dpdata/plugins/gaussian.py (1)
44-57: Tidy the control flow and silence ruff TRY300. Use `else:` after `try`/`except` to keep side effects away from the exception path. Apply this diff:
```diff
 @Format.register("gaussian/fchk")
 class GaussianFChkFormat(Format):
     def from_labeled_system(
         self, file_name: FileType, md=False, has_forces=True, has_hessian=True, **kwargs
     ):
-        try:
-            data = dpdata.gaussian.fchk.to_system_data(
-                file_name, md=md, has_forces=has_forces, has_hessian=has_hessian
-            )
-            register_hessian_data(data)
-            return data
-        except AssertionError:
-            return {"energies": [], "forces": [], "hessian": [], "nopbc": True}
+        try:
+            data = dpdata.gaussian.fchk.to_system_data(
+                file_name, md=md, has_forces=has_forces, has_hessian=has_hessian
+            )
+        except AssertionError:
+            return {"energies": [], "forces": [], "hessian": [], "nopbc": True}
+        else:
+            register_hessian_data(data)
+            return data
```

dpdata/gaussian/fchk.py (3)
74-93: `md` is unused; silence ruff or document it. If kept for API parity, underscore it locally to satisfy linters.
Apply this diff:
```diff
-def to_system_data(file_name: FileType, md=False, has_forces=True, has_hessian=True):
+def to_system_data(file_name: FileType, md=False, has_forces=True, has_hessian=True):
@@
-    data = {}
+    _ = md  # API parity; not used for fchk
+    data = {}
```
102-155: Rename the throwaway loop variable to avoid E741 and improve clarity. Replace `l` with `line2` (or similar) in the chunk-reading loops. Apply this diff (the pattern repeats in all chunks):
```diff
-        while len(atom_numbers) < n:
-            l = next(fp)
-            if isinstance(l, bytes):
-                l = l.decode(errors="ignore")
-            atom_numbers += [int(x) for x in l.split()]
+        while len(atom_numbers) < n:
+            line2 = next(fp)
+            if isinstance(line2, bytes):
+                line2 = line2.decode(errors="ignore")
+            atom_numbers += [int(x) for x in line2.split()]
```

Repeat the same replacement for the coordinate, gradient, and Hessian reads.
171-173: Cells length should match nframes.
`cells` is currently shape (1, 3, 3). Prefer a per-frame shape to satisfy data-type checks consistently. Apply this diff:
```diff
-    data["cells"] = np.array([np.eye(3) * 100])
+    nframes = len(coords_t)
+    data["cells"] = np.repeat(np.eye(3)[None, ...] * 100.0, nframes, axis=0)
```
5-9: Add a Path import and a TEST_DIR helper for fixture paths.

```diff
-import numpy as np
-from context import dpdata
+import numpy as np
+from pathlib import Path
+try:
+    from context import dpdata  # tests/context.py sets sys.path for local import
+except ImportError:
+    import dpdata  # fallback to installed dpdata
+
+TEST_DIR = Path(__file__).resolve().parent
```
16-18: Generalize hard-coded shapes to NATOMS/NFRAMES. Avoid baking 3 atoms into shapes; derive them from metadata for fixture reusability.
```diff
     self.nframes = 1
+    self.natoms = sum(self.atom_numbs)
     self.atom_types = [1, 0, 0]
@@
-        self.assertEqual(self.system.data["forces"].shape, (1, 3, 3))
+        self.assertEqual(self.system.data["forces"].shape, (self.nframes, self.natoms, 3))
@@
-        self.assertEqual(self.system.data["hessian"].shape, (1, 9, 9))
+        self.assertEqual(
+            self.system.data["hessian"].shape,
+            (self.nframes, 3 * self.natoms, 3 * self.natoms),
+        )
@@
-        self.assertEqual(self.system.data["coords"].shape, (1, 3, 3))
+        self.assertEqual(self.system.data["coords"].shape, (self.nframes, self.natoms, 3))
@@
-        self.assertEqual(self.system.data["cells"].shape, (1, 3, 3))
+        self.assertEqual(self.system.data["cells"].shape, (self.nframes, 3, 3))
```

Also applies to: 47-51, 53-56, 59-63, 65-68
31-35: Prefer a vector assert for atom_types. Cleaner, and it shows the full diff on failure.
```diff
-        for ii in range(len(self.atom_types)):
-            self.assertEqual(self.system.data["atom_types"][ii], self.atom_types[ii])
+        np.testing.assert_array_equal(self.system.data["atom_types"], np.array(self.atom_types))
```
90-95: Remove unused conversion imports/variables. They’re not used and may trigger linter warnings.
```diff
-        from dpdata.unit import EnergyConversion, ForceConversion, LengthConversion
-
-        self.energy_convert = EnergyConversion("hartree", "eV").value()
-        self.force_convert = ForceConversion("hartree/bohr", "eV/angstrom").value()
-        self.length_convert = LengthConversion("bohr", "angstrom").value()
```
170-177: Make the key-set comparison less brittle. Future plugins may add keys; ensure log keys ⊆ fchk keys and only allow the extra “hessian” key on fchk.
```diff
-        # Check that both have the same data keys
-        fchk_keys = set(self.system_fchk.data.keys())
-        log_keys = set(self.system_log.data.keys())
-
-        # fchk has hessian, log doesn't, so we exclude it from comparison
-        fchk_keys.discard("hessian")
-
-        self.assertEqual(fchk_keys, log_keys)
+        # Check that all keys from log are present in fchk; allow extra 'hessian' in fchk
+        fchk_keys = set(self.system_fchk.data.keys())
+        log_keys = set(self.system_log.data.keys())
+        self.assertTrue(log_keys.issubset(fchk_keys))
+        self.assertLessEqual(fchk_keys - log_keys, {"hessian"})
```
52-56: Optional: Assert Hessian symmetry. Adds a quick sanity check that the reconstructed Hessian is symmetric.
```python
def test_hessian_is_symmetric(self):
    H = self.system.data["hessian"][0]
    np.testing.assert_allclose(H, H.T, rtol=0.0, atol=1e-8)
```
52-56: Optional: Cover saving hessian.npy. The PR mentions saving the Hessian; add an I/O round-trip check.
```python
def test_saving_includes_hessian_npy(self):
    import os, tempfile
    with tempfile.TemporaryDirectory() as tmpdir:
        out = os.path.join(tmpdir, "set.000")
        # Ensure this uses the LabeledSystem that has hessian (from fchk)
        self.system.to("deepmd/npy", out)
        self.assertTrue(os.path.isfile(os.path.join(out, "data", "hessian.npy")))
```

If the to() signature differs in this repo, I can adjust once you confirm the expected API.
Also applies to: 80-88
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
dpdata/gaussian/fchk.py (1 hunks)
dpdata/plugins/gaussian.py (3 hunks)
dpdata/unit.py (1 hunks)
tests/gaussian/waterfreq.gaussianfchk (1 hunks)
tests/gaussian/waterfreq.gaussianlog (1 hunks)
tests/test_gaussian_fchk.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
dpdata/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use ruff to lint and format all Python code under dpdata/ before committing
Files:
dpdata/gaussian/fchk.py, dpdata/unit.py, dpdata/plugins/gaussian.py
dpdata/*/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
When adding a new format, implement it as a new submodule under dpdata//
Files:
dpdata/gaussian/fchk.py, dpdata/plugins/gaussian.py
🧬 Code graph analysis (3)
tests/test_gaussian_fchk.py (2)
dpdata/system.py (1)
LabeledSystem (1159-1357)

dpdata/unit.py (4)
EnergyConversion (106-117), ForceConversion (134-152), LengthConversion (120-131), value (93-94)
dpdata/gaussian/fchk.py (2)
dpdata/utils.py (1)
open_file (137-170)

dpdata/unit.py (5)
EnergyConversion (106-117), ForceConversion (134-152), HessianConversion (155-180), LengthConversion (120-131), value (93-94)
dpdata/plugins/gaussian.py (3)
dpdata/data_type.py (3)
Axis (15-21), DataType (35-131), register_data_type (138-149)

dpdata/system.py (4)
System (55-1111), register_data_type (1094-1111), LabeledSystem (1159-1357), has_forces (1269-1270)

dpdata/gaussian/fchk.py (1)
to_system_data (74-180)
🪛 Ruff (0.12.2)
dpdata/gaussian/fchk.py
57-60: Avoid specifying long messages outside the exception class
(TRY003)
74-74: Unused function argument: md
(ARG001)
112-112: Ambiguous variable name: l
(E741)
114-114: Ambiguous variable name: l
(E741)
120-120: Ambiguous variable name: l
(E741)
122-122: Ambiguous variable name: l
(E741)
133-133: Ambiguous variable name: l
(E741)
135-135: Ambiguous variable name: l
(E741)
144-144: Ambiguous variable name: l
(E741)
146-146: Ambiguous variable name: l
(E741)
dpdata/plugins/gaussian.py
47-47: Unused method argument: kwargs
(ARG002)
54-54: Consider moving this statement to an else block
(TRY300)
🔇 Additional comments (6)
tests/gaussian/waterfreq.gaussianfchk (1)
376-386: Hessian block size matches the 3N(3N+1)/2 expectation (N=3 ⇒ 45). LGTM.

tests/gaussian/waterfreq.gaussianlog (1)
1-739: Fixture looks consistent with fchk; includes gradients and force constants. LGTM.

dpdata/gaussian/fchk.py (3)
20-23: Unit factors set correctly for bohr/angstrom, hartree/eV, forces, and Hessians. LGTM.
26-71: Hessian reconstruction logic is sound and validates size. LGTM.
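For readers following along, the reconstruction this comment refers to can be sketched as below. `full_from_lower_triangular` is a hypothetical stand-in for the module's `create_full_hessian`, assuming Gaussian packs the lower triangle row by row, which is the same order `np.tril_indices` produces:

```python
import numpy as np

def full_from_lower_triangular(packed, natoms):
    # Build a dim x dim symmetric matrix from dim*(dim+1)/2 packed values.
    dim = 3 * natoms
    expected = dim * (dim + 1) // 2
    packed = np.asarray(packed, dtype=float)
    if packed.size != expected:
        raise ValueError(f"got {packed.size} values, expected {expected}")
    full = np.zeros((dim, dim))
    rows, cols = np.tril_indices(dim)  # order: (0,0), (1,0), (1,1), (2,0), ...
    full[rows, cols] = packed
    # Mirror the strict lower triangle into the upper triangle.
    return full + np.tril(full, -1).T

# One atom -> 3x3 matrix needs 6 packed values.
H = full_from_lower_triangular([1, 2, 3, 4, 5, 6], natoms=1)
```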
174-180: Conditional packing of energies/forces/Hessian is correct. LGTM.

tests/test_gaussian_fchk.py (1)
6-6: Context import OK — tests/context.py is present and adjusts sys.path. tests/context.py inserts the repo root into sys.path (line 6) and imports dpdata (line 7), so `from context import dpdata` should work in CI — no change required.
```python
def register_hessian_data(data):
    if "hessian" in data:
        dt = DataType(
            "hessian",
            np.ndarray,
            (Axis.NFRAMES, -1, -1),
            required=False,
            deepmd_name="hessian",
        )
        dpdata.System.register_data_type(dt)
        dpdata.LabeledSystem.register_data_type(dt)
```
NameError: missing dpdata import; registration will crash.
register_hessian_data calls dpdata.System.register_data_type but dpdata isn’t imported here.
Apply this diff:
```diff
@@
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING
@@
+import dpdata  # required for System/LabeledSystem registration
@@
 def register_hessian_data(data):
     if "hessian" in data:
-        dt = DataType(
+        dt = DataType(
             "hessian",
             np.ndarray,
             (Axis.NFRAMES, -1, -1),
             required=False,
             deepmd_name="hessian",
         )
-        dpdata.System.register_data_type(dt)
-        dpdata.LabeledSystem.register_data_type(dt)
+        # Avoid duplicate registrations if called multiple times
+        if not any(x.name == "hessian" for x in dpdata.System.DTYPES):
+            dpdata.System.register_data_type(dt)
+        if not any(x.name == "hessian" for x in dpdata.LabeledSystem.DTYPES):
+            dpdata.LabeledSystem.register_data_type(dt)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from typing import TYPE_CHECKING

import dpdata  # required for System/LabeledSystem registration


def register_hessian_data(data):
    if "hessian" in data:
        dt = DataType(
            "hessian",
            np.ndarray,
            (Axis.NFRAMES, -1, -1),
            required=False,
            deepmd_name="hessian",
        )
        # Avoid duplicate registrations if called multiple times
        if not any(x.name == "hessian" for x in dpdata.System.DTYPES):
            dpdata.System.register_data_type(dt)
        if not any(x.name == "hessian" for x in dpdata.LabeledSystem.DTYPES):
            dpdata.LabeledSystem.register_data_type(dt)
```
🤖 Prompt for AI Agents
In dpdata/plugins/gaussian.py around lines 22 to 33, the function
register_hessian_data calls dpdata.System.register_data_type but dpdata is not
imported causing a NameError; add an import for dpdata at the top of the file
(e.g., add "import dpdata" alongside the other imports) so dpdata.System and
dpdata.LabeledSystem resolve correctly when register_hessian_data runs.
```python
        self.system = dpdata.LabeledSystem(
            "gaussian/waterfreq.gaussianfchk", fmt="gaussian/fchk"
        )
```
Make fixture paths robust to the working directory.
Use paths relative to this test file to avoid CI failures when pytest runs from repo root.
```diff
-        self.system = dpdata.LabeledSystem(
-            "gaussian/waterfreq.gaussianfchk", fmt="gaussian/fchk"
-        )
+        self.system = dpdata.LabeledSystem(
+            str(TEST_DIR / "gaussian" / "waterfreq.gaussianfchk"), fmt="gaussian/fchk"
+        )
@@
-        self.system_fchk = dpdata.LabeledSystem(
-            "gaussian/waterfreq.gaussianfchk", fmt="gaussian/fchk"
-        )
-        self.system_log = dpdata.LabeledSystem(
-            "gaussian/waterfreq.gaussianlog", fmt="gaussian/log"
-        )
+        self.system_fchk = dpdata.LabeledSystem(
+            str(TEST_DIR / "gaussian" / "waterfreq.gaussianfchk"), fmt="gaussian/fchk"
+        )
+        self.system_log = dpdata.LabeledSystem(
+            str(TEST_DIR / "gaussian" / "waterfreq.gaussianlog"), fmt="gaussian/log"
+        )
```

Also applies to: 82-87
🤖 Prompt for AI Agents
In tests/test_gaussian_fchk.py around lines 11-13 (and similarly at lines
82-87), the fixture file paths are hardcoded and can break when pytest is
invoked from the repository root; change them to build paths relative to this
test file using the test file's directory (e.g. use
Path(__file__).parent.joinpath("gaussian","waterfreq.gaussianfchk") and convert
to string where needed), add the necessary pathlib import at the top if missing,
and replace the other occurrences at 82-87 the same way so all fixture
references use a path computed from __file__.
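The path fix the prompt describes boils down to anchoring paths on `__file__`; a minimal sketch:

```python
from pathlib import Path

# Resolve fixture paths relative to the test module, not the current
# working directory, so pytest can be launched from any location.
TEST_DIR = Path(__file__).resolve().parent
fixture = TEST_DIR / "gaussian" / "waterfreq.gaussianfchk"
```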
CodSpeed WallTime Performance Report

Merging #896 will not alter performance

Comparing

Summary
Actionable comments posted: 0
♻️ Duplicate comments (1)
dpdata/plugins/gaussian.py (1)
10-13: Explicitly import dpdata to make registration unambiguous. Relying on `import dpdata.gaussian.fchk` to bind `dpdata` is indirect. Add a top-level `import dpdata` to avoid surprises and to satisfy tools.

```diff
 import numpy as np
+import dpdata
 import dpdata.gaussian.fchk
 import dpdata.gaussian.gjf
 import dpdata.gaussian.log
 from dpdata.data_type import Axis, DataType
```
🧹 Nitpick comments (1)
dpdata/plugins/gaussian.py (1)
22-32: Guard against duplicate dtype registration (avoid warnings and churn).
register_hessian_datamay be called multiple times; current code will trigger duplicate dtype warnings each call.def register_hessian_data(data): if "hessian" in data: dt = DataType( "hessian", np.ndarray, (Axis.NFRAMES, Axis.NATOMS, 3, Axis.NATOMS, 3), required=False, deepmd_name="hessian", ) - dpdata.LabeledSystem.register_data_type(dt) + # Register once + if not any(x.name == "hessian" for x in dpdata.LabeledSystem.DTYPES): + dpdata.LabeledSystem.register_data_type(dt)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dpdata/plugins/gaussian.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
dpdata/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use ruff to lint and format all Python code under dpdata/ before committing
Files:
dpdata/plugins/gaussian.py
dpdata/*/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
When adding a new format, implement it as a new submodule under dpdata//
Files:
dpdata/plugins/gaussian.py
🧬 Code graph analysis (1)
dpdata/plugins/gaussian.py (3)
dpdata/data_type.py (3)
Axis (15-21), DataType (35-131), register_data_type (138-149)

dpdata/system.py (3)
LabeledSystem (1159-1357), register_data_type (1094-1111), has_forces (1269-1270)

dpdata/gaussian/fchk.py (1)
to_system_data (74-180)
🪛 Ruff (0.12.2)
dpdata/plugins/gaussian.py
46-46: Unused method argument: kwargs
(ARG002)
53-53: Consider moving this statement to an else block
(TRY300)
🔇 Additional comments (2)
dpdata/plugins/gaussian.py (2)
8-8: Import looks good. `np` is required for DataType and the Hessian reshape in this module.
43-56: Fix the Hessian shape mismatch vs the registered DataType; clean up ruff findings. File: dpdata/plugins/gaussian.py, Lines: 43-56
- Normalize hessian from (nframes, 3N, 3N) -> (nframes, natoms, 3, natoms, 3) to match the registered dtype.
- Rename **kwargs -> **_kwargs (ARG002) and move the return into the else branch to satisfy TRY300.
```diff
 @Format.register("gaussian/fchk")
 class GaussianFChkFormat(Format):
     def from_labeled_system(
-        self, file_name: FileType, md=False, has_forces=True, has_hessian=True, **kwargs
+        self, file_name: FileType, md=False, has_forces=True, has_hessian=True, **_kwargs
     ):
-        try:
-            data = dpdata.gaussian.fchk.to_system_data(
-                file_name, md=md, has_forces=has_forces, has_hessian=has_hessian
-            )
-            register_hessian_data(data)
-            return data
-        except AssertionError:
-            return {"energies": [], "forces": [], "hessian": [], "nopbc": True}
+        try:
+            data = dpdata.gaussian.fchk.to_system_data(
+                file_name, md=md, has_forces=has_forces, has_hessian=has_hessian
+            )
+        except AssertionError:
+            return {"energies": [], "forces": [], "hessian": [], "nopbc": True}
+        else:
+            # Normalize Hessian to (nframes, natoms, 3, natoms, 3) to match DataType.
+            if has_hessian and "hessian" in data and isinstance(data["hessian"], np.ndarray):
+                if data["hessian"].ndim == 3:  # (nframes, 3N, 3N)
+                    nframes, m, n = data["hessian"].shape
+                    natoms = int(data["coords"].shape[1])
+                    if m == n == 3 * natoms:
+                        data["hessian"] = data["hessian"].reshape(nframes, natoms, 3, natoms, 3)
+            register_hessian_data(data)
+            return data
```

I could not verify the upstream reader in this checkout — please confirm that dpdata.gaussian.fchk.to_system_data returns (nframes, 3N, 3N) (or run the grep below) before merging:
```shell
rg -n -C2 "data\['hessian'\]|reshape\(\s*\(3\s*\*.*?,\s*3\s*\*" dpdata/gaussian/fchk.py
```
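The axis factoring behind that normalization is a plain reshape; a small sketch on toy data, showing that element (i, a, j, b) of the 5-D view maps to row 3i+a, column 3j+b of the flat matrix:

```python
import numpy as np

nframes, natoms = 1, 3
rng = np.random.default_rng(0)
a = rng.normal(size=(3 * natoms, 3 * natoms))
hess = ((a + a.T) / 2)[None, ...]  # toy symmetric (nframes, 3N, 3N) Hessian

# (nframes, 3N, 3N) -> (nframes, natoms, 3, natoms, 3); reshape only regroups
# axes, so no values move.
hess5 = hess.reshape(nframes, natoms, 3, natoms, 3)
```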
for more information, see https://pre-commit.ci
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dpdata/gaussian/fchk.py (1)
1-179: Fix ruff violations in dpdata/gaussian/fchk.py
- TRY003 (dpdata/gaussian/fchk.py:55–60): shorten the exception message or move message formatting into an exception class; e.g. replace with a concise message like raise ValueError(f"Input length {hessian_block.size} != expected {expected_length}").
- ARG001 (dpdata/gaussian/fchk.py:74): unused parameter `md` — rename to `_md` (preserve API) or remove if safe.
- E741 (dpdata/gaussian/fchk.py:112–116, 118–124, 131–137, 142–148): rename the ambiguous variable `l` to `line` or `raw_line` in all loops.
Run ruff check/format after these edits.
♻️ Duplicate comments (2)
tests/test_gaussian_fchk.py (2)
11-13: Make the fixture path robust to CWD. Build paths relative to this test file to avoid CI flakiness. (A previous comment already raised this.)
```diff
-        self.system = dpdata.LabeledSystem(
-            "gaussian/waterfreq.gaussianfchk", fmt="gaussian/fchk"
-        )
+        self.system = dpdata.LabeledSystem(
+            str(TEST_DIR / "gaussian" / "waterfreq.gaussianfchk"), fmt="gaussian/fchk"
+        )
```

Additional context to add near the imports:
```diff
+from pathlib import Path
+
+TEST_DIR = Path(__file__).parent
```
82-87: Apply the same robust path handling here. Use TEST_DIR for both fchk and log.
```diff
-        self.system_fchk = dpdata.LabeledSystem(
-            "gaussian/waterfreq.gaussianfchk", fmt="gaussian/fchk"
-        )
-        self.system_log = dpdata.LabeledSystem(
-            "gaussian/waterfreq.gaussianlog", fmt="gaussian/log"
-        )
+        self.system_fchk = dpdata.LabeledSystem(
+            str(TEST_DIR / "gaussian" / "waterfreq.gaussianfchk"), fmt="gaussian/fchk"
+        )
+        self.system_log = dpdata.LabeledSystem(
+            str(TEST_DIR / "gaussian" / "waterfreq.gaussianlog"), fmt="gaussian/log"
+        )
```
🧹 Nitpick comments (7)
dpdata/gaussian/fchk.py (5)
57-60: Shorten the exception message to satisfy Ruff TRY003. Use a concise error string.
```diff
-        raise ValueError(
-            f"Input data length ({hessian_block.size}) does not match expected length ({expected_length}) "
-            f"for {natoms} atoms (matrix dimension {dim}x{dim})."
-        )
+        raise ValueError(
+            f"Invalid Hessian length: got {hessian_block.size}, expected {expected_length} (natoms={natoms}, dim={dim})."
+        )
```
74-94: Silence the unused `md` while keeping signature compatibility. Keep `md` for the plugin interface but avoid ARG001.

```diff
-def to_system_data(file_name: FileType, md=False, has_forces=True, has_hessian=True):
+def to_system_data(
+    file_name: FileType,
+    md: bool = False,
+    has_forces: bool = True,
+    has_hessian: bool = True,
+):
@@
-    data = {}
+    # fchk is effectively single-frame; keep `md` for API compatibility.
+    del md  # unused
+    data = {}
```
112-116: Avoid the ambiguous single-letter `l` variable (Ruff E741). Rename it to `chunk` for readability and lint compliance.

```diff
-            l = next(fp)
-            if isinstance(l, bytes):
-                l = l.decode(errors="ignore")
-            atom_numbers += [int(x) for x in l.split()]
+            chunk = next(fp)
+            if isinstance(chunk, bytes):
+                chunk = chunk.decode(errors="ignore")
+            atom_numbers += [int(x) for x in chunk.split()]
@@
-            l = next(fp)
-            if isinstance(l, bytes):
-                l = l.decode(errors="ignore")
-            coords_raw += [float(x) for x in l.split()]
+            chunk = next(fp)
+            if isinstance(chunk, bytes):
+                chunk = chunk.decode(errors="ignore")
+            coords_raw += [float(x) for x in chunk.split()]
@@
-            l = next(fp)
-            if isinstance(l, bytes):
-                l = l.decode(errors="ignore")
-            forces_raw += [float(x) for x in l.split()]
+            chunk = next(fp)
+            if isinstance(chunk, bytes):
+                chunk = chunk.decode(errors="ignore")
+            forces_raw += [float(x) for x in chunk.split()]
@@
-            l = next(fp)
-            if isinstance(l, bytes):
-                l = l.decode(errors="ignore")
-            hessian_raw += [float(x) for x in l.split()]
+            chunk = next(fp)
+            if isinstance(chunk, bytes):
+                chunk = chunk.decode(errors="ignore")
+            hessian_raw += [float(x) for x in chunk.split()]
```

Also applies to: 120-124, 133-137, 144-147
126-128: Make scalar energy parsing robust to fchk variants. Handle cases where the value isn’t on the same line (e.g., a header with trailing tokens).

```diff
-            elif "Total Energy" in line:
-                energy = float(line.split()[-1]) * energy_convert
+            elif "Total Energy" in line:
+                tokens = line.split()
+                try:
+                    energy = float(tokens[-1]) * energy_convert
+                except ValueError:
+                    nxt = next(fp)
+                    if isinstance(nxt, bytes):
+                        nxt = nxt.decode(errors="ignore")
+                    energy = float(nxt.split()[0]) * energy_convert
                 energy_t.append(energy)
```
154-160: Assertions tie behavior to flags; consider clearer error messages. The asserts are fine, but raising with context (which field is missing) improves the UX; optional.

```diff
-    if has_forces:
-        assert forces_t, "cannot find forces"
-    if has_hessian:
-        assert hessian_t, "cannot find hessian"
+    if has_forces and not forces_t:
+        raise ValueError("cannot find forces in fchk (disable with has_forces=False)")
+    if has_hessian and not hessian_t:
+        raise ValueError("cannot find hessian in fchk (disable with has_hessian=False)")
```

tests/test_gaussian_fchk.py (2)
89-95: Remove unused conversion imports/variables. They’re not used; trim them to keep the tests minimal.

```diff
-        # Get conversion factors from dpdata
-        from dpdata.unit import EnergyConversion, ForceConversion, LengthConversion
-
-        self.energy_convert = EnergyConversion("hartree", "eV").value()
-        self.force_convert = ForceConversion("hartree/bohr", "eV/angstrom").value()
-        self.length_convert = LengthConversion("bohr", "angstrom").value()
+        # (no-op) conversions not needed for this comparison
```
31-35: Tighter assertion for atom_types. Use array equality instead of a loop.

```diff
-        for ii in range(len(self.atom_types)):
-            self.assertEqual(self.system.data["atom_types"][ii], self.atom_types[ii])
+        np.testing.assert_array_equal(self.system.data["atom_types"], self.atom_types)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dpdata/gaussian/fchk.py (1 hunks)
tests/test_gaussian_fchk.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
dpdata/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use ruff to lint and format all Python code under dpdata/ before committing
Files:
dpdata/gaussian/fchk.py
dpdata/*/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
When adding a new format, implement it as a new submodule under dpdata//
Files:
dpdata/gaussian/fchk.py
🧬 Code graph analysis (2)
dpdata/gaussian/fchk.py (2)
dpdata/utils.py (1)
open_file (137-170)

dpdata/unit.py (5)
EnergyConversion (106-117), ForceConversion (134-152), HessianConversion (155-180), LengthConversion (120-131), value (93-94)
tests/test_gaussian_fchk.py (2)
dpdata/system.py (1)
LabeledSystem (1159-1357)

dpdata/unit.py (4)
EnergyConversion (106-117), ForceConversion (134-152), LengthConversion (120-131), value (93-94)
🪛 Ruff (0.12.2)
dpdata/gaussian/fchk.py
57-60: Avoid specifying long messages outside the exception class
(TRY003)
74-74: Unused function argument: md
(ARG001)
112-112: Ambiguous variable name: l
(E741)
114-114: Ambiguous variable name: l
(E741)
120-120: Ambiguous variable name: l
(E741)
122-122: Ambiguous variable name: l
(E741)
133-133: Ambiguous variable name: l
(E741)
135-135: Ambiguous variable name: l
(E741)
144-144: Ambiguous variable name: l
(E741)
146-146: Ambiguous variable name: l
(E741)
🔇 Additional comments (3)
dpdata/gaussian/fchk.py (3)
26-71: Hessian reconstruction logic is sound. Size check, lower-tri fill, and symmetric mirroring are correct and numerically stable. Nice.
140-153: Nice: lower-tri Hessian to full tensor with unit conversion. The reshape to (natoms, 3, natoms, 3) matches the registered shape.
20-23: Unit conversion constants defined once at import — good. This avoids recomputation and keeps the parser fast and consistent.
Actionable comments posted: 0
🧹 Nitpick comments (8)
dpdata/gaussian/fchk.py (8)
26-71: Create-full-Hessian logic is sound; tighten the error message to satisfy Ruff TRY003.
Shorten the long exception text. Apply this diff:
```diff
-    if hessian_block.size != expected_length:
-        raise ValueError(
-            f"Input data length ({hessian_block.size}) does not match expected length ({expected_length}) "
-            f"for {natoms} atoms (matrix dimension {dim}x{dim})."
-        )
+    if hessian_block.size != expected_length:
+        raise ValueError(
+            f"Invalid Hessian length: got {hessian_block.size}, expected {expected_length} "
+            f"(natoms={natoms}, dim={dim})."
+        )
```
74-93: md is unused; make intent explicit to silence Ruff ARG001.
Either consume it or mark it as intentionally unused. Apply this minimal change after initializing `data`:

```diff
-    data = {}
+    data = {}
+    _ = md  # fchk is effectively single-frame; md flag not used
```

Alternatively, raise if someone tries `md=True`.
- while len(atom_numbers) < n: - l = next(fp) - if isinstance(l, bytes): - l = l.decode(errors="ignore") - atom_numbers += [int(x) for x in l.split()] + while len(atom_numbers) < n: + next_line = next(fp) + if isinstance(next_line, bytes): + next_line = next_line.decode(errors="ignore") + tokens = next_line.split() + atom_numbers += [int(x) for x in tokens] - while len(coords_raw) < n: - l = next(fp) - if isinstance(l, bytes): - l = l.decode(errors="ignore") - coords_raw += [float(x) for x in l.split()] + while len(coords_raw) < n: + next_line = next(fp) + if isinstance(next_line, bytes): + next_line = next_line.decode(errors="ignore") + coords_raw += [float(x.replace("D", "E")) for x in next_line.split()] - while len(forces_raw) < n: - l = next(fp) - if isinstance(l, bytes): - l = l.decode(errors="ignore") - forces_raw += [float(x) for x in l.split()] + while len(forces_raw) < n: + next_line = next(fp) + if isinstance(next_line, bytes): + next_line = next_line.decode(errors="ignore") + forces_raw += [float(x.replace("D", "E")) for x in next_line.split()] - while len(hessian_raw) < n: - l = next(fp) - if isinstance(l, bytes): - l = l.decode(errors="ignore") - hessian_raw += [float(x) for x in l.split()] + while len(hessian_raw) < n: + next_line = next(fp) + if isinstance(next_line, bytes): + next_line = next_line.decode(errors="ignore") + hessian_raw += [float(x.replace("D", "E")) for x in next_line.split()]Also applies to: 120-124, 133-137, 144-147
140-153: Add sanity checks for natoms and Hessian header length before reconstructing.
Prevents silent mis-shapes when files are malformed.Apply this diff:
elif "Cartesian Force Constants" in line and "R" in line: n = int(line.split()[-1]) hessian_raw = [] while len(hessian_raw) < n: next_line = next(fp) if isinstance(next_line, bytes): next_line = next_line.decode(errors="ignore") hessian_raw += [float(x.replace("D", "E")) for x in next_line.split()] - hessian_full = ( - create_full_hessian(hessian_raw, natoms) * hessian_convert - ) + if natoms <= 0: + raise ValueError("natoms must be parsed before Hessian block.") + expected = (3 * natoms) * (3 * natoms + 1) // 2 + if n != expected: + raise ValueError(f"Invalid Hessian header length: N={n}, expected {expected} for natoms={natoms}.") + hessian_full = create_full_hessian(hessian_raw, natoms) * hessian_convert # store as (natoms, 3, natoms, 3) to align with registered shape hessian_t.append(hessian_full.reshape(natoms, 3, natoms, 3))
154-160: Avoidassertfor runtime validation; raise explicit exceptions (asserts can be stripped).Apply this diff:
- assert coords_t, "cannot find coords" - assert energy_t, "cannot find energy" - if has_forces: - assert forces_t, "cannot find forces" - if has_hessian: - assert hessian_t, "cannot find hessian" + if not coords_t: + raise ValueError("cannot find coords") + if not energy_t: + raise ValueError("cannot find energy") + if has_forces and not forces_t: + raise ValueError("cannot find forces") + if has_hessian and not hessian_t: + raise ValueError("cannot find hessian")
161-168: Validate atom_numbers vs natoms to catch inconsistent files. Apply this diff:
```diff
-    atom_symbols = [ELEMENTS[z - 1] for z in atom_numbers]
+    if natoms != len(atom_numbers):
+        raise ValueError(f"atomic numbers length {len(atom_numbers)} != natoms {natoms}")
+    atom_symbols = [ELEMENTS[z - 1] for z in atom_numbers]
```
169-171: Cells should match number of frames; broadcast instead of a single cell.
Prevents shape mismatches if fchk ever carries multiple snapshots. Apply this diff:
```diff
-    data["orig"] = np.zeros(3)
-    data["cells"] = np.array([np.eye(3) * 100])
+    nframes = len(coords_t)
+    data["orig"] = np.zeros(3)
+    data["cells"] = np.repeat((np.eye(3) * 100)[None, :, :], nframes, axis=0)
     data["nopbc"] = True
```
20-23: Optional: PEP8 constant naming for module-level conversions.
If you care about style consistency, rename to UPPER_CASE and update usages. Proposed diff (partial):
```diff
-length_convert = LengthConversion("bohr", "angstrom").value()
-energy_convert = EnergyConversion("hartree", "eV").value()
-force_convert = ForceConversion("hartree/bohr", "eV/angstrom").value()
-hessian_convert = HessianConversion("hartree/bohr^2", "eV/angstrom^2").value()
+LENGTH_CONVERT = LengthConversion("bohr", "angstrom").value()
+ENERGY_CONVERT = EnergyConversion("hartree", "eV").value()
+FORCE_CONVERT = ForceConversion("hartree/bohr", "eV/angstrom").value()
+HESSIAN_CONVERT = HessianConversion("hartree/bohr^2", "eV/angstrom^2").value()

-    coords = np.array(coords_raw).reshape(-1, 3) * length_convert
+    coords = np.array(coords_raw).reshape(-1, 3) * LENGTH_CONVERT

-    energy = float(line.split()[-1]) * energy_convert
+    energy = float(line.split()[-1]) * ENERGY_CONVERT

-    forces = -np.array(forces_raw).reshape(-1, 3) * force_convert
+    forces = -np.array(forces_raw).reshape(-1, 3) * FORCE_CONVERT

-    hessian_full = create_full_hessian(hessian_raw, natoms) * hessian_convert
+    hessian_full = create_full_hessian(hessian_raw, natoms) * HESSIAN_CONVERT
```

Also applies to: 124-125, 127-129, 138-139, 149-153
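Whatever the naming convention, the derived factors chain as expected. A quick consistency check using approximate CODATA constants (the dpdata converters compute these exactly; the literal values below are for illustration only):

```python
# Approximate CODATA 2018 values, for a consistency check only.
HARTREE_TO_EV = 27.211386245988
BOHR_TO_ANGSTROM = 0.529177210903

# Force and Hessian factors follow from the length and energy factors.
force_convert = HARTREE_TO_EV / BOHR_TO_ANGSTROM        # (hartree/bohr) -> (eV/angstrom)
hessian_convert = HARTREE_TO_EV / BOHR_TO_ANGSTROM**2   # (hartree/bohr^2) -> (eV/angstrom^2)

print(round(force_convert, 3), round(hessian_convert, 3))
```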
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dpdata/gaussian/fchk.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
dpdata/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use ruff to lint and format all Python code under dpdata/ before committing
Files:
dpdata/gaussian/fchk.py
dpdata/*/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
When adding a new format, implement it as a new submodule under dpdata//
Files:
dpdata/gaussian/fchk.py
🧬 Code graph analysis (1)
dpdata/gaussian/fchk.py (2)
dpdata/utils.py (1)
- `open_file` (137-170)

dpdata/unit.py (5)

- `EnergyConversion` (106-117)
- `ForceConversion` (134-152)
- `HessianConversion` (155-180)
- `LengthConversion` (120-131)
- `value` (93-94)
🪛 Ruff (0.12.2)
dpdata/gaussian/fchk.py
57-60: Avoid specifying long messages outside the exception class
(TRY003)
74-74: Unused function argument: md
(ARG001)
112-112: Ambiguous variable name: l
(E741)
114-114: Ambiguous variable name: l
(E741)
120-120: Ambiguous variable name: l
(E741)
122-122: Ambiguous variable name: l
(E741)
133-133: Ambiguous variable name: l
(E741)
135-135: Ambiguous variable name: l
(E741)
144-144: Ambiguous variable name: l
(E741)
146-146: Ambiguous variable name: l
(E741)
🔇 Additional comments (2)
dpdata/gaussian/fchk.py (2)
20-23: Unit conversions look correct (Bohr→Å, Hartree→eV, Hartree/Bohr→eV/Å, Hartree/Bohr²→eV/Ų).
124-139: Approve — conversions & ndarray atom_types are fine. End-to-end conversions/shapes verified; atom_types is used as np.ndarray across readers and System.DTYPES, and System.dump uses monty.serialization.dumpfn — monty's MontyEncoder handles numpy arrays, so keep atom_types as an ndarray. Convert to .tolist() only if you must produce plain JSON without monty's encoder. (docs.deepmodeling.org)
Actionable comments posted: 0
🧹 Nitpick comments (5)
dpdata/gaussian/fchk.py (5)
26-72: Hessian reconstruction is correct; tighten the error message to satisfy ruff TRY003. The lower-triangular length check and symmetry fill are correct. To appease TRY003, shorten the exception text.
Apply this diff:
```diff
-    if hessian_block.size != expected_length:
-        raise ValueError(
-            f"Input data length ({hessian_block.size}) does not match expected length ({expected_length}) "
-            f"for {natoms} atoms (matrix dimension {dim}x{dim})."
-        )
+    if hessian_block.size != expected_length:
+        raise ValueError(
+            f"lower-tri length mismatch: got {hessian_block.size}, expected {expected_length} (natoms={natoms}, dim={dim})"
+        )
```
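For reference, the symmetric fill that `create_full_hessian` performs can be reproduced standalone. `full_from_lower` below is a hypothetical name, but the row-major lower-triangle ordering matches what FCHK emits:

```python
import numpy as np

def full_from_lower(tri, dim):
    """Rebuild a symmetric (dim x dim) matrix from its row-major lower triangle."""
    m = np.zeros((dim, dim))
    m[np.tril_indices(dim)] = tri          # fills (0,0), (1,0), (1,1), (2,0), ...
    return m + m.T - np.diag(np.diag(m))   # mirror, then undo the doubled diagonal

h = full_from_lower([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
print(h[0, 2], h[2, 0])  # mirrored off-diagonal entries agree
```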
74-88: `md` is unused (ruff ARG001). Keep the param for API compatibility but mark it used.
Apply this diff:
```diff
 def to_system_data(file_name: FileType, md=False, has_forces=True, has_hessian=True):
 @@
-    data = {}
+    data = {}
+    _ = md  # reserved for future multi-frame support
```

Also applies to: 94-101
112-116: Rename ambiguous variable `l` (ruff E741). Use clearer names in the read loops.
Apply this diff:
```diff
-    l = next(fp)
-    if isinstance(l, bytes):
-        l = l.decode(errors="ignore")
-    atom_numbers += [int(x) for x in l.split()]
+    chunk = next(fp)
+    if isinstance(chunk, bytes):
+        chunk = chunk.decode(errors="ignore")
+    atom_numbers += [int(x) for x in chunk.split()]

-    l = next(fp)
-    if isinstance(l, bytes):
-        l = l.decode(errors="ignore")
-    coords_raw += [float(x) for x in l.split()]
+    chunk = next(fp)
+    if isinstance(chunk, bytes):
+        chunk = chunk.decode(errors="ignore")
+    coords_raw += [float(x) for x in chunk.split()]

-    l = next(fp)
-    if isinstance(l, bytes):
-        l = l.decode(errors="ignore")
-    forces_raw += [float(x) for x in l.split()]
+    chunk = next(fp)
+    if isinstance(chunk, bytes):
+        chunk = chunk.decode(errors="ignore")
+    forces_raw += [float(x) for x in chunk.split()]

-    l = next(fp)
-    if isinstance(l, bytes):
-        l = l.decode(errors="ignore")
-    hessian_raw += [float(x) for x in l.split()]
+    chunk = next(fp)
+    if isinstance(chunk, bytes):
+        chunk = chunk.decode(errors="ignore")
+    hessian_raw += [float(x) for x in chunk.split()]
```

Also applies to: 120-124, 133-137, 144-147
154-160: Prefer explicit exceptions over `assert` for required data. Asserts can be stripped with `-O`. Raise `ValueError` to guarantee checks in production. Apply this diff:
```diff
-    assert coords_t, "cannot find coords"
-    assert energy_t, "cannot find energy"
-    if has_forces:
-        assert forces_t, "cannot find forces"
-    if has_hessian:
-        assert hessian_t, "cannot find hessian"
+    if not coords_t:
+        raise ValueError("cannot find coords")
+    if not energy_t:
+        raise ValueError("cannot find energy")
+    if has_forces and not forces_t:
+        raise ValueError("cannot find forces")
+    if has_hessian and not hessian_t:
+        raise ValueError("cannot find hessian")
```
126-139: Minor robustness: tolerate scalar formatting variance in FCHK lines.
`line.split()[-1]` usually works; if Gaussian changes spacing, a regex for the trailing number would be safer. Optional. Example:
```diff
-    elif "Total Energy" in line:
-        energy = float(line.split()[-1]) * energy_convert
+    elif "Total Energy" in line:
+        import re
+        m = re.search(r'([-+]?\d+(\.\d+)?([Ee][-+]?\d+)?)\s*$', line)
+        if m:
+            energy = float(m.group(1)) * energy_convert
+        else:
+            raise ValueError("failed to parse Total Energy")
```
🔇 Additional comments (4)
dpdata/gaussian/fchk.py (4)
20-23: Unit conversions look correct and consistent. The chosen unit pairs match Gaussian FCHK conventions and dpdata's converters. No changes needed.
12-18: Module placement under dpdata/gaussian/ follows repo guidelines. Conforms to "new format under dpdata//". Formatting also looks ruff-friendly after the above small fixes.
161-168: No change required — ELEMENTS is 0-based. dpdata/periodic_table.py defines ELEMENTS = ["H", "He", …] (H at index 0), so ELEMENTS[z - 1] correctly maps atomic number z to its symbol.
140-153: Approve: Hessian shape and registration match. Hessian stored as (natoms, 3, natoms, 3) per frame; plugin registers (Axis.NFRAMES, Axis.NATOMS, 3, Axis.NATOMS, 3) and conversion is applied once — no changes required.
for more information, see https://pre-commit.ci
Actionable comments posted: 0
🧹 Nitpick comments (2)
dpdata/gaussian/fchk.py (2)
57-59: Consider extracting the error message to a constant. The static analysis tool suggests avoiding long messages directly in the exception. While this is a minor style preference, you could extract the message format to improve maintainability.
```diff
+# Error message template for Hessian validation
+HESSIAN_SIZE_ERROR_MSG = "Input length {actual} != expected {expected}"
+
 def create_full_hessian(hessian_raw: list | np.ndarray, natoms: int) -> np.ndarray:
     # ... existing code ...
     if hessian_block.size != expected_length:
         raise ValueError(
-            f"Input length {hessian_block.size} != expected {expected_length}"
+            HESSIAN_SIZE_ERROR_MSG.format(
+                actual=hessian_block.size, expected=expected_length
+            )
         )
```
167-167: Centralize or document the hardcoded default cell size (100 Å). 100 Å is hardcoded in dpdata/gaussian/fchk.py:167, dpdata/gaussian/gjf.py:326, and dpdata/plugins/xyz.py:44 — either make this a shared configurable constant or derive it from molecular dimensions (bounding box + padding); if keeping 100 Å, add a short comment explaining the rationale.
🧬 Code graph analysis (1)
dpdata/gaussian/fchk.py (3)
dpdata/utils.py (1)
- `open_file` (137-170)

dpdata/unit.py (5)

- `EnergyConversion` (106-117)
- `ForceConversion` (134-152)
- `HessianConversion` (155-180)
- `LengthConversion` (120-131)
- `value` (93-94)

dpdata/system.py (1)

- `has_forces` (1269-1270)
🪛 Ruff (0.12.2)
dpdata/gaussian/fchk.py
57-59: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (10)
dpdata/gaussian/fchk.py (10)
1-24: LGTM! Clean imports and unit conversion setup. The imports are well-organized with proper type checking guards, and the unit conversion constants are correctly initialized using the dpdata unit conversion framework.
26-71: Well-implemented Hessian reconstruction with thorough validation. The function correctly reconstructs a symmetric Hessian matrix from lower triangular elements with proper input validation and clear documentation. The mathematical approach using `M_full = M_lower + M_lower.T - diag(M_lower)` is correct and efficient.
73-90: Function signature and documentation look good. The parameters and return type are well-documented. The function signature correctly removes the unused `md` parameter mentioned in the AI summary.
99-112: Robust file parsing with proper encoding handling. The byte-to-string decoding with error handling is appropriate for parsing potentially problematic file formats. The atomic number parsing logic correctly handles multi-line data blocks.
113-122: Correct coordinate parsing and unit conversion. The coordinate parsing correctly handles multi-line blocks and applies the proper unit conversion from Bohr to Angstrom. The reshaping to `(-1, 3)` format is appropriate for Cartesian coordinates.
123-125: Proper energy extraction and conversion. Energy parsing and conversion from Hartree to eV is correctly implemented.
126-136: Correct gradient-to-forces conversion. The conversion from Cartesian gradients to forces by negation (`F = -∇E`) is physically correct and well-documented. The unit conversion and reshaping are appropriate.
137-149: Excellent Hessian processing implementation. The Hessian parsing, reconstruction via `create_full_hessian`, unit conversion, and reshaping to `(natoms, 3, natoms, 3)` format aligns perfectly with the registered shape mentioned in the AI summary for the Gaussian plugin.
150-156: Appropriate data validation with conditional checks. The assertions ensure essential data is present, and the conditional validation for forces and Hessian based on the function parameters is well-implemented.
157-175: Well-structured data assembly and return. The data dictionary construction follows dpdata conventions with proper handling of atom information, coordinates, boundary conditions, and optional fields (energies, forces, Hessian). The use of `np.unique` for atom type processing is efficient.
Actionable comments posted: 2
🧹 Nitpick comments (3)
dpdata/gaussian/fchk.py (3)
165-168: Provide one cell per frame (not a single 1x cell). If multiple frames are ever present, match cells' length to coords' length. Apply this diff:
```diff
-    data["coords"] = np.array(coords_t).reshape(-1, natoms, 3)
-    data["orig"] = np.zeros(3)
-    data["cells"] = np.array([np.eye(3) * 100])
+    data["coords"] = np.array(coords_t).reshape(-1, natoms, 3)
+    nframes = data["coords"].shape[0]
+    data["orig"] = np.zeros(3)
+    data["cells"] = np.repeat((np.eye(3) * 100)[None, :, :], nframes, axis=0)
```
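The `np.repeat` idiom yields one (3, 3) cell per frame. A minimal standalone check (the frame count is illustrative):

```python
import numpy as np

nframes = 4  # illustrative frame count
cells = np.repeat((np.eye(3) * 100)[None, :, :], nframes, axis=0)
print(cells.shape)  # one 100 Å cubic cell per frame
```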
57-59: Shorten exception message to satisfy ruff TRY003 hint. Keep the message concise; detailed counts can be added to a custom exception if needed. Apply this diff:
```diff
-    raise ValueError(
-        f"Input length {hessian_block.size} != expected {expected_length}"
-    )
+    raise ValueError("Invalid lower-triangular Hessian length")
```
151-156: Prefer explicit exceptions over asserts for library-facing validation. Asserts can be disabled with `-O`. Raise `ValueError` for reliability. Apply this diff:
```diff
-    assert coords_t, "cannot find coords"
-    assert energy_t, "cannot find energy"
-    if has_forces:
-        assert forces_t, "cannot find forces"
-    if has_hessian:
-        assert hessian_t, "cannot find hessian"
+    if not coords_t:
+        raise ValueError("coords not found in fchk")
+    if not energy_t:
+        raise ValueError("energy not found in fchk")
+    if has_forces and not forces_t:
+        raise ValueError("forces not found in fchk")
+    if has_hessian and not hessian_t:
+        raise ValueError("hessian not found in fchk")
```
🔇 Additional comments (2)
dpdata/gaussian/fchk.py (2)
61-69: Hessian reconstruction logic is correct and efficient. Lower-triangular fill and symmetric mirroring are implemented properly.
158-158: Keep `- 1` — ELEMENTS is 0-indexed. dpdata/periodic_table.py defines ELEMENTS starting with "H" at index 0 and Element.from_Z uses ELEMENTS[Z - 1], so `atom_symbols = [ELEMENTS[z - 1] for z in atom_numbers]` is correct.
```python
def to_system_data(file_name: FileType, has_forces=True, has_hessian=True):
    """Read Gaussian fchk file.

    Parameters
    ----------
    file_name : str
        file name
    has_forces : bool, default True
        whether to read force
        Note: Cartesian Gradient in fchk file is converted to forces by taking negative sign
    has_hessian : bool, default True
        whether to read hessian
```
Relax defaults for has_forces/has_hessian to avoid unexpected assertion failures.
Most fchk files won’t include gradients/Hessian. Defaulting to True is too strict for general use.
```diff
-def to_system_data(file_name: FileType, has_forces=True, has_hessian=True):
+def to_system_data(file_name: FileType, has_forces=False, has_hessian=False):
 @@
-    has_forces : bool, default True
+    has_forces : bool, default False
 @@
-    has_hessian : bool, default True
+    has_hessian : bool, default False
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-def to_system_data(file_name: FileType, has_forces=True, has_hessian=True):
+def to_system_data(file_name: FileType, has_forces=False, has_hessian=False):
     """Read Gaussian fchk file.

     Parameters
     ----------
     file_name : str
         file name
-    has_forces : bool, default True
+    has_forces : bool, default False
         whether to read force
         Note: Cartesian Gradient in fchk file is converted to forces by taking negative sign
-    has_hessian : bool, default True
+    has_hessian : bool, default False
         whether to read hessian
```
🤖 Prompt for AI Agents
In dpdata/gaussian/fchk.py around lines 73 to 85, the function to_system_data
currently defaults has_forces and has_hessian to True which causes unexpected
assertion failures for fchk files lacking gradients/Hessians; change the
parameter defaults to has_forces=False and has_hessian=False, update the
docstring to reflect the new defaults, and replace any unconditional assertions
that assume presence of gradient/Hessian with conditional checks that only
validate or convert those sections when the data actually exists (e.g., check
keys/flags in the parsed fchk content before accessing or asserting).
```python
        elif "Total Energy" in line:
            energy = float(line.split()[-1]) * energy_convert
            energy_t.append(energy)
        elif "Cartesian Gradient" in line:
```
Fix fragile “Total Energy” parsing to handle fchk scalar on next line (N= 1).
Many fchk files put scalar values on the next line when the header includes “N= 1”. Current code can read “1” instead of the energy. Parse robustly as below.
```diff
-        elif "Total Energy" in line:
-            energy = float(line.split()[-1]) * energy_convert
-            energy_t.append(energy)
+        elif "Total Energy" in line:
+            # fchk may have scalar on the same line or next line when "N= 1" appears
+            if "N=" in line:
+                next_line = next(fp)
+                if isinstance(next_line, bytes):
+                    next_line = next_line.decode(errors="ignore")
+                energy_val = float(next_line.split()[0])
+            else:
+                energy_val = float(line.split()[-1])
+            energy_t.append(energy_val * energy_convert)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
         elif "Total Energy" in line:
-            energy = float(line.split()[-1]) * energy_convert
-            energy_t.append(energy)
+            # fchk may have scalar on the same line or next line when "N= 1" appears
+            if "N=" in line:
+                next_line = next(fp)
+                if isinstance(next_line, bytes):
+                    next_line = next_line.decode(errors="ignore")
+                energy_val = float(next_line.split()[0])
+            else:
+                energy_val = float(line.split()[-1])
+            energy_t.append(energy_val * energy_convert)
         elif "Cartesian Gradient" in line:
```
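An alternative to branching on `N=` is to tolerate both layouts with a single trailing-number regex that also accepts Fortran `D` exponents. `parse_scalar` is a hypothetical helper, not part of the suggestion:

```python
import re

# Trailing number, optionally with an E/D (Fortran-style) exponent.
SCALAR_RE = re.compile(r"([-+]?\d+\.?\d*(?:[EeDd][-+]?\d+)?)\s*$")

def parse_scalar(line: str) -> float:
    """Pull the trailing numeric field from an FCHK-style scalar line."""
    m = SCALAR_RE.search(line.rstrip())
    if m is None:
        raise ValueError(f"no trailing number in {line!r}")
    return float(m.group(1).replace("D", "E").replace("d", "e"))

print(parse_scalar("Total Energy                               R    -7.645387250531E+01"))
```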
Actionable comments posted: 0
♻️ Duplicate comments (1)
dpdata/plugins/gaussian.py (1)
8-13: Imports look good; past NameError is now resolved.
`import dpdata.gaussian.fchk` binds the top-level `dpdata` module, so later `dpdata.LabeledSystem` references won't raise. No further action needed.
🧹 Nitpick comments (2)
dpdata/plugins/gaussian.py (2)
22-33: Make Hessian dtype registration idempotent. Repeated calls will trigger duplicate-registration warnings. Guard the registration.
Apply:
```diff
 def register_hessian_data(data):
     if "hessian" in data:
         dt = DataType(
             "hessian",
             np.ndarray,
             (Axis.NFRAMES, Axis.NATOMS, 3, Axis.NATOMS, 3),
             required=False,
             deepmd_name="hessian",
         )
-        dpdata.LabeledSystem.register_data_type(dt)
+        # avoid duplicate registrations
+        if not any(x.name == "hessian" for x in dpdata.LabeledSystem.DTYPES):
+            dpdata.LabeledSystem.register_data_type(dt)
```
import dpdataat the top to decouple from submodule imports.
43-56: Handle fallback + lints; register dtype in both paths.
- ruff ARG002:
kwargsis unused.- TRY300: return in
trycan move toelse.- Ensure
register_hessian_dataruns even on fallback; respecthas_forces/has_hessian.Apply:
@Format.register("gaussian/fchk") class GaussianFChkFormat(Format): def from_labeled_system( self, file_name: FileType, has_forces=True, has_hessian=True, **kwargs ): - try: - data = dpdata.gaussian.fchk.to_system_data( - file_name, has_forces=has_forces, has_hessian=has_hessian - ) - register_hessian_data(data) - return data - except AssertionError: - return {"energies": [], "forces": [], "hessian": [], "nopbc": True} + del kwargs # noqa: ARG002 + try: + data = dpdata.gaussian.fchk.to_system_data( + file_name, has_forces=has_forces, has_hessian=has_hessian + ) + except AssertionError: + data = {"energies": [], "nopbc": True} + if has_forces: + data["forces"] = [] + if has_hessian: + data["hessian"] = [] + else: + pass + register_hessian_data(data) + return data
📒 Files selected for processing (1)
dpdata/plugins/gaussian.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
dpdata/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use ruff to lint and format all Python code under dpdata/ before committing
Files:
dpdata/plugins/gaussian.py
dpdata/*/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
When adding a new format, implement it as a new submodule under dpdata//
Files:
dpdata/plugins/gaussian.py
🧬 Code graph analysis (1)
dpdata/plugins/gaussian.py (3)
dpdata/data_type.py (3)
- `Axis` (15-21)
- `DataType` (35-131)
- `register_data_type` (138-149)

dpdata/system.py (3)

- `LabeledSystem` (1159-1357)
- `register_data_type` (1094-1111)
- `has_forces` (1269-1270)

dpdata/gaussian/fchk.py (1)

- `to_system_data` (73-175)
🪛 Ruff (0.12.2)
dpdata/plugins/gaussian.py
46-46: Unused method argument: kwargs
(ARG002)
53-53: Consider moving this statement to an else block
(TRY300)
Added reading of fchk format files and support for saving hessian.npy
Summary by CodeRabbit
New Features
Tests