Skip to content

sharegpt dataset convert error #6878

@JJJYmmm

Description

@JJJYmmm

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.35
  • Python version: 3.10.16
  • PyTorch version: 2.6.0+cu126 (GPU)
  • Transformers version: 4.49.0.dev0
  • Datasets version: 2.21.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA H100 80GB HBM3
  • GPU number: 8
  • GPU memory: 79.19GB

Reproduction

Just use sharegpt_hyper dataset would cause the error.

[rank0]:   File "/home/xxx/LLaMA-Factory/src/llamafactory/data/aligner.py", line 15
3, in convert_sharegpt                                                                  
[rank0]:     {"role": tag_mapping[message[dataset_attr.role_tag]], "content": message[da
taset_attr.content_tag]}                                                                
[rank0]: KeyError: 'user'

code here, when broken_data = True, it doesn't break and cause key error finally.

Others

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    solvedThis problem has been already solved

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions