Fix transformer training

I've noticed a bug in the way we're training transformers: currently, we train them to predict token N given tokens 0 to N inclusive, so the target is already part of the input!

To fix this, we need to shift the targets by one, so that the output at position N is compared against token N+1. In the loss computation,

```
loss = ce_loss(logits, latent)
```

becomes 
```
loss = ce_loss(logits[:,:,:-1], quantizations_target[:,1:])
```
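To make the off-by-one concrete, here is a small pure-Python sketch (not the repository's code) that lists which target each position is scored against before and after the shift. It assumes a causal model where the output at position n sees tokens 0 to n; `training_pairs` and the token ids are made up for illustration.

```python
def training_pairs(tokens, shift_targets):
    """List the (visible context, target) pair scored at each position.

    Assumes a causal model: the output at position n attends to tokens[0..n].
    """
    pairs = []
    # With shifted targets the last position has no next token, so it is
    # dropped (the counterpart of logits[:, :, :-1]).
    num_positions = len(tokens) - 1 if shift_targets else len(tokens)
    for n in range(num_positions):
        context = tokens[: n + 1]
        # Shifted: score against the next token (quantizations_target[:, 1:]).
        # Unshifted: score against token n, which is already in the context.
        target = tokens[n + 1] if shift_targets else tokens[n]
        pairs.append((context, target))
    return pairs


tokens = [7, 3, 9, 4]  # made-up token ids
unshifted = training_pairs(tokens, shift_targets=False)
shifted = training_pairs(tokens, shift_targets=True)
print(unshifted)
print(shifted)
```

In the unshifted case every target is the last element of its own context, so the model can simply copy it; after the shift each target is a token the model has not yet seen.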

Alternatively, we could change the return value of the inferer so that it provides the shifted versions. Any opinions, @Warvito?

Fix transformer training #314
