If that’s not helpful, were you getting at having the model return some rich data about the attention weights that went into generating some token?