
1·
1 year agoYou certainly can combine all the tasks and datasets into a single instruction fine tuning dataset. Then you would have a separate dataset for the reinforcement learning half where the model is learning human preferences.
You certainly can combine all the tasks and datasets into a single instruction fine tuning dataset. Then you would have a separate dataset for the reinforcement learning half where the model is learning human preferences.
I’d like to think we dove deep, but let me know!
The only book he explicitly mentions is “Thinking Fast and Slow” by Daniel Kahneman, but I think there are a ton of books that would be great resources along side the papers. I just happened to pull a lot of the papers from the footnotes and concepts he mentioned.