@pydataamsterdam "Choice of training data is the most important [part] of an LLM!"

Data quality improvements "can be equivalent to a 2x-3x increase in [model] size"