This has been a concern of mine for a long time. People act like docs and code bases are enough, but it's obvious when you look up something niche that they aren't. These models need a lot of input data, and we're effectively killing the source(s) of new data.
It feels like less Stack Overflow means a narrowing of the training data, and that's kind of where my question comes from. What remains for training is mostly the authoritative library documentation and source material. I'm not sure that's necessarily bad: it's certainly less volume, but it's probably also higher quality.
I don't know the answer here, but I think the situation is a lot more nuanced than all the black-and-white hot takes.