Small data is a simple, economical GPT model that supports 4k tokens (around 3 pages of a Word document).
Medium data is a more expensive model that supports 16k tokens (around 12 pages of a Word document). It handles more data, which means more processing and a higher cost.
Big data is a model with no limit on the amount of content. However, you pay for training on that data every time you run a training session.
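To make the 4k and 16k limits concrete, here is a minimal Python sketch that picks the cheapest tier for a given text. It assumes the common rule of thumb of roughly 4 characters per token for English text, and it assumes the tiers hold 4,096 and 16,384 tokens. The `estimate_tokens` and `pick_tier` functions and the `reply_budget` allowance are hypothetical helpers for illustration, not part of any product API; exact counts require the model's own tokenizer.

```python
import math

# Assumed context sizes for the "4k" and "16k" tiers (exact limits depend on the model).
SMALL_LIMIT = 4_096
MEDIUM_LIMIT = 16_384


def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic (assumption)."""
    return math.ceil(len(text) / 4)


def pick_tier(text: str, reply_budget: int = 500) -> str:
    """Return the cheapest tier whose window fits the text plus room for a reply.

    reply_budget is a hypothetical allowance for the model's answer, which
    shares the same context window as the input.
    """
    needed = estimate_tokens(text) + reply_budget
    if needed <= SMALL_LIMIT:
        return "small"
    if needed <= MEDIUM_LIMIT:
        return "medium"
    # Anything larger needs the no-limit "big data" option.
    return "big"


if __name__ == "__main__":
    print(pick_tier("word " * 1_000))    # 5,000 chars, ~1,250 tokens -> small
    print(pick_tier("word " * 10_000))   # 50,000 chars, ~12,500 tokens -> medium
    print(pick_tier("word " * 100_000))  # 500,000 chars, ~125,000 tokens -> big
```

The reply budget is there because, for OpenAI's GPT chat models, the context window is shared by the input and the generated answer, so a text that exactly fills 4k tokens leaves no room for a response.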
For custom purposes, we can train on data in any format: Excel, PDF, Word (.doc) files, and more.