They want you to make good-quality audio recordings of yourself speaking about 20 000 phrases. That could take 40 to 80 hours of speaking and recording, at a maximum of 4 hours per day.
The amount of data needed depends on whether a voice already exists for the language. If so, about 2 hours of data is usually good enough. Otherwise, 10-20 hours usually does it.
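
To get a feel for what those numbers mean on a calendar, here is a rough back-of-the-envelope sketch. It only uses the figures above (about 20 000 phrases, 40 to 80 hours in total, at most 4 hours per day). The function name `recording_estimate` is made up for illustration, and the per-phrase time is total session time (speaking plus retakes and pauses), not the length of the finished audio.

```python
import math

def recording_estimate(total_hours, phrases=20_000, max_hours_per_day=4):
    """Return (minimum number of days, average session seconds per phrase).

    Assumes every day is used up to the daily maximum; real schedules
    will likely be longer.
    """
    days = math.ceil(total_hours / max_hours_per_day)
    seconds_per_phrase = total_hours * 3600 / phrases
    return days, seconds_per_phrase

for hours in (40, 80):
    days, secs = recording_estimate(hours)
    print(f"{hours} h total: at least {days} days, about {secs:.1f} s per phrase")
```

So under these assumptions you would be looking at roughly 10 to 20 full recording days at the very least, and probably more once you allow for rest days.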
https://github.com/MycroftAI/mimic-recording-studio
https://mycroft.ai/contribute/