Those are very appropriate follow up questions I think. If someone tasks you to deal with 6 TiB of data, it is very appropriate to ask enough questions until you can provide a good solution, far better than to assume the questions are unknowable and blindly architect for all use cases.
How could anyone answer this without knowing how the data is to be used (query patterns, concurrent readers, writes/updates, latency, etc)?
Awk may be right for some scenarios, but without specifics it can't be a correct answer.