The problem is data scarcity. You're stuck with a few million rows of data, which just isn't enough given how noisy and subtle the genotype->phenotype relationships are. Neural nets work well on language, Chess, Go, etc, because we have gigantic datasets where the subtle patterns can be teased out without overfitting.