I don't think we're quite at the point where a mobile CPU could pull this off. However, filming a well-shot video and uploading it to a server for further processing should be a workable way to build a full model. I think a big challenge there is the several-minute feedback loop before you learn whether it worked and whether you need to try again.
It's probably been done, for some version of "done". Download VisualSfM and give it a shot yourself if you want (http://ccwu.me/vsfm/), or for a purely mobile approach, try http://seene.co/