God bless people who implement models from academic articles that frankly should have included them to begin with. What's more, this implementation has clear instructions for extending it to your own datasets.
> academic articles that should frankly include them to begin with.
Boo hoo. Researchers tell the world for free exactly how to implement their state-of-the-art work (which probably cost north of $1M to develop) and promise to release code, and we're indignant because they didn't do it quite fast enough for us.
(1) It's not free when the research is funded by public tax dollars (which I grant is not always, but often the case).
(2) I'm not aware of this promise you're speaking of... my understanding is that authors of papers are under no obligation to produce any implementation, let alone usable, documented implementations.
> (1) It's not free when the research is funded by public tax dollars (which I grant is not always, but often the case).
The work was done at FAIR (Facebook), so that's not applicable here.
> (2) I'm not aware of this promise you're speaking of... my understanding is that authors of papers are under no obligation to produce any implementation, let alone usable, documented implementations.
Look at the paper [1]; it's the last sentence of the abstract: "Code will be made available."
Very rarely do these papers contain all the tricks needed to replicate the exact results presented. Usually, if you follow the paper to the letter you won’t get the same thing in the end. In particular, training regimes and the particulars of data augmentation are often omitted “for brevity”.
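To make that concrete, here's a minimal sketch (in TensorFlow, since that's what this repo uses) of the sort of augmentation detail that often goes unreported; the specific ops and values are my own guesses, not from any paper:

```python
# A sketch of the kind of augmentation choices papers often omit;
# every value below is an illustrative guess, not from any paper.
import tensorflow as tf

def augment(image):
    image = tf.image.random_flip_left_right(image)            # horizontal flips
    image = tf.image.random_brightness(image, max_delta=0.1)  # photometric jitter
    # Multi-scale training: the exact scale set is rarely reported.
    size = tf.random.shuffle(tf.constant([448, 512, 576]))[0]
    return tf.image.resize(image, tf.stack([size, size]))
```

Whether the flips, the jitter range, or the scale set match the paper's actual recipe can easily be worth a point or two of AP, and that's exactly the part you usually can't recover from the text.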
I wonder, with stuff like this, what happens if a self-driving car is capable of processing reflections in glass windows? What if it sees a reflection of itself and is able to properly identify it as itself? Does that make it self-aware?
I'm being serious. People like to throw around terms like "self aware" with some assumption that it is a long way off, or impossible, to have a machine be self aware. But that would meet my definition. People will say "yeah but that's not what I mean." And I want to know what you actually mean, then.
Honestly, nothing, to me. But other people think that the ability to recognize oneself in a mirror is important.
To me, the concepts of self-awareness and consciousness are pretty much meaningless, especially if you consider them something that machines don't or can't have (or that, if they eventually do have it, we'll somehow know).
The reason I mention it with this, and with self-driving cars (which this particular system may not yet be fast or reliable enough for), is that for those people it may register better, because it seems more analogous to a human. With those robots you speak of, do they also recognize other robots? Do they have some sort of logic that knows they are like those other robots in many ways but different in significant ways (i.e., they have control over their own behavior but not over the others')?
The point is not that something particularly amazing is happening; the point is that it is getting easier to illustrate with real-world examples that "self-awareness" is not some magical thing we have no idea how to explain.
I really want to try MaskRCNN with OpenImage instead of COCO - worried about training time though. I tried to train MaskRCNN on a K80 and it failed after a week.
Yeah, I mean transfer learning would bring over the first n convolutional layers, so it would be faster than training from scratch, but you still need the radiology data to train the last few layers.
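Roughly this idea in Keras, for anyone who hasn't done it; the backbone, freeze point, and head are illustrative assumptions, not anything from this repo:

```python
# Minimal transfer-learning sketch: keep the pretrained early convolutions,
# retrain only a new head on the domain data (e.g. radiology images).
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

backbone = ResNet50(weights="imagenet", include_top=False,
                    input_shape=(512, 512, 3))
backbone.trainable = False                      # freeze the "first n convolutions"

x = layers.GlobalAveragePooling2D()(backbone.output)
x = layers.Dense(256, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)  # placeholder task head

model = models.Model(backbone.input, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(radiology_images, labels, ...)      # the part that needs domain data
```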
Very cool! How is the performance? R-CNN used to be much slower compared to YOLO or SSD; FCN seems to be very fast as well, though it requires a lot of GPU memory. Can your version be used for realtime semantic segmentation?
This architecture is optimized for accuracy rather than speed. The official paper reports 200ms inference time per image on a GPU. This implementation is likely a bit slower because we use Python in a couple of layers. That's easy to optimize, but we haven't gotten around to it yet.
With that said, there are a lot of things you could do to make this much faster. For example, use ResNet50 instead of ResNet101. You can also reduce the number of anchors or the number of proposals to classify, and that should improve performance significantly at the expense of a little loss in accuracy.
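As a sketch of what those overrides might look like (the attribute names below are my assumptions about a typical config class for this kind of project, so check the repo's actual config file for the real knobs and defaults):

```python
# Sketch of trading a little accuracy for speed via config overrides.
# Attribute names are assumptions, not confirmed repo API.
from config import Config   # hypothetical import path

class FastConfig(Config):
    NAME = "fast"
    BACKBONE = "resnet50"                    # smaller backbone than resnet101
    RPN_ANCHOR_SCALES = (32, 64, 128, 256)   # fewer anchor scales
    POST_NMS_ROIS_INFERENCE = 500            # classify fewer region proposals
```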
The iPhone 7+, 8+, and X running iOS 11 have the ability to provide interpolated pixel depth maps for portrait photos, although it’s a fairly rough approximation. More details at https://developer.apple.com/documentation/avfoundation/avdep.... There’s also a really good video from WWDC this year that drills into it.
Surprised I still haven't seen a PyTorch translation of DeepMask/SharpMask. But glad to see at least a TensorFlow implementation. Will definitely try it out.
> Unfortunately, we could not fit the model into the GPU we have, and there is some ambiguity in the paper as well, so we decided to stop the project and wait until the official code is released.
Evaluation code against MS COCO is included in the repository, for both bounding boxes and segmentation masks, so it should be easy to run (though it takes a long time).
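For anyone who hasn't run COCO evaluation before, with pycocotools it looks roughly like this (the file paths are placeholders, not paths from this repo):

```python
# Roughly how COCO evaluation works via pycocotools; paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_minival.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("results.json")             # model detections/masks

for iou_type in ("bbox", "segm"):                     # boxes, then masks
    ev = COCOeval(coco_gt, coco_dt, iou_type)
    ev.evaluate()
    ev.accumulate()
    ev.summarize()                                    # prints the AP/AR table
```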
We should publish more details, though. Thanks for bringing it up. Our implementation deviates a bit from the paper (as mentioned in the documentation), and optimizing for COCO was a 'nice to have' rather than the main objective. We got pretty close to the reported numbers (within 3 to 4 percentage points), but that was with half the training steps compared to the paper. We'll try to add more details over the next few days.