TensorFlow Mask R-CNN code for pixelwise object detection and segmentation (github.com/matterport)
159 points by llebttam on Nov 1, 2017 | 40 comments


God bless people who implement models from academic articles that should frankly have included them to begin with. What's more, this implementation has clear instructions for extending it to your own datasets.


To be fair, the academic article did include the code: https://github.com/facebookresearch/deepmask

It was just in Lua.


Deepmask is something else.


You are right, it's a different paper. But they both do instance segmentation and both build upon Faster R-CNN.


[edit] > people who implement models from academic articles

Is anyone collecting the various HN discussions as these pop up? I would appreciate help finding them again.


If you're looking for a way to find the implementations again I recommend http://www.gitxiv.com/


This should be the standard way of publishing.


Any AI discussions in general, or something more specific? This sounds like a fun idea that’s worth trying.


> academic articles that should frankly include them to begin with.

Boo hoo. Researchers tell the world for free exactly how to implement their state-of-the-art work (which probably cost north of $1M to develop) and promise to release code, and we're indignant because they didn't do it quite fast enough for us.


(1) It's not free when the research is funded by public tax dollars (which I grant is not always, but often the case).

(2) I'm not aware of this promise you're speaking of... my understanding is that authors of papers are under no obligation to produce any implementation, let alone usable, documented implementations.


> (1) It's not free when the research is funded by public tax dollars (which I grant is not always, but often the case).

The work was done at FAIR (Facebook), so that's not applicable here.

> (2) I'm not aware of this promise you're speaking of... my understanding is that authors of papers are under no obligation to produce any implementation, let alone usable, documented implementations.

Look at the paper [1], it's the last sentence of the abstract: "Code will be made available."

[1] https://research.fb.com/wp-content/uploads/2017/08/maskrcnn....


They said at ICCV that they will release the code after the CVPR paper submission deadline, so "soon".


Very rarely do these papers contain all the tricks needed to replicate the exact results presented. Usually, if you follow the paper to the letter, you won't get the same thing in the end. In particular, training regimes and the particulars of data augmentation are often omitted “for brevity”.


This is cool.

I wonder with stuff like this, what happens if a self driving car is capable of processing reflections in glass windows? What if it sees a reflection of itself and is able to properly identify it as being itself? Does that make it self aware?

I'm being serious. People like to throw around terms like "self aware" with some assumption that it is a long way off, or impossible, to have a machine be self aware. But that would meet my definition. People will say "yeah but that's not what I mean." And I want to know what you actually mean, then.


Many robots already have cameras pointed at themselves. What does that change really?


Honestly nothing, to me. But other people think that ability to recognize oneself in a mirror is important.

To me the concept of self-awareness and consciousness is pretty much meaningless, especially if you are considering it something that machines don't have or can't have (or if they eventually do have it, we'll know).

The reason I mention it with this, and with self driving cars (which this particular system may not be fast or reliable for yet), is that for those people, it may register better because it seems more analogous to a human. With those robots you speak of, do they also recognize other robots? Do they have some sort of logic that knows that they are like those other robots in many ways, but in significant ways they are different (i.e. they have control over their own behavior but not over the others')?

The point is not that something particularly amazing is happening, the point is that it is getting easier to illustrate with real world examples that "self awareness" is not this magical thing we currently have no idea how it works.


I really want to try Mask R-CNN with Open Images instead of COCO, but I'm worried about training time. I tried to train Mask R-CNN on a K80 and it failed after a week.


Thank you! This was on my list of things to implement but I hadn't had time yet, nor was I really looking forward to the custom op implementation. :)

Great job.


Wow, this is fascinating. From a radiology perspective, it could be the missing method for segmenting findings inside a convoluted radiograph.


Yes, and that's possible now with many different CNNs. The limiting factor is the training/validation/test data in your subject.

For example, you couldn't take Mask R-CNN trained on the COCO dataset, as implemented here, and get useful inference on your radiology problem set.


Would transfer learning help with that?


Yeah, transfer learning would bring over the first n convolutions, so it would be faster than training from scratch, but you still need the radiology data to train the last few layers.
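To make that idea concrete, here's a toy sketch in plain Python (layer names are invented; this is not this repository's API): reuse the pretrained early-layer weights and retrain only what's left on the new domain.

```python
# Toy sketch of transfer learning; no real framework, invented layer names.
# Early "convolution" weights come from a model pretrained on COCO; the
# remaining layers start empty and would be trained on domain data.
pretrained = {"conv1": [0.1], "conv2": [0.2], "conv3": [0.3], "head": [0.9]}
new_model = {"conv1": None, "conv2": None, "conv3": None, "head": None}

transfer_layers = ["conv1", "conv2"]  # the "first n convolutions"
for name in transfer_layers:
    new_model[name] = pretrained[name]  # reuse the learned features

# These layers still need training data from the target domain (radiology).
trainable = [name for name, w in new_model.items() if w is None]
print(trainable)  # ['conv3', 'head']
```

In a real framework you'd copy weights layer by layer and mark the early layers as non-trainable, but the split is the same: generic features transfer, the task-specific head does not.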


Very cool! How is the performance? R-CNN used to be much slower compared to YOLO or SSD; FCN seems to be very fast as well, though it requires a lot of GPU memory. Can your version be used for real-time semantic segmentation?


This architecture is optimized for accuracy rather than speed. The official paper reports 200 ms inference time per image on a GPU. This implementation is likely a bit slower because we use Python in a couple of layers. This is easy to optimize, but we haven't gotten around to it yet.

With that said, there are a lot of things you could do to make this much faster. For example, use ResNet50 instead of ResNet101. You can also reduce the number of anchors or the number of proposals to classify, which should improve performance significantly at the expense of a small loss in accuracy.
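Some rough arithmetic on why the anchor count matters, assuming typical FPN defaults (1024x1024 input, five pyramid levels, three anchor shapes per location; these are assumptions, not values read from this repository's config):

```python
# Count RPN anchors for an FPN-style Mask R-CNN. All values below are
# assumed typical defaults, not taken from this repository's config.
image_size = 1024
strides = [4, 8, 16, 32, 64]   # one feature-pyramid level per stride
anchors_per_location = 3       # e.g. three aspect ratios

total = sum((image_size // s) ** 2 * anchors_per_location for s in strides)
print(total)  # 261888 anchors to score per image
```

Halving the input resolution cuts that count roughly by four, which is why anchor and proposal counts dominate the region-proposal stage's cost.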


Can someone add depth to the training set? I'd like depth estimates for objects in the frame too. It could be interesting to fly into a video.

Does the iPhone 8 have RGB-D now for short range? Maybe someday we could get pixel-by-pixel depth estimates?


This exists in several different implementations and would be a separate DNN.

http://cs231n.stanford.edu/reports/2017/pdfs/203.pdf


The iPhone 7+, 8+, and X running iOS 11 have the ability to provide interpolated pixel depth maps for portrait photos, although it’s a fairly rough approximation. More details at https://developer.apple.com/documentation/avfoundation/avdep.... There’s also a really good video from WWDC this year that drills into it.


It seems that to train this, you need to input precise masks.

I'm wondering if one day it would be possible to train a network without masks (just a classifier), and it will figure out the masks by itself.


Surprised I still haven't seen a PyTorch translation of DeepMask/SharpMask. But glad to see at least a TensorFlow implementation. Will definitely try it out.



> Unfortunately, we could not fit the model into the GPU we have, and there is some ambiguity in the paper as well, so we decided to stop the project and wait until the official code is released.


This looks great! Thanks for releasing this.


Anyone tried the inference speed?


Just as I moved to PyTorch, they implement this for keras and tensorflow :(


> The repository includes:
>
> Pre-trained weights for MS COCO

Can't seem to find them anywhere.


They're in the "Releases" section. It's a 250 MB .h5 Keras file.


How do they measure accuracy?


No performance evaluation on MS COCO?


Evaluation code against MS COCO is included in the repository, for both bounding boxes and segmentation masks, so it should be easy to run (though it takes a long time).

We should publish more details, though. Thanks for bringing it up. Our implementation deviates a bit from the paper (as mentioned in the documentation), and optimizing for COCO was a 'nice to have' rather than the main objective. We got pretty close to the reported numbers (within 3 to 4 percentage points), but that was with half the training steps compared to the paper. We'll try to add more details over the next few days.
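For readers wondering what those numbers mean: COCO reports (mean) average precision, averaged over IoU thresholds from 0.5 to 0.95. As a simplified sketch (not the actual pycocotools code), here is AP for a single class at a single IoU threshold, computed from detections already ranked by confidence:

```python
def average_precision(tp_flags, num_gt):
    """All-point interpolated AP for one class at one IoU threshold.
    tp_flags: detections sorted by descending confidence; 1 = true positive.
    num_gt: number of ground-truth objects for this class."""
    precisions, recalls = [], []
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Interpolate: precision at each rank becomes the max precision to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        if r > prev_recall:
            ap += (r - prev_recall) * p
            prev_recall = r
    return ap

print(average_precision([1, 0, 1, 1, 0], num_gt=4))  # 0.625
```

The real COCO evaluation additionally matches detections to ground truth by IoU, averages this over ten IoU thresholds and all classes, and caps detections per image, which is part of why it takes so long to run.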


Did you create an account just to criticize the original poster's work?



