TensorFlow Mask R-CNN code for pixelwise object detection and segmentation (github.com/matterport)
159 points by llebttam on Nov 1, 2017 | 40 comments


God bless people who implement models from academic articles that should frankly have included them to begin with. What's more, this implementation has clear instructions for extending it to your own datasets.


To be fair, the academic article did include the code: https://github.com/facebookresearch/deepmask

It was just in Lua.


Deepmask is something else.


You are right, it's a different paper. But they both do instance segmentation and both build upon Faster R-CNN.


[edit] > people who implement models from academic articles

Is anyone collecting the various HN discussions as these pop up? I would appreciate help finding them again.


If you're looking for a way to find the implementations again I recommend http://www.gitxiv.com/


This should be the standard way of publishing.


Any AI discussions in general, or something more specific? This sounds like a fun idea that’s worth trying.


> academic articles that should frankly include them to begin with.

Boo hoo. Researchers tell the world for free exactly how to implement their state-of-the-art work (which probably cost north of $1M to develop) and promise to release code, and we're indignant because they didn't do it quite fast enough for us.


(1) It's not free when the research is funded by public tax dollars (which I grant is not always, but often the case).

(2) I'm not aware of this promise you're speaking of... my understanding is that authors of papers are under no obligation to produce any implementation, let alone usable, documented implementations.


> (1) It's not free when the research is funded by public tax dollars (which I grant is not always, but often the case).

The work was done at FAIR (Facebook), so that's not applicable here.

> (2) I'm not aware of this promise you're speaking of... my understanding is that authors of papers are under no obligation to produce any implementation, let alone usable, documented implementations.

Look at the paper [1], it's the last sentence of the abstract: "Code will be made available."

[1] https://research.fb.com/wp-content/uploads/2017/08/maskrcnn....


They said at ICCV that they will release the code after the CVPR paper submission deadline, so "soon".


Very rarely do these papers contain all the tricks needed to replicate the exact results presented. Usually, if you follow the paper to the letter, you won't get the same thing in the end. In particular, training regimes and the particulars of data augmentation are often omitted “for brevity”.


This is cool.

I wonder with stuff like this, what happens if a self driving car is capable of processing reflections in glass windows? What if it sees a reflection of itself and is able to properly identify it as being itself? Does that make it self aware?

I'm being serious. People like to throw around terms like "self aware" with some assumption that it is a long way off, or impossible, to have a machine be self aware. But that would meet my definition. People will say "yeah but that's not what I mean." And I want to know what you actually mean, then.


Many robots already have cameras pointed at themselves. What does that change really?


Honestly nothing, to me. But other people think that ability to recognize oneself in a mirror is important.

To me the concept of self-awareness and consciousness is pretty much meaningless, especially if you are considering it something that machines don't have or can't have (or if they eventually do have it, we'll know).

The reason I mention it with this, and with self driving cars (which this particular system may not be fast or reliable for yet), is that for those people, it may register better because it seems more analogous to a human. With those robots you speak of, do they also recognize other robots? Do they have some sort of logic that knows that they are like those other robots in many ways, but in significant ways they are different (i.e. they have control over their own behavior but not over the others')?

The point is not that something particularly amazing is happening, the point is that it is getting easier to illustrate with real world examples that "self awareness" is not this magical thing we currently have no idea how it works.


I really want to try Mask R-CNN with Open Images instead of COCO, but I'm worried about training time. I tried to train Mask R-CNN on a K80 and it failed after a week.


Thank you! This was on my list of things to implement but I hadn't had time yet, nor was I really looking forward to the custom op implementation. :)

Great job.


Wow, this is fascinating. From a radiology perspective, it could be the missing method for segmenting findings inside a convoluted radiograph.


Yes, and that's possible now with many different CNNs. The limiting factor is the training/validation/test data in your subject.

For example, you couldn't take Mask R-CNN trained on the COCO dataset, as implemented here, and get useful inference on your radiology problem set.


Would transfer learning help with that?


Yeah, transfer learning would bring over the first n convolutions, so it would be faster than training from scratch, but you still need the radiology data to train the last few layers.
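To make that idea concrete, here's a toy sketch in plain Python (layer names are invented; this is not this repository's API): reuse the pretrained early-layer weights and retrain only what's left on the new domain.

```python
# Toy sketch of transfer learning; no real framework, invented layer names.
# Early "convolution" weights come from a model pretrained on COCO; the
# remaining layers start empty and would be trained on domain data.
pretrained = {"conv1": [0.1], "conv2": [0.2], "conv3": [0.3], "head": [0.9]}
new_model = {"conv1": None, "conv2": None, "conv3": None, "head": None}

transfer_layers = ["conv1", "conv2"]  # the "first n convolutions"
for name in transfer_layers:
    new_model[name] = pretrained[name]  # reuse the learned features

# These layers still need training data from the target domain (radiology).
trainable = [name for name, w in new_model.items() if w is None]
print(trainable)  # ['conv3', 'head']
```

In a real framework you'd copy weights layer by layer and mark the early layers as non-trainable, but the split is the same: generic features transfer, the task-specific head does not.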


Very cool! How is the performance? R-CNN used to be much slower compared to YOLO or SSD; FCN seems to be very fast as well, though it requires a lot of GPU memory. Can your version be used for real-time semantic segmentation?


This architecture is optimized for accuracy rather than speed. The official paper reports 200 ms inference time per image on a GPU. This implementation is likely a bit slower because we use Python in a couple of layers. This is easy to optimize, but we haven't gotten around to it yet.

With that said, there are a lot of things you could do to make this much faster. For example, use ResNet50 instead of ResNet101. You can also reduce the number of anchors or the number of proposals to classify, which should improve performance significantly at the expense of a small loss in accuracy.
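Some rough arithmetic on why the anchor count matters, assuming typical FPN defaults (1024x1024 input, five pyramid levels, three anchor shapes per location; these are assumptions, not values read from this repository's config):

```python
# Count RPN anchors for an FPN-style Mask R-CNN. All values below are
# assumed typical defaults, not taken from this repository's config.
image_size = 1024
strides = [4, 8, 16, 32, 64]   # one feature-pyramid level per stride
anchors_per_location = 3       # e.g. three aspect ratios

total = sum((image_size // s) ** 2 * anchors_per_location for s in strides)
print(total)  # 261888 anchors to score per image
```

Halving the input resolution cuts that count roughly by four, which is why anchor and proposal counts dominate the region-proposal stage's cost.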


Can someone add depth to the training set? I'd like depth estimates for objects in the frame too. It could be interesting to fly into a video.

Does the iPhone 8 have RGB-D now for short range? Maybe someday we could get pixel-by-pixel depth estimates?


This exists in several different implementations and would be a separate DNN.

http://cs231n.stanford.edu/reports/2017/pdfs/203.pdf


The iPhone 7+, 8+, and X running iOS 11 have the ability to provide interpolated pixel depth maps for portrait photos, although it’s a fairly rough approximation. More details at https://developer.apple.com/documentation/avfoundation/avdep.... There’s also a really good video from WWDC this year that drills into it.


It seems that to train this, you need to input precise masks.

I'm wondering if one day it would be possible to train a network without masks (just a classifier), and it will figure out the masks by itself.


Surprised I still haven't seen a PyTorch translation of DeepMask/SharpMask. But glad to see at least a TensorFlow implementation. Will definitely try it out.



> Unfortunately, we could not fit the model into the GPU we have, and there is some ambiguity in the paper as well, so we decided to stop the project and wait until the official code is released.


This looks great! Thanks for releasing this.


Anyone tried the inference speed?


Just as I moved to PyTorch, they implement this for keras and tensorflow :(


> The repository includes:
>
> Pre-trained weights for MS COCO

Can't seem to find them anywhere.


They're in the "Releases" section. It's a 250 MB .h5 Keras file.


How do they measure accuracy?


No performance evaluation on MS COCO?


Evaluation code against MS COCO is included in the repository, for both bounding boxes and segmentation masks, so it should be easy to run (though it takes a long time).

We should publish more details, though. Thanks for bringing it up. Our implementation deviates a bit from the paper (as mentioned in the documentation), and optimizing for COCO was a 'nice to have' rather than the main objective. We got pretty close to the reported numbers (within 3 to 4 percentage points), but that was with half the training steps compared to the paper. We'll try to add more details over the next few days.
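For readers wondering what those numbers mean: COCO reports (mean) average precision, averaged over IoU thresholds from 0.5 to 0.95. As a simplified sketch (not the actual pycocotools code), here is AP for a single class at a single IoU threshold, computed from detections already ranked by confidence:

```python
def average_precision(tp_flags, num_gt):
    """All-point interpolated AP for one class at one IoU threshold.
    tp_flags: detections sorted by descending confidence; 1 = true positive.
    num_gt: number of ground-truth objects for this class."""
    precisions, recalls = [], []
    tp = fp = 0
    for flag in tp_flags:
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Interpolate: precision at each rank becomes the max precision to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        if r > prev_recall:
            ap += (r - prev_recall) * p
            prev_recall = r
    return ap

print(average_precision([1, 0, 1, 1, 0], num_gt=4))  # 0.625
```

The real COCO evaluation additionally matches detections to ground truth by IoU, averages this over ten IoU thresholds and all classes, and caps detections per image, which is part of why it takes so long to run.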


Did you create an account just to criticize the original poster's work?



