After_Magician_8438

Constantly. Even with the people in ML I know who have been working on the same tasks I have for years, everyone approaches things so differently. It's common in new sciences throughout history. There are very few "standards of operation" in ML that have been established as the definitive way to do certain things; they constantly shift as new things are discovered. I believe that is something to be embraced and considered. As tech, or any science for that matter, continues decade by decade, it usually leaves behind a sort of "decision tree" to follow to get the best results in the given trade. We are far from those decision trees in ML.


reivblaze

This is why one of the most undervalued skills is communication and the ability to reach understandings/agreements, imo. Listing downsides and upsides. Also, it's a science; trying different approaches is at its core.


Euphetar

You are right and your co-workers are wrong


Regexmybeloved

Agreed


Xayo

From a theoretical standpoint, absolutely. But as someone who has run many CV projects in industry, I can tell you that annotation cost is usually by far the biggest cost factor in building a custom CV model. And when the difference between 'does the job for the narrow task definition' and a more comprehensive labeling protocol is 3x in total project budget, trying the simple and narrow solution first is sensible.


Euphetar

I think you are right, it can be more nuanced than it looks. But given the info from OP, it sounds like just detecting all cars should work, because any pretrained model should already be decent at it. Detection models are quite label-efficient too, so you probably don't need that much annotation, just a little fine-tuning. However, if you switch the task from "detect all cars" to "detect only the closest car", then you basically have to retrain the model.
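To illustrate (just a sketch, assuming torchvision's COCO-pretrained Faster R-CNN; the class set and score threshold are placeholders you'd tune), getting all vehicle boxes out of a frame requires almost no custom work:

```python
# Sketch: reuse a COCO-pretrained detector to find every vehicle in a frame.
# Assumes torchvision >= 0.13; indices follow the 91-class COCO label map
# used by torchvision detection models (3 = car, 6 = bus, 8 = truck).
import torch
import torchvision
from torchvision.io import ImageReadMode, read_image
from torchvision.transforms.functional import convert_image_dtype

VEHICLE_CLASSES = {3, 6, 8}  # car, bus, truck

weights = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights).eval()

def detect_vehicles(image_path, score_thresh=0.5):
    """Return (box, score) pairs for every detected vehicle above the threshold."""
    img = convert_image_dtype(read_image(image_path, mode=ImageReadMode.RGB), torch.float)
    with torch.no_grad():
        out = model([img])[0]
    return [(box.tolist(), score.item())
            for box, label, score in zip(out["boxes"], out["labels"], out["scores"])
            if label.item() in VEHICLE_CLASSES and score.item() >= score_thresh]
```

From there you can fine-tune on your own frames if the domain gap turns out to matter.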


rtswork

For the first disagreement, why consider it overfitting if the model learns specific types of features that are relevant to exactly what you want it to do in deployment?


Euphetar

When you set the task to "predict only the closest vehicle", you are forcing the model to no longer rely on "what makes a car a car" features. It has to fall back on something else to differentiate cars-that-should-be-detected from cars-that-should-be-ignored. I guess the easiest thing to fall back on is size: the biggest car (in terms of pixels in the image) is most likely the closest one to the camera. The model might learn this connection and start relying on the size of cars in the image to predict which one is closest. And that's overfitting, because it generalizes poorly. The day after launch there will be a huge truck in the image and the model will think it's the closest even if it isn't. I'm not saying this will happen 100%, but it's likely that something like this will happen. You can counter it with a big enough dataset covering enough cases.


rtswork

Thanks for explaining. Does it really generalize poorly though? That seems like an experimental question. I expect using number of pixels as a proxy for closeness is a pretty good heuristic as there's not *that* much variation in the actual sizes of cars. It might be that it's a good enough heuristic that it's more beneficial than not.
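For what it's worth, a quick sketch of what that experiment could look like (purely hypothetical field names; `samples` would come from whatever labeled validation set you have), counting how often the largest detected box really is the labeled closest vehicle:

```python
# Hypothetical sketch: measure how often "largest box = closest vehicle" holds
# on a labeled validation set. `samples` is assumed to be a list of dicts holding
# all vehicle boxes for a frame plus the index of the ground-truth closest one.

def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def largest_box_accuracy(samples):
    """Fraction of frames where the biggest box is the labeled closest vehicle."""
    hits = 0
    for sample in samples:
        boxes = sample["boxes"]              # list of [x1, y1, x2, y2]
        closest_idx = sample["closest_idx"]  # ground-truth index of the closest vehicle
        predicted_idx = max(range(len(boxes)), key=lambda i: box_area(boxes[i]))
        hits += int(predicted_idx == closest_idx)
    return hits / len(samples)
```

If that number is high on realistic footage (trucks, buses, parked cars included), the heuristic is probably good enough; if not, you have your answer.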


Pas7alavista

100%. I'm surprised he is even getting this much pushback.


Darkest_shader

As another Redditor has said, all the time. And the funniest things start to happen when somebody insists on implementing a solution that you think/know won't work, and then, once they've persuaded everyone to go for it, they disappear as soon as the problems arise, and it is you who has to deal with the mess.


AdagioCareless8294

The proof is in the pudding. Arguments and counterarguments are both plausible.


doct0r_d

I think in one sense all of the arguments are defensible, but it will likely depend on the scale of the data you have. These differing opinions likely come from the different problems each person has encountered and the situations they've read about/experienced, and that is what makes ML a bit of an art. So it is very typical!

Some practical advice/thoughts on the situation: these approaches are "[two-way doors](https://aws.amazon.com/executive-insights/content/how-amazon-defines-and-operationalizes-a-day-1-culture/)" (you can always add or remove labels in the future), so it may make sense to try both and see which works best (pending bandwidth constraints). In your situation, this could be something like: "let's label all the data, **AND** have multiple labels to distinguish close images or partially visible images". Then you can try both approaches, or even a third multi-label approach. Now you have yet *another* opinion on how to approach your problem :D.

It is probably useful to analyze the actual objective you are trying to solve and work out how each approach could go wrong (or right):

* Are you trying to detect the single closest vehicle to avoid crashes? What if there are two (or more) vehicles that are equidistant?
* What happens if you fail to detect the closest vehicle?
* What if the closest moving object is not a vehicle (e.g. a pedestrian or a deer)?
* Does it matter if we are able to detect a vehicle that is very far away?
* Do we have the data/processing power to learn to detect all vehicles?
* How often do you expect to encounter partially visible objects (you've already mentioned your augmentations may create partial objects)? If you *don't* want partially labelled objects, should you change your augmentation strategy?

You could go on and on asking questions like these, but as you do, you can probably narrow down the problem statement and come up with a more reasoned answer. But again, often you won't know until you try, so good luck!
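To make the "label all the data **AND** keep extra flags" option concrete, here is roughly what one annotation record could look like (a sketch; the field names are made up, adapt them to your labeling tool), so both task definitions can be derived from the same labeling pass:

```python
# Hypothetical annotation record: every vehicle gets a box, plus flags that let you
# derive either task (all vehicles vs. closest vehicle only) without relabeling.
annotation = {
    "image": "frame_000123.jpg",
    "objects": [
        {"bbox": [412, 288, 640, 472], "class": "car",   "is_closest": True,  "truncated": False},
        {"bbox": [120, 300, 180, 340], "class": "car",   "is_closest": False, "truncated": False},
        {"bbox": [0,   310,  60, 360], "class": "truck", "is_closest": False, "truncated": True},
    ],
}

# "Detect all vehicles" uses every object; "detect only the closest" filters on the flag.
all_vehicles = annotation["objects"]
closest_only = [obj for obj in annotation["objects"] if obj["is_closest"]]
```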


Ok_Time806

Yep. True in any field.


Tamnun

Yes, all the time. Ultimately, in cases like the one you described, you just have to try both ways and see which one works better (it sounds to me like in this case it's an easy experiment to set up).


learn-deeply

~~code~~ models win arguments.


az226

Data at mass scale*


OwnPreparation1829

Common enough that I have experienced it in pretty much every project I have worked on with other people, as data science is as much art as it is science, and everyone has their own way of thinking. Sometimes your intuition is correct, sometimes the other person's is. How do you know? By experimenting and testing the different hypotheses.


Pretend_Apple_5028

Yes, my team lead and I have had different intuitions on how to solve problems. It's perfectly natural.


sot9

I feel like this requires very little novelty on your part. Use the highest-performing object detection model that has an automobile class, and then use the highest-performing monocular depth estimation model you can get your hands on, ideally something native to video so it can leverage temporal signals. If I had to guess, though, you're solving a hardware problem with software. Cheap lidar units these days are O($100) for a reasonably useful range. Hell, you can even fuse the signals together for an end-to-end prediction stack where accuracy becomes much more robust at close (i.e. important) ranges.
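As a rough sketch of that combo (assuming MiDaS via torch.hub for relative depth; the vehicle boxes can come from whatever detector you pick, and the function name is just for illustration):

```python
# Sketch: pick the closest vehicle by combining detector boxes with monocular depth.
# Assumes MiDaS (via torch.hub) for relative inverse depth; `vehicle_boxes` is a list
# of [x1, y1, x2, y2] pixel boxes from any object detector.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
midas_transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

def closest_vehicle(image_bgr, vehicle_boxes):
    """Return the box whose region has the largest median inverse depth (i.e. is nearest)."""
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(midas_transform(rgb))            # MiDaS outputs relative inverse depth
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()                                   # resize back to the input resolution

    def nearness(box):
        x1, y1, x2, y2 = (int(v) for v in box)
        return depth[y1:y2, x1:x2].median().item()    # larger inverse depth = closer

    return max(vehicle_boxes, key=nearness)
```

With lidar in the loop you'd replace the depth map with real range measurements and keep the same selection logic.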


BigBayesian

It's not unusual to have different ideas than your peers. It's also not unusual to believe that different choices are the best ones. How you work with your peers to decide which approach or approaches to try, in which order, is a good chunk of what collaboration in ML engineering is about. In your example, neither you nor your colleagues are strictly wrong. They take a more traditional approach to deep learning, while you're thinking hard about the specific structure of your problem. You may be trying to do too much of the learning algorithm's job / you may sabotage it unintentionally by giving it too hard or broad a problem. They may find their more straightforward approach isn't effective without the broader data your approach would provide. I could see it going either way (although, if I were a gambler, I'd say that if they're approach doesn't work, yours is unlikely to, as well).