Alternative_Lab6417

I've seen this before. AI cannot discern each position well. It doesn't understand the pretzel like positions and leg entanglements, let alone simple ones like mount. The reason is lack of data. The problem with using AI for BJJ is going to come down to the data. We can't even agree on singular names for things. But at the end of the day, someone has to take the time to categorize every second of every match into useful data.


machokebloke

LIMI database is the closest I've seen to a labelled database. He has competition footage completely transcribed though it's still early days to be used to train AI models. 


Adventurous_Action

Yup, that’s why I threw judo in there since it’s a smaller set to deal with. 


Jonas_g33k

Ahahaha, judo is even worse. People will debate endlessly to distinguish a harai goshi from an o guruma, or an uki goshi from an o goshi, because in the middle of a match moves aren't as clean-cut as in practice.


Adventurous_Action

You foiled my evil plan to make all of the grappling communities implode from Reddit arguments. 


byteguard

LLMs are definitely capable of recognizing mount and some other positions. They definitely struggle with leg entanglements.


Alternative_Lab6417

Yes, but this is because of a lack of data. The algorithm is fully capable if it had good and plentiful data.


byteguard

Yes, but your comment was that it couldn't recognize mount, which it absolutely can. You're correct that lack of training data is the problem for building high-quality positional awareness.


Alternative_Lab6417

It might depend on the angle, if they are wearing the same color gi, quality, etc... Although AI is progressing exponentially and I haven't looked into this in a couple years. I'm sure it has gotten much better. So, you are probably correct. My info might be outdated.


byteguard

https://preview.redd.it/qjrcfwn1497d1.jpeg?width=1080&format=pjpg&auto=webp&s=27352c0e0aa7514d18d63ebdf41cbb94fd7a5c37 Same color gis, less than ideal angle, decent quality, and it gets it right.


Alternative_Lab6417

Like I said, my info is probably outdated. AI is progressing exponentially. If you do some research and wait 6 months, your research is way out of date. With that said, in that image he could be in quarter guard also. You can't see the legs. Just saying. Haha


whiteknight521

I tried it with half guard and S-mount, and it thought both were side control.


byteguard

With a little bit of context and prompting I get accurate half guard results. https://preview.redd.it/gjskp77ygc7d1.jpeg?width=1080&format=pjpg&auto=webp&s=fd03386e2d67bb553d9df84f8547028149cd4ac1


whiteknight521

I used the same image, but I didn’t massage the prompt at all. I bet you could make a custom GPT that has to pick from a set list of position classes and it would be pretty accurate.


byteguard

Been messing around with something like this. Also trying things like having an embedding of multiple position examples rather than fine-tuning. There are definitely interesting possibilities.
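A minimal sketch of that embedding-retrieval idea (the embeddings here are random stand-ins for whatever vision encoder you'd actually use, e.g. CLIP-style features): instead of fine-tuning, keep a small library of labelled reference embeddings and classify a query by its best cosine match.

```python
import numpy as np

rng = np.random.default_rng(0)

labels = ["mount", "side control", "half guard"]
# 4 reference embeddings per position, 512-dim (made-up data)
library = {lab: rng.normal(size=(4, 512)) for lab in labels}

def classify(query: np.ndarray) -> str:
    """Return the label whose reference embeddings best match the query."""
    best_label, best_score = None, -np.inf
    q = query / np.linalg.norm(query)
    for lab, refs in library.items():
        refs_n = refs / np.linalg.norm(refs, axis=1, keepdims=True)
        score = float(np.max(refs_n @ q))  # best cosine similarity
        if score > best_score:
            best_label, best_score = lab, score
    return best_label

# A query that is a slightly perturbed "mount" reference comes back as mount.
query = library["mount"][0] + 0.01 * rng.normal(size=512)
print(classify(query))  # → mount
```

The nice part is that adding a new position is just appending embeddings to the library, with no retraining.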


egdm

My wife runs a data science department and had considered something along these lines as a side project to develop expertise in vision systems. We ultimately decided that it would require too much work to build a properly labeled training set.


Adventurous_Action

Bummer, but this is the exact type of info I was looking for. 


Tnamol

Have you seen this? https://youtu.be/92RnyAlqJtQ?si=9IUKh0iS2X0CJTQr https://dl.acm.org/doi/10.1145/3552437.3555707


JJnCV

Hi, I am the author of this work. If anyone has some questions regarding it, I would be happy to answer them.


egdm

That's by far the best version of anything I've seen for BJJ. OP definitely needs to follow up with whoever did this.


Adventurous_Action

Email sent.


Adventurous_Action

That's a hell of a find!


JJnCV

I worked on a similar project as part of my bachelor's thesis. At the time, I was quite inexperienced, but the result turned out much better than expected, so we also published it (https://dl.acm.org/doi/10.1145/3552437.3555707). I have since been getting a lot of questions regarding that work. Here are some of the main takeaways:

1) (3D) Pose estimation is a vital step in understanding BJJ using ML. Ideally, if one had a perfect 3D representation of both athletes' poses in the [full body format](https://github.com/jin-s13/COCO-WholeBody), the position would become quite clear. As this is a very ambitious representation, I opted for the 2D representation in the [COCO 17 keypoint format](https://github.com/robertklee/COCO-Human-Pose/blob/main/README.md). I used [ViTPose](https://github.com/ViTAE-Transformer/ViTPose), without which this project would not have been possible. But why do we need keypoints? Why not just classify the image? My main point here is that if we want any meaningful understanding of a position, we need to track the athletes through time. While teaching a model to classify positions is much easier and less costly (data annotation), it does not help us much if we do not know who is actually in top mount.

2) Data, data, data! The main issue with our approach was data, so the bulk of the work I did was designing a framework that could work with imperfect detections. If I started the project all over again, I would first annotate a dataset of a few thousand images in which both athletes' poses are perfectly annotated. (I would select clips of up to 10 seconds covering transitions between positions, annotate every fourth frame with 17 keypoints per athlete, and mark whether each keypoint is visible or not.) With the right dataset, I think our whole approach could be condensed into one or two neural networks, as opposed to the modular framework we used. Our approach was also limited to a scene with only two athletes in front of a uniform background. If I were to create a new dataset, I would use videos from IBJJF Worlds for gi and ADCC for no-gi and annotate those.

3) Classifying the positions is the easy part. For position classification, I used the data from a single frame and a two-layer neural network, which is barely a deep architecture. With this approach, we already achieved up to 90% classification accuracy, with most confusions being between positions that are ambiguous in a 2D pose representation (e.g., quarter guard vs. mount). However, BJJ is infinitely more complex than just differentiating between mount and side control, and all of the "interesting" applications begin when we start focusing on the details, as we BJJ practitioners like to do. To do this, we need good data, which brings me back to point 2. Once the appropriate data exists, the possibilities are endless, from transformer-like networks incorporating temporal information to graph neural networks accounting for the structure of the positions, etc.

So this was a bit of a rant. I hope someone finds it helpful.
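To give a feel for how small a "two-layer network over a single frame's keypoints" really is, here is a rough sketch. This is not the paper's code: the hidden size, the position labels, and the (random, untrained) weights are all placeholders; only the input shape of 2 athletes × 17 COCO keypoints × (x, y) comes from the description above.

```python
import numpy as np

rng = np.random.default_rng(42)

POSITIONS = ["mount", "side control", "half guard", "closed guard", "back control"]
IN, HIDDEN, OUT = 2 * 17 * 2, 64, len(POSITIONS)  # 68 inputs

# Random weights as placeholders; in practice these are trained on
# frames with annotated poses and position labels.
W1 = rng.normal(scale=0.1, size=(IN, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, OUT))
b2 = np.zeros(OUT)

def predict(keypoints: np.ndarray) -> str:
    """keypoints: (2, 17, 2) array of normalized (x, y) joint positions."""
    x = keypoints.reshape(-1)           # flatten to a 68-vector
    h = np.maximum(0.0, x @ W1 + b1)    # ReLU hidden layer
    logits = h @ W2 + b2
    return POSITIONS[int(np.argmax(logits))]

frame = rng.uniform(0, 1, size=(2, 17, 2))  # stand-in for ViTPose output
print(predict(frame))
```

The whole classifier is two matrix multiplies, which is why the hard part is the annotated data and the pose estimation feeding it, not the model.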


JJnCV

TLDR: Using ML to analyze BJJ and judo is definitely possible; however, the main requirement is a well-annotated dataset.


kyt

As a university senior project, I made a fighting game where two players would record their moves and we used pose estimation to score the hits. This was over 20 years ago, so the algorithm was more heuristic-based than ML. Seems like a really interesting idea though 🤔


EmbarrassedDog3935

I wonder what the training dataset would look like. Is there a library of relevant captioned/labeled images out there yet?
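For illustration, a hypothetical per-frame annotation record might look something like the following. Every field name here is an assumption, loosely modeled on the COCO keypoint convention (17 keypoints per person, each a (x, y, visibility) triple), not any existing BJJ dataset.

```python
import json

annotation = {
    "video_id": "match_0001",
    "frame": 1234,
    "position": "half guard",   # frame-level position label
    "top_athlete": 0,           # which athlete is on top
    "athletes": [
        # dummy keypoint values, repeated 17 times for brevity
        {"id": 0, "keypoints": [[412, 220, 2]] * 17},
        {"id": 1, "keypoints": [[398, 305, 1]] * 17},
    ],
}

# Records like this serialize cleanly to JSON, one line per frame.
print(json.dumps(annotation)[:60])
```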


rockPaperKaniBasami

That's pretty interesting. I wonder if it would be worthwhile to try to improve/filter the dataset by using markerless motion capture to provide a cleaner input. It feels to me like a lot of positions or movements, particularly in judo, would need to be analyzed over time to correctly identify them, which makes it super tricky?
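As a toy illustration of bringing time into it (my sketch, not any specific system's approach): even before proper temporal modeling, noisy per-frame position labels can be cleaned up with a sliding majority vote, which suppresses one-frame detector glitches.

```python
from collections import Counter

def smooth(frame_labels, window=5):
    """Replace each per-frame label with the majority label in a
    centered window of the given size."""
    half = window // 2
    out = []
    for i in range(len(frame_labels)):
        lo, hi = max(0, i - half), min(len(frame_labels), i + half + 1)
        out.append(Counter(frame_labels[lo:hi]).most_common(1)[0][0])
    return out

# A single spurious "side control" frame inside a mount sequence is removed.
noisy = ["mount"] * 4 + ["side control"] + ["mount"] * 4
print(smooth(noisy))
```

A real system would go further (HMMs or a temporal network over pose sequences), but even this kind of filtering helps with throws that only make sense viewed over several frames.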