Hello,
So, I am working on a somewhat complex problem where I want to detect faults in a system using sensor data, including camera images.
Since I only need two output states, pass / fail, I directed my attention to contrastive learning, specifically the self-supervised version of it.
I am currently facing a challenge in defining the positive and negative pairs for training. In principle, I should use an image and its augmented versions as positive pairs, and that image paired with every other image as negative pairs. However, this is problematic because the other images may not actually be dissimilar: two different images of a passing (or failing) system would end up as false negatives.
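For reference, here is a minimal sketch of the standard self-supervised setup being described (SimCLR-style NT-Xent loss), using numpy and dummy embeddings rather than a real encoder. It makes the false-negative issue concrete: every non-matching image in the batch is pushed away, whether or not it comes from the same pass/fail state.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss.

    z1, z2: embeddings of two augmented views of the same batch of
    images, shape (n, d). For each image, its other view is the
    positive; all other images in the batch are treated as negatives
    (which is exactly where false negatives sneak in).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                  # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # cosine space
    sim = z @ z.T / temperature                           # similarities
    np.fill_diagonal(sim, -np.inf)                        # drop self-pairs
    # the positive of sample i is sample i + n, and vice versa
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))                 # view 1 of 8 images
z2 = z1 + 0.01 * rng.normal(size=(8, 16))     # lightly perturbed view 2
print(nt_xent_loss(z1, z2))
```

This is only an illustration of the vanilla setup from the question, not a proposed fix; common mitigations in the literature include debiased contrastive losses or BYOL-style methods that avoid explicit negatives altogether.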
So, any suggestions on how to tackle this problem?
Thanks
You mean that even if the pairs are inaccurate… the network will handle that during training?