Yeah, that's because for the most popular women (Ruby Rivera is currently around the 30th most-subscribed on the site, out of over 78,000), whenever they make a new Instagram post a bunch of people will rush to upload it, and the various resolutions etc. available mean the uploads won't always be blocked by the automatic duplicate detection.
I've been trying to think of a way to disincentivize this but haven't come up with anything simple and likely to work. If anyone has ideas I'm all ears.
I've previously also asked what criteria are used to detect dups.
This is an area where AI might be useful, scanning images to flag possible dups, but I don't know how practical or useful it would be for you here, even if it were possible to implement. Alternatively, could uploads to, say, the most popular models (if named as they are uploaded) be held for manual mod approval?
Can you please tell me how the similar-image detection works? Is it by metadata?
If so, it will be difficult to disincentivize.
It’s by a library that creates a signature based on the image content. The challenge is that it is CPU intensive, and anything with a high CPU hit will impact site performance and hence degrade the experience for people visiting the site.
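To illustrate the idea of a content-based signature, here is a minimal sketch of one common approach, a difference hash (dHash). This is an assumption about the general technique, not the site's actual library; it also assumes the image has already been decoded and downscaled to a small grayscale grid, which a real implementation would do with an imaging library.

```python
# Sketch of content-based duplicate detection via a difference hash (dHash).
# Assumes the image is already decoded and downscaled to a grayscale grid
# of hash_size+1 columns by hash_size rows (9x8 by default).

def dhash(pixels, hash_size=8):
    """Build a hash where each bit records whether a pixel is brighter
    than its right-hand neighbour. `pixels` is a list of rows of ints 0..255."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

def likely_duplicate(a, b, threshold=5):
    """Signatures within `threshold` bits are treated as probable dups,
    which tolerates re-encodes and resizes of the same source image."""
    return hamming(a, b) <= threshold
```

Because the signature is derived from pixel content rather than file bytes, re-uploads at different resolutions tend to land within a few bits of each other, whereas metadata-based checks would miss them. The downscale-and-compare work is where the CPU cost mentioned above comes from.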
AI based solutions would likely impact the CPU even more.
Finding the right solution involves balancing many considerations and factors.
There are two ways we detect duplicates: the automatic signature check at upload time, and image reports submitted by users for mod review.
It's possible that many of those Leyvina uploads were in fact caught by the second one of these, but a mod hadn't yet had a chance to look at the image reports.
Beyond a basic file hash, there isn’t. I suppose a check based on video length and model name could be used, but it might be a source of too many false positives, and it wouldn’t deal with videos that are partial clips of another video.
You could probably match exact length down to the millisecond, but anything falling on an exact second mark would have to be ignored as too common (possibly also multiples of 10 ms).
This looks like someone tried to play tic-tac-toe.