This is an improved version of the gesture-based picture-taking concept. It uses MediaPipe Hand Landmark Detection in place of the original pixel-level skin classification approach.
MediaPipe tracks 21 landmarks per hand in real time, which makes precise fingertip tracking possible. The gesture is recognized when the fingertips of both hands are close together below the face, so no skin-color classification or region labeling is needed.
Bring both hands together below your face to trigger the capture. The system fires when at least 2 of 3 fingertip pairs (thumb, index, middle) are close together, the wrists are apart, and the fingertips are above the wrists. The hands form a triangular shape similar to the one in the original, but it is detected via landmark geometry instead of skin-color classification.
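
The trigger rule can be sketched as a small geometric check. This is a minimal illustration, not the project's actual code. It assumes each hand arrives as a list of 21 `(x, y)` points in normalized image coordinates, with y growing downward, indexed as in MediaPipe's hand model (wrist 0, thumb tip 4, index tip 8, middle tip 12). The function name `is_capture_gesture`, the `face_bottom_y` parameter, and both threshold values are hypothetical and would need tuning against real footage.

```python
from math import hypot

# MediaPipe hand landmark indices used by the check.
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_TIP = 0, 4, 8, 12
TIP_IDS = (THUMB_TIP, INDEX_TIP, MIDDLE_TIP)


def is_capture_gesture(left, right, face_bottom_y=None,
                       tip_thresh=0.06, wrist_thresh=0.15):
    """Return True if two hands form the capture gesture.

    left, right   : sequences of 21 (x, y) normalized landmarks.
    face_bottom_y : assumed y of the lower edge of the face (e.g. from a
                    separate face detector); None skips that check.
    tip_thresh, wrist_thresh : illustrative distances, not tuned values.
    """
    def dist(a, b):
        return hypot(a[0] - b[0], a[1] - b[1])

    hands = (left, right)

    # Count matching fingertip pairs (thumb-thumb, index-index, ...) that touch.
    close_pairs = sum(dist(left[i], right[i]) < tip_thresh for i in TIP_IDS)

    # Wrists must be apart, so the hands form a triangle rather than overlap.
    wrists_apart = dist(left[WRIST], right[WRIST]) > wrist_thresh

    # "Above" means a smaller y, because image y grows downward.
    tips_above_wrists = all(h[i][1] < h[WRIST][1] for h in hands for i in TIP_IDS)

    # Optional: fingertips must sit below the face.
    below_face = face_bottom_y is None or all(
        h[i][1] > face_bottom_y for h in hands for i in TIP_IDS
    )

    return close_pairs >= 2 and wrists_apart and tips_above_wrists and below_face
```

In a live loop, this function would be called once per frame with the two detected hands, and the capture would typically fire only after the result stays true for several consecutive frames, to avoid accidental triggers.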