A neural network architecture consisting of two or more identical subnetworks with shared weights, designed to learn similarity functions between inputs. Siamese networks are used in multimodal systems for matching, verification, and few-shot learning tasks.
A Siamese network processes two inputs through identical encoder subnetworks that share the same weights. Each input is mapped to an embedding vector, and a distance metric such as cosine similarity or Euclidean distance measures how similar the two embeddings are. The network is trained so that similar pairs produce nearby embeddings and dissimilar pairs produce distant ones.
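The forward pass can be sketched in a few lines. This is a minimal numpy illustration, not a production model: the "shared subnetworks" reduce to a single weight matrix applied to both inputs, and all names, dimensions, and the `tanh` nonlinearity are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# One weight matrix used for BOTH inputs: the two "subnetworks"
# are literally the same parameters, which is what weight sharing means.
W = rng.standard_normal((4, 8))  # hypothetical encoder: 8-dim input -> 4-dim embedding

def encode(x):
    """Map an input vector to an embedding using the shared weights."""
    return np.tanh(W @ x)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x1 = rng.standard_normal(8)
x2 = x1 + 0.01 * rng.standard_normal(8)  # near-duplicate of x1
x3 = rng.standard_normal(8)              # unrelated input

e1, e2, e3 = encode(x1), encode(x2), encode(x3)

# Because the encoder is shared, similar inputs land near each other
# in embedding space; an unrelated input scores lower.
print(cosine_similarity(e1, e2) > cosine_similarity(e1, e3))
```

Because both inputs go through the same function, the comparison is symmetric: swapping the two inputs yields the same similarity score.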
Siamese networks are trained with contrastive loss on pairs of examples or triplet loss on triplets (anchor, positive, negative). The shared-weight architecture ensures consistent embedding computation regardless of input order. At inference time, embeddings for one side (for example, a fixed gallery of reference items) can be pre-computed and cached, making similarity search efficient. Because they learn a similarity function rather than class boundaries, Siamese networks generalize to unseen classes, which makes them well suited to few-shot and zero-shot scenarios.
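The two loss functions mentioned above can be written directly from their definitions. A sketch in plain numpy, with a hypothetical margin of 1.0 and toy 2-D embeddings standing in for real encoder outputs:

```python
import numpy as np

def contrastive_loss(e1, e2, label, margin=1.0):
    """Contrastive loss on one pair of embeddings.

    label=1 for a similar pair (pull embeddings together);
    label=0 for a dissimilar pair (push apart until at least `margin`).
    """
    d = np.linalg.norm(e1 - e2)
    return float(label * d**2 + (1 - label) * max(margin - d, 0.0) ** 2)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: anchor should be closer to positive than to negative
    by at least `margin`; loss is zero once that holds."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(d_pos - d_neg + margin, 0.0))

a = np.array([0.0, 0.0])   # toy embeddings, not real encoder outputs
b = np.array([1.0, 0.0])   # close to a
c = np.array([0.0, 3.0])   # far from a

# Similar pair: penalized by squared distance -> 1.0
print(contrastive_loss(a, b, label=1))
# Dissimilar pair already beyond the margin: no penalty -> 0.0
print(contrastive_loss(a, c, label=0))
# Positive is closer than negative by more than the margin -> 0.0
print(triplet_loss(a, b, c))
```

In practice these losses are minimized by gradient descent over the shared encoder weights, so both inputs of every pair contribute gradients to the same parameters.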