    HF · Code Extraction · bsd-3-clause

    codet5p-110m-embedding

    by Salesforce

    Unified code understanding and generation with T5 architecture

    102K downloads/month
    68 likes
    110M params
    Identifiers
    Model ID
    Salesforce/codet5p-110m-embedding
    Feature URI
    mixpeek://document_extractor@v1/salesforce_codet5p_v1
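
    The Feature URI above appears to follow a scheme://extractor@version/feature layout. The sketch below splits such a string into its parts. The helper name `parseFeatureUri` and the field names `extractor`, `version`, and `feature` are hypothetical, inferred from this single example rather than taken from Mixpeek documentation.

```typescript
// Hypothetical helper: splits a feature URI of the assumed form
//   mixpeek://<extractor>@<version>/<feature>
// The part names are guesses based on the one example shown on this page.
interface FeatureUri {
  extractor: string;
  version: string;
  feature: string;
}

function parseFeatureUri(uri: string): FeatureUri {
  const match = /^mixpeek:\/\/([^@\/]+)@([^\/]+)\/(.+)$/.exec(uri);
  if (match === null) {
    throw new Error(`Unrecognized feature URI: ${uri}`);
  }
  const [, extractor, version, feature] = match;
  return { extractor, version, feature };
}

const parsedUri = parseFeatureUri(
  "mixpeek://document_extractor@v1/salesforce_codet5p_v1"
);
console.log(parsedUri.extractor, parsedUri.version, parsedUri.feature);
```

    Throwing on an unrecognized string, rather than returning partial fields, keeps a malformed URI from silently selecting the wrong extractor.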

    Overview

    CodeT5+ is a family of encoder-decoder code LLMs that support both understanding and generation tasks. The 110M embedding variant is optimized for producing high-quality code embeddings for retrieval.

    On Mixpeek, CodeT5+ provides an alternative to CodeBERT for code embedding extraction, with support for more programming languages and stronger performance on code search tasks.

    Architecture

    T5-based encoder-decoder. The 110M embedding variant uses only the encoder, trained with contrastive learning on code-text pairs. Supports 10+ programming languages.
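
    To make "contrastive learning on code-text pairs" concrete, here is a minimal sketch of an InfoNCE-style loss, one common contrastive formulation. This page does not state the exact objective used to train the model, so treat the loss form, the `temperature` default, and the toy vectors as assumptions for illustration only.

```typescript
// Toy illustration of an InfoNCE-style contrastive loss over text-code pairs.
// Assumption: this is a generic formulation, not the model's confirmed objective.
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function normalize(v: Vec): Vec {
  const norm = Math.sqrt(dot(v, v));
  if (norm === 0) throw new Error("Cannot normalize a zero vector");
  return v.map((x) => x / norm);
}

// textVecs[i] and codeVecs[i] form a matching pair; every other codeVecs[j]
// in the batch serves as a negative example for textVecs[i].
function infoNceLoss(textVecs: Vec[], codeVecs: Vec[], temperature = 0.07): number {
  if (textVecs.length === 0 || textVecs.length !== codeVecs.length) {
    throw new Error("Need equally sized, non-empty batches");
  }
  const texts = textVecs.map(normalize);
  const codes = codeVecs.map(normalize);
  let total = 0;
  for (let i = 0; i < texts.length; i++) {
    // Scaled cosine similarity between text i and every code vector.
    const logits = codes.map((c) => dot(texts[i], c) / temperature);
    // Numerically stable log-sum-exp.
    const maxLogit = Math.max(...logits);
    const logSumExp =
      maxLogit + Math.log(logits.reduce((acc, l) => acc + Math.exp(l - maxLogit), 0));
    // Cross-entropy with the matching pair (index i) as the target.
    total += logSumExp - logits[i];
  }
  return total / texts.length;
}

const textBatch: Vec[] = [[1, 0], [0, 1]];
const alignedCode: Vec[] = [[1, 0], [0, 1]];
const swappedCode: Vec[] = [[0, 1], [1, 0]];
console.log(infoNceLoss(textBatch, alignedCode), infoNceLoss(textBatch, swappedCode));
```

    The loss is small when each text vector sits closest to its own code vector and large when the pairs are mismatched, which is the property that makes the resulting embeddings usable for retrieval.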

    Mixpeek SDK Integration

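    import { Mixpeek } from "mixpeek";
    
    // Replace "API_KEY" with your own Mixpeek API key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });
    
    // Ingest a source archive and run the code extraction feature extractor on it.
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/codebase.zip" },
      feature_extractors: [{
        name: "code_extraction",
        version: "v1",
        params: {
          // Hugging Face model ID of the embedding model
          model_id: "Salesforce/codet5p-110m-embedding"
        }
      }]
    });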

    Capabilities

    • High-quality code embeddings for retrieval
    • 10+ programming language support
    • Code-to-text and text-to-code generation (CodeT5+ family; the 110M embedding variant uses only the encoder)
    • Compact model size (110M params)

    Use Cases on Mixpeek

    • Code search across technical documentation and repositories
    • Code snippet recommendation based on natural language
    • Cross-language code similarity matching
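
    The search and similarity use cases above come down to ranking stored embeddings against a query embedding. The sketch below shows that ranking step with cosine similarity over an in-memory list. The types `IndexedSnippet` and `SearchHit`, the function `topK`, and the three-dimensional toy vectors are hypothetical stand-ins. Real embeddings would come from the model, and this is not how Mixpeek's retriever is implemented.

```typescript
// Minimal in-memory nearest-neighbor ranking over precomputed embeddings.
// All names and vectors here are illustrative assumptions.
interface IndexedSnippet {
  id: string;
  embedding: number[];
}

interface SearchHit {
  id: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dotProduct / Math.sqrt(normA * normB);
}

// Returns the k snippets most similar to the query, best match first.
function topK(queryEmbedding: number[], index: IndexedSnippet[], k: number): SearchHit[] {
  return index
    .map((s) => ({ id: s.id, score: cosineSimilarity(queryEmbedding, s.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy index: in practice each embedding would be produced by the model.
const toyIndex: IndexedSnippet[] = [
  { id: "snippet-a", embedding: [0.9, 0.1, 0] },
  { id: "snippet-b", embedding: [0, 1, 0] },
  { id: "snippet-c", embedding: [0.5, 0.5, 0] },
];

const hits = topK([1, 0, 0], toyIndex, 2);
console.log(hits.map((h) => h.id)); // [ 'snippet-a', 'snippet-c' ]
```

    A linear scan like this is fine for small collections. Larger ones typically rely on an approximate nearest-neighbor index instead.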

    Specification

    Framework: HF
    Organization: Salesforce
    Feature: Code Extraction
    Output: code + language
    Modalities: document
    Retriever: Code Search
    Parameters: 110M
    License: bsd-3-clause
    Downloads/mo: 102K
    Likes: 68

    Research Paper

    CodeT5+: Open Code Large Language Models for Code Understanding and Generation

    arxiv.org

    Build a pipeline with codet5p-110m-embedding

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
