Model Loader API

1. Introduction

For the introduction and use cases, see explainer.md.

For illustration purposes, the API and examples use the TF Lite flatbuffer format.

2. API

enum MLModelFormat {
  // TensorFlow Lite flatbuffer.
  "tflite"
};

enum MLDevicePreference {
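  // Let the backend select the most suitable device.
  "auto",
  // The backend will use the GPU for model inference. If some operators are
  // not supported by the GPU, it will fall back to the CPU.
  "gpu",
  // The backend will use the CPU for model inference.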
  "cpu"
};

enum MLPowerPreference {
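  // Let the backend select the most suitable behavior.
  "auto",
  // Prioritize execution speed over power consumption.
  "high-performance",
  // Prioritize power consumption over other considerations such as execution speed.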
  "low-power",
};

dictionary MLContextOptions {
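  // Preferred kind of device to use.
  MLDevicePreference devicePreference = "auto";

  // Preference regarding power consumption.
  MLPowerPreference powerPreference = "auto";

  // Model format for the Model Loader API.
  MLModelFormat modelFormat = "tflite";

  // Number of threads to use.
  // "0" means the backend can decide automatically.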
  unsigned long numThreads = 0;
};

[Exposed=Window]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});
};

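enum MLDataType {
  // "Unknown" does not mean "unsupported". The backend can support more types
  // than those explicitly listed here (for example, TfLite has complex numbers).
  // We treat them as "unknown" to avoid exposing too many backend details
  // up front.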
  "unknown",
  "int64",
  "uint64",
  "float64",
  "int32",
  "uint32",
  "float32",
  "int16",
  "uint16",
  "float16",
  "int8",
  "uint8",
  "bool",
};

dictionary MLTensor {
  required ArrayBufferView data;
  required sequence<unsigned long> dimensions;
};

dictionary MLTensorInfo {
  required DOMString name;
  required MLDataType type;
  required sequence<unsigned long> dimensions;
};

[SecureContext, Exposed=Window]
interface MLModel {
  Promise<record<DOMString, MLTensor>> compute(record<DOMString, MLTensor> inputs);
  sequence<MLTensorInfo> inputs();
  sequence<MLTensorInfo> outputs();
};

[Exposed=Window]
interface MLModelLoader {
  constructor(MLContext context);
  Promise<MLModel> load(ArrayBuffer modelBuffer);
};
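
The relationship between MLTensor and MLTensorInfo can be illustrated with a small check run before calling compute. The following is only a sketch under stated assumptions: the helper names (TYPED_ARRAY_FOR, sameDimensions, elementCount, validateTensor) are hypothetical and not part of this API, the mapping from MLDataType to typed arrays is an assumption (in particular "bool" to Uint8Array), and the tensor data is assumed to be a typed array rather than a DataView.

```javascript
// Hypothetical helpers; not part of the Model Loader API itself.

// Assumed mapping from MLDataType to the typed array holding its elements.
// "float16" and "unknown" are omitted on purpose: this sketch skips the
// buffer-type check for them. Mapping "bool" to Uint8Array is an assumption.
const TYPED_ARRAY_FOR = {
  int64: BigInt64Array,
  uint64: BigUint64Array,
  float64: Float64Array,
  int32: Int32Array,
  uint32: Uint32Array,
  float32: Float32Array,
  int16: Int16Array,
  uint16: Uint16Array,
  int8: Int8Array,
  uint8: Uint8Array,
  bool: Uint8Array,
};

// True when two dimension lists have the same length and the same entries.
function sameDimensions(a, b) {
  return a.length === b.length && a.every((dim, i) => dim === b[i]);
}

// Number of elements implied by a dimension list; [] describes a scalar.
function elementCount(dimensions) {
  return dimensions.reduce((count, dim) => count * dim, 1);
}

// Returns a list of human-readable problems. An empty list means the tensor
// is consistent with the given MLTensorInfo as far as this sketch can tell.
function validateTensor(tensor, info) {
  const problems = [];
  if (!sameDimensions(tensor.dimensions, info.dimensions)) {
    problems.push(`${info.name}: dimensions [${tensor.dimensions}] ` +
                  `do not match [${info.dimensions}]`);
  }
  if (tensor.data.length !== elementCount(tensor.dimensions)) {
    problems.push(`${info.name}: data holds ${tensor.data.length} elements, ` +
                  `dimensions imply ${elementCount(tensor.dimensions)}`);
  }
  const expectedType = TYPED_ARRAY_FOR[info.type];
  if (expectedType && !(tensor.data instanceof expectedType)) {
    problems.push(`${info.name}: expected a ${expectedType.name} for "${info.type}"`);
  }
  return problems;
}
```

A caller might pair each entry of model.inputs() with the tensor it intends to pass under that name and refuse to call compute when validateTensor reports any problem.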

3. Examples

// First, create an MLContext. This is consistent with the WebNN API, to
// which we add two new fields: "numThreads" and "modelFormat".
const context = await navigator.ml.createContext(
                                     { devicePreference: "cpu",
                                       powerPreference: "low-power",
                                       numThreads: 0,  // The default of 0 means
                                                       // "decide automatically".
                                       modelFormat: "tflite" });
// Then create the model loader from the ML context.
const loader = new MLModelLoader(context);
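// In the first version, we only support loading models from an ArrayBuffer.
// We believe this covers most use cases. Web developers can download the
// model, for example with the fetch API. We can add new "load" functions
// in the future if they are really needed.
const modelUrl = 'https://path/to/model/file';
const modelBuffer = await fetch(modelUrl)
                            .then(response => response.arrayBuffer());
// Load the model.
const model = await loader.load(modelBuffer);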
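// Use the `model.compute` function to get the model's output from some inputs.
// Example ways of using this function include:
// 1. When the model has only one input tensor, the tensor can be passed
// directly without specifying its name (the user can still identify this
// input tensor by name if desired).
let z = await model.compute({ data: new Float32Array([10]),
                              dimensions: [1] });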
// 2. When there are multiple input tensors, the user must identify each
// input tensor by its name.
z = await model.compute({ x: { data: new Float32Array([10]),
                               dimensions: [1] },
                          y: { data: new Float32Array([20]),
                               dimensions: [1] } });
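// 3. The client can also specify the output tensor. This is consistent with
// the WebNN API and can be useful, for example, when the output tensor is a
// GPU buffer. In this case, the function returns an empty promise. The
// dimensions of the specified output tensor must match the dimensions of the
// model's output tensor.
const z_buffer = ml.tensor({ data: new Float64Array(1),
                             dimensions: [1] });
await model.compute({ data: new Float32Array([10]),
                      dimensions: [1] },
                    z_buffer);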
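// About the output tensors:
// As with the input arguments, if there is only one output tensor, the
// `compute` function returns a tensor in cases 1 and 2, and the name of the
// output tensor does not need to be specified in case 3. If there are
// multiple output tensors, the output in cases 1 and 2 is a map from tensor
// names to tensors, and in case 3 the output argument must also be a map from
// tensor names to tensors.
// For cases 1 and 2, the location of the actual output data depends on the
// context: with a CPU context, the output tensor's buffer is a RAM buffer;
// with a GPU context, the output tensor's buffer is a GPU buffer.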
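
Because compute may resolve to either a single tensor or a map from tensor names to tensors, calling code may want to treat both shapes uniformly. The following is a minimal sketch of one way to do that, not a definitive implementation: the helpers isTensor and normalizeOutputs are hypothetical names that do not appear in this API, and the sketch assumes a tensor can be recognized by a `data` buffer view together with a `dimensions` array.

```javascript
// Hypothetical helpers; not part of the Model Loader API itself.

// True when `value` looks like a single MLTensor: a `data` buffer view
// plus a `dimensions` array. A name-to-tensor map fails this check even if
// one of its tensors is named "data", because that entry would be an
// object rather than a buffer view.
function isTensor(value) {
  return value !== null && typeof value === "object" &&
         ArrayBuffer.isView(value.data) && Array.isArray(value.dimensions);
}

// Returns a name-to-tensor object regardless of which shape `compute`
// produced. `outputInfos` is the list returned by `model.outputs()`.
function normalizeOutputs(result, outputInfos) {
  if (isTensor(result)) {
    if (outputInfos.length !== 1) {
      throw new Error("got a single tensor, but the model declares " +
                      `${outputInfos.length} outputs`);
    }
    // Single-output case: file the tensor under its declared name.
    return { [outputInfos[0].name]: result };
  }
  // Multiple-output case: the result is already keyed by tensor name.
  return result;
}
```

With such a helper, a call like `normalizeOutputs(await model.compute(inputs), model.outputs())` would always yield an object keyed by output name, so later code does not need to branch on the number of outputs.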
