
Logging Experiment and Model Metrics

ModelBox integrates with metrics storage services to store training hardware metrics as well as experiment and model metrics.

Python SDK

Metrics can be logged against any object in ModelBox: models, experiments, specific model versions, and so on. Each MetricValue is logged under a metric key against the object's ID at a given timestamp.

API

  • MetricValue
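from typing import Union

class MetricValue:
    step: int  # optional logical step (e.g. training iteration)
    wallclock_time: int  # physical clock time when the metric was logged
    value: Union[float, str, bytes]  # scalar, or serialized tensor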

The value can be a float, representing a scalar, or a string or bytes, representing a serialized tensor. The step is optional; when set, it should be an integer identifying the logical step of the experiment (such as a training iteration) at which the value was recorded. The wallclock time is the physical clock time at which the metric was logged.
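To make the value kinds concrete, here is a minimal sketch that builds one scalar MetricValue and one tensor MetricValue. It uses a local stand-in dataclass with the documented fields rather than importing the SDK, because the SDK's import path is not given here. The helpers `now_ms`, `serialize_tensor`, and `deserialize_tensor` are hypothetical, and both the unit of `wallclock_time` (assumed to be milliseconds since the Unix epoch) and the tensor encoding (assumed to be little-endian float32 via `struct`) are assumptions, not ModelBox conventions.

```python
import struct
import time
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-in mirroring the documented MetricValue fields;
# the real SDK class may differ (e.g. in defaults or validation).
@dataclass
class MetricValue:
    step: int
    wallclock_time: int
    value: Union[float, str, bytes]


def now_ms() -> int:
    # Assumption: wallclock_time is milliseconds since the Unix epoch.
    return time.time_ns() // 1_000_000


def serialize_tensor(values: List[float]) -> bytes:
    # Assumption: a flat tensor packed as little-endian float32 (4 bytes each).
    return struct.pack(f"<{len(values)}f", *values)


def deserialize_tensor(data: bytes) -> List[float]:
    # Inverse of serialize_tensor.
    return list(struct.unpack(f"<{len(data) // 4}f", data))


# A scalar metric: training loss at logical step 100.
loss = MetricValue(step=100, wallclock_time=now_ms(), value=0.42)

# A tensor metric: a small weight vector serialized to bytes.
weights = MetricValue(
    step=100,
    wallclock_time=now_ms(),
    value=serialize_tensor([0.5, -1.25, 2.0]),
)
```

Whatever encoding you choose for tensors, the reader of the metric has to know it, so it is worth keeping serialization and deserialization side by side as above.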

  • SDK API
log_metrics(self, parent_id: str, key: str, value: MetricValue)
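A typical pattern is to call log_metrics once per training step, passing the experiment's ID as parent_id and a stable key such as "loss". The sketch below makes no claim about how the real SDK client is constructed: `InMemoryMetricsClient` is a hypothetical stand-in that only shares the documented log_metrics signature and records values in a dict, and the experiment ID and loss values are made up for illustration.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class MetricValue:
    # Stand-in with the documented MetricValue fields.
    step: int
    wallclock_time: int
    value: Union[float, str, bytes]


class InMemoryMetricsClient:
    """Hypothetical stand-in for the SDK client; only log_metrics mirrors the docs."""

    def __init__(self) -> None:
        # (parent_id, key) -> values in the order they were logged
        self.store: Dict[Tuple[str, str], List[MetricValue]] = defaultdict(list)

    def log_metrics(self, parent_id: str, key: str, value: MetricValue) -> None:
        self.store[(parent_id, key)].append(value)


client = InMemoryMetricsClient()
experiment_id = "exp-123"  # hypothetical experiment ID

# Log one loss value per logical step of a (pretend) training loop.
for step, loss in enumerate([0.9, 0.6, 0.4]):
    client.log_metrics(
        parent_id=experiment_id,
        key="loss",
        value=MetricValue(
            step=step,
            wallclock_time=time.time_ns() // 1_000_000,  # assumed: ms since epoch
            value=loss,
        ),
    )
```

Because the key is reused across steps, all loss values end up in one series for the experiment, which is the shape GetMetrics returns per key.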

gRPC API

// Log metrics for an experiment, model or checkpoint.
rpc LogMetrics(LogMetricsRequest) returns (LogMetricsResponse);

// Get metrics logged for an experiment, model or checkpoint.
rpc GetMetrics(GetMetricsRequest) returns (GetMetricsResponse);

// Metrics contain the metric values for a given key.
message Metrics {
  string key = 1;

  repeated MetricsValue values = 2;
}

// Metric value at a given point in time.
message MetricsValue {
  uint64 step = 1;

  uint64 wallclock_time = 2;

  oneof value {
    float f_val = 5;

    string s_tensor = 6;

    bytes b_tensor = 7;
  }
}

// Message for logging a metric value at a given time.
message LogMetricsRequest {
  string parent_id = 1;

  string key = 2;

  MetricsValue value = 3;
}

message LogMetricsResponse {}

message GetMetricsRequest {
  string parent_id = 1;
}

message GetMetricsResponse {
  repeated Metrics metrics = 1;
}
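On the wire, the type of the Python value decides which member of the MetricsValue `value` oneof is set: a float maps to f_val, a string to s_tensor, and bytes to b_tensor. The sketch below illustrates that mapping, and how a flat log of (key, value) pairs could be grouped into per-key Metrics entries like those in GetMetricsResponse. It deliberately uses plain dicts named after the proto fields instead of generated protobuf classes, since the generated module names are not given here; `to_metrics_value` and `group_by_key` are hypothetical helpers, not ModelBox's server code.

```python
from typing import Any, Dict, List, Tuple, Union

Value = Union[float, str, bytes]


def to_metrics_value(step: int, wallclock_time: int, value: Value) -> Dict[str, Any]:
    """Build a dict shaped like the MetricsValue message."""
    msg: Dict[str, Any] = {"step": step, "wallclock_time": wallclock_time}
    # Exactly one member of the `value` oneof is set, chosen by Python type.
    if isinstance(value, float):
        msg["f_val"] = value
    elif isinstance(value, str):
        msg["s_tensor"] = value
    elif isinstance(value, bytes):
        msg["b_tensor"] = value
    else:
        raise TypeError(f"unsupported metric value type: {type(value).__name__}")
    return msg


def group_by_key(logged: List[Tuple[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Group (key, MetricsValue) pairs into Metrics-shaped dicts, one per key."""
    grouped: Dict[str, List[Dict[str, Any]]] = {}  # dicts keep insertion order
    for key, mv in logged:
        grouped.setdefault(key, []).append(mv)
    return [{"key": k, "values": vs} for k, vs in grouped.items()]


logged = [
    ("loss", to_metrics_value(0, 1_700_000_000_000, 0.9)),
    ("weights", to_metrics_value(0, 1_700_000_000_000, b"\x00\x00\x00?")),
    ("loss", to_metrics_value(1, 1_700_000_060_000, 0.6)),
]
response = {"metrics": group_by_key(logged)}
```

Note that f_val is declared as a proto `float`, which is 32-bit, so a Python float sent through this field keeps only single precision.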