Logging Experiment and Model Metrics

ModelBox integrates with metrics storage services to store metrics about training hardware, experiments, and models.

Python SDK

Metrics can be logged against any object in ModelBox: models, experiments, specific model versions, and so on. A MetricValue is logged against the object's id at a given timestamp.


  • MetricValue
class MetricValue:
    step: int
    wallclock_time: int
    value: Union[float, str, bytes]

The value can be a float representing a scalar, or a string or bytes holding a serialized tensor. The step is optional; when set, it should be a non-negative integer giving the logical step of the experiment at which the metric was recorded. The wallclock time is the physical clock time at which the metric was logged.

log_metrics(self, parent_id: str, key: str, value: MetricValue)
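As a rough illustration of how the signature above might be used, the sketch below logs a loss value per training step. It does not talk to a ModelBox server: `InMemoryMetricsClient` and its `get_metrics` helper are hypothetical stand-ins written for this example, the `MetricValue` dataclass is a local copy of the fields documented above, and the experiment id is a placeholder. It also assumes `wallclock_time` is expressed in seconds since the Unix epoch, which the text above does not specify.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class MetricValue:
    """Local stand-in mirroring the documented MetricValue fields."""
    step: Optional[int]                  # logical step; optional per the docs
    wallclock_time: int                  # assumed here: seconds since the epoch
    value: Union[float, str, bytes]


class InMemoryMetricsClient:
    """Hypothetical stand-in for a ModelBox client; stores metrics in a dict."""

    def __init__(self) -> None:
        # parent_id -> metric key -> values in the order they were logged
        self._store: Dict[str, Dict[str, List[MetricValue]]] = {}

    def log_metrics(self, parent_id: str, key: str, value: MetricValue) -> None:
        self._store.setdefault(parent_id, {}).setdefault(key, []).append(value)

    def get_metrics(self, parent_id: str) -> Dict[str, List[MetricValue]]:
        # Helper for this sketch only; not part of the documented Python SDK.
        return self._store.get(parent_id, {})


client = InMemoryMetricsClient()
experiment_id = "experiment-123"  # placeholder, not a real ModelBox id

for step, loss in enumerate([0.92, 0.61, 0.48]):
    client.log_metrics(
        experiment_id,
        "train/loss",
        MetricValue(step=step, wallclock_time=int(time.time()), value=loss),
    )

logged = client.get_metrics(experiment_id)["train/loss"]
print([(m.step, m.value) for m in logged])
```

With a real client, only the object providing `log_metrics` would change; the call shape (parent id, key, value) stays as documented.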


// Log Metrics for an experiment, model or checkpoint
rpc LogMetrics(LogMetricsRequest) returns (LogMetricsResponse);

// Get metrics logged for an experiment, model or checkpoint.
rpc GetMetrics(GetMetricsRequest) returns (GetMetricsResponse);

// Metrics contain the metric values for a given key
message Metrics {
  string key = 1;

  repeated MetricsValue values = 2;
}

// Metric Value at a given point of time.
message MetricsValue {
  uint64 step = 1;

  uint64 wallclock_time = 2;

  oneof value {
    float f_val = 5;

    string s_tensor = 6;

    bytes b_tensor = 7;
  }
}

// Message for logging a metric value at a given time
message LogMetricsRequest {
  string parent_id = 1;

  string key = 2;

  MetricsValue value = 3;
}

message LogMetricsResponse {}

message GetMetricsRequest {
  string parent_id = 1;
}

message GetMetricsResponse {
  repeated Metrics metrics = 1;
}
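To make the message shapes above more concrete, here is a small sketch that models them with plain Python dicts rather than generated protobuf classes. The helper names `to_metrics_value` and `group_by_key` are hypothetical and exist only for this example. It shows two things under those assumptions: a Python value is placed in exactly one field of the `value` oneof depending on its type, and logged values are grouped per key in the way the `Metrics` message suggests. The actual server behaviour may differ.

```python
from typing import Any, Dict, List, Tuple, Union


def to_metrics_value(step: int, wallclock_time: int,
                     value: Union[float, str, bytes]) -> Dict[str, Any]:
    """Build a dict shaped like MetricsValue with exactly one oneof field set."""
    msg: Dict[str, Any] = {"step": step, "wallclock_time": wallclock_time}
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("bool is not a supported metric value")
    if isinstance(value, (int, float)):
        msg["f_val"] = float(value)      # scalar metric
    elif isinstance(value, str):
        msg["s_tensor"] = value          # tensor serialized as a string
    elif isinstance(value, (bytes, bytearray)):
        msg["b_tensor"] = bytes(value)   # tensor serialized as bytes
    else:
        raise TypeError(f"unsupported metric value type: {type(value).__name__}")
    return msg


def group_by_key(logged: List[Tuple[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Group (key, MetricsValue) pairs into a list shaped like repeated Metrics."""
    grouped: Dict[str, List[Dict[str, Any]]] = {}
    for key, value in logged:
        grouped.setdefault(key, []).append(value)
    return [{"key": key, "values": values} for key, values in grouped.items()]


logged = [
    ("loss", to_metrics_value(0, 1700000000, 0.9)),
    ("loss", to_metrics_value(1, 1700000060, 0.7)),
    ("embedding", to_metrics_value(1, 1700000060, b"\x00\x01")),
]

# Dict shaped like GetMetricsResponse for a single parent_id.
response = {"metrics": group_by_key(logged)}
print([(m["key"], len(m["values"])) for m in response["metrics"]])
```

Note that `f_val` is a protobuf `float`, which is 32-bit, so a Python float (64-bit) can lose precision once it is serialized into that field.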