CLI

Generator

The generator subcommand is a traffic generator for schema-backed topics.

Usage: tansu generator [OPTIONS] --schema-registry <SCHEMA_REGISTRY> <TOPIC>

Arguments:
  <TOPIC>  The topic to generate messages into

Options:
      --broker <BROKER>
          The URL of the broker to produce messages into [env: ADVERTISED_LISTENER_URL=tcp://192.168.1.50:9092] [default: tcp://localhost:9092]
      --partition <PARTITION>
          The partition to produce messages into [default: 0]
      --schema-registry <SCHEMA_REGISTRY>
          Schema registry examples are: file://./etc/schema or s3://tansu/, containing: topic.json, topic.proto or topic.avsc [env: SCHEMA_REGISTRY=file://./etc/schema]
      --batch-size <BATCH_SIZE>
          Message batch size used by every producer [default: 1]
      --per-second <PER_SECOND>
          The maximum number of messages per second
      --throughput <THROUGHPUT>
          Message throughput
      --producers <PRODUCERS>
          The number of producers generating messages [default: 1]
      --duration-seconds <DURATION_SECONDS>
          Stop sending messages after this time
      --otlp-endpoint-url <OTLP_ENDPOINT_URL>
          OTEL Exporter OTLP endpoint [env: OTEL_EXPORTER_OTLP_ENDPOINT=]
  -h, --help
          Print help

The generator subcommand will automatically load environment variables from a file named .env in the current directory or any of its parents.
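The lookup described above can be pictured with a small Python sketch. This is not Tansu's code: the helper names find_dotenv and parse_dotenv are hypothetical, and the parsing is deliberately simplified to plain KEY=VALUE lines.

```python
from pathlib import Path
from typing import Optional


def find_dotenv(start: Path) -> Optional[Path]:
    """Return the nearest .env file in `start` or any of its parents."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


if __name__ == "__main__":
    dotenv = find_dotenv(Path.cwd())
    if dotenv is not None:
        print(dotenv, parse_dotenv(dotenv.read_text()))
```

Under this reading, a .env file at the root of a project containing SCHEMA_REGISTRY=file://./etc/schema would be picked up when running the subcommand from any subdirectory.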

Example

This example uses a customer topic backed by a Protocol Buffer schema etc/schema/customer.proto.

The Value of the Kafka message contains the details of the customer:

message Value {
    string email_address = 1 [(generate).script = "safe_email()"];
    string full_name = 2 [(generate).script = "first_name() + ' ' + last_name()"];
    Address home = 3;
    repeated string industry = 4 [(generate).repeated = {script: "industry()", range: {min: 1, max: 3}}];
}

Tansu uses FieldOptions metadata to embed Rhai scripts into Protocol Buffer schemas to generate fake data for a topic.

A customer has:

  • an email_address that is generated by the function safe_email()
  • a full_name combining the result of first_name() and last_name()
  • a home address that is covered separately below
  • an industry that is a random list of between 1 and 3 calls to industry()

The generator is defined in the schema as:

import "google/protobuf/descriptor.proto";

extend google.protobuf.FieldOptions {
  Generator generate = 51215;
}

message Generator {
    oneof apply {
        bool skip = 1;
        string script = 2;
        Repeated repeated = 3;
    }
}

message Repeated {
    oneof size {
        uint32 len = 1;
        Range range = 2;
    }
    string script = 3;
}

message Range {
    uint32 min = 1;
    uint32 max = 2;
}
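To make the meaning of Repeated and Range concrete, here is a rough Python sketch of how a repeated field might be filled. It is an illustration only, not Tansu's implementation: generate_repeated is a hypothetical helper, industry() is a stand-in for the real fake data function, and treating both ends of the range as inclusive is an assumption.

```python
import random
from typing import Callable, List, Optional


def generate_repeated(
    script: Callable[[], str],
    length: Optional[int] = None,
    minimum: Optional[int] = None,
    maximum: Optional[int] = None,
) -> List[str]:
    """Call `script` a fixed number of times (len) or a random number of times (range)."""
    if length is None:
        if minimum is None or maximum is None:
            raise ValueError("either length or both minimum and maximum are required")
        # Assumption: both ends of the range are inclusive.
        length = random.randint(minimum, maximum)
    return [script() for _ in range(length)]


def industry() -> str:
    """Hypothetical stand-in for the industry() fake data function."""
    return random.choice(["Restaurants", "Newspapers", "Banking", "Farming"])


# Mirrors: repeated = {script: "industry()", range: {min: 1, max: 3}}
industries = generate_repeated(industry, minimum=1, maximum=3)
print(industries)
```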

A customer has an Address defined in the schema as:

message Address {
    string building_number = 1 [(generate).script = "building_number()"];
    string street_name = 2 [(generate).script = "street_name()"];
    string city = 3 [(generate).script = "city_name()"];
    string post_code = 4 [(generate).script = "post_code()"];
    string country_name = 5 [(generate).script = "country_name()"];
}

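An Address uses:

  • building_number() for the building_number
  • street_name() for the street_name
  • city_name() for the city
  • post_code() for the post_code
  • country_name() for the country_name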

The generate_message_kind function in Tansu shows how simple it is to register the fake data functions with the Rhai scripting engine.

For reference, the full schema for customer is here.

We can generate test data using the generator subcommand:

tansu generator --schema-registry=file://./etc/schema \
                --per-second=160 \
                --producers=8 \
                --batch-size=20 \
                --duration-seconds=180 \
                customer

The generator uses the generic cell rate algorithm (GCRA) from the governor crate to limit the rate of message generation. This example generates up to 160 messages per second, using 8 producers with a batch size of 20 messages, and stops after 180 seconds (3 minutes).
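For readers unfamiliar with the generic cell rate algorithm, the following Python sketch shows the core idea: track a theoretical arrival time and reject any request that arrives too early. It is a simplified illustration of the algorithm, not the governor crate's implementation. The Gcra class and its parameters are hypothetical, and how Tansu shares the limit between producers is not covered here.

```python
class Gcra:
    """Minimal generic cell rate algorithm limiter (illustrative sketch)."""

    def __init__(self, per_second: float, burst: int = 1):
        # Time that one message "costs" at the configured rate.
        self.emission_interval = 1.0 / per_second
        # How far ahead of schedule a message may arrive and still be allowed.
        self.tolerance = (burst - 1) * self.emission_interval
        # Theoretical arrival time of the next conforming message.
        self.tat = 0.0

    def allow(self, now: float) -> bool:
        """Return True if a message arriving at time `now` (seconds) conforms."""
        if now < self.tat - self.tolerance:
            return False
        self.tat = max(self.tat, now) + self.emission_interval
        return True


# Offer one message every millisecond for one simulated second.
limiter = Gcra(per_second=160)
allowed = sum(limiter.allow(i / 1000) for i in range(1000))
print(allowed)  # never more than 160
```

Using simulated timestamps rather than a real clock keeps the sketch deterministic. A real limiter would read a monotonic clock and usually wait until the next permitted time instead of dropping the message.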

You can fetch messages from a topic using the cat subcommand, which uses the schema registry to decode Avro, Protobuf or JSON Schema messages into JSON:

tansu cat consume --schema-registry=file://./etc/schema customer

This returns a series of JSON-encoded customer messages showing the generated fake data:

[{"key":null,
  "value": {
    "emailAddress":"dedric@example.org",
    "fullName":"Shawn Auer",
    "home": {
        "buildingNumber":"108",
        "city":"Howell view",
        "countryName":"Saint Martin",
        "postCode":"66180-9718",
        "streetName":"Huel Green"},
    "industry":["Restaurants","Newspapers"]}}]
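The field names in the JSON output are the lowerCamelCase form of the snake_case names in the schema (email_address becomes emailAddress), which matches the default Protocol Buffers JSON mapping. As a small sketch of reading this output from a script, the Python below parses the sample message shown above. The to_json_name helper is hypothetical and only handles simple snake_case names.

```python
import json

# The sample output from the cat subcommand shown above.
sample = """
[{"key":null,
  "value": {
    "emailAddress":"dedric@example.org",
    "fullName":"Shawn Auer",
    "home": {
        "buildingNumber":"108",
        "city":"Howell view",
        "countryName":"Saint Martin",
        "postCode":"66180-9718",
        "streetName":"Huel Green"},
    "industry":["Restaurants","Newspapers"]}}]
"""


def to_json_name(field_name: str) -> str:
    """Convert a snake_case schema field name to its lowerCamelCase JSON name."""
    head, *rest = field_name.split("_")
    return head + "".join(part.capitalize() for part in rest)


messages = json.loads(sample)
customer = messages[0]["value"]

print(customer[to_json_name("email_address")])      # dedric@example.org
print(customer["home"][to_json_name("post_code")])  # 66180-9718
print(len(customer["industry"]))                     # 2
```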