
Faker Dataset

Generate a deterministic set of user records with correlated person and location fields, using all of the Faker provider's generator kinds.

Source

examples/faker-dataset/rank.toml
manifestVersion = 1

[package]
name = "faker-dataset"
version = "0.1.0"
source = "."

[providers]
faker = { registry = "npm", package = "@rank-lang/plugin-faker", version = "0.1.0" }

examples/faker-dataset/main.rank
/// Generates a deterministic seed dataset of users using the Faker provider.
/// Shows correlated Person and Location projections, all generator kinds
/// (UUID, Int, Boolean, DateTime, Person, Location), and multi-record
/// collection generation with Emit::manifest.

use faker::{ Boolean, DateTime, Generate, Int, Location, Person, UUID }
use faker::types::{ Spec }
use std::Time
use std::list::{ range }
use std::collections::{ map }
use std::Emit
use std::Path

/// --- Types ---

Address = Object {
  street: string,
  city: string,
  country: string,
}

User = Object {
  id: string,
  firstName: string,
  lastName: string,
  email: string,
  age: number,
  address: Address,
  role: string,
  active: bool,
  createdAt: Time::Instant,
  score: number,
}

/// --- Generator descriptors ---

seed = 1

refDate: Time::Instant = Time::parse(`2025-01-01T00:00:00Z`, { format: `RFC3339` })

person = Person { sex: `female` }
location = Location {}

userSpec: Spec<User> = {
  id: UUID {},
  firstName: person.firstName,
  lastName: person.lastName,
  email: person.email,
  age: Int { min: 18, max: 65 },
  address: {
    street: location.street,
    city: location.city,
    country: location.country,
  },
  role: `member`,
  active: Boolean { probability: 0.8 },
  createdAt: DateTime {
    past: { years: 3 },
  },
  score: Int { min: 0, max: 1000 },
}

/// --- Generate 20 users ---

count = 20

users: [User] = range(count)
  |> map(|_, ctx| Generate<User> {
    spec: userSpec,
    seed,
    itemKey: ctx.index,
    locale: `en_US`,
    refDate,
  })

/// --- Emit ---

pub main = || Emit::manifest({
  entries: [
    {
      path: Path::join([`users.json`]),
      format: `json`,
      value: users,
    },
  ]
})

Output

// users.json (first two records shown)
[
  {
    "id": "a3f1c2d4-...",
    "firstName": "Sophia",
    "lastName": "Martinez",
    "email": "Sophia.Martinez@example.com",
    "age": 34,
    "address": {
      "street": "742 Evergreen Terrace",
      "city": "Springfield",
      "country": "United States"
    },
    "role": "member",
    "active": true,
    "createdAt": { "epochMillis": 1672531200000, "offsetMinutes": 0 },
    "score": 812
  },
  {
    "id": "b7e8d9f0-...",
    "firstName": "Olivia",
    "lastName": "Chen",
    "email": "Olivia.Chen@example.com",
    "age": 27,
    "address": {
      "street": "18 Oak Lane",
      "city": "Portland",
      "country": "United States"
    },
    "role": "member",
    "active": false,
    "createdAt": { "epochMillis": 1698796800000, "offsetMinutes": 0 },
    "score": 441
  }
]

Key concepts

  • Person projections: person.firstName, person.lastName, person.email, person.age, and person.address are correlated. They all come from the same generated person, so the email matches the name on every record.
  • Location projections: location.street, location.city, and location.country are likewise correlated, all drawn from a single generated location.
  • Spec<T>: the spec mirrors the shape of User. Literal values (role: `member`) are passed through unchanged; generator descriptors are evaluated per record.
  • seed + itemKey: seed fixes the base RNG; itemKey: ctx.index derives a distinct sub-stream per record, so record 0 and record 1 differ from each other, yet each is identical across runs.
  • refDate: anchors relative date generators like DateTime { past: { years: 3 } } to a fixed point in time, keeping the dataset stable even as the current date changes.
  • Boolean { probability: 0.8 }: approximately 80% of users will have active: true.
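
The concepts above can be illustrated outside Rank with a small Python sketch. It is not the Faker provider's implementation: the hash-based `sub_stream` derivation, the name pools, and the helper names are all assumptions made for illustration. It shows one common way to get a per-record sub-stream from a seed and an item key, how a single person draw keeps the email consistent with the name, and how anchoring to a fixed reference date keeps generated timestamps stable.

```python
import hashlib
import random
from datetime import datetime, timedelta, timezone

# Fixed reference date, mirroring refDate in main.rank.
REF_DATE = datetime(2025, 1, 1, tzinfo=timezone.utc)

# Hypothetical name pools, for illustration only.
FIRST_NAMES = ["Sophia", "Olivia", "Emma", "Ava"]
LAST_NAMES = ["Martinez", "Chen", "Nguyen", "Patel"]


def sub_stream(seed: int, item_key: int) -> random.Random:
    """Derive a per-record RNG from (seed, item_key).

    Hashing the pair is an assumed derivation scheme, not the provider's.
    """
    digest = hashlib.sha256(f"{seed}:{item_key}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def generate_user(seed: int, item_key: int) -> dict:
    rng = sub_stream(seed, item_key)
    # One person draw feeds every person field, so they stay correlated.
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # Relative date measured back from REF_DATE, never from the wall clock.
    offset = timedelta(seconds=rng.randrange(3 * 365 * 24 * 3600))
    return {
        "firstName": first,
        "lastName": last,
        "email": f"{first}.{last}@example.com",
        "age": rng.randint(18, 65),
        "role": "member",  # literal value, passed through unchanged
        "active": rng.random() < 0.8,  # true with probability 0.8
        "createdAt": (REF_DATE - offset).isoformat(),
    }


users = [generate_user(seed=1, item_key=i) for i in range(20)]
```

Because each record's RNG depends only on the seed and its own key, regenerating record 7 alone gives the same result as generating all 20, and changing the record count does not disturb earlier records.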

Run it

rank examples/faker-dataset --file-root out/

Writes out/users.json. Requires the @rank-lang/plugin-faker package — see the Faker Provider docs for installation.
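
After a run, the generated file can be sanity-checked against the constraints in the spec. The following Python sketch is one way to do that. It assumes the record shape shown in the Output section, that the Int bounds are inclusive, and that `past: { years: 3 }` means "within the three years before refDate"; none of these are confirmed by the provider docs quoted here. The function names `check_user` and `check_file` are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Same reference date as refDate in main.rank.
REF_DATE = datetime(2025, 1, 1, tzinfo=timezone.utc)


def check_user(user: dict) -> list[str]:
    """Return the problems found in one record (empty list if none)."""
    problems = []
    if not 18 <= user["age"] <= 65:
        problems.append("age out of range")
    if not 0 <= user["score"] <= 1000:
        problems.append("score out of range")
    if user["role"] != "member":
        problems.append("unexpected role")
    created = datetime.fromtimestamp(
        user["createdAt"]["epochMillis"] / 1000, tz=timezone.utc
    )
    # Allow 366-day years so leap years do not cause false alarms.
    if not REF_DATE - timedelta(days=3 * 366) <= created <= REF_DATE:
        problems.append("createdAt outside the 3-year window")
    return problems


def check_file(path: Path, expected_count: int = 20) -> list[str]:
    """Check every record in a generated users.json file."""
    users = json.loads(path.read_text(encoding="utf-8"))
    problems = []
    if len(users) != expected_count:
        problems.append(f"expected {expected_count} records, got {len(users)}")
    for index, user in enumerate(users):
        problems += [f"record {index}: {p}" for p in check_user(user)]
    return problems
```

Calling `check_file(Path("out/users.json"))` returns an empty list when every record satisfies these assumed constraints.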