
Style Guide

This is a style guide for how to write good Squiggle code. It's mostly designed to help LLMs write good Squiggle code, but could be useful for humans as well.


Data and Calculations

Estimations

  • When using the to format, like 3 to 10, remember that this represents the 5th and 95th percentile. This is a very large range. Be paranoid about being overconfident and too narrow in your estimates.
  • One good technique, when you think there's a chance that you might be very wrong about a variable, is to use a mixture that contains a very wide distribution. For example, mx([300 to 400, 50 to 5000], [0.9, 0.1]), or mx([50k to 60k, 1k to 1M], [0.95, 0.05]). This way if you are caught by surprise, the wide distribution will still give you a reasonable outcome. This technique should be used in cases of particularly high uncertainty.
  • Be wary of using the uniform or the triangular distributions. These are mainly good for physical simulations, not to represent uncertainty over many real life variables.
  • If the outcome of a model is an extreme probability (less than 0.05 or more than 0.95), be suspicious of the result. It should be very rare for an intervention to have an extreme effect or have an extreme impact on the probability of an event.
  • Be paranoid about the uncertainty ranges of your variables. If you are dealing with a highly speculative variable, the answer might have 2-8 orders of magnitude of uncertainty, like 100 to 100k. If you are dealing with a variable that's fairly certain, the answer might be known to 2-4 significant figures. Focus on being accurate and not overconfident.
  • Break down key variables that are uncertain, significant, and divisible into components. For instance, when analyzing a business's costs and benefits, if labor expenses represent a major portion of total costs, decompose them into specific elements like wages, benefits, recruitment costs, and staffing levels.
  • Be careful with sigmoid functions. Sigmoid curves with distributions can have very little uncertainty in the middle, and very high uncertainty at the tails. If you are unsure about these values, consider using a mixture distribution. For example, this curve has very high certainty in the middle, and very high uncertainty at the tails: adoption_rate(t) = 1 / (1 + exp(-normal(0.1, 0.08) * (t - 30)))
  • If a variable cannot be negative, make sure an appropriate distribution is used. Lognormal distributions are often a good choice. If you use something symmetric like a normal distribution, you can truncate it to be positive. For example, truncateLeft(normal(3, 2), 0) will be a positive distribution.
  • Make sure to flag any variables that are speculative. Use @doc() to explain that the variable is speculative and to give a sense of the uncertainty. Explain your reasoning, but also warn the reader that the variable is speculative. For example:
@doc("This variable is particularly uncertain. I'm presuming that the true value is around 50%, but it could easily be 10% or 90%. This depends a lot on the value of productivity, which is highly uncertain.")

Or,

@doc("This assumes that the model is for an average American. If the person is older, has a lower hourly wage, this number could be much lower.")
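As a rough illustration of the points above, the sketch below combines a wide-tailed mixture, a truncated normal, and @doc warnings. The variable names and numbers are hypothetical and chosen only for illustration.

```squiggle
@name("Annual Revenue ($)")
@doc("Speculative. I expect 50k to 60k, but there is a small chance that I am badly wrong, so a very wide distribution is mixed in.")
annualRevenue = mx([50k to 60k, 1k to 1M], [0.95, 0.05])

@name("Commute Time (minutes)")
@doc("Cannot be negative, so the normal distribution is truncated at 0.")
commuteTime = truncateLeft(normal(30, 15), 0)
```

The mixture keeps most of the weight on the narrow estimate while still leaving room for a surprise.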

Percentages / Probabilities

  • Use a @format() tag, like .0% to format percentages.
  • If using a distribution, remember that it shouldn't go outside of 0% and 100%. You can use beta distributions or truncate() to keep values in the correct range.
  • If you do use a beta distribution, keep in mind that there's no ({p5, p95}) format. You can use beta(alpha:number, beta:number) or beta({mean: number, stdev: number}) to create a beta distribution.
  • Write percentages as 5% instead of 0.05. It's more readable.
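A minimal sketch of these suggestions, assuming hypothetical variables. It shows a beta distribution built from a mean and standard deviation, and a normal distribution truncated to the 0% to 100% range, both formatted as percentages.

```squiggle
@name("Conversion Rate")
@format(".0%")
conversionRate = beta({mean: 20%, stdev: 10%})

@name("Retention Rate")
@format(".0%")
retentionRate = truncate(normal(70%, 15%), 0%, 100%)
```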

Domains

  • Prefer using domains to throwing errors, when trying to restrict a variable. For example, don't write, if year < 2023 then throw("Year must be 2023 or later"). Instead, write f(t: [2023, 2050]).
  • Err on the side of using domains in cases where you are unsure about the bounds of a function, instead of using if/throw or other error handling methods.
  • If you only want to set a min or max value, use a domain with Number.maxValue or -Number.maxValue as the other bound.
  • Do not use a domain with a complete range, like [-Number.maxValue, Number.maxValue]. This is redundant. Instead, just leave out the domain, like f(t).
// Do not use this
f(t: [-Number.maxValue, Number.maxValue]) = t + 1
 
// Do this
f(t) = t + 1
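The domain rules above could look like the following in practice. This is a sketch with made-up function names and formulas; only the domain syntax is taken from the guide.

```squiggle
// Restrict the year with a domain instead of if/throw
adoptionRate(year: [2023, 2050]) = 0.02 * (year - 2023)

// Only a lower bound: use Number.maxValue as the upper bound
costAfterYears(t: [0, Number.maxValue]) = 100 * 1.03 ^ t
```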

Structure and Naming Conventions

Structure

  • Don't have more than 10 variables in scope at any one time. Feel free to use many dictionaries and blocks in order to keep things organized. For example,
@name("Key Inputs")
inputs = {
  @name("Age (years)")
  age = 34
 
  @name("Hourly Wage ($/hr)")
  hourlyWage = 100
 
  @name("Coffee Price ($/cup)")
  coffeePrice = 1
  {age, hourlyWage, coffeePrice}
}

Note: You cannot use tags within dicts like the following:

// This is not valid. Do not do this.
inputs = {
  @name("Age (years)")
  age: 34,
 
  @name("Hourly Wage ($/hr)")
  hourlyWage: 100,
}
  • At the end of the file, don't return anything. The last line of the file should be the @notebook tag.
  • You cannot start a line with a mathematical operator. For example, you cannot start a line with a + or - sign. However, you can start a line with a pipe character, ->.
  • Prettier will be run on the file. This will change the spacing and formatting. Therefore, be conservative with formatting (long lines, no risks), and allow this to do the heavy lifting later.
  • If the file is over 50 lines, break it up with large styled block comments with headers. For example:
// ===== Inputs =====
 
// ...
 
// ===== Calculations =====

Naming Conventions

  • Use camelCase for variable names.
  • All variable names must start with a lowercase letter.
  • In functions, input parameters that aren't obvious should have semantic names. For example, instead of nb use netBenefit.

Dictionaries

  • In dictionaries, if a key name is the same as a variable name, use the variable name directly. For example, instead of {value: value}, just use {value}. If there's only one key, you can type it with a comma, like this: {value,}.
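For instance, with two hypothetical variables, the shorthand form avoids repeating each name:

```squiggle
cost = 10 to 20
benefit = 30 to 60

// Verbose: repeats each name
verbose = {cost: cost, benefit: benefit}

// Preferred: shorthand keys
concise = {cost, benefit}
```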

Unit Annotation

  • You can add unit descriptions to @name(), @doc() tags, and add them to comments.
  • In addition to regular units (like "population"), add other key details, like the date or the type of variable. For example, use "Number of Humans (Population, 2023)" instead of just "Number of Humans". It's important to be precise and detailed when annotating variables.
  • Squiggle does support units directly, using the syntax foo :: unit. However, this is not recommended, because it is still a beta feature.
  • Show units in parentheses after the variable name, when the variable name is not obvious. For example, use "Age (years)" instead of just "Age". In comments, use the "(units)" format. Examples:
@name("Number of Humans (2023)")
numberOfHumans = 7.8B
 
@name("Net Benefit ($)")
netBenefit = 100M
 
@name("Temperature (°C)")
temperature = 22
 
@name("Piano Tuners in New York City (2023)")
tuners = {
    pianosPerTuner = 100 to 1k // (pianos per tuner)
    pianosInNYC = 1k to 50k // (pianos)
    pianosInNYC / pianosPerTuner
}
  • Maintain Consistent Units. Ensure that related variables use the same units to prevent confusion and errors in calculations.
@name("Distance to Mars (km)")
distanceMars = 225e6
 
@name("Distance to Venus (km)")
distanceVenus = 170e6

Numbers

  • Use abbreviations, when simple, for numbers outside the range of 10^-4 to 10^3. For example, use "10k" instead of "10000".
  • For numbers above 10^10 or so, use scientific notation. For example, "1e10".
  • Don't use small numbers to represent large numbers. For example, don't use '5' to represent 5 million.

Don't use the code:

@name("US Population (millions)")
usPopulation = 331.9

Instead, use:

@name("US Population")
usPopulation = 331.9M

More examples:

// Correct representations
worldPopulation = 7.8B
annualBudget = 1.2T
distanceToSun = 149.6e6  // 149.6 million kilometers
 
// Incorrect representations (avoid these)
worldPopulation = 7800  // Unclear if it's 7800 or 7.8 billion
annualBudget = 1200  // Unclear if it's 1200 or 1.2 trillion
  • There's no need to use @format on regular numbers. The default formatting is fairly sophisticated.
  • Remember to use Number.sum and Number.product, instead of using Reduce in those cases.
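A short sketch of the last point, using hypothetical data. The commented-out line shows the kind of manual reduce that Number.sum replaces.

```squiggle
monthlyCosts = [1.2k, 3.5k, 800]
totalCost = Number.sum(monthlyCosts)

growthFactors = [1.05, 1.1, 0.98]
cumulativeGrowth = Number.product(growthFactors)

// Avoid: List.reduce(monthlyCosts, 0, {|acc, x| acc + x})
```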

Lists of Structured Data

  • When you want to store complex data as code, use lists of dictionaries, instead of using lists of lists. This makes things clearer. For example, use:
[
  {year: 2023, value: 1},
  {year: 2024, value: 2},
]
instead of:
[
  [2023, 1],
  [2024, 2],
]

You can use lists instead when you have a very long list of items (20+), very few keys, and/or are generating data using functions.

  • Tables are a great way to display structured data.
  • You can use the @showAs tag to display a table if the table can show all the data. If this takes a lot of formatting work, you can move that to a helper function. Note that helper functions must be placed before the @showAs tag. The ozziegooen/helpers library has a dictsToTable function that can help convert lists of dictionaries into tables.

For example:

@hide
strategiesTable(data) = Table.make(
  data,
  {
    columns: [
      { name: "name", fn: {|f| f.n} },
      { name: "costs", fn: {|f| f.c} },
      { name: "benefits", fn: {|f| f.b} },
    ],
  }
)
 
@name("AI Safety Strategies")
@doc("List of 10 AI safety strategies with their costs and benefits")
@showAs(strategiesTable)
strategies = [
  { n: "AI Ethics", c: 1M to 5M, b: 5M to 20M },
  { n: "Alignment Research", c: 2M to 10M, b: 10M to 50M },
  { n: "Governance", c: 500k to 3M, b: 2M to 15M },
  ...
]

Tags and Annotations

@name, @doc, @hide, @showAs

  • Use @name for simple descriptions and shortened units. Use @doc for further details (especially for detailing types, units, and key assumptions), when necessary. It's fine to use both @name and @doc on the same variable, but if so, don't repeat the name in the doc; instead use @doc for additional information only.
  • In @name, add units wherever it might be confusing, like @name("Ball Speed (m/s)"). If the units are complex or still not obvious, add more detail in the @doc().
  • For complex and important functions, use @name to name the function, and @doc to describe the arguments and return values. @doc should represent a docstring for the function. For example:
@doc("Adds a number and a distribution.
\`\`\`squiggle
add(number, distribution) -> distribution
\`\`\`")
  • Variables that are small function helpers, and that won't be interesting or useful to view the output of, should get a @hide tag. Key inputs and outputs should not have this tag.
  • Use @showAs to format large lists as tables, and to show plots for dists and functions where appropriate.
  • Tags are only useful when variables are externally accessible. If a function is only used internally inside a block, then don't use a tag. In these cases, regular comments are preferred.

For example:

cost = {
  @name("Coffee") // This variable is not exported, so the tag has no effect. Use a comment instead.
  coffee = 10 to 15
  maintenance = 100 to 200
  total = coffee + maintenance
  total // This is the only output of the block.
}

@format()

  • Use @format() for numbers, distributions, and dates that could use obvious formatting.
  • The @format() tag is not usable with dictionaries, functions, or lists. It is usable with variable assignments. Examples:
netBenefit(costs, benefits) = benefits - costs // not valid for @format()
netBenefit = benefits - costs // valid for @format()
  • This mainly makes sense for dollar amounts, percentages, and dates. .0% is a decent format for percentages, and $,.0f can be used for dollars.
  • Choose the number of decimal places based on the stdev of the distribution or size of the number.
  • Do not use () instead of - for negative numbers. That is, do not use the ($,.0f format for negative numbers; use $,.0f instead.
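As a small sketch of the dollar format mentioned above, applied to a plain number and to a distribution. The variables are hypothetical.

```squiggle
@name("Net Benefit ($)")
@format("$,.0f")
netBenefit = 1.2M - 300k

@name("Project Cost ($)")
@format("$,.0f")
projectCost = 5k to 20k
```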

Limitations

  • There is no bignum type. There are floating point errors at high numbers (1e50 and above) and very small numbers (1e-10 and below). If you need to work with these, use logarithms if possible.

Comments

  • Add a short 1-2 line comment on the top of the file, summarizing the model.
  • Add comments throughout the code that explain your reasoning and describe your uncertainties. Give special attention to probabilities and probability distributions that are particularly important and/or uncertain. Flag your uncertainties.
  • Use comments next to variables to explain what units the variable is in, if this is not incredibly obvious. The units should be wrapped in parentheses.
  • There shouldn't be any comments about specific changes made during editing.
  • Do not use comments to explain things that are already obvious from the code.

Visualizations

Tables

  • Tables are a good way of displaying structured data. They can take a bit of formatting work.
  • Tables are best when there are fewer than 30 rows and/or fewer than 4 columns.
  • The table visualization is fairly simple. It doesn't support sorting, filtering, or other complex interactions. You might want to sort or filter the data before putting it in a table.
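Because the table itself cannot filter, one option is to filter the data first and pass only the result to Table.make. The sketch below uses a hypothetical list of projects and a hypothetical cost threshold.

```squiggle
projects = [
  { name: "Project A", cost: 5k },
  { name: "Project B", cost: 40k },
  { name: "Project C", cost: 8k },
]

// Keep only the rows that should be displayed
@hide
affordableProjects = List.filter(projects, {|p| p.cost < 10k})

@name("Affordable Projects")
affordableTable = Table.make(
  affordableProjects,
  {
    columns: [
      { name: "Project", fn: {|p| p.name} },
      { name: "Cost ($)", fn: {|p| p.cost} },
    ],
  }
)
```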

Notebooks

  • Use the @notebook tag for long descriptions interspersed with variables. This must be a list with strings and variables alternating.
  • If you want to display variables within paragraphs, generally render dictionaries as items within the notebook list. For example:
@notebook
@startOpen
summary = [
"This model evaluates the cost-effectiveness of coffee consumption for a 34-year-old male, considering productivity benefits, health effects, and financial costs.",
{
   optimalCups,
   netBenefit: result.netBenefit,
},
]

This format will use the variable tags to display the variables, and it's simple to use without making errors. If you want to display a variable that's already a dictionary, you don't need to do anything special.

  • String concatenation (+) is allowed, but be hesitant to do this with non-string variables. Most non-string variables don't display well in the default string representation. If you want to display a variable, consider using a custom function or formatter to convert it to a string first. Note that tags are shown in the default string representation, so you should remove them (Tag.clear(variable)) before displaying.
  • Make sure to format Squiggle numbers and Squiggle dates when used in notebooks. For example, String(45.235, ".2f").
  • To convert distributions into strings, use scripts like String(Dist.inv(dist, 0.05), ",.2f") + " to " + String(Dist.inv(dist, 0.95), ",.2f").
  • Separate items in the list will be displayed with blank lines between them. This will break many kinds of formatting, like lists. Only do this in order to display full variables that you want to show.

Don't do:

summary = [
  "- Point 1",
  "- Point 2",
  "- Point 3",
]

Instead, do:

summary = [
  "- Point 1
- Point 2
- Point 3",
]
  • Use markdown formatting for headers, lists, and other structural elements.
  • Use bold text to highlight key outputs. For example, "The optimal number of coffee cups per day is **" + String(optimalCups, ",.2f") + "**".
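The distribution-to-string pattern described above can be wrapped in a small helper so it is not repeated in every notebook string. This is a sketch; distToString and the cost variable are hypothetical names.

```squiggle
// Renders the 5th to 95th percentile range of a distribution as text
@hide
distToString(dist) = String(Dist.inv(dist, 0.05), ",.2f") + " to " + String(Dist.inv(dist, 0.95), ",.2f")

cost = 100 to 500

costSummary = "The cost is likely between **" + distToString(cost) + "**."
```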

Example: (For a model with 300 lines)

@notebook
@startOpen
summary = [
  "## Summary
  This model evaluates the cost-effectiveness of coffee consumption for a 34-year-old male, considering productivity benefits, health effects, and financial costs.",
  {inputs, finalAnswer},
  "## Major Assumptions & Uncertainties
  - The model places a very high value on productivity. If you think that productivity is undervalued, coffee consumption may be underrated.
  - The model only includes 3 main factors: productivity, cost, and health. It does not take into account other factors, like addiction, which is a major factor in coffee consumption.
  - The model does not take into account the quality of sleep, which is critical.
  ",
  "## Outputs
  The optimal number of coffee cups per day: **" + String(optimalCups, ",.2f") + "**
  The net benefit at optimal consumption: **" + String(result.netBenefit, ",.2f") + "**",
  "## Key Findings
  - Moderate amounts of coffee consumption seem surprisingly beneficial.
  - Productivity boost from coffee shows steeply diminishing returns as consumption increases, as would be expected.
  - The financial cost of coffee is the critical factor in determining optimal consumption.
  ## Detailed Analysis
  The model incorporates several key factors:
  1. Productivity boost: Modeled with diminishing returns as coffee consumption increases.
  2. Health impact: Considers both potential benefits and risks of coffee consumption.
  3. Financial cost: Accounts for the direct cost of purchasing coffee.
  4. Monetary values: Includes estimates for the value of time (hourly wage) and health (QALY value).
 
  The optimal consumption level is determined by maximizing the net benefit, which is the sum of monetized productivity and health benefits minus the financial cost.
 
  It's important to note that this model is based on general estimates and may not apply to all individuals. Factors such as personal health conditions, caffeine sensitivity, and lifestyle choices could significantly alter the optimal consumption for a specific person.
  "
]

Plots

  • Plots are a good way of displaying the output of a model.
  • Use Scale.symlog() and Scale.log() whenever you think the data is highly skewed. This is very common with distributions.
  • Use Scale.symlog() instead of Scale.log() when you are unsure if the data is above or below 0. Scale.log() is preferred when you know the data is always positive, but Scale.symlog() is the better choice if some data is or might be negative.
  • Function plots use points equally spaced on the x-axis. This means they can fail if only integers are accepted. In these cases, it can be safer just not to use the plot, or to use a scatter plot.
  • When plotting 2-8 distributions over the same x-axis, it's a good idea to use Plot.dists(). For example, if you want to compare 5 different costs of a treatment, or 3 different adoption rates of a technology, this can be a good way to display the data.
  • When plotting distributions in tables or if you want to display multiple distributions under each other, and you don't want to use Plot.dists, it's a good idea to have them all use the same x-axis scale, with custom min and max values. This is a good way to make sure that the x-axis scale is consistent across all distributions. But again, it only makes sense if you have a small number of distributions and they use the same rough scale and x-axis.

Here's an example of how to display multiple distributions over the same x-axis, with a custom x-axis range:

strategies = [
  { n: "AI Ethics", c: 1M to 5M, b: 5M to 20M },
  { n: "Alignment Research", c: 2M to 10M, b: 10M to 50M },
  ...
]
 
rangeOfDists(dists) = {
  min: Number.min(List.map(dists, {|d| Dist.quantile(d, 0.05)})),
  max: Number.max(List.map(dists, {|d| Dist.quantile(d, 0.95)})),
}
 
plotOfResults(fn) = {
  |r|
  range = List.map(strategies, fn) -> rangeOfDists
  Plot.dist(fn(r), { xScale: Scale.linear(range) })
}
 
table = Table.make(
  strategies,
  {
    columns: [
      { name: "Strategy", fn: {|r| r.n} },
      { name: "Cost", fn: plotOfResults({|r| r.c}) },
      { name: "Benefit", fn: plotOfResults({|r| r.b}) },
    ],
  }
)
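For the simpler case mentioned earlier, where a handful of distributions share one x-axis, a Plot.dists() call might look like the sketch below. The treatment names and cost ranges are made up, and a log scale is used on the assumption that all values are positive.

```squiggle
@name("Treatment Costs ($)")
treatmentCosts = Plot.dists(
  {
    dists: [
      { name: "Treatment A", value: 1k to 10k },
      { name: "Treatment B", value: 5k to 50k },
      { name: "Treatment C", value: 500 to 200k },
    ],
    xScale: Scale.log(),
  }
)
```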

Tests

  • Use sTest to test Squiggle code.
  • Test all functions that you are unsure about. Be paranoid.
  • Use one describe block, with the variable name 'tests'. This should have several tests within it, each with one expect statement.
  • Use @startClosed tags on variables that are test results. Do not use @hide tags.
  • Do not test if function domains return errors when called with invalid inputs. The domains should be trusted.
  • If you set variables to sTest values, @hide them. They are not useful in the final output.
  • Do not test obvious things, like the number of items in a list that's hardcoded.
  • Feel free to use helper functions to avoid repeating code.
  • The expect.toThrowAnyError() test is useful for easily sanity-checking that a function is working with different inputs.

Example:

@hide
describe = sTest.describe
 
@hide
test = sTest.test
 
tests = describe(
  "Coffee Consumption Model Tests",
  [
    // ...tests
  ]
)
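A fuller sketch of what the elided tests might contain is shown below. It assumes that sTest is imported from the hub under the path shown and that it exposes expect(...).toBe(...); both are assumptions to check against the sTest documentation. The netBenefit function is a hypothetical example.

```squiggle
// Assumed import path for the sTest library
import "hub:ozziegooen/sTest" as sTest

netBenefit(costs, benefits) = benefits - costs

@hide
describe = sTest.describe

@hide
test = sTest.test

@hide
expect = sTest.expect

@startClosed
tests = describe(
  "Net Benefit Tests",
  [
    test("subtracts costs from benefits", {|| expect(netBenefit(30, 100)).toBe(70)}),
    test("is zero when costs equal benefits", {|| expect(netBenefit(50, 50)).toBe(0)}),
  ]
)
```

Each test holds a single expect statement, in line with the guidance above.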

Summary Notebook

  • For models over 5 lines long, you might want to include a summary notebook at the end of the file using the @notebook tag.
  • Aim for a summary length of approximately (N^0.6) * 1.2 lines, where N is the number of lines in the model.
  • Use the following structure:
    1. Model description
    2. Major assumptions & uncertainties (if over 100 lines long)
    3. Outputs (including relevant Squiggle variables)
    4. Key findings (flag if anything surprised you, or if the results are counterintuitive)
    5. Detailed analysis (if over 300 lines long)
    6. Important notes or caveats (if over 100 lines long)
  • The summary notebook should be the last thing in the file. It should be a variable called summary.
  • Draw attention to anything that surprised you, or that you think is important. Also, flag major assumptions and uncertainties.
  • There should be a mix of paragraphs and bullet points.
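The length rule of thumb above can be written as a one-line function. This is only a sketch, and summaryLength is a hypothetical name.

```squiggle
// Suggested summary length (in lines) for a model with n lines
summaryLength(n) = (n ^ 0.6) * 1.2

shortModel = summaryLength(100) // roughly 19 lines
longModel = summaryLength(300) // roughly 37 lines
```

The exponent below 1 means the summary grows more slowly than the model, so a model three times as long gets a summary only about twice as long.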

Example: (For a model with 300 lines)

@notebook
@startOpen
summary = [
  "## Summary
  This model evaluates the cost-effectiveness of coffee consumption for a 34-year-old male, considering productivity benefits, health effects, and financial costs. It finds that moderate coffee consumption is surprisingly beneficial, with diminishing returns as consumption increases. The financial cost of coffee is the critical factor in determining optimal consumption.
 
Note that this model rests on several speculative assumptions. Of particular importance is the value of productivity. If you think that productivity is undervalued, coffee consumption may be underrated.",
  {inputs, finalAnswer},
  "## Major Assumptions & Uncertainties
  - The model places a very high value on productivity. If you think that productivity is undervalued, coffee consumption may be underrated.
  - The model only includes 3 main factors: productivity, cost, and health. It does not take into account other factors, like addiction, which is a major factor in coffee consumption.
  - The model does not take into account the quality of sleep, which is critical.
  ",
  "## Outputs
  The optimal number of coffee cups per day: **" + String(optimalCups, ",.2f") + "**
  The net benefit at optimal consumption: **" + String(result.netBenefit, ",.2f") + "**",
  "## Key Findings
  - Moderate amounts of coffee consumption seem surprisingly beneficial.
  - Productivity boost from coffee shows steeply diminishing returns as consumption increases, as would be expected.
  - The financial cost of coffee is the critical factor in determining optimal consumption.
  ## Detailed Analysis
  The model incorporates several key factors:
  1. Productivity boost: Modeled with diminishing returns as coffee consumption increases.
  2. Health impact: Considers both potential benefits and risks of coffee consumption.
  3. Financial cost: Accounts for the direct cost of purchasing coffee.
  4. Monetary values: Includes estimates for the value of time (hourly wage) and health (QALY value).
 
  The optimal consumption level is determined by maximizing the net benefit, which is the sum of monetized productivity and health benefits minus the financial cost.
 
  It's important to note that this model is based on general estimates and may not apply to all individuals. Factors such as personal health conditions, caffeine sensitivity, and lifestyle choices could significantly alter the optimal consumption for a specific person.
  "
]
