Style Guide
This is a style guide for how to write good Squiggle code. It's mostly designed to help LLMs write good Squiggle code, but could be useful for humans as well.
Data and Calculations
Estimations
- When using the `to` format, like `3 to 10`, remember that this represents the 5th and 95th percentiles. This is a very large range. Be paranoid about being overconfident and too narrow in your estimates.
- One good technique, when you think there's a chance that you might be very wrong about a variable, is to use a mixture that contains a very wide distribution. For example, `mx([300 to 400, 50 to 5000], [0.9, 0.1])`, or `mx([50k to 60k, 1k to 1M], [0.95, 0.05])`. This way, if you are caught by surprise, the wide distribution will still give you a reasonable outcome. This technique should be used in cases of particularly high uncertainty.
- Be wary of using the uniform or the triangular distributions. These are mainly good for physical simulations, not for representing uncertainty over many real-life variables.
- If the outcome of a model is an extreme probability (less than 0.05 or more than 0.95), be suspicious of the result. It should be very rare for an intervention to have an extreme effect or have an extreme impact on the probability of an event.
- Be paranoid about the uncertainty ranges of your variables. If you are dealing with a highly speculative variable, the answer might have 2-8 orders of magnitude of uncertainty, like `100 to 100k`. If you are dealing with a variable that's fairly certain, the answer might have 2-4 sig figs of uncertainty. Be focused on being accurate and not overconfident.
- Break down key variables that are uncertain, significant, and divisible into components. For instance, when analyzing a business's costs and benefits, if labor expenses represent a major portion of total costs, decompose them into specific elements like wages, benefits, recruitment costs, and staffing levels.
- Be careful with sigmoid functions. Sigmoid curves with distributions can have very little uncertainty in the middle, and very high uncertainty at the tails. If you are unsure about these values, consider using a mixture distribution. For example, this curve has very high certainty in the middle, and very high uncertainty at the tails:
adoption_rate(t) = 1 / (1 + exp(-normal(0.1, 0.08) * (t - 30)))
- If a variable cannot be negative, make sure an appropriate distribution is used. Lognormal distributions are often a good choice. If you use something symmetric like a normal distribution, you can truncate it to be positive. For example, `truncateLeft(normal(3, 2), 0)` will be a positive distribution.
- Make sure to flag any variables that are speculative. Use `@doc()` to explain that the variable is speculative and to give a sense of the uncertainty. Explain your reasoning, but also warn the reader that the variable is speculative. For example:
Or,
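A minimal sketch of both ways to flag a speculative variable, first with `@doc()` and then with a plain comment. The variable names and numbers are hypothetical and only illustrate the pattern.

```squiggle
// Hypothetical variable and numbers, for illustration only.
@name("Facilities with Equipment (share of region)")
@doc("Highly speculative: a rough guess with no direct data behind it. The wide second component covers the chance that the guess is badly off.")
@format(".0%")
equipment_share = mx([5% to 15%, 1% to 50%], [0.8, 0.2])

// The same kind of warning written as a plain comment.
// Very speculative: based on a single anecdote, could be off by 10x.
repair_cost = 200 to 2k // (USD per unit)
```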
Percentages / Probabilities
- Use a `@format()` tag, like `.0%`, to format percentages.
- If using a distribution, remember that it shouldn't go outside of 0% and 100%. You can use beta distributions or `truncate()` to keep values in the correct range.
- If you do use a beta distribution, keep in mind that there's no `({p5, p95})` format. You can use `beta(alpha:number, beta:number)` or `beta({mean: number, stdev: number})` to create a beta distribution.
- Write percentages as `5%` instead of `0.05`. It's more readable.
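The points above might be combined as in the following sketch. The variable names and parameter values are hypothetical. Both distributions stay within 0% to 100%.

```squiggle
// Beta distribution defined by mean and standard deviation.
@name("Conversion Rate")
@format(".0%")
conversion_rate = beta({ mean: 5%, stdev: 2% })

// A normal distribution truncated to the valid probability range.
@name("Adoption Probability")
@format(".0%")
adoption_probability = truncate(normal(30%, 15%), 0, 1)
```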
Domains
- Prefer using domains to throwing errors when trying to restrict a variable. For example, don't write `if year < 2023 then throw("Year must be 2023 or later")`. Instead, write `f(t: [2023, 2050])`.
- Err on the side of using domains in cases where you are unsure about the bounds of a function, instead of using if/throw or other error handling methods.
- If you only want to set a min or max value, use a domain with `Number.maxValue` or `-Number.maxValue` as the other bound.
- Do not use a domain with a complete range, like `[-Number.maxValue, Number.maxValue]`. This is redundant. Instead, just leave out the domain, like `f(t)`.
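A short sketch of a two-sided and a one-sided domain. The function names and growth numbers are hypothetical.

```squiggle
// Two-sided domain: only years from 2023 to 2050 are accepted.
projected_users(year: [2023, 2050]) = 1k * 1.1 ^ (year - 2023)

// One-sided domain: only a minimum is enforced.
discount_factor(years_out: [0, Number.maxValue]) = 0.97 ^ years_out

users_2030 = projected_users(2030)
```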
Structure and Naming Conventions
Structure
- Don't have more than 10 variables in scope at any one time. Feel free to use many dictionaries and blocks in order to keep things organized. For example,
Note: You cannot use tags within dicts like the following:
- At the end of the file, don't return anything. The last line of the file should be the @notebook tag.
- You cannot start a line with a mathematical operator. For example, you cannot start a line with a + or - sign. However, you can start a line with a pipe character, `->`.
- Prettier will be run on the file. This will change the spacing and formatting. Therefore, be conservative with formatting (long lines, no risks), and allow this to do the heavy lifting later.
- If the file is over 50 lines, break it up with large, styled block comments with headers. For example:
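The structure rules in this list might look like the sketch below: header comments split the file into sections, and a block keeps related inputs out of the top-level scope. The names, values, and comment style are illustrative assumptions, not a required layout.

```squiggle
// ==========================================
// Inputs (hypothetical values)
// ==========================================

@name("Key Inputs")
inputs = {
  @name("Age (years)")
  age = 34

  @name("Hourly Wage (USD/hour)")
  hourly_wage = 20 to 60

  { age, hourly_wage }
}

// ==========================================
// Calculations
// ==========================================

@name("Annual Income (USD/year)")
annual_income = inputs.hourly_wage * 2000

// Not valid: tags cannot be attached to keys inside a dict literal.
// inputs = {
//   @name("Age (years)")
//   age: 34,
// }
```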
Naming Conventions
- Use snake_case for variable names.
- All variable names must start with a lowercase letter.
- In functions, input parameters that aren't obvious should have semantic names. For example, instead of `nb` use `net_benefit`.
Dictionaries
- In dictionaries, if a key name is the same as a variable name, use the variable name directly. For example, instead of `{value: value}`, just use `{value}`. If there's only one key, you can type it with a comma, like this: `{value,}`.
Unit Annotation
- You can add unit descriptions to `@name()` and `@doc()` tags, and add them to comments.
- In addition to regular units (like "population"), add other key details, such as the date or the type of variable. For example, use "Number of Humans (Population, 2023)" instead of just "Number of Humans". It's important to be precise and detailed when annotating variables.
- Squiggle does support units directly, using the syntax `foo :: unit`. However, this is not recommended, because it is still a beta feature.
- Show units in parentheses after the variable name, when the variable name is not obvious. For example, use "Age (years)" instead of just "Age". In comments, use the "(units)" format. Examples:
- Maintain Consistent Units. Ensure that related variables use the same units to prevent confusion and errors in calculations.
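A brief sketch of unit annotations in tags and in comments. The values are placeholders chosen for illustration.

```squiggle
// Units and the reference year go in the name.
@name("Number of Humans (Population, 2023)")
number_of_humans = 7.9B to 8.1B

@name("Ball Speed (m/s)")
ball_speed = 20 to 40

// In comments, units are wrapped in parentheses.
hourly_wage = 25 to 60 // (USD/hour)
```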
Numbers
- Use abbreviations, when simple, for numbers outside the range of 10^-4 to 10^3. For example, use "10k" instead of "10000".
- For numbers outside the range of 10^10 or so, use scientific notation. For example, "1e10".
- Don't use small numbers to represent large numbers. For example, don't use '5' to represent 5 million.
Don't use the code:
Instead, use:
More examples:
- There's no need to use `@format` on regular numbers. The default formatting is fairly sophisticated.
- Remember to use `Number.sum` and `Number.product`, instead of using Reduce in those cases.
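The number conventions in this list might look like the following sketch. All names and values are hypothetical.

```squiggle
// Avoid: a small number standing in for a large one.
// number_of_humans = 7.8 // (billions)

// Prefer: the real magnitude, written with a suffix.
number_of_humans = 7.8B
annual_budget = 150k

// Scientific notation for very large values.
stars_in_galaxy = 1e11

// Built-in aggregation instead of a manual reduce.
total_cost = Number.sum([12k, 8k, 3.5k])
growth_factor = Number.product([1.05, 1.03, 1.02])
```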
Lists of Structured Data
- When you want to store complex data as code, use lists of dictionaries, instead of using lists of lists. This makes things clearer. For example, use:
You can use lists instead when you have a very long list of items (20+), very few keys, and/or are generating data using functions.
- Tables are a great way to display structured data.
- You can use the `@showAs` tag to display a table if the table can show all the data. If this takes a lot of formatting work, you can move that to a helper function. Note that helper functions must be placed before the `@showAs` tag. The `ozziegooen/helpers` library has a `dictsToTable` function that can help convert lists of dictionaries into tables.
For example:
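A minimal sketch of a list of dictionaries shown as a table. It uses `Table.make` directly rather than the `dictsToTable` helper mentioned above, and the data is hypothetical.

```squiggle
@name("Intervention Costs")
@showAs(
  {|items|
    Table.make(
      items,
      {
        columns: [
          { name: "Intervention", fn: {|item| item.name} },
          { name: "Cost (USD)", fn: {|item| item.cost} },
        ],
      }
    )
  }
)
interventions = [
  { name: "Training", cost: 10k to 20k },
  { name: "Equipment", cost: 50k to 80k },
]
```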
Tags and Annotations
@name, @doc, @hide, @showAs
- Use `@name` for simple descriptions and shortened units. Use `@doc` for further details (especially for detailing types, units, and key assumptions), when necessary. It's fine to use both `@name` and `@doc` on the same variable, but if so, don't repeat the name in the doc; use `@doc()` for additional information only.
- In `@name`, add units wherever it might be confusing, like `@name("Ball Speed (m/s)")`. If the units are complex or still not obvious, add more detail in the `@doc()`.
- For complex and important functions, use `@name` to name the function, and `@doc` to describe the arguments and return values. `@doc` should represent a docstring for the function. For example:
- Variables that are small function helpers, and that won't be interesting or useful to view the output of, should get a `@hide` tag. Key inputs and outputs should not have this tag.
- Use `@showAs` to format large lists as tables, and to show plots for dists and functions where appropriate.
- Tags are only useful when variables are externally accessible. If a function is only used internally inside a block, then don't use a tag. In these cases, regular comments are preferred.
For example:
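The tagging rules in this section might be applied as in the sketch below: a hidden helper, a documented function, and a block whose internals carry comments instead of tags. All names and numbers are hypothetical.

```squiggle
// Small helper that is not interesting to view on its own.
@hide
days_per_year = 365

@name("Annual Cost (USD/year)")
@doc("Converts a daily cost into an annual cost. daily_cost: cost per day in USD. Returns: cost per year in USD.")
annual_cost(daily_cost) = daily_cost * days_per_year

@name("Net Benefit (USD/year)")
net_benefit = {
  // Internal to the block, so a comment is used instead of a tag.
  benefit = 5k to 9k
  benefit - annual_cost(3 to 8)
}
```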
@format()
- Use `@format()` for numbers, distributions, and dates that could use obvious formatting.
- The `@format()` tag is not usable with dictionaries, functions, or lists. It is usable with variable assignments. Examples:
- This mainly makes sense for dollar amounts, percentages, and dates. `.0%` is a decent format for percentages, and `$,.0f` can be used for dollars.
- Choose the number of decimal places based on the stdev of the distribution or size of the number.
- Do not use `()` instead of `-` for negative numbers. So, do not use `($,.0f` for negative numbers, use `$,.0f` instead.
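A short sketch of `@format()` on variable assignments, using the two format strings named above. The variables and ranges are hypothetical.

```squiggle
// Dollar amounts: thousands separators, no decimals.
@name("Annual Revenue (USD)")
@format("$,.0f")
annual_revenue = 1.2M to 1.8M

// Percentages: one decimal, since the spread is around a percentage point.
@name("Monthly Churn")
@format(".1%")
monthly_churn = 2% to 6%
```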
Limitations
- There is no bignum type. There are floating point errors at high numbers (1e50 and above) and very small numbers (1e-10 and below). If you need to work with these, use logarithms if possible.
Comments
- Add a short 1-2 line comment on the top of the file, summarizing the model.
- Add comments throughout the code that explain your reasoning and describe your uncertainties. Give special attention to probabilities and probability distributions that are particularly important and/or uncertain. Flag your uncertainties.
- Use comments next to variables to explain what units the variable is in, if this is not incredibly obvious. The units should be wrapped in parentheses.
- There shouldn't be any comments about specific changes made during editing.
- Do not use comments to explain things that are already obvious from the code.
Visualizations
Tables
- Tables are a good way of displaying structured data. They can take a bit of formatting work.
- Tables are best when there are fewer than 30 rows and/or fewer than 4 columns.
- The table visualization is fairly simple. It doesn't support sorting, filtering, or other complex interactions. You might want to sort or filter the data before putting it in a table.
Notebooks
- Use the @notebook tag for long descriptions interspersed with variables. This must be a list with strings and variables alternating.
- If you want to display variables within paragraphs, generally render dictionaries as items within the notebook list. For example:
This format will use the variable tags to display the variables, and it's simple to use without making errors. If you want to display a variable that's already a dictionary, you don't need to do anything special.
- String concatenation (+) is allowed, but be hesitant to do this with non-string variables. Most non-string variables don't display well in the default string representation. If you want to display a variable, consider using a custom function or formatter to convert it to a string first. Note that tags are shown in the default string representation, so you should remove them (`Tag.clear(variable)`) before displaying.
- Make sure to format Squiggle numbers and Squiggle dates when used in notebooks. For example, "String(45.235, "%.2")".
- To convert distributions into strings, use scripts like `String(Dist.inv(dist, 0.05), ",.2f") + " to " + String(Dist.inv(dist, 0.95), ",.2f")`.
- Separate items in the list will be displayed with blank lines between them. This will break many kinds of formatting, like lists. Only do this in order to display full variables that you want to show.
Don't do:
Instead, do:
- Use markdown formatting for headers, lists, and other structural elements.
- Use bold text to highlight key outputs. Like, `"The optimal number of coffee cups per day is **" + String(optimal_cups, ",.2f") + "**"`.
Example: (For a model with 300 lines)
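A much shorter sketch than a 300-line model would need, showing only the shape of a notebook list. The variables are hypothetical, and the `String(value, format)` call follows this guide's own usage above.

```squiggle
@name("Optimal Coffee (cups/day)")
optimal_cups = 2.5

@name("Daily Coffee Cost (USD)")
@format("$,.2f")
daily_cost = optimal_cups * (3 to 5)

@notebook
summary = [
  "## Coffee Model Summary",
  "The optimal number of coffee cups per day is **" + String(optimal_cups, ",.2f") + "**.",
  "The estimated daily cost is shown below.",
  { daily_cost },
]
```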
Plots
- Plots are a good way of displaying the output of a model.
- Use `Scale.symlog()` and `Scale.log()` whenever you think the data is highly skewed. This is very common with distributions.
- Use `Scale.symlog()` instead of `Scale.log()` when you are unsure if the data is above or below 0. `Scale.log()` is preferred when you know the data is always positive, but `Scale.symlog()` is preferred if some data is or might be negative.
- Function plots use points equally spaced on the x-axis. This means they can fail if only integers are accepted. In these cases, it can be safer just not to use the plot, or to use a scatter plot.
- When plotting 2-8 distributions over the same x-axis, it's a good idea to use `Plot.dists()`. For example, if you want to compare 5 different costs of a treatment, or 3 different adoption rates of a technology, this can be a good way to display the data.
- When plotting distributions in tables, or if you want to display multiple distributions under each other and you don't want to use `Plot.dists`, it's a good idea to have them all use the same x-axis scale, with custom min and max values. This is a good way to make sure that the x-axis scale is consistent across all distributions. But again, it only makes sense if you have a small number of distributions and they use the same rough scale and x-axis.
Here's an example of how to display multiple distributions over the same x-axis, with a custom x-axis range:
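One way to sketch this is with `Plot.dists` and an explicit shared scale. This is an assumption about what fits here, and the treatments and ranges are hypothetical. A log scale is used because the values are always positive and skewed.

```squiggle
cost_a = 100 to 400
cost_b = 200 to 900
cost_c = 150 to 1500

@name("Treatment Cost Comparison (USD)")
cost_comparison = Plot.dists(
  {
    dists: [
      { name: "Treatment A", value: cost_a },
      { name: "Treatment B", value: cost_b },
      { name: "Treatment C", value: cost_c },
    ],
    // Custom range, so every distribution shares the same x-axis.
    xScale: Scale.log({ min: 50, max: 3k }),
  }
)
```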
Tests
- Use `sTest` to test Squiggle code.
- Test all functions that you are unsure about. Be paranoid.
- Use one describe block, with the variable name 'tests'. This should have several tests within it, each with one expect statement.
- Use @startClosed tags on variables that are test results. Do not use @hide tags.
- Do not test if function domains return errors when called with invalid inputs. The domains should be trusted.
- If you set variables to sTest values, @hide them. They are not useful in the final output.
- Do not test obvious things, like the number of items in a list that's hardcoded.
- Feel free to use helper functions to avoid repeating code.
- The expect.toThrowAnyError() test is useful for easily sanity-checking that a function is working with different inputs.
Example:
Summary Notebook
- For models over 5 lines long, you might want to include a summary notebook at the end of the file using the `@notebook` tag.
- Aim for a summary length of approximately (N^0.6) * 1.2 lines, where N is the number of lines in the model.
- Use the following structure:
  - Model description
  - Major assumptions & uncertainties (if over 100 lines long)
  - Outputs (including relevant Squiggle variables)
  - Key findings (flag if anything surprised you, or if the results are counterintuitive)
  - Detailed analysis (if over 300 lines long)
  - Important notes or caveats (if over 100 lines long)
- The summary notebook should be the last thing in the file. It should be a variable called `summary`.
- Draw attention to anything that surprised you, or that you think is important. Also, flag major assumptions and uncertainties.
- There should be a mix of paragraphs and bullet points.
- There should be a mix of paragraphs and bullet points.
Example: (For a model with 300 lines)
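As a rough check of the length guideline, the formula can be written as a small function. The function name is hypothetical. For a 300-line model it gives a summary of about 37 lines.

```squiggle
// Suggested summary length in lines, following (N^0.6) * 1.2.
summary_length(model_lines) = model_lines ^ 0.6 * 1.2

// For N = 300 this evaluates to roughly 37.
length_for_300 = summary_length(300)
```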