Competition rules

Updated version for the 2024 competition. The rules now require predictions for 2024 only, with optional prediction intervals for 2024. We also require a Quarto document that reproduces the results. See details below.

Climate change stresses environments in urgent and unpredictable ways, from increasing the frequency of natural disasters to threatening the availability of food and drinkable water. Even in Mason’s local community, the DMV area, climate change alters the timing of major life-cycle events—perhaps most notably the day cherry trees bloom in D.C. These trees are a visible marker of the broad and largely invisible impact of climate change.

Over the previous decade, cherry trees have bloomed earlier than in any other decade on record. But variation in climate conditions makes annual predictions extremely difficult. According to the National Park Service,

Forecasting peak bloom is almost impossible more than 10 days in advance. The cherry trees’ blossom development is dependent on weather conditions. National Park Service horticulturists monitor bud development and report the status of the blossoms.

National Park Service (https://www.nps.gov/subjects/cherryblossom/bloom-watch.htm)

That’s where you come in!

For this competition, we seek accurate, interpretable predictions that offer strong narratives about the factors that determine when cherry trees bloom and the broader consequences for local and global ecosystems. We will provide you with all the publicly available data on the bloom date of cherry trees we can find, including Washington, D.C. (USA), Kyoto (Japan), and Liestal-Weideli (Switzerland). You will then use this data in combination with any other publicly available data (e.g., climate data) to provide reproducible predictions of the peak bloom date of trees at the following five sites:

| Location | Latitude (°) | Longitude (°) | Altitude (m) | Years available | Bloom definition | Species |
|---|---|---|---|---|---|---|
| Kyoto (Japan) | 35.0120 | 135.6761 | 44 | 801–2023 | 80% | Prunus jamasakura |
| Liestal-Weideli (Switzerland) | 47.4814 | 7.730519 | 350 | 1895–2023 | 25% | Prunus avium |
| Washington, D.C. (USA) | 38.8853 | -77.0386 | 0 | 1921–2023 | 70% | Prunus × yedoensis ‘Somei-yoshino’ |
| Vancouver, BC (Canada) | 49.2237 | -123.1636 | 24 | 2022–2023 ¹ | 70% | Prunus × yedoensis ‘Akebono’ |
| New York City, NY (USA) | 40.73040 | -73.99809 | 8.5 | 2019–2023 ² | 70% | Prunus × yedoensis |

¹ The Vancouver Cherry Blossom Festival has updated information on the bloom status. Moreover, casual observations have been recorded in the form of photos posted to the VCBF Neighbourhood Blog for Kerrisdale. You can search the forum for the keywords “Akebono” (the name of the cultivar) and “Maple Grove Park” (the location of the trees).

² These data can be found in the data files provided by USA-NPN. The site ID for Washington Square Park is 32789 and the species ID is 228.

Your task is to predict the peak bloom date for 2024 at each site and, optionally, to estimate a prediction interval: a lower and an upper endpoint between which peak bloom is most probable. The organizers must be able to reproduce your predictions. Complete entries will then be evaluated in multiple categories:

  1. best prediction for 2024 (for all five sites),
  2. most accurate prediction intervals for 2024 (for all five sites),
  3. best narrative (500–1000 words),
  4. best use of the data provided by USA-NPN.

The first category will be judged by the mean absolute error relative to the peak bloom dates declared by the National Park Service (Washington, D.C.), Japan Meteorological Agency (Kyoto), MeteoSwiss (Liestal-Weideli), the Vancouver Cherry Blossom Festival in collaboration with the UBC Botanical Garden (Vancouver, BC), and the Washington Square Park Eco Projects in collaboration with the Local Nature Lab (New York City, NY). The second category will be determined by whether the prediction intervals cover the true peak bloom dates and by the length of the prediction intervals; shorter intervals covering the true peak bloom date receive more points than wider intervals or intervals not covering the true peak bloom date. The prizes for the best narrative and the best use of the data provided by USA-NPN will be based on the reviews of expert judges.
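The two quantitative criteria can be illustrated with a short Python sketch. It computes the mean absolute error of a set of day-of-year predictions and reports, for each site, whether the interval covers the observed day and how long the interval is. The observed values are placeholders invented for illustration, the helper names are hypothetical, and counting the length as the number of days including both endpoints is an assumption. The organizers' actual point scheme for intervals is not specified here and is not reproduced.

```python
# Illustrative only: the "observed" values are made-up placeholders,
# not real 2024 peak bloom dates.
predicted = {"washingtondc": 22, "liestal": 30, "kyoto": 25,
             "vancouver": 32, "newyorkcity": 35}
observed = {"washingtondc": 24, "liestal": 29, "kyoto": 25,
            "vancouver": 36, "newyorkcity": 30}
intervals = {"washingtondc": (15, 24), "liestal": (28, 34), "kyoto": (20, 30),
             "vancouver": (20, 40), "newyorkcity": (20, 50)}


def mean_absolute_error(pred, obs):
    """Average absolute difference, in days, over all sites in `obs`."""
    return sum(abs(pred[site] - obs[site]) for site in obs) / len(obs)


def interval_summary(bounds, obs):
    """For each site, report whether the inclusive interval covers the
    observed day and how many days the interval spans."""
    summary = {}
    for site, (lower, upper) in bounds.items():
        summary[site] = {
            "covers": lower <= obs[site] <= upper,
            # Assumption: length counts both endpoints.
            "length": upper - lower + 1,
        }
    return summary


mae = mean_absolute_error(predicted, observed)
coverage = interval_summary(intervals, observed)
print(f"MAE: {mae:.1f} days")
for site, info in coverage.items():
    print(site, info)
```

A narrow interval that misses the observed day and a very wide one that covers it both score poorly under the stated rule, so the interval width should reflect the real uncertainty of the model.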

For an entry to be complete, it must be submitted before February 29th, 2024, midnight AoE (Anywhere on Earth). It must contain all five predictions, optionally five prediction intervals, and the data and code to reproduce the results as a Quarto document. It must also contain a short narrative of 500 to 1000 words. Entries that cannot be reproduced, or that lack a sufficiently complete and coherent narrative, will be rejected at the discretion of the organizers.

Complete entries will be eligible to win more than $5,000 in cash and prizes, split among the categories. Select entries will be hosted on this competition website and promoted by the George Mason Department of Statistics, our partners, and sponsors. Any participant can be part of only one team.

How to enter the competition

A complete entry to the competition consists of the following parts:

  1. Predictions for the five sites for 2024 (as CSV file, see below for a template).
  2. Optionally, prediction intervals for the five sites for 2024 in the same CSV file as above. Each interval includes both its lower and upper endpoints and must contain the predicted peak bloom date, i.e., lower endpoint ≤ prediction ≤ upper endpoint.
  3. A short, blinded narrative of 500–1000 words. The narrative must be uploaded as a PDF document and must not contain any identifying information (author names, affiliations, etc.). HTML or Microsoft Word documents should be exported to PDF.
  4. A link to a publicly accessible Git repository with all code and data required to reproduce the analysis.

The predictions are to be uploaded as a text file with comma-separated values (CSV), containing the five site IDs, the predictions for the five sites as the day of the year (Jan 1st = 1, Jan 2nd = 2, …), and optionally the lower and upper endpoints of the prediction intervals. The CSV file must have exactly 6 lines (1 header line, 5 lines of predictions) and either 2 columns (location, prediction) or 4 columns (location, prediction, lower, upper), exactly in the following format:

"location","prediction"
"washingtondc",22
"liestal",30
"kyoto",25
"vancouver",32
"newyorkcity",35

If you also estimate prediction intervals, the file should look like this:

"location","prediction","lower","upper"
"washingtondc",22,15,24
"liestal",30,28,34
"kyoto",25,20,30
"vancouver",32,20,40
"newyorkcity",35,20,50

Note: These predictions are not based on statistical modeling and are only to demonstrate the format of the file. For additional guidance see the GitHub template repository.
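For convenience, here is a minimal Python sketch for producing and checking a file in this format. It converts a calendar date to the day of the year, writes the four-column variant, and verifies the constraints stated above (6 lines, one of the two headers, the five site IDs, and lower endpoint ≤ prediction ≤ upper endpoint). The function names and the placeholder dates are hypothetical and are not part of the official template. Treating all values as integers is an assumption based on the examples, and the organizers' own checks may differ.

```python
import csv
import io
from datetime import date

SITES = ["washingtondc", "liestal", "kyoto", "vancouver", "newyorkcity"]
HEADERS = (["location", "prediction"],
           ["location", "prediction", "lower", "upper"])


def day_of_year(d):
    """Jan 1st = 1, Jan 2nd = 2, ... (leap years are handled by datetime)."""
    return d.timetuple().tm_yday


def write_submission(rows, handle):
    """Write (location, prediction, lower, upper) rows; only text is quoted."""
    writer = csv.writer(handle, quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator="\n")
    writer.writerow(HEADERS[1])
    writer.writerows(rows)


def check_submission(text):
    """Return a list of problems; an empty list means none were found."""
    lines = list(csv.reader(io.StringIO(text)))
    if len(lines) != 6:
        return [f"expected 6 lines, found {len(lines)}"]
    header, body = lines[0], lines[1:]
    if header not in HEADERS:
        return [f"unexpected header: {header}"]
    problems = []
    for row in body:
        if len(row) != len(header):
            problems.append(f"wrong number of columns: {row}")
            continue
        try:
            # Assumption: day-of-year values are whole numbers.
            values = [int(v) for v in row[1:]]
        except ValueError:
            problems.append(f"non-integer value: {row}")
            continue
        if len(values) == 3 and not values[1] <= values[0] <= values[2]:
            problems.append(f"prediction outside interval: {row}")
    found = sorted(row[0] for row in body if row)
    if found != sorted(SITES):
        problems.append(f"unexpected site IDs: {found}")
    return problems


# Placeholder dates (prediction, lower, upper), NOT real forecasts.
placeholder = (date(2024, 4, 1), date(2024, 3, 25), date(2024, 4, 8))
rows = [[site] + [day_of_year(d) for d in placeholder] for site in SITES]

buffer = io.StringIO()
write_submission(rows, buffer)
submission_text = buffer.getvalue()
print(submission_text)
print(check_submission(submission_text))
```

Running a check like this before uploading catches the most common formatting mistakes, such as a missing site or an interval that does not contain the prediction.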