Statistics documentation

Supporting information for daily data statistical values

Statistics overview

Daily data statistics are available throughout the USGS Water Data APIs and Water Data for the Nation ecosystem. These statistics are generated from approved daily value data. Daily data are supplied by Water Science Centers and, in general, are the mean (average) of all continuously sampled data for a single day.

Learn more about daily and continuous data

Types of statistics include standard monthly and annual values as well as historical calculations based on the specific month or day of the year. The same logic is used in calculations of statistics across the ecosystem.

Description of features offered

Day of year statistics

Day of year statistics are calculated based on all approved historic observations for a numeric day of the year. For example, January 1 is day 1 of a calendar year and December 31 is day 365 of a non-leap year.

Day of year statistics represent the expected conditions for a generic day of the year at a location based on approved historical observations.

Day of year statistics available include the daily mean of daily means, daily maximum of daily means, daily median of daily means, daily minimum of daily means, and daily percentiles of daily means based on the same numeric day of the year in the period of record.
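The grouping described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used by Water Data for the Nation: the `daily_means` values are hypothetical, only the standard library is used, and leap year adjustments are ignored by keeping the sample dates in January.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

# Hypothetical approved daily means (date -> value); not real USGS data.
daily_means = {
    date(2020, 1, 1): 10.0,
    date(2021, 1, 1): 14.0,
    date(2022, 1, 1): 12.0,
    date(2020, 1, 2): 20.0,
    date(2021, 1, 2): 30.0,
}

# Group values by numeric day of the year (January 1 -> 1).
by_day = defaultdict(list)
for day, value in daily_means.items():
    by_day[day.timetuple().tm_yday].append(value)

# One set of statistics per day of year, pooled across all years.
day_of_year_stats = {
    doy: {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }
    for doy, values in sorted(by_day.items())
}
```

Each key in `day_of_year_stats` is a day number, and each value summarizes every year's observation for that day.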

Month of year statistics

Month of year statistics are calculated based on all approved historic observations within a given month. For example, all October data ever collected and approved at a monitoring location would be included in one set of monthly statistics.

Month of year statistics represent the expected conditions for a generic month of the year at a location based on approved historical observations.

Month of year statistics available include monthly mean of daily means, monthly median of daily means, monthly maximum of daily means, monthly minimum of daily means, and monthly percentiles of daily means based on the same month in the period of record.

Monthly statistics

Monthly statistics are calculated based on the approved data for a specific month. These statistics are reported for each month of each year available. For example, statistics would be available for June 1984.

Monthly statistics represent the conditions that existed for a specific month when data was collected.

Monthly statistics available include mean of daily means, maximum of daily means, median of daily means, minimum of daily means, and percentiles of daily means for each specific month in the period of record.
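The difference between month of year statistics and monthly statistics comes down to the grouping key: month alone versus year and month together. The sketch below uses hypothetical values and pools daily means directly, which is an assumption made for illustration; the production calculation may aggregate differently.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical approved daily means; not real USGS data.
daily_means = {
    date(1984, 6, 1): 4.0,
    date(1984, 6, 2): 6.0,
    date(1985, 6, 1): 8.0,
    date(1985, 6, 2): 10.0,
    date(1985, 7, 1): 3.0,
}

month_of_year = defaultdict(list)  # key: month, all years pooled
monthly = defaultdict(list)        # key: (year, month), one specific month

for day, value in daily_means.items():
    month_of_year[day.month].append(value)
    monthly[(day.year, day.month)].append(value)

month_of_year_mean = {m: mean(v) for m, v in month_of_year.items()}
monthly_mean = {ym: mean(v) for ym, v in monthly.items()}
```

Here `month_of_year_mean[6]` describes a generic June, while `monthly_mean[(1984, 6)]` describes June 1984 only.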

Annual statistics

Annual statistics are calculated based on yearly records of approved data.

Annual statistics are calculated for calendar years (starting January 1) and water years (starting October 1).

Annual statistics represent the conditions that existed for a specific year (calendar or water) when data was collected.

Annual statistics available include mean of daily means, maximum of daily means, median of daily means, minimum of daily means, and percentiles of daily means for each specific year in the period of record.
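To group daily values by water year, each date first needs a water year label. The helper below is a hypothetical sketch that assumes the usual USGS convention of naming a water year after the calendar year in which it ends.

```python
from datetime import date


def water_year(day: date) -> int:
    """Return the water year for a date; October 1 starts the next water year."""
    return day.year + 1 if day.month >= 10 else day.year


# September 30, 2023 closes water year 2023; October 1, 2023 opens water year 2024.
last_day = water_year(date(2023, 9, 30))
first_day = water_year(date(2023, 10, 1))
```

Grouping by `water_year(day)` instead of `day.year` is the only change needed to switch an annual calculation from calendar years to water years.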

Annual Cumulative statistics

Annual Cumulative statistics are generated on-the-fly when rendering accumulation graphs and are not available through an API.

Annual cumulative graphs are generated for discharge when there are at least 10 years of daily mean data. The graphs render a shaded section which shows the bounds of the 25th and 75th percentiles for the daily mean data accumulated up to that day in the year. The daily mean data is also accumulated for the year and is rendered as a dark line ending at the last value that has been received for the current year.
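The accumulation and the shaded band can be sketched as follows. This is an illustration under stated assumptions: the data are hypothetical, the running total is a plain sum of daily means with no unit conversion, and the quartile interpolation method ("inclusive") is a guess rather than the documented behavior.

```python
from itertools import accumulate
from statistics import quantiles

MIN_YEARS = 10  # minimum years of daily mean data stated above

# Hypothetical data: 10 years of 3 days each; year k has a constant daily mean of k.
daily_by_year = {2000 + k: [float(k)] * 3 for k in range(1, 11)}

# Running total of daily means within each year.
cumulative = {year: list(accumulate(values)) for year, values in daily_by_year.items()}

band = []  # (25th percentile, 75th percentile) of the running totals, per day
if len(cumulative) >= MIN_YEARS:
    n_days = min(len(totals) for totals in cumulative.values())
    for i in range(n_days):
        totals = [cumulative[year][i] for year in cumulative]
        # Quartiles across years for this day; the method is an assumption.
        q25, _, q75 = quantiles(totals, n=4, method="inclusive")
        band.append((q25, q75))
```

With fewer than 10 years of data, `band` stays empty, mirroring the rule that the graph is not generated.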

View a sample accumulation graph

Where/how to access statistics data

Daily data statistics are available in many features of the Water Data for the Nation user interface. Statistics data are also available for download and are accessible through statistics APIs.

Day of year statistics on MLP

Water Data for the Nation Monitoring location pages deliver day of year statistics for a selected parameter when available. Day of year statistics are calculated based on the numeric day of the year. These data appear under the graph and show historic statistics for the selected parameter.

View a sample Monitoring location page

Statistics Tables

Statistics tables are available on the Statistics page associated with each monitoring location. Users can use controls to select their preferred statistics. Values only appear for days that exist in the month. Tables will display 'NA' if a value cannot be calculated due to lack of data.

View a sample Statistics page

Graphical statistics display

Within Water Data for the Nation, daily statistical data is available through several graphical options.

  • Monitoring location pages

    When available, day of year median daily mean data can be shown on Monitoring location page graphs with continuous data. This option is selectable from within the Continuous data selection area.

    View a sample Monitoring location page

  • All graphs for continuous data

    The All graphs for continuous data page displays a graph for every data type at a monitoring location that has collected continuous data in the last 120 days. By default, if day of year median daily mean data is available, that data is shown on the graph.

    View a sample All Graphs page

  • Daily graphs

    The Daily graphs page will show two specialty graphs when daily statistics are available.

    • Duration graph

      Duration graphs allow recent observations to be viewed in the context of expected values based on historical data for a data type at a monitoring location.

      A duration graph is a graphical presentation of recent daily mean discharge (streamflow) observed at an individual monitoring location, plotted over the long-term statistics for discharge for each day of the year at that location.

      The statistics for the duration graph are based on quality assured and approved data which include the maximum daily mean discharge recorded during the period of record for each day of the year, the 90th percentile discharge for each day, the interquartile range (the area bounded by the 75th and 25th percentiles), the 10th percentile, and the minimum mean daily discharge recorded for each day.

      The plot covers a period of two years with the statistics being identical for each of the two years.

      Data used to generate these graphs come from daily or monthly statistics, and the most recent observations come from the daily values service.

      Duration graphs come in two varieties: one that uses day of year statistics and another that uses month of year. In general each type of duration graph works well with certain data types. For example:

    • Accumulation graph

      The Accumulation graph is a graphical presentation of recent cumulative daily mean data observed at a monitoring location, plotted over the cumulative day of year statistics of discharge. The graph shows the cumulative 25th and 75th percentile bins as a shaded area for the entire year with the accumulated daily mean data up to the last available daily data for the year.

      View a sample accumulation graph

Download functionality via user interface

Statistical data may also be downloaded from the user interface.

Application Programming Interface (API)

Statistical data can be accessed through an API. View the Statistics API documentation

To learn more about the Statistics API and see and test sample queries, try the Swagger documentation.

Most recent statistic on maps

Most recent conditions are plotted on maps to help visualize how current data compare to historic data. Maps show active monitoring locations as colored dots that represent the most recent observation in context to historical conditions for the same day of the year. Maps also allow for the comparison of spatial variation of current conditions. Features that show these mapped statistics include the National Water Dashboard and Water Data for the Nation State pages.

Terms and descriptions used in calculations

Minimum

The minimum is the smallest value in a time series. The minimum is useful for understanding the extent of variation in the data and can help identify any potential outliers or unusual values. In practical terms, knowing the minimum is important for assessing the overall context of the data, such as determining the lowest measurement.

Maximum

The maximum is the largest value in a time series. It indicates the upper limit of the range and shows the highest observation recorded. Similar to the minimum, the maximum is an important statistic for understanding data variation and can also help identify outliers.

Median

The median is the middle value in a data set when the values are arranged in ascending or descending order. If there is an even number of values, the median is the average of the two middle numbers. The median is helpful for understanding the central tendency, especially in skewed distributions.

Mean

The mean, often called the average, is calculated by adding all the values in a time series and then dividing by the number of values. It provides a measure of central tendency, indicating the typical value in the data set. The mean is less resistant to changes in magnitudes of outlying observations than the median.

Percentile

A percentile indicates the relative standing of a data value when data are sorted into numerical order from smallest to largest. Percentile classes offered by Water Data for the Nation are the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.
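The terms above map directly onto NumPy functions. The sketch below uses hypothetical values and NumPy's default (linear) percentile interpolation, which is an assumption; this document does not state which percentile method the service uses.

```python
import numpy as np

# Hypothetical daily means with one large outlying value.
values = np.array([2.0, 4.0, 4.0, 10.0, 30.0])

summary = {
    "min": float(np.min(values)),
    "max": float(np.max(values)),
    "median": float(np.median(values)),
    "mean": float(np.mean(values)),
}

# Percentile classes listed above.
classes = [5, 10, 25, 50, 75, 90, 95]
percentiles = dict(zip(classes, np.percentile(values, classes).tolist()))
```

In this sample the single high value pulls the mean well above the median, which illustrates why the median is the more resistant measure of central tendency.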

Day of year statistic

The day of year statistic is a historic value based on the period of record for a specific day through time. A value of “NA” means no data was reported for that day of the year.

Yearly statistic

A yearly statistic is determined by the values of a given year. This statistic can be calculated on the water year (October 1st to September 30th) or the calendar year.

Monthly statistics

A monthly statistic is computed from the values of a given month in a given year.

Month of Year statistics

A month of year statistic is a historic value based on the period of record for a specific month through time. A value of “NA” means no data was reported for that month of the year.

How are daily data statistics calculated

What data is used

Water Data for the Nation derives statistics from the USGS Water Data Daily Values API. These values reflect the measurements from continuous monitoring locations across the USGS network. Field visit data are not considered in statistics calculations. No checks on incorrect, missing, or negative values are performed. The statistical methods do not take into consideration discrepancies in discharge (streamflow) data that may potentially skew calculations, such as:

  • negative flows associated with tidal-influenced locations or other phenomena
  • regulated flows or transitions from regulated to unregulated flows (or vice versa)
  • major watershed changes

View the USGS Water Data Daily Values API documentation

For a hands-on view of the API and to test sample queries, try the Swagger documentation for Daily data.

Differences in daily data availability

While USGS collects hydrologic data at many monitoring locations across the country, daily data are only computed for locations that are collecting continuous data at a monitoring station. The availability of daily data may vary over time due to various factors that affect data collection at a monitoring location, such as ice or equipment failure.

How to handle multiple daily data types if available in statistics output

Some USGS offices choose to calculate multiple daily values on one parameter, such as calculating the mean daily value, the maximum daily value, and the value at midnight. The Statistics API derives statistics from all available daily values. Water Data for the Nation’s user interface uses the mean daily value for statistics calculations and display.

Significant Figures

Significant figures are not handled by the current version of the Statistics API. Instead, rounding rules are applied. See Precision section below for additional details.

Minimum period of record/number of observations

Any time-series with approved public data will have calculated statistics. There is no minimum amount of data that is required for the statistic to be calculated.

The API provides a 'sample_count' in its response which indicates the number of observations used in the calculation.

  • The 'sample_count' values from the 'observationNormal' endpoint reflect the number of daily and monthly values for day of year and month of year respectively.

  • The 'sample_count' values from the 'observationInterval' endpoint represent the number of daily values used for the interval. For example, a day of year statistic for May 4 will indicate the number of May 4ths used in the calculation. Month of year calculations specify the number of monthly medians used. Intervals indicate the number of daily observations within the interval. When new May 4th data is available, the statistics are recalculated and the 'sample_count' increases.
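The May 4 example can be sketched as a simple count. The function below is hypothetical and only borrows the 'sample_count' name from the API response; it is not the API's implementation, and the dates are made up.

```python
from datetime import date

# Hypothetical dates that have approved daily values.
approved_dates = [
    date(2021, 5, 4),
    date(2022, 5, 4),
    date(2022, 5, 5),
    date(2023, 5, 4),
]


def sample_count(dates, month, day):
    """Count the observations that fall on a given calendar day, across years."""
    return sum(1 for d in dates if (d.month, d.day) == (month, day))


before = sample_count(approved_dates, 5, 4)

# A newly approved May 4 value raises the count on the next recalculation.
approved_dates.append(date(2024, 5, 4))
after = sample_count(approved_dates, 5, 4)
```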

Only using approved data

Statistics are calculated using approved data. Provisional data are not used in calculating statistics. However, provisional data may be displayed on graphs to reflect the most recent observations.

Statistics are recalculated as new approved data becomes available or when approved data is revised.
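A minimal sketch of the approved-only rule follows. The record layout and the "approval" field name are hypothetical stand-ins for illustration, not the Daily Values API schema.

```python
from statistics import mean

# Hypothetical daily value records; field names are assumptions.
records = [
    {"date": "2024-05-01", "value": 12.0, "approval": "Approved"},
    {"date": "2024-05-02", "value": 16.0, "approval": "Approved"},
    {"date": "2024-05-03", "value": 40.0, "approval": "Provisional"},
]

# Only approved values feed the statistics.
approved_values = [r["value"] for r in records if r["approval"] == "Approved"]
approved_mean = mean(approved_values)

# Provisional values are kept aside; they may still be drawn on a graph.
provisional_values = [r["value"] for r in records if r["approval"] != "Approved"]
```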

View information about USGS provisional data

Time spans used in calculations

Period Of Record

A period of record (POR) is defined as the data from oldest to newest, inclusive of the most recent data point requested. Sometimes, local USGS data managers set a different start date, which adjusts the time frame where statistical calculations are valid. This adjustment eliminates inconsistency in data due to changing datums, hydrologic regulation changes, or other impacts that may skew statistics.
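The effect of a manager-defined start date can be sketched as a filter on observation dates. The function and its `start_override` parameter are hypothetical names used for illustration, and the sketch assumes at least one observation falls inside the period.

```python
from datetime import date


def period_of_record(dates, start_override=None):
    """Return (start, end, dates_in_period) for a list of observation dates."""
    ordered = sorted(dates)
    # Use the oldest observation unless a data manager has set a different start.
    start = ordered[0] if start_override is None else start_override
    in_period = [d for d in ordered if d >= start]
    return start, in_period[-1], in_period


observed = [date(1950, 10, 1), date(1975, 6, 1), date(2024, 9, 30)]
start, end, kept = period_of_record(observed, start_override=date(1970, 1, 1))
```

With the override in place, the 1950 observation falls outside the period and would not contribute to statistics.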

Packages and logic used

NumPy

The code used to produce the statistics uses the Python library NumPy for arithmetic mean, median, minimum, and maximum calculations.

View the NumPy documentation

Pandas

The Python library pandas holds the data to be computed in a data frame, where functions can be applied.

View the Pandas documentation

Hydrologic Surface Water Analysis Package (HySWAP)

The USGS Python package hyswap is used in calculation of day of year, water year, and calendar year percentile classes.

View the HySWAP documentation

Hydrologic Analysis Package (HASP)

The calculations performed to produce month of year statistics are a Python implementation of pertinent R code originally implemented in HASP.

View the HASP documentation

Calculation considerations

The logic below dictates how statistical calculations are made. These rules are shared for visibility and replicability.

Precision

The rounding logic takes a numeric value and rounds to three decimal places using Python’s built-in 'round()' function. The result is that the values returned will have more precision than the legacy system provided. Display values will have fewer than three decimal places if the raw calculation result has fewer than three decimal places.
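The rounding rule can be illustrated with the built-in 'round()' function directly. The wrapper name below is hypothetical. Note that 'round()' operates on binary floating-point values, so a value that appears to sit exactly halfway between two results may round in either direction.

```python
def round_stat(value: float) -> float:
    """Round a calculated statistic to three decimal places."""
    return round(value, 3)


long_value = round_stat(1.23456)  # more than three decimals: rounded
short_value = round_stat(2.5)     # fewer than three decimals: unchanged
displayed = str(short_value)      # no trailing zeros are added for display
```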

Handling of non-numeric values

If a statistics calculation results in one or more nan (not a number) values, the API returns a 'nan' for the individual result or percentile class.
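The sketch below shows two ways a nan can arise in a calculation and how it can be detected. It is a plain Python illustration with a hypothetical helper; it does not show how the API serializes the value.

```python
import math


def mean_or_nan(values):
    """Arithmetic mean that returns nan when there is nothing to average."""
    if not values:
        return math.nan
    return sum(values) / len(values)


results = {
    "empty": mean_or_nan([]),                # no data: nan
    "with_nan": mean_or_nan([1.0, math.nan]),  # nan input propagates to the result
    "ok": mean_or_nan([1.0, 3.0]),
}

# nan never compares equal to itself, so use math.isnan to detect it.
nan_keys = [key for key, value in results.items() if math.isnan(value)]
```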

Visualization rules

Plots and graphs generated in the Water Data for the Nation user interface are rendered using the daily mean value in statistics calculations. Plots and graphs, or components of them, will not render if not enough data exist. For example, if a month only has 9 observations instead of the required 10 observations, it will not appear on a monthly duration graph.
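The 10-observation rule from the example above amounts to a filter applied before plotting. The counts below are hypothetical, and the constant name is an assumption.

```python
MIN_OBSERVATIONS = 10  # required observations per month, from the example above

# Hypothetical number of observations available for each month of the year.
counts_by_month = {1: 12, 2: 9, 3: 10}

# Months with too few observations are left off the monthly duration graph.
plottable_months = [
    month for month, count in counts_by_month.items() if count >= MIN_OBSERVATIONS
]
```

Month 2, with 9 observations, is excluded, while month 3 meets the threshold exactly and is kept.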

Leap Years

For visualizing accumulated values for leap years, February 29th's day of year statistics are only accumulated if there are 10 years' worth of data. This is true for any day of year statistics, but generally all other days of the year will have data for the same number of years.

For more detailed information about leap year calculations, view the HySWAP leap year adjustment documentation.

Changes from legacy

NWISWeb (the legacy version of Water Data for the Nation) also provided statistical calculations and plots. The methodology used to derive statistics and generate plots has changed, and this document provides users full visibility into the methods currently used in Water Data for the Nation. NWISWeb and Water Data for the Nation statistics may not always be identical, and we encourage users to contact us with any questions.

Caveats and Constraints

Improved handling of significant figures will be addressed in a future release.

References

US Geological Survey. (n.d.). hyswap: Tools for hydrologic time-series data exchange and transformation. https://github.com/DOI-USGS/hyswap

US Geological Survey. (n.d.). HASP: Hydrologic Analysis Package. https://github.com/DOI-USGS/HASP

The pandas development team. (n.d.). pandas. https://pandas.pydata.org

NumPy Developers. (n.d.). NumPy. https://numpy.org

Disclaimer

Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Although this information product, for the most part, is in the public domain, it also may contain copyrighted materials as noted in the text. Permission to reproduce copyrighted items must be secured from the copyright owner.