#uncomment any of these that you don't already have installed
#install.packages("tidyverse")
#install.packages("hms")
#install.packages("pracma")
# Load in the packages
library(tidyverse)
library(hms)
library(pracma)
Streamflow calculations using the dilution tracer method
Overview
Calculating streamflow can be difficult in small mountainous streams due to changing flows, rocky bed surfaces, and steep, fast-flowing sections followed by pools. While there are a multitude of ways to calculate streamflow, one standard procedure is the dilution tracer method.
This method involves injecting the stream with a known amount of salt and measuring the electrical conductivity. A sensor is placed in the stream and starts collecting the background conductivity of the stream. Then, a known amount of salt is dissolved in a bucket of stream water and dumped upstream of the sensor. As the salty water passes the sensor, the electrical conductivity rises and then drops back to the baseline conductivity once all the salt has passed through. This is called the breakthrough curve. Discharge (Q), or streamflow, can then be calculated from the area under the curve and the mass of the salt.
However, the sensor reads the electrical conductivity in uS/cm. So in order to use the electrical conductivity, the values need to be run through a calibration curve to convert them to concentration in g/L. From there, discharge can be calculated.
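To make the end goal concrete: the discharge calculation divides the injected salt mass by the time-integrated, background-corrected concentration. A minimal sketch with made-up numbers (the real breakthrough data comes later; trapz() is from the pracma package loaded at the top):

```r
library(pracma)  # for trapz(), the trapezoidal-rule integrator

# Hypothetical breakthrough curve - these values are invented for illustration
mass_g   <- 500                                                    # salt mass injected (g)
time_s   <- seq(0, 90, by = 10)                                    # seconds since injection
conc_g_L <- c(0, 0.01, 0.06, 0.12, 0.09, 0.05, 0.02, 0.01, 0, 0)  # concentration above background (g/L)

area  <- trapz(time_s, conc_g_L)  # area under the curve (g*s/L)
Q_L_s <- mass_g / area            # discharge in L/s
```

The key point is that the conductivity record has to be converted to concentration (g/L) before this integration makes physical sense, which is why we need the calibration curve first.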
While there are textbook standards of a calibration curve, it is more accurate to create a calibration curve for your specific sensor.
This can be completed in the lab using DI water and dissolving small amounts of salt into the water.
While this process sounds relatively simple, when I was trying to calculate streamflow with data I collected, there was no standard code I could follow. Thus, through much trial and error, I was able to:
- create a calibration curve
- calculate discharge
Therefore, the purpose of this tutorial is to walk through how to create a calibration curve from data collected in the lab, and then how to use the slope from that calibration curve in the discharge calculations.
The data used in this tutorial is the raw data collected when creating the calibration curve. For 5 different salt masses (0 g, 0.5 g, 1.0 g, 1.5 g, and 2.0 g), the sensor collected:
- the datetime
- electrical conductivity in uS/cm
- temperature
The stream dilution tracer data is an example dataset of the data the sensor collects which is:
- the datetime
- electrical conductivity in uS/cm
- temperature
All datasets will be read in from a .csv file. Metadata sheets for the calibration and streamflow data are also included; they contain the important times needed during the calculations.
Calibration Curve
Read in the data
First we need to install and load in the packages needed.
Now we can read in DI_roomtemp_intervals_test2.csv. This csv contains data collected in DI water, shows the salt intervals, and comes from the second test we conducted.
# create a new dataframe by reading in the csv file DI_roomtemp_intervals_test2.csv
cal <- read_csv("DI_roomtemp_intervals_test2.csv", show_col_types = FALSE)
head(cal)
# A tibble: 6 × 1
`Plot Title: BridgerCreek_EC`
<chr>
1 #,Date Time, GMT-06:00,Full Range, μS/cm (LGR S/N: 10778332, SEN S/N: 1077833…
2 1,03/24/26 11:32:34 AM,275.2,21.09,,,,,
3 2,03/24/26 11:32:35 AM,274.8,21.07,,,,,
4 3,03/24/26 11:32:36 AM,274.5,21.07,,,,,
5 4,03/24/26 11:32:37 AM,274.1,21.07,,,,,
6 5,03/24/26 11:32:38 AM,274.1,21.07,,,,,
Hmmm… well that doesn’t look right. Why don’t we re-read the csv, skip the first row (the plot title) so the data is properly separated into columns, and keep only columns 2 through 4, which are the datetime, conductivity, and temperature.
#re-read in the csv with better data organization
cal <- read_csv("DI_roomtemp_intervals_test2.csv",
skip = 1,
col_select = 2:4,
show_col_types = FALSE)
head(cal)
# A tibble: 6 × 3
`Date Time, GMT-06:00` Full Range, μS/cm (LGR S/N: 10…¹ Temp, °C (LGR S/N: 1…²
<chr> <dbl> <dbl>
1 03/24/26 11:32:34 AM 275. 21.1
2 03/24/26 11:32:35 AM 275. 21.1
3 03/24/26 11:32:36 AM 274. 21.1
4 03/24/26 11:32:37 AM 274. 21.1
5 03/24/26 11:32:38 AM 274. 21.1
6 03/24/26 11:32:39 AM 274. 21.0
# ℹ abbreviated names:
# ¹`Full Range, μS/cm (LGR S/N: 10778332, SEN S/N: 10778332)`,
# ²`Temp, °C (LGR S/N: 10778332, SEN S/N: 10778332)`
Okay, that is better - but the column names are far too long and complicated. Let’s fix that! We want column 1 to be titled “time”, column 2 to be titled “conductivity_muS_cm”, and column 3 to be “temperature_C”. This gives informative titles that include the units of the measurements as well!
#Clean up column names
colnames(cal)[1] <- "time"
colnames(cal)[2] <- "conductivity_muS_cm"
colnames(cal)[3] <- "temperature_C"
head(cal)
# A tibble: 6 × 3
time conductivity_muS_cm temperature_C
<chr> <dbl> <dbl>
1 03/24/26 11:32:34 AM 275. 21.1
2 03/24/26 11:32:35 AM 275. 21.1
3 03/24/26 11:32:36 AM 274. 21.1
4 03/24/26 11:32:37 AM 274. 21.1
5 03/24/26 11:32:38 AM 274. 21.1
6 03/24/26 11:32:39 AM 274. 21.0
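Since the tidyverse is already loaded, the same renaming can also be done in a single dplyr call; this is just an equivalent alternative to the colnames() approach above:

```r
# rename columns by position in one pipeline step
cal <- cal %>%
  rename(time = 1,
         conductivity_muS_cm = 2,
         temperature_C = 3)
```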
Data exploration
Why don’t we go ahead and see what type of data we are working with using the str() function.
str(cal)
spc_tbl_ [2,117 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ time : chr [1:2117] "03/24/26 11:32:34 AM" "03/24/26 11:32:35 AM" "03/24/26 11:32:36 AM" "03/24/26 11:32:37 AM" ...
$ conductivity_muS_cm: num [1:2117] 275 275 274 274 274 ...
$ temperature_C : num [1:2117] 21.1 21.1 21.1 21.1 21.1 ...
- attr(*, "spec")=
.. cols(
.. `#` = col_skip(),
.. `Date Time, GMT-06:00` = col_character(),
.. `Full Range, μS/cm (LGR S/N: 10778332, SEN S/N: 10778332)` = col_double(),
.. `Temp, °C (LGR S/N: 10778332, SEN S/N: 10778332)` = col_double(),
.. `Coupler Detached (LGR S/N: 10778332)` = col_skip(),
.. `Coupler Attached (LGR S/N: 10778332)` = col_skip(),
.. `Host Connected (LGR S/N: 10778332)` = col_skip(),
.. `Stopped (LGR S/N: 10778332)` = col_skip(),
.. `End Of File (LGR S/N: 10778332)` = col_skip()
.. )
- attr(*, "problems")=<externalptr>
From this we can see that our time column is actually a character when we want it to be a datetime! Let’s fix that using as.POSIXct(). Then let’s extract just the time stamp using as_hms(), since we only need to work with hours, minutes, and seconds.
# convert time column to datetime
cal$time <- as.POSIXct(cal$time, format = "%m/%d/%y %I:%M:%S %p")
cal$time_only <- as_hms(cal$time)
Why don’t we run some basic summary statistics to get a general idea about our data:
summary(cal)
      time                     conductivity_muS_cm temperature_C
Min. :2026-03-24 11:32:34 Min. : 3.7 Min. :20.71
1st Qu.:2026-03-24 11:41:23 1st Qu.: 41.1 1st Qu.:26.03
Median :2026-03-24 11:50:12 Median :414.2 Median :26.41
Mean :2026-03-24 11:50:12 Mean :386.9 Mean :26.31
3rd Qu.:2026-03-24 11:59:01 3rd Qu.:669.0 3rd Qu.:26.81
Max. :2026-03-24 12:07:50 Max. :884.6 Max. :27.61
time_only
Min. :11:32:34
1st Qu.:11:41:23
Median :11:50:12
Mean :11:50:12
3rd Qu.:11:59:01
Max. :12:07:50
Let’s check that everything looks good before we move on.
str(cal)
spc_tbl_ [2,117 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ time : POSIXct[1:2117], format: "2026-03-24 11:32:34" "2026-03-24 11:32:35" ...
$ conductivity_muS_cm: num [1:2117] 275 275 274 274 274 ...
$ temperature_C : num [1:2117] 21.1 21.1 21.1 21.1 21.1 ...
$ time_only : 'hms' num [1:2117] 11:32:34 11:32:35 11:32:36 11:32:37 ...
..- attr(*, "units")= chr "secs"
- attr(*, "spec")=
.. cols(
.. `#` = col_skip(),
.. `Date Time, GMT-06:00` = col_character(),
.. `Full Range, μS/cm (LGR S/N: 10778332, SEN S/N: 10778332)` = col_double(),
.. `Temp, °C (LGR S/N: 10778332, SEN S/N: 10778332)` = col_double(),
.. `Coupler Detached (LGR S/N: 10778332)` = col_skip(),
.. `Coupler Attached (LGR S/N: 10778332)` = col_skip(),
.. `Host Connected (LGR S/N: 10778332)` = col_skip(),
.. `Stopped (LGR S/N: 10778332)` = col_skip(),
.. `End Of File (LGR S/N: 10778332)` = col_skip()
.. )
- attr(*, "problems")=<externalptr>
Great! Looks like we are good to go.
Finding average conductivity
To create our calibration curve, we need to find the average conductivity from our 5 salt samples:
- 0 g
- 0.5 g
- 1.0 g
- 1.5 g
- 2.0 g
For reference, the sensor was placed in a bucket with 5 L of DI water. At each salt increase, the sensor sat in the water for 5 minutes. The times are recorded in the “20260324_cal_metadata.docx” file. The 5 minutes included a buffer, so we will take roughly the middle 3 minutes of each time range and average over that. For simplicity in this tutorial, I will provide the time ranges to use for each salt interval:
- 11:34 to 11:38
- 11:43 to 11:46
- 11:50 to 11:53
- 11:56 to 11:59
- 12:02 to 12:05
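Before averaging, it can help to eyeball these plateaus by plotting the conductivity record against time; a quick base-R sketch using the columns we created above:

```r
# quick look at the stepped calibration record - each salt addition
# should appear as a roughly flat plateau in conductivity
plot(cal$time_only, cal$conductivity_muS_cm, type = "l",
     xlab = "Time",
     ylab = "Conductivity (µS/cm)",
     main = "Raw calibration record")
```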
Now that we have the times, let’s calculate the average conductivity for each interval. To do this, we define a new dataframe called cal_subset, keeping the rows whose time_only value is greater than or equal to the lower time bound and less than or equal to the upper time bound. We then create a new variable called avg_xxxg that stores the mean conductivity_muS_cm of the cal_subset dataframe. We repeat this for each interval time and salt mass.
#calculate average conductivity from 11:34 to 11:38
# this is 0 g added salt
cal_subset <- cal[
cal$time_only >= as_hms("11:34:00") &
cal$time_only <= as_hms("11:38:00"), ]
avg_0g <- mean(cal_subset$conductivity_muS_cm, na.rm = TRUE)
#repeat for 11:43 to 11:46
#this is 0.5 g of added salt
cal_subset <- cal[
cal$time_only >= as_hms("11:43:00") &
cal$time_only <= as_hms("11:46:00"), ]
avg_500mg <- mean(cal_subset$conductivity_muS_cm, na.rm = TRUE)
#repeat for 11:50 to 11:53
#this is 1.0 g of added salt
cal_subset <- cal[
cal$time_only >= as_hms("11:50:00") &
cal$time_only <= as_hms("11:53:00"), ]
avg_1000mg <- mean(cal_subset$conductivity_muS_cm, na.rm = TRUE)
#repeat for 11:56 to 11:59
#this is 1.5 g of added salt
cal_subset <- cal[
cal$time_only >= as_hms("11:56:00") &
cal$time_only <= as_hms("11:59:00"), ]
avg_1500mg <- mean(cal_subset$conductivity_muS_cm, na.rm = TRUE)
# repeat for 12:02 to 12:05
#this is 2.0 g of added salt
cal_subset <- cal[
cal$time_only >= as_hms("12:02:00") &
cal$time_only <= as_hms("12:05:00"), ]
avg_2000mg <- mean(cal_subset$conductivity_muS_cm, na.rm = TRUE)
Let’s print these values using the print() function so we can visually see all the values.
print(avg_0g)
[1] 11.79917
print(avg_500mg)
[1] 235.9934
print(avg_1000mg)
[1] 456.3558
print(avg_1500mg)
[1] 669.1912
print(avg_2000mg)
[1] 875.0735
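As a side note, the five nearly identical blocks above can be collapsed into a small helper function. This is just a refactor of the same subsetting-and-averaging logic, not a change in method (the function name is my own):

```r
# helper: mean conductivity between two clock times
interval_mean <- function(df, start, end) {
  sub <- df[df$time_only >= as_hms(start) &
            df$time_only <= as_hms(end), ]
  mean(sub$conductivity_muS_cm, na.rm = TRUE)
}

avg_0g     <- interval_mean(cal, "11:34:00", "11:38:00")
avg_500mg  <- interval_mean(cal, "11:43:00", "11:46:00")
avg_1000mg <- interval_mean(cal, "11:50:00", "11:53:00")
avg_1500mg <- interval_mean(cal, "11:56:00", "11:59:00")
avg_2000mg <- interval_mean(cal, "12:02:00", "12:05:00")
```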
Great, now we have the average conductivity for each interval. Let’s plot those values!
Create calibration curve
First, let’s create a vector of the salt values in milligrams. We will also create a vector of the conductivity values, in the same order as the salt values, using the averages we stored for each interval. Then we will use the plot() function, since we are working with two standalone vectors rather than columns of a dataframe. We will add axis titles using xlab and ylab and a main title using main.
milligram_values <- c(0, 500, 1000, 1500, 2000) #put the salt values into a vector
conductivity_values <- c(avg_0g, avg_500mg, avg_1000mg, avg_1500mg, avg_2000mg) #put the average conductivity values we generated earlier into a vector
plot(milligram_values, conductivity_values, #use plot() since we are just working with basic x and y variables that are not in the same dataframe
xlab = "Milligram of NaCl",
ylab = "Average Conductivity (µS/cm)",
     main = "Calibration Curve for NaCl Solutions - Roomtemp DI")
At first glance we can see that the points are fairly linear - that’s great! But let’s fit a linear regression line to the plot to figure out the slope and intercept, which are needed later in the discharge calculations.
Create a new object called model to store the fit. Then use lm(), which fits the linear regression model to the data. lm() is set up as response ~ predictor, so we write conductivity_values ~ milligram_values, since the conductivity values depend on the milligram values. To visually see the line, we use abline(), which adds a straight line to the current plot, and we color the line red.
#Linear regression for data
plot(milligram_values, conductivity_values)
model <- lm(conductivity_values ~ milligram_values)
abline(model, col = "red")
Now, let’s extract the slope and intercept values from the model. We do this using the coef() function. Let’s see what happens when we run that.
coef(model)
     (Intercept) milligram_values
17.7333142 0.4319493
Okay, we can see that it gave us the intercept and also the slope, but the slope is labeled “milligram_values”, which is not very intuitive. Instead, let’s extract both of those values one at a time and assign them to variables.
#pull out slope and intercept
slope <- coef(model)[2] #pulls out just slope
intercept <- coef(model)[1] #pulls out just intercept
We can also see the full output of the model using summary(). Let’s give that a try.
summary(model)
Call:
lm(formula = conductivity_values ~ milligram_values)
Residuals:
1 2 3 4 5
-5.934 2.285 6.673 3.534 -6.558
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.733314 5.300378 3.346 0.0442 *
milligram_values 0.431949 0.004328 99.809 2.22e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.843 on 3 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9996
F-statistic: 9962 on 1 and 3 DF, p-value: 2.217e-06
Now let’s print the results so we can see what the values are!
print(slope)
milligram_values
       0.4319493
print(intercept)
(Intercept)
   17.73331
Well, those labels don’t make a lot of sense. Let’s try again using cat(), which allows you to concatenate and print. To use the function, we first put the text we want in quotes, followed by the variable we are printing, and then "\n" to create a new line so everything isn’t squished!
If you want to know more, type ?cat into the console!
#display slope and intercept
cat("Slope:", slope, "\n")
Slope: 0.4319493
cat("Intercept:", intercept, "\n")
Intercept: 17.73331
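Another option is to drop those leftover labels at extraction time with base R’s unname(), which strips the names attribute from the coefficient values:

```r
# strip the names when extracting, so print() shows bare numbers
slope     <- unname(coef(model)[2])
intercept <- unname(coef(model)[1])
```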
We have now successfully created a calibration curve and extracted the slope and intercept values that we will apply in the next section: discharge calculations!
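As a preview of how the fit gets used, the calibration can be inverted to turn a raw conductivity reading into a concentration. This is a sketch under two assumptions from the lab setup above: the fitted relationship is conductivity = intercept + slope × milligrams of NaCl, and the bucket held 5 L of water. The function name is my own:

```r
# convert a conductivity reading (µS/cm) to NaCl concentration (g/L)
# assumes the calibration was built in a 5 L bucket of DI water
ec_to_conc_g_L <- function(ec, slope, intercept, volume_L = 5) {
  mg <- (ec - intercept) / slope  # invert the calibration: mg of NaCl in the bucket
  (mg / 1000) / volume_L          # mg -> g, then divide by bucket volume for g/L
}
```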