Your First Analysis#

This tutorial walks through a complete GuPPy pipeline run from raw CSV data to a PSTH plot. You will use the small sample dataset that lives inside the GuPPy repository (under Git LFS), so the only setup is the source-install steps below.

By the end you will have:

  • Launched the GuPPy GUI

  • Selected your data and reviewed the analysis parameters

  • Labeled your channels with storenames

  • Loaded the raw data into HDF5

  • Preprocessed the signal

  • Computed the PSTH

  • Visualized the results

Prerequisites#

  • GuPPy installed from source, with Git LFS. This tutorial uses sample CSV files that live in the GuPPy repository under stubbed_testing_data/csv/sample_data_csv_1/, which is tracked by Git LFS. A plain git clone will fetch only LFS pointer files (a few hundred bytes each) instead of the real CSVs, so you need both Git LFS installed on your machine and a clone with the relevant LFS payload pulled in:

    # Install Git LFS once per machine; see https://git-lfs.com for OS-specific instructions.
    git lfs install
    
    git clone https://github.com/LernerLab/GuPPy.git
    cd GuPPy
    git lfs pull --include="stubbed_testing_data/csv/sample_data_csv_1/*"
    pip install -e .
    

    See the README for the full installation guide, including how to set up a conda environment first. The plain pip install guppy-neuro path also works for installing the GUI itself, but you would still need to clone the repo (with Git LFS) separately to access the sample data.

  • The sample data lives at stubbed_testing_data/csv/sample_data_csv_1/ inside the cloned repository. It contains three CSV files:

    File

    Description

    Sample_Control_Channel.csv

    Isosbestic (405 nm) control channel

    Sample_Signal_Channel.csv

    Calcium-dependent (470 nm) signal channel

    Sample_TTL.csv

    Event timestamps

    Each signal CSV has three columns: timestamps (seconds), data (fluorescence), and sampling_rate. The TTL CSV has a single timestamps column.

Step 0: Launch the GuPPy GUI#

guppy

A browser tab opens showing the GuPPy dashboard.

GuPPy Input Parameters GUI homepage

The page is split into a sidebar on the left and a main area on the right. The sidebar lists the pipeline buttons in run order, from Save Input Parameters at the top through Visualization at the bottom, with a progress bar directly under each step that performs background work. The main area is where you select your data folder and configure parameters; settings are grouped into three collapsible cards: Individual Analysis (the only one we use in this tutorial), Group Analysis, and Visualization Parameters. The “Step N” labels in the sidebar match the step numbering used in the rest of this tutorial.

Step 1: Select your data and set parameters#

This step has two parts. You will pick the session folder you want to analyze, then look over (but not change) the analysis parameters that the rest of the pipeline will use.

Select your data#

Inside the Individual Analysis card, use the file browser at the top of the card to navigate to stubbed_testing_data/csv/sample_data_csv_1/. Click >> to move that folder into the Selected files pane on the right. The card supports selecting multiple session folders at once for batch analysis; for this tutorial we are running a single session.

GuPPy homepage Individual Analysis card showing the file browser with the sample_data_csv_1 folder available for selection

The Data Source toggle at the top lets you switch between local (the default, file-system browsing) and dandi (streaming NWB sessions directly from DANDI). We are using local files here.

Set parameters#

Below the file browser, the same Individual Analysis card lists the parameters that drive the rest of the pipeline. For this tutorial the defaults are fine, so you do not need to change anything; the screenshot below is for orientation, not for hunting and clicking.

GuPPy Individual Analysis card showing the parameter widgets: number of cores, combine data, isosbestic control, z-score method, baseline window, and PSTH window

Each parameter is documented in the parameter reference.

Step 2: Label your channels#

A storename is the human-readable label GuPPy uses for one of your data channels. Raw acquisition files come with cryptic, format-specific names (here, the CSV filenames Sample_Control_Channel, Sample_Signal_Channel, Sample_TTL); GuPPy needs you to map each one to a meaningful name like control_A, signal_A, or RewardPort. Those mapped names are what every downstream step (preprocessing, PSTH, plots, group analysis) refers to. The mapping is saved as storesList.csv inside an output folder created next to the session.

Click Open Storenames GUI in the sidebar. A new browser tab opens with the Storenames panel for the selected folder.

Storenames GUI with the three sample CSV channels listed as available options

The three CSV filenames appear in the left list (Filter available options) of the Store Names Selection widget. Walk through these substeps:

  1. Move all three channels to the right list. Click each of Sample_Control_Channel, Sample_Signal_Channel, Sample_TTL in the left list and use the >> button to move it to the right. (Or shift-click to select all three at once before clicking >>.)

  2. Click “Select Storenames”. A new Configure Storenames section appears below, with one row per channel. Each row has a Type dropdown and a Name text field.

    Storenames GUI after clicking Select Storenames, showing the Configure Storenames section with a Type dropdown and Name field for each of the three channels
  3. Fill out one row per channel. The Name field is a brain region identifier, not a description of the channel’s role. The control and signal rows must use the same Name because GuPPy uses that match to pair the isosbestic (405 nm) control trace with the calcium-dependent (470 nm) signal trace from the same fiber, then fits and subtracts one from the other during preprocessing to remove motion artifacts and photobleaching. In a multi-fiber recording, each fiber gets its own identifier (e.g. DMS, DLS) so each control is paired with its own signal. In this single-fiber tutorial we use A. The event TTL row uses a free-form event name instead.

    Channel

    Type

    Name

    Sample_Control_Channel

    control

    A

    Sample_Signal_Channel

    signal

    A

    Sample_TTL

    event TTLs

    RewardPort

    GuPPy combines the dropdown and text field into the final storename: control + A becomes control_A, signal + A becomes signal_A, and the event TTL keeps the name as-is (RewardPort). If the control and signal Name fields differ, saving will fail with Mismatched signal/control region pairs Every 'signal_<region>' must have a matching 'control_<region>'.

  4. Click “Show Selected Configuration”. This applies your row entries to the JSON editor below, which should now read something like:

    {
      "Sample_Control_Channel": ["control_A"],
      "Sample_Signal_Channel": ["signal_A"],
      "Sample_TTL": ["RewardPort"]
    }
    
  5. Choose the output directory. Use the over-write storeslist file or create a new one? menu button and select create_new_file.

    Despite the menu’s name, this choice does more than name a file. It picks the output directory for the entire analysis pipeline. From this point on, every downstream step (Read Raw Data, Preprocess, PSTH Computation, Visualization) writes its outputs (HDF5 files, PSTH results, plots) into that directory and reads storesList.csv from it to know which raw channel maps to which storename. create_new_file makes a fresh subdirectory inside the session folder, named <session>_output_<N>/ with <N> auto-incremented (_output_1 on the first run, _output_2 on the second, and so on).

    Note

    The other menu option, over_write_file, is for re-running on a session that already has an output subdirectory. It lets you point at an existing <session>_output_<N>/, deletes everything inside it (the previous storesList.csv plus any HDF5 and PSTH results from the previous run), and starts that subdirectory over fresh. Pick over_write_file only when you genuinely want that destructive behavior. For the tutorial, ignore it.

  6. Click Save. GuPPy creates the output subdirectory (e.g. sample_data_csv_1_output_1/) and writes storesList.csv into it. The downstream steps will read and write inside this folder.

You can close this Storenames tab and return to the original homepage tab to continue.

Step 3: Load the raw data#

Click Read Raw Data. A progress bar appears in the sidebar directly below the button and fills as the work runs. That bar is the primary signal that the step is in progress.

GuPPy homepage sidebar with the Read Raw Data progress bar partially filled

The other bars on the sidebar (under Preprocess and Remove Artifacts and PSTH Computation) appear pre-filled at 100% as a styling default; they reset to 0 and fill while their own step is running. So a fully-green bar does not mean that step is done, it just means it has not been touched yet.

GuPPy loads each CSV file and writes the data into the output folder you created in Step 2, one HDF5 file per storename (so for this tutorial: sample_data_csv_1_output_1/control_A.hdf5, .../signal_A.hdf5, .../RewardPort.hdf5). Each file holds the channel’s data, timestamps, and sampling_rate datasets plus a few metadata fields. HDF5 is a binary format that stores large numerical arrays efficiently and supports partial reads, which speeds up the later pipeline steps.

When the progress bar reaches 100% the step is complete. Confirmation messages are also logged to the terminal where you launched guppy.

Step 4: Preprocess the signal#

Click Preprocess. As with Read Raw Data, a progress bar appears in the sidebar directly below the button and fills as the work runs.

GuPPy homepage sidebar with the Preprocess and Remove Artifacts progress bar partially filled

GuPPy runs the following on the raw signal:

  1. Trims the first few seconds from both channels (the Eliminate first few seconds parameter, internally timeForLightsTurnOn).

  2. Applies a moving-average filter to reduce high-frequency noise (the Window for Moving Average filter parameter, internally filter_window).

  3. Fits the control channel to the signal channel using a linear regression, then subtracts it. This removes motion artifacts and photobleaching that affect both channels equally.

  4. Computes the z-score and the dF/F (delta F over F) of the corrected signal.

The results are written into the same output folder as Step 3, in four new HDF5 files per region. You do not choose the location or the file names; they follow a fixed convention:

File

Contents

z_score_A.hdf5

The z-scored trace

dff_A.hdf5

The dF/F trace

cntrl_sig_fit_A.hdf5

The fitted control trace (used internally and for artifact-removal plots)

timeCorrection_A.hdf5

Corrected timestamps, sampling rate, and a few related metadata fields

When preprocessing finishes, GuPPy opens a matplotlib window showing the preprocessed trace plotted against time. The default is the z-score; the plot_zScore_dff input parameter controls this and can be set to dff or Both instead. Close the matplotlib window to return control to the GUI.

Step 5: Compute the PSTH#

Click Compute PSTH. As with the previous two steps, a progress bar appears in the sidebar directly below the button and fills as the work runs.

GuPPy homepage sidebar with the PSTH Computation progress bar partially filled

GuPPy aligns the z-scored trace to each event timestamp in Sample_TTL.csv, extracts the window defined by the Seconds before 0 / Seconds after 0 parameters (internally nSecPrev, nSecPost) around each event, and averages across all trials. The result is a peri-stimulus time histogram (PSTH).

The default window is -10 to +20 seconds. With the sample data you will get a small number of trials (the TTL file has just a handful of timestamps), so the average will be noisy. This is expected for a minimal example dataset.

The outputs land in the same sample_data_csv_1_output_1/ directory you have been using since Step 2, with one set of files per (event, region) pair. For this tutorial that is the single pair (RewardPort, A):

File

Contents

RewardPort_A.hdf5

The peri-event timestamps (the x-axis of the PSTH)

RewardPort_A_z_score_A.h5

The PSTH dataframe: one column per trial, plus mean and err (standard error) columns and the timestamps column

RewardPort_A_baselineUncorrected_z_score_A.h5

Same dataframe before baseline correction was applied (kept for inspection)

peak_AUC_RewardPort_A_z_score_A.csv and matching .hdf5

Peak amplitude and area-under-curve for the trial-mean PSTH

The visualization step in Step 6 reads these files; you do not need to inspect them by hand.

Step 6: Visualize the results#

Back on the homepage, expand the Visualization Parameters card. Leave both settings at their defaults: z-score or ΔF/F? stays at z_score (the metric we computed in Step 4), and Visualize Average Results? stays at False. The latter is a group-analysis feature for averaging across multiple sessions and requires Average Group? to have been enabled during PSTH computation; we have a single session, so it does not apply here.

Click Open Visualization GUI in the sidebar. A new browser tab opens with the Visualization GUI for this session, organized into two tabs.

GuPPy Visualization GUI showing the PSTH tab with the RewardPort event selected

The PSTH tab is the default view. It shows the trial-aligned trace for one event with controls running down the left:

  • Event selector: which TTL channel to align to (here RewardPort).

  • X and Y dropdowns: what to plot on each axis. X is typically timestamps, Y can be mean (the trial average) or an individual trial like trial_1.

  • X Limit and Y Limit range sliders: restrict the displayed window.

  • Width Plot, Height Plot, Y Label, Save options dropdowns and a Save PSTH button: figure dimensions and export.

On the right is a trial multi-select (Trial # - Timestamps) and a Select mean and/or just trials checkbox group, which together let you overlay any combination of individual trials and the mean. With a TTL file containing only a handful of timestamps, the average will be noisy; this is expected for the minimal sample dataset.

The Heat Map tab shows trials stacked vertically with colour encoding the metric, which is the better view for inspecting trial-to-trial variability. Its controls mirror the PSTH tab: event selector, Color map (plasma, viridis, …), width and height, save options, and a trial picker.

For this tutorial, the goal is just to reach a rendered PSTH; feel free to play with the controls. A deeper walkthrough of the visualization options will live in an upcoming how-to guide.

Next steps#

  • See How-to Guides for task-specific instructions (TDT data ingestion, artifact removal, group analysis, etc.).

  • See Explanation for background on the isosbestic correction and z-score methods.