Data Loaders¶
There are two dataloaders available to make working with the provided data more straightforward.
- A data loader providing spectral data and labels for single pixels, useful for scikit-learn classifiers
- A Pytorch dataset and Pytorch Lightning datamodule providing image chips together with labels
In this notebook we show how these dataloaders can be used.
Pixel dataloader¶
from disfor.datasets import TabularDataset
The class TabularDataset provides arguments to filter the dataset and exposes properties that can be used directly for training scikit-learn classifiers.
data = TabularDataset(
# If None, data gets dynamically downloaded and cached from Huggingface
data_folder=None,
# selecting healthy forest (110), clear cut (211) and bark beetle (231)
target_classes=[110, 211, 231],
# we remap salvage logging (221 and 222) to also be part of the clear cut class
# this happens before filtering the target_classes. This means, that all values in the
# mapping dict need to be in target_classes to be included
class_mapping_overrides={221: 211, 222: 211},
# subset to only include samples with high confidence
confidence=["high"],
# only include acquisitions from "leaf-on" months
months=[5, 6, 7, 8, 9],
# including also dark pixels (2) as valid
valid_scl_values=[2, 4, 5, 6],
# only include acquisitions where the clear cut is recent (maximum of 90 days),
# for all other classes include everything
max_days_since_event={211: 90},
max_samples_per_event=5,
# omit samples which have low tcd in the comment
omit_low_tcd=True,
# omit samples which have border in the comment
omit_border=True,
)
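The comment above describes a remap-then-filter order. This can be illustrated in plain Python (a hypothetical re-implementation of the described semantics, not the actual TabularDataset code):

```python
# Sketch of the remap-then-filter order described above
# (hypothetical re-implementation, not the actual TabularDataset code).
labels = [110, 211, 221, 222, 231, 240]

class_mapping_overrides = {221: 211, 222: 211}  # salvage logging -> clear cut
target_classes = [110, 211, 231]

# Step 1: apply the overrides ...
remapped = [class_mapping_overrides.get(label, label) for label in labels]
# Step 2: ... then filter to the target classes.
filtered = [label for label in remapped if label in target_classes]

print(filtered)  # [110, 211, 211, 211, 231]
```

Because the remap happens first, the salvage logging samples (221, 222) survive the filter as clear cut (211), while unmapped classes outside target_classes (here 240) are dropped.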
Once initialized, the class instance provides train and test data as numpy arrays.
print(data.y_train, data.X_train, data.y_test, data.X_test, sep="\n")
[2 2 0 ... 0 0 0]
[[ 522  632  648 ... 2199 1711 1078]
 [ 632  760  758 ... 2568 1995 1150]
 [ 164  333  229 ... 2614  954  422]
 ...
 [ 194  203  133 ... 1163  555  259]
 [ 208  198  148 ... 1518  692  319]
 [ 119  148  109 ... 1916  905  382]]
[0 0 1 ... 0 0 0]
[[ 224  396  291 ... 2060 1204  593]
 [ 185  416  304 ... 2202 1315  682]
 [ 493  734  936 ... 2708 2751 1605]
 ...
 [ 207  430  361 ... 2271 1855 1139]
 [ 578  790  760 ... 2749 2297 1453]
 [ 484  722  606 ... 3479 2669 1611]]
It also provides the fitted label encoder, which maps the 0 to n-1 encoded labels back to the original class labels.
data.encoder.inverse_transform(data.y_test)[:10]
array([110, 110, 211, 110, 110, 110, 211, 211, 211, 211])
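If the encoder follows scikit-learn's `LabelEncoder` convention (an assumption here, consistent with the output above), the integer labels simply index into the sorted original class codes:

```python
from sklearn.preprocessing import LabelEncoder

# Stand-in for data.encoder: a LabelEncoder fitted on the three class codes.
encoder = LabelEncoder().fit([110, 211, 231])

print(encoder.classes_)                      # [110 211 231]
print(encoder.transform([110, 231]))         # [0 2]
print(encoder.inverse_transform([0, 1, 2]))  # [110 211 231]
```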
Now, let's quickly train a Random Forest model and validate the output:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(oob_score=True)
rf.fit(data.X_train, data.y_train)
print(rf.oob_score_)
0.9209039548022598
The out-of-bag (OOB) accuracy of the Random Forest model is 0.92. However, let's use the held-out test set to get a better estimate of the model accuracy. For this we apply the trained model to the held-out predictors (X_test) and derive accuracy metrics from the predictions.
from sklearn.metrics import ConfusionMatrixDisplay, classification_report
y_pred = rf.predict(data.X_test)
print(
classification_report(
data.y_test, y_pred, target_names=data.encoder.classes_.astype(str)
)
)
disp = ConfusionMatrixDisplay.from_predictions(
data.y_test,
y_pred,
display_labels=["Undisturbed (110)", "Clear Cut (211)", "Bark Beetle (231)"],
normalize="true",
)
precision recall f1-score support
110 0.93 0.97 0.95 1203
211 0.81 0.70 0.75 161
231 0.62 0.41 0.50 92
accuracy 0.90 1456
macro avg 0.79 0.69 0.73 1456
weighted avg 0.90 0.90 0.90 1456
We can see that the undisturbed class is predicted well, while the two disturbance classes are not. For the bark beetle class in particular, both precision and recall are low.
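One common remedy for this kind of class imbalance (1203 vs. 161 vs. 92 support) is scikit-learn's `class_weight` option. A sketch on synthetic stand-in data; whether this actually improves results on the DISFOR pixels would need to be tested:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced stand-in for the pixel data (10 spectral features,
# three classes with roughly 80/12/8 percent frequency).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=6,
    n_classes=3, weights=[0.8, 0.12, 0.08], random_state=0,
)

# "balanced" reweights samples inversely to class frequency, which
# typically trades some majority-class accuracy for better recall
# on the rare classes.
rf = RandomForestClassifier(class_weight="balanced", random_state=0)
rf.fit(X, y)
```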
Pytorch dataloader¶
The Pytorch dataloader is used for loading image chips. The first time the data is loaded, it is downloaded from Huggingface; this requires at least 80GB of free disk space. After the initial download and extraction, around 35GB of space remains in use.
The first data loading can take quite some time, since the data needs to be downloaded and extracted.
from disfor.datasets import MonoTemporalClassification
tiff_dataset = MonoTemporalClassification(
# If None, data gets dynamically downloaded and cached from Huggingface
data_folder=None,
# selecting healthy forest (110), clear cut (211) and bark beetle (231)
target_classes=[110, 211, 231],
# reduce the size of the chip, to include less context
chip_size=8,
# subset to only include samples with high confidence
confidence=["high"],
# only include acquisitions from "leaf-on" months
months=[5, 6, 7, 8, 9],
# only include acquisitions where the clear cut is recent (maximum of 90 days),
# for all other classes include everything
max_days_since_event={211: 90},
max_samples_per_event=5,
# omit samples which have low tcd in the comment
omit_low_tcd=True,
# omit samples which have border in the comment
omit_border=True,
)
The dataset returns a dictionary with the image, label and path of the image.
tiff_dataset[0]
{'image': tensor([[[0.0448, 0.0448, 0.0514, 0.0582, 0.0548, 0.0516, 0.0512, 0.0504],
[0.0459, 0.0465, 0.0537, 0.0554, 0.0508, 0.0466, 0.0496, 0.0500],
[0.0471, 0.0470, 0.0542, 0.0562, 0.0506, 0.0474, 0.0509, 0.0524],
[0.0440, 0.0443, 0.0534, 0.0559, 0.0506, 0.0513, 0.0518, 0.0540],
[0.0470, 0.0512, 0.0568, 0.0562, 0.0522, 0.0527, 0.0588, 0.0588],
[0.0500, 0.0532, 0.0608, 0.0582, 0.0536, 0.0524, 0.0558, 0.0614],
[0.0460, 0.0494, 0.0555, 0.0534, 0.0536, 0.0530, 0.0560, 0.0626],
[0.0462, 0.0433, 0.0478, 0.0482, 0.0517, 0.0562, 0.0574, 0.0548]],
[[0.0704, 0.0664, 0.0632, 0.0692, 0.0674, 0.0630, 0.0634, 0.0713],
[0.0661, 0.0732, 0.0694, 0.0666, 0.0662, 0.0598, 0.0611, 0.0692],
[0.0660, 0.0690, 0.0648, 0.0604, 0.0618, 0.0574, 0.0637, 0.0717],
[0.0670, 0.0575, 0.0632, 0.0642, 0.0638, 0.0618, 0.0694, 0.0754],
[0.0667, 0.0618, 0.0672, 0.0668, 0.0632, 0.0637, 0.0702, 0.0826],
[0.0678, 0.0682, 0.0704, 0.0700, 0.0645, 0.0625, 0.0694, 0.0872],
[0.0626, 0.0634, 0.0638, 0.0631, 0.0632, 0.0640, 0.0700, 0.0824],
[0.0578, 0.0589, 0.0626, 0.0636, 0.0648, 0.0687, 0.0706, 0.0730]],
[[0.0396, 0.0418, 0.0606, 0.0728, 0.0710, 0.0650, 0.0624, 0.0535],
[0.0374, 0.0430, 0.0636, 0.0636, 0.0642, 0.0559, 0.0570, 0.0543],
[0.0368, 0.0434, 0.0557, 0.0674, 0.0644, 0.0571, 0.0612, 0.0563],
[0.0386, 0.0472, 0.0602, 0.0686, 0.0623, 0.0664, 0.0645, 0.0622],
[0.0390, 0.0572, 0.0722, 0.0725, 0.0648, 0.0664, 0.0760, 0.0838],
[0.0400, 0.0612, 0.0763, 0.0780, 0.0702, 0.0677, 0.0786, 0.0871],
[0.0424, 0.0522, 0.0673, 0.0674, 0.0672, 0.0722, 0.0830, 0.0902],
[0.0396, 0.0378, 0.0511, 0.0658, 0.0708, 0.0756, 0.0835, 0.0732]],
[[0.1128, 0.1145, 0.1145, 0.1023, 0.1023, 0.0977, 0.0977, 0.0971],
[0.1065, 0.1008, 0.1008, 0.0978, 0.0978, 0.0945, 0.0945, 0.1060],
[0.1065, 0.1008, 0.1008, 0.0978, 0.0978, 0.0945, 0.0945, 0.1060],
[0.1004, 0.1032, 0.1032, 0.0989, 0.0989, 0.1076, 0.1076, 0.1303],
[0.1004, 0.1032, 0.1032, 0.0989, 0.0989, 0.1076, 0.1076, 0.1303],
[0.0943, 0.1041, 0.1041, 0.1029, 0.1029, 0.1117, 0.1117, 0.1455],
[0.0943, 0.1041, 0.1041, 0.1029, 0.1029, 0.1117, 0.1117, 0.1455],
[0.0898, 0.0927, 0.0927, 0.0996, 0.0996, 0.1096, 0.1096, 0.1127]],
[[0.4431, 0.2933, 0.2933, 0.1756, 0.1756, 0.1955, 0.1955, 0.2538],
[0.3975, 0.2596, 0.2596, 0.1761, 0.1761, 0.1819, 0.1819, 0.2490],
[0.3975, 0.2596, 0.2596, 0.1761, 0.1761, 0.1819, 0.1819, 0.2490],
[0.3717, 0.2216, 0.2216, 0.1723, 0.1723, 0.1791, 0.1791, 0.2648],
[0.3717, 0.2216, 0.2216, 0.1723, 0.1723, 0.1791, 0.1791, 0.2648],
[0.3349, 0.2223, 0.2223, 0.1611, 0.1611, 0.1774, 0.1774, 0.2449],
[0.3349, 0.2223, 0.2223, 0.1611, 0.1611, 0.1774, 0.1774, 0.2449],
[0.2948, 0.2508, 0.2508, 0.2002, 0.2002, 0.1708, 0.1708, 0.1706]],
[[0.5606, 0.3849, 0.3849, 0.1957, 0.1957, 0.2262, 0.2262, 0.3015],
[0.5281, 0.3177, 0.3177, 0.1941, 0.1941, 0.2235, 0.2235, 0.2963],
[0.5281, 0.3177, 0.3177, 0.1941, 0.1941, 0.2235, 0.2235, 0.2963],
[0.4868, 0.2619, 0.2619, 0.1993, 0.1993, 0.2107, 0.2107, 0.3002],
[0.4868, 0.2619, 0.2619, 0.1993, 0.1993, 0.2107, 0.2107, 0.3002],
[0.4440, 0.2564, 0.2564, 0.1926, 0.1926, 0.2043, 0.2043, 0.2831],
[0.4440, 0.2564, 0.2564, 0.1926, 0.1926, 0.2043, 0.2043, 0.2831],
[0.3800, 0.3076, 0.3076, 0.2427, 0.2427, 0.1960, 0.1960, 0.2015]],
[[0.6180, 0.4956, 0.2720, 0.2370, 0.2282, 0.2228, 0.2562, 0.3224],
[0.5692, 0.4920, 0.3028, 0.2218, 0.2168, 0.2188, 0.2454, 0.3100],
[0.5020, 0.3996, 0.2796, 0.2160, 0.2104, 0.2124, 0.2420, 0.2854],
[0.4720, 0.3028, 0.2382, 0.2318, 0.2198, 0.2126, 0.2512, 0.3112],
[0.4940, 0.3028, 0.2522, 0.2328, 0.2172, 0.2112, 0.2424, 0.3276],
[0.4952, 0.3264, 0.2524, 0.2278, 0.2180, 0.2166, 0.2234, 0.3120],
[0.4364, 0.3324, 0.2232, 0.2046, 0.2084, 0.2100, 0.2150, 0.2886],
[0.3288, 0.3160, 0.2950, 0.2352, 0.2084, 0.2204, 0.2158, 0.2366]],
[[0.6057, 0.4248, 0.4248, 0.2250, 0.2250, 0.2593, 0.2593, 0.3387],
[0.5530, 0.3576, 0.3576, 0.2184, 0.2184, 0.2475, 0.2475, 0.3180],
[0.5530, 0.3576, 0.3576, 0.2184, 0.2184, 0.2475, 0.2475, 0.3180],
[0.5337, 0.3043, 0.3043, 0.2199, 0.2199, 0.2465, 0.2465, 0.3330],
[0.5337, 0.3043, 0.3043, 0.2199, 0.2199, 0.2465, 0.2465, 0.3330],
[0.4618, 0.2949, 0.2949, 0.2269, 0.2269, 0.2282, 0.2282, 0.3200],
[0.4618, 0.2949, 0.2949, 0.2269, 0.2269, 0.2282, 0.2282, 0.3200],
[0.4113, 0.3368, 0.3368, 0.2654, 0.2654, 0.2243, 0.2243, 0.2340]],
[[0.2310, 0.2115, 0.2115, 0.1764, 0.1764, 0.1614, 0.1614, 0.1598],
[0.2167, 0.1935, 0.1935, 0.1683, 0.1683, 0.1610, 0.1610, 0.1717],
[0.2167, 0.1935, 0.1935, 0.1683, 0.1683, 0.1610, 0.1610, 0.1717],
[0.2068, 0.1843, 0.1843, 0.1711, 0.1711, 0.1838, 0.1838, 0.2053],
[0.2068, 0.1843, 0.1843, 0.1711, 0.1711, 0.1838, 0.1838, 0.2053],
[0.1993, 0.1865, 0.1865, 0.1810, 0.1810, 0.1968, 0.1968, 0.2413],
[0.1993, 0.1865, 0.1865, 0.1810, 0.1810, 0.1968, 0.1968, 0.2413],
[0.1764, 0.1708, 0.1708, 0.1832, 0.1832, 0.1930, 0.1930, 0.2090]],
[[0.1105, 0.1160, 0.1160, 0.1074, 0.1074, 0.0926, 0.0926, 0.0826],
[0.1057, 0.1062, 0.1062, 0.1040, 0.1040, 0.0949, 0.0949, 0.0911],
[0.1057, 0.1062, 0.1062, 0.1040, 0.1040, 0.0949, 0.0949, 0.0911],
[0.0981, 0.1066, 0.1066, 0.1078, 0.1078, 0.1055, 0.1055, 0.1176],
[0.0981, 0.1066, 0.1066, 0.1078, 0.1078, 0.1055, 0.1055, 0.1176],
[0.0976, 0.1088, 0.1088, 0.1129, 0.1129, 0.1217, 0.1217, 0.1507],
[0.0976, 0.1088, 0.1088, 0.1129, 0.1129, 0.1217, 0.1217, 0.1507],
[0.0863, 0.0950, 0.0950, 0.1083, 0.1083, 0.1199, 0.1199, 0.1254]]]),
'label': tensor(2),
'path': 'C:\\Users\\Jonas.Viehweger\\AppData\\Local\\disfor\\disfor\\Cache\\0.1.0\\tiffs\\797\\2020-06-05.tif'}
The image can also be plotted.
tiff_dataset.plot_chip(5001)
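Outside of Lightning, the dataset can also be wrapped in a plain torch DataLoader. A sketch with a hypothetical stand-in dataset that returns the same dict structure per sample:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Hypothetical stand-in for MonoTemporalClassification: any Dataset
# returning a dict per sample is batched the same way.
class DictDataset(Dataset):
    def __len__(self):
        return 16

    def __getitem__(self, idx):
        return {"image": torch.randn(10, 8, 8), "label": torch.tensor(idx % 3)}

loader = DataLoader(DictDataset(), batch_size=4, shuffle=True)

# The default collate function stacks each dict entry into a batch tensor.
batch = next(iter(loader))
print(batch["image"].shape, batch["label"].shape)
# torch.Size([4, 10, 8, 8]) torch.Size([4])
```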
Pytorch Lightning¶
For use with Pytorch Lightning, a Lightning datamodule is also available. This datamodule takes care of splitting the dataset into a training and validation set, so that the training progress can be monitored.
The datamodule only takes a few extra parameters, such as batch_size, num_workers and persist_workers. All of the remaining parameters are passed as keyword arguments to TiffDataset.
from disfor.datasets import MonoTemporalClassificationDataModule
tiff_datamodule = MonoTemporalClassificationDataModule(
batch_size=64,
num_workers=6,
# Keyword arguments are passed to TiffDataset
# selecting healthy forest (110), clear cut (211) and bark beetle (231)
target_classes=[110, 211, 231],
# reduce the size of the chip, to include less context
chip_size=8,
# subset to only include samples with high confidence
confidence=["high"],
# only include acquisitions from "leaf-on" months
months=[5, 6, 7, 8, 9],
# only include acquisitions where the clear cut is recent (maximum of 90 days),
# for all other classes include everything
max_days_since_event={211: 90},
max_samples_per_event=5,
# omit samples which have low tcd in the comment
omit_low_tcd=True,
# omit samples which have border in the comment
omit_border=True,
)
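Internally, such a datamodule typically splits the dataset with something like `torch.utils.data.random_split`. A generic sketch on a tiny stand-in dataset; the actual split logic of MonoTemporalClassificationDataModule may differ:

```python
import torch
from torch.utils.data import TensorDataset, random_split, DataLoader

# Tiny stand-in dataset: 100 fake chips with 10 bands of 8x8 pixels.
dataset = TensorDataset(torch.randn(100, 10, 8, 8), torch.randint(0, 3, (100,)))

# 80/20 train/validation split with a fixed seed for reproducibility.
train_set, val_set = random_split(
    dataset, [80, 20], generator=torch.Generator().manual_seed(42)
)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

print(len(train_set), len(val_set))  # 80 20
```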
To test this datamodule, we define a very simple neural network that predicts classes from the input images.
import torch
import torch.nn as nn
import lightning as L
class SimpleClassifier(L.LightningModule):
def __init__(self, num_classes=2, lr=1e-3):
super().__init__()
self.lr = lr
# Simple feedforward network
# Input: 10 channels * 8 * 8 = 640 features
self.model = nn.Sequential(
nn.Flatten(),
nn.Linear(10 * 8 * 8, 128),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(128, 64),
nn.ReLU(),
nn.Dropout(0.2),
nn.Linear(64, num_classes),
)
self.criterion = nn.CrossEntropyLoss()
def forward(self, x):
return self.model(x)
def training_step(self, batch, batch_idx):
x, y = batch["image"], batch["label"]
logits = self(x)
loss = self.criterion(logits, y)
self.log("train_loss", loss, prog_bar=True, batch_size=len(batch["label"]))
return loss
def validation_step(self, batch, batch_idx):
x, y = batch["image"], batch["label"]
logits = self(x)
loss = self.criterion(logits, y)
# Calculate accuracy
preds = torch.argmax(logits, dim=1)
acc = (preds == y).float().mean()
self.log("val_loss", loss, prog_bar=True, batch_size=len(batch["label"]))
self.log("val_acc", acc, prog_bar=True, batch_size=len(batch["label"]))
return loss
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=self.lr)
Finally, we train the neural net using the data from our datamodule. As an example, we train for only 20 epochs.
model = SimpleClassifier(num_classes=3, lr=1e-3)
# Train with your dataloader
trainer = L.Trainer(max_epochs=20)
trainer.fit(model, datamodule=tiff_datamodule)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

  | Name      | Type             | Params | Mode
-------------------------------------------------------
0 | model     | Sequential       | 90.5 K | train
1 | criterion | CrossEntropyLoss | 0      | train
-------------------------------------------------------
90.5 K    Trainable params
0         Non-trainable params
90.5 K    Total params
0.362     Total estimated model params size (MB)
10        Modules in train mode
0         Modules in eval mode
Epoch 19: 100%|██████████| 213/213 [00:04<00:00, 44.52it/s, v_num=7, train_loss=0.147, val_loss=0.212, val_acc=0.926]
`Trainer.fit` stopped: `max_epochs=20` reached.
Now, let's look at the confusion matrix of the trained neural net:
# After training, run validation and collect predictions
model.eval()
all_preds = []
all_labels = []
with torch.no_grad():
for batch in tiff_datamodule.val_dataloader():
x, y = batch["image"], batch["label"]
logits = model(x)
preds = torch.argmax(logits, dim=1)
all_preds.extend(preds.cpu().numpy())
all_labels.extend(y.cpu().numpy())
print(
classification_report(
all_labels,
all_preds,
target_names=["Healthy (110)", "Clear Cut (211)", "Bark Beetle (231)"],
)
)
disp = ConfusionMatrixDisplay.from_predictions(
all_labels,
all_preds,
display_labels=["Healthy (110)", "Clear Cut (211)", "Bark Beetle (231)"],
normalize="true",
)
precision recall f1-score support
Healthy (110) 0.95 0.97 0.96 1203
Clear Cut (211) 0.79 0.75 0.77 161
Bark Beetle (231) 0.85 0.63 0.72 92
accuracy 0.93 1456
macro avg 0.86 0.78 0.82 1456
weighted avg 0.92 0.93 0.92 1456
After 20 epochs of training, this very simple neural net outperforms the pixel-based random forest model. In particular, the bark beetle class is predicted more accurately.
These models are just toy examples to show the integration of the provided datasets into training pipelines.