Image Segmentation: Data Processing

Tags: none | Category: uncategorized | Created: 2024-12-24 01:51:50 | Updated: 2025-04-28 14:37:41

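1. Introduction

Remote sensing image segmentation is really just image segmentation. Here I describe how to build a dataset with ArcPro. For where to get data, see the reference articles; I won't go into it here. Many of the datasets listed in "[Datasets] 26 recommended remote sensing datasets for semantic segmentation" no longer seem to be available, so check for yourself if you need them. From what I saw, datasets such as LoveDA are also annotated as mask label images.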

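LoveDA
A remote sensing land-cover dataset for domain-adaptive semantic segmentation. 5987 image chips (Google Earth), 7 land-cover classes, 166768 labels, covering 3 cities in China. Classes: 1) background, 2) building, 3) road, 4) water, 5) barren, 6) forest, 7) agriculture, 0) no-data (should be ignored when the data is used).
FloodNet Dataset
The data were collected with a small UAV platform, a DJI Mavic Pro quadcopter. The dataset has 2343 images, split into training (~60%), validation (~20%), and test (~20%) sets. The semantic segmentation labels are: 1) background, 2) building flooded, 3) building non-flooded, 4) road flooded, 5) road non-flooded, 6) water, 7) tree, 8) vehicle, 9) pool, 10) grass.
iSAID dataset
iSAID is the first benchmark dataset for instance segmentation in aerial images. This large-scale, densely annotated dataset contains 655451 object instances across 15 categories in 2806 high-resolution images. Its distinguishing features are: (a) a large number of images with high spatial resolution, (b) 15 important and commonly occurring categories, (c) a large number of instances per category, (d) a large number of labeled instances per image, which may help in learning contextual information, (e) huge object scale variation, with small, medium, and large objects often in the same image, (f) an imbalanced and uneven distribution of objects with different orientations within images, reflecting real-life aerial conditions, (g) several small objects with ambiguous appearance that can only be resolved through contextual reasoning, (h) precise instance-level annotations made by professional annotators, cross-checked and validated by expert annotators following well-defined guidelines. Classes: 1) ship, 2) storage tank, 3) baseball diamond, 4) tennis court, 5) basketball court, 6) ground track field, 7) bridge, 8) large vehicle, 9) small vehicle, 10) helicopter, 11) swimming pool, 12) roundabout, 13) soccer ball field, 14) plane, 15) harbor.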
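
A common way to honor the "ignore" rule for label 0 in LoveDA-style masks is to shift the class values down by one and send the no-data pixels to an ignore index. The sketch below assumes an ignore index of 255 (a frequent convention, not something the dataset description above mandates) and a hypothetical helper name `to_train_ids`:

```python
import numpy as np


def to_train_ids(mask, ignore_index=255):
    """Map LoveDA-style labels 1..7 to 0..6 and no-data (0) to ignore_index."""
    labels = mask.astype(np.int16) - 1   # 1..7 -> 0..6, 0 -> -1
    labels[mask == 0] = ignore_index     # no-data pixels get the ignore index
    return labels.astype(np.uint8)


# Tiny made-up mask just to show the mapping
mask = np.array([[0, 1], [7, 3]], dtype=np.uint8)
print(to_train_ids(mask))
```

Loss functions such as PyTorch's `CrossEntropyLoss` accept an `ignore_index` argument, so pixels carrying that value do not contribute to training.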

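References:
[1]. Label objects for deep learning. ArcPro also ships with its own deep learning labeling tool, which you can use as a reference.
[2]. A classic! An introduction to datasets for segmentation tasks.
[3]. Semantic segmentation and datasets
[4]. Visual Object Classes Challenge 2012 (VOC2012). The VOC data can be downloaded here.
[5]. Remote sensing image annotation tools (3) - X-AnyLabeling. This appears to be an annotation tool, but I have not tested it. The article calls it a remote sensing annotation tool, though I don't quite see why, since it mostly annotates ordinary images. Surprisingly, it has quite a few stars.
[6]. An intelligent recognition method for open forest land in remote sensing images based on Mask R-CNN. This is a paper; its dataset generation output is stored automatically in four folders: cv2_mask, JSON, labelme_JSON, and pic.
[7]. Remote sensing instance segmentation dataset iSAID: a detailed walkthrough from tiling to building a YOLO-format dataset. The open iSAID dataset includes instance segmentation annotations, but the original images are too large and use a lot of GPU memory, so an ordinary GPU cannot train on them directly. The images therefore have to be cropped, with matching labels generated. Because the goal is to train YOLO-family models, the labels also need to be converted to txt format.
[8]. Remote sensing semantic segmentation dataset LoveDA: detailed introduction and training sample processing workflow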

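2. Creating Mask Labels

The labeling is done in ArcPro. The main steps are listed below; the reference article follows the same steps, also in ArcPro, just without the built-in deep learning tools.

[1]. Building a remote sensing semantic segmentation dataset (with ArcGIS Pro). This article explains it very clearly. It lists these data sources: 1. ESA Sentinel data; 2. Google Earth Engine (GEE); 3. Geospatial Data Cloud; 4. NOAA satellite data; 5. Jilin-1 high-resolution imagery.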

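2.1. Load the remote sensing image

I'll skip this step; it is a basic ArcPro operation.

References:
[1]. Python remote sensing image classification / Python remote sensing image segmentation. This one uses ArcMap for the image segmentation.
[2]. Data annotation practice for deep learning in ArcGIS Pro. I suspect Baidu generated this document with a large language model; its content is full of errors.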

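2.2. Create a new shp file

I'll skip this step; it is a basic ArcPro operation.

2.3. Label the target features

Select the new shp layer, choose Edit, click Create, pick the polygon template, and start outlining the areas to extract.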

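2.4. Set the attributes

Select the new shp layer, right-click to open the attribute table, and pick the id field (create one if it does not exist). Use the field calculator to set this field to 255 for every feature, or to some other value. The value 255 is later used as the pixel value of the target features when the label file is created. Alternatively, create a new attribute field named value at the start to hold the pixel value of the target features.

References:
[1]. Image binarization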

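2.5. Convert the vector data to raster

In the Geoprocessing toolbox, search for the "Polygon to Raster" tool and use it to convert the shp file to a raster.

Set the parameters as follows:

Input features: the shapefile layer created at the start,
Value field: the name of the field that holds the pixel gray value,
Output raster dataset: a path of your choice,
Cell assignment type: cell center,
Cellsize: the image being labeled.
Then click Environments. Here the cell size, the snap raster, and the output coordinate system must all match the reference image; otherwise the exported raster will not have the same number of pixels as the original image. The processing extent must match the image as well.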

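Notes
(1) At first the conversion kept failing, and it turned out to be a problem with my directory. It is best to work in a dedicated directory.

(2) For vector-to-raster conversion, the generated pixels must match the original image. How can a mismatch be detected? If the generated orthoimage is processed directly, the resulting raster may not match the original image, and then there is no way to tile the two together. The best approach is to cut the orthoimage into squares first, then label, and then convert the vector to raster.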
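
One way to detect such a mismatch is to compare the georeferencing of the two rasters before tiling. The sketch below is a minimal check, assuming GDAL-style geotransform tuples (as returned by `GetGeoTransform()`) and sizes taken from `RasterXSize` and `RasterYSize`; the helper name `same_grid` and the sample numbers are made up for illustration:

```python
def same_grid(gt_a, size_a, gt_b, size_b, tol=1e-6):
    """Return True when two rasters share origin, cell size, and pixel dimensions.

    gt_*   : geotransform (x_origin, x_res, x_rot, y_origin, y_rot, y_res)
    size_* : (columns, rows)
    """
    same_transform = all(abs(a - b) <= tol for a, b in zip(gt_a, gt_b))
    return same_transform and tuple(size_a) == tuple(size_b)


# Hypothetical values: a 0.5 m image, and a label raster shifted by one cell
image_gt = (500000.0, 0.5, 0.0, 3400000.0, 0.0, -0.5)
label_gt = (500000.5, 0.5, 0.0, 3400000.0, 0.0, -0.5)

print(same_grid(image_gt, (4000, 3000), image_gt, (4000, 3000)))
print(same_grid(image_gt, (4000, 3000), label_gt, (3999, 3000)))
```

If the check fails, the label raster was most likely created without the snap raster or processing extent set to the image.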

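References:
[1]. Clip Raster (Data Management)
[2]. How to clip raster data in ArcGIS/ArcMap. 1. Data Management -> Raster -> Raster Processing -> Clip: set the four corner coordinates. 2. The Extraction toolset of Spatial Analyst offers an extract-by-rectangle operation.
[3]. Polygon to Raster (Conversion)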

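2.6. Clip

The label and the image must have the same number of rows and columns. Converting vector polygons to raster can produce a raster whose pixel dimensions differ from the original tile. In that case the image to be trained on and the rasterized labels have to be intersected. I tried for a long time without finding a suitable approach: the raster converted from the vector had the same cell size as the training image but a different extent, which means different row and column counts.

[Solution]
Create a rectangle and use it to clip both the label layer and the image layer again. This guarantees that the two have matching pixels.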
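
The same idea can be expressed outside ArcPro as a pixel-window calculation. The sketch below assumes north-up rasters with identical cell size whose origins differ by a whole number of cells, and uses GDAL-style geotransforms; `overlap_windows` and the sample numbers are hypothetical:

```python
def overlap_windows(gt_a, size_a, gt_b, size_b):
    """Pixel windows (col_off, row_off, cols, rows) of the shared rectangle.

    Returns one window per raster, or None when the rasters do not overlap.
    """
    res_x, res_y = gt_a[1], abs(gt_a[5])

    def extent(gt, size):
        # (left, top, right, bottom) in map units
        return gt[0], gt[3], gt[0] + size[0] * res_x, gt[3] - size[1] * res_y

    la, ta, ra, ba = extent(gt_a, size_a)
    lb, tb, rb, bb = extent(gt_b, size_b)
    left, top = max(la, lb), min(ta, tb)
    right, bottom = min(ra, rb), max(ba, bb)
    cols = round((right - left) / res_x)
    rows = round((top - bottom) / res_y)
    if cols <= 0 or rows <= 0:
        return None
    win_a = (round((left - la) / res_x), round((ta - top) / res_y), cols, rows)
    win_b = (round((left - lb) / res_x), round((tb - top) / res_y), cols, rows)
    return win_a, win_b


image_gt = (500000.0, 0.5, 0.0, 3400000.0, 0.0, -0.5)   # 4000 x 3000 pixels
label_gt = (500010.0, 0.5, 0.0, 3399990.0, 0.0, -0.5)   # 3000 x 3200 pixels
print(overlap_windows(image_gt, (4000, 3000), label_gt, (3000, 3200)))
# ((20, 20, 3000, 2980), (0, 0, 3000, 2980))
```

Each window can be passed to GDAL's `ReadAsArray(xoff, yoff, xsize, ysize)` so that the image and the label are read with identical row and column counts.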

References:
[1]. Over (Spatial Analyst)
[2]. Intersecting two rasters (regions) with the ArcGIS Raster Calculator
[3]. First steps with the Raster Calculator

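2.7. Building the dataset

Because the pixel dimensions (width and height) of remote sensing images are usually large and irregular, while deep learning models need regular tile sizes (such as 256×256 or 512×512), the data must be processed further to meet the training requirements.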

import os
from osgeo import gdal
import numpy as np
from tqdm import tqdm
 
# Read the raster into an array of shape (bands, height, width)
def read_image(image_path, num_bands=None, selected_bands=None):
    dataset = gdal.Open(image_path)
    if dataset is None:
        print(f"Could not open image: {image_path}")
        return None
    else:
        if num_bands is None:
            num_bands = dataset.RasterCount
        image_data = []
        for i in range(1, num_bands + 1):
            if selected_bands is not None and i not in selected_bands:
                continue
            band = dataset.GetRasterBand(i)
            band_data = band.ReadAsArray()
            image_data.append(band_data)
        return np.array(image_data)
 
# Sliding-window cropping
def sliding_crop(image, window_size=(512, 512), stride=256):
    height, width = image.shape[1], image.shape[2]
 
    # Work out how much height and width need to be padded
    pad_height = 0
    pad_width = 0
    if height % window_size[0] != 0:
        pad_height = window_size[0] - (height % window_size[0])
    if width % window_size[1] != 0:
        pad_width = window_size[1] - (width % window_size[1])
 
    # Pad with zeros on the right and bottom edges of the image
    padded_image = np.pad(image, ((0, 0), (0, pad_height),
                          (0, pad_width)), mode='constant', constant_values=0)
 
    crops = []
    for y in range(0, height + pad_height - window_size[0] + 1, stride):
        for x in range(0, width + pad_width - window_size[1] + 1, stride):
            crop = padded_image[:, y:y+window_size[0], x:x+window_size[1]]
            crops.append(crop)
    return crops
 
# Save the tiles
def save_crops(crops, output_dir):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
 
    existing_files = os.listdir(output_dir)
    existing_indices = set()
    for filename in existing_files:
        if filename.startswith("crop_") and filename.endswith(".tif"):
            index_str = filename.split("_")[1].split(".")[0]
            existing_indices.add(int(index_str))
 
    start_index = max(existing_indices) + 1 if existing_indices else 0
 
    for i, crop in enumerate(crops):
        output_path = os.path.join(output_dir, f"crop_{start_index + i}.tif")
        save_image(crop, output_path)
 
def save_image(image_data, output_path):
    num_bands, height, width = image_data.shape
    driver = gdal.GetDriverByName("GTiff")
    dataset = driver.Create(output_path, width, height,
                            num_bands, gdal.GDT_Byte)
    for i in range(num_bands):
        dataset.GetRasterBand(i + 1).WriteArray(image_data[i])
    dataset.FlushCache()
 
 
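if __name__ == "__main__":
    # Path to the image
    image_path = r"xxxx.tif"
    # Path to the label raster
    label_path = r"xxxx.tif"
    images = read_image(image_path)
    labels = read_image(label_path, 1)
    # read_image returns None when a file cannot be opened
    if images is None or labels is None:
        print("Could not read the image or the label. Cannot proceed with cropping.")
    elif images.shape[1:] == labels.shape[1:]:
        print("Images have same dimensions. Starting cropping...")
        # Window size and stride of the sliding window; a stride smaller than
        # the window produces overlapping tiles
        images_crops = sliding_crop(images, window_size=(512, 512), stride=256)
        labels_crops = sliding_crop(labels, window_size=(512, 512), stride=256)
        # Save the cropped tiles
        save_crops(images_crops,
                   r"H:\Images")  # folder for the image tiles
        save_crops(labels_crops,
                   r"H:\Labels")  # folder for the label tiles
        print("Cropping done and crops saved!")
    else:
        print("Images have different dimensions. Cannot proceed with cropping.")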
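
To know in advance how much padding `sliding_crop` adds and how many tiles it produces, the arithmetic it uses can be pulled out into a small helper. This is a sketch that mirrors the loop bounds of the script above; `tile_grid` is a hypothetical name and the 3000×4000 image size is only an example:

```python
def tile_grid(height, width, window=512, stride=256):
    """Return (pad_h, pad_w, n_rows, n_cols) for the sliding_crop scheme.

    The image is padded up to the next multiple of the window size, then
    tiles are taken every `stride` pixels.
    """
    pad_h = (-height) % window
    pad_w = (-width) % window
    n_rows = (height + pad_h - window) // stride + 1
    n_cols = (width + pad_w - window) // stride + 1
    return pad_h, pad_w, n_rows, n_cols


pad_h, pad_w, n_rows, n_cols = tile_grid(3000, 4000)
print(pad_h, pad_w, n_rows, n_cols, n_rows * n_cols)
# 72 96 11 15 165
```

The tiles are appended row by row, so tile number `k` sits at row `k // n_cols` and column `k % n_cols` of this grid.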

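References:
[1]. Building your own remote sensing semantic segmentation dataset. It also provides some code: 1. Create a vector file in ArcGIS or QGIS and outline the target features in the image area. All target features must be outlined, background features of no interest do not need to be, and the vector file must use the same coordinate system as the image. 2. Add class labels: create an integer field named value or classvalue in the attribute table and assign values to the features to be extracted, starting from 1 and numbering the feature types in order. Background features are usually labeled 0 automatically in the later sample generation step. 4. Convert the vector file to raster: here Python's gdal library is used to convert the vector file to a raster that matches the image's extent and resolution.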

3. Mask to txt

The steps above actually produce masks, but training with YOLO requires converting them to txt format.
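
For orientation, a YOLO segmentation label file has one line per object: the class index followed by the polygon's x and y coordinates, each normalized by the image width and height. The sketch below formats a single polygon; the helper name `polygon_to_yolo_line` and the triangle are made up for illustration:

```python
def polygon_to_yolo_line(class_id, points, img_width, img_height):
    """Format one polygon as a YOLO segmentation line: class x1 y1 x2 y2 ..."""
    coords = []
    for x, y in points:
        coords.append(f"{x / img_width:.6f}")    # x normalized to 0..1
        coords.append(f"{y / img_height:.6f}")   # y normalized to 0..1
    return " ".join([str(class_id)] + coords)


# A triangle in pixel coordinates on a 512 x 512 tile
print(polygon_to_yolo_line(0, [(128, 64), (384, 64), (256, 448)], 512, 512))
# 0 0.250000 0.125000 0.750000 0.125000 0.500000 0.875000
```
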
(1) Option 1
Option 1 uses code obtained from Tongyi Qianwen, but something always felt off to me, because the training results did not look right.

import os

import cv2

# Folder with the mask tiles, and folder for the YOLO txt labels
mask_dir = r"D:\zlc\drone\图像分割\栅格\Labels"
label_dir = r"D:\zlc\drone\图像分割\栅格\yolo-labels"
os.makedirs(label_dir, exist_ok=True)

for file in os.listdir(mask_dir):
    name, ext = os.path.splitext(file)
    if ext.lower() != ".tif":
        continue
    # cv2.imread returns None instead of raising when a file cannot be read
    gray_img = cv2.imread(os.path.join(mask_dir, file), cv2.IMREAD_GRAYSCALE)
    if gray_img is None:
        print(f"Could not read {file}, skipping")
        continue
    H, W = gray_img.shape[0:2]
    print(H, W)

    # Binarize the mask with Otsu's threshold, then trace the contours
    ret, bin_img = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cnt, hit = cv2.findContours(bin_img, cv2.RETR_TREE, cv2.CHAIN_APPROX_TC89_KCOS)

    # Open with "w" rather than "a+" so a rerun does not append duplicate lines
    with open(os.path.join(label_dir, name + ".txt"), "w") as f:
        for j in cnt:
            result = []
            pre = j[0]
            for i in j:
                # Keep a point only when it is more than 1 px away from the last
                # kept point; the spacing can be adjusted here
                if abs(i[0][0] - pre[0][0]) > 1 or abs(i[0][1] - pre[0][1]) > 1:
                    pre = i
                    # Normalize x by the width and y by the height
                    result.append((i[0][0] / W, i[0][1] / H))

            print(len(result))

            # A polygon needs at least 3 points; the class index is fixed to 0
            if len(result) >= 3:
                coords = " ".join(f"{x:.6f} {y:.6f}" for x, y in result)
                f.write(f"0 {coords}\n")

(2) Option 2
Option 2 uses the tool that ships with YOLO: convert_segment_masks_to_yolo_seg("path/to/masks_directory", "path/to/output/directory", classes=80)

from pathlib import Path

import cv2
import numpy as np
from ultralytics.utils import LOGGER


def convert_segment_masks_to_yolo_seg(masks_dir, output_dir, classes):
    """
    Converts a dataset of segmentation mask images to the YOLO segmentation format.

    This function takes the directory containing the binary format mask images and converts them into YOLO segmentation format.
    The converted masks are saved in the specified output directory.

    Args:
        masks_dir (str): The path to the directory where all mask images (png, jpg) are stored.
        output_dir (str): The path to the directory where the converted YOLO segmentation masks will be stored.
        classes (int): Total classes in the dataset i.e. for COCO classes=80

    Example:
        from ultralytics.data.converter import convert_segment_masks_to_yolo_seg

        # The classes here is the total classes in the dataset, for COCO dataset we have 80 classes
        convert_segment_masks_to_yolo_seg("path/to/masks_directory", "path/to/output/directory", classes=80)

    Notes:
        The expected directory structure for the masks is:

            - masks
                ├─ mask_image_01.png or mask_image_01.jpg
                ├─ mask_image_02.png or mask_image_02.jpg
                ├─ mask_image_03.png or mask_image_03.jpg
                └─ mask_image_04.png or mask_image_04.jpg

        After execution, the labels will be organized in the following structure:

            - output_dir
                ├─ mask_yolo_01.txt
                ├─ mask_yolo_02.txt
                ├─ mask_yolo_03.txt
                └─ mask_yolo_04.txt
    """
    # Pixel value 1 maps to class 0, value 2 to class 1, and so on; this matches
    # the behaviour described in reference [9] (value 255 is unknown and skipped)
    pixel_to_class_mapping = {i + 1: i for i in range(classes)}
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    for mask_path in Path(masks_dir).iterdir():
        if mask_path.suffix == ".png":
            mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)  # Read the mask image in grayscale
            img_height, img_width = mask.shape  # Get image dimensions
            LOGGER.info(f"Processing {mask_path} imgsz = {img_height} x {img_width}")

            unique_values = np.unique(mask)  # Get unique pixel values representing different classes
            yolo_format_data = []

            for value in unique_values:
                if value == 0:
                    continue  # Skip background
                class_index = pixel_to_class_mapping.get(value, -1)
                if class_index == -1:
                    LOGGER.warning(f"Unknown class for pixel value {value} in file {mask_path}, skipping.")
                    continue

                # Create a binary mask for the current class and find contours
                contours, _ = cv2.findContours(
                    (mask == value).astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
                )  # Find contours

                for contour in contours:
                    if len(contour) >= 3:  # YOLO requires at least 3 points for a valid segmentation
                        contour = contour.squeeze()  # Remove single-dimensional entries
                        yolo_format = [class_index]
                        for point in contour:
                            # Normalize the coordinates
                            yolo_format.append(round(point[0] / img_width, 6))  # Rounding to 6 decimal places
                            yolo_format.append(round(point[1] / img_height, 6))
                        yolo_format_data.append(yolo_format)
            # Save Ultralytics YOLO format data to file
            output_path = Path(output_dir) / f"{mask_path.stem}.txt"
            with open(output_path, "w") as file:
                for item in yolo_format_data:
                    line = " ".join(map(str, item))
                    file.write(line + "\n")
            LOGGER.info(f"Processed and stored at {output_path} imgsz = {img_height} x {img_width}")


if __name__ == "__main__":
    # The classes here is the total classes in the dataset.
    # for COCO dataset we have 80 classes.
    convert_segment_masks_to_yolo_seg(
        masks_dir="D:/zlc/drone/drone-train/data/land/tiles/png",
        output_dir="D:/zlc/drone/drone-train/data/land/tiles/labels",
        classes=2,
    )
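
The masks made in section 2.4 use 255 as the target value, while the note on reference [9] in this section says the converter expects class values starting at 1. A small remapping step before conversion bridges the two. This is a sketch only; `remap_mask_values` is a hypothetical helper and the 2×2 mask is made-up data:

```python
import numpy as np


def remap_mask_values(mask, value_to_class):
    """Return a copy of mask where each listed pixel value becomes its class value.

    Pixel values that are not listed end up as 0 (background).
    """
    out = np.zeros_like(mask)
    for value, class_value in value_to_class.items():
        out[mask == value] = class_value
    return out


mask = np.array([[0, 255], [255, 0]], dtype=np.uint8)
print(remap_mask_values(mask, {255: 1}))
```

The remapped array can then be written back to disk (for example with `cv2.imwrite`) before the conversion function is run on the folder.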


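References:
[1]. [Converting mask image labels to YOLO txt labels](https://blog.csdn.net/qq_41701723/article/details/135449035) The code in this article works: it converts mask image labels to YOLO txt labels by extracting the outer contours.
[2]. [Converting annotations from the Mask-RCNN dataset format to COCO format](https://cloud.tencent.com/developer/information/%E5%B0%86%E6%B3%A8%E8%AE%B0%E4%BB%8EMask-RCNN%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%BD%AC%E6%8D%A2%E4%B8%BACOCO%E6%A0%BC%E5%BC%8F) 1. The main difference between the Mask R-CNN dataset format and the COCO format is how the annotation information is organized. Mask R-CNN usually includes: images/ (image files) and annotations/ (JSON files with bounding boxes and masks). 2. The COCO format includes: images/ (image files) and annotations/ (JSON files with bounding boxes, masks, and category information).
[3]. [Converting mask annotations to YOLO-format txt files](https://blog.csdn.net/my_wx/article/details/140798592)
[4]. [mask-to-annotation](https://github.com/matthewkenely/mask-to-annotation) mask-to-annotation is a powerful and efficient tool for automatically generating annotations in popular computer vision formats such as COCO, YOLO, and VGG from binary and color masks.
[5]. [Binary mask images to instead yolo annotation format ](https://github.com/ultralytics/ultralytics/issues/3085) The YOLOv8 repo offers a useful script, create_masks.py, which can be used to generate YOLO annotations from binary masks.
[6]. [Common dataset format conversions](https://www.unnamedtat.xyz/posts/10be164/) 1. Convert masks to JSON (adapt it yourself for a COCO dataset). 2. Visualize the JSON. 3. Convert the JSON to YOLO format.
[7]. [Simple Utilities](https://docs.ultralytics.com/zh/usage/simple-utilities/) The code here is simpler: just use the convert_segment_masks_to_yolo_seg tool that ships with ultralytics.
[8]. [Yolov8-seg: building and training your own dataset + extracting and reconstructing masks](https://blog.csdn.net/XY_39/article/details/136673733) This one uses SAM for semi-automatic annotation.
[9]. [Converting mask image annotations to YOLO format](https://blog.csdn.net/Y5823990/article/details/143917406) classes is the number of classes in your data. Note that the classes correspond to the mask values: with classes=2, the mask value must be 1 for the first class and 2 for the second. It cannot be the usual 255, which triggers a warning and produces empty output.
[10]. [ultralytics.data.converter.convert_segment_masks_to_yolo_seg](https://docs.ultralytics.com/reference/data/converter/#ultralytics.data.converter.convert_coco) This is the official documentation.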

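# 4. Image Tiling
Segmentation models generally expect small input sizes, but remote sensing images are very large, so they have to be cut up. Cut the image into smaller tiles, segment each tile separately, then merge the results. The article [Python | Remote sensing semantic segmentation: building a semantic segmentation dataset with Python (GDAL)](https://www.cnblogs.com/tangjielin/p/18288301) provides the method below.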
```python
import os
from osgeo import gdal
import numpy as np


#  Open a tif dataset
def readTif(fileName):
   dataset = gdal.Open(fileName)
   if dataset is None:
      print(fileName + " could not be opened")
   return dataset


#  Save an array as a tif file
def writeTiff(im_data, im_geotrans, im_proj, path):
   if 'int8' in im_data.dtype.name:
      datatype = gdal.GDT_Byte
   elif 'int16' in im_data.dtype.name:
      datatype = gdal.GDT_UInt16
   else:
      datatype = gdal.GDT_Float32
   if len(im_data.shape) == 3:
      im_bands, im_height, im_width = im_data.shape
   elif len(im_data.shape) == 2:
      im_data = np.array([im_data])
      im_bands, im_height, im_width = im_data.shape
   # Create the file
   driver = gdal.GetDriverByName("GTiff")
   dataset = driver.Create(path, int(im_width), int(im_height), int(im_bands), datatype)
   if dataset is not None:
      dataset.SetGeoTransform(im_geotrans)  # write the affine transform parameters
      dataset.SetProjection(im_proj)  # write the projection
   for i in range(im_bands):
      dataset.GetRasterBand(i + 1).WriteArray(im_data[i])
   del dataset


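def TifCrop(TifPath, SavePath, CropSize, RepetitionRate):
   """
    Sliding-window cropping.
    TifPath: path to the image
    SavePath: directory where the tiles are saved
    CropSize: tile size in pixels
    RepetitionRate: overlap ratio between neighbouring tiles
    """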
   dataset_img = readTif(TifPath)
   width = dataset_img.RasterXSize
   height = dataset_img.RasterYSize
   proj = dataset_img.GetProjection()
   geotrans = dataset_img.GetGeoTransform()
   img = dataset_img.ReadAsArray(0, 0, width, height)  # read the pixel data

   #  Count the files already in the output folder and continue numbering from there
   new_name = len(os.listdir(SavePath))

   #  Crop tiles with an overlap ratio of RepetitionRate
   for i in range(int((height - CropSize * RepetitionRate) / (CropSize * (1 - RepetitionRate)))):
      for j in range(int((width - CropSize * RepetitionRate) / (CropSize * (1 - RepetitionRate)))):
         #  Single-band image
         if len(img.shape) == 2:
            cropped = img[
                      int(i * CropSize * (1 - RepetitionRate)): int(i * CropSize * (1 - RepetitionRate)) + CropSize,
                      int(j * CropSize * (1 - RepetitionRate)): int(j * CropSize * (1 - RepetitionRate)) + CropSize]
         #  Multi-band image
         else:
            cropped = img[:,
                      int(i * CropSize * (1 - RepetitionRate)): int(i * CropSize * (1 - RepetitionRate)) + CropSize,
                      int(j * CropSize * (1 - RepetitionRate)): int(j * CropSize * (1 - RepetitionRate)) + CropSize]
         #  Write the tile
         writeTiff(cropped, geotrans, proj, SavePath + "/%d.tif" % new_name)
         #  Increment the file name
         new_name = new_name + 1

   #  Crop the last column, aligned to the right edge
   for i in range(int((height - CropSize * RepetitionRate) / (CropSize * (1 - RepetitionRate)))):
      if len(img.shape) == 2:
         cropped = img[int(i * CropSize * (1 - RepetitionRate)): int(i * CropSize * (1 - RepetitionRate)) + CropSize,
                   (width - CropSize): width]
      else:
         cropped = img[:,
                   int(i * CropSize * (1 - RepetitionRate)): int(i * CropSize * (1 - RepetitionRate)) + CropSize,
                   (width - CropSize): width]
      #  Write the tile
      writeTiff(cropped, geotrans, proj, SavePath + "/%d.tif" % new_name)
      new_name = new_name + 1

   #  Crop the last row, aligned to the bottom edge
   for j in range(int((width - CropSize * RepetitionRate) / (CropSize * (1 - RepetitionRate)))):
      if len(img.shape) == 2:
         cropped = img[(height - CropSize): height,
                   int(j * CropSize * (1 - RepetitionRate)): int(j * CropSize * (1 - RepetitionRate)) + CropSize]
      else:
         cropped = img[:, (height - CropSize): height,
                   int(j * CropSize * (1 - RepetitionRate)): int(j * CropSize * (1 - RepetitionRate)) + CropSize]
      writeTiff(cropped, geotrans, proj, SavePath + "/%d.tif" % new_name)
      #  Increment the file name
      new_name = new_name + 1

   #  Crop the bottom-right corner
   if len(img.shape) == 2:
      cropped = img[(height - CropSize): height, (width - CropSize): width]
   else:
      cropped = img[:, (height - CropSize): height, (width - CropSize): width]
   writeTiff(cropped, geotrans, proj, SavePath + "/%d.tif" % new_name)
   new_name = new_name + 1


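# Both the training set and the validation set need to be cropped
# Crop the image features; the tiles are then passed on to data augmentation
# Crop the image into 512×512 tiles with an overlap ratio of 0.5
if __name__ == '__main__':
   TifCrop(r"D:\Dataset_Authoring\EnShi.tif",
           r"D:\Dataset_Authoring\EnShi_data", 512, 0.5)
```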

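Note that `TifCrop` passes the full image's geotransform to `writeTiff` for every tile, so all tiles claim the same top-left corner. If the tiles need their own georeferencing, the origin can be shifted by the tile's pixel offset. The sketch below assumes a GDAL-style six-element geotransform; `tile_geotransform` and the sample numbers are hypothetical:

```python
def tile_geotransform(geotrans, col_off, row_off):
    """Geotransform of a tile whose top-left pixel is (col_off, row_off) in the parent."""
    x0, px_w, rot_x, y0, rot_y, px_h = geotrans
    # GDAL convention: X = x0 + col * px_w + row * rot_x
    #                  Y = y0 + col * rot_y + row * px_h
    tile_x0 = x0 + col_off * px_w + row_off * rot_x
    tile_y0 = y0 + col_off * rot_y + row_off * px_h
    return (tile_x0, px_w, rot_x, tile_y0, rot_y, px_h)


parent = (500000.0, 0.5, 0.0, 3400000.0, 0.0, -0.5)
print(tile_geotransform(parent, col_off=256, row_off=512))
# (500128.0, 0.5, 0.0, 3399744.0, 0.0, -0.5)
```

The result could be passed to `writeTiff` in place of `geotrans`, using the same column and row offsets that index into `img`.
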
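5. Merging

After segmentation, the tiles need to be merged back into one large remote sensing image so that the whole picture can be viewed. I went through a lot of material, and in the end it was Alibaba Cloud's Tongyi Qianwen that solved the code problem.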

import os
import re
import numpy as np
import cv2

def merge_images_with_overlap(folder_path, row_overlap_width, col_overlap_height):
    # Regular expression matching file names of the form crop_{row}_{col}.tif
    pattern = re.compile(r'crop_(\d+)_(\d+)\.tif$')

    images = {}
    missing_images = set()

    # Walk the folder and record every (row, col) key
    for filename in os.listdir(folder_path):
        match = pattern.match(filename)
        if match:
            try:
                row, col = map(int, match.groups())
                images[(row, col)] = os.path.join(folder_path, filename)
            except ValueError as e:
                print(f"Error converting row or column number: {e}, file name: {filename}")
                continue

    if not images:
        print("No matching image files found")
        return None

    # Collect the rows and columns that actually exist
    rows = {key[0] for key in images.keys()}
    cols = {key[1] for key in images.keys()}

    # Compute max_row and max_col
    max_row = max(rows) if rows else -1
    max_col = max(cols) if cols else -1

    # Check for missing image files
    for row in range(max_row + 1):
        for col in range(max_col + 1):
            if (row, col) not in images:
                missing_images.add((row, col))

    if missing_images:
        print("Warning: the following image files are missing:")
        for missing in missing_images:
            print(f"row {missing[0]} col {missing[1]}")

    # Assume all tiles have the same size
    sample_image_path = list(images.values())[0]
    try:
        sample_image = cv2.imread(sample_image_path, cv2.IMREAD_COLOR)
        image_height, image_width, _ = sample_image.shape
    except Exception as e:
        print(f"Could not open the sample image: {sample_image_path}, error: {e}")
        return None

    result_width = (max_col + 1) * image_width - max_col * col_overlap_height
    result_height = (max_row + 1) * image_height - max_row * row_overlap_width

    # Create a blank image to hold the final result
    final_result = np.zeros((result_height, result_width, 3), dtype=np.uint8)

    # Process every row and column
    for row in range(max_row + 1):
        for col in range(max_col + 1):
            path = images.get((row, col))
            if not path:
                print(f"Skipping missing image at row {row} col {col}")
                continue

            try:
                img = cv2.imread(path, cv2.IMREAD_COLOR)

                left = col * (image_width - col_overlap_height)
                upper = row * (image_height - row_overlap_width)
                right = left + image_width
                lower = upper + image_height

                # Not the first column: blend the overlap with the tile on the left
                if col > 0 and (row, col-1) in images:
                    prev_img_path = images.get((row, col-1))
                    if prev_img_path:
                        prev_img = cv2.imread(prev_img_path, cv2.IMREAD_COLOR)
                        blended_part = cv2.addWeighted(
                            prev_img[:, -(col_overlap_height):, :], 0.5,
                            img[:, :col_overlap_height, :], 0.5, 0
                        )
                        final_result[upper:lower, left:left+col_overlap_height, :] = blended_part

                # Not the first row: blend the overlap with the tile above
                if row > 0 and (row-1, col) in images:
                    prev_img_path = images.get((row-1, col))
                    if prev_img_path:
                        prev_img = cv2.imread(prev_img_path, cv2.IMREAD_COLOR)
                        blended_part = cv2.addWeighted(
                            prev_img[-row_overlap_width:, :, :], 0.5,
                            img[:row_overlap_width, :, :], 0.5, 0
                        )
                        final_result[upper:upper+row_overlap_width, left:right, :] = blended_part

                # Paste the non-overlapping part
                non_overlap_box = (
                    left + (col_overlap_height if col > 0 else 0),
                    upper + (row_overlap_width if row > 0 else 0),
                    right,
                    lower
                )
                final_result[non_overlap_box[1]:non_overlap_box[3], non_overlap_box[0]:non_overlap_box[2], :] = \
                    img[non_overlap_box[1]-upper:non_overlap_box[3]-upper, non_overlap_box[0]-left:non_overlap_box[2]-left, :]
            except Exception as e:
                print(f"Error while processing image: {path}, error: {e}")

    # Save the merged image
    output_path = os.path.join(folder_path, 'merged_image.tif')
    try:
        cv2.imwrite(output_path, final_result)
        print(f"Merged image saved to: {output_path}")
    except Exception as e:
        print(f"Error while saving the merged image: {e}")

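# Call the function with the folder path and the overlap between rows and between columns
folder_path = r'D:\zlc\drone\图像分割\栅格\Images'
row_overlap_width = 256  # overlap between neighbouring rows (pixels)
col_overlap_height = 256  # overlap between neighbouring columns (pixels)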

merge_images_with_overlap(folder_path, row_overlap_width, col_overlap_height)
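
The merge script looks for files named `crop_{row}_{col}.tif`, while the tiling script in section 2.7 writes `crop_{index}.tif`, numbering the tiles row by row. A small mapping from the flat index to a row and column closes that gap. The sketch below assumes the output folder was empty before tiling (so numbering starts at 0) and that the number of tile columns is known; `row_col_name` is a hypothetical helper:

```python
def row_col_name(index, n_cols):
    """File name expected by the merge script for the tile with this flat index."""
    row, col = divmod(index, n_cols)   # tiles are numbered row by row
    return f"crop_{row}_{col}.tif"


# With 15 tile columns, tile 17 is the third tile of the second row
print(row_col_name(17, n_cols=15))
# crop_1_2.tif
```

Renaming each `crop_{index}.tif` to the name returned here (for example with `os.rename`) lets the merge function find the tiles.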

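References:
[1]. [Python notes] Merging and tiling remote sensing images (tif)
[2]. Merging multiple files into one tif with Python
[3]. Python remote sensing development: batch merging of TIF data. Remote sensing data is usually analyzed by month, year, or season, so downloaded 8-day or 16-day data has to be merged as needed, typically by mean or by maximum. That post starts from monthly data and merges it by season (spring, summer, autumn, winter).
[4]. Python remote sensing development: batch mosaicking and splitting. 1. Batch mosaicking of remote sensing images without overlap. 2. Batch mosaicking of remote sensing images with overlap.
[5]. Mosaicking (merging) .tif remote sensing images with Python. The code merges multiple .tif files into one; its one shortcoming is that it does not yet support batch processing.
[6]. [Python] Mosaicking tif images and extracting a target area with gdal.WarpOptions. The images are mosaicked with the gdal.Warp function.
[7]. Restoring a large image from cut tiles
[8]. Basic remote sensing image operations with Python gdal (read/write, adding bands, adding a coordinate system)
[9]. Image mosaicking in Python. 1. Read the input images. 2. Detect and match image features. 3. Compute the homography matrix. 4. Warp and stitch the images.
[10]. Stitching and mosaicking of remote sensing images
[11]. Extracting pixel values from one raster based on overlay with another raster using GDAL in Python
[12]. Raster mosaicking with GDAL + Python. 1. A custom method built on the mosaicking principle. The author's two images have fixed positions, so other cases are not handled; readers can use it as a reference, since understanding the idea matters most. 2. Method 2 uses the gdal.Warp() interface for mosaicking.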
