1. Preface

Remote sensing image segmentation is, at its core, ordinary image segmentation; here I describe how to build a dataset with ArcGIS Pro. For data acquisition, see the reference articles; I won't repeat that here. [Datasets] "26 recommended remote sensing semantic segmentation datasets": many of the datasets listed there no longer seem to be available, so check for yourself if you need them. From what I have seen, datasets such as LoveDA also use mask label images as their annotation format.
LoveDA: a remote sensing land-cover dataset for domain-adaptive semantic segmentation. 5,987 image chips (Google Earth), 7 land-cover categories, 166,768 annotations, covering 3 cities in China. Classes: 1) background, 2) building, 3) road, 4) water, 5) barren, 6) forest, 7) agriculture; 0) no-data (should be ignored during use).
FloodNet Dataset: collected with a small UAV platform, a DJI Mavic Pro quadcopter. The full dataset contains 2,343 images, split into training (~60%), validation (~20%), and test (~20%) sets. Semantic segmentation classes: 1) background, 2) building flooded, 3) building non-flooded, 4) road flooded, 5) road non-flooded, 6) water, 7) tree, 8) vehicle, 9) pool, 10) grass.
iSAID dataset: iSAID is the first benchmark dataset for instance segmentation in aerial images. This large-scale, densely annotated dataset contains 655,451 object instances across 15 categories in 2,806 high-resolution images. Its notable characteristics: (a) a large number of images with high spatial resolution; (b) 15 important and common categories; (c) a large number of instances per category; (d) a large number of labeled instances per image, which can help with learning contextual information; (e) huge object-scale variation, with small, medium, and large objects often in the same image; (f) an imbalanced and uneven distribution of objects with varying orientations, reflecting real-life aerial conditions; (g) several small objects with ambiguous appearance that can only be resolved through contextual reasoning; (h) precise instance-level annotation by professional annotators, cross-checked and validated by experts following well-defined guidelines. Classes: 1) ship, 2) storage tank, 3) baseball diamond, 4) tennis court, 5) basketball court, 6) ground track field, 7) bridge, 8) large vehicle, 9) small vehicle, 10) helicopter, 11) swimming pool, 12) roundabout, 13) soccer ball field, 14) plane, 15) harbor.
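Since datasets like LoveDA store labels as single-band mask images whose pixel values are the class IDs, a quick way to inspect one is to read it and count the unique values. A minimal sketch; the file name is hypothetical:

```python
import cv2
import numpy as np

# Hypothetical LoveDA-style mask: single band, pixel value = class ID (0 = ignore)
mask = cv2.imread("loveda_mask_example.png", cv2.IMREAD_GRAYSCALE)

# Count pixels per class to see which land-cover types the chip contains
values, counts = np.unique(mask, return_counts=True)
for v, c in zip(values, counts):
    print(f"class {v}: {c} pixels")
```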
2. Creating the Mask Annotation

The annotation can be done in ArcMap or ArcGIS Pro; the main steps are listed below. The reference articles do the same thing: they also use ArcGIS Pro, just without its built-in deep learning tools.
【1】. 遥感图像语义分割数据集制作(使用ArcGIS Pro): this article explains it very clearly. Data sources it lists: 1. ESA Sentinel data; 2. Google Earth Engine (GEE); 3. Geospatial Data Cloud (地理空间数据云); 4. NOAA satellite data; 5. Jilin-1 (吉林一号) high-resolution imagery.
2.1. Load the remote sensing imagery: skipped here, this is a basic ArcGIS Pro operation.
2.2. Create a new shp file: also skipped, another basic ArcGIS Pro operation.
2.3. Annotate the target features: select the newly created shp layer, open the Edit tab, click Create, choose Polygon, and start digitizing the regions you want to extract.
2.4. Set the attribute values: select the shp layer, right-click to open the attribute table, select the id field (create one if it doesn't exist), and use the Field Calculator to set that field to 255 (or another value) for all features. The value 255 is later used as the pixel value of the target feature when creating the label file. Alternatively, create a dedicated value field at the start to hold the target pixel value.
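The same assignment can be scripted with arcpy instead of clicking through the Field Calculator. A minimal sketch, assuming a hypothetical shapefile path and a field named id (added first if missing):

```python
import arcpy

shp = r"D:\data\targets.shp"  # hypothetical path to the annotation shapefile

# Add an integer field for the label value if it does not exist yet
if "id" not in [f.name for f in arcpy.ListFields(shp)]:
    arcpy.management.AddField(shp, "id", "SHORT")

# Set every feature's id to 255, the pixel value the target will get in the mask
arcpy.management.CalculateField(shp, "id", "255", "PYTHON3")
```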
2.5. Convert the vector data to raster: in the Geoprocessing toolbox, search for the "Polygon to Raster" tool and use it to convert the shp file to raster.
In this tool:

- Input Features: the shapefile layer created at the beginning;
- Value field: the field that holds the pixel value you assigned;
- Output Raster Dataset: a path of your choosing;
- Cell assignment type: Cell center;
- Cellsize: the image being annotated.
Then open the Environments settings: the Cell Size and Snap Raster must both match the reference image, otherwise the exported raster layer will not line up with the original image's pixel grid; the Processing Extent must also be consistent with the image. A scripted sketch of this step follows.
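Here is a hedged arcpy version of the same conversion, with the environment tied to the reference image; the paths and the field name id are hypothetical:

```python
import arcpy

shp = r"D:\data\targets.shp"       # hypothetical annotation shapefile
image = r"D:\data\ortho.tif"       # hypothetical reference image
out_raster = r"D:\data\label.tif"  # hypothetical output label raster

# Match cell size, snapping, and extent to the reference image so the
# label raster gets exactly the same pixel grid
arcpy.env.cellSize = image
arcpy.env.snapRaster = image
arcpy.env.extent = image

arcpy.conversion.PolygonToRaster(shp, "id", out_raster, "CELL_CENTER", cellsize=image)
```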
Notes: (1) At first I could not get the conversion to work at all, and the cause turned out to be my directory; it is best to work in a dedicated, separate directory.
(2) The raster generated from the vector data must match the original image pixel-for-pixel; if the pixels don't match, how is the model supposed to recognize anything? If you work directly from the generated orthomosaic, the resulting raster may not align with the original image, and then there is no way to tile them consistently. The best approach is to first cut the orthomosaic into square tiles, then annotate, and then convert the vectors to raster.
2.6. Clipping: keep the label and the remote sensing image on identical row/column grids. Converting the polygons to raster can produce an output whose pixel grid differs from the original tiles; in that case the image to be trained on and the rasterized annotation must be intersected. I tried for a long time without finding a clean solution: the rasterized output had the same cell size as the training image, but a different extent, i.e. different row/column numbers.
[Solution] Create a rectangle and re-clip both the annotation layer and the image layer to it; this guarantees that the two have identical pixels. A GDAL-based sketch of such a clip is below.
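A minimal sketch of clipping both rasters to the same window with GDAL; the window coordinates and file names are hypothetical:

```python
from osgeo import gdal

# Hypothetical clipping window in georeferenced coordinates:
# upper-left x, upper-left y, lower-right x, lower-right y
ulx, uly, lrx, lry = 500000.0, 3300000.0, 505120.0, 3294880.0

# Clip image and label with the same window so both end up with
# identical extents and row/column counts
gdal.Translate("image_clip.tif", "ortho.tif", projWin=[ulx, uly, lrx, lry])
gdal.Translate("label_clip.tif", "label.tif", projWin=[ulx, uly, lrx, lry])
```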
2.7. Building the dataset: because remote sensing images are usually large and irregular in width and height, while deep learning models need fixed, regular tile sizes (e.g. 256×256 or 512×512), the data has to be processed further to meet the network's training requirements.
```python
import os
import numpy as np
from osgeo import gdal


def read_image(image_path, num_bands=None, selected_bands=None):
    """Read a (multi-band) image into a numpy array of shape (bands, H, W)."""
    dataset = gdal.Open(image_path)
    if dataset is None:
        print(f"Could not open image: {image_path}")
        return None
    if num_bands is None:
        num_bands = dataset.RasterCount
    image_data = []
    for i in range(1, num_bands + 1):
        if selected_bands is not None and i not in selected_bands:
            continue
        band = dataset.GetRasterBand(i)
        image_data.append(band.ReadAsArray())
    return np.array(image_data)


def sliding_crop(image, window_size=(512, 512), stride=256):
    """Pad the image to a multiple of the window size, then crop with a sliding window."""
    height, width = image.shape[1], image.shape[2]
    pad_height = 0
    pad_width = 0
    if height % window_size[0] != 0:
        pad_height = window_size[0] - (height % window_size[0])
    if width % window_size[1] != 0:
        pad_width = window_size[1] - (width % window_size[1])
    padded_image = np.pad(image, ((0, 0), (0, pad_height), (0, pad_width)),
                          mode='constant', constant_values=0)
    crops = []
    for y in range(0, height + pad_height - window_size[0] + 1, stride):
        for x in range(0, width + pad_width - window_size[1] + 1, stride):
            crops.append(padded_image[:, y:y + window_size[0], x:x + window_size[1]])
    return crops


def save_crops(crops, output_dir):
    """Save crops as crop_<n>.tif, continuing the numbering of any existing files."""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    existing_indices = set()
    for filename in os.listdir(output_dir):
        if filename.startswith("crop_") and filename.endswith(".tif"):
            index_str = filename.split("_")[1].split(".")[0]
            existing_indices.add(int(index_str))
    start_index = max(existing_indices) + 1 if existing_indices else 0
    for i, crop in enumerate(crops):
        save_image(crop, os.path.join(output_dir, f"crop_{start_index + i}.tif"))


def save_image(image_data, output_path):
    """Write a (bands, H, W) array to a GeoTIFF (byte type)."""
    num_bands, height, width = image_data.shape
    driver = gdal.GetDriverByName("GTiff")
    dataset = driver.Create(output_path, width, height, num_bands, gdal.GDT_Byte)
    for i in range(num_bands):
        dataset.GetRasterBand(i + 1).WriteArray(image_data[i])
    dataset.FlushCache()


if __name__ == "__main__":
    image_path = r"xxxx.tif"
    label_path = r"xxxx.tif"
    images = read_image(image_path)
    labels = read_image(label_path, 1)
    if images.shape[1:] == labels.shape[1:]:
        print("Images have same dimensions. Starting cropping...")
        images_crops = sliding_crop(images, window_size=(512, 512), stride=256)
        labels_crops = sliding_crop(labels, window_size=(512, 512), stride=256)
        save_crops(images_crops, r"H:\Images")
        save_crops(labels_crops, r"H:\Labels")
        print("Cropping done and crops saved!")
    else:
        print("Images have different dimensions. Cannot proceed with cropping.")
```
Reference articles:
【1】. 制作属于自己的遥感影像语义分割数据集: this one also provides some code. 1. Use ArcGIS or QGIS to create a vector file and digitize the target features in the image; note that all target features must be digitized, background features you don't care about can be skipped, and the vector file must share the image's coordinate system. 2. Add class labels: create an integer field named value or classvalue in the vector's attribute table and assign values to the features to extract, starting from 1 and counting up for each feature type; the background is usually labeled 0 automatically during later sample generation. 3. Convert the vector file to raster: here Python's GDAL library is used to produce a raster file matching the image's extent and resolution.
3. Converting Masks to txt

What the method above produces is actually a mask, but training with YOLO requires converting it to txt format.

(1) Scheme 1: this scheme uses code I got from Tongyi Qianwen (通义千问), but something always felt off about it, because what came out of training didn't look right.
```python
import cv2
import os

path = r"D:\zlc\drone\图像分割\栅格\Labels"
files = os.listdir(path)

for file in files:
    name = file.split('.')[0]
    file_path = os.path.join(path, name + '.tif')
    img = cv2.imread(file_path)
    H, W = img.shape[0:2]
    print(H, W)
    # Binarize the mask with Otsu's threshold, then extract contours
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, bin_img = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cnt, hit = cv2.findContours(bin_img, cv2.RETR_TREE, cv2.CHAIN_APPROX_TC89_KCOS)
    cnt = list(cnt)
    # "w" instead of the original "a+": appending would duplicate labels on reruns
    f = open(r"D:\zlc\drone\图像分割\栅格\yolo-labels" + "/{}.txt".format(name), "w")
    for j in cnt:
        result = []
        pre = j[0]
        for i in j:
            # Keep a point only if it moved more than 1 px from the last kept point
            if abs(i[0][0] - pre[0][0]) > 1 or abs(i[0][1] - pre[0][1]) > 1:
                pre = i
                temp = [float(i[0][0]) / W, float(i[0][1]) / H]  # normalized x, y
                result.append(temp)
        print(result)
        print(len(result))
        if len(result) != 0:
            f.write("0 ")  # class id 0, one line per contour
            for line in result:
                # [1:-1] strips the brackets; the original [1:-2] also cut off
                # the last digit of the y coordinate
                line = str(line)[1:-1].replace(",", "")
                f.write(line + " ")
            f.write("\n")
    f.close()
```
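To check what this converter actually produced, it helps to read a generated txt back, denormalize the points, and draw the polygons over the tile. A small sketch; the tile and label file names are hypothetical:

```python
import cv2
import numpy as np

img = cv2.imread(r"D:\zlc\drone\图像分割\栅格\Images\crop_0.tif")  # hypothetical tile
H, W = img.shape[:2]

with open(r"D:\zlc\drone\图像分割\栅格\yolo-labels\crop_0.txt") as f:
    for line in f:
        parts = line.split()
        # parts[0] is the class id, the rest are normalized x y pairs
        pts = np.array(parts[1:], dtype=float).reshape(-1, 2)
        pts[:, 0] *= W
        pts[:, 1] *= H
        cv2.polylines(img, [pts.astype(np.int32)], True, (0, 0, 255), 2)

cv2.imwrite("check.png", img)
```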
(2) Scheme 2: this scheme uses YOLO's built-in utility: convert_segment_masks_to_yolo_seg("path/to/masks_directory", "path/to/output/directory", classes=80)
```python
from ultralytics.utils import LOGGER
from pathlib import Path
import cv2
import numpy as np


def convert_segment_masks_to_yolo_seg(masks_dir, output_dir, classes):
    """
    Converts a dataset of segmentation mask images to the YOLO segmentation format.

    This function takes the directory containing the binary format mask images and converts them into YOLO
    segmentation format. The converted masks are saved in the specified output directory.

    Args:
        masks_dir (str): The path to the directory where all mask images (png, jpg) are stored.
        output_dir (str): The path to the directory where the converted YOLO segmentation masks will be stored.
        classes (int): Total classes in the dataset, i.e. for COCO classes=80

    Example:
        from ultralytics.data.converter import convert_segment_masks_to_yolo_seg

        # The classes here is the total classes in the dataset, for COCO dataset we have 80 classes
        convert_segment_masks_to_yolo_seg("path/to/masks_directory", "path/to/output/directory", classes=80)

    Notes:
        The expected directory structure for the masks is:

            - masks
                ├─ mask_image_01.png or mask_image_01.jpg
                ├─ mask_image_02.png or mask_image_02.jpg
                ├─ mask_image_03.png or mask_image_03.jpg
                └─ mask_image_04.png or mask_image_04.jpg

        After execution, the labels will be organized in the following structure:

            - output_dir
                ├─ mask_yolo_01.txt
                ├─ mask_yolo_02.txt
                ├─ mask_yolo_03.txt
                └─ mask_yolo_04.txt
    """
    # Map pixel value i+1 to class index i (pixel value 0 is background)
    pixel_to_class_mapping = {i + 1: i for i in range(classes)}
    for mask_path in Path(masks_dir).iterdir():
        if mask_path.suffix == ".png":
            mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)  # Read the mask image in grayscale
            img_height, img_width = mask.shape  # Get image dimensions
            LOGGER.info(f"Processing {mask_path} imgsz = {img_height} x {img_width}")

            unique_values = np.unique(mask)  # Get unique pixel values representing different classes
            yolo_format_data = []

            for value in unique_values:
                if value == 0:
                    continue  # Skip background
                class_index = pixel_to_class_mapping.get(value, -1)
                if class_index == -1:
                    LOGGER.warning(f"Unknown class for pixel value {value} in file {mask_path}, skipping.")
                    continue

                # Create a binary mask for the current class and find contours
                contours, _ = cv2.findContours(
                    (mask == value).astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
                )  # Find contours

                for contour in contours:
                    if len(contour) >= 3:  # YOLO requires at least 3 points for a valid segmentation
                        contour = contour.squeeze()  # Remove single-dimensional entries
                        yolo_format = [class_index]
                        for point in contour:
                            # Normalize the coordinates
                            yolo_format.append(round(point[0] / img_width, 6))  # Rounding to 6 decimal places
                            yolo_format.append(round(point[1] / img_height, 6))
                        yolo_format_data.append(yolo_format)
            # Save Ultralytics YOLO format data to file
            output_path = Path(output_dir) / f"{mask_path.stem}.txt"
            with open(output_path, "w") as file:
                for item in yolo_format_data:
                    line = " ".join(map(str, item))
                    file.write(line + "\n")
            LOGGER.info(f"Processed and stored at {output_path} imgsz = {img_height} x {img_width}")


if __name__ == "__main__":
    # The classes here is the total classes in the dataset.
    # For the COCO dataset we have 80 classes.
    convert_segment_masks_to_yolo_seg(
        masks_dir=r"D:/zlc/drone/drone-train/data/land/tiles/png",
        output_dir=r"D:/zlc/drone/drone-train/data/land/tiles/labels",
        classes=2,
    )
```
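As reference [9] below points out, the mask pixel values must match the class numbering: with classes=2 the pixels have to be 1 and 2, not the usual 255, otherwise the tool warns and produces empty output. If your masks were burned with 255 (as in section 2.4 above), remap them first; a minimal sketch:

```python
import cv2
from pathlib import Path

masks_dir = Path(r"D:/zlc/drone/drone-train/data/land/tiles/png")  # same masks as above

for p in masks_dir.glob("*.png"):
    mask = cv2.imread(str(p), cv2.IMREAD_GRAYSCALE)
    mask[mask == 255] = 1  # remap the burn-in value 255 to class value 1
    cv2.imwrite(str(p), mask)
```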
Reference articles:

【1】. [将mask的图片标签转换为yolo的txt标签](https://blog.csdn.net/qq_41701723/article/details/135449035): the code in this article works; it converts mask image labels into YOLO txt labels by extracting the outer contours.

【2】. [将注记从Mask-RCNN数据集格式转换为COCO格式](https://cloud.tencent.com/developer/information/%E5%B0%86%E6%B3%A8%E8%AE%B0%E4%BB%8EMask-RCNN%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%BD%AC%E6%8D%A2%E4%B8%BACOCO%E6%A0%BC%E5%BC%8F): 1. The main difference between the Mask R-CNN dataset format and the COCO format is how the annotations are organized. Mask R-CNN usually contains: images/ (image files) and annotations/ (JSON files with bounding boxes and masks). 2. The COCO format contains: images/ (image files) and annotations/ (JSON files with bounding boxes, masks, and category information).

【3】. [Mask标注转YOLO格式txt文件](https://blog.csdn.net/my_wx/article/details/140798592)

【4】. [mask-to-annotation](https://github.com/matthewkenely/mask-to-annotation): a powerful and efficient tool for automatically generating annotations in popular computer vision formats such as COCO, YOLO, and VGG from binary and colored masks.

【5】. [Binary mask images to instead yolo annotation format](https://github.com/ultralytics/ultralytics/issues/3085): the YOLOv8 repo offers a useful script, create_masks.py, which can be used to generate YOLO annotations from binary masks.

【6】. [常见数据集格式转换](https://www.unnamedtat.xyz/posts/10be164/): 1. Convert masks to JSON (adapt it yourself for COCO). 2. Visualize the JSON. 3. Convert the JSON to YOLO format.

【7】. [简单实用](https://docs.ultralytics.com/zh/usage/simple-utilities/): the code here is the simplest of all; just use the ultralytics built-in utility convert_segment_masks_to_yolo_seg.

【8】. [Yolov8-seg:制作并训练自己的数据集+提取并重建mask](https://blog.csdn.net/XY_39/article/details/136673733): uses SAM for semi-automatic annotation.

【9】. [掩码mask图像标注转yolo格式](https://blog.csdn.net/Y5823990/article/details/143917406): classes is the number of classes in your data. Note that the class numbers must correspond to the mask values: with classes=2, the mask value of the first class must be 1 and of the second class 2. It cannot be the usual 255, which triggers a warning and produces empty output.

【10】. [ultralytics.data.converter.convert_segment_masks_to_yolo_seg](https://docs.ultralytics.com/reference/data/converter/#ultralytics.data.converter.convert_coco): the official documentation.

4. Splitting the Imagery

Image segmentation models generally expect small inputs, but remote sensing images are extremely large, so they have to be cut into smaller tiles, segmented independently, and then merged back together. The article [Python|遥感影像语义分割:使用Python(GDAL)制作遥感影像语义分割数据集](https://www.cnblogs.com/tangjielin/p/18288301) provides the following method.

```python
import os
from osgeo import gdal
import numpy as np


# Read a GeoTIFF dataset
def readTif(fileName):
    dataset = gdal.Open(fileName)
    if dataset is None:
        print(fileName + " could not be opened")
    return dataset


# Save an array as a GeoTIFF file
def writeTiff(im_data, im_geotrans, im_proj, path):
    if 'int8' in im_data.dtype.name:
        datatype = gdal.GDT_Byte
    elif 'int16' in im_data.dtype.name:
        datatype = gdal.GDT_UInt16
    else:
        datatype = gdal.GDT_Float32
    if len(im_data.shape) == 3:
        im_bands, im_height, im_width = im_data.shape
    elif len(im_data.shape) == 2:
        im_data = np.array([im_data])
        im_bands, im_height, im_width = im_data.shape
    # Create the output file
    driver = gdal.GetDriverByName("GTiff")
    dataset = driver.Create(path, int(im_width), int(im_height), int(im_bands), datatype)
    if dataset is not None:
        dataset.SetGeoTransform(im_geotrans)  # write the affine transform parameters
        dataset.SetProjection(im_proj)        # write the projection
    for i in range(im_bands):
        dataset.GetRasterBand(i + 1).WriteArray(im_data[i])
    del dataset


def TifCrop(TifPath, SavePath, CropSize, RepetitionRate):
    """
    Sliding-window cropping.
    TifPath          input image path
    SavePath         output directory for the crops
    CropSize         crop size in pixels
    RepetitionRate   overlap ratio between adjacent crops
    """
    dataset_img = readTif(TifPath)
    width = dataset_img.RasterXSize
    height = dataset_img.RasterYSize
    proj = dataset_img.GetProjection()
    geotrans = dataset_img.GetGeoTransform()
    img = dataset_img.ReadAsArray(0, 0, width, height)  # read the pixel data

    # Count existing files so new crops continue the numbering
    new_name = len(os.listdir(SavePath))
    # Crop with the given overlap ratio
    for i in range(int((height - CropSize * RepetitionRate) / (CropSize * (1 - RepetitionRate)))):
        for j in range(int((width - CropSize * RepetitionRate) / (CropSize * (1 - RepetitionRate)))):
            # Single-band image
            if len(img.shape) == 2:
                cropped = img[
                    int(i * CropSize * (1 - RepetitionRate)): int(i * CropSize * (1 - RepetitionRate)) + CropSize,
                    int(j * CropSize * (1 - RepetitionRate)): int(j * CropSize * (1 - RepetitionRate)) + CropSize]
            # Multi-band image
            else:
                cropped = img[:,
                    int(i * CropSize * (1 - RepetitionRate)): int(i * CropSize * (1 - RepetitionRate)) + CropSize,
                    int(j * CropSize * (1 - RepetitionRate)): int(j * CropSize * (1 - RepetitionRate)) + CropSize]
            writeTiff(cropped, geotrans, proj, SavePath + "/%d.tif" % new_name)
            new_name = new_name + 1
    # Crop the last column
    for i in range(int((height - CropSize * RepetitionRate) / (CropSize * (1 - RepetitionRate)))):
        if len(img.shape) == 2:
            cropped = img[
                int(i * CropSize * (1 - RepetitionRate)): int(i * CropSize * (1 - RepetitionRate)) + CropSize,
                (width - CropSize): width]
        else:
            cropped = img[:,
                int(i * CropSize * (1 - RepetitionRate)): int(i * CropSize * (1 - RepetitionRate)) + CropSize,
                (width - CropSize): width]
        writeTiff(cropped, geotrans, proj, SavePath + "/%d.tif" % new_name)
        new_name = new_name + 1
    # Crop the last row
    for j in range(int((width - CropSize * RepetitionRate) / (CropSize * (1 - RepetitionRate)))):
        if len(img.shape) == 2:
            cropped = img[
                (height - CropSize): height,
                int(j * CropSize * (1 - RepetitionRate)): int(j * CropSize * (1 - RepetitionRate)) + CropSize]
        else:
            cropped = img[:,
                (height - CropSize): height,
                int(j * CropSize * (1 - RepetitionRate)): int(j * CropSize * (1 - RepetitionRate)) + CropSize]
        writeTiff(cropped, geotrans, proj, SavePath + "/%d.tif" % new_name)
        new_name = new_name + 1
    # Crop the bottom-right corner
    if len(img.shape) == 2:
        cropped = img[(height - CropSize): height, (width - CropSize): width]
    else:
        cropped = img[:, (height - CropSize): height, (width - CropSize): width]
    writeTiff(cropped, geotrans, proj, SavePath + "/%d.tif" % new_name)
    new_name = new_name + 1


# Both the training and validation sets need to be cropped.
# Crop the image into a 512x512 dataset with an overlap ratio of 0.5.
if __name__ == '__main__':
    TifCrop(r"D:\Dataset_Authoring\EnShi.tif",
            r"D:\Dataset_Authoring\EnShi_data", 512, 0.5)
```
5. Merging

After segmentation, the tiles need to be merged back into one large remote sensing image to view the full result. I searched a lot of material and in the end solved the code problem with Alibaba Cloud's Tongyi Qianwen. (Note that the script below expects tiles named crop_<row>_<col>.tif, so the row/column indices must be encoded in the file names.)
```python
import os
import re
import numpy as np
import cv2


def merge_images_with_overlap(folder_path, row_overlap_width, col_overlap_height):
    """Merge tiles named crop_<row>_<col>.tif into one image, blending overlaps 50/50."""
    pattern = re.compile(r'crop_(\d+)_(\d+)\.tif$')
    images = {}
    missing_images = set()

    for filename in os.listdir(folder_path):
        match = pattern.match(filename)
        if match:
            try:
                row, col = map(int, match.groups())
                images[(row, col)] = os.path.join(folder_path, filename)
            except ValueError as e:
                print(f"Error parsing row/column index: {e}, file name: {filename}")
                continue

    if not images:
        print("No matching image files found")
        return None

    rows = {key[0] for key in images.keys()}
    cols = {key[1] for key in images.keys()}
    max_row = max(rows) if rows else -1
    max_col = max(cols) if cols else -1

    for row in range(max_row + 1):
        for col in range(max_col + 1):
            if (row, col) not in images:
                missing_images.add((row, col))

    if missing_images:
        print("Warning: the following image files are missing:")
        for missing in missing_images:
            print(f"row {missing[0]} col {missing[1]}")

    sample_image_path = list(images.values())[0]
    try:
        sample_image = cv2.imread(sample_image_path, cv2.IMREAD_COLOR)
        image_height, image_width, _ = sample_image.shape
    except Exception as e:
        print(f"Could not open sample image: {sample_image_path}, error: {e}")
        return None

    result_width = (max_col + 1) * image_width - max_col * col_overlap_height
    result_height = (max_row + 1) * image_height - max_row * row_overlap_width
    final_result = np.zeros((result_height, result_width, 3), dtype=np.uint8)

    for row in range(max_row + 1):
        for col in range(max_col + 1):
            path = images.get((row, col))
            if not path:
                print(f"Skipping missing image at row {row} col {col}")
                continue
            try:
                img = cv2.imread(path, cv2.IMREAD_COLOR)
                left = col * (image_width - col_overlap_height)
                upper = row * (image_height - row_overlap_width)
                right = left + image_width
                lower = upper + image_height

                # Blend the horizontal overlap with the tile to the left
                if col > 0 and (row, col - 1) in images:
                    prev_img_path = images.get((row, col - 1))
                    if prev_img_path:
                        prev_img = cv2.imread(prev_img_path, cv2.IMREAD_COLOR)
                        blended_part = cv2.addWeighted(
                            prev_img[:, -(col_overlap_height):, :], 0.5,
                            img[:, :col_overlap_height, :], 0.5, 0)
                        final_result[upper:lower, left:left + col_overlap_height, :] = blended_part

                # Blend the vertical overlap with the tile above
                if row > 0 and (row - 1, col) in images:
                    prev_img_path = images.get((row - 1, col))
                    if prev_img_path:
                        prev_img = cv2.imread(prev_img_path, cv2.IMREAD_COLOR)
                        blended_part = cv2.addWeighted(
                            prev_img[-row_overlap_width:, :, :], 0.5,
                            img[:row_overlap_width, :, :], 0.5, 0)
                        final_result[upper:upper + row_overlap_width, left:right, :] = blended_part

                # Copy the non-overlapping part of the tile directly
                non_overlap_box = (
                    left + (col_overlap_height if col > 0 else 0),
                    upper + (row_overlap_width if row > 0 else 0),
                    right,
                    lower
                )
                final_result[non_overlap_box[1]:non_overlap_box[3], non_overlap_box[0]:non_overlap_box[2], :] = \
                    img[non_overlap_box[1] - upper:non_overlap_box[3] - upper,
                        non_overlap_box[0] - left:non_overlap_box[2] - left, :]
            except Exception as e:
                print(f"Error while processing image: {path}, error: {e}")

    output_path = os.path.join(folder_path, 'merged_image.tif')
    try:
        cv2.imwrite(output_path, final_result)
        print(f"Merged image saved to: {output_path}")
    except Exception as e:
        print(f"Error while saving the result image: {e}")


folder_path = r'D:\zlc\drone\图像分割\栅格\Images'
row_overlap_width = 256
col_overlap_height = 256
merge_images_with_overlap(folder_path, row_overlap_width, col_overlap_height)
```
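Note that this OpenCV-based merge discards the georeferencing. If the tiles were written as GeoTIFFs with their geotransforms intact (as the GDAL cropping script in section 4 does), an alternative worth considering is letting GDAL mosaic them directly by coordinates instead of by file name; a hedged sketch:

```python
import glob
from osgeo import gdal

tiles = glob.glob(r"D:\zlc\drone\图像分割\栅格\Images\*.tif")  # hypothetical tile folder

# gdal.Warp mosaics the inputs using their embedded georeferencing,
# so overlapping tiles are placed by map coordinates
gdal.Warp("merged_geo.tif", tiles)
```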