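Perspective Transformation

Perspective transformation is a more general geometric transformation than affine transformation. It can model changes of viewpoint in 3D space and is commonly used to correct perspective distortion in images.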

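Algorithm Principle

A perspective transformation (also called a projective transformation or homography) maps an image from one view plane onto another. It can handle the perspective distortion caused by different camera viewpoints, such as the trapezoidal (keystone) distortion that appears when a rectangular object is photographed at an angle.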

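flowchart LR
    A[Original image] --> B["Select 4 point correspondences<br/>no three points collinear"]
    B --> C["Compute the 3×3 perspective matrix<br/>h11 h12 h13<br/>h21 h22 h23<br/>h31 h32 h33"]
    C --> D["Apply the homogeneous coordinate transform<br/>x' = (h11x+h12y+h13)/(h31x+h32y+h33)<br/>y' = (h21x+h22y+h23)/(h31x+h32y+h33)"]
    D --> E[Output the transformed image]
    subgraph Application
        F[Deskew a tilted document]
    end
    E -.-> F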

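Property	Affine transformation	Perspective transformation
Point pairs required	3	4
Matrix size	2×3	3×3
Preserves parallel lines	Yes	No (parallel lines may converge)
Typical use	Rotation and scaling	Viewpoint correction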
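
To make the "preserves parallel lines" row concrete, the following minimal sketch maps two parallel segments through an affine matrix and through a perspective matrix, then compares the resulting direction vectors. The matrices and the helper names (`apply_h`, `direction`, `is_parallel`) are illustrative assumptions, not part of any library.

```python
def apply_h(h, point):
    """Map a 2D point through a 3x3 matrix using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def direction(h, p, q):
    """Direction vector of segment p-q after mapping both endpoints through h."""
    p2, q2 = apply_h(h, p), apply_h(h, q)
    return (q2[0] - p2[0], q2[1] - p2[1])


def is_parallel(d1, d2, tol=1e-6):
    """Two vectors are parallel when their 2D cross product is (almost) zero."""
    return abs(d1[0] * d2[1] - d1[1] * d2[0]) < tol


# Example matrices (arbitrary values chosen for illustration).
AFFINE = [[2.0, 0.5, 10.0], [0.25, 1.5, -5.0], [0.0, 0.0, 1.0]]
PERSPECTIVE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]]

# Two horizontal, hence parallel, segments.
seg_a = ((0.0, 0.0), (100.0, 0.0))
seg_b = ((0.0, 100.0), (100.0, 100.0))

affine_parallel = is_parallel(direction(AFFINE, *seg_a), direction(AFFINE, *seg_b))
perspective_parallel = is_parallel(direction(PERSPECTIVE, *seg_a),
                                   direction(PERSPECTIVE, *seg_b))
print("affine keeps them parallel:", affine_parallel)
print("perspective keeps them parallel:", perspective_parallel)
```

The only structural difference between the two matrices is the last row: an affine matrix always has (0, 0, 1) there, so w stays 1 and no division takes place.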

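A perspective transformation is described by a 3×3 projection matrix:

[u]   [h11 h12 h13] [x]
[v] = [h21 h22 h23] [y]
[w]   [h31 h32 h33] [1]

The final coordinates are (x', y') = (u/w, v/w). The matrix has 9 entries, but because homogeneous coordinates are invariant to scale, multiplying the matrix by any nonzero constant yields the same mapping. This leaves 8 degrees of freedom.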
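
As a small illustration of the matrix form and of the scale invariance behind the 8 degrees of freedom, the sketch below applies a 3×3 matrix to a point and then applies the same matrix multiplied by 2. The function name `project_point`, the intermediate names `u`, `v`, `w`, and the matrix values are assumptions made for this example.

```python
def project_point(h, x, y):
    """Apply a 3x3 projective matrix to (x, y) and divide by the third coordinate."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ZeroDivisionError("the point maps to infinity (w is zero)")
    return u / w, v / w


# Arbitrary example matrix; the nonzero h32 entry makes it a true perspective map.
H = [[1.0, 0.5, 10.0],
     [0.0, 2.0, 20.0],
     [0.0, 0.002, 1.0]]

# The same matrix with every entry doubled.
H_SCALED = [[2.0 * value for value in row] for row in H]

p = project_point(H, 100.0, 50.0)
p_scaled = project_point(H_SCALED, 100.0, 50.0)
print(p, p_scaled)  # both matrices map the point to the same location
```

Because the common factor cancels in u/w and v/w, one entry can be fixed (h33 = 1 is a common choice), and only 8 unknowns remain.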

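Algorithm Steps

Choose 4 pairs of corresponding points before and after the transformation (no three of them collinear)
Compute the perspective transformation matrix from the point pairs
For each pixel of the output image, apply the transformation formula (in practice its inverse) to find the matching source location
Use interpolation to compute the value of each transformed pixel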
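
The matrix computation in the second step can be sketched as a linear system: each point pair contributes two equations, so 4 pairs give 8 equations for the 8 unknowns. The sketch below assumes the normalization h33 = 1 (it fails for the rare mappings where the true h33 is 0) and uses NumPy; `solve_homography` and `warp_points` are hypothetical helper names, and this is not presented as how `cv2.getPerspectiveTransform` is implemented.

```python
import numpy as np


def solve_homography(src, dst):
    """Solve for the 3x3 matrix H (h33 fixed to 1) that maps src[i] to dst[i]."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # From x' = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), cleared of the fraction
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        # From y' = (h21 x + h22 y + h23) / (h31 x + h32 y + 1)
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        rhs.extend([xp, yp])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_points(h, pts):
    """Apply H to an array of (x, y) points and divide by the third coordinate."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]


src = [(100, 100), (300, 100), (300, 300), (100, 300)]
dst = [(50, 50), (350, 100), (300, 350), (80, 300)]

H = solve_homography(src, dst)
mapped = warp_points(H, src)
print(np.round(mapped, 3))  # should reproduce dst up to rounding error
```

If three of the four source points were collinear, the 8×8 system would be singular and `np.linalg.solve` would raise `LinAlgError`, which is why the first step requires that no three points lie on a line.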

Python Implementation

Python

import cv2
import numpy as np

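def perspective_transform(image_path, src_points, dst_points):
    """
    Apply a perspective transformation defined by 4 point correspondences.
    :param image_path: path to the input image
    :param src_points: 4 points in the source image [(x1,y1), (x2,y2), (x3,y3), (x4,y4)]
    :param dst_points: 4 matching points in the output image [(x1',y1'), (x2',y2'), (x3',y3'), (x4',y4')]
    :return: the original image and the transformed image
    """
    # Read the image; cv2.imread returns None when the file cannot be read
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")

    # Convert the point lists to float32 arrays, as getPerspectiveTransform expects
    src_points = np.float32(src_points)
    dst_points = np.float32(dst_points)

    # Compute the 3x3 perspective transformation matrix
    matrix = cv2.getPerspectiveTransform(src_points, dst_points)

    # Warp the image; the output keeps the input size, given as (width, height)
    perspective_img = cv2.warpPerspective(img, matrix, (img.shape[1], img.shape[0]))

    return img, perspective_img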

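def manual_perspective_transform(image_path, matrix):
    """
    Apply a perspective transformation manually, without cv2.warpPerspective.
    :param image_path: path to the input image
    :param matrix: 3x3 perspective matrix mapping source coordinates to output coordinates
    :return: the original image and the transformed image
    """
    # Read the image; cv2.imread returns None when the file cannot be read
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")
    height, width = img.shape[:2]

    # Backward mapping goes from output pixels to source pixels,
    # so it needs the inverse of the forward matrix
    inv = np.linalg.inv(np.asarray(matrix, dtype=np.float64))

    # Create the output image (pixels that map outside the source stay black)
    transformed = np.zeros_like(img)

    # Map every output pixel back into the source image
    for i in range(height):
        for j in range(width):
            # Coordinates of the current output pixel
            x, y = j, i

            # Apply the inverse transformation in homogeneous coordinates:
            # [u]         [x]
            # [v] = inv * [y]
            # [w]         [1]
            w = inv[2, 0] * x + inv[2, 1] * y + inv[2, 2]
            if abs(w) < 1e-10:  # avoid division by zero
                continue

            orig_x = (inv[0, 0] * x + inv[0, 1] * y + inv[0, 2]) / w
            orig_y = (inv[1, 0] * x + inv[1, 1] * y + inv[1, 2]) / w

            # Skip locations that fall outside the source image
            if 0 <= orig_x < width and 0 <= orig_y < height:
                # Top-left neighbor of the sampling location
                orig_x_int, orig_y_int = int(orig_x), int(orig_y)

                if orig_x_int < width - 1 and orig_y_int < height - 1:
                    # Bilinear interpolation over the 4 surrounding pixels
                    dx, dy = orig_x - orig_x_int, orig_y - orig_y_int
                    transformed[i, j] = (
                        img[orig_y_int, orig_x_int] * (1 - dx) * (1 - dy) +
                        img[orig_y_int, orig_x_int + 1] * dx * (1 - dy) +
                        img[orig_y_int + 1, orig_x_int] * (1 - dx) * dy +
                        img[orig_y_int + 1, orig_x_int + 1] * dx * dy
                    ).astype(np.uint8)
                else:
                    # On the last row or column, fall back to nearest-neighbor sampling
                    transformed[i, j] = img[orig_y_int, orig_x_int]

    return img, transformed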

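def birdseye_view_transform(image_path, src_quad):
    """
    Convert a quadrilateral region of the image into a bird's-eye view.
    :param image_path: path to the input image
    :param src_quad: quadrilateral vertices in the source image, ordered
                     top-left, top-right, bottom-right, bottom-left
                     [(x1,y1), (x2,y2), (x3,y3), (x4,y4)]
    :return: the original image and the bird's-eye view
    """
    # Read the image; cv2.imread returns None when the file cannot be read
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")

    # Corners of the target rectangle, in the same order as src_quad.
    # The 400x600 output size is an arbitrary choice for this example.
    rect_width = 400
    rect_height = 600
    dst_points = np.float32([
        [0, 0],
        [rect_width - 1, 0],
        [rect_width - 1, rect_height - 1],
        [0, rect_height - 1]
    ])

    src_points = np.float32(src_quad)

    # Compute the perspective transformation matrix
    matrix = cv2.getPerspectiveTransform(src_points, dst_points)

    # Apply the perspective transformation
    birdseye = cv2.warpPerspective(img, matrix, (rect_width, rect_height))

    return img, birdseye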

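def deskew_image(image_path, corners):
    """
    Correct the tilt of a document in an image.
    :param image_path: path to the input image
    :param corners: the four document corners, ordered top-left, top-right,
                    bottom-right, bottom-left [(x1,y1), (x2,y2), (x3,y3), (x4,y4)]
    :return: the original image and the corrected image
    """
    # Read the image; cv2.imread returns None when the file cannot be read
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")

    # Derive the output size from the lengths of the document edges
    (tl, tr, br, bl) = corners

    # Output width: the longer of the bottom and top edges
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))

    # Output height: the longer of the right and left edges
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))

    # Target points: an axis-aligned rectangle in the same corner order
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype="float32")

    # Compute the perspective transformation matrix
    src = np.array(corners, dtype="float32")
    matrix = cv2.getPerspectiveTransform(src, dst)

    # Apply the perspective transformation
    corrected = cv2.warpPerspective(img, matrix, (maxWidth, maxHeight))

    return img, corrected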

# Usage example
if __name__ == "__main__":
    # Note: replace 'image.jpg' with the path to a real image before uncommenting
    # Define the source and destination points
    # src_pts = [(100, 100), (300, 100), (300, 300), (100, 300)]
    # dst_pts = [(50, 50), (350, 100), (300, 350), (80, 300)]
    # img, result = perspective_transform('image.jpg', src_pts, dst_pts)
    # quad = [(100, 100), (400, 100), (400, 300), (100, 300)]
    # bird_img, bird_result = birdseye_view_transform('image.jpg', quad)
    pass
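
`deskew_image` unpacks its corners as (tl, tr, br, bl), so points supplied in any other order produce a mirrored or twisted result. The sketch below shows one common heuristic for sorting four points into that order. The helper name `order_corners` is hypothetical, and the heuristic assumes image coordinates (y grows downward) and a quadrilateral that is not rotated by close to 45 degrees.

```python
def order_corners(points):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    if len(points) != 4:
        raise ValueError("expected exactly 4 points")
    by_sum = sorted(points, key=lambda p: p[0] + p[1])
    by_diff = sorted(points, key=lambda p: p[0] - p[1])
    tl, br = by_sum[0], by_sum[-1]    # smallest and largest x + y
    bl, tr = by_diff[0], by_diff[-1]  # smallest and largest x - y
    return [tl, tr, br, bl]


# The same four points as dst_pts in the usage example, in shuffled order.
shuffled = [(300, 350), (50, 50), (80, 300), (350, 100)]
ordered = order_corners(shuffled)
print(ordered)
```

The ordered list can then be passed as the `corners` argument of `deskew_image` or as `src_quad` to `birdseye_view_transform`.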

Algorithm Visualization

Algorithm Flowchart

flowchart LR
    A[Input image] --> B[Choose 4 point pairs]
    B --> C[Compute the perspective matrix]
    C --> D[Apply the transformation]
    D --> E[Output the transformed image]

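Pros and Cons

Pros

Corrects perspective distortion and can simulate different viewpoints
Well suited to document scanning and rectification
Produces more realistic geometric results than simpler transformations
Preserves straight lines (a straight line remains straight after the transformation)

Cons

Requires 4 point correspondences to determine the matrix, one more than an affine transformation
Costs more to compute than an affine transformation, since every pixel needs a division by w
Sensitive to how accurately the corresponding points are located
The output may contain large blank areas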
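
One way to handle the blank-area and cropping problem is to transform the four image corners first and size the output canvas from their bounding box. The following is a minimal sketch under the assumption that w stays positive at all four corners; `project` and `fit_to_canvas` are hypothetical helper names, and the matrix values are arbitrary.

```python
import math


def project(h, x, y):
    """Map (x, y) through the 3x3 matrix h and divide by w."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w <= 0:
        raise ValueError("w <= 0: the point lies on or beyond the horizon line")
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def fit_to_canvas(h, width, height):
    """Return (shifted_matrix, (canvas_width, canvas_height)) so the warped image fits."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    pts = [project(h, x, y) for x, y in corners]
    min_x = min(p[0] for p in pts)
    min_y = min(p[1] for p in pts)
    max_x = max(p[0] for p in pts)
    max_y = max(p[1] for p in pts)

    # Left-multiplying h by a translation (tx, ty) adds tx * row 3 to row 1
    # and ty * row 3 to row 2; row 3 itself is unchanged.
    tx, ty = -min_x, -min_y
    shifted = [
        [h[0][c] + tx * h[2][c] for c in range(3)],
        [h[1][c] + ty * h[2][c] for c in range(3)],
        list(h[2]),
    ]
    size = (math.ceil(max_x - min_x) + 1, math.ceil(max_y - min_y) + 1)
    return shifted, size


H = [[1.0, -0.3, 0.0],
     [0.1, 1.0, -20.0],
     [0.0, 0.001, 1.0]]  # arbitrary example matrix
shifted, size = fit_to_canvas(H, 200, 100)
corners_after = [project(shifted, x, y) for x, y in [(0, 0), (199, 0), (199, 99), (0, 99)]]
print(size, corners_after)
```

The shifted matrix (converted to a NumPy float array) and `size` can then be passed to `cv2.warpPerspective` in place of the original matrix and the input size, so that no part of the warped image is cut off.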

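Applications

Document scanning and rectification
Preprocessing for license plate recognition
Map rectification
Virtual reality and augmented reality
3D reconstruction
Bird's-eye view generation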

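Algorithm Summary

Type: geometric transformation
Use: perspective correction and viewpoint change
Complexity: O(M×N), where M and N are the numbers of rows and columns of the image
Parameters: the perspective transformation matrix (8 degrees of freedom)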
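
The O(M×N) cost comes from doing a constant amount of work per output pixel. As a rough sketch of how that per-pixel work can be expressed without a Python loop, the function below performs backward mapping for all pixels at once with NumPy. It uses nearest-neighbor sampling for brevity, and `warp_nearest` is a hypothetical name rather than a library function.

```python
import numpy as np


def warp_nearest(img, h, out_shape):
    """Warp img by the 3x3 matrix h (source -> output) with nearest-neighbor sampling."""
    out_h, out_w = out_shape
    src_h, src_w = img.shape[:2]
    h_inv = np.linalg.inv(np.asarray(h, dtype=np.float64))

    # Homogeneous coordinates (x, y, 1) of every output pixel, shape (3, out_h * out_w)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)]).astype(np.float64)

    # Backward mapping: output pixel -> source location
    u, v, w = h_inv @ grid
    valid = np.abs(w) > 1e-10
    sx = np.divide(u, w, out=np.full_like(w, -1.0), where=valid)
    sy = np.divide(v, w, out=np.full_like(w, -1.0), where=valid)

    # Keep only samples whose nearest source pixel lies inside the image
    inside = valid & (sx > -0.5) & (sx < src_w - 0.5) & (sy > -0.5) & (sy < src_h - 0.5)
    out = np.zeros((out_h * out_w,) + img.shape[2:], dtype=img.dtype)
    out[inside] = img[np.rint(sy[inside]).astype(int), np.rint(sx[inside]).astype(int)]
    return out.reshape((out_h, out_w) + img.shape[2:])


# Tiny synthetic image shifted 2 pixels right and 1 pixel down.
img = np.arange(1, 21, dtype=np.uint8).reshape(4, 5)
shift = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
shifted_img = warp_nearest(img, shift, img.shape)
print(shifted_img)
```

The amount of arithmetic is unchanged, but it runs inside NumPy instead of the interpreter, which is usually much faster than nested Python loops over pixels.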
