An Automated Video Classification Method Based on Python and K-Means

CYRUS-STUDIO · Sep 20, 2024 · cc284be
1 parent c8fb2fa
commit cc284be
Showing 10 changed files with 609 additions and 40 deletions.
diff --git a/content/posts/基于Python与K-Means的自动化视频分类方法.md b/content/posts/基于Python与K-Means的自动化视频分类方法.md
@@ -0,0 +1,169 @@
++++
+title = '基于Python与K-Means的自动化视频分类方法'
+date = 2024-09-21T01:43:21.619108+08:00
+draft = false
++++
+
+> Copyright belongs to the author. If you repost this article, please credit the source: <https://cyrus-studio.github.io/blog/>
+
+# __Implementation Process__
+
+
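+1\. Feature extraction: a pretrained InceptionV3 model extracts high-dimensional visual features from a sample of frames in each video. The features of all sampled frames are averaged into one fixed-length vector that represents the video.
+
+2\. Clustering: K-Means assigns every video a cluster label that indicates which other videos it most resembles in feature space.
+
+3\. Sorting: each video is then moved into the folder that matches its cluster label, with one folder per cluster.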
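The averaging in step 1 can be illustrated with a minimal, dependency-free sketch. The function name `average_frame_features` and the toy 4-dimensional vectors are assumptions made for illustration only; the post's own code does the same thing with `np.mean(features, axis=0)` on InceptionV3 features.

```python
from statistics import fmean


def average_frame_features(frame_features):
    """Average per-frame feature vectors into one fixed-length video vector.

    Hypothetical helper for illustration; not part of the original post.
    """
    if not frame_features:
        raise ValueError("no frame features to average")
    # zip(*rows) walks the vectors one feature dimension at a time.
    return [fmean(column) for column in zip(*frame_features)]


# Three sampled frames, each described by a toy 4-dimensional feature vector.
frames = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
video_vector = average_frame_features(frames)
print(video_vector)  # [2.0, 1.0, 2.0, 2.0]
```

However many frames are sampled, the result has the same length as a single frame vector, which is what lets videos of different durations be compared by K-Means.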
+
+## __The InceptionV3 Model__
+
+
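+InceptionV3 is a deep learning model for image classification and feature extraction. It is the third version of the Inception family and was proposed by Google in 2015.
+
+It was originally built as an image classifier that assigns an image to one of 1000 categories (dog, cat, car, and so on). With the final classification layers removed, InceptionV3 can serve as a feature extractor.
+
+## __Clusters__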
+
+
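+A cluster is the core concept in clustering algorithms: a subset of mutually similar data points. The goal of clustering is to group unlabeled data points into such subsets.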
+
+## __K-Means__
+
+
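+K-Means is a widely used unsupervised clustering algorithm. Its goal is to partition the data points into K clusters so that the points in each cluster lie as close as possible to a shared center (the cluster centroid).
+
+The core idea is to iteratively refine K centroids and group the data by nearest centroid. The result is a local optimum that depends on the initial centroids, not a guaranteed global optimum.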
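The iteration described above can be sketched in plain Python as follows. This is a minimal illustration of the assign-then-update loop, not the scikit-learn implementation that the post actually uses; the function name `kmeans`, its parameters, and the toy 2-D points are all assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd-style K-Means on a list of equal-length float vectors."""
    rng = random.Random(seed)
    # Start from k data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
    return labels, centroids


# Two well-separated toy groups of 2-D points.
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random starting centroids, so only the grouping itself is meaningful, not the label numbers.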
+
+# __Source Code__
+
+
+## __1\. Install Dependencies__
+
+
+```
+pip install "moviepy<2" scikit-learn tensorflow opencv-python
+```
+
+## __2\. Implementation__
+
+
+```
+import os
+import numpy as np
+import cv2
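+# moviepy 1.x import path; moviepy 2.x removed moviepy.editor (use `from moviepy import VideoFileClip` there)
+from moviepy.editor import VideoFileClip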
+from sklearn.cluster import KMeans
+from tensorflow.keras.applications import InceptionV3
+from tensorflow.keras.applications.inception_v3 import preprocess_input
+from tensorflow.keras.preprocessing import image
+from tensorflow.keras.models import Model
+from shutil import move
+
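+# Sample frames from a video and turn them into one feature vector
+def extract_video_features(video_path, model, frame_interval=30):
+    video = VideoFileClip(video_path)
+    frame_count = 0
+    features = []
+
+    try:
+        for frame in video.iter_frames(fps=1):  # read frames at one frame per second
+            if frame_count % frame_interval == 0:
+                # Resize frame to match model input size (299x299 for InceptionV3)
+                img = cv2.resize(frame, (299, 299))
+                img = image.img_to_array(img)
+                img = np.expand_dims(img, axis=0)
+                img = preprocess_input(img)
+
+                # Extract the feature vector for this frame
+                feature = model.predict(img)
+                features.append(feature.flatten())
+
+            frame_count += 1
+    finally:
+        video.close()  # release the underlying file reader
+
+    if not features:
+        raise ValueError(f"No frames could be read from {video_path}")
+
+    # Use the mean of all sampled frame features as the video's final feature
+    return np.mean(features, axis=0)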
+
+# Extract features for every video in a directory
+def extract_features_for_all_videos(input_dir, model, frame_interval=30):
+    video_features = []
+    video_files = []
+
+    for filename in os.listdir(input_dir):
+        if filename.endswith(".mp4"):  # change the extension to match your files
+            video_path = os.path.join(input_dir, filename)
+            print(f"Processing video: {filename}")
+            features = extract_video_features(video_path, model, frame_interval)
+            video_features.append(features)
+            video_files.append(filename)
+
+    return np.array(video_features), video_files
+
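+# Cluster the videos
+def cluster_videos(video_features, num_clusters=3):
+    # K-Means cannot form more clusters than there are samples
+    num_clusters = min(num_clusters, len(video_features))
+    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)
+    kmeans.fit(video_features)
+    return kmeans.labels_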
+
+# Move each video into the folder for its cluster
+def classify_videos(input_dir, output_dir, video_files, labels):
+    for label, filename in zip(labels, video_files):
+        output_folder = os.path.join(output_dir, f"cluster_{label}")
+        os.makedirs(output_folder, exist_ok=True)
+
+        input_path = os.path.join(input_dir, filename)
+        output_path = os.path.join(output_folder, filename)
+
+        move(input_path, output_path)
+        print(f"Moved video {filename} to {output_folder}")
+
+# Main entry point
+def main(input_dir, output_dir, num_clusters=3, frame_interval=30):
+    # Load the pretrained InceptionV3 and use the output of its global average
+    # pooling layer ('avg_pool') as features, skipping the final classification layer
+    base_model = InceptionV3(weights='imagenet')
+    model = Model(inputs=base_model.input, outputs=base_model.get_layer('avg_pool').output)
+
+    # Extract features for all videos
+    video_features, video_files = extract_features_for_all_videos(input_dir, model, frame_interval)
+
+    # Cluster the videos
+    labels = cluster_videos(video_features, num_clusters)
+
+    # Move the videos into their cluster folders
+    classify_videos(input_dir, output_dir, video_files, labels)
+
+# Example call
+input_directory = "path/to/input_videos"
+output_directory = "path/to/output_videos"
+main(input_directory, output_directory, num_clusters=30, frame_interval=30)
+```
+
+## __3\. Code Walkthrough__
+
+
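+1\. extract_video_features: samples frames from a video, extracts a feature vector for each sampled frame with InceptionV3, and averages those vectors into one representative feature for the video.
+
+2\. extract_features_for_all_videos: extracts features for every video in a directory.
+
+3\. cluster_videos: groups the videos with the K-Means clustering algorithm so that similar videos end up in the same cluster.
+
+4\. classify_videos: moves each video into a folder according to its cluster label.
+
+5\. main: the entry point. It loads the model, extracts features, runs the clustering, and sorts the videos into folders.
+
+## __4\. Usage__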
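Because `classify_videos` moves files, it can help to preview the resulting layout first. The sketch below is a hypothetical dry-run helper (`plan_moves` is not part of the original source) that builds the same `cluster_<label>` folder names but only returns a plan instead of touching the file system.

```python
import os
from collections import defaultdict


def plan_moves(video_files, labels, output_dir):
    """Return {cluster_folder: [filenames]} without moving anything.

    Hypothetical helper for illustration; it mirrors the folder naming
    used by classify_videos in the post.
    """
    plan = defaultdict(list)
    for label, filename in zip(labels, video_files):
        folder = os.path.join(output_dir, f"cluster_{label}")
        plan[folder].append(filename)
    return dict(plan)


# Toy file names and cluster labels, as K-Means might return them.
files = ["a.mp4", "b.mp4", "c.mp4", "d.mp4"]
labels = [0, 2, 0, 1]
plan = plan_moves(files, labels, "sorted")
for folder in sorted(plan):
    print(folder, plan[folder])
```

Checking the plan before calling `classify_videos` is a cheap way to spot a poor choice of `num_clusters`, for example one cluster that swallows nearly every video.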
+
+
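+1\. input_directory: the input folder that contains the videos.
+
+2\. output_directory: the output folder. The program creates one subfolder per cluster inside it and moves similar videos into the same subfolder.
+
+3\. num_clusters: the number of clusters, that is, how many groups the videos should be split into. K-Means needs at least as many videos as clusters.
+
+4\. frame_interval: how many frames to skip between sampled frames. The larger the value, the fewer frames are sampled. Because the code reads frames at one frame per second (`iter_frames(fps=1)`), `frame_interval=30` samples roughly one frame every 30 seconds of video.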
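To make the effect of `frame_interval` concrete, here is a small arithmetic sketch. It assumes that `iter_frames(fps=1)` yields one frame per second starting at t = 0, and the helper name `sampled_timestamps` is hypothetical; actual frame counts for a real file may differ slightly at the end of the clip.

```python
import math


def sampled_timestamps(duration_s, fps=1, frame_interval=30):
    """Approximate timestamps (in seconds) of the frames that get sampled.

    Assumes frames are read at `fps` frames per second from t = 0 and that
    every `frame_interval`-th frame is kept, as in the post's loop.
    """
    n_frames = math.ceil(duration_s * fps)
    return [i / fps for i in range(0, n_frames, frame_interval)]


print(sampled_timestamps(95))                     # [0.0, 30.0, 60.0, 90.0]
print(sampled_timestamps(20))                     # [0.0]
print(sampled_timestamps(20, frame_interval=5))   # [0.0, 5.0, 10.0, 15.0]
```

With the default settings, a clip shorter than 30 seconds is represented by its first frame only, so a smaller `frame_interval` may suit collections of short videos.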
+
+
+Source code: [https://github.com/CYRUS-STUDIO/classify-videos-kmeans-python](https://github.com/CYRUS-STUDIO/classify-videos-kmeans-python)
+
+
+               
+
diff --git a/public/index.html b/public/index.html
@@ -51,6 +51,23 @@
     <h1>CYRUS STUDIO</h1>
     <ul class="posts-list">
 
+        <li class="posts-list-item">
+          <a class="posts-list-item-title" href="https://cyrus-studio.github.io/blog/posts/%E5%9F%BA%E4%BA%8Epython%E4%B8%8Ek-means%E7%9A%84%E8%87%AA%E5%8A%A8%E5%8C%96%E8%A7%86%E9%A2%91%E5%88%86%E7%B1%BB%E6%96%B9%E6%B3%95/">基于Python与K-Means的自动化视频分类方法</a>
+          <span class="posts-list-item-description">
+            <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-calendar">
+  <title>calendar</title>
+  <rect x="3" y="4" width="18" height="18" rx="2" ry="2"></rect><line x1="16" y1="2" x2="16" y2="6"></line><line x1="8" y1="2" x2="8" y2="6"></line><line x1="3" y1="10" x2="21" y2="10"></line>
+</svg>
+            Sep 21, 2024
+            <span class="posts-list-item-separator">-</span>
+            <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-clock">
+  <title>clock</title>
+  <circle cx="12" cy="12" r="10"></circle><polyline points="12 6 12 12 16 14"></polyline>
+</svg>
+            2 min read
+          </span>
+        </li>
+
         <li class="posts-list-item">
           <a class="posts-list-item-title" href="https://cyrus-studio.github.io/blog/posts/android%E4%B8%8B%E5%8F%8D%E8%B0%83%E8%AF%95%E4%B8%8E%E5%8F%8D%E5%8F%8D%E8%B0%83%E8%AF%95/">Android下反调试与反反调试</a>
           <span class="posts-list-item-description">
@@ -204,23 +221,6 @@ <h1>CYRUS STUDIO</h1>
           </span>
         </li>
 
-        <li class="posts-list-item">
-          <a class="posts-list-item-title" href="https://cyrus-studio.github.io/blog/posts/%E7%BC%96%E8%AF%91lineageos%E6%A8%A1%E6%8B%9F%E5%99%A8%E9%95%9C%E5%83%8F%E5%AF%BC%E5%87%BA%E5%88%B0androidstudio/">编译LineageOS模拟器镜像，导出到AndroidStudio</a>
-          <span class="posts-list-item-description">
-            <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-calendar">
-  <title>calendar</title>
-  <rect x="3" y="4" width="18" height="18" rx="2" ry="2"></rect><line x1="16" y1="2" x2="16" y2="6"></line><line x1="8" y1="2" x2="8" y2="6"></line><line x1="3" y1="10" x2="21" y2="10"></line>
-</svg>
-            Sep 1, 2024
-            <span class="posts-list-item-separator">-</span>
-            <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-clock">
-  <title>clock</title>
-  <circle cx="12" cy="12" r="10"></circle><polyline points="12 6 12 12 16 14"></polyline>
-</svg>
-            5 min read
-          </span>
-        </li>
-
     </ul>
 
 
@@ -239,6 +239,12 @@ <h1>CYRUS STUDIO</h1>
     </a>
   </li>
 
+  <li class="page-item">
+    <a class="page-link" href="/blog/page/3/">
+      3
+    </a>
+  </li>
+
 
   <li class="page-item">
     <a class="page-link" href="/blog/page/2/">

diff --git a/public/index.xml b/public/index.xml
@@ -6,8 +6,15 @@
     <description>Recent content on CYRUS STUDIO</description>
     <generator>Hugo</generator>
     <language>zh-cn</language>
-    <lastBuildDate>Thu, 19 Sep 2024 06:44:08 +0800</lastBuildDate>
+    <lastBuildDate>Sat, 21 Sep 2024 01:43:21 +0800</lastBuildDate>
     <atom:link href="https://cyrus-studio.github.io/blog/index.xml" rel="self" type="application/rss+xml" />
+    <item>
+      <title>基于Python与K-Means的自动化视频分类方法</title>
+      <link>https://cyrus-studio.github.io/blog/posts/%E5%9F%BA%E4%BA%8Epython%E4%B8%8Ek-means%E7%9A%84%E8%87%AA%E5%8A%A8%E5%8C%96%E8%A7%86%E9%A2%91%E5%88%86%E7%B1%BB%E6%96%B9%E6%B3%95/</link>
+      <pubDate>Sat, 21 Sep 2024 01:43:21 +0800</pubDate>
+      <guid>https://cyrus-studio.github.io/blog/posts/%E5%9F%BA%E4%BA%8Epython%E4%B8%8Ek-means%E7%9A%84%E8%87%AA%E5%8A%A8%E5%8C%96%E8%A7%86%E9%A2%91%E5%88%86%E7%B1%BB%E6%96%B9%E6%B3%95/</guid>
+      <description>版权归作者所有，如有转发，请注明文章出处：https://cyrus-studio.github.io/blog/&#xA;实现过程 1. 特征提取：使用预训练的 InceptionV3 模型，从视频的若干帧中提取高维的视觉特征。将每个视频的所有帧特征取平均值，生成一个固定长度的特征向量来表示该视频。&#xA;2. 聚类：通过 K-Means 的聚类结果，每个视频被分配了一个簇标签，代表该视频与哪些视频在特征上最相似。&#xA;3. 分类整理：最后根据簇标签，将视频移动到相应的分类文件夹中，每个文件夹对应一个簇。&#xA;InceptionV3 模型 InceptionV3 是一种用于图像分类和特征提取的深度学习模型，它是Inception 系列模型的第三个版本，由 Google 在 2015 年提出。&#xA;它最初是作为图像分类任务的一个模型，能够将图像分类到 1000 个类别中（如狗、猫、汽车等）。通过去除模型的最后几层（分类部分），可以将 InceptionV3 用作特征提取器。&#xA;簇 簇是聚类算法的核心概念，表示数据中相似的子集，目的是将无标签的数据点分组。&#xA;K-Means K-Means 是一种常用的无监督聚类算法，它的目标是将数据点分成 K 个簇（Cluster），使得每个簇内的数据点尽可能接近同一个中心（即簇的质心）。&#xA;算法的核心思想是通过迭代的方式找到 K 个最优的簇质心，并根据这些质心将数据进行分组。&#xA;源码 1. 安装依赖库 pip install moviepy scikit-learn tensorflow opencv-python 2. 实现代码 import os&#xD;import numpy as np&#xD;import cv2&#xD;from moviepy.editor import VideoFileClip&#xD;from sklearn.cluster import KMeans&#xD;from tensorflow.keras.applications import InceptionV3&#xD;from tensorflow.keras.applications.inception_v3 import preprocess_input&#xD;from tensorflow.</description>
+    </item>
     <item>
       <title>Android下反调试与反反调试</title>
       <link>https://cyrus-studio.github.io/blog/posts/android%E4%B8%8B%E5%8F%8D%E8%B0%83%E8%AF%95%E4%B8%8E%E5%8F%8D%E5%8F%8D%E8%B0%83%E8%AF%95/</link>

diff --git a/public/page/2/index.html b/public/page/2/index.html
@@ -51,6 +51,23 @@
     <h1>CYRUS STUDIO</h1>
     <ul class="posts-list">
 
+        <li class="posts-list-item">
+          <a class="posts-list-item-title" href="https://cyrus-studio.github.io/blog/posts/%E7%BC%96%E8%AF%91lineageos%E6%A8%A1%E6%8B%9F%E5%99%A8%E9%95%9C%E5%83%8F%E5%AF%BC%E5%87%BA%E5%88%B0androidstudio/">编译LineageOS模拟器镜像，导出到AndroidStudio</a>
+          <span class="posts-list-item-description">
+            <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-calendar">
+  <title>calendar</title>
+  <rect x="3" y="4" width="18" height="18" rx="2" ry="2"></rect><line x1="16" y1="2" x2="16" y2="6"></line><line x1="8" y1="2" x2="8" y2="6"></line><line x1="3" y1="10" x2="21" y2="10"></line>
+</svg>
+            Sep 1, 2024
+            <span class="posts-list-item-separator">-</span>
+            <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-clock">
+  <title>clock</title>
+  <circle cx="12" cy="12" r="10"></circle><polyline points="12 6 12 12 16 14"></polyline>
+</svg>
+            5 min read
+          </span>
+        </li>
+
         <li class="posts-list-item">
           <a class="posts-list-item-title" href="https://cyrus-studio.github.io/blog/posts/windows%E4%B8%8B%E5%88%9B%E5%BB%BAftp%E6%9C%8D%E5%8A%A1%E5%99%A8%E5%AE%9E%E7%8E%B0%E6%96%87%E4%BB%B6%E5%85%B1%E4%BA%AB/">Windows下创建FTP服务器，实现文件共享</a>
           <span class="posts-list-item-description">
@@ -231,6 +248,21 @@ <h1>CYRUS STUDIO</h1>
     </a>
   </li>
 
+  <li class="page-item">
+    <a class="page-link" href="/blog/page/3/">
+      3
+    </a>
+  </li>
+
+
+  <li class="page-item">
+    <a class="page-link" href="/blog/page/3/">
+      <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-arrow-right">
+  <title>arrow-right</title>
+  <line x1="5" y1="12" x2="19" y2="12"></line><polyline points="12 5 19 12 12 19"></polyline>
+</svg>
+    </a>
+  </li>
 
 </ul>
 

diff --git a/public/page/3/index.html b/public/page/3/index.html
@@ -0,0 +1,96 @@
+<!doctype html>
+<html lang="zh-cn">
+  <head>
+    <title>CYRUS STUDIO</title>
+    <link rel="shortcut icon" href="/favicon.ico" />
+    <meta charset="utf-8" />
+    <meta name="generator" content="Hugo 0.131.0">
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <meta name="author" content="John Doe" />
+    <meta name="description" content="Android &amp; Python Developer" />
+    <link rel="stylesheet" href="/blog/css/main.min.08e876a0f4aeb92fb7ca4e4c12d7d6ea16684353b267ad3f4385e180cc91a06b.css" />
+    <link rel="alternate" type="application/rss+xml" href="https://cyrus-studio.github.io/blog/index.xml" title="CYRUS STUDIO">
+
+
+
+
+
+
+
+  <meta name="twitter:card" content="summary">
+  <meta name="twitter:title" content="CYRUS STUDIO">
+  <meta name="twitter:description" content="Android &amp; Python Developer">
+
+    <meta property="og:url" content="https://cyrus-studio.github.io/blog/">
+  <meta property="og:site_name" content="CYRUS STUDIO">
+  <meta property="og:title" content="CYRUS STUDIO">
+  <meta property="og:description" content="Android &amp; Python Developer">
+  <meta property="og:locale" content="zh_cn">
+  <meta property="og:type" content="website">
+
+
+  </head>
+  <body>
+    <header class="app-header">
+      <a href="https://cyrus-studio.github.io/blog/"><img class="app-header-avatar" src="/blog/avatar.jpg" alt="John Doe" /></a>
+      <span class="app-header-title">CYRUS STUDIO</span>
+      <nav class="app-header-menu">
+          <a class="app-header-menu-item" href="/blog/">Home</a>
+             - 
+
+          <a class="app-header-menu-item" href="/blog/about/">About</a>
+             - 
+
+          <a class="app-header-menu-item" href="https://github.com/CYRUS-STUDIO">GitHub</a>
+      </nav>
+      <p>Android &amp; Python Developer</p>
+    </header>
+    <main class="app-container">
+
+  <article>
+    <h1>CYRUS STUDIO</h1>
+    <ul class="posts-list">
+
+    </ul>
+
+
+<ul class="pagination">
+
+  <li class="page-item">
+    <a class="page-link" href="/blog/page/2/">
+      <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="icon icon-arrow-left">
+  <title>arrow-left</title>
+  <line x1="19" y1="12" x2="5" y2="12"></line><polyline points="12 19 5 12 12 5"></polyline>
+</svg>
+    </a>
+  </li>
+
+
+  <li class="page-item">
+    <a class="page-link" href="/blog/">
+      1
+    </a>
+  </li>
+
+  <li class="page-item">
+    <a class="page-link" href="/blog/page/2/">
+      2
+    </a>
+  </li>
+
+  <li class="page-item active">
+    <a class="page-link" href="/blog/page/3/">
+      3
+    </a>
+  </li>
+
+
+</ul>
+
+
+
+  </article>
+
+    </main>
+  </body>
+</html>