diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/2016/03/25/manage-vxworks-tornado-executable-project-using-tcl.html b/2016/03/25/manage-vxworks-tornado-executable-project-using-tcl.html new file mode 100644 index 000000000..d8f4f05f0 --- /dev/null +++ b/2016/03/25/manage-vxworks-tornado-executable-project-using-tcl.html @@ -0,0 +1,184 @@ + + + + + + + + +用TCL(工具命令语言)管理Tornado (for VxWorks) 可启动工程 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

用TCL(工具命令语言)管理Tornado (for VxWorks) 可启动工程

+ dev/tcl + +
+ +
+

尽管多数情况下要写VxWorks嵌入式应用程序代码常用Tornado编程环境,但有时可能会需要在命令行下完成简单的Tornado工程管理。本教程教授了如何将简单的工程管理迁移到Tornado外部并在命令行下实现(虽然这样做无法体验Tornado下的一些方便的功能)。

+ +
  1. 准备Tornado软件。首先得有Tornado的全套软件。我的是Tornado 2.2。Tornado是否经过破解或是否安装都问题不大,只要有它的安装目录就可以。

  2. 配置环境。Tornado环境中已经配好了各种环境变量,所以我们要想在普通cmd下实现Tornado的基本功能,也需要手动配置相应的环境。a) 新建环境变量WIND_BASE,其值为Tornado的安装目录(例如我的Tornado安装在D盘Tornado2.2目录下,那么WIND_BASE值为D:\Tornado2.2);b) 新建环境变量WIND_HOST_TYPE,如果是Windows用户,那么需要将其值设为x86-win32,如果不是Windows用户,那么凭本人的知识就不太清楚了;c) 将%WIND_BASE%\host\%WIND_HOST_TYPE%\bin加入PATH环境变量;d) 新建环境变量DIABLIB,其值为%WIND_BASE%/host/diab(注意斜杠的方向)。注意这些变量必须真的加到系统环境变量中,而不是仅在命令行上输SET WIND_BASE=D:\Tornado2.2等等。

  3. 配置diab、gnu工具链。在cmd中执行以下两条批处理命令:
+ +
wtxtcl.exe %WIND_BASE%/host/resource/tcl/app-config/Project/gnuInfoGen.tcl diab
+wtxtcl.exe %WIND_BASE%/host/resource/tcl/app-config/Project/gnuInfoGen.tcl gnu
+
+ +
基本的工程管理方法(建议将下面的每条内容都写到TCL脚本文件中以方便调用)
+ +

a) 建立新工程(本例中BSP(板级支持包)以三星的嵌入式开发板S3c2410BP为例)

+ +
# 加载过程库文件cmpScriptLib.tcl,其中定义了工程管理所需的各种方法
+source [wtxPath host resource tcl app-config Project]cmpScriptLib.tcl
+
+# 尝试创建名为"Project0"的可启动工程,注意Project0一定不能是已经存在的工程
+# 新工程位于%WIND_BASE%\target\proj目录下,该目录由可接受任意个参数的命令wtxPath指定
+# S3c2410BP是BSP名,BSP应放在%WIND_BASE%\target\config目录下
+cmpProjCreate S3c2410BP [wtxPath target proj Project0]Project0.wpj
+cmpProjClose
+
+ +

b) 删除工程(以删除工程”Project0”为例)

+ +
source [wtxPath host resource tcl app-config Project]cmpScriptLib.tcl
+
+cmpProjOpen [wtxPath target proj Project0]Project0.wpj
+cmpProjDelete
+
+ +

c) 向工程(以Project0为例)中添加文件(以D:\my_directory\my_source_file.c为例)

+ +
source [wtxPath host resource tcl app-config Project]cmpScriptLib.tcl
+cmpProjOpen [wtxPath target proj Project0]Project0.wpj
+cmpFileAdd d:/my_directory/my_source_file.c
+cmpProjClose
+
+ +

d) 从工程(以Project0为例)中移除文件(以D:\my_directory\my_source_file.c为例)

+ +
source [wtxPath host resource tcl app-config Project]cmpScriptLib.tcl
+cmpProjOpen [wtxPath target proj Project0]Project0.wpj
+cmpFileRemove d:/my_directory/my_source_file.c
+cmpProjClose
+
+ +

e) 获取工程中包含的文件列表(一行一个文件名,以Project0为例)

+ +
source [wtxPath host resource tcl app-config Project]cmpScriptLib.tcl
+set projId [cmpProjOpen [wtxPath target proj Project0]Project0.wpj]
+set file_list [prjFileListGet $projId]
+cmpProjClose
+foreach item $file_list {
+    puts $item
+}
+
+ +

f) 重新编译工程(以Project0为例)

+ +
source [wtxPath host resource tcl app-config Project]cmpScriptLib.tcl
+cmpProjOpen [wtxPath target proj Project0]Project0.wpj
+cmpBuild clean
+cmpBuild
+cmpProjClose
+
+ +

本教程至此结束,若对TCL语言不很熟悉,请参阅工具命令语言(TCL)的相关教程。

+ +
+ +
+ +
+
+ + + diff --git a/2016/05/01/banker-algorithm-termination-condition-proof.html b/2016/05/01/banker-algorithm-termination-condition-proof.html new file mode 100644 index 000000000..7726fb698 --- /dev/null +++ b/2016/05/01/banker-algorithm-termination-condition-proof.html @@ -0,0 +1,173 @@ + + + + + + + + +银行家算法结束条件的合理性证明 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

银行家算法结束条件的合理性证明

+ algorithm + +
+ +
+

首先简要提一下银行家算法的流程(类Java的伪代码)。算法的具体说明请参见操作系统课本。

+ +
/*
+ * 令work为长为m的数组,表示m种资源的剩余量;
+ * 令finish为长为n的布尔数组,表示n个进程是否已经结束;
+ * 令need为n行m列的二维数组,need[i]表示第i个进程在当前时刻所需的最大资源量;
+ * 令allocation为n行m列的二维数组,allocation[i],表示第i个进程在当前时刻已被分配的资源量。
+ * 令available为长为m的数组,表示初始可用的资源量
+ *
+ * array1 ope array2 表示两数组长度(记为len)相等,且对于任意0 <= i < len,array1[i] ope array2[i]。
+ * 例如 array1 < array2表示对于任意0 <= i < len,array1[i] < array2[i]。
+ */
+work = available;
+for (int i = 0; i < finish.length; i++)
+        finish[i] = false;
+while there exists such an i that
+finish[i] == false && need[i] <= work
+        work += allocation[i];
+        finish[i] = true;
+for (int i = 0; i < finish.length; i++)
+        if (finish[i] == false)
+                return false; //可能发生死锁
+return true; //不可能发生死锁
+
+ +
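下面附一段按上述伪代码整理的最小Python示意实现(非原文内容;资源比较按逐元素进行,示例数值仅为演示而随意选取),可直接运行体会算法流程:

def is_safe(available, need, allocation):
    """银行家算法安全性检查:返回True表示存在安全序列(不可能死锁)。"""
    n = len(need)
    work = list(available)
    finish = [False] * n
    progressed = True
    while progressed:
        progressed = False
        for i in range(n):
            # need[i] <= work 为逐元素比较
            if not finish[i] and all(x <= y for x, y in zip(need[i], work)):
                # 进程i可以完成,释放其已分配的资源
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                progressed = True
    return all(finish)

# 示例:3种资源、5个进程
print(is_safe(available=[3, 3, 2],
              need=[[7, 4, 3], [1, 2, 2], [6, 0, 0], [0, 1, 1], [4, 3, 1]],
              allocation=[[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]))  # True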

不知有没有人会质疑该算法的结束条件:该算法没有回溯过程,如何保证这次没有找到一个进程运行的安全序列,这n个进程的任意顺序排列就都不可能构成安全序列呢?

+ +

证明如下:

+ +

假设有$n$个进程,以序号表示为

+ +\[[1, 2, \dots, n]\] + +

进程运行序列进行到

+ +\[S = [i_1, i_2, \dots, i_k]\ (k < n)\] + +

时无法继续算法(即不能找出一个i满足finish[i]==false && need[i] <= work),被判定为可能发生死锁。

+ +

令集合$C = \{i_1, i_2, \dots, i_k\}$;并令集合$D$为集合$\{1, 2, \dots, n\}$与$C$的差集,即所有finish为false的进程所组成的集合。

+ +

如果此时无法继续算法,那么根据算法流程,

+ +\[\min_{j\in D}\big\{\text{need}_j\big\} > \text{available} + \sum_{j \in C}\text{allocation}_j\quad\text{(命题1)}\] + +

若存在另一个序列$S’$,使得$S’$为安全序列,则$S’$中的元素排列只能为以下情况之一:

+ +
  1. 前$k$个元素构成的集合与$C$相同(但排列顺序可能不同),且后$n-k$个元素构成的集合与$D$相同(但排列顺序可能不同);

  2. 前$k$个元素中至少有一个元素属于集合$D$,且后$n-k$个元素中至少有一个元素属于集合$C$。
+ +

对于第一种情况,前$k$个进程全部完成后,work恰为$\text{available} + \sum_{j\in C}\text{allocation}_j$;由命题1,$D$中任一进程的需求都大于该值,算法在第$k+1$步仍会卡住,故这样的$S'$不是安全序列;

+ +

对于第二种情况,设属于集合$D$的元素在$S'$中首次出现在第$t$($1\le t\le k$)个位置上,则$S'$的前$t-1$个元素都属于$C$。令集合$C'$为这前$t-1$个元素构成的集合(它是$C$的子集),那么此时应有

+ +\[\exists j \in D,\ \text{need}_j \le \text{available} + \sum_{i \in C'}\text{allocation}_i\quad\text{(命题2)}\] + +

由于$C’$是$C$的子集,所以命题2中的和式一定不大于命题1中的和式。因此如果命题1是正确的,那么命题2一定是错误的。

+ +

所以,只要算法按上述流程进行到某一步找不到可完成的进程(即得到的部分序列无法延伸成全排列),那么这$n$个进程的任意排列都不是安全序列。

+ +

换言之,只要有一个序列是安全序列,那么在算法进行过程中出现的任何分叉点所构成的其它序列就都是安全序列。

+ +
+ +
+ +
+
+ + + diff --git a/2016/09/02/validate-xml-via-dtd-using-java.html b/2016/09/02/validate-xml-via-dtd-using-java.html new file mode 100644 index 000000000..d9b2bca6e --- /dev/null +++ b/2016/09/02/validate-xml-via-dtd-using-java.html @@ -0,0 +1,317 @@ + + + + + + + + +使用Java API通过DTD方式验证XML | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

使用Java API通过DTD方式验证XML

+ dev/java + +
+ +
+

摘要

+ +

本文记述了如何使用Java 8 API实现解析但不验证、按照XML文件头的DOCTYPE声明验证、以及使用本地DTD文件验证XML的方法。本文不涉及如何读取、修改XML节点,以及创建XML文档的内容。

+ +

解析但不验证

+ +
import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
+import javax.xml.parsers.ParserConfigurationException;
+
+import org.w3c.dom.Document;
+import org.xml.sax.SAXException;
+
+public class XMLParser {
+	public static void main(String[] args) {
+		try {
+			String xmlToParse = "myDocument.xml";
+			DocumentBuilderFactory dbf = 
+					DocumentBuilderFactory.newInstance();
+			// 默认DocumentBuilderFactory不创建
+			// 启用验证功能的DocumentBuilder
+			DocumentBuilder db = dbf.newDocumentBuilder();
+			Document myDoc = db.parse(xmlToParse);
+		} catch (ParserConfigurationException e) {
+			e.printStackTrace();
+		} catch (IOException e) {
+			e.printStackTrace();
+		} catch (SAXException e) {
+			e.printStackTrace();
+		}
+	}
+}
+
+ +

使用XML文件头部声明的DOCTYPE验证

+ +
import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
+import javax.xml.parsers.ParserConfigurationException;
+
+import org.w3c.dom.Document;
+import org.xml.sax.SAXException;
+
+public class XMLParser {
+	public static void main(String[] args) {
+		try {
+			String xmlToParse = "myDocument.xml";
+			DocumentBuilderFactory dbf = 
+					DocumentBuilderFactory.newInstance();
+			dbf.setValidating(true);  // 注意这里不同
+			DocumentBuilder db = dbf.newDocumentBuilder();
+			Document myDoc = db.parse(xmlToParse);
+		} catch (ParserConfigurationException e) {
+			e.printStackTrace();
+		} catch (IOException e) {
+			e.printStackTrace();
+		} catch (SAXException e) {
+			e.printStackTrace();
+		}
+	}
+}
+
+ +

这时可能抛出IOException,原因通常是没有找到XML所声明的DTD文件

+ +
  • 如果XML声明的DTD在本地,可能会报FileNotFoundException。此时需要检查本地DTD的路径是否填写正确
  • 否则可能报SocketException。此时需要检查网络是否畅通
+ +

然而此时即使XML不符合所声明DTD的定义,SAXException也可能不会被抛出,而仅仅是报错信息通过System.err打印出来,同时会打印运行警告:“警告: 已启用验证, 但未设置 org.xml.sax.ErrorHandler, 这可能不是预期结果。解析器将使用默认 ErrorHandler 来输出前 0 个错误。请调用 ‘setErrorHandler’ 方法以解决此问题。”

+ +

这是因为没有设置ErrorHandler。如果希望SAXException在发生验证错误时被抛出,需要通过DocumentBuilder.setErrorHandler(ErrorHandler eh)方法进行设置。

+ +

重写上述代码如下:

+ +
import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
+import javax.xml.parsers.ParserConfigurationException;
+
+import org.w3c.dom.Document;
+import org.xml.sax.ErrorHandler;  // 注意这里不同
+import org.xml.sax.SAXException;
+import org.xml.sax.SAXParseException;  // 注意这里不同
+
+public class XMLParser {
+	public static void main(String[] args) {
+		try {
+			String xmlToParse = "myDocument.xml";
+			DocumentBuilderFactory dbf = 
+					DocumentBuilderFactory.newInstance();
+			dbf.setValidating(true);
+			DocumentBuilder db = dbf.newDocumentBuilder();
+			db.setErrorHandler(new ErrorHandler() {
+				/*
+				 * 定义了一个只要出一点解析错误就抛出异常的ErrorHandler。
+				 * 读者可以以此为依据编写更精细化管理的ErrorHandler。
+				 */
+				
+				@Override
+				public void error(SAXParseException exception)
+						throws SAXException {
+					throw exception;
+				}
+				@Override
+				public void fatalError(SAXParseException exception)
+						throws SAXException {
+					throw exception;
+				}
+				@Override
+				public void warning(SAXParseException exception)
+						throws SAXException {
+					throw exception;
+				}
+			});  // 注意这里不同
+			Document myDoc = db.parse(xmlToParse);
+		} catch (ParserConfigurationException e) {
+			e.printStackTrace();
+		} catch (IOException e) {
+			e.printStackTrace();
+		} catch (SAXException e) {
+			e.printStackTrace();
+		}
+	}
+}
+
+ +

使用本地DTD文件验证

+ +
import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
+import javax.xml.parsers.ParserConfigurationException;
+
+import org.w3c.dom.Document;
+import org.xml.sax.ErrorHandler;
+import org.xml.sax.EntityResolver;  // 注意这里不同
+import org.xml.sax.InputSource;  // 注意这里不同
+import org.xml.sax.SAXException;
+import org.xml.sax.SAXParseException;
+
+public class XMLParser {
+	public static void main(String[] args) {
+		try {
+			String xmlToParse = "myDocument.xml";
+			DocumentBuilderFactory dbf = 
+					DocumentBuilderFactory.newInstance();
+			dbf.setValidating(true);
+			DocumentBuilder db = dbf.newDocumentBuilder();
+			db.setErrorHandler(new ErrorHandler() {
+				@Override
+				public void error(SAXParseException exception)
+						throws SAXException {
+					throw exception;
+				}
+				@Override
+				public void fatalError(SAXParseException exception)
+						throws SAXException {
+					throw exception;
+				}
+				@Override
+				public void warning(SAXParseException exception)
+						throws SAXException {
+					throw exception;
+				}
+			});
+			db.setEntityResolver(new EntityResolver() {
+				/*
+				 * 编写了根据PUBLIC域使用相应的本地dtd的EntityResolver;
+				 * 读者也可以据此编写根据SYSTEM域使用相应dtd的EntityResolver;
+				 * 或不管xml中声明成什么DOCTYPE,都使用同一份dtd进行验证,
+				 * 此时resolveEntity方法体中仅包含
+				 *     return new InputSource("a-fixed-dtd-path");
+				 */
+				
+				@Override
+				public InputSource resolveEntity(String publicId,
+						String systemId) {
+					switch (publicId) {  // 此处仅为示意
+					case "URL-sample-1":
+						return new InputSource(
+								"local-dtd-path-for-url-sample-1");
+					case "URL-sample-2":
+						return new InputSource(
+								"local-dtd-path-for-url-sample-2");
+					default:
+						// 仍然按照DOCTYPE去解析,此时可能抛出IOException
+						return null;
+					}
+				}
+			});  // 注意这里不同
+			Document myDoc = db.parse(xmlToParse);
+		} catch (ParserConfigurationException e) {
+			e.printStackTrace();
+		} catch (IOException e) {
+			e.printStackTrace();
+		} catch (SAXException e) {
+			e.printStackTrace();
+		}
+	}
+}
+
+ +
+ +
+ +
+
+ + + diff --git a/2016/12/27/apache-ant-extension-tutorial.html b/2016/12/27/apache-ant-extension-tutorial.html new file mode 100644 index 000000000..88cdf89bb --- /dev/null +++ b/2016/12/27/apache-ant-extension-tutorial.html @@ -0,0 +1,252 @@ + + + + + + + + +Apache Ant 扩展教程 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Apache Ant 扩展教程

+ dev/java + +
+ +
+

Apache Ant 致力于成为一款灵活方便的构建工具,尽管对 Java 支持更多,也可以通过一些第三方库来支持其它语言的构建,甚至一些常规维护任务。鉴于Apache Ant 使用 XML 作为配置语言,以描述性见长,而无法处理过于复杂的过程逻辑,因此便有了著名的 Ant-Contrib 扩展包(主页见这里)的用武之地。Ant-Contrib 的使用固然增加了 Apache Ant 的可编程性,但以笔者的观点看,违背了 Apache Ant 的设计初衷,同时 XML 本身即使具有了编程能力,传统编程语言的逻辑表现力绝非 XML 可比。事实上,通过其官方 API 扩展 Apache Ant 使其完成用户定制功能,从长远来看,具有更好的简洁性、健壮性、可维护性和稳定性,只不过相对亲切的 XML,阅读 API 的艰巨任务掩盖了扩展 Apache Ant 的优势罢了。

+ +
+ +

Apache Ant 构建文件由两部分元素组成,分别是 Task(任务) 和 DataType(数据类型)。通常而言,类型表示一个资源集合,如Fileset(文件集合);任务用于执行某些操作。虽然任务和类型有很多不同点,但两者从 Java 类结构上看又有很多相似之处。例如:

+ +
package packagePath;
+import java.io.File;
+import java.util.ArrayList;
+
+import org.apache.tools.ant.Task;  // 任务都继承自这里
+import org.apache.tools.ant.BuildException;
+
+/*
+ * 使用 Java Bean 规范定义 XML 属性,属性名从 getter/setter 名中推测得到。
+ * 若要添加子元素,需要使用 addXXX(YYYY e) 方法。XXX 为子元素的 XML 元素名
+ * (在 XML 中不分大小写,但在 Java 中的命名要符合 Java Bean 规范);
+ * YYYY 为其实际的 Java 类名。
+ */
+public class MyTask extends Task {
+    
+    private String myStringAttribute;
+    private int myIntAttribute;
+    private File myFileAttribute;
+    
+    private ArrayList<SelfDefinedSubElement> l;
+
+    public MyTask() {
+        l = new ArrayList<SelfDefinedSubElement>();
+    }
+
+    public String getMyStringAttribute() {
+        return myStringAttribute;
+    }
+
+    // 其它两个 getters ...
+
+    public void setMyStringAttribute(String myStringAttribute) {
+        this.myStringAttribute = myStringAttribute;
+    }
+
+    // 其它两个 setters ...
+
+    public void addSelfDefinedElement(SelfDefinedSubElement e) {
+        l.add(e);
+    }
+
+    /*
+     * 在这里开始执行任务。DataType 没有这个方法;但 DataType 有获取引用
+     * 的方法,即在一个地方使用属性 id 标志数据类型然后在另一个地方用 refid
+     * 获得其引用。详询 Apache Ant API
+     */
+    @Override
+    public void execute() {
+        if (myStringAttribute == null) {
+            throw new BuildException("myStringAttribute not set");
+        }
+
+        // 其它输入检查 ...
+
+        // 要完成的操作 ...
+    }  
+}
+
+ +

这是一个任务,一个 Java 文件。

+ +
# 在这里定义 MyTask 在 XML 里的元素名
+nameUsedByMyTaskInBuildfile=packagePath.MyTask
+selfDefinedElement=它的全限定类路径
+
+ +

这是任务声明,一个 properties 文件。

+ +
<target name="XXX">
+  <!-- some other tasks -->
+  <nameUsedByMyTaskInBuildfile myStringAttribute="stringValue"
+                               myFileAttribute="C:\Users"
+                               myIntAttribute="5">
+    <selfDefinedElement someAttributes="" />
+  </nameUsedByMyTaskInBuildfile>
+  <!-- some other tasks -->
+</target>
+
+ +

这是该任务所对应的一个可能的 XML 示例。

+ +
package anotherPackagePath;
+import java.io.File;
+import java.util.ArrayList;
+
+import org.apache.tools.ant.types.DataType;  // 数据类型继承自这里
+import org.apache.tools.ant.BuildException;
+
+/*
+ * 说明与任务说明相同
+ */
+public class MyType extends DataType {
+    
+    private String myStringAttribute;
+    private int myIntAttribute;
+    private File myFileAttribute;
+    
+    private ArrayList<AnotherSelfDefinedSubElement> l;
+
+    public MyType() {
+        l = new ArrayList<AnotherSelfDefinedSubElement>();
+    }
+
+    public String getMyStringAttribute() {
+        return myStringAttribute;
+    }
+
+    // 其它两个 getters ...
+
+    public void setMyStringAttribute(String myStringAttribute) {
+        this.myStringAttribute = myStringAttribute;
+    }
+
+    // 其它两个 setters ...
+
+    public void addAnotherSelfDefinedElement(AnotherSelfDefinedSubElement e) {
+        l.add(e);
+    }
+}
+
+ +

这是一个数据类型,一个 Java 文件。

+ +
nameUsedByMyTypeInBuildfile=anotherPackagePath.MyType
+anotherSelfDefinedElement=它的全限定类路径
+
+ +

这是数据类型声明,一个 properties 文件。

+ +
<nameUsedByMyTypeInBuildfile myStringAttribute="stringValue"
+                             myFileAttribute="C:\Users"
+                             myIntAttribute="5"
+                             id="my.id">
+  <anotherSelfDefinedElement someAttributes="" />
+</nameUsedByMyTypeInBuildfile>
+
+ +

这是该数据类型所对应的一个可能的 XML 示例。

+ +
+ +
+
相关阅读:
+
Apache Ant API 的基本使用方法
+
+Apache Ant API(这是一个下载地址,Apache Ant 不提供官方的在线 API)
+
+ +
+ +
+ +
+
+ + + diff --git a/2017/04/23/relation-between-truncated-distribution-and-original-distribution.html b/2017/04/23/relation-between-truncated-distribution-and-original-distribution.html new file mode 100644 index 000000000..525976aa0 --- /dev/null +++ b/2017/04/23/relation-between-truncated-distribution-and-original-distribution.html @@ -0,0 +1,119 @@ + + + + + + + + +被截短的随机分布与原分布的关系 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

被截短的随机分布与原分布的关系

+ math/probability + +
+ +
+

已知随机分布的概率密度函数为$f_X(x)$,定义域为$D$。现将其定义域截取为$E$,其中$E \subseteq D$,即不断按照该分布取随机变量直到变量值落在$E$中。截取后的随机变量的分布的概率密度函数与$f_X(x)$是什么关系呢?

+ +

要回答这个问题,首先设截取后的概率密度函数为$f_U(x)$,设$a=\min{E}$(如果$E$无下界,令$a$表示$-\infty$)。$\forall x \in E$:

+ +\[\begin{aligned} +\int_a^x{f_U(t)\mathrm{d}t} &= \int_a^x{f_X(t)\mathrm{d}t} + \left(1 - \int_E{f_X(t)\mathrm{d}t}\right)\int_a^x{f_X(t)\mathrm{d}t} + \cdots\\ +\int_a^x{f_U(t)\mathrm{d}t} &= \sum_{n=0}^\infty{\left(1-\int_E{f_X(t)\mathrm{d}t}\right)}^n \int_a^x{f_X(t)\mathrm{d}t}\\ +\int_a^x{f_U(t)\mathrm{d}t} &= \left(\int_E{f_X(t)\mathrm{d}t}\right)^{-1} \int_a^x{f_X(t)\mathrm{d}t}\\ +{\mathrm{d} \over \mathrm{d}x}\int_a^x{f_U(t)\mathrm{d}t} &= \left(\int_E{f_X(t)\mathrm{d}t}\right)^{-1} {\mathrm{d} \over \mathrm{d}x}\int_a^x{f_X(t)\mathrm{d}t}\\ +f_U(x) &= \left(\int_E{f_X(t)\mathrm{d}t}\right)^{-1} f_X(x) +\end{aligned}\] + +

所以截取后的分布在$E$上与原分布形状相同,只是概率密度整体乘上常数$\left(\int_E{f_X(t)\mathrm{d}t}\right)^{-1}\ge 1$,因而整体变高。

+ +
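下面附一段数值验证上述结论的Python草稿(非原文内容;假设已安装numpy与scipy,以标准正态分布截取到$E=[0,1]$为例):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 0.0, 1.0                                     # 截取区间 E = [0, 1]
samples = rng.standard_normal(1_000_000)
samples = samples[(samples >= a) & (samples <= b)]  # “重复取样直到落入E”等价于拒绝采样

mass = stats.norm.cdf(b) - stats.norm.cdf(a)        # 即 ∫_E f_X(t)dt
hist, edges = np.histogram(samples, bins=50, range=(a, b), density=True)
centers = (edges[:-1] + edges[1:]) / 2
# 经验密度应接近 f_X(x) / ∫_E f_X(t)dt,即原密度整体放大 1/mass 倍
print(np.abs(hist - stats.norm.pdf(centers) / mass).max())  # 应接近0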
+ +
+ +
+
+ + + diff --git a/2017/07/20/matlab-r2011b-neural-network-toolbox-note.html b/2017/07/20/matlab-r2011b-neural-network-toolbox-note.html new file mode 100644 index 000000000..ed8232788 --- /dev/null +++ b/2017/07/20/matlab-r2011b-neural-network-toolbox-note.html @@ -0,0 +1,130 @@ + + + + + + + + +MATLAB R2011b 神经网络工具箱注意事项 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

MATLAB R2011b 神经网络工具箱注意事项

+ dev/matlab + +
+ +
+

这是记录了我使用神经网络工具箱时遇到的坑,供自己和他人参考。先写一点,以后遇到再更新。

+ +

1

+ +
net = feedforwardnet;
+net = train(net, attributes, targets);
+
+ +

第一行创建了一个两层前馈网络,隐藏层神经元个数为默认的10,这没什么问题。创建完网络后,如果使用 view(net) 来查看网络拓扑的话,会发现输入向量和输出向量是没有的,这是因为还没有调用 configure 函数。configure 函数默认在第一次调用 train 函数时被自动调用。这里有一个坑。假设:

+ +
X = [
+  1 1 2;
+  2 1 3;
+  3 1 1;
+  2 1 3]';
+Y = [
+  0 1 1 0];
+
+ +

即输入向量是3维向量,数据集X中包含4个样本,训练采用分批训练方式。经过 train 函数调用后,net.IW{1,1}的维度竟然会变成10x2!不应该是10x3吗(注:隐藏层神经元个数10,输入向量3维)?因为数据集X中所有样本的第二个属性都是一样的(值都是1),结果这个属性就被Matlab忽略掉了,不知是有意为之还是bug。解决方法

+ +
X(:,find(var(X,0,1) < eps)) = X(:,find(var(X,0,1) < eps)) + min(min(X))*1e-5*randn(size(X,1),length(find(var(X,0,1) < eps)));
+
+ +

即,把被忽略的列加上一个小的白噪声让它们的值不一样。

+ +
+ +
+ +
+
+ + + diff --git a/2020/05/22/pytorch-crop-images-differentially.html b/2020/05/22/pytorch-crop-images-differentially.html new file mode 100644 index 000000000..90fefa598 --- /dev/null +++ b/2020/05/22/pytorch-crop-images-differentially.html @@ -0,0 +1,191 @@ + + + + + + + + +PyTorch crop images differentially | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

PyTorch crop images differentially

+ dev/pytorch | math/linear algebra + +
+ +
+

Intro

+ +

PyTorch provides a variety of means to crop images. For example, torchvision.transforms provides several functions to crop PIL images; PyTorch Forum provides an answer of how to crop image in a differentiable way (differentiable with respect to the image). However, sometimes we need a fully differentiable approach for the cropping action itself. How shall we implement that?

+ +

Theory: Affine transformation

+ +

Before reaching the answer, we first need to learn about the image coordinate system in PyTorch. It is a left-handed Cartesian system with its origin at the center of the image. The coordinates are normalized to the range $[-1,1]$, where $(-1,-1)$ indicates the top-left corner, and $(1,1)$ indicates the bottom-right corner, as pointed out by the doc.

+ +

Let $(x,y)$ be the top-left corner of the cropped image with respect to the coordinate of the original image; likewise, we denote $(x’,y’)$ as the bottom-right corner of the cropped image. It’s clear that $(x,y)$ corresponds to $(-1,-1)$ with respect to the cropped image coordinate system, and $(x’,y’)$ corresponds to $(1,1)$. We’d like a function $f$ that maps from the cropped image system to the original image system for every point in the cropped image. Since only scaling and translation are involved, the function $f$ can be parameterized by an affine transformation matrix $\Theta$ such that

+ +\[\Theta = +\begin{pmatrix} +\theta_{11} & 0 & \theta_{13}\\ +0 & \theta_{22} & \theta_{23}\\ +0 & 0 & 1\\ +\end{pmatrix}\] + +

where $\theta_{12}=\theta_{21}=0$ since skewing is not involved. Denote $\mathbf{u}_H$ as the homogeneous coordinate of $\mathbf{u}=\begin{pmatrix}u & v\\ \end{pmatrix}^\intercal$ such that $\mathbf{u}_H=\begin{pmatrix}\mathbf{u}^\intercal&1\end{pmatrix}^\intercal$, $\Theta$ maps $\mathbf{u}_H$ with respect to the cropped image system to $\mathbf{x}_H$ with respect to the original image system, i.e. $\mathbf{x}_H = \Theta \mathbf{u}_H$. Thus,

+ +\[\begin{pmatrix} +x & x'\\ +y & y'\\ +1 & 1 +\end{pmatrix} = +\begin{pmatrix} +\theta_{11} & 0 & \theta_{13}\\ +0 & \theta_{22} & \theta_{23}\\ +0 & 0 & 1\\ +\end{pmatrix} +\begin{pmatrix} +-1 & 1\\ +-1 & 1\\ +1 & 1\\ +\end{pmatrix}\] + +

Solving the equations,

+ +\[\Theta = +\begin{pmatrix} +\frac{x'-x}{2} & 0 & \frac{x'+x}{2}\\ +0 & \frac{y'-y}{2} & \frac{y'+y}{2}\\ +0 & 0 & 1\\ +\end{pmatrix}\] + +

where $x’\ge x, y’ \ge y$.

+ +
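As a quick numerical sanity check of the solved $\Theta$ (not part of the derivation; the corner values below are made up), one can verify that it maps the cropped-image corners $(-1,-1)$ and $(1,1)$ back to $(x,y)$ and $(x',y')$:

import numpy as np

x, y, x_, y_ = -0.5, -0.3, 0.7, 0.8       # arbitrary example corners
theta = np.array([
    [(x_ - x) / 2, 0.0,          (x_ + x) / 2],
    [0.0,          (y_ - y) / 2, (y_ + y) / 2],
    [0.0,          0.0,          1.0],
])
corners = np.array([[-1.0, 1.0],
                    [-1.0, 1.0],
                    [ 1.0, 1.0]])         # homogeneous (-1,-1) and (1,1)
print(theta @ corners)                    # columns: (x, y, 1) and (x', y', 1)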

Coding time

+ +

We’ll need two functions:

+ +
  1. torch.nn.functional.affine_grid to convert the $\Theta$ parameterization to $f$
  2. torch.nn.functional.grid_sample to find the corresponding original image coordinate from each cropped image coordinate
+ +
import torch
+import torch.nn.functional as F
+
+B, C, H, W = 16, 3, 224, 224  # batch size, input channels
+                              # original image height and width
+# Let `I` be our original image
+I = torch.rand(B, C, H, W)
+# Set the (x,y) and (x',y') to define the rectangular region to crop
+x, y = -0.5, -0.3  # some examplary random coordinates;
+x_, y_ = 0.7, 0.8  # in practice, (x,y,x_,y_) might be predicted
+                   # as a tensor in the computation graph
+# Set the affine parameters
+theta = torch.tensor([
+    [(x_-x)/2,       0, (x_+x)/2],
+    [       0,(y_-y)/2, (y_+y)/2],
+]).unsqueeze_(0).expand(B, -1, -1)
+# compute the flow field;
+# where size is the output size (scaling involved)
+# `align_corners` option must be the same throughout the code
+f = F.affine_grid(theta, size=(B, C, H//2, W//2), align_corners=False)
+I_cropped = F.grid_sample(I, f, align_corners=False)
+
+ +

Read also

+ + + +
+ +
+ +
+
+ + + diff --git a/2022/02/05/align-strings-in-en-and-zh-like-bsd-ls.html b/2022/02/05/align-strings-in-en-and-zh-like-bsd-ls.html new file mode 100644 index 000000000..4e42fdff1 --- /dev/null +++ b/2022/02/05/align-strings-in-en-and-zh-like-bsd-ls.html @@ -0,0 +1,185 @@ + + + + + + + + +像BSD ls 一样中英文混排字符串(Python3) | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

像BSD ls 一样中英文混排字符串(Python3)

+ dev/python + +
+ +
+

这里有一个C语言实现的字符串打印功能。我没细看它支不支持中英文混排。我在此给一个Python3版的支持中英文混排的字符串打印代码。另见我的Gists:cjkjustfmtstrings_like_ls。下面的代码和Gists没有本质差别,只是我在下面新加了一点注释、精简了一点无关代码。

+ +

代码

+ +

中英文混排时的对齐函数cjkljust

+ +
try:
+    # https://f.gallai.re/cjkwrap
+    from cjkwrap import cjklen
+except ImportError:
+    import unicodedata
+
+    def is_wide(char):
+        return unicodedata.east_asian_width(char) in 'FW'
+
+    def cjklen(string):
+        return sum(2 if is_wide(char) else 1 for char in string)
+
+
+def cjkljust(string, width, fillbyte=' '):
+    """
+    >>> cjkljust('hello', 10, '*')
+    'hello*****'
+    >>> cjkljust('你好world', 10, '*')
+    '你好world*'
+    >>> cjkljust('你好world', 1, '*')
+    '你好world'
+    """
+    return string.ljust(len(string) + width - cjklen(string), fillbyte)
+
+ +

打印函数pprint

+ +
import math
+import itertools
+import shutil
+
+
+def calc_layout(n_strings, total_width, column_width, width_between_cols):
+    # expected_ncols * column_width +
+    #     (expected_ncols - 1) * width_between_cols <= total_width
+    #
+    #   解得 expected_ncols <= (total_width + width_between_cols) /
+    #                          (column_width + width_between_cols)
+    # 因此 expected_ncols 最大为不等号右边的向下取整
+    expected_ncols = math.floor((total_width + width_between_cols) /
+                                (column_width + width_between_cols))
+    expected_ncols = max(expected_ncols, 1)
+    actual_nrows = math.ceil(n_strings / expected_ncols)
+    actual_ncols = (n_strings - 1) // actual_nrows + 1
+    return actual_nrows, actual_ncols
+
+
+def pprint(strings, total_width=None, width_between_cols=1, file=None) -> None:
+    """
+    Pretty print list of strings like ``ls``.
+    :param strings: list of strings
+    :param total_width: the disposable total width, default to terminal width
+    :param width_between_cols: width between columns, default to 1
+    :param file: file handle to which to print, default to stdout
+    """
+    total_width = total_width or shutil.get_terminal_size().columns
+    assert total_width >= 1, total_width
+    assert width_between_cols >= 1, width_between_cols
+
+    if not strings:
+        return
+
+    # column_width: BSD ls 的列宽为所有待打印字符串的最长长度
+    column_width = max(map(cjklen, strings))
+    nrows, ncols = calc_layout(
+        len(strings), total_width, column_width, width_between_cols)
+    columns = [[] for _ in range(ncols)]
+    for i, s in enumerate(strings):
+        columns[i // nrows].append(s)
+
+    for row in itertools.zip_longest(*columns):
+        padded_row = (cjkljust(s or '', column_width) for s in row)
+        print((' ' * width_between_cols).join(padded_row), file=file)
+
+ +
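调用示例(非原文内容,文件名为随意编造):

names = ['你好.txt', 'hello.py', '世界很大.md', 'a.c', '测试.log', 'README']
pprint(names, total_width=40)
# 每列按显示宽度(CJK字符计2)对齐,输出效果类似 BSD ls 的多列排版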
+ +
+ +
+
+ + + diff --git a/2022/02/11/python-align-strings-in-en-and-zh.html b/2022/02/11/python-align-strings-in-en-and-zh.html new file mode 100644 index 000000000..bffee881b --- /dev/null +++ b/2022/02/11/python-align-strings-in-en-and-zh.html @@ -0,0 +1,151 @@ + + + + + + + + +如何在Python中对齐中英文混排字符串 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

如何在Python中对齐中英文混排字符串

+ dev/python + +
+ +
+

Python中有str.ljuststr.rjuststr.center用于左对齐、右对齐和居中对齐字符串。例如'hello'.ljust(10, '*')返回'hello*****''hello'.rjust(10, '*')返回'*****hello',等。每个中日韩文(CJK字符)在Python中被视为一个字符,然而它们的显示宽度为2,这个矛盾使ljustrjustcenter不能正确地对齐CJK字符:例如'你好'.ljust(5, '*')返回'你好***'而不是'你好*'。另见此文

+ +

为了阐述如何解决这个问题,假设我们要以$w$显示宽度对齐字符串s,并以ljust(doc)为例(另外两个同理),另假设fillchar='*'。易知我们需要在s的右侧补$w-l$个'*',其中$l$是s的显示宽度。而为了使ljust为我们补$w-l$个'*'ljust的第1个参数应为$n+w-l$,其中$n$为s的字符数。做简单的变换:$n+w-l = w-(l-n)$。假设s中有$a$个显示宽度为1的字符、$b$个显示宽度为2的字符,则$l=a+2b$,$n=a+b$,因此$l-n=b$,即$n+w-l=w-b$。如果s中显示宽度为2的字符限于CJK字符,那么$b$即为CJK字符的个数。Python中求CJK字符在一个字符串string中的个数的函数为:

+ +
import unicodedata
+
+def count_cjk_chars(string):
+    return sum(unicodedata.east_asian_width(c) in 'FW' for c in string)
+
+ +

不难得到适用于可能含有CJK字符的对齐函数:

+ +
def cjkljust(string, width, fillbyte=' '):
+    """
+    左对齐
+    
+    >>> cjkljust('hello', 10, '*')
+    'hello*****'
+    >>> cjkljust('你好world', 10, '*')
+    '你好world*'
+    >>> cjkljust('你好world', 1, '*')
+    '你好world'
+    """
+    return string.ljust(width - count_cjk_chars(string), fillbyte)
+
+
+def cjkrjust(string, width, fillbyte=' '):
+    """
+    右对齐
+    """
+    return string.rjust(width - count_cjk_chars(string), fillbyte)
+
+
+def cjkcenter(string, width, fillbyte=' '):
+    """
+    居中对齐
+    """
+    return string.center(width - count_cjk_chars(string), fillbyte)
+
+ +
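一个简单的调用效果示例(非原文内容):

print(cjkljust('你好world', 12, '*'))    # 你好world***
print(cjkrjust('你好world', 12, '*'))    # ***你好world
print(cjkcenter('你好world', 12, '*'))   # *你好world**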

完整代码参见我的Gist

+ +
+ +

也可从PyPI下载使用。

+ +
+ +
+ +
+
+ + + diff --git a/2022/02/13/list-imported-python-modules-using-ast.html b/2022/02/13/list-imported-python-modules-using-ast.html new file mode 100644 index 000000000..08c0a4ce4 --- /dev/null +++ b/2022/02/13/list-imported-python-modules-using-ast.html @@ -0,0 +1,441 @@ + + + + + + + + +使用抽象语法树ast统计哪些Python包与模块被导入了 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

使用抽象语法树ast统计哪些Python包与模块被导入了

+ dev/python + +
+ +
+

长话短说,我的Gist

+ +

给定一个没有requirements.txt的Python项目,如果想知道需要安装哪些包才能满足这个项目的依赖需求,一个容易想到的方法就是对每一个.py文件,用模式匹配(如正则表达式)找import xxx,并记录xxx为需要的包。然而import语句有很多形式,如:import xxximport xxx as aaaimport xxx as aaa, yyy as bbbfrom xxx.yyy import fff as cccfrom .zzz import ggg。因此,更好的方法是利用抽象语法树ast模块来找出所有import语句。

+ +

Python的import语句对应ast的两种节点:ast.Importast.ImportFrom。要从ast.Import获取导入包的列表,可用:

+ +
[a.name for a in node.names]  # 其中node是ast.Import类型的
+
+ +

要从ast.ImportFrom获取导入的包,可用:

+ +
node.module  # 其中node是ast.ImportFrom类型的
+
+ +

值得注意的是如果当前import语句是from . import xxxnode.module将会是None,此时node.level > 0,意味着相对导入。因此,要想获得所有导入的包(除了相对导入外,因为相对导入的包绝不会是需要安装的依赖),可以这样:

+ +
import ast
+# 假设source包含待解析源码
+root = ast.parse(source)
+result = []
+for node in ast.walk(root):
+    if isinstance(node, ast.Import):
+        for a in node.names:
+            result.append(a.name.split('.', maxsplit=1)[0])
+    elif isinstance(node, ast.ImportFrom):
+        if node.level == 0:
+            result.append(node.module.split('.', maxsplit=1)[0])
+
+ +

然而绝对导入的包也有可能是工作目录中已存在的模块或包啊,此时我们就可以根据导入路径判断它是不是指工作目录下的包:

+ +
def exists_local(path, rootpkg):
+    filepath = os.path.join(rootpkg, path.replace('.', os.path.sep))
+    # see if path is a local package
+    if os.path.isdir(filepath) and os.path.isfile(
+            os.path.join(filepath, '__init__.py')):
+        return True
+    # see if path is a local module
+    if os.path.isfile(filepath + '.py'):
+        return True
+
+    return False
+
+ +

其中path是导入路径,rootpkg是根包所在目录(定义见这里)。

+ +
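把上面两段代码拼在一起的一个最小示意(非原文内容,源码字符串与rootpkg均为假设;沿用上文定义的exists_local,其内部用到os模块):

import ast

source = '\n'.join([
    'import numpy as np',
    'from . import utils',           # 相对导入,node.level > 0,直接跳过
    'from mypkg.sub import helper',  # 若mypkg是本地包则会被过滤掉
])
rootpkg = '.'  # 假设根包就位于当前目录
found = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Import):
        found.extend(a.name for a in node.names)
    elif isinstance(node, ast.ImportFrom) and node.level == 0:
        found.append(node.module)
print(sorted({p.split('.', 1)[0] for p in found if not exists_local(p, rootpkg)}))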

把这个核心功能稍作包装,便可写出下面的完整可执行代码:

+ +
from __future__ import print_function
+
+import argparse
+import os
+import ast
+import sys
+import pkgutil
+import itertools
+import logging
+import json
+
+
+def make_parser():
+    parser = argparse.ArgumentParser(
+        description=('List all root imports. The *root* import of '
+                     '`import pkg1.mod1` is "pkg1".'))
+    parse_opts = parser.add_mutually_exclusive_group()
+    parse_opts.add_argument(
+        '-g',
+        '--greedy',
+        action='store_true',
+        help=('find also import statements within try block, '
+              'if block, while block, function definition, '
+              'etc.'))
+    parse_opts.add_argument(
+        '-a',
+        '--all',
+        action='store_true',
+        help=('first list all minimal-required root '
+              'imports (without `-g`), then list '
+              'additionally-required root imports (with '
+              '`-g`), and explain the two lists'))
+    parser.add_argument(
+        '-i',
+        '--include-installed',
+        action='store_true',
+        help='include installed/built-in modules/packages')
+    parser.add_argument(
+        '-T',
+        '--files-from',
+        metavar='LIST_FILE',
+        help=('if specified, the files to process '
+              'will be read one per line from '
+              'LIST_FILE; if specified as `-`, '
+              'stdin will be expected to contain '
+              'the files to process. Note that '
+              'SOURCE_FILEs, if exist, take '
+              'precedence (see below)'))
+    parser.add_argument(
+        '--ipynb',
+        action='store_true',
+        help=('if specified, the files ending with '
+              '".ipynb" in either SOURCE_FILEs or '
+              'LIST_FILE will be parsed as ipython '
+              'notebook files rather than Python '
+              'files'))
+    parser.add_argument(
+        'rootpkg',
+        metavar='ROOTPKG_DIR',
+        type=dir_type,
+        help=
+        ('the directory of the root package. See '
+         'https://docs.python.org/3.7/distutils/setupscript.html#listing-whole-packages '
+         'about *root package*. Local packages/modules will be '
+         'excluded from the results. For example, if '
+         'there are "mod1.py" and "mod2.py", and in '
+         '"mod2.py" there is `import mod1`, then "mod1" '
+         'won\'t be listed in the result.'))
+    parser.add_argument(
+        'filenames',
+        metavar='SOURCE_FILE',
+        nargs='*',
+        help=('if specified one or more files, '
+              'only these SOURCE_FILEs will get '
+              'processed regardless of `-T` '
+              'option; if no SOURCE_FILE is '
+              'specified, `-T`, if exists, is '
+              'processed. In both cases, the '
+              'final results will be joined'))
+    return parser
+
+
+def dir_type(string):
+    if not os.path.isdir(string):
+        raise argparse.ArgumentTypeError('must be a directory')
+    return string
+
+
+# Reference: https://stackoverflow.com/a/9049549/7881370
+def yield_imports(root, greedy):
+    """
+    Yield all absolute imports.
+    """
+    traverse = ast.walk if greedy else ast.iter_child_nodes
+    for node in traverse(root):
+        if isinstance(node, ast.Import):
+            for a in node.names:
+                yield a.name
+        elif isinstance(node, ast.ImportFrom):
+            # if node.level > 0, the import is relative
+            if node.level == 0:
+                yield node.module
+
+
+def exists_local(path, rootpkg):
+    """
+    Returns ``True`` if the absolute import ``path`` refers to a package or
+    a module residing under the working directory, else ``False``.
+    """
+    filepath = os.path.join(rootpkg, path.replace('.', os.path.sep))
+    # see if path is a local package
+    if os.path.isdir(filepath) and os.path.isfile(
+            os.path.join(filepath, '__init__.py')):
+        return True
+    # see if path is a local module
+    if os.path.isfile(filepath + '.py'):
+        return True
+
+    return False
+
+
+def filter_local(imports_iterable, rootpkg):
+    """
+    Remove modules and packages in the working directory, and yield root
+    imports.
+    """
+    for path in imports_iterable:
+        if not exists_local(path, rootpkg):
+            yield path.split('.', 1)[0]
+
+
+def filter_installed(imports_iterable):
+    """
+    Remove modules and packages already installed, which include built-in
+    modules and packages and those already installed (e.g. via ``pip``).
+    """
+    installed = set(
+        itertools.chain(sys.builtin_module_names,
+                        (x[1] for x in pkgutil.iter_modules())))
+    for name in imports_iterable:
+        if name not in installed:
+            yield name
+
+
+def collect_sources(filenames, files_from):
+    if filenames:
+        for filename in filenames:
+            yield filename
+    elif files_from == '-':
+        try:
+            for line in sys.stdin:
+                yield line.rstrip('\n')
+        except KeyboardInterrupt:
+            pass
+    elif files_from:
+        try:
+            with open(files_from) as infile:
+                for line in infile:
+                    yield line.rstrip('\n')
+        except OSError:
+            logging.exception('failed to read from "{}"'.format(files_from))
+
+
+def parse_python(filename):
+    with open(filename) as infile:
+        root = ast.parse(infile.read(), filename)
+    return root
+
+
+def parse_ipynb(filename):
+    source = []
+    with open(filename) as infile:
+        obj = json.load(infile)
+    for c in obj['cells']:
+        if c['cell_type'] == 'code':
+            source.extend(map(str.rstrip, c['source']))
+    source = (l for l in source if not l.lstrip().startswith('%'))
+    source = '\n'.join(source)
+    root = ast.parse(source, filename)
+    return root
+
+
+def produce_results(filenames, files_from, greedy, rootpkg, include_installed,
+                    ipynb):
+    all_imports = []
+    for filename in collect_sources(filenames, files_from):
+        parse_source = (parse_ipynb if ipynb and filename.endswith('.ipynb')
+                        else parse_python)
+        try:
+            root = parse_source(filename)
+        except OSError:
+            logging.exception('skipped')
+        except SyntaxError:
+            logging.exception('failed to parse "{}"; skipped'.format(filename))
+        else:
+            all_imports.append(yield_imports(root, greedy))
+    all_imports = itertools.chain.from_iterable(all_imports)
+    all_imports = filter_local(all_imports, rootpkg)
+    if not include_installed:
+        all_imports = filter_installed(all_imports)
+    all_imports = set(all_imports)
+    return all_imports
+
+
+def main():
+    logging.basicConfig(format='%(levelname)s: %(message)s')
+    args = make_parser().parse_args()
+
+    if not args.all:
+        all_imports = produce_results(args.filenames, args.files_from,
+                                      args.greedy, args.rootpkg,
+                                      args.include_installed, args.ipynb)
+        if all_imports:
+            print('\n'.join(sorted(all_imports)))
+    else:
+        min_imports = produce_results(args.filenames, args.files_from,
+                                      False, args.rootpkg,
+                                      args.include_installed, args.ipynb)
+        max_imports = produce_results(args.filenames, args.files_from,
+                                      True, args.rootpkg,
+                                      args.include_installed, args.ipynb)
+        extra_imports = max_imports - min_imports
+        printed_min_imports = False
+        if min_imports:
+            print('# minimal imports:')
+            print('\n'.join(sorted(min_imports)))
+            printed_min_imports = True
+        if extra_imports:
+            # pretty formatting purpose
+            if printed_min_imports:
+                print()
+            print('# additional possible imports:')
+            print('\n'.join(sorted(extra_imports)))
+
+    logging.shutdown()
+
+
+if __name__ == '__main__':
+    main()
+
+ +

需要注意的是,程序的输出并不一定是PyPI上包的名字(例如,import bs4然而pip install beautifulsoup4)。

+ +
+ +

类似项目:pipreqs。核心代码是几乎一样的,但包装得不同。

+ +
+ +
+ +
+
+ + + diff --git a/2022/02/17/python-tox-usage-note.html b/2022/02/17/python-tox-usage-note.html new file mode 100644 index 000000000..4053e01ab --- /dev/null +++ b/2022/02/17/python-tox-usage-note.html @@ -0,0 +1,280 @@ + + + + + + + + +Python Tox 使用笔记 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Python Tox 使用笔记

+ dev/python + +
+ +
+

Tox是一个项目自动化工具,在此记录下没在文档和网上tutorial找到的使用细节。试验中尽可能使用最小tox.ini。本文使用tox --showconfig -- <args...>的形式观察配置结果。如果文中没有提<args...>是什么(例如直接说“配置结果为”,而不是“运行…后配置结果为“),那么运行的是tox --showconfig

+ +

默认basepython +

+ +

情况一

+

tox.ini为空。此时只有一个匿名虚拟环境。

+ +

配置结果为:

+ +
...
+
+[testenv:python]
+...
+basepython = /Library/Frameworks/Python.framework/Versions/3.9/bin/python3
+...
+
+ +

这里的/Library/Frameworks/Python.framework/Versions/3.9/bin/python3是本机上按PATH顺序第一个遇到的Python解释器(注意这里既不是第一个python也不是第一个python3)。另外可以观察到,匿名虚拟环境被命名为python

+ +

情况二

+ +

tox.ini

+ +
[testenv:x]
+
+ +

此时只有一个名为x的虚拟环境,x不与文档中的任何一种特殊命名匹配。配置结果为

+ +
...
+
+[testenv:x]
+...
+basepython = /Library/Frameworks/Python.framework/Versions/3.9/bin/python3
+...
+
+ +

可见与情况一相同。

+ +

情况三

+ +

tox.ini

+ +
[testenv:py28]
+
+ +

此时只有一个名为py28的虚拟环境。配置结果为

+ +
...
+
+[testenv:py28]
+...
+basepython = python2.8
+...
+
+ +

我们知道是没有python2.8的,可见tox这里只是做了一个简单的从pyMNpythonM.N的映射。此时如果运行tox的话是要报错的(即使tox.ini里加上skipsdist = true也会报错):ERROR: InterpreterNotFound: python2.8

+ +

情况四

+ +

tox.ini

+ +
[testenv:py28]
+basepython = python2.7
+
+ +

与情况三相同,但显式指定了basepython。配置结果为

+ +
...
+
+[testenv:py28]
+...
+basepython = python2.7
+...
+
+ +

可见显式指定的basepython生效了。

+ +

+{posargs}展开

+ +

情况一

+ +

tox.ini

+ +
[testenv]
+commands = {posargs}
+
+ +

运行tox --showconfig后(无参数),配置结果为

+ +
...
+commands = [[]]
+...
+
+ +

可见{posargs}在无参数时展开为空字符串。

+ +

运行tox --showconfig -- hello world后(带参数),配置结果为

+ +
...
+commands = [['hello', 'world']]
+...
+
+ +

{toxinidir}下新建两个文件hello1hello2,然后运行tox --showconfig -- hello*后(注意这里的运行环境不是Windows),配置结果为

+ +
...
+commands = [['hello1', 'hello2']]
+...
+
+ +

这是符合期望的,因为Shell在传参前先做了Globbing,然而如果运行tox --showconfig -- "hello*"后,配置结果为

+ +
...
+commands = [['hello*']]
+...
+
+ +

可见{posargs}不会做Globbing。

+ +

举一个运行tox的例子。令tox.ini

+ +
[tox]
+skipsdist = true
+
+[testenv]
+allowlist_externals = ls
+commands = ls {posargs}
+
+ +

如果运行tox -- "hello*",我们会得到结果

+ +
python run-test-pre: PYTHONHASHSEED='2558120981'
+python run-test: commands[0] | ls 'hello*'
+ls: hello*: No such file or directory
+ERROR: InvocationError for command /bin/ls 'hello*' (exited with code 1)
+_________________________ summary __________________________
+ERROR:   python: commands failed
+
+ +

情况二

+ +

tox.ini

+ +
[testenv]
+commands = "{posargs}"
+
+ +

注意{posargs}两边的引号。运行tox --showconfig后(无参数),配置结果为

+ +
...
+commands = [['']]
+...
+
+ +

可见虽然{posargs}在无参数时展开为空字符串,但现在有引号,导致仍产生了一个参数,只不过该参数值为空。

+ +

运行tox --showconfig -- hello后(一个参数),配置结果为

+ +
...
+commands = [['hello']]
+...
+
+ +

没什么值得惊讶的。

+ +

运行tox --showconfig -- hello world后(多参数),配置结果为

+ +
...
+commands = [['hello world']]
+...
+
+ +

可见虽然{posargs}展开成了两个参数,但是引号又重新把它们括成了一个参数。

+ +
+ +
+ +
+
+ + + diff --git a/2022/05/18/python-cannot-import-name-sysconfig-from-distutils.html b/2022/05/18/python-cannot-import-name-sysconfig-from-distutils.html new file mode 100644 index 000000000..b1483749b --- /dev/null +++ b/2022/05/18/python-cannot-import-name-sysconfig-from-distutils.html @@ -0,0 +1,158 @@ + + + + + + + + +python cannot import name ‘sysconfig’ from ‘distutils’ | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

python cannot import name 'sysconfig' from 'distutils'

+ dev/python | os/ubuntu + +
+ +
+

搜索这个问题时看到了这篇博客,感觉略麻烦。我就想有没有更简单的方式。后来摸索出来了,记在这里。

+ +

环境

+ +
  • Python 3.9.12
  • Ubuntu 18.04 LTS
+ +

安装 Python3.9

+ +

详见这篇回答。简要转述如下:

+ +
sudo apt update
+sudo apt install software-properties-common
+sudo add-apt-repository ppa:deadsnakes/ppa
+sudo apt install python3.9
+
+ +

问题

+ +
python3.9 -m pip -V
+
+ +

报错

+ +
Traceback (most recent call last):
+  File "/usr/lib/python3.9/runpy.py", line 188, in _run_module_as_main
+    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
+  File "/usr/lib/python3.9/runpy.py", line 147, in _get_module_details
+    return _get_module_details(pkg_main_name, error)
+  File "/usr/lib/python3.9/runpy.py", line 111, in _get_module_details
+    __import__(pkg_name)
+  File "/usr/lib/python3/dist-packages/pip/__init__.py", line 29, in <module>
+    from pip.utils import get_installed_distributions, get_prog
+  File "/usr/lib/python3/dist-packages/pip/utils/__init__.py", line 23, in <module>
+    from pip.locations import (
+  File "/usr/lib/python3/dist-packages/pip/locations.py", line 9, in <module>
+    from distutils import sysconfig
+ImportError: cannot import name 'sysconfig' from 'distutils' (/usr/lib/python3.9/distutils/__init__.py)
+
+ +

解决方法

+ +

注意到上文中ppa:deadsnakes/ppa里包含python3.9-venv,而venv显然依赖pip。安装python3.9-venv便能自动处理好依赖。

+ +
sudo apt install python3.9-venv
+
+ +

再看python3.9 -m pip -V即可输出正确的

+ +
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.9)
+
+ +
+ +
+ +
+
+ + + diff --git a/2022/05/24/sync-music-from-mac-to-ipad-without-itunes.html b/2022/05/24/sync-music-from-mac-to-ipad-without-itunes.html new file mode 100644 index 000000000..45f64bcad --- /dev/null +++ b/2022/05/24/sync-music-from-mac-to-ipad-without-itunes.html @@ -0,0 +1,119 @@ + + + + + + + + +如何不通过iTunes将Mac上的音乐同步到iPad | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

如何不通过iTunes将Mac上的音乐同步到iPad

+ os/ios | os/macOS + +
+ +
+

本文记录了如何不通过iTunes(或Finder,如果是新系统的话)将苹果电脑上的文件(音乐、视频等)同步到iPad。本文以同步音乐为例。

+ +
  1. 在Terminal中cd到音乐文件夹(如~/Music),使用fd命令列出所有音乐并传给zip打包,假设打包为share.zip:fd -emp4 -etma -em4a -emp3 -d1 . | zip -0qT share.zip -@
  2. 在Terminal中输入ifconfig | grep 192 | awk '{ print $2 }'确认自己在局域网中的IP地址。
  3. 使用python3 -m http.server 9000建立一个http服务器,这里建立在9000端口上。
  4. 将iPad连接至与Mac同一局域网。打开iPad的Safari浏览器,在地址栏输入http://192.168.0.xxx:9000,其中192.168.0.xxx表示在第2步中确认的IP地址。
  5. 在Directory listing for /下面找到share.zip,单击下载。下载完毕后应该在Files应用中的On My iPad/Downloads下面找到。
  6. 单击share.zip,此时会自动解压为share文件夹。单击share文件夹进入,右上角点击Select,然后左上角点击Select All全选,然后下面点击Move,选择位置,例如移动到On My iPad/Music,音乐就都移动过去了。
  7. 删除share文件夹和share.zip。
+ +

虽然步骤有点多,熟练了也不是很麻烦。第3步中建立的服务器可以常开着,以便随时同步。

+ +
+ +
+ +
+
+ + + diff --git a/2022/05/26/develop-python-cpp-extension-using-cython.html b/2022/05/26/develop-python-cpp-extension-using-cython.html new file mode 100644 index 000000000..82ba163bd --- /dev/null +++ b/2022/05/26/develop-python-cpp-extension-using-cython.html @@ -0,0 +1,216 @@ + + + + + + + + +使用Cython为Python开发C++扩展 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

使用Cython为Python开发C++扩展

+ dev/python | dev/c++ + +
+ +
+

Cython的出现免去了为Python开发C/C++扩展的很多麻烦。本文以一个简单的例子来说明如何为Python开发C++扩展。

+ +

例子程序:给定一个列表,把列表的每个元素平方,并返回新列表。用Python实现会是这样:

+ +
def square(l):
+    return [x * x for x in l]
+
+ +

现在我们用C++实现这个函数。根据Using C++ in Cython,Python列表对应于C++的std::vector,因此我们可以用std::vector

+ +

_square.h:

+ +
#ifndef _SQUARE_H_
+#define _SQUARE_H_
+
+#include <vector>
+
+std::vector<double> _square(std::vector<double> &);
+
+#endif
+
+ +

_square.cpp:

+ +
#include "_square.h"
+
+std::vector<double> _square(std::vector<double> &l)
+{
+    std::vector<double> res(l.size());
+    for (auto i = l.begin(); i != l.end(); ++i) {
+        res.push_back(*i * *i);
+    }
+    return res;
+}
+
+ +

注意到上文代码文件名和函数名都以下划线开头,这里没有什么特殊规则,只是不让它们与Cython文件名和函数重名。接下来我们写封装C++的Cython代码。Cython代码后缀是.pyx

+ +

square.pyx:

+ +
from libcpp.vector cimport vector
+
+cdef extern from "_square.h":
+    vector[double] _square(vector[double] l)
+
+def square(l):
+    cdef vector[double] l_vec = l
+    return _square(l_vec)
+
+ +

最后我们编写用于编译的setup.pysetup.py位于项目根目录。这里假设上述_square.h_square.cppsquare.pyx都位于Python package package1.package2下。

+ +

setup.py:

+ +
from setuptools import Extension, setup
+from Cython.Build import cythonize
+
+extensions = [
+    Extension(
+        # 这里写完整包名
+        name='package1.package2.square',
+        # 这里包含Cython文件和C++源文件
+        sources=[
+            'package1/package2/square.pyx',
+            'package1/package2/_square.cpp',
+        ],
+        # 这里写编译flags;
+        # - 写`-std=c++11`因为我们用了`auto`关键字
+        # - 写`-DNDEBUG`是为了忽略所有`assert`(虽然这里并没有`assert`,只是为多举一个例子)
+        extra_compile_args=['-std=c++11', '-DNDEBUG'],
+        language='c++',
+    ),
+]
+
+setup(
+    # name参数可写可不写,这里没写
+    #name='...',
+    ext_modules=cythonize(extensions),
+    zip_safe=False,
+)
+
+ +

注意最后有一个zip_safe=False,根据Building a Cython module using setuptools,这是为避免一个导入错误:

+ +
+

One caveat: the default action when running python setup.py install is to create a zipped egg file which will not work with cimport for pxd files when you try to use them from a dependent package. To prevent this, include zip_safe=False in the arguments to setup().

+
+ +

最后我们来编译这个扩展模块。在命令行,项目根目录(即setup.py所在目录),执行:

+ +
python3 setup.py build_ext --inplace
+
+ +

为执行这条命令,Windows需要Visual Studio,Linux需要GNU工具链(g++),Mac需要XCode(clang++)。

+ +

为使用这个扩展模块,我们可以这样:

+ +
from package1.package2.square import square
+
+l1 = [1., 2., 3.]
+print(square(l1))
+
+ +

输出

+ +
[1.0, 4.0, 9.0]
+
+ +

致谢

+ +

本文受这个回答启发而创作。

+ +
+ +
+ +
+
+ + + diff --git a/2022/06/02/pass-dynamic-array-between-cpp-and-python.html b/2022/06/02/pass-dynamic-array-between-cpp-and-python.html new file mode 100644 index 000000000..9eaef7941 --- /dev/null +++ b/2022/06/02/pass-dynamic-array-between-cpp-and-python.html @@ -0,0 +1,202 @@ + + + + + + + + +使用Cython在Python和C++间互传大小事先未知的numpy数组 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

使用Cython在Python和C++间互传大小事先未知的numpy数组

+ dev/python | dev/c++ | dev/cython + +
+ +
+

从C++传到Python

+ +

常见的教程如这个问题及回答是将大小已知的numpy数组传入传出C++,如确定会从C++传出大小为$M \times N$的矩阵。方法简单讲就是在Python端分配一个大小为$M \times N$的矩阵,把指向这个矩阵的指针传给C++,C++负责修改矩阵的内容,结束后矩阵就自动“传回”了。

+ +

然而有时我们事先不知道从C++传回的矩阵是多大,这时我们可以用这个回答所提及的技术,即从C++传回std::vector,然后在Python端把它无拷贝地转成numpy数组。

+ +

例子:从C++传回$M \times 2$大小的矩阵,$M$在Python端未知。例子主要来源于网络,但我稍微换了一下应用,并修改了里面的谬误。

+ +

doit.h:

+ +
#ifndef _DOIT_H_
+#define _DOIT_H_
+#include <vector>
+std::vector<long> arange2d();
+#endif
+
+ +

doit.cpp:

+ +
#include "doit.h"
+std::vector<long> arange2d() {
+	std::vector<long> arr(10);
+	long x = 0;
+	for (auto i = arr.begin(); i != arr.end(); ++i) {
+		*i = x++;
+	}
+	return arr;
+}
+
+ +

fast.pyx:

+ +
from libcpp.vector cimport vector
+
+cdef extern from 'doit.h':
+    vector[long] arange2d()
+
+cdef class ArrayWrapper:
+    cdef vector[long] v
+    cdef Py_ssize_t shape[2];
+    cdef Py_ssize_t strides[2];
+
+    def set_data(self, vector[long]& data):
+        self.v.swap(data)  # 注(1)
+
+    def __getbuffer__(self, Py_buffer *buf, int flags):
+        self.shape[0] = self.v.size() // 2
+        self.shape[1] = 2
+        self.strides[0] = self.shape[1] * sizeof(long)
+        self.strides[1] = sizeof(long)
+
+        # 注(2)
+        buf.buf = <char *> self.v.data()
+        buf.format = 'l'  # 注(3)
+        buf.internal = NULL
+        buf.itemsize = <Py_ssize_t> sizeof(long)
+        buf.len = self.v.size() * sizeof(long)
+        buf.ndim = 2
+        buf.obj = self
+        buf.readonly = 0
+        buf.shape = self.shape
+        buf.strides = self.strides
+        buf.suboffsets = NULL
+
+def pyarange2d():
+    cdef vector[long] arr = arange2d()
+    cdef ArrayWrapper wrapper = ArrayWrapper()
+    wrapper.set_data(arr)
+    return np.asarray(wrapper)
+
+ +
  • 注(1):std::vector<T>::swap完成了无拷贝传值,另一种方法是用std::move,不过那需要cdef extern from '<utility>' namespace 'std' nogil: vector[long] move(vector[long]),应该是这样,不过我没试过
  • 注(2):numpy的Buffer Protocol见此处,里面讲了buf需要设置哪些属性
  • 注(3):buf.format如何设置见此处
+ +

至于从C++传回Python的多维数组有两个及以上的维度不知道的话(已知维度总数ndim),网络上没找到答案,但我是这么做的:

+ +
  1. 传给C++一个指向Py_ssize_t类型、长度为ndim的数组(即待传回数组的shape)的指针
  2. C++传回一个std::vector并修改shape元素为合适的值
  3. 按照shape和std::vector的元素类型填写buf的属性,完成std::vector到numpy数组的转换
+ +

从Python传到C++

+ +

这应该已经耳熟能详了,我就不在此赘述了。不过有一点需要注意。传double数组时没问题,各平台double都对应numpy.float64。传int数组时需注意,Windows下对应numpy.int32、Linux/Mac下对应numpy.int64。所以直接用传double数组的方法传int数组会报这个错:

+ +
Cannot assign type 'int_t *' to 'int *'
+
+ +

这个问题(就是我提的)。目前我还没有优雅的解决方法。我笨拙的方法(受ead的启发)(请对照着“这个问题”看)如下:把所有的int全替换为int64_t(或int32_t,一致就行),例如int * => int64_t *np.int_t => np.int64_t,然后在dotit.h包含头文件的地方加上#include <cstdint>,在q.pyx头部加上from libc.stdint cimport int64_t。应该就可以编译了。

+ +

补充一点我近期观察到的:以上workaround在Windows下(Visual Studio 2022)貌似不行,会报不能将numpyint32_t转为int32_t,类似这样的错。在Darwin和Linux下都是能通过编译的。

+ +
+ +
+ +
+
+ + + diff --git a/2022/07/24/read-hdf5-from-cpp.html b/2022/07/24/read-hdf5-from-cpp.html new file mode 100644 index 000000000..eee1fa5cc --- /dev/null +++ b/2022/07/24/read-hdf5-from-cpp.html @@ -0,0 +1,223 @@ + + + + + + + + +Read HDF5 file from C++ | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Read HDF5 file from C++

+ dev/c++ + +
+ +
+

Introduction

+ +

HDF5 is a cross-platform data format used to save (high dimensional) arrays. +There are various language bindings out there for manipulating HDF5 files, including C++. +I record here, after stumbling around many hours, how to read data using C++.

+ +

Read scalars

+ +
// note the header is not "hdf5.h"
+#include "H5Cpp.h"
+
+int main()
+{
+	H5::H5File file("/path/to/data.h5", H5F_ACC_RDONLY);
+	H5::DataSet dataset = file.openDataSet("dataset/path");
+	H5::DataSpace filespace = dataset.getSpace();
+	// it might be more than sufficient to use `1` here
+	hsize_t shape[1];
+	// `_dims` must be 0;
+	// `shape` shouldn't be touched
+	int _dims = filespace.getSimpleExtentDims(shape);
+	H5::DataSpace mspace(0, shape);  // where 0 comes from `_dims`
+	double buf[1];
+	dataset.read(buf, H5::PredType::NATIVE_DOUBLE, mspace, filespace);
+
+	// the scalar is in `buf[0]`
+
+	return 0;
+}
+
+ +

Read vector to array

+ +
#include "H5Cpp.h"
+
+int main()
+{
+	H5::H5File file("/path/to/data.h5", H5F_ACC_RDONLY);
+	H5::DataSet dataset = file.openDataSet("dataset/path");
+	H5::DataSpace filespace = dataset.getSpace();
+	// `1` corresponds to 1D array (vectors);
+	// if reading 2D array (matrices), replace `1` with `2`, so forth
+	hsize_t shape[1];
+	// `_dims` is the actual N in N-D array; should be the same as
+	// previously set; `shape` has now been set
+	int _dims = filespace.getSimpleExtentDims(shape);
+	H5::DataSpace mspace(1, shape); // replace `1` with `2` if like above
+	double *buf = new double[shape[0]];
+	// if reading 2D array the previous line should be replaced by:
+	//double *buf = new double[shape[0] * shape[1]];
+	// so forth
+	dataset.read(buf, H5::PredType::NATIVE_DOUBLE, mspace, filespace);
+
+	// the vector (or flatten matrix if reading matrix) is in `buf`
+
+	delete[] buf;
+	return 0;
+}
+
+ +

Note that arrays are stored contiguously. +Reading the data into something like double buf[M][N] is not allowed. +See this answer.

+ +

Read vector to std::vector +

+ +

Basically the same …

+ +
#include "H5Cpp.h"
+#include <vector>
+
+int main()
+{
+	H5::H5File file("/path/to/data.h5", H5F_ACC_RDONLY);
+	H5::DataSet dataset = file.openDataSet("dataset/path");
+	H5::DataSpace filespace = dataset.getSpace();
+	// `1` corresponds to 1D array (vectors);
+	// if reading 2D array (matrices), replace `1` with `2`, so forth
+	hsize_t shape[1];
+	// `_dims` is the actual N in N-D array; should be the same as
+	// previously set; `shape` has now been set
+	int _dims = filespace.getSimpleExtentDims(shape);
+	H5::DataSpace mspace(1, shape); // replace `1` with `2` if like above
+	// must preserve enough space here
+	std::vector<double> buf(shape[0]);
+	// likewise, previous line should be written as
+	//std::vector<double> buf(shape[0] * shape[1]);
+	// if reading 2D array, so forth
+	// note the `.data()` here
+	dataset.read(buf.data(), H5::PredType::NATIVE_DOUBLE, mspace, filespace);
+
+	// the vector is in `buf`
+
+	return 0;
+}
+
+ +

Compile above code

+ +

I’m not quite sure how to compile on Windows, but for Linux and macOS, Makefile should be written like this.

+ +
LDFLAGS = \
+	-L/path/to/hdf5/incstall/directory/lib
+# note the library names here; only `-lhdf5` is not enough
+LDLIBS = \
+	-lhdf5 \
+	-lhdf5_cpp \
+	-lhdf5_hl_cpp
+CPPFLAGS = \
+	-I/path/to/hdf5/install/directory/include
+CXX = clang++
+
+# I haven't tried what if `-std=c++11` is not added, but I guess it
+# should be okay
+a.out : source.cpp
+	$(CXX) $(CPPFLAGS) $(LDFLAGS) -std=c++11 -o $@ $^ $(LDLIBS)
+
+ +
+ +
+ +
+
+ + + diff --git a/2022/07/24/set-up-github-pages-macos.html b/2022/07/24/set-up-github-pages-macos.html new file mode 100644 index 000000000..492e113ed --- /dev/null +++ b/2022/07/24/set-up-github-pages-macos.html @@ -0,0 +1,164 @@ + + + + + + + + +Set up GitHub Pages on macOS | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Set up GitHub Pages on macOS

+ misc + +
+ +
+

The steps are organized in a shell script like form:

+ +
brew install chruby
+# add '. /usr/local/opt/chruby/share/chruby/chruby.sh' to .bashrc or .zshrc
+
+# install ruby alternative to system's
+brew install automake bison openssl readline libyaml gdbm libffi
+curl --remote-name https://cache.ruby-lang.org/pub/ruby/3.1/ruby-3.1.0.tar.xz
+tar xf ruby-3.1.0.tar.xz
+cd ruby-3.1.0
+./configure --prefix="$HOME/.rubies/ruby-3.1.0" --with-opt-dir="$(brew --prefix openssl):$(brew --prefix readline):$(brew --prefix libyaml):$(brew --prefix gdbm):$(brew --prefix libffi)"
+make -j4
+make install
+
+# restart shell
+
+# set default ruby
+chruby ruby-3.1.0
+# now ensure 'command -v ruby' or 'command -v gem' returns the one under ~/.rubies
+
+# install Bundler and Jekyll
+gem install bundler jekyll
+
+mkdir /path/to/website/local/dir
+cd $_
+git init username.github.io
+cd $_
+jekyll new --skip-bundle .
+# follow instruction from https://docs.github.com/en/pages/setting-up-a-github-pages-site-with-jekyll/creating-a-github-pages-site-with-jekyll
+bundle install
+
+# according to https://stackoverflow.com/a/70916831/7881370
+bundle add webrick
+
+# set up mathjax etc.
+# follow https://github.com/jeffreytse/jekyll-spaceship#installation
+# but in Gemfile, add `gem "jekyll-spaceship", "~> 0.9.9"` instead of
+# using the latest version by not specifying version (see Issue #81 of
+# 'jeffreytse/jekyll-spaceship' at GitHub)
+
+bundle install
+
+# Mathjax can now be rendered locally, but not on GitHub. That's because
+# jekyll-spaceship is not in its whitelist. See
+# https://github.com/marketplace/actions/jekyll-deploy-action for detail.
+# Follow its instruction (including adding the github workflow file,
+# creating 'gh-pages' orphan branch). Then ensure the GitHub Personal
+# Access Token (PAT) has sufficient permission (for workflow specifically).
+# Push master to GitHub.
+
+# Mathjax should already be ready.
+
+#####################################################
+# The only command needed to run over and over again:
+#####################################################
+
+# build and serve locally
+bundle exec jekyll serve
+
+ +
+ +
+ +
+
+ + + diff --git a/2022/08/09/notes-build-cython-using-setup-dot-py.html b/2022/08/09/notes-build-cython-using-setup-dot-py.html new file mode 100644 index 000000000..218e85777 --- /dev/null +++ b/2022/08/09/notes-build-cython-using-setup-dot-py.html @@ -0,0 +1,199 @@ + + + + + + + + +Notes on building Cython using setup.py | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Notes on building Cython using setup.py

+ dev/python | dev/cython + +
+ +
+

Basic structure

+ +
from setuptools import Extension, setup
+from Cython.Build import cythonize
+
+extensions = [
+    Extension(
+        name=...,
+        sources=[
+            ...
+        ],
+        include_dirs=[
+            ...
+        ],
+        library_dirs=[
+            ...
+        ],
+        libraries=[
+            ...
+        ],
+        runtime_library_dirs=[
+            ...
+        ],
+        define_macros=[
+            (..., ...),
+            ...
+        ],
+        extra_compile_args=[
+            ...
+        ],
+        extra_link_args=[
+            ...
+        ],
+        language='...',
+    ),
+    ...
+]
+
+setup(
+    ext_modules=cythonize(extensions, language_level='3'),
+    zip_safe=False,
+)
+
+ +

Notes:

+ +
    +
  • +name=...: to be explained in detail below
  • +
  • +sources=[...]: from my experiments, it seems it must contain one and only one .pyx Cython source
  • +
  • +language_level='3' is used when developing in Python 3.
  • +
  • +zip_safe=False is used as per cython doc +
  • +
  • +define_macros=[("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION")] can be used when developing with a newer version of numpy, to avoid compile-time warnings, which are harmless but noisy (see the snippet after this list)
  • +
+ +
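For instance, a minimal sketch (the module name foo.bar and the file foo/bar.pyx are hypothetical, not from any particular project):

from setuptools import Extension

Extension(
    name='foo.bar',
    sources=['foo/bar.pyx'],
    define_macros=[('NPY_NO_DEPRECATED_API', 'NPY_1_7_API_VERSION')],
)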

Name of Extension

+ +
+

the full name of the extension, including any packages – ie. not a filename or pathname, but Python dotted name

+
+ +

For example, the name foo.bar will generate a ./foo/bar.*.so file, where * can be obtained by invoking python3-config --extension-suffix on the command line, e.g. ./foo/bar.cpython-39-darwin.so. +The file path is relative to the build root, i.e. the directory where setup.py sits.

+ +
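The same suffix can also be queried from within Python via the standard sysconfig module (a small sketch; the printed value below is just an example):

import sysconfig

# Equivalent to `python3-config --extension-suffix` on the command line.
print(sysconfig.get_config_var('EXT_SUFFIX'))  # e.g. '.cpython-39-darwin.so'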

Precedence of import

+ +

Suppose the extension we are talking about is named foo.bar. +Let’s assume there’s already a directory named ./foo/bar/. +Open a Python prompt under the build root and type from foo.bar import xxx, where xxx is anything defined in foo.bar. +This should work fine. +Now add an empty foo/bar/__init__.py. +Repeat the above process; it should raise an AttributeError on xxx. +This means that the Python package foo.bar takes precedence over the extension module foo.bar.

+ +

Another circumstance. +Again the extension is named foo.bar. +However, there’s now a directory ./foo/ with bar.py and __init__.py inside. +From my experiment, this time the extension foo.bar takes precedence over the Python package foo.bar.

+ +

It appears quite involved to me. +So the best practice might be simply to avoid name collisions between extensions and Python modules/packages.

+ + + + + +
+ +
+ +
+
+ + + diff --git a/2022/08/31/vae-training-trick.html b/2022/08/31/vae-training-trick.html new file mode 100644 index 000000000..9562bae30 --- /dev/null +++ b/2022/08/31/vae-training-trick.html @@ -0,0 +1,122 @@ + + + + + + + + +Variational Autoencoder training trick | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Variational Autoencoder training trick

+ machine learning + +
+ +
+

A decent tutorial on the Variational Autoencoder (VAE) can be found at arXiv. +While I was playing with VAEs, trying Gaussian output distributions of gradually higher dimensionality, I found a trick that ensures numerical stability at the beginning of training. +As we know, the “encoder” of a VAE outputs $\mu_X$ and $\log\sigma_X^2$ given input $x$, and a $z$ is then sampled from the Gaussian determined by $\mu_X$ and $\sigma_X^2$. +To compute $\sigma_X^2$, we take $\sigma_X^2=e^{\log\sigma_X^2}$.

+ +

A problem arises: $\log\sigma_X^2$ can grow large enough that $\sigma_X^2$ becomes floating-point infinity, especially when the mean and log variance are predicted by a dense linear layer and the input dimension is high. +This is because, although the log variance is typically small at the end of training, its value at the beginning of training is determined by the random initialization of the dense linear layer. +Suppose that the linear layer is initialized with standard Gaussian weights. +With $K$ input neurons, each output element of the linear layer is distributed as the sum of $K$ standard Gaussians, whose variance is proportional to $K$. +It follows that the maximum over all output elements also grows with $K$. +Therefore, there can be an element in $\sigma_X^2$ that is about $e^K$ times the expected range. +Naturally, when $K$ is large, it overflows to floating-point infinity.

+ +

To solve the problem, we may go one step further. +Rather than predicting $\log\sigma_X^2$, we predict $K\log\sigma_X^2$, and the output variance in turn becomes $e^{(K\log\sigma_X^2)/K}$, which no longer overflows. +Since $K$ is a constant throughout training, this rescaling does not affect training overall.

+ +
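A minimal PyTorch sketch of the trick (the layer sizes and names below are hypothetical, only for illustration):

import torch
from torch import nn

class GaussianHead(nn.Module):
    def __init__(self, in_features, latent_dim):
        super().__init__()
        self.in_features = in_features  # the K in the text
        self.mu = nn.Linear(in_features, latent_dim)
        # The raw output of this layer is interpreted as K * log(sigma^2).
        self.scaled_logvar = nn.Linear(in_features, latent_dim)

    def forward(self, h):
        mu = self.mu(h)
        # Divide by K so the effective log-variance stays in a sane range
        # even right after random initialization.
        logvar = self.scaled_logvar(h) / self.in_features
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

head = GaussianHead(in_features=4096, latent_dim=64)
z, mu, logvar = head(torch.randn(8, 4096))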
+ +
+ +
+
+ + + diff --git a/2023/03/05/learn-applescript-for-beginners.html b/2023/03/05/learn-applescript-for-beginners.html new file mode 100644 index 000000000..e4ae4e42d --- /dev/null +++ b/2023/03/05/learn-applescript-for-beginners.html @@ -0,0 +1,550 @@ + + + + + + + + +Learn Applescript for Beginners | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Learn Applescript for Beginners

+ dev/applescript + +
+ +
+ + +
+

youtube link

+
+ +
+

Good referential text:

+ + +
+ +

Script editor

+ +

log "message" can be used to log messages that will be shown in “Messages” accessory pane at the bottom.

+ +

File formats

+ +
    +
  • Text: save as plain text
  • +
  • Script: save as compiled library
  • +
  • Application: used to create an applet; + if on open(droppedFiles) ... end open is written in the code, it becomes a droplet, which allows drag-and-drop actions on the created applet.
  • +
+ +

Dictionaries

+ +

Words inside square brackets are optional arguments. +Words to the right of the right arrow are what will be returned. +The returned value can be captured with set variable to. +For example,

+ +
set myDoc to make new document
+
+ +

Variables

+ +
on run
+    set theName to "Jill"
+    log theName
+end run
+
+ +

Scope of variable

+ +

By default a variable is scoped to the current handler, e.g. the on run handler or an on sub_handler() handler (subroutine). +However, if you declare a variable as a property at the top of the script:

+ +
property theName : null
+
+ +

theName will become a global variable.

+ +

Data types

+ +
    +
  • number (integer or real) set x to 23.25 +
  • +
  • string: set x to "23.25"; but set x to "23.25" as number casts x back to a number
  • +
  • date: set x to date "3/5/2023" means set it to 2023-03-05
  • +
  • list: set x to {"item1", "item2", "item3"}. The list may contain heterogeneous types
  • +
  • record (dictionary): set x to {keyA: "valueA", keyB: valueB} +
  • +
+ +

Examples:

+ +
set var1 to 22.5
+set var2 to 2
+set theResult to var1 + var2  # may be +, -, *, /, mod
+
+ +
set var1 to "hello"
+set var2 to "world"
+set theResult to var1 & space & var2 & "!"  # string concatenation
+
+ +
set var1 to date "12/22/2021"
+set var2 to date "12/24/2021"
+set theResult to var1 + (2 * days)  # may be days, hours, etc
+set theResult to year of var1  # may be year of, month of, day of, etc
+
+ +
set var1 to {"a", 23.9, "c"}
+set var2 to item 2 of var1 as string  # get "23.9", indexed from 1
+
+ +
set var1 to {FirstName: "Jon", LastName: "Voight"}
+set theResult to FirstName of var1 & space & LastName of var1
+
+# First and Last are reserved words, to use them as keys, surround
+# with pipes (`|`), e.g.:
+set var1 to {|First|: "Jon", |Last|: "Voight"}
+set theResult to |First| of var1 & space & |Last| of var1
+
+ +

First script

+ +
on run  # explicit on run handler, responding to double clicking
+    delay 2  # delay for 2 seconds
+    activate  # activate the script
+    display dialog "Hello World!"
+end run
+
+on open (theFiles)  # on open handler, responding to drag-n-drop
+    repeat with aFile in theFiles
+        set myText to read aFile
+        display dialog myText
+    end repeat
+end open
+
+# to use on idle handler, save it with "Stay open after run handler"
+on idle
+    activate
+    display dialog "Join us soon"
+    return 3  # rerun this block 3 seconds later until manually quit
+end idle
+
+ +

Commenting

+ +

Block comment:

+ +
(*
+    Version History
+    This is the first build of my script
+*)
+
+ +

in line comment: led by -- or #

+ +

Repeat loops

+ +
on run
+    repeat 3 times
+    end repeat
+
+    repeat with i from 1 to 3
+    end repeat
+
+    set myList to {"Jason", "Joan", "Jack"}
+    repeat with anItem in myList
+        display dialog "Hello " & anItem as string
+    end repeat
+
+    set test to true
+    set i to 1
+    repeat while test = true
+        if i >= 4 then
+            set test to false
+        end if
+        set i to i + 1
+    end repeat
+end run
+
+ +

Use exit repeat to break out of repeat earlier.

+ +

Conditionals

+ +
on run
+    set x to 6
+    if x = 6 then
+        display dialog "x is 6"
+    else if x = 5 then
+        display dialog "x is 5"
+    else
+        display dialog "x is neither 5 nor 6"
+    end if
+end run
+
+ +

Error handling

+ +
on run
+    try  # ignore quietly errors and break out of try block
+        set myDemo to "Hello"
+        display dialog myTest
+    end try
+
+    try
+        display dialog myTest
+    on error errName
+        display dialog errName
+    end try
+
+    try
+        if myDemo = "Hello" then
+            # this will raise error "message" with errName assigned "message"
+            error "message"
+            # or phrased as
+            # error "message" number -1000
+        end if
+    on error errName number n
+        display dialog errName & return & "with number " & n
+    end try
+end run
+
+ +

Alias, HFS and POSIX

+ +
on run
+    set posixPath to "/Users/user/Desktop/name.jpg"
+
+    # converts a POSIX path to an HFS file reference
+    set hfsFilePathRef to posix file posixPath
+
+    # converts a POSIX path to an HFS file path
+    set hfsFilePath to posix file posixPath as string
+
+    # cannot convert POSIX path to alias directly
+    set aliasExample to hfsFilePath as alias
+
+    # convert an HFS path to a POSIX path
+    set backToPosix to posix path of hfsFilePath
+end run
+
+ +

Handlers (aka functions)

+ +

A handler is a collection of applescript statements that you give a descriptive name.

+ +
on run
+    set theResult to doMath(8, 2, "+")
+    log theResult
+end run
+
+on doMath(num1, num2, mathFunc)
+    try
+	    if mathFunc = "+" then
+	        return num1 + num2
+	    else if mathFunc = "-" then
+	        return num1 - num2
+	    else if mathFunc = "*" then
+	        return num1 * num2
+	    else if mathFunc = "/" then
+	        return num1 / num2
+	    else
+	        error "You must supply a proper math function"
+	    end if
+	on error e
+	    activate
+	    display dialog (e as string) giving up after 8
+	end try
+end doMath
+
+ +

Note that giving up after N means the dialog will dismiss itself if its button is not clicked within N seconds.

+ +

Within a tell application block, be sure to call custom handlers with the my keyword. +For example,

+ +
tell application "Numbers"
+    set theResult to my doMath(8, 2, "-")
+end tell
+
+ +

Quit handler

+ +
on run
+    set someCondition to false
+    if someCondition then
+        # tell the script to quit;
+		# this will trigger the `on quit` handler if present
+        quit
+    end if
+end run
+
+on quit
+    # write cleanup actions your script should run before quitting
+    activate
+    display dialog "I quit" giving up after 4
+    # this will quit current script immediately; without this statement
+	# previous `quit` statement will be caught by `on quit` block and
+	# quit won't be performed
+    continue quit
+end quit
+
+ +

Case study: most recently modified file in a folder

+ +
on run
+    set thePath to "/Users/user/Desktop/folder"
+    set newestFile to getNewestFile(thePath)
+    return newestFile
+end run
+
+on getNewestFile(thePath)
+    try
+        set posixPath to my convertPathTo(thePath, "POSIX")
+        set theFile to do shell script "ls -tp " & quoted form of posixPath & " | grep -Ev '/' | head -n1"
+        if theFile is not equal to "" then
+            set theFile to posixPath & "/" & theFile
+        end if
+        return theFile
+    on error e
+        return ""
+    end try
+end getNewestFile
+
+on convertPathTo(inputPath, requestedForm)
+    try
+        set standardPosixPath to POSIX path of inputPath as string
+        if requestedForm contains "posix" then
+            set transformedPath to POSIX path of standardPosixPath as string
+            if transformedPath ends with "/" then
+                set transformedPath to character 1 thru -2 of transformedPath as string
+            end if
+        else if requestedForm contains "alias" then
+            set transformedPath to POSIX file (standardPosixPath) as string
+            try
+                set transformedPath to transformedPath as alias
+            on error
+                error "The file \"" & transformedPath & "\" doesn't exist and can't be returned as \"alias\""
+            end try
+        else if requestedForm contains "hfs" then
+            set transformedPath to POSIX file (standardPosixPath) as string
+        else
+            error "Requested path transformation type was an unexpected type"
+        end if
+        return transformedPath
+    on error e
+        return false
+    end try
+end convertPathTo
+
+ +

Case study: automatically scale images

+ +
on run
+    set filePath to "/Users/user/Desktop/test.png"
+    resizeImageWidth(filePath, 450)
+end run
+
+on resizeImageWidth(filePath, pxls)
+    try
+        do shell script "sips --resampleWidth " & pxls & space & quoted form of filePath
+        return true
+    on error e
+        # TODO do something with the error
+        return false
+    end try
+end resizeImageWidth
+
+ +

Case study: simple hot folder creation

+ +

A hot folder is a folder monitored by a script: whatever is dropped into the folder, the script performs actions on it.

+ +
on run
+    # any startup activities required to run this script can be done here
+end run
+
+on idle
+    set input to "/Users/user/Desktop/Hot Folder"
+    set output to "/Users/user/Desktop/Results"
+    set errors to "/Users/user/Desktop/Errors"
+
+    set filePaths to getFiles(input)
+    if filePaths is not equal to {} then
+        repeat with filePath in filePaths
+            if resizeImageWidth(filePath as string, output, 450) then
+                removeFile(filePath as string)
+            else
+                do shell script "mv -f " & quoted form of filePath & space & quoted form of errors
+            end if
+        end repeat
+    end if
+    return 5
+end idle
+
+on convertPathTo(inputPath, requestedForm)
+    try
+        set standardPosixPath to POSIX path of inputPath as string
+        if requestedForm contains "posix" then
+            set transformedPath to POSIX path of standardPosixPath as string
+            if transformedPath ends with "/" then
+                set transformedPath to character 1 thru -2 of transformedPath as string
+            end if
+        else if requestedForm contains "alias" then
+            set transformedPath to POSIX file (standardPosixPath) as string
+            try
+                set transformedPath to transformedPath as alias
+            on error
+                error "The file \"" & transformedPath & "\" doesn't exist and can't be returned as \"alias\""
+            end try
+        else if requestedForm contains "hfs" then
+            set transformedPath to POSIX file (standardPosixPath) as string
+        else
+            error "Requested path transformation type was an unexpected type"
+        end if
+        return transformedPath
+    on error e
+        return false
+    end try
+end convertPathTo
+
+on stringToList(inputString, theDelimiter)
+    try
+        set tid to AppleScript's text item delimiters
+        set AppleScript's text item delimiters to theDelimiter
+        set theList to text items of (inputString as string)
+        set AppleScript's text item delimiters to tid
+        return theList
+    on error e
+        set AppleScript's text item delimiters to tid
+        return {}
+    end try
+end stringToList
+
+on resizeImageWidth(filePath, output, pxls)
+    try
+        do shell script "sips --resampleWidth " & pxls & space & quoted form of filePath & " -o " & quoted form of output
+        return true
+    on error e
+        return false
+    end try
+end resizeImageWidth
+
+on getFiles(thePath)
+    try
+        set posixPath to my convertPathTo(thePath, "posix")
+        set theFiles to do shell script "find " & quoted form of posixPath & " -type f ! -name \".*\""
+        if theFiles is not equal to "" then
+            set fileList to stringToList(theFiles, return)
+        end if
+        return fileList
+    on error e
+        # log the error here at some point
+        return {}
+    end try
+end getFiles
+
+on removeFile(theFile)
+    try
+        set posixPath to convertPathTo(theFile, "posix")
+        do shell script "rm -f " & quoted form of posixPath
+	    return true
+    on error e
+        return false
+    end try
+end removeFile
+
+ +
+ +
+ +
+
+ + + diff --git a/2023/03/27/pizzahut-free-soda.html b/2023/03/27/pizzahut-free-soda.html new file mode 100644 index 000000000..aee52f052 --- /dev/null +++ b/2023/03/27/pizzahut-free-soda.html @@ -0,0 +1,112 @@ + + + + + + + + +必胜客餐厅隐藏福利 – 苏打水 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

必胜客餐厅隐藏福利 -- 苏打水

+ misc + +
+ +
+

今天去吃必胜客,邻座客人管服务员要了一杯苏打水,才刚刚知道必胜客有免费苏打水提供。在百度搜了一圈似乎鲜有帖子提到这点,看来知道的人不多,邻座客人也是前必胜客服务员,才知道有苏打水的。

+ +

所谓苏打水,这里指的是冰的碳酸氢钠溶液(应该是吧?),里面有两片柠檬促使其放出二氧化碳。喝着有点酸,但不是柠檬的酸味;有一点辣,但也不是汽水的感觉。喝着味道不错,但回味不是很好。尽管单独喝不是很合我口味,我发现它和饭一起喝非常解腻,非常好喝。苏打水是免费的,但不默认提供,菜单上也没有,需要向服务员要,可以说是隐藏福利了。

+ +

希望这篇文章能让更多的人知道它的存在。

+ +

苏打水

+ +
+ +
+ +
+
+ + + diff --git a/2023/04/26/how-to-decide-the-type-of-a-pokemon-quickly.html b/2023/04/26/how-to-decide-the-type-of-a-pokemon-quickly.html new file mode 100644 index 000000000..6fd41cc5f --- /dev/null +++ b/2023/04/26/how-to-decide-the-type-of-a-pokemon-quickly.html @@ -0,0 +1,189 @@ + + + + + + + + +如何尽可能快地确定宝可梦属性 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

如何尽可能快地确定宝可梦属性

+ misc + +
+ +
+

确定宝可梦属性的方法

+ +

可用宝可梦对攻击的反应确定宝可梦的属性. +单一属性宝可梦对单一属性攻击的反应有以下四种: 无效, 抵抗, 一般, 有效; 可用乘数 0, 1/2, 1, 2 表示. +双属性宝可梦对单一属性攻击的反应为以上四个乘数的两两相乘的结果, 分别为 0, 1/4, 1/2, 1, 2, 4, 即无效, 非常抵抗, 抵抗, 一般, 有效, 非常有效. +用乘法表示属性乘数的叠加不是很方便, 故对乘数取底数为 2 的对数, 变为 $-\infty$, -2, -1, 0, 1, 2 六种反应, 下文会使这样操作方便的原因变得显而易见.

+ +

数学表示

+ +

给定属性克制矩阵 $\mathbf A$, 其中第 $i$ 行第 $j$ 列的元素 $a_{ij} \in \{-\infty, -2, -1, 0, 1, 2\}$ 表示单一属性为 $j$ 的宝可梦对属性为 $i$ 的攻击的对数抵抗乘数. +因为一共有 18 种属性, 所以 $\mathbf A$ 的维度为 $18 \times 18$. +使用 one-hot encoding 以及其加性叠加表示宝可梦的单一及双属性, 向量为 18 维; +例如 $(0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)^\top$ 表示宝可梦具有第 2 种和第 4 种属性, 为双属性宝可梦. +令宝可梦属性矩阵 $\mathbf Q$ 的每一列表示一种宝可梦属性的枚举; 由于所有单一属性和双属性的个数为 $18 + \binom{18}{2} = 171$, 故其维度为 $18 \times 171$. +令 $18 \times 18$ 单位矩阵 $\mathbf I$ 的第 $i$ 列为 $\boldsymbol e_i$.

+ +

由于双属性宝可梦对攻击的对数抵抗乘数的叠加是加性的, 因此可用矩阵乘法自然地表示. +例如,

+ +\[\boldsymbol r = \boldsymbol e_i^\top \mathbf A \mathbf Q\] + +

表示在第 $i$ 种属性的攻击下各属性宝可梦的对数抵抗乘数. +如果各乘数在 $\boldsymbol r$ 中是唯一的, 那么便可唯一地确定宝可梦的属性. +即使只有一个元素子集中的乘数唯一, 也能排除掉这些宝可梦属性, 以便进一步确定.

+ +

确定宝可梦属性的算法

+ +

令 $s$ 为 $1,\dots,18$ 的一个排列, 使得第 $j$ 次尝试使用属性为 $s(j)$ 的攻击. +确定宝可梦属性的解即为形似 $s$ 的一个排列. +显然, 暴力枚举具有 $O(n!)$ 复杂度, 不可行. +我们可使用贪心策略确定宝可梦属性.

+ +

初始化剩余宝可梦属性矩阵 $\mathbf Q^{(0)} = \mathbf Q$, 已尝试过的攻击属性集合 $T^{(0)} = \varnothing$, 已确定的攻击序列为 $s^{(0)} = ()$. +假设在第 $k$ 次尝试前, 剩余宝可梦属性矩阵为 $\mathbf Q^{(k-1)}$, 其为原宝可梦属性矩阵 $\mathbf Q$ 的列的子集; 已尝试过的攻击属性集合为 $T^{(k-1)}$, 其元素属于 $T = \{1,\dots,18\}$; 已确定的攻击序列为 $s^{(k-1)}$. +如果 $\mathbf Q^{(k-1)}$ 的列数为零, 算法结束. +否则, $\forall i \in T \,\backslash\, T^{(k-1)}$, 计算 $\boldsymbol r_i = \boldsymbol e_i^\top \mathbf A \mathbf Q^{(k-1)}$, 并选取使 $\boldsymbol r_i$ 中重复元素数目最小的 $i$, 记为 $i^\ast$. +令 $s^{(k)} = s^{(k-1)} \cup i^\ast$, $T^{(k)} = T^{(k-1)} \cup \{i^\ast\}$, $\mathbf Q^{(k)}$ 等于去掉在 $\boldsymbol r_{i^\ast}$ 中元素唯一的列后的 $\mathbf Q^{(k-1)}$.

+ +

算法实现时可用一足够小的负数, 例如 -20, 表示 $-\infty$, 然后在 $\boldsymbol r$ 中把所有足够小的数重置为 -20, 以模拟 $-\infty$ 加减任何数 (注意我们的 operand 集合) 都为其本身.

+ +

Python 实现:

+ +
import numpy as np
+
+n = 18
+
+def calc_r(i, A, Q):
+    r = np.eye(n, dtype=int)[i:i+1].dot(A).dot(Q)
+    r[r < -5] = -20
+    return r
+
+def shrink_Q(r, Q):
+    _, v, c = np.unique(r, return_inverse=True, return_counts=True, axis=1)
+    return Q[:, c[v] > 1]
+
+A = ...
+Q = ...
+
+def greedy(A, Q):
+    # Q 在循环中会被重新赋值, 因此作为参数传入,
+    # 避免在函数内直接对模块级变量赋值导致 UnboundLocalError
+    s = []
+    T = set(range(n))
+    while Q.shape[1] > 0:
+        best_i = None
+        min_next_Q = Q
+        for i in T:
+            next_Q = shrink_Q(calc_r(i, A, Q), Q)
+            if next_Q.shape[1] < min_next_Q.shape[1]:
+                min_next_Q = next_Q
+                best_i = i
+        if best_i is None:
+            # 剩余的属性组合已无法再被区分, 提前结束
+            break
+        Q = min_next_Q
+        s.append(best_i)
+        T.remove(best_i)
+    return s
+
+ +

原问题的扩展

+ +

通过对算法简单的扩展, 还能回答以下问题:

+ +
    +
  • 已知宝可梦具有某种属性, 想确定其是否具有第二属性, 如果有, 是什么属性: 通过移除矩阵 $\mathbf Q$ 的相应列解决
  • +
  • 希望只使用具有某些属性的攻击确定宝可梦的属性: 通过移除集合 $T$ 的相应元素解决 (两种扩展的简单示例见本列表后的代码)
  • +
+ +
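下面是一个简单的示意代码 (基于上文的 Python 实现; known 与 allowed 为示意变量, 需配合将 $T$ 作为参数传入 greedy 使用):

def keep_types_containing(Q, known):
    # 只保留包含已知属性 known (属性下标) 的列, 用于确定是否存在第二属性
    return Q[:, Q[known] > 0]

def restrict_attacks(T, allowed):
    # 只保留允许使用的攻击属性
    return T & set(allowed)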
+ +
+ +
+
+ + + diff --git a/2023/07/05/connect-to-jupyter-notebook-on-wsl2-from-another-machine-within-wlan.html b/2023/07/05/connect-to-jupyter-notebook-on-wsl2-from-another-machine-within-wlan.html new file mode 100644 index 000000000..13f5231f7 --- /dev/null +++ b/2023/07/05/connect-to-jupyter-notebook-on-wsl2-from-another-machine-within-wlan.html @@ -0,0 +1,140 @@ + + + + + + + + +在 WLAN 下从另一台计算机连接到 WSL2 中的 Jupyter Notebook | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

在 WLAN 下从另一台计算机连接到 WSL2 中的 Jupyter Notebook

+ dev/network + +
+ +
+

摘要

+ +

上一篇文章介绍了如何从本地连接到同一 WLAN 下的另一台 Windows 计算机中的 WSL2 实例. +这篇文章进一步介绍如何连接到该 WSL2 实例中运行的 Jupyter Notebook.

+ +

原理

+ +
    +
  1. 如上一篇文章所述建立由本地到 Windows (IP 地址本文为 192.168.0.105, 用户名本文为 ubuntu) 的 SSH 连接 (端口本文为 4000)
  2. +
  3. 在 WSL2 实例的端口 8890 运行无浏览器的 Jupyter Notebook
  4. +
  5. 在本地建立将本地端口 (本文为 8889) 转发到远程 Windows 端口 8890 的 SSH 隧道
  6. +
  7. 在本地 localhost:8889 访问远程 Jupyter Notebook
  8. +
+ +

具体流程

+ +

运行 Jupyter Notebook

+ +

以下命令在 WSL2 的终端中执行

+ +
jupyter notebook --no-browser --port 8890
+
+ +

建立 SSH 隧道

+ +

以下命令在本地终端执行

+ +
ssh -p 4000 -NL 8889\:localhost\:8890 ubuntu@192.168.0.105
+
+ +

参考

+ + + +
+ +
+ +
+
+ + + diff --git a/2023/07/05/connect-to-wsl2-from-another-machine-within-wlan.html b/2023/07/05/connect-to-wsl2-from-another-machine-within-wlan.html new file mode 100644 index 000000000..74c930516 --- /dev/null +++ b/2023/07/05/connect-to-wsl2-from-another-machine-within-wlan.html @@ -0,0 +1,265 @@ + + + + + + + + +从另一台计算机 SSH 连接到 WSL2 服务器 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

从另一台计算机 SSH 连接到 WSL2 服务器

+ dev/network + +
+ +
+

原理

+ +
    +
  1. 在 Windows 上设置防火墙允许接入端口 (本文暂定为 4000)
  2. +
  3. 在 Windows 上设置端口转发从 4000 至 WSL2 的 IP 地址的端口 22 (即 SSH 的默认端口)
  4. +
  5. 在 WSL2 中设置 SSH 服务器, 监听其端口 22
  6. +
  7. 从另一台计算机 SSH 到 Windows 的 IP 地址的端口 4000
  8. +
+ +

具体流程

+ +

设置 WSL2 中的 SSH 服务器

+ +

以下命令在 WSL2 的终端中执行

+ +

安装 openssh-server:

+ +
sudo apt update
+sudo apt install openssh-server
+
+ +

设置开机启动 systemd. +方法是在 /etc/wsl.conf 中写入:

+ +
[boot]
+systemd=true
+
+ +

可以直接用 vim, nano 等编辑, 也可

+ +
{ echo '[boot]'; echo 'systemd=true'; } | sudo tee /etc/wsl.conf
+
+ +

但注意不要覆盖已存在的 /etc/wsl.conf 文件.

+ +

以下命令在 Windows 终端中执行 (可能需要管理员权限)

+ +

关闭 WSL2:

+ +
wsl --shutdown
+
+ +

以下命令在 WSL2 的终端中执行

+ +

然后打开 WSL2, 执行

+ +
sudo service ssh status
+
+ +

如果输出中包含 “Active: active (running)”, 说明 SSH 服务器安装成功. +否则, 可以尝试以下命令手动开始 ssh 服务.

+ +
sudo service ssh start
+
+ +

设置 Windows 防火墙以允许从其它计算机接入端口 (例如 4000)

+ +

以下命令在 Windows 终端中执行 (需要管理员权限)

+ +
netsh advfirewall firewall add rule name="WSL SSH" dir=in action=allow protocol=TCP localport=4000
+
+ +

其中 name="WSL SSH" 部分的名字可任选. +如果输出为 “确定” (或其它 locale 下的同等含义的输出), 说明设置成功. +日后若想删除可以去控制面板的 “高级安全 Windows Defender 防火墙” 的 “入站规则” 中查看/编辑/删除.

+ +

设置 Windows 的端口转发

+ +

以下命令在 Windows 终端中执行 (需要管理员权限)

+ +

查看 WSL2 的 IP 地址:

+ +
wsl hostname -I
+
+ +

本文假设该 IP 地址为 172.21.199.198. +旧版本的 wsl 可能会返回两个 IP 地址, 此时选择第一个.

+ +

设置从 4000 (见上文) 到 22 的端口转发:

+ +
netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=4000 connectaddress=172.21.199.198 connectport=22
+
+ +

确定设置成功:

+ +
netsh interface portproxy show v4tov4
+
+ +

查看 Windows 的 IP 地址和 WSL 的用户名

+ +

以下命令在 Windows 终端中执行

+ +
ipconfig
+
+ +

在 “无线局域网适配器 WLAN:” 一节中可见 “IPv4 地址”, 本文为 192.168.0.105.

+ +

以下命令在 WSL2 的终端中执行

+ +
echo "$USER"
+
+ +

可得 WSL 用户名, 本文为 ubuntu.

+ +

从另一台计算机 SSH 接入

+ +
ssh -p 4000 ubuntu@192.168.0.105
+
+ +

会提示输入密码, 此时输入 WSL2 的密码即可.

+ +

免密码登录 (适用于 macOS)

+ +

以下命令在 macOS 终端中执行

+ +
ssh-copy-id -p 4000 ubuntu@192.168.0.105
+
+ +

按提示确认并输入密码. +然后打开 ~/.ssh/config, 并输入以下内容

+ +
Host my-wsl
+  User ubuntu
+  Port 4000
+  HostName 192.168.0.105
+  IdentityFile ~/.ssh/id_rsa
+  UseKeychain yes
+
+ +

其中 Host my-wsl 处的名字随意. +UseKeychain yes 是免密码的关键所在. +ubuntu, 4000, 192.168.0.105 三个值的选用见上文.

+ +

然后就可以

+ +
ssh my-wsl
+
+ +

登录 Windows 的 WSL2 了.

+ +

脚本

+ +

自动更新 Windows 的端口转发

+ +

WSL 的 IP 地址可能会变化, 因此每次重启 Windows 后可能需要更新端口转发规则. +PowerShell 脚本:

+ +
$wsl_ip = (wsl hostname -I).Trim()  # 去除输出末尾的空白字符
+netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=4000 connectaddress=$wsl_ip connectport=22
+
+ +

运行时需要管理员权限.

+ +

参考

+ + + +
+ +
+ +
+
+ + + diff --git a/2023/08/05/compute-svm-intercept.html b/2023/08/05/compute-svm-intercept.html new file mode 100644 index 000000000..931dc59d6 --- /dev/null +++ b/2023/08/05/compute-svm-intercept.html @@ -0,0 +1,376 @@ + + + + + + + + +How to compute the intercept of C-SVM in primal and dual formulations | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

How to compute the intercept of C-SVM in primal and dual formulations

+ machine learning/svm + +
+ +
+

Compute intercept in primal formulation

+ +

The primal SVM problem is:

+ +\[\min_{\boldsymbol w,b,\boldsymbol\xi} \frac{1}{2}\boldsymbol w^\top\boldsymbol w+C\sum_{i=1}^m\xi_i;\quad\text{s.t. }\ y_i f(\boldsymbol x_i) \ge 1-\xi_i,\ \xi_i \ge 0 \,,\tag{1}\] + +

where the decision function $f(\boldsymbol x) \equiv \boldsymbol w^\top\phi(\boldsymbol x) + b$. +The Lagrangian is:

+ +\[L(\boldsymbol w,b,\boldsymbol\xi,\boldsymbol\alpha,\boldsymbol\mu) = \frac{1}{2}\boldsymbol w^\top\boldsymbol w + C\sum_{i=1}^m\xi_i + \sum_{i=1}^m\alpha_i\big(1-\xi_i-y_i f(\boldsymbol x_i)\big) - \sum_{i=1}^m\mu_i\xi_i\,,\] + +

where $\alpha_i \ge 0$, $\mu_i \ge 0$. +The Karush-Kuhn-Tucker (KKT) conditions are:

+ +\[\begin{cases} +\boldsymbol w=\sum_{i=1}^m\alpha_i y_i \phi(\boldsymbol x_i) &\text{(stationarity)}\\ +0=\sum_{i=1}^m\alpha_i y_i &\text{(stationarity)}\\ +C=\alpha_i+\mu_i &\text{(stationarity)}\\ +0=\alpha_i(y_i f(\boldsymbol x_i)-1+\xi_i) &\text{(complementary)}\\ +0=\mu_i\xi_i &\text{(complementary)}\\ +y_i f(\boldsymbol x_i)-1+\xi_i \ge 0 &\text{(primal feasibility)}\\ +\xi_i \ge 0 &\text{(primal feasibility)}\\ +\alpha_i \ge 0 &\text{(dual feasibility)}\\ +\mu_i \ge 0 &\text{(dual feasibility)}\\ +\end{cases}\,.\] + +

Thus, we have

+ +\[\begin{cases} +y_i f(\boldsymbol x_i) \ge 1 &(\alpha_i=0)\\ +y_i f(\boldsymbol x_i) \le 1 &(\alpha_i=C)\\ +y_i f(\boldsymbol x_i) = 1 &(\text{otherwise})\\ +\end{cases}\,.\tag{2}\] + +

When $S=\{j \mid 0 < \alpha_j < C\} \neq \varnothing$, for each such $j$,

+ +\[\begin{aligned} +y_j (\boldsymbol w^\top\phi(\boldsymbol x_j)+b) &= 1\\ +b &= y_j - \boldsymbol w^\top\phi(\boldsymbol x_j)\,;\\ +\end{aligned}\] + +

The second equality holds since $y_j = \pm 1$. +For numerical stability, we take the mean of all $b$’s as the final value of the intercept:

+ 
+\[b = \frac{1}{\lvert S\rvert}\sum_{j \in S} (y_j-\boldsymbol w^\top\phi(\boldsymbol x_j))\,.\]

When $S=\varnothing$, taking the first two cases of Equation $(2)$, it follows that

+ +\[\begin{cases} +f(\boldsymbol x_i) \ge 1 &(\alpha_i=0,y_i=1)\\ +f(\boldsymbol x_i) \le -1 &(\alpha_i=0,y_i=-1)\\ +f(\boldsymbol x_i) \le 1 &(\alpha_i=C,y_i=1)\\ +f(\boldsymbol x_i) \ge -1 &(\alpha_i=C,y_i=-1)\\ +\end{cases}\,.\] + +

Equivalently, we have

+ +\[\max_{j \in T_1}\{y_j - \boldsymbol w^\top\phi(\boldsymbol x_j)\} \le b \le \min_{j \in T_2}\{y_j - \boldsymbol w^\top\phi(\boldsymbol x_j)\}\,,\] + +

where

+ +\[\begin{cases} +T_1 = \{j \mid \alpha_j=0,y_j=1\text{ or }\alpha_j=C,y_j=-1\}\\ +T_2 = \{j \mid \alpha_j=0,y_j=-1\text{ or }\alpha_j=C,y_j=1\}\\ +\end{cases}\,,\] + +

The intercept is taken as the mean of the lower and upper bounds.

+ +

To compute $\boldsymbol w^\top\phi(\boldsymbol x)$ in the above equations, simply plug in $\boldsymbol w=\sum_{i=1}^m\alpha_i y_i \phi(\boldsymbol x_i)$ and evaluate each $\phi(\boldsymbol x_i)^\top\phi(\boldsymbol x)$ with the underlying kernel function $\kappa(\boldsymbol x_i,\boldsymbol x)$.

+ +
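As a small illustration (a sketch only, not tied to any particular library; the linear kernel and the array shapes below are assumptions):

import numpy as np

def decision_values(dual_coef, support_vectors, intercept, X, kernel):
    # dual_coef[i] = alpha_i * y_i for the i-th support vector;
    # kernel(A, B) returns the Gram matrix K with K[i, j] = kappa(A[i], B[j]).
    return kernel(support_vectors, X).T @ dual_coef + intercept

# Example with a linear kernel kappa(u, v) = u . v (data are made up).
linear_kernel = lambda A, B: A @ B.T
sv = np.random.randn(3, 4)          # 3 support vectors, 4 features
coef = np.array([0.5, -1.0, 0.5])   # alpha_i * y_i
X = np.random.randn(5, 4)
print(decision_values(coef, sv, 0.1, X, linear_kernel))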

Compute the intercept in dual formulation

+ +
+

Reference: Chih-Chung Chang and Chih-Jen Lin. Libsvm: a library for support vector machines. ACM transactions on intelligent systems and technology (TIST), 2(3):1–27, 2011.

+
+ +

The dual SVM problem is:

+ +\[\min_{\boldsymbol\alpha}\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_j y_i y_j \phi(\boldsymbol x_i)^\top\phi(\boldsymbol x_j)-\sum_{i=1}^m\alpha_i\,;\quad\text{s.t. }\sum_{i=1}^m\alpha_i y_i=0,\ 0 \le \alpha_i \le C\,.\tag{3}\] + +

The Lagrangian is:

+ +\[\hat L(\boldsymbol\alpha,\beta,\boldsymbol\lambda,\boldsymbol\nu) = \frac{1}{2}\boldsymbol\alpha^\top\mathbf Q\boldsymbol\alpha - \boldsymbol\alpha^\top\mathbf 1+\beta\boldsymbol\alpha^\top\boldsymbol y-\boldsymbol\alpha^\top\boldsymbol\lambda+(\boldsymbol\alpha-C\mathbf 1)^\top\boldsymbol\nu\,,\tag{4}\] + +

where $\lambda_i \ge 0$, $\nu_i \ge 0$, $\mathbf 1$ is an all-$1$ vector, $\mathbf Q$ is an $m \times m$ matrix such that $Q_{ij} = y_i\phi(\boldsymbol x_i)^\top\phi(\boldsymbol x_j)y_j$, and $\beta$ is actually the intercept. +We’ll assume this for now and justify it at the end. +The KKT conditions are:

+ +\[\begin{cases} +\mathbf Q\boldsymbol\alpha=1-\beta\boldsymbol y+\boldsymbol\lambda-\boldsymbol\nu &\text{(stationarity)}\\ +\lambda_i\alpha_i = 0 &\text{(complementary)}\\ +\nu_i(C-\alpha_i) = 0 &\text{(complementary)}\\ +\boldsymbol\alpha^\top\boldsymbol y = 0 &\text{(primal feasibility)}\\ +0 \le \alpha_i \le C &\text{(primal feasibility)}\\ +\lambda_i \ge 0 &\text{(dual feasibility)}\\ +\nu_i \ge 0 &\text{(dual feasibility)}\\ +\end{cases}\,.\tag{5}\] + +

Thus, we have

+ +\[\begin{cases} +(\mathbf Q\boldsymbol\alpha)_i \ge 1 - \beta y_i &(\alpha_i=0)\\ +(\mathbf Q\boldsymbol\alpha)_i \le 1 - \beta y_i &(\alpha_i=C)\\ +(\mathbf Q\boldsymbol\alpha)_i = 1 - \beta y_i &(\text{otherwise})\\ +\end{cases}\,.\tag{6}\] + +

where $(\mathbf Q\boldsymbol\alpha)_i$ is the $i$th element of the vector $\mathbf Q\boldsymbol\alpha$. +When $S=\{j \mid 0 < \alpha_j < C\} \neq \varnothing$, for each such $j$,

+ +\[\beta = y_j(1 - (\mathbf Q\boldsymbol\alpha)_j)\,;\] + +

which holds since $y_j = \pm 1$. +For numerical stability, we take the mean of all $\beta$’s as the final value of the intercept:

+ 
+\[\beta = \frac{1}{\lvert S\rvert}\sum_{j \in S} y_j (1 - (\mathbf Q\boldsymbol\alpha)_j)\,.\]

When $S=\varnothing$, taking the first two cases of Equation $(6)$, it follows that

+ +\[\begin{cases} +\beta \ge 1-(\mathbf Q\boldsymbol\alpha)_i &(\alpha_i=0,y_i=1)\\ +\beta \le -(1-(\mathbf Q\boldsymbol\alpha)_i) &(\alpha_i=0,y_i=-1)\\ +\beta \le 1-(\mathbf Q\boldsymbol\alpha)_i &(\alpha_i=C,y_i=1)\\ +\beta \ge -(1-(\mathbf Q\boldsymbol\alpha)_i) &(\alpha_i=C,y_i=-1)\\ +\end{cases}\,.\] + +

Equivalently, we have

+ +\[\max_{j \in T_1}\{y_j(1-(\mathbf Q\boldsymbol\alpha)_j)\} \le \beta \le \min_{j \in T_2}\{y_j(1-(\mathbf Q\boldsymbol\alpha)_j)\}\,,\] + +

where

+ +\[\begin{cases} +T_1 = \{j \mid \alpha_j=0,y_j=1\text{ or }\alpha_j=C,y_j=-1\}\\ +T_2 = \{j \mid \alpha_j=0,y_j=-1\text{ or }\alpha_j=C,y_j=1\}\\ +\end{cases}\,,\] + +

The intercept is taken as the mean of the lower and upper bounds.

+ +

$\beta$ is the intercept

+ +

To show that $\beta$ is in fact the intercept in the primal problem, we go further from Equation $(4)$, plugging in the stationarity conditions of Equation $(5)$, and it follows that

+ +\[\hat L(\boldsymbol\alpha,\beta,\boldsymbol\lambda,\boldsymbol\nu) = -\frac{1}{2}\boldsymbol\alpha^\top\mathbf Q\boldsymbol\alpha-C\mathbf 1^\top\boldsymbol\nu\,,\] + +

where

+ +\[\boldsymbol\alpha=\mathbf Q^{-1}(1-\beta\boldsymbol y+\boldsymbol\lambda-\boldsymbol\nu)\,.\] + +

assuming the inverse of $\mathbf Q$ exists. +Due to the structure of $\mathbf Q$, we can write down a matrix $\mathbf Q^\frac{1}{2}$:

+ +\[Q^\frac{1}{2} = +\begin{pmatrix} +y_1\phi(\boldsymbol x_1) & \dots & y_m\phi(\boldsymbol x_m)\\ +\end{pmatrix}\] + +

such that $\mathbf Q=(\mathbf Q^\frac{1}{2})^\top\mathbf Q^\frac{1}{2}$. +Let $\boldsymbol w \triangleq \mathbf Q^{-\frac{1}{2}}(1-\beta\boldsymbol y+\boldsymbol\lambda-\boldsymbol\nu)=\mathbf Q^\frac{1}{2}\boldsymbol\alpha$. +The stationarity condition of Equation $(5)$ can be rewritten as:

+ +\[\begin{aligned} +\mathbf Q\boldsymbol\alpha &= 1-\beta\boldsymbol y+\boldsymbol\lambda-\boldsymbol\nu\\ +(\mathbf Q^\frac{1}{2})^\top\boldsymbol w &= 1-\beta\boldsymbol y+\boldsymbol\lambda-\boldsymbol\nu\\ +y_i\phi(\boldsymbol x_i)^\top\boldsymbol w+\beta y_i &\ge 1-\nu_i\quad\forall 1 \le i \le m\\ +y_i(\phi(\boldsymbol x_i)^\top\boldsymbol w+\beta) &\ge 1-\nu_i\\ +\end{aligned}\] + +

Therefore, we have the dual of the dual problem as:

+ +\[\max_{\boldsymbol w,\beta,\boldsymbol\nu}-\frac{1}{2}\boldsymbol w^\top\boldsymbol w-C\mathbf 1^\top\boldsymbol\nu\,;\quad\text{s.t. }y_i(\phi(\boldsymbol x_i)^\top\boldsymbol w+\beta) \ge 1-\nu_i,\ \nu_i \ge 0\,.\] + +

Clearly, $\beta$ is the intercept, and $\nu_i$ is the slack variable $\xi_i$ bounded to each sample in the dataset.

+ +

Show that the two approaches are equivalent

+ +

Recall that in primal and dual formulations,

+ +\[\begin{aligned} +b &= y_j - \sum_{i=1}^m\alpha_i y_i \phi(\boldsymbol x_i)^\top\phi(\boldsymbol x_j) &\text{(primal formulation)}\\ +b &= y_j (1-(\mathbf Q\boldsymbol\alpha)_j) &\text{(dual formulation)}\\ +\end{aligned}\] + +

If we plug in the definitions of $\boldsymbol w$ and $\mathbf Q$, it follows that

+ +\[\begin{aligned} +b &= y_j - \sum_{i=1}^m \alpha_i y_i \phi(\boldsymbol x_i)^\top\phi(\boldsymbol x_j)\\ +b &= y_j (1 - y_j\sum_{i=1}^m \alpha_i y_i \phi(\boldsymbol x_i)^\top\phi(\boldsymbol x_j))\\ +\end{aligned}\] + +

But $y_j^2=1$. +Therefore, it can be easily shown that the two equations are the same.

+ +

Verify the conclusion by experiment

+ +

We will need numpy and scikit-learn to perform the experiment.

+ +

Get to know SVC class in scikit-learn here. +In summary, given a classifier clf = SVC(...).fit(X, y),

+ +
    +
  • +clf.dual_coef_ holds the product $y_i \alpha_i$ for each $\alpha_i > 0$;
  • +
  • +clf.support_vectors_ holds the support vectors of shape (n_SV, n_feature) where n_SV is the number of support vectors;
  • +
  • +clf.intercept_ holds the intercept term.
  • +
+ +

In addition,

+ +
    +
  • +clf.coef_ holds the $\boldsymbol w$ in primal problem. We will use it for convenience below (linear kernel).
  • +
+ +

Codes:

+ +
import numpy as np
+from sklearn.svm import SVC
+from sklearn.datasets import load_iris
+
+
+X, y = load_iris(return_X_y=True)
+# Restrict the classification problem to two-class;
+# otherwise, the problem will become unnecessarily complex.
+i = (y == 0) | (y == 2)
+X, y = X[i], y[i]
+# Make y take values {0, 1} rather than {0, 2}.
+y //= 2
+
+clf = SVC(kernel='linear', random_state=123)
+clf.fit(X, y)
+# The y for support vectors.
+# The `*2-1` operation is used to make it pick the values {1, -1}
+# rather than {1, 0}.
+y_supp = y[clf.support_] * 2 - 1
+# The filter that removes upper bounded alpha's.
+S = np.ravel(np.abs(clf.dual_coef_)) < 1
+
+# Verify that the `clf.coef_` is indeed computed from `clf.dual_coef_`.
+# We'll use `clf.coef_` for convenience below.
+assert np.allclose(
+    np.ravel(clf.coef_),
+    np.sum(np.ravel(clf.dual_coef_) * clf.support_vectors_.T, axis=1))
+# The intercept estimations in primal formulation. Only support vectors are
+# required, since otherwise the dual coefficients will be zero and won't count
+# any.
+b_estimates_primal = y_supp[S] - np.dot(clf.support_vectors_[S], np.ravel(clf.coef_))
+### Verify that the mean of the estimations is indeed the intercept. ###
+assert np.allclose(np.mean(b_estimates_primal), clf.intercept_)
+
+# The kernel matrix.
+K = np.dot(clf.support_vectors_, clf.support_vectors_.T)
+# The Q matrix times alpha. Notice that when computing Q, only support vectors
+# are required for the same reason as above.
+Q_alpha = np.sum(np.ravel(clf.dual_coef_)[:, np.newaxis] * K, axis=0) * y_supp
+# The intercept estimations in dual formulation.
+b_estimates_dual = y_supp[S] * (1 - Q_alpha[S])
+### Verify that the mean of the estimations is indeed the intercept. ###
+assert np.allclose(clf.intercept_, np.mean(b_estimates_dual))
+
+ +

The following has been mentioned in the comments above, but I feel it necessary to restate it formally here: +recall that $\boldsymbol w = \sum_{i=1}^m\alpha_i y_i \phi(\boldsymbol x_i)$, and all $m$ $\alpha$’s are involved when computing $\mathbf Q\boldsymbol\alpha$. +In fact, only those $i$ such that $\alpha_i > 0$ (corresponding to the support vectors) are necessary. +That’s why we are able to find $\boldsymbol w$ and $\mathbf Q\boldsymbol\alpha$ even though scikit-learn stores only data related to the support vectors.

+ +

Caveat: +I find it quite hard to construct an example where there are no free $\alpha$’s (i.e. those $\alpha_i$ such that $0 < \alpha_i < C$) at all. +So strictly speaking, that edge case is not verified empirically in this post.

+ +
+ +
+ +
+
+ + + diff --git a/2023/08/05/dual-of-dual-of-qp.html b/2023/08/05/dual-of-dual-of-qp.html new file mode 100644 index 000000000..1e9e39458 --- /dev/null +++ b/2023/08/05/dual-of-dual-of-qp.html @@ -0,0 +1,172 @@ + + + + + + + + +The dual of the dual of a QP is itself | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

The dual of the dual of a QP is itself

+ math/linear algebra + +
+ +
+

Given a Quadratic Program (QP), we will show that the dual of the dual of the QP is itself.

+ +

Let the QP be in its standard form:

+ +\[\min_{\boldsymbol x}\frac{1}{2}\boldsymbol x^\top\mathbf Q\boldsymbol x + \boldsymbol p^\top\boldsymbol x\,; +\quad\text{s.t. }\mathbf A\boldsymbol x=\boldsymbol b,\ x_i \ge 0\,,\] + +

where $\mathbf Q \succ 0$ is positive definite. +The Lagrangian is

+ +\[L(\boldsymbol x,\boldsymbol\lambda,\boldsymbol\mu) = \frac{1}{2}\boldsymbol x^\top\mathbf Q\boldsymbol x + \boldsymbol p^\top\boldsymbol x + \boldsymbol\lambda^\top(\mathbf A\boldsymbol x-\boldsymbol b)-\boldsymbol\mu^\top\boldsymbol x\,,\tag{1}\] + +

where $\mu_i \ge 0$. +Since $\mathbf Q \succ 0$, we may find the minimum of $(1)$ with respect to $\boldsymbol x$ by driving $\partial L/\partial\boldsymbol x$ to $\mathbf 0$:

+ +\[\mathbf Q\boldsymbol x + p + \mathbf A^\top\boldsymbol\lambda - \boldsymbol\mu = \mathbf 0\,,\] + +

and it follows that

+ +\[\boldsymbol x = \mathbf Q^{-1}(-\mathbf A^\top\boldsymbol\lambda - \boldsymbol p + \boldsymbol\mu)\,.\] + +

Therefore, the dual formulation of the QP is:

+ +\[\max_{\boldsymbol\lambda,\boldsymbol\mu}-\frac{1}{2}(\mathbf A^\top\boldsymbol\lambda+\boldsymbol p-\boldsymbol\mu)^\top\mathbf Q^{-1}(\mathbf A^\top\boldsymbol\lambda+\boldsymbol p-\boldsymbol\mu)-\boldsymbol\lambda^\top\boldsymbol b\,;\quad\text{s.t. }\mu_i \ge 0\,.\tag{2}\] + +

Now we will find the dual of the dual formulation. +First make Equation $(2)$ a minimization, and find its Lagrangian:

+ +\[\hat L(\boldsymbol\lambda,\boldsymbol\mu,\boldsymbol y) = \frac{1}{2}(\mathbf A^\top\boldsymbol\lambda+\boldsymbol p-\boldsymbol\mu)^\top\mathbf Q^{-1}(\mathbf A^\top\boldsymbol\lambda+\boldsymbol p-\boldsymbol\mu) + \boldsymbol\lambda^\top\boldsymbol b - \boldsymbol y^\top\boldsymbol\mu\,,\tag{3}\] + +

where $y_i \ge 0$. +Since $\mathbf Q^{-1} \succ 0$, we may also find its minimum with respect to $\boldsymbol\lambda$ and $\boldsymbol\mu$ by driving corresponding partial derivatives to zero:

+ +\[\begin{align} +\frac{\partial\hat L}{\partial\boldsymbol\lambda} &= \frac{1}{2}(2\mathbf A\mathbf Q^{-1}\mathbf A^\top\boldsymbol\lambda+\mathbf A\mathbf Q^{-1}\boldsymbol p-\mathbf A\mathbf Q^{-1}\boldsymbol\mu+\mathbf A\mathbf Q^{-\top}\boldsymbol p-\mathbf A\mathbf Q^{-\top}\boldsymbol\mu)+\boldsymbol b = 0\,,\tag{4.1}\\ +\frac{\partial\hat L}{\partial\boldsymbol\mu} &= \frac{1}{2}(-\mathbf Q^{-\top}\mathbf A^\top\boldsymbol\lambda-\mathbf Q^{-\top}\boldsymbol p-\mathbf Q^{-1}\mathbf A^\top\boldsymbol\lambda-\mathbf Q^{-1}\boldsymbol p+2\mathbf Q^{-1}\boldsymbol\mu) - \boldsymbol y = 0\,,\tag{4.2}\\ +\end{align}\] + +

where $\mathbf Q^{-\top} \equiv (\mathbf Q^{-1})^\top$. +Left-multiplying $(4.2)$ by $\mathbf A$ and adding it to $(4.1)$ yields

+ +\[\mathbf A\boldsymbol y=\boldsymbol b\,.\tag{5}\] + +

This holds since positive definite matrices are symmetric. +It follows from $(4.2)$ that

+ +\[-\mathbf Q\boldsymbol y = \mathbf A^\top\boldsymbol\lambda-\boldsymbol\mu+\boldsymbol p\,.\tag{6.1}\] + +

or

+ +\[\boldsymbol y = -\mathbf Q^{-1}(\mathbf A^\top\boldsymbol\lambda-\boldsymbol\mu+\boldsymbol p)\,.\tag{6.2}\] + +

Plugging $(6.1)$ and $(6.2)$ back to $(3)$ gives

+ 
+\[\begin{aligned}
+\hat L(\boldsymbol\lambda,\boldsymbol\mu,\boldsymbol y)
+&= \frac{1}{2}(\mathbf A^\top\boldsymbol\lambda+\boldsymbol p-\boldsymbol\mu)^\top\mathbf Q^{-1}(\mathbf A^\top\boldsymbol\lambda+\boldsymbol p-\boldsymbol\mu) - (\mathbf A^\top\boldsymbol\lambda-\boldsymbol\mu)^\top\mathbf Q^{-1}(\mathbf A^\top\boldsymbol\lambda+\boldsymbol p-\boldsymbol\mu)\\
+&= \frac{1}{2}(-\mathbf Q\boldsymbol y)^\top\mathbf Q^{-1}(-\mathbf Q\boldsymbol y)-(-\mathbf Q\boldsymbol y-\boldsymbol p)^\top\mathbf Q^{-1}(-\mathbf Q\boldsymbol y)\\
+&= -\frac{1}{2}\boldsymbol y^\top\mathbf Q\boldsymbol y-\boldsymbol p^\top\boldsymbol y\,.\\
+\end{aligned}\]

Together with Equation $(5)$ and $y_i \ge 0$, we have the dual of the dual formulation:

+ +\[\max_{\boldsymbol y}-\frac{1}{2}\boldsymbol y^\top\mathbf Q\boldsymbol y-\boldsymbol p^\top\boldsymbol y\,;\quad\text{s.t. }\mathbf A\boldsymbol y=\boldsymbol b,\ y_i \ge 0\,.\] + +

Clearly, this is equivalent to the original QP.

+ +
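As a quick numerical sanity check of this equivalence (a sketch only, not part of the derivation): build a random feasible QP, solve the primal and the dual $(2)$, and confirm that the optimal values coincide, as strong duality predicts. The sketch assumes cvxpy and numpy are available; all problem data below are made up.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 6, 3
M = rng.standard_normal((n, n))
Q = M.T @ M + n * np.eye(n)      # positive definite
p = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = A @ rng.random(n)            # guarantees a feasible x >= 0 exists
Qinv = np.linalg.inv(Q)
Qinv = (Qinv + Qinv.T) / 2       # symmetrize against round-off

# Primal QP.
x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, Q) + p @ x),
                    [A @ x == b, x >= 0])
primal.solve()

# Dual formulation (2).
lam = cp.Variable(m)
mu = cp.Variable(n, nonneg=True)
z = A.T @ lam + p - mu
dual = cp.Problem(cp.Maximize(-0.5 * cp.quad_form(z, Qinv) - lam @ b))
dual.solve()

print(primal.value, dual.value)  # the two optimal values should agree closely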
+ +
+ +
+
+ + + diff --git a/2023/09/10/make-use-of-openmp-via-cython-on-mac.html b/2023/09/10/make-use-of-openmp-via-cython-on-mac.html new file mode 100644 index 000000000..0e19ade12 --- /dev/null +++ b/2023/09/10/make-use-of-openmp-via-cython-on-mac.html @@ -0,0 +1,208 @@ + + + + + + + + +Make use of openmp via cython on macOS | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Make use of openmp via cython on macOS

+ dev/cython + +
+ +
+

Abstract

+ +

This post gives a concise example on how to use OpenMP in Cython on macOS.

+ +

Prerequisite

+ +

Install OpenMP.

+ +
brew install libomp
+
+ +

Install numpy (used in the example) and Cython.

+ +
conda install numpy cython
+
+ +

My Cython version is 3.0.0.

+ +

Example

+ +

In test.pyx, we implement the log-sum-exp trick in Cython.

+ +
from cython.parallel cimport prange
+from libc.math cimport exp, log, fmax
+cimport cython
+
+
+@cython.boundscheck(False)
+@cython.wraparound(False)
+cdef double c_max(
+    int N,
+    double *a,
+) nogil:
+    cdef int i
+    cdef double b = a[0]
+    for i in range(1, N):
+        b = fmax(b, a[i])
+    return b
+
+
+@cython.boundscheck(False)
+@cython.wraparound(False)
+cdef double c_logsumexp(
+    int N,
+    double *a,
+) nogil:
+    cdef int i
+    cdef double b = c_max(N, a)
+    cdef double x = 0.0
+    for i in prange(N):
+        x += exp(a[i] - b)
+    x = b + log(x)
+    return x
+
+
+def logsumexp(double [::1] a):
+    return c_logsumexp(a.shape[0], &a[0])
+
+ +

Note how to write the setup.py:

+ +
from setuptools import Extension, setup
+from Cython.Build import cythonize
+
+
+extensions = [
+    Extension(
+        'test',
+        sources=['test.pyx'],
+        extra_compile_args=['-Xpreprocessor', '-fopenmp'],
+        extra_link_args=['-lomp'],
+    ),
+]
+
+setup(
+    ext_modules=cythonize(extensions, language_level='3'),
+    zip_safe=False,
+)
+
+ +

The -Xpreprocessor flag is required for the OpenMP pragmas to be processed.

+ +

Build

+ +
python3 setup.py build_ext --inplace
+
+ +

After the build, ls -F output on my mac:

+ +
build/  setup.py  test.c  test.cpython-39-darwin.so*  test.pyx
+
+ +

Test

+ +
python3 -m timeit -s 'from scipy.special import logsumexp; import numpy as np; a = np.random.randn(1000)' 'logsumexp(a)'
+python3 -m timeit -s 'from test import logsumexp; import numpy as np; a = np.random.randn(1000)' 'logsumexp(a)'
+
+ +

The output:

+ +
10000 loops, best of 5: 32.1 usec per loop
+50000 loops, best of 5: 6.66 usec per loop
+
+ +
+ +
+ +
+
+ + + diff --git a/2023/09/24/verify-permutation-equivalence-of-multihead-attention-in-pytorch.html b/2023/09/24/verify-permutation-equivalence-of-multihead-attention-in-pytorch.html new file mode 100644 index 000000000..eec29869b --- /dev/null +++ b/2023/09/24/verify-permutation-equivalence-of-multihead-attention-in-pytorch.html @@ -0,0 +1,146 @@ + + + + + + + + +Verify permutation equivalence of Multi-Head Attention in PyTorch | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Verify permutation equivalence of Multi-Head Attention in PyTorch

+ dev/pytorch | machine learning + +
+ +
+

It’s well known that Multi-Head Attention is permutation equivalent (e.g. here). +Let’s verify it in PyTorch.

+ +
import torch
+from torch import nn
+
+batch_size = 16
+seq_length = 10
+embed_dim = 384
+n_heads = 8
+
+attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
+X = torch.rand(batch_size, seq_length, embed_dim)
+o = torch.randperm(seq_length)
+z1, _ = attn(X, X, X)
+z2, _ = attn(X[:, o], X[:, o], X[:, o])
+print(torch.allclose(z1[:, o], z2))
+
+ +

Almost certainly, it will print a False. +What’s going wrong? +It turns out that PyTorch uses torch.float32 by default. +Let’s increase the precision to torch.float64:

+ +
import torch
+from torch import nn
+
+batch_size = 16
+seq_length = 10
+embed_dim = 384
+n_heads = 8
+
+attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True).to(torch.float64)
+X = torch.rand(batch_size, seq_length, embed_dim, dtype=torch.float64)
+o = torch.randperm(seq_length)
+z1, _ = attn(X, X, X)
+z2, _ = attn(X[:, o], X[:, o], X[:, o])
+print(torch.allclose(z1[:, o], z2))
+
+ +

It should print True now.

+ +
+ +
+ +
+
+ + + diff --git a/2023/10/04/estimate-expectation-of-function-of-random-variable.html b/2023/10/04/estimate-expectation-of-function-of-random-variable.html new file mode 100644 index 000000000..1dc00ba59 --- /dev/null +++ b/2023/10/04/estimate-expectation-of-function-of-random-variable.html @@ -0,0 +1,456 @@ + + + + + + + + +Estimate the expectation of the function of a random variable | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Estimate the expectation of the function of a random variable

+ math/probability + +
+ +
+

First prepare the functions we’ll use later. +The implementations can be tested by py.test.

+ +
# %load expectation_of_function.py
+from functools import partial
+
+import numpy as np
+from scipy.special import logsumexp
+from scipy import stats
+
+
+def softmax(x):
+    b = np.max(x, axis=-1, keepdims=True)
+    z = np.exp(x - b)
+    return z / np.sum(z, axis=-1, keepdims=True)
+
+
+def softmax_jac(x):
+    s = softmax(x)
+    I = np.eye(x.shape[0])
+    return s[:, np.newaxis] * (I - s[np.newaxis, :])
+
+
+def test_softmax_jac():
+    n, d = 100, 100
+    X = np.random.randn(n, d)
+    I = np.eye(d)
+    for j in range(n):
+        s = softmax(X[j])
+        expected = np.empty((d, d))
+        for k in range(d):
+            expected[k] = s[k] * (I[k] - s)
+        assert np.allclose(softmax_jac(X[j]), expected)
+
+
+def softmax_hess(x):
+    s = softmax(x)
+    a1 = np.outer(s, s) - np.diag(s)
+    a2 = np.eye(x.shape[0]) - s[np.newaxis, :]
+    a3 = np.matmul(a2[:, :, np.newaxis], a2[:, np.newaxis, :])
+    a4 = a1[np.newaxis, :, :] + a3
+    return s[:, np.newaxis, np.newaxis] * a4
+
+
+def test_softmax_hess():
+    n, d = 100, 100
+    X = np.random.randn(n, d)
+    I = np.eye(d)
+    for j in range(n):
+        s = softmax(X[j])
+        expected = np.empty((d, d, d))
+        for k in range(d):
+            expected[k] = s[k] * (
+                np.outer(s, s) - np.diag(s) + np.outer(I[k] - s, I[k] - s))
+        assert np.allclose(softmax_hess(X[j]), expected)
+
+
+def logsoftmax(x):
+    return x - logsumexp(x, axis=-1, keepdims=True)
+
+
+def logsoftmax_jac(x):
+    s = softmax(x)
+    I = np.eye(x.shape[0])
+    return I - s[np.newaxis, :]
+
+
+def logsoftmax_hess(x):
+    s = softmax(x)
+    return (np.outer(s, s) - np.diag(s))[np.newaxis]
+
+
+# Deprecated
+def expectation_logsoftmax_approx_at_mu(mu, Sigma):
+    s = softmax(mu)
+    ls = logsoftmax(mu)
+    return ls + np.trace(np.matmul(np.outer(s, s) - np.diag(s), Sigma)) / 2
+
+
+def sigmoid(x):
+    z = np.where(x >= 0, np.exp(-x), np.exp(x))
+    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))
+
+
+def test_sigmoid():
+    n = 1000
+    x = np.random.randn(n)
+    expected = [sigmoid(x[j]) for j in range(n)]
+    assert np.allclose(sigmoid(x), expected)
+
+
+def sigmoid_jac(x):
+    z = sigmoid(x)
+    return z * (1 - z)
+
+
+def sigmoid_hess(x):
+    z = sigmoid(x)
+    return z * (1 - z) * (1 - 2 * z)
+
+
+def logsigmoid(x):
+    return -np.logaddexp(0, -x)
+
+
+def logsigmoid_jac(x):
+    return 1 - sigmoid(x)
+
+
+def logsigmoid_hess(x):
+    z = sigmoid(x)
+    return z * (z - 1)
+
+
+# pylint: disable=too-many-arguments
+def expectation_approx(mu, Sigma, a, fun, jac, hess):
+    f = fun(a)
+    J = jac(a)
+    H = hess(a)
+    d = mu - a
+    if f.ndim == 1:
+        a1 = f
+        a2 = np.dot(J, d)
+        a3 = np.ravel(
+            np.matmul(
+                np.matmul(d[np.newaxis, np.newaxis, :], H), d[np.newaxis, :,
+                                                              np.newaxis]))
+        a4 = np.trace(np.matmul(H, Sigma[np.newaxis]), axis1=1, axis2=2)
+        return a1 + a2 + (a3 + a4) / 2
+    a1 = f
+    a2 = np.dot(J, d)
+    a3 = np.dot(np.dot(d, H), d)
+    a4 = np.dot(H, Sigma)
+    if a4.ndim > 0:
+        a4 = np.trace(a4)
+    return a1 + a2 + (a3 + a4) / 2
+
+
+def test_expectation_approx():
+    n, d = 100, 100
+    mu = np.random.randn(n, d)
+    Sigma = np.random.randn(n, d, d)
+    Sigma = np.matmul(Sigma, np.transpose(Sigma, (0, 2, 1)))
+    for j in range(n):
+        actual = expectation_approx(mu[j], Sigma[j], mu[j], logsoftmax,
+                                    logsoftmax_jac, logsoftmax_hess)
+        expected = expectation_logsoftmax_approx_at_mu(mu[j], Sigma[j])
+        assert np.allclose(actual, expected)
+
+
+def expectation_MC(fun, rvs, n):
+    X = rvs(size=n)
+    return np.mean(fun(X), axis=0)
+
+
+def multivariate_normal_rvs(mean, cov):
+    return partial(stats.multivariate_normal.rvs, mean=mean, cov=cov)
+
+
+def gamma_rvs(a, b):
+    return partial(stats.gamma.rvs, a=a, scale=1 / b)
+
+
+def gamma_mean(a, b):
+    return a / b
+
+
+def gamma_mode(a, b):
+    return (a - 1) / b
+
+
+def gamma_cov(a, b):
+    return a / b**2
+
+
+def dirichlet_rvs(alpha):
+    return partial(stats.dirichlet.rvs, alpha=alpha)
+
+
+def dirichlet_mean(alpha):
+    alpha0 = np.sum(alpha)
+    return alpha / alpha0
+
+
+def dirichlet_mode(alpha):
+    K = alpha.shape[0]
+    alpha0 = np.sum(alpha)
+    return (alpha - 1) / (alpha0 - K)
+
+
+def dirichlet_cov(alpha):
+    K = alpha.shape[0]
+    alpha0 = np.sum(alpha)
+    return (np.eye(K) * alpha * alpha0 - np.outer(alpha, alpha)) / (
+        alpha0**2 * (alpha0 + 1))
+
+ +
from matplotlib import pyplot as plt
+
+ +

We’d like to estimate $\mathbb E_{\boldsymbol x \sim p_X(\boldsymbol x)}[f(\boldsymbol x)]$. +The idea is to approximate the expectation by the 2nd-order Taylor expansion.

+ +

Assume that the Taylor series is expanded at $\boldsymbol x = \boldsymbol a$:

+ +\[\begin{aligned} + f(\boldsymbol x) &= f(\boldsymbol a) + \nabla f(\boldsymbol a)^\top(\boldsymbol x-\boldsymbol a) + \frac{1}{2}(\boldsymbol x-\boldsymbol a)^\top\mathbf H f(\boldsymbol a)(\boldsymbol x-\boldsymbol a)+R_2(\boldsymbol x)\\ + \mathbb E[f(\boldsymbol x)] &\approx f(\boldsymbol a) + \nabla f(\boldsymbol a)^\top (\boldsymbol\mu-\boldsymbol a) + \frac{1}{2}\big((\boldsymbol\mu-\boldsymbol a)^\top \mathbf H f(\boldsymbol a) (\boldsymbol\mu-\boldsymbol a) + \operatorname{tr}(\mathbf H f(\boldsymbol a) \boldsymbol\Sigma)\big)\,,\\ +\end{aligned}\] + +

with error bound (see definition here; and Little-o notation here):

+ +\[\begin{aligned} + R_2(\boldsymbol x) &\in o(\|\boldsymbol x-\boldsymbol a\|^2)\\ + \mathbb E[R_2(\boldsymbol x)] &\in o(\|\boldsymbol\mu-\boldsymbol a\|^2 + \operatorname{tr}(\boldsymbol\Sigma))\,.\\ +\end{aligned}\] + +

It seems that if the Taylor series is not expanded at the mean, the error bound will increase.

+ +

Give it a try on $\mathbb E_{x \sim \text{Exp}(\lambda)}[\log\operatorname{sigmoid}(x)]$, where $\text{Exp}(\lambda)$ is the exponential distribution, or Gamma distribution with parameter $a=1$. +The Monte Carlo result is taken as the groundtruth:

+ +
a, b = 1, 1
+expected = expectation_MC(logsigmoid, gamma_rvs(a, b), 100000)
+approx_at_mu = expectation_approx(gamma_mean(a, b), gamma_cov(a, b), gamma_mean(a, b),
+                                  logsigmoid, logsigmoid_jac, logsigmoid_hess)
+approx_at_mode = expectation_approx(gamma_mean(a, b), gamma_cov(a, b), gamma_mode(a, b),
+                                    logsigmoid, logsigmoid_jac, logsigmoid_hess)
+np.abs(approx_at_mu - expected), np.abs(approx_at_mode - expected)
+
+ +
(0.025421238663924095, 0.05700076508490559)
+
+ +

Okay, so we’d better expand the Taylor series at the mean.

+ +

So now the expectation approximation reduces to

+ +\[\mathbb E[f(\boldsymbol x)] \approx f(\boldsymbol\mu) + \frac{1}{2}\operatorname{tr}(\mathbf H f(\boldsymbol\mu) \boldsymbol\Sigma)\,,\] + +

by plugging in $\boldsymbol a=\boldsymbol\mu$, and with error bound

+ +\[R_2(\boldsymbol x) \in o(\operatorname{tr}(\boldsymbol\Sigma))\,.\] + +

We may now verify that the error is indeed positively related to the trace of the covariance. +Take the approximation of $\mathbb E_{\boldsymbol x \sim \mathcal N(\boldsymbol\mu,\boldsymbol\Sigma)}[\log\operatorname{softmax}(\boldsymbol x)]$ as an example, and again regard the Monte Carlo result as the groundtruth:

+ +
d = 50
+mu = np.random.randn(d)
+Sigma = np.random.randn(d, d)
+# make the covariance positive semi-definite
+Sigma = np.dot(Sigma.T, Sigma)
+
+expected = expectation_MC(logsoftmax, multivariate_normal_rvs(mu, Sigma), 100000)
+approx = expectation_approx(mu, Sigma, mu, logsoftmax, logsoftmax_jac, logsoftmax_hess)
+np.trace(Sigma), np.mean(np.abs(approx - expected))
+
+ +
(2534.8991641540433, 11.581681866513225)
+
+ +
Sigma /= 1000
+
+expected = expectation_MC(logsoftmax, multivariate_normal_rvs(mu, Sigma), 100000)
+approx = expectation_approx(mu, Sigma, mu, logsoftmax, logsoftmax_jac, logsoftmax_hess)
+np.trace(Sigma), np.mean(np.abs(approx - expected))
+
+ +
(2.5348991641540435, 0.0006679801955036791)
+
+ +

The mean error drops by roughly 17000 times as the trace decreases by 1000 times.

+ +

Now take $\mathbb E_{\boldsymbol x \sim \text{Dirichlet}(\boldsymbol\alpha)}[\log\operatorname{softmax}(\boldsymbol x)]$ as another example:

+ +
d = 5
+alpha = 6 / d * np.ones(d)
+mu = dirichlet_mean(alpha)
+Sigma = dirichlet_cov(alpha)
+
+expected = expectation_MC(logsoftmax, dirichlet_rvs(alpha), 100000)
+approx = expectation_approx(mu, Sigma, mu, logsoftmax, logsoftmax_jac, logsoftmax_hess)
+np.trace(Sigma), np.mean(np.abs(approx - expected))
+
+ +
(0.11428571428571428, 0.0005659672760450097)
+
+ +
d = 5
+alpha = 60 / d * np.ones(d)
+mu = dirichlet_mean(alpha)
+Sigma = dirichlet_cov(alpha)
+
+expected = expectation_MC(logsoftmax, dirichlet_rvs(alpha), 100000)
+approx = expectation_approx(mu, Sigma, mu, logsoftmax, logsoftmax_jac, logsoftmax_hess)
+np.trace(Sigma), np.mean(np.abs(approx - expected))
+
+ +
(0.013114754098360656, 0.0001473556430732881)
+
+ +

The mean error drops by roughly four times as the trace decreases by roughly nine times.

+ +

Hence, the error is certainly positively related to the trace of the covariance.

+ +

To conclude the notebook, assuming that the underlying distribution is multivariate Gaussian, let's see if the approximation conforms to intuition when $f$ is sigmoid or softmax, i.e. whether the expectation falls within the range of sigmoid or softmax.

+ +
mu = np.array(1.7)
+Sigma = np.logspace(-7, 2, 10)
+approxes = np.array([expectation_approx(mu, Sigma[j], mu, sigmoid, sigmoid_jac, sigmoid_hess)
+                     for j in range(Sigma.shape[0])])
+expected = np.array([expectation_MC(sigmoid, multivariate_normal_rvs(mu, Sigma[j]), 100000)
+                     for j in range(Sigma.shape[0])])
+fig, ax = plt.subplots()
+ax.plot(Sigma, approxes, marker='o', label='approximation')
+ax.plot(Sigma, expected, linestyle='--', color='red', label='groundtruth')
+ax.set_xlabel(r'$\operatorname{tr}(\Sigma)$')
+ax.set_xscale('log')
+ax.legend()
+ax.grid()
+
+ +

(figure: approximation vs. groundtruth of the sigmoid expectation, plotted against tr(Σ))

+ +

For sigmoid, after the trace of the covariance exceeds 1.0, the approximation starts to deviate from the groundtruth.

+ +
d = 384
+mu = np.random.randn(d)
+Sigma = np.random.randn(d, d)
+Sigma = np.dot(Sigma.T, Sigma)
+a = np.logspace(0, 7, 8)[::-1]
+approxes = np.stack([expectation_approx(mu, Sigma / a[j], mu, softmax, softmax_jac, softmax_hess)
+                     for j in range(a.shape[0])])
+expected = np.stack([expectation_MC(softmax, multivariate_normal_rvs(mu, Sigma / a[j]), 100000)
+                     for j in range(a.shape[0])])
+traces = np.trace(Sigma[np.newaxis] / a[:, np.newaxis, np.newaxis], axis1=1, axis2=2)
+fig, ax = plt.subplots()
+ax.plot(traces, np.mean(approxes, axis=1), color='blue', alpha=0.8, marker='o', label='approximation')
+ax.fill_between(traces, np.max(approxes, axis=1), np.min(approxes, axis=1), color='blue', alpha=0.2)
+ax.plot(traces, np.mean(expected, axis=1), color='red', alpha=0.8, linestyle='--', label='groundtruth')
+ax.fill_between(traces, np.max(expected, axis=1), np.min(expected, axis=1), color='red', alpha=0.2)
+ax.set_xlabel(r'$\operatorname{tr}(\Sigma)$')
+ax.set_xscale('log')
+ax.legend()
+ax.grid()
+
+ +

(figure: approximation vs. groundtruth of the softmax expectation (mean and range over components), plotted against tr(Σ))

+ +

For softmax, after the trace of the covariance exceeds 1000, the range of the expectation starts to be counterintuitive.

+ +
+ +
+ +
+
+ + + diff --git a/2023/10/06/dimensionality-reduction-by-svd.html b/2023/10/06/dimensionality-reduction-by-svd.html new file mode 100644 index 000000000..cfdeef114 --- /dev/null +++ b/2023/10/06/dimensionality-reduction-by-svd.html @@ -0,0 +1,152 @@ + + + + + + + + +Dimensionality reduction by SVD | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Dimensionality reduction by SVD

+ math/linear algebra + +
+ +
+

Let $\mathbf X \in \mathbb R^{m \times d}$ be data matrix where the $t$-th row is the $t$-th item in the dataset. +One may achieve dimensionality reduction from $\mathbf X$ to $\tilde{\mathbf X}$, by computing the SVD: $\mathbf X = \mathbf U \mathbf S \mathbf V^\top$, and let $\tilde{\mathbf X} = \mathbf U_k\mathbf S_k$, where $\mathbf U_k$ is the first $k$ columns of $\mathbf U$ and $\mathbf S_k$ is a diagonal matrix of the first $k$ diagonal elements of $\mathbf S$. +The idea behind this process can be viewed as either the classic MDS or PCA.

+ +

In classic MDS, one wants to maintain as much as possible the inner product matrix $\mathbf X\mathbf X^\top = \sum_{j=1}^r \sigma_j^2 \boldsymbol u_j \boldsymbol u_j^\top$ where $\sigma_j$’s have been sorted in descending order. +Clearly, one may perform low-rank approximation of $\mathbf X$ by $\tilde{\mathbf X} = \mathbf U_k \mathbf S_k$ such that $\mathbf X \mathbf X^\top \approx \tilde{\mathbf X}\tilde{\mathbf X}^\top$.

+ +

In PCA, one aims to find the orthonormal transformation matrix $\mathbf V_k$, which is the first $k$ columns of the eigenvectors of the covariance matrix $\mathbf X^\top \mathbf X$ (up to a constant) where the eigenvalues have been sorted in descending order, and then reaches the low-dimensional representation $\mathbf X \mathbf V_k$, which is identical to $\mathbf U_k \mathbf S_k$.
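A minimal numpy check of this identity (a sketch; note that $\mathbf X$ is not centered here, matching the setup above):

import numpy as np

X = np.random.randn(1000, 100)
U, S, VT = np.linalg.svd(X, full_matrices=False)
k = 10
# X @ V_k equals U_k @ diag(S_k); broadcasting S[:k] scales the columns of U[:, :k]
assert np.allclose(X @ VT[:k].T, U[:, :k] * S[:k])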

+ +

One point to note is that, if the data matrix is arranged as $\mathbf X' \in \mathbb R^{d \times m}$, where each column is a vector in the dataset, and we let $\mathbf X' = \mathbf U \mathbf S \mathbf V^\top$ instead, then the low-dimensional representation will be $\mathbf S_k \mathbf V_k^\top$. Let's derive this with PCA: the covariance matrix is now $\mathbf X' \mathbf X^{\prime\top}$ (up to a constant), so the transformed data matrix is $\mathbf U_k^\top \mathbf X'$, which by the SVD clearly equals $\mathbf S_k \mathbf V_k^\top$.

+ +

As a matter of fact, if we denote $\mathbf X=\mathbf U_1 \mathbf S_1 \mathbf V_1^\top$ and $\mathbf X^\top = \mathbf V_2 \mathbf S_2 \mathbf U_2^\top$, then it should turn out that $(\mathbf U_1 \mathbf S_1)^\top = \mathbf S_2 \mathbf U_2^\top$. +Let’s write Python3 code to verify this:

+ +
import numpy as np
+
+X = np.random.randn(1000, 100)
+
+U1, S1, VT1 = np.linalg.svd(X, full_matrices=False)
+V2, S2, UT2 = np.linalg.svd(X.T, full_matrices=False)
+assert np.allclose((U1 @ np.diag(S1)).T, np.diag(S2) @ UT2)
+
+ +

It could occur that the assertion fails. The reason is that the SVD is unique only up to simultaneous sign flips of the singular vectors: given a diagonal matrix $\mathbf Q$ whose diagonal elements are either $1$ or $-1$, and given an SVD $\mathbf X = \mathbf U \mathbf S \mathbf V^\top$, $\mathbf X = (\mathbf U\mathbf Q) \mathbf S (\mathbf V\mathbf Q)^\top$ is also a valid SVD for any such $\mathbf Q$. We need to take this case into account. Rewriting the code as:

+ +
import numpy as np
+
+X = np.random.randn(1000, 100)
+
+U1, S1, VT1 = np.linalg.svd(X, full_matrices=False)
+V2, S2, UT2 = np.linalg.svd(X.T, full_matrices=False)
+# assert np.allclose((U1 @ np.diag(S1)).T, np.diag(S2) @ UT2)
+
+q = np.mean(U1 / UT2.T, axis=0)
+assert np.allclose(q, U1 / UT2.T)
+Q = np.diag(q)
+assert np.allclose((U1 @ np.diag(S1)).T, np.diag(S2) @ Q @ UT2)
+
+ +

We should now pass the assertion.

+ +
+ +
+ +
+
+ + + diff --git a/2023/11/03/map-estimation-cov-gmm.html b/2023/11/03/map-estimation-cov-gmm.html new file mode 100644 index 000000000..06fd98c29 --- /dev/null +++ b/2023/11/03/map-estimation-cov-gmm.html @@ -0,0 +1,169 @@ + + + + + + + + +Maximum a posteriori estimation of the covariance in Gaussian Mixture models | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Maximum a posteriori estimation of the covariance in Gaussian Mixture models

+ math/probability + +
+ +
+
+plate diagram of a GMM +

plate diagram of a Gaussian Mixture model

+
+ +

The goal is to maximize $\log P(\boldsymbol\Sigma \mid \mathbf X, \boldsymbol\mu,\boldsymbol\alpha)$. $T$ is the number of samples. $K$ is the number of Gaussian components. The dataset is $\mathbf X \triangleq \{\boldsymbol x_t\}_{t=1}^T$.

+ +

By expectation-maximization paradigm:

\[\begin{aligned}
    \log P(\boldsymbol\Sigma \mid \mathbf X, \boldsymbol\mu,\boldsymbol\alpha)
    &= \log P(\mathbf X \mid \boldsymbol\Sigma,\boldsymbol\mu,\boldsymbol\alpha) + \log P(\boldsymbol\Sigma) + \text{constant}\\
    &= \sum_{t=1}^T \log P(\boldsymbol x_t \mid \boldsymbol\Sigma,\boldsymbol\mu,\boldsymbol\alpha) + \log P(\boldsymbol\Sigma) + \text{constant}\\
    &= \sum_{t=1}^T \log \sum_{k=1}^K P(\boldsymbol x_t,Z_t=k \mid \boldsymbol\Sigma,\boldsymbol\mu,\boldsymbol\alpha) + \sum_{k=1}^K \log P(\boldsymbol\Sigma_k) + \text{constant}\\
    &\ge \sum_{t=1}^T \sum_{k=1}^K P(Z_t=k \mid \mathbf X) \log \left[ \frac{P(\boldsymbol x_t, Z_t=k \mid \boldsymbol\Sigma,\boldsymbol\mu,\boldsymbol\alpha)}{P(Z_t=k \mid \mathbf X)} \right] + \sum_{k=1}^K \log P(\boldsymbol\Sigma_k)\,.\\
\end{aligned}\]

It’s straightforward to compute $P(Z_t=k \mid \mathbf X)$ using Bayes law at E-step. +Denote it as $r_{tk}$. +The part we need to maximize at M-step is:

+ +\[Q(\boldsymbol\Sigma,\boldsymbol\mu,\boldsymbol\alpha) += \sum_{t=1}^T \sum_{k=1}^K r_{tk} [\log P(Z_t=k \mid \boldsymbol\alpha) + \log \mathcal N(\boldsymbol x_t \mid \boldsymbol\Sigma_k,\boldsymbol\mu_k)] + \sum_{k=1}^K \log P(\boldsymbol\Sigma_k)\,.\] + +

Noting that

+ +\[\log\mathcal N(\boldsymbol x_t \mid \boldsymbol\Sigma_k,\boldsymbol\mu_k) += \frac{1}{2}\log\det \boldsymbol\Sigma_k^{-1} - \frac{1}{2}(\boldsymbol x_t-\boldsymbol\mu_k)^\top \boldsymbol\Sigma_k^{-1} (\boldsymbol x_t-\boldsymbol\mu_k) + \text{constant}\,,\] + +

that we'll not focus on the MLE of $\boldsymbol\alpha$ (by a Lagrange multiplier) and $\boldsymbol\mu$, and that the optimization for different $k$'s is independent, we may further simplify the objective to

+ +\[Q(\boldsymbol\Sigma_k) = \sum_{t=1}^T r_{tk} \left[ \frac{1}{2}\log\det \boldsymbol\Sigma_k^{-1} - \frac{1}{2}(\boldsymbol x_t-\boldsymbol\mu_k)^\top \boldsymbol\Sigma_k^{-1}(\boldsymbol x_t-\boldsymbol\mu_k) \right] + P(\boldsymbol\Sigma_k)\,.\] + +

Using properties of the trace operator,

+ +\[Q(\boldsymbol\Sigma_k) = \frac{1}{2}\sum_{t=1}^T r_{tk} [\log\det \boldsymbol\Sigma_k^{-1} - \operatorname{tr}(\mathbf S_{tk} \boldsymbol\Sigma_k^{-1})] + P(\boldsymbol\Sigma_k)\,,\] + +

where $\mathbf S_{tk} \triangleq (\boldsymbol x_t-\boldsymbol\mu_k)(\boldsymbol x_t-\boldsymbol\mu_k)^\top$. For the conjugate prior $P(\boldsymbol\Sigma_k)$, we choose the inverse Wishart distribution:

+ +\[\operatorname{IW}(\boldsymbol\Sigma_k \mid \mathbf S_0^{-1},\nu_0) \propto (\det \boldsymbol\Sigma_k)^{-N_0/2}\exp\left(-\frac{1}{2}\operatorname{tr}(\mathbf S_0 \boldsymbol\Sigma_k^{-1})\right)\,,\] + +

where $N_0 \triangleq \nu_0 + D + 1$, and $D$ is the dimension of $\boldsymbol x_t$. Thus,

+ +\[Q(\boldsymbol\Sigma_k) = \frac{1}{2}\sum_{t=1}^T r_{tk} [\log\det \boldsymbol\Sigma_k^{-1} - \operatorname{tr}(\mathbf S_{tk} \boldsymbol\Sigma_k^{-1})] + \frac{1}{2}[N_0 \log\det \boldsymbol\Sigma_k^{-1} - \operatorname{tr}(\mathbf S_0 \boldsymbol\Sigma_k^{-1})]\,.\] + +

Computing the partial derivative of $Q$ with respect to $\boldsymbol\Sigma_k^{-1}$ and setting it to zero, we have:

+ +\[\begin{aligned} + 0 &= \frac{\partial Q}{\partial \boldsymbol\Sigma_k^{-1}}\\ + &= \frac{1}{2}\sum_{t=1}^T r_{tk} (\boldsymbol\Sigma_k^\top - \mathbf S_{tk}^\top) + \frac{1}{2} (N_0 \boldsymbol\Sigma_k^\top - \mathbf S_0^\top)\\ + &= \frac{1}{2} \sum_{t=1}^T r_{tk} (\boldsymbol\Sigma_k-\mathbf S_{tk}) + \frac{1}{2}(N_0 \boldsymbol\Sigma_k-\mathbf S_0)\\ + \boldsymbol\Sigma_k &= \frac{\mathbf S_0 + \sum_{t=1}^T r_{tk} \mathbf S_{tk}}{N_0 + \sum_{t=1}^T r_{tk}}\,.\\ +\end{aligned}\] + +
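As a sketch of the resulting M-step in code (assumed names: X is the (T, D) data matrix, r the (T, K) responsibility matrix from the E-step, mu the (K, D) component means, and S0, nu0 the inverse-Wishart hyperparameters):

import numpy as np

def m_step_cov(X, r, mu, S0, nu0):
    # MAP update of each component covariance under an IW(S0^{-1}, nu0) prior
    T, D = X.shape
    K = mu.shape[0]
    N0 = nu0 + D + 1
    Sigma = np.empty((K, D, D))
    for k in range(K):
        diff = X - mu[k]                                      # (T, D)
        S_k = np.einsum('t,ti,tj->ij', r[:, k], diff, diff)   # sum_t r_tk S_tk
        Sigma[k] = (S0 + S_k) / (N0 + r[:, k].sum())
    return Sigma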

Further reading:

+ +

Section 4.6.2 and 11.4.2.8 of: +Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.

+ +
+ +
+ +
+
+ + + diff --git a/2023/11/28/toss-coin.html b/2023/11/28/toss-coin.html new file mode 100644 index 000000000..ad52139db --- /dev/null +++ b/2023/11/28/toss-coin.html @@ -0,0 +1,274 @@ + + + + + + + + +Estimate the head probability of a coin | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Estimate the head probability of a coin

+ math/probability + +
+ +
+

The problem

+ +

Toss a coin $T$ times. +Let $X=\{x_1,\dots,x_T\} \in \{+,-\}^T$ be the result. +Let $N_+ = \sum_{t=1}^T\mathbb I(x_t=+)$, $N_- = \sum_{t=1}^T\mathbb I(x_t=-)$. +Let $P(x=+ \mid \theta)$ be the probability that the coin shows head in a toss. +How to estimate $\theta$ from $X$?

+ +

MLE

+ +\[\arg\max \log P(X \mid \theta) = \arg\max \big(N_+\log\theta + N_-\log(1-\theta)\big)\,.\] + +

Taking derivative w.r.t. $\theta$ and letting it equal to zero yields

+ +\[\theta = \frac{N_+}{N_+ + N_-}\,.\] + +

As can be easily observed, it overfits when there's not enough data (e.g. when $N_+=6$, $N_-=0$).

+ +

MAP

+ +

Apply a beta prior $P(\theta \mid a, b) = \mathrm{Beta}(\theta \mid a, b)$. +Set $a = b = 2$ so that it’s proper.

+ +\[\arg\max \log P(\theta \mid X, a, b) = \arg\max \big(\log P(\theta \mid a, b) + \log P(X \mid \theta)\big)\,.\] + +

Similarly, this yields

+ +\[\theta = \frac{N_+ + a - 1}{N_+ + N_- + a + b - 2} = \frac{N_+ + 1}{N_+ + N_- + 2}\,.\] + +

This is also called Laplace smoothing.
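For instance, with $N_+=6$ and $N_-=0$ as before, MLE gives $\theta = 1$, whereas MAP with $a=b=2$ gives

\[\theta = \frac{6 + 1}{6 + 0 + 2} = \frac{7}{8}\,,\]

which no longer claims that the coin can never show tail.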

+ +

Full Bayesian

+ +

Apply a prior $P(\theta \mid a, b) = \mathrm{Beta}(\theta \mid a, b)$, and find the posterior:

+ +\[P(\theta \mid X, a, b) = \frac{P(\theta \mid a, b) P(X \mid \theta)}{\int_0^1 P(\theta \mid a, b) P(X \mid \theta) \mathrm d \theta}\,.\] + +

To address the integral, notice that

+ +\[\int_0^1 x^\alpha (1-x)^\beta \mathrm dx = B(\alpha+1,\beta+1) = \frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}\,,\] + +

where $B(\cdot,\cdot)$ is the beta function, and $\Gamma(\cdot)$ is the gamma function. +Therefore,

+ +\[P(\theta \mid X, a, b) = \mathrm{Beta}(\theta \mid N_+ + a, N_- + b)\,.\] + +

Now it’s straightforward to estimate the uncertainty in $\theta$ given $a$ and $b$.
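For example (a sketch assuming scipy), with the running numbers $N_+=6$, $N_-=0$ and the prior $a=b=2$, the posterior is $\mathrm{Beta}(8, 2)$, and its spread can be queried directly:

from scipy import stats

posterior = stats.beta(6 + 2, 0 + 2)   # Beta(N+ + a, N- + b)
print(posterior.mean())                # posterior mean of theta
print(posterior.interval(0.95))        # a 95% credible interval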

+ +

Empirical Bayes

+ +

Here we abuse the term “empirical Bayes” since it originally refers to a graphical model like this:

+ +

empirical bayes

+ +

Whereas the model we are using is like this:

+ +

coin model

+ +

Since the derivation is similar (use of EM), we’ll continue with that notation.

+ +

Again, apply a beta prior $P(\theta \mid a, b)$, but now we regard $(a,b)$ as unknown parameters. +By EM, the auxiliary function $Q$ is,

+ +\[\log P(X \mid a, b) \ge Q(P, \tilde P) = \int_0^1 P(\theta \mid X, a^{(t-1)}, b^{(t-1)}) \log \tilde P(X, \theta \mid a, b) \mathrm d \theta\,,\] + +

where at E-step, we have already computed $P(\theta \mid X, a^{(t-1)}, b^{(t-1)})$. +Factorizing the logarithm,

+ +\[\log \tilde P(X,\theta \mid a,b) = \log \tilde P(\theta \mid a,b) + \log \tilde P(X \mid \theta)\,,\] + +

we notice that the second term on the r.h.s. does not depend on $a,b$. Therefore, we need only to optimize over the first term. So now the auxiliary function reduces to

\[\begin{aligned}
Q(P, \tilde P)
&= \int_0^1 P(\theta \mid X, a^{(t-1)},b^{(t-1)}) \log \tilde P(\theta \mid a, b)\,\mathrm d\theta\\
&= \int_0^1 \mathrm{Beta}(\theta \mid N_+ + a^{(t-1)}, N_- + b^{(t-1)}) \log \mathrm{Beta}(\theta \mid a, b)\,\mathrm d\theta\,.\\
\end{aligned}\]

Taking partial derivative w.r.t. $a$ on both sides:

+ +\[\begin{aligned} +\frac{\partial Q}{\partial a} +&= \frac{\partial}{\partial a} \int_0^1 \mathrm{Beta}(\theta \mid N_++a^{(t-1)},N_-+b^{(t-1)}) \log \mathrm{Beta}(\theta \mid a,b)\\ +&= \frac{\partial}{\partial a}\int_0^1 \mathrm{Beta}(\theta \mid N_++a^{(t-1)},N_-+b^{(t-1)}) [(a-1)\log\theta + (b-1)\log(1-\theta) - \log B(a,b)] \mathrm d \theta\\ +&= \int_0^1 \mathrm{Beta}(\theta \mid N_++a^{(t-1)},N_-+b^{(t-1)}) \frac{\partial}{\partial a} [(a-1)\log\theta + (b-1)\log(1-\theta) - \log B(a,b)] \mathrm d \theta\\ +&= \int_0^1 \mathrm{Beta}(\theta \mid N_++a^{(t-1)},N_-+b^{(t-1)}) \left[\log\theta - \frac{\partial}{\partial a}\log B(a,b)\right] \mathrm d\theta\\ +&= \int_0^1 \mathrm{Beta}(\theta \mid N_++a^{(t-1)},N_-+b^{(t-1)}) \log\theta \,\mathrm d\theta - \frac{\partial}{\partial a}\log B(a,b)\int_0^1 \mathrm{Beta}(\theta \mid N_++a^{(t-1)},N_-+b^{(t-1)}) \,\mathrm d\theta\\ +&= \frac{1}{B(N_++a^{(t-1)},N_-+b^{(t-1)})} \int_0^1 \theta^{N_++a^{(t-1)}-1} (1-\theta)^{N_-+b^{(t-1)}-1} \log\theta \,\mathrm d\theta - \frac{\partial}{\partial a} \log B(a,b)\,.\\ +\end{aligned}\] + +

Notice that

+ +\[\int_0^1 x^{\alpha-1} (1-x)^{\beta-1} \log x \,\mathrm d x = B(\alpha,\beta)(\psi(\alpha)-\psi(\alpha+\beta))\,,\] + +

where $\psi(x) \triangleq \frac{\partial}{\partial x}\log\Gamma(x)$. We may use the same notation $\psi$ to expand the log-derivative of the beta function. Thus,

+ +\[\frac{\partial Q}{\partial a} = \psi(N_++a^{(t-1)})-\psi(N_++a^{(t-1)}+N_-+b^{(t-1)}) - (\psi(a) - \psi(a+b))\,.\] + +

Similarly,

+ +\[\frac{\partial Q}{\partial b} = \psi(N_-+b^{(t-1)}) - \psi(N_++a^{(t-1)}+N_-+b^{(t-1)}) - (\psi(b)-\psi(a+b))\,.\] + +

Setting initial value $a^{(0)}=b^{(0)}=1$, we may find optimal solution for $a$ and $b$.

+ +

BUT REALLY, here we may compute directly $\log P(X \mid a, b)$ due to the conjugate beta prior!

+ +

It turns out that

+ +\[L(a,b) \triangleq \log P(X \mid a, b) = \log\frac{B(N_++a,N_-+b)}{B(a,b)}\,.\] + +

Hence,

+ +\[\begin{cases} +\frac{\partial L}{\partial a} = \psi(N_++a) + \psi(a+b) - \psi(a) - \psi(N_++N_-+a+b)\\ +\frac{\partial L}{\partial b} = \psi(N_-+b) + \psi(a+b) - \psi(b) - \psi(N_++N_-+a+b)\\ +\end{cases}\] + +

Coding time:

+ +
import numpy as np
+from scipy.special import digamma, betaln
+from scipy.optimize import minimize
+
+# the negative marginal log-likelihood L(a, b) (negated because scipy minimizes)
+def fun(x, p, n):
+    a, b = x
+    # the minus sign is because we are doing gradient descent (not ascent)
+    return -(betaln(p + a, n + b) - betaln(a, b))
+
+def jac(x, p, n):
+    a, b = x
+    # the minus sign is because we are doing gradient descent (not ascent)
+    ja = -(digamma(p + a) + digamma(a + b) - digamma(a)
+           - digamma(p + n + a + b))
+    jb = -(digamma(n + b) + digamma(a + b) - digamma(b)
+           - digamma(p + n + a + b))
+    return np.array([ja, jb])
+
+# Suppose N+ = 6 and N- = 0:
+print(minimize(fun, np.ones(2), args=(6, 0), method='L-BFGS-B', jac=jac,
+               bounds=[(1e-10, None), (1e-10, None)]))
+
+ +

The optimization result is:

+ +
message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
+  success: True
+   status: 0
+      fun: 1.6934720292738348e-10
+        x: [ 1.817e+00  1.000e-10]
+      nit: 2
+      jac: [-5.917e-11  1.693e+00]
+     nfev: 3
+     njev: 3
+ hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
+
+ +

From the result, the mean of the prior distribution goes to 1.0 (from the left), the mode does not exist, and the density at 1.0 goes to infinity. Such a prior will drive $\theta$ to 1. We observe that the model has severely overfit, exactly the same as if we were using simple MLE.

+ +

Conclusion

+ +

In conclusion, a data-driven approach to setting hyperparameters (e.g. empirical Bayes), at least in this example, works only when there are enough well-sampled data.

+ +
+ +
+ +
+
+ + + diff --git a/2024/01/05/type-assertion-numba-trick.html b/2024/01/05/type-assertion-numba-trick.html new file mode 100644 index 000000000..2c8659093 --- /dev/null +++ b/2024/01/05/type-assertion-numba-trick.html @@ -0,0 +1,122 @@ + + + + + + + + +Assert variable types in numba | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Assert variable types in numba

+ dev/python + +
+ +
+

To assert that a variable is of a specific type, e.g., float32[:], one may apply this trick that makes use of a numba signature:

+ +
import numba as nb
+
+
+# Define an auxiliary function that admits only the type you
+# want to assert, e.g. float32[:]
+assert_f32_1d = nb.njit(nb.none(nb.float32[:]))(lambda x: None)
+
+def function_to_debug_type(x, y, z):
+    ...
+    some_variable = ...
+    ...
+    # If `some_variable` is not of type float32[:], numba will
+    # point it out.
+    assert_f32_1d(some_variable)
+
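For instance (hypothetical arrays just for illustration), a call that matches the signature passes silently, while a mismatched one should be rejected by numba's dispatcher:

import numpy as np

assert_f32_1d(np.zeros(3, dtype=np.float32))    # OK: matches float32[:]
# assert_f32_1d(np.zeros(3, dtype=np.float64))  # should raise TypeError: no matching definition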
+ +
+ +
+ +
+
+ + + diff --git a/2024/01/26/attempt-fully-differentiable-nnomp-alternative.html b/2024/01/26/attempt-fully-differentiable-nnomp-alternative.html new file mode 100644 index 000000000..78bfb92d8 --- /dev/null +++ b/2024/01/26/attempt-fully-differentiable-nnomp-alternative.html @@ -0,0 +1,174 @@ + + + + + + + + +An attempt to build fully differentiable alternative of (non-negative) matching pursuit algorithm for solving L0-sparsity dictionary learning | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

An attempt to build fully differentiable alternative of (non-negative) matching pursuit algorithm for solving L0-sparsity dictionary learning

+ machine learning/dictionary learning + +
+ +
+

Introduction

+ +

In sparse dictionary learning, sparse coding and dictionary update are solved in an alternating manner (Aharon et al., 2006). In the sparse coding stage, the following problem is solved given the dictionary $\mathbf D \in \mathbb R^{d \times n}$ and signals $y_j \in \mathbb R^d$:

\[\min_{x_j}\ \|y_j-\mathbf D x_j\|_2^2 \quad \text{s.t. } \|x_j\|_0 \le K\,,\tag{1}\]

where $K$ is the sparsity. Sometimes, there's an additional constraint $x_j \succeq 0$ if non-negative sparse coding is required. Since the $L_0$ constraint is intractable to optimize exactly, either approximate greedy algorithms like (non-negative) orthogonal matching pursuit (Cai & Wang, 2011; Yaghoobi et al., 2015; Nguyen et al., 2019), or relaxations of $L_0$ to $L_1$ sparsity as in (non-negative) basis pursuit (Chen & Donoho, 1994; Gregor & LeCun, 2010; Zhang et al., 2018; Tolooshams & Ba, 2022), are regarded as idiomatic solutions.

+ +

Proposed method

+ +

(Louizos et al., 2018) suggests a novel approach to handle the intractability of $L_0$ constraint. +Instead of tackling the $L_0$ constraint directly, the authors address the expectation of the $L_0$ norms by introducing Bernoulli random variables. +In the parlance of the sparse coding problem (1),

+ +\[\min_{x_j,\pi_j}\ \mathbb E_{q(z_j \mid \pi_j)}\left[\|y_j - \mathbf D (x_j' \odot z_j)\|_2^2\right] \quad \text{s.t. } \mathbf 1^\top \pi_j \le K\,,\tag{2}\] + +

where $x_j$ has been reparameterized as $x_j' \odot z_j$, and for each $i$, $z_{ji} \sim \mathrm{Bernoulli}(\pi_{ji})$, $x_{ji}' \in \mathbb R$; the symbol $\odot$ denotes elementwise product. Note that Equation (2) can be trivially extended to the non-negative sparse coding case by the reparameterization $x_j := \exp(x_j') \odot z_j$ or $x_j := \mathrm{softplus}(x_j') \odot z_j$, where $\mathrm{softplus}(\cdot) = \log(1 + \exp(\cdot))$. (Louizos et al., 2018) further introduces a smoother on the discrete random variable $z_j$ to allow for the reparameterization trick (Kingma & Welling, 2014; Rezende et al., 2014), and the expectation in Equation (2) can be estimated by Monte Carlo sampling.

+ +

To solve the constrained minimization in Equation (2), it’s natural to proceed using Lagrangian multiplier and optimize under bound constraint only:

+ +\[\min_{x_j,\pi_j}\max_{\lambda_j \ge 0}\ \mathbb E_{q(z_j \mid \pi_j)}\left[\|y_j - \mathbf D (x_j' \odot z_j)\|_2^2\right] + \lambda_j(\mathbf 1^\top \pi_j - K)\,.\tag{3}\] + +

On the one hand, one may optimize $x_j,\pi_j,\lambda_j$ jointly via gradient descent. However, it's worth noting that one must perform gradient ascent on $\lambda_j$, which can be achieved by negating its gradient before the descent step. On the other hand, dual gradient ascent can be adopted: given fixed $\lambda_j$, the objective (3) is minimized to a critical point; then, given fixed $x_j$ and $\pi_j$, $\lambda_j$ is updated with one-step gradient ascent; finally, iterate.
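Concretely, in the dual variant the one-step multiplier update would be a projected gradient ascent step on (3), with some step size $\eta$:

\[\lambda_j \leftarrow \max\big(0,\ \lambda_j + \eta\,(\mathbf 1^\top \pi_j - K)\big)\,.\]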

+ +

In practice, potentially a great number of signals are required to be sparse coded given the dictionary:

+ +\[\min_{\boldsymbol x,\boldsymbol\pi}\max_{\boldsymbol\lambda \succeq 0}\ \sum_{j=1}^m \left\{\mathbb E_{q(z_j \mid \pi_j)}\left[\|y_j - \mathbf D (x_j' \odot z_j)\|_2^2\right] + \lambda_j(\mathbf 1^\top \pi_j - K)\right\}\,.\tag{4}\] + +

It's not uncommon that all the variables to optimize, especially $\{x_j,\pi_j\}_{j=1}^m$, cannot fit into memory at once, making plain gradient descent infeasible. Notice that for each $j$, the optimal solution $(x_j^\ast,\pi_j^\ast)$ is related to $(y_j,\lambda_j)$; that is, $x_j^\ast = x(y_j,\lambda_j)$, $\pi_j^\ast = \pi(y_j,\lambda_j)$. Therefore, I propose to perform amortized inference: to use a neural network $f$ parameterized by $\boldsymbol\phi$ that takes as input $(y_j,\lambda_j)$ to predict $x_j$ and $\pi_j$. I found the use of ReLU activation in such a network promotes training the most. The objective (4) now becomes:

+ +\[\min_{\boldsymbol\phi} \max_{\boldsymbol\lambda \succeq 0}\ \sum_{j=1}^m \left\{\mathbb E_{q(z_j \mid \boldsymbol\phi)} \left[\|y_j - \mathbf D (f_x(y_j,\lambda_j;\boldsymbol\phi) \odot z_j)\|_2^2\right] + \lambda_j (\mathbf 1^\top f_\pi(y_j,\lambda_j;\boldsymbol\phi) - K)\right\}\,.\tag{5}\] + +

With dictionary learning, the dictionary needs to be learned as well. Using the objective (5), I found it preferable to optimize using the procedure below:

+ +
    +
  1. Given $\boldsymbol\lambda$, reinitialize $\boldsymbol\phi$, and jointly learn $\boldsymbol\phi$ and $\mathbf D$ until stationary point.
  2. +
  3. Given $\boldsymbol\phi$ and $\mathbf D$, perform one-step gradient ascent on $\boldsymbol\lambda$.
  4. +
  5. Iterate.
  6. +
+ +

I found the reinitialization step on the amortized network critically important. +Without it, the network tends to predict all-zero and eventually learns nothing. +However, the dictionary needs to be initialized only at the very beginning.

+ +

Experiments

+ +

For dictionary learning without non-negativity constraint on sparse coding, I compared against (Rubinstein et al., 2008) in image denoising. +My proposed fully differentiable solution converges slower and denoises poorer than K-SVD supported by batch OMP.

+ +

For dictionary learning with non-negative constraint on sparse coding, I compare against (Nguyen et al., 2019) in exploration of atoms of discourse, which is known to admit a non-negative sparse coding form (Arora et al., 2018). +While being faster, my proposed method still performs worse than non-negative OMP, in that the learned dictionary atoms are mostly not atoms of discourse.

+ +

Hence, this is the main reason why I record my attempt here in a post rather than in a paper. Perhaps the proposed method is promising, but it's not well-prepared yet.

+ + +
+ +
+ +
+
+ + + diff --git a/2024/02/01/make-faded-color-wallpaper-for-mac.html b/2024/02/01/make-faded-color-wallpaper-for-mac.html new file mode 100644 index 000000000..37e0ed326 --- /dev/null +++ b/2024/02/01/make-faded-color-wallpaper-for-mac.html @@ -0,0 +1,235 @@ + + + + + + + + +使用 matplotlib 制作用于 macOS 的渐变色桌面 | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Making a faded-color wallpaper for macOS with matplotlib

+ os/macOS | misc | dev/python + +
+ +
+

Recently I've taken a liking to solid-color wallpapers, which look clean and tidy. However, I found a problem: the Dock becomes hard to make out against some colors. After some experimentation, the Dock is clearest against black, but a pure black wallpaper doesn't look very nice to me. What I want is a gradient wallpaper that is mostly some color of my choosing and fades to black from near the bottom down to the bottom edge, making the Dock easier to see. macOS doesn't provide such a wallpaper, so I decided to draw one myself with Python's matplotlib.

+ +

The idea is:

+ +
    +
  1. Use system_profiler SPDisplaysDataType | grep Resolution to get the screen width and height in pixels;
  2. +
  3. Use matplotlib.pyplot.cm.colors.LinearSegmentedColormap to build a colormap that fades from the color I want to black;
  4. +
  5. Construct a matrix whose width and height come from step 1 and whose colormap comes from step 2, such that its colors follow the gradient described above;
  6. +
  7. Save it as an image.
  8. +
+ +

The main trouble was in step 4. I removed the axes first and thought that would be enough, but then found that the saved figure always had a white border around it that I just couldn't get rid of (I tried several comments under this question). In the end I adopted the approach from this answer, although I don't really understand why it works 😅. Anyway, let's call the problem solved.

+ +

The complete code is as follows:

+ +
#!/usr/bin/env python3
+import argparse
+from pathlib import Path
+import subprocess
+import re
+
+import numpy as np
+import matplotlib
+
+matplotlib.use('Agg')
+from matplotlib import pyplot as plt
+
+
+def generate_wallpaper(
+    name: str,
+    primary_color_rgb,
+    start_fade_position: float,
+    force_save: bool,
+):
+    """
+    Save faded color as wallpaper.
+
+    :param name: the name to save
+    :param primary_color_rgb: the RGB 3-tuple of uint8 value range
+    :param start_fade_position: the position to start fading
+    :param force_save: ``True`` to overwrite existing files
+    """
+    whs = []
+
+    # Step 1: get the screen width and height
+    proc = subprocess.run(['system_profiler', 'SPDisplaysDataType'],
+                          text=True,
+                          capture_output=True,
+                          check=True)
+    for line in re.findall(r'(.*)\n', proc.stdout):
+        m = re.search(r'Resolution: (\d+) x (\d+)', line)
+        if m:
+            whs.append((int(m.group(1)), int(m.group(2))))
+
+    # Step 2: build the faded-color colormap
+    colors = [np.asarray(primary_color_rgb) / 255, np.zeros(3)]
+    cmap = plt.cm.colors.LinearSegmentedColormap.from_list(
+        'colormap', colors, N=256)
+    for j, (w, h) in enumerate(whs, 1):
+        # Step 3: construct the matrix
+        image = np.zeros((h, w))
+        start = int(h * start_fade_position)
+        steps = h - start
+        # use linspace to build the gradient
+        image[start:] = np.linspace(0, 1, steps)[:, np.newaxis]
+
+        # this is the part that works for reasons I don't quite understand
+        sizes = image.shape[::-1]
+        fig = plt.figure()
+        fig.set_size_inches(1. * sizes[0] / sizes[1], 1, forward = False)
+        ax = plt.Axes(fig, [0., 0., 1., 1.])
+        ax.set_axis_off()
+        fig.add_axes(ax)
+        ax.imshow(image, cmap=cmap, aspect='auto')
+
+        # save as an image
+        tofile = Path(f'{name}_{j}.jpg')
+        if not force_save and tofile.exists():
+            raise FileExistsError
+        fig.savefig(tofile, dpi=sizes[0])
+        plt.close(fig)
+
+
+# some command-line arguments
+def make_parser():
+    parser = argparse.ArgumentParser(
+        description='Generate faded color wallpaper.')
+    parser.add_argument(
+        '-n', '--name', help='default to "wallpaper"', default='wallpaper')
+    parser.add_argument(
+        '-c',
+        '--color',
+        nargs=3,
+        metavar=('R', 'G', 'B'),
+        type=int,
+        help='default to black',
+        default=[0, 0, 0])
+    parser.add_argument(
+        '-p',
+        '--fade-start-position',
+        type=float,
+        help='default to 0.0',
+        default=0.0)
+    parser.add_argument(
+        '-f',
+        '--force',
+        action='store_true',
+        help='force overwrite existing files')
+    return parser
+
+
+def main():
+    args = make_parser().parse_args()
+    generate_wallpaper(args.name, args.color, args.fade_start_position,
+                       args.force)
+
+
+if __name__ == '__main__':
+    main()
+
+ +

Let's run it and see:

+ +
# the code above is saved as wallpaper_gen.py
+python3 wallpaper_gen.py -c 0 54 9 -p 0.7
+
+ +

The generated image is shown below:

+ +

darkgreen

+ +
+ +
+ +
+
+ + + diff --git a/2024/02/04/host-python-packages-jekyll-github-pages.html b/2024/02/04/host-python-packages-jekyll-github-pages.html new file mode 100644 index 000000000..3aff72608 --- /dev/null +++ b/2024/02/04/host-python-packages-jekyll-github-pages.html @@ -0,0 +1,154 @@ + + + + + + + + +Host Python packages with Jekyll on GitHub Pages | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Host Python packages with Jekyll on GitHub Pages

+ dev/python + +
+ +
+

I have a collection of Python packages, either hosted locally or on GitHub, dedicated to my own use, and I don’t feel like uploading them to PyPI. +However, it soon becomes a pain when I need to install one of them to my virtual environment, since they all scatter about my disk, and I have to cd to the corresponding project directory and do pip install .. +It would be preferable to stay in my current repository, and do pip install .... +If the package is already hosted on GitHub, like alfred_fzf_helper, I may do pip install git+https://github.com/kkew3/alfred_fzf_helper.git directly. +This is not good enough, since I still need to memorize the URL, and it’s not convenient, if not impossible, to specify the version requirements.

+ +

Luckily, hosting a private Python package repository is possible, and freely available with Jekyll and GitHub Pages. +Following this guide, after making a directory pip under the root of my site, I put my Python source distribution tarballs into it. +After some googling, I find that Jekyll does not support autoindexing out-of-the-box. +If I push the tarballs onto GitHub, pip won’t be able to find the source distributions.

+ +

I will exploit the --find-links option of pip install instead. +What we need, then, is simply an HTML page that lists all the URLs to the tarballs hosted. +With simple Liquid, I loop over all static files under pip directory and list them in an unordered list:

+ +
---
+layout: default
+---
+
+<h1>Index of {{ page.path }}</h1>
+<ul>
+  {% assign pip_packages = site.static_files | where: "pip_package", true %}
+  {% for item in pip_packages %}
+    <li><a href="{{ site.baseurl }}{{ item.path }}">{{ item.path }}</a></li>
+  {% endfor %}
+</ul>
+
+ +

where pip_package is defined in _config.yml like this (see here for more details):

+ +
defaults:
+  - scope:
+      path: "pip"
+    values:
+      pip_package: true
+
+ +

Finally, I insert the following lines to ~/.config/pip/pip.conf:

+ +
[install]
+find-links = https://kkew3.github.io/pip
+
+ +

To check if it works, create a virtual environment (omitted below) and install one of the hosted packages:

+ +
pip install "alfred-fzf-helper>=0.2"
+
+ +

It works!

+ +
+ +
+ +
+
+ + + diff --git a/2024/02/11/quad-approximate-sigmoid-derivative.html b/2024/02/11/quad-approximate-sigmoid-derivative.html new file mode 100644 index 000000000..906698498 --- /dev/null +++ b/2024/02/11/quad-approximate-sigmoid-derivative.html @@ -0,0 +1,128 @@ + + + + + + + + +Piecewise quadratic approximation of sigmoid(z) (1-sigmoid(z)) | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Piecewise quadratic approximation of sigmoid(z) (1-sigmoid(z))

+ math/approximation + +
+ +
+

This post shows an approach that approximates $\sigma(z)(1-\sigma(z))$ using piecewise quadratic function, where $\sigma(z)$ is defined to be $1/(1+\exp(-z))$, a.k.a. the sigmoid function.

+ +

First, notice that $\sigma(z)(1-\sigma(z)) \approx \log(1+\exp(h - a z^2))$ for certain choice of $h$ and $a$:

+ +

softplus approximate dsigma

+ +

Second, the approximator $\log(1+\exp(\cdot))$ is called a softplus. +So it’s natural to proceed: $\log(1+\exp(h - a z^2)) \approx \max(0, h - a z^2)$. +Our goal, then, is to choose the height parameter $h$ and width parameter $a$ such that $\sigma(z)(1-\sigma(z)) \approx \max(0, h - a z^2)$.

+ +

The height parameter is straightforward to estimate. +We need only to match the max of $\sigma(z)(1-\sigma(z))$ to $h$. +Hence, $h := \sigma(0)(1-\sigma(0))$.

+ +

Noticing that both the original function and the approximator are nonnegative, we may match up their integrals:

+ +\[\int_{-\infty}^\infty \sigma(z)(1-\sigma(z))\,\mathrm d z = \int_{-\infty}^\infty \max(0, h - a z^2)\,\mathrm d z\] + +

where the left hand side is 1. +Plugging in the value of $h$, this equation solves to $a := \frac{16}{9}(\sigma(0)(1-\sigma(0)))^3$.
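For reference, the right-hand side integrates in closed form (the integrand is positive only for $|z| < \sqrt{h/a}$):

\[\int_{-\infty}^\infty \max(0, h - a z^2)\,\mathrm d z = 2\int_0^{\sqrt{h/a}} (h - a z^2)\,\mathrm d z = \frac{4}{3}\frac{h^{3/2}}{\sqrt{a}}\,,\]

and setting this to $1$ gives $a = \frac{16}{9}h^3$; with $h = \sigma(0)(1-\sigma(0)) = \frac{1}{4}$, that is $a = \frac{1}{36}$.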

+ +

max quad approximate dsigma

+ +
+ +
+ +
+
+ + + diff --git a/2024/02/26/simple-numerical-matrix-inversion.html b/2024/02/26/simple-numerical-matrix-inversion.html new file mode 100644 index 000000000..d99d440ad --- /dev/null +++ b/2024/02/26/simple-numerical-matrix-inversion.html @@ -0,0 +1,149 @@ + + + + + + + + +A simple numerical method to compute matrix inversion | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

A simple numerical method to compute matrix inversion

+ math/linear algebra + +
+ +
+

I needed to do matrix inversion in C recently, so I did some research on how to implement it. While the requirement later proved unnecessary, I want to jot down my efforts on this subject for future reference.

+ +

(Pan & Schreiber, 1992) proposed the CUINV algorithm based on Newton's iteration. It's fast and simple to implement. Here's my verbatim reimplementation in Python, which should be simple(?) (see the TODO in the comments) to translate to C.

+ +
import numpy as np
+
+def cuinv(A, maxiter, tol):
+    n = A.shape[0]
+    I = np.eye(n)
+    s = np.linalg.svd(A, compute_uv=False)  # TODO: how to implement this?
+    a0 = 2 / (np.min(s)**2 + np.max(s)**2)
+    X = a0 * A.T
+    X_prev = np.copy(X)
+    T = X @ A
+    T2 = None
+    t2_valid = False
+    diff = tol + 1  # so that it runs at least one iteration
+
+    for _ in range(maxiter):
+        if diff < tol:
+            break
+        X = (2 * I - T) @ X
+        if t2_valid:
+            T = 2 * T - T2
+        else:
+            T = X @ A
+        t2_valid = False
+        if np.trace(T) < n - 0.5:
+            T2 = T @ T
+            delta = np.linalg.norm(T - T2, ord='fro')
+            if delta >= 0.25:
+                t2_valid = True
+            else:
+                rho = 0.5 - np.sqrt(0.25 - delta)
+                X = 1 / rho * (T2 - (2 + rho) * T + (1 + 2 * rho) * I) @ X
+                T = X @ A
+        diff = np.linalg.norm(X - X_prev, ord='fro')
+        X_prev = X
+    return X
+
+ + +
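A quick sanity check of the routine (a sketch, reusing the import and cuinv above; the matrix is made symmetric positive definite so that it is well conditioned and the iteration converges comfortably):

n = 50
B = np.random.randn(n, n)
A = B @ B.T / n + np.eye(n)   # SPD with eigenvalues >= 1
X = cuinv(A, maxiter=200, tol=1e-12)
print(np.allclose(X @ A, np.eye(n), atol=1e-6))  # should print True if the iteration converged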
+ +
+ +
+
+ + + diff --git a/2024/05/17/learn-bayesian-lr-from-imbalanced-data.html b/2024/05/17/learn-bayesian-lr-from-imbalanced-data.html new file mode 100644 index 000000000..437fb2acb --- /dev/null +++ b/2024/05/17/learn-bayesian-lr-from-imbalanced-data.html @@ -0,0 +1,419 @@ + + + + + + + + +Learn Bayesian Logistic regression from imbalanced data | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Learn Bayesian Logistic regression from imbalanced data

+ machine learning/bayesian + +
+ +
+

Dataset

+ +

toy 2d dataset

+ +

Obviously, this is an imbalanced dataset. +A dumb classifier may assign “yellow” to all points and yield apparently satisfactory accuracy.

+ +

Bayesian Logistic regression

+ +

Denote the $k$-th component of the softmax of $\boldsymbol z$ as:

+ +\[\mathcal S_k(\boldsymbol z) \triangleq \frac{\exp(z_k)}{\sum_{k'}\exp(z_{k'})}\,.\] + +

The likelihood is:

+ +\[p(y=k \mid \boldsymbol x, \mathbf W, \boldsymbol b) = \mathcal S_k(\mathbf W \boldsymbol x + \boldsymbol b)\,,\] + +

where the matrix $\mathbf W$ consists of $K$ weight vectors $\boldsymbol w_k \in \mathbb R^d$, $\boldsymbol x \in \mathbb R^d$, and $\boldsymbol b \in \mathbb R^K$.

+ +

For now, assign an uninformative Gaussian prior:

+ +\[\forall k,\ \boldsymbol w_k \sim \mathcal N(0, \mathbf I)\,,\quad b_k \sim \mathcal N(0, 1)\,. +\tag{1}\] + +

The posterior (given the dataset $\mathcal D$) is:

+ +\[p(\mathbf W, \boldsymbol b \mid \mathcal D) \propto \prod_{k=1}^K p(\boldsymbol w_k) p(b_k) \prod_{j=1}^m p(y_j \mid \boldsymbol x_j, \mathbf W, \boldsymbol b)\,. +\tag{2.1}\] + +

The predictive posterior is:

+ +\[p(y \mid \boldsymbol x, \mathcal D) = \int p(y \mid \boldsymbol x, \mathbf W, \boldsymbol b) p(\mathbf W, \boldsymbol b \mid \mathcal D)\,\mathrm d \mathbf W \mathrm d \boldsymbol b\,. +\tag{2.2}\] + +

Although both (2.1) and (2.2) are intractable, we may find $q(\mathbf W, \boldsymbol b) \approx p(\mathbf W, \boldsymbol b \mid \mathcal D)$ by variational inference, and estimate the predictive posterior by Monte Carlo after plugging in $q$. +Since such procedure is out of scope, we won’t include details about it.

+ +

Let’s see the decision boundary and the uncertainty (measured by entropy) of the Bayesian LR:

+ +

uninformative decision boundary

+ +

uninformative uncertainty

+ +

The model learns to be a dumb classifier!

+ +

We may apply rescaling (a.k.a. threshold shifting) to the learned classifier, by dividing the predictive posterior by the class prior (i.e. the proportion of samples of class $k$ in all samples), and use it to make prediction. +The rescaled decision boundary and uncertainty are:

+ +

uninformative rescaled decision boundary

+ +

uninformative rescaled uncertainty

+ +

This benefits the minority class, but deteriorates the overall accuracy a lot.

+ +

Strengthen the prior

+ +

It turns out that if we strengthen the prior (by increasing its precision, or equivalently, decreasing its variance) of the intercepts in (1), things become much better. +The new prior is:

+ +\[\forall k,\ b_k \sim \mathcal N(0, 10^{-6})\,. +\tag{3}\] + +

What we just encode into the prior reads:

+ +
+

I'm pretty sure that the two classes weigh the same, even though the “purple” class appears inferior.

+
+ +

The result plots are:

+ +

precise uninformative decision boundary

+ +

precise uninformative uncertainty

+ +

Bias the prior

+ +

What if we go further by biasing the classifier a little towards the minority class ($k=0$, “purple”)? +The new prior is:

+ +\[b_0 \sim \mathcal N(2, 10^{-6})\,,\quad b_1 \sim \mathcal N(0, 10^{-6})\,. +\tag{4}\] + +

This prior reads:

+ +
+

I'm pretty sure there are even a bit more “purple” samples than “yellow” samples a priori, even though they're not sampled as much in the dataset.

+
+ +

The plots are now:

+ +

precise biased decision boundary

+ +

precise biased uncertainty

+ +

Perfect!

+ +

Conclusion

+ +

In this post, we see that under the Bayesian framework, Bayesian LR is able to naturally combat an imbalanced dataset by adjusting its prior belief.

+ +

This codebase generates all the figures in the post.

+ +

Appendix

+ +

Features and labels of the toy dataset.

+ +

The features:

+ +
array([[-0.46601866,  1.18801609],
+       [ 0.53858625,  0.60716392],
+       [-0.97431137,  0.69753311],
+       [-1.09220402,  0.87799492],
+       [-2.03843356,  0.28665154],
+       [-0.34062009,  0.79352777],
+       [-1.16225216,  0.79350459],
+       [ 0.19419328,  1.60986703],
+       [ 0.41018415,  1.54828838],
+       [-0.61113336,  0.99020048],
+       [ 0.08837677,  0.95373644],
+       [-1.77183232, -0.12717568],
+       [-0.54560628,  1.07613052],
+       [-1.69901425,  0.55489764],
+       [-0.7449788 ,  0.7519103 ],
+       [-1.84473763,  0.55248995],
+       [-0.50824943,  1.08964891],
+       [-1.35655196,  0.7102918 ],
+       [-0.71295569,  0.38030989],
+       [ 0.0582823 ,  1.35158484],
+       [-2.74743505, -0.18849513],
+       [-2.36125827, -0.22542297],
+       [ 0.28512568,  1.52124326],
+       [-0.67059538,  0.61188467],
+       [-1.08310962,  0.57068698],
+       [-1.59421684,  0.32055693],
+       [-0.58608561,  0.98441983],
+       [ 0.91449962,  1.74231742],
+       [-1.78271812,  0.25676529],
+       [-0.30880495,  0.98633121],
+       [-0.80196522,  0.56542478],
+       [-1.64551419,  0.2527351 ],
+       [ 0.88404065,  1.80009243],
+       [ 0.07752252,  1.19103008],
+       [ 0.01499115,  1.35642701],
+       [-1.37772455,  0.58176578],
+       [-0.9893581 ,  0.6000557 ],
+       [-0.20708577,  0.97773425],
+       [-0.97487675,  0.67788572],
+       [-0.84898247,  0.76214066],
+       [-2.87107864,  0.01823837],
+       [-1.52762479,  0.15224236],
+       [-1.19066619,  0.61716677],
+       [-0.78719074,  1.22733157],
+       [ 0.37887222,  1.38907542],
+       [-0.29892079,  1.20534091],
+       [-1.21904812,  0.45126808],
+       [-0.01954643,  1.00443244],
+       [-2.7534539 , -0.41174779],
+       [ 0.00290918,  1.19376387],
+       [-0.3465645 ,  0.97372693],
+       [-0.38706669,  0.98612011],
+       [-0.3909804 ,  1.1737113 ],
+       [ 0.67985963,  1.57038317],
+       [-1.5574845 ,  0.38938231],
+       [-0.70276487,  0.84873314],
+       [-0.77152456,  1.24328845],
+       [-0.78685252,  0.71866813],
+       [-1.58251503,  0.47314274],
+       [-0.86990291,  1.01246542],
+       [-0.76296641,  1.03057172],
+       [-1.46908977,  0.50048994],
+       [ 0.41590518,  1.35808005],
+       [-0.23171796,  0.97466644],
+       [-0.35599838,  1.05651836],
+       [-1.86300113,  0.31105633],
+       [-1.06979785,  0.89343042],
+       [ 0.89051152,  1.36968058],
+       [-1.64250124,  0.5395521 ],
+       [ 0.19072792,  1.39594182],
+       [-0.68980859,  1.51412568],
+       [-0.66216014,  0.94064958],
+       [-1.98324693,  0.36500688],
+       [-1.77543305,  0.48759471],
+       [ 0.99143992,  1.53242166],
+       [-2.03402523,  0.27661546],
+       [-0.98138839,  0.86047666],
+       [ 0.86594322,  1.60352598],
+       [-1.25510995,  0.40788484],
+       [-1.28207069,  0.55164356],
+       [-0.50983219,  1.05505834],
+       [ 0.98003606,  0.56171673],
+       [-1.86097117,  0.44004685],
+       [-1.09945843,  0.63380337],
+       [-1.44294885,  0.18391039],
+       [-1.60512757,  0.25456073],
+       [ 0.5505329 ,  1.63447114],
+       [-1.13622159,  0.87658095],
+       [-0.18029101,  0.98458234],
+       [-1.48031015,  0.3667454 ],
+       [ 0.94295697,  1.51965296],
+       [-1.94413955,  0.257857  ],
+       [-1.92812486, -0.15406208],
+       [-0.28437139,  0.8520255 ],
+       [-0.95551392,  0.28517945],
+       [-1.44252631,  0.5455637 ],
+       [-0.22064889,  1.33439538],
+       [-1.52749019,  0.50443876],
+       [ 0.757785  ,  0.42124458],
+       [-0.49536512,  0.9627005 ]])
+
+ +

The labels:

+ +
array([1,
+       0,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       1,
+       1,
+       0,
+       1,
+       1,
+       0,
+       1])
+
+ +
+ +
+ +
+
+ + + diff --git a/2024/06/13/leverage-ollama-in-iterm2-ai-integration.html b/2024/06/13/leverage-ollama-in-iterm2-ai-integration.html new file mode 100644 index 000000000..481ba8fb9 --- /dev/null +++ b/2024/06/13/leverage-ollama-in-iterm2-ai-integration.html @@ -0,0 +1,154 @@ + + + + + + + + +Leverage Ollama in iTerm2 AI integration | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Leverage Ollama in iTerm2 AI integration

+ os/macOS + +
+ +
+

Introduction

+ +

Recently, iTerm2 released version 3.5.0, which includes generative AI integration via the OpenAI API. Ollama is an open platform for large language models (LLMs). Starting from February 2024, Ollama has built-in support for the OpenAI chat completions API. Putting them together, we can now ask AI to compose commands for us seamlessly in the iTerm2 interface, using a local Ollama model as the bot.

+ +

Configuration

+ +

Here are the steps to start using the AI integration in iTerm2:

+ +
    +
  1. Install the AI plugin from iTerm2 site.
  2. +
  3. In iTerm2 preferences, under General section and AI tab, enter “OpenAI API key” with anything non-empty, fill in the AI prompt, specify the model and the custom URL.
  4. +
+ +

For example, mine is like below:

+ +
    +
  • OpenAI API key: abc +
  • +
  • AI prompt: Return commands suitable for copy/pasting into \(shell) on \(uname). Do NOT include commentary NOR Markdown triple-backtick code blocks as your whole response will be copied into my terminal automatically. If not otherwise specified, you should always give at most one line of command. The command should do this: \(ai.prompt).
  • +
  • Model: codegemma:instruct.
  • +
  • Token limit: 16384.
  • +
  • Custom URL: http://localhost/v1/chat/completions.
  • +
  • Use legacy “completions” API: false.
  • +
+ +

Remarks:

+ +
    +
  • If your Ollama runs on a server in WLAN, e.g. at IP address 192.168.0.107, just replace the localhost in custom URL with that IP address.
  • +
  • Don’t forget to start Ollama by ollama serve before using iTerm2’s AI integration.
  • +
+ +

Workflow

+ +

My favorite iTerm2 workflow after the configuration above:

+ +
    +
  1. Press command + shift + . to activate the composer.
  2. +
  3. Specify my need in plain English, and press command + y to send the input text to Ollama.
  4. +
  5. After a few seconds, the text should be replaced by Ollama’s response.
  6. +
  7. Press shift + enter to send the response to the terminal.
  8. +
+ +

A demo:

+ +

demo

+ +
+ +
+ +
+
+ + + diff --git a/2024/07/06/compute-accuracy-from-f1-score.html b/2024/07/06/compute-accuracy-from-f1-score.html new file mode 100644 index 000000000..22a73430c --- /dev/null +++ b/2024/07/06/compute-accuracy-from-f1-score.html @@ -0,0 +1,237 @@ + + + + + + + + +Compute accuracy from F1 score | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Compute accuracy from F1 score

+ machine learning + +
+ +
+

I encountered a similar problem today as the one in this post, where I wish to find the accuracy given the F1 score only. The F1 score is well suited to my imbalanced classification problem, so I compute it during training; but I then find it difficult to interpret. There's a surprising lack of relevant information when I searched the web. Luckily, it's not a difficult task either.

+ +

Since each F1 score corresponds to a range of accuracies, we may regard finding the accuracy given the F1 score as an optimization problem. The process consists of two steps: 1) find the minimum accuracy; 2) find the maximum accuracy. To find the maximum, we may reduce it to finding the negative of the minimum of the negative accuracy. Thus we will only handle step 1 below.

+ +

Known constants:

+ +
    +
  • $s_F$: the F1 score.
  • +
  • $r_P$ and $r_N$: the positive and negative class ratio.
  • +
+ +

Variables:

+ +
    +
  • $r_{TP}$, $r_{TN}$, $r_{FP}$, $r_{FN}$: the true positive, true negative, false positive and false negative ratio (i.e. divided by the total sample count).
  • +
+ +

Objective: +$s_A = r_{TP} + r_{TN}$.

+ +

Constraints:

+ +
    +
  • $r_{TP} \ge 0$, $r_{TN} \ge 0$, $r_{FP} \ge 0$, $r_{FN} \ge 0$.
  • +
  • $r_{TP} + r_{FN} = r_P$, $r_{TN} + r_{FP} = r_N$.
  • +
  • $\frac{2 \cdot r_{TP} / (r_{TP} + r_{FP}) \cdot r_{TP} / (r_{TP} + r_{FN})}{r_{TP} / (r_{TP} + r_{FP}) + r_{TP} / (r_{TP} + r_{FN})} = s_F$. The left hand side is just the F1 score formula.
  • +
+ +

Python implementation:

+ +
# jax is not necessary, just that I don't want to spend time on finding
+# partial derivative of the F1 score with respect to true positive,
+# etc.
+import jax
+import numpy as np
+from scipy.special import softmax
+from scipy.optimize import minimize
+
+# Used to avoid divid-by-zero error.
+EPS = 1e-8
+
+def f1_score_constraint(x, f1_score):
+    """
+    :param x: the array (tp, tn, fp, fn)
+    :param f1_score: the known F1 score
+    """
+    tp, fp, fn = x[0], x[2], x[3]
+    precision = tp / (tp + fp)
+    recall = tp / (tp + fn)
+    return 2 * (precision * recall) / (precision + recall) - f1_score
+
+
+def positive_sum_constraint(x, n_positive):
+    """
+    :param x: the array (tp, tn, fp, fn)
+    :param n_positive: the known positive class ratio
+    """
+    tp, fn = x[0], x[3]
+    return tp + fn - n_positive
+
+
+def negative_sum_constraint(x, n_negative):
+    """
+    :param x: the array (tp, tn, fp, fn)
+    :param n_negative: the known negative class ratio
+    """
+    tn, fp = x[1], x[2]
+    return tn + fp - n_negative
+
+
+def accuracy(x):
+    """
+    :param x: the array (tp, tn, fp, fn)
+    """
+    tp, tn = x[0], x[1]
+    return tp + tn
+
+
+# Ideally this should give a feasible solution. But in practice, I
+# find it works fine even if it's not feasible.
+def rand_init():
+    return softmax(np.random.randn(4))
+
+
+def find_min_accuracy_from_f1(f1_score, n_positive, n_negative):
+    """
+    :param f1_score: the known F1 socre
+    :param n_positive: the known positive class ratio
+    :param n_negative: the known negative class ratio
+    """
+    res = minimize(
+        accuracy,
+        rand_init(),
+        method='SLSQP',
+        jac=jax.grad(accuracy),
+        bounds=[(EPS, None), (EPS, None), (EPS, None), (EPS, None)],
+        constraints=[
+            {
+                'type': 'eq',
+                'fun': f1_score_constraint,
+                'jac': jax.grad(f1_score_constraint),
+                'args': (f1_score,),
+            },
+            {
+                'type': 'eq',
+                'fun': positive_sum_constraint,
+                'jac': jax.grad(positive_sum_constraint),
+                'args': (n_positive,),
+            },
+            {
+                'type': 'eq',
+                'fun': negative_sum_constraint,
+                'jac': jax.grad(negative_sum_constraint),
+                'args': (n_negative,),
+            },
+        ],
+        options={'maxiter': 1000},
+    )
+    return res.fun
+
+ +

Calling the function find_min_accuracy_from_f1 with data, we get the minimum possible accuracy given F1 score:

+ +
>>> find_min_accuracy_from_f1(0.457, 0.044, 0.9559)
+0.8953
+
+ +
+ +
+ +
+
+ + + diff --git a/2024/08/09/gamma-in-bn-vae.html b/2024/08/09/gamma-in-bn-vae.html new file mode 100644 index 000000000..8384cf0bf --- /dev/null +++ b/2024/08/09/gamma-in-bn-vae.html @@ -0,0 +1,139 @@ + + + + + + + + +Effect of gamma in BN-VAE | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Effect of gamma in BN-VAE

+ machine learning/bayesian + +
+ +
+

Abstract

+ +

This post discusses the effect of $\gamma$ in BN-VAE (Zhu et al., 2020).

+ +

Introduction

+ +

BN-VAE (see more about it here (in Chinese)) attempts to solve the KL vanishing problem (a.k.a. posterior collapse) in Gaussian-VAE by batch-normalizing the variational posterior mean, which places a positive lower bound on the Kullback-Leibler divergence term (averaged over the dataset) in the ELBO, thus avoiding the KL vanishing problem. The batch normalization procedure includes a fixed hyperparameter $\gamma \ge 0$, which controls the lower bound of the KL; the larger $\gamma$, the larger the lower bound. When $\gamma=0$, KL vanishing occurs.
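To see where the bound comes from (a sketch, using the standard per-dimension KL of a diagonal Gaussian posterior against a standard Gaussian prior, and the fact that after batch normalization the posterior means $\mu_{ij}$ have batch mean $\beta$, the learnable shift, and batch variance $\gamma^2$):

\[\frac{1}{N}\sum_{i=1}^N \frac{1}{2}\big(\mu_{ij}^2 + \sigma_{ij}^2 - \log\sigma_{ij}^2 - 1\big) = \frac{1}{2}\Big(\gamma^2 + \beta^2 + \frac{1}{N}\sum_{i=1}^N\big(\sigma_{ij}^2 - \log\sigma_{ij}^2 - 1\big)\Big) \ge \frac{\gamma^2 + \beta^2}{2}\,,\]

since $x - \log x - 1 \ge 0$ for all $x > 0$.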

+ +

Zhu et al. (2020) visualizes the distribution of the variational posterior mean when $\gamma$ equals 0.3 and 1. What will happen if $\gamma > 1$? How does $\gamma > 0$ solve the KL vanishing problem? We'll explore these questions below.

+ +

$\gamma>1$ introduces posterior hole problem

+ +

The posterior hole problem happens when the aggregate variational posterior (a.k.a. the average encoder distribution (Hoffman & Johnson, 2016)) does not match the prior. When measured in KL divergence, this means:

+ +\[D_{KL}(q_\phi(z) \parallel p(z)) > 0\] + +

Here, $q_\phi(z) = \sum_{i=1}^N \frac{1}{N} q_\phi(z \mid x_i)$ where $N$ is the dataset size, is the aggregate variational posterior.

+ +

In Gaussian-VAE, the variational posterior $q_\phi(z \mid x_i) = \mathcal N(z \mid \mu_i, \sigma_i^2)$, where $(\mu_i,\sigma_i^2)$ are typically computed by a neural network called the inference network (Kingma & Welling, 2013) parameterized by $\phi$ given $x_i$; and $q_\phi(z \mid x_i)$ can usually be factorized into each dimension $j$ as $q_\phi(z \mid x_i) = \prod_{j=1}^d q_\phi(z_j \mid x_i)$, where each $q_\phi(z_j \mid x_i)$ is a univariate Gaussian parameterized by $(\mu_{ij}, \sigma_{ij}^2)$. Thus, the aggregate variational posterior is an $N$-mixture of Gaussians whose mean, at each dimension $j$, is $\bar\mu_j = \frac{1}{N}\sum_{i=1}^N \mu_{ij}$ and whose average component variance is $\bar\sigma_j^2 = \frac{1}{N}\sum_{i=1}^N \sigma_{ij}^2$.

+ +

If $q_\phi$ is transformed according to BN-VAE, then $\bar\mu_j = \beta$ where $\beta$ is a learnable parameter. Furthermore, we have variance $\mathbb E_{q_\phi(z_j)}[z_j^2] - \mathbb E_{q_\phi(z_j)}[z_j]^2 = \gamma^2 + \bar\sigma_j^2$. If we follow Zhu et al. (2020) to use a standard Gaussian $\mathcal N(z \mid \mathbf 0, \mathbf I)$ as the prior $p$, then according to this post, $D_{KL}(q_\phi(z_j) \parallel p(z_j))$, at each dimension $j$, will be lower bounded by $D_{KL}(q_0(z_j) \parallel p(z_j))$, where $q_0(z_j) = \mathcal N(z_j \mid \beta, \gamma^2 + \bar\sigma_j^2)$; this lower bound is consistently greater than zero when $\gamma > 1$ (Razavi et al., 2019). It follows immediately (Soch, Joram, et al., 2024) that $D_{KL}(q_\phi(z) \parallel p(z)) \ge \sum_{j=1}^d D_{KL}(q_0(z_j) \parallel p(z_j)) > 0$.
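The variance claim follows from the law of total variance for the mixture $q_\phi(z_j)$: the variance of the mixture is the average component variance plus the variance of the component means, and after batch normalization the latter is exactly $\gamma^2$:

\[\mathbb E_{q_\phi(z_j)}[z_j^2] - \mathbb E_{q_\phi(z_j)}[z_j]^2 = \frac{1}{N}\sum_{i=1}^N \sigma_{ij}^2 + \frac{1}{N}\sum_{i=1}^N (\mu_{ij} - \beta)^2 = \bar\sigma_j^2 + \gamma^2\,.\]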

+ +
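As a quick numerical illustration of this claim (a sketch using the closed-form univariate Gaussian KL; the values are arbitrary), we can evaluate the per-dimension lower bound in the worst case over the quantities the model can adapt, i.e. $\beta \to 0$ and $\bar\sigma_j^2$ chosen to bring $\gamma^2 + \bar\sigma_j^2$ as close to $1$ as possible:

import math

def kl_to_std_normal(mu, var):
    # Closed-form D_KL(N(mu, var) || N(0, 1)).
    return 0.5 * (var + mu**2 - math.log(var) - 1.0)

for gamma in [0.5, 1.0, 1.5, 2.0]:
    # For gamma <= 1 the variance gamma^2 + sigma_bar^2 can reach 1, so the
    # bound collapses to 0; for gamma > 1 it cannot, so the bound stays positive.
    worst_var = max(gamma**2, 1.0)
    print(gamma, kl_to_std_normal(0.0, worst_var))

The printed bound is zero for $\gamma \le 1$ and strictly positive once $\gamma > 1$, matching the claim above.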

TO BE CONTINUED

+ +
+ +
+ +
+
+ + + diff --git a/2024/08/09/lower-bound-of-kl-divergence-between-any-density-and-gaussian.html b/2024/08/09/lower-bound-of-kl-divergence-between-any-density-and-gaussian.html new file mode 100644 index 000000000..23770fe45 --- /dev/null +++ b/2024/08/09/lower-bound-of-kl-divergence-between-any-density-and-gaussian.html @@ -0,0 +1,194 @@ + + + + + + + + +Lower bound of KL divergence between any density and Gaussian | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Lower bound of KL divergence between any density and Gaussian

+ math/probability + +
+ +
+

Abstract

+ +

In this post, I explain how to derive a lower bound of the Kullback-Leibler divergence between any density $q$, e.g. a Gaussian mixture, and a Gaussian $p$.

+ +

Framework

+ +

We may cast the problem of finding the lower bound as a constrained minimization problem:

+ +
+ +\[\begin{aligned} + \min_{q'}\ &D_{KL}(q' \parallel p)\\ + \text{s.t. } &\int_{\mathcal X} q'(x)\,\mathrm dx = 1\\ + &\ldots \ \text{other constraints} +\end{aligned}\tag{1}\] + +

where $\mathcal X$ is the support of $q'$, and we'll fill in “other constraints” with what we know about the density $q$, like its mean and variance. The solution of Equation (1) will be the lower bound we're seeking.

+ +

The Lagrangian would be:

+ +\[L = \int_{\mathcal X} q'(x)\log \frac{q'(x)}{p(x)}\,\mathrm dx + \lambda_0 (\int_{\mathcal X} q'(x)\,\mathrm dx - 1) + \ldots \tag{2}\] + +

Taking the functional derivative of $L$ with respect to $q’$ and letting it equal zero yields:

+ +\[\begin{aligned} + 0 &= 1 + \log q'(x) - \log p(x) + \lambda_0 + \ldots\\ + \log q'(x) &= -\lambda_0 - 1 + \log p(x) + \ldots\\ + q'(x) &= \exp(-\lambda_0 -1 + \log p(x) + \ldots) +\end{aligned}\] + +

Finally, plug $q'(x)$ back into the constraints and solve for the Lagrange multipliers $\lambda_0$, etc.

+ +
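As a sanity check of this recipe, problem (1) can also be solved numerically on a grid rather than analytically. Below is a rough sketch using scipy; the grid, the constraint values (which anticipate the mean/variance constraints of the example in the next section) and the solver options are arbitrary illustrative choices:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Discretize the support and minimize the KL subject to the constraints.
x = np.linspace(-8.0, 8.0, 201)
dx = x[1] - x[0]
p = norm.pdf(x)
sigma2 = 2.0  # suppose we know E_q[x] = 0 and E_q[x^2] = sigma2

def kl(q):
    q = np.clip(q, 1e-12, None)
    return np.sum(q * np.log(q / p)) * dx

constraints = [
    {'type': 'eq', 'fun': lambda q: np.sum(q) * dx - 1.0},            # normalization
    {'type': 'eq', 'fun': lambda q: np.sum(x**2 * q) * dx - sigma2},  # second moment
]
q_init = np.full_like(x, 1.0 / (x[-1] - x[0]))  # uniform initial guess
res = minimize(kl, q_init, bounds=[(0.0, None)] * len(x),
               constraints=constraints, method='SLSQP',
               options={'maxiter': 1000})
print(res.fun)  # should be close to 0.5 * (sigma2 - np.log(sigma2) - 1)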

Example

+ +

In this simple example, we assume that $p(x) = \mathcal N(x \mid 0, 1)$ is a standard univariate Gaussian, and that $q$ and $p$ have the same support. Suppose also that we know the mean and variance of $q$ to be: $\mathbb E_q[x] = 0$, $\mathbb E_q[x^2] - \mathbb E_q[x]^2 = \mathbb E_q[x^2] = \sigma^2$.

+ +

The Lagrangian is:

+ +
+ +\[\require{enclose} +L = \int_{-\infty}^\infty q'(x) \log \frac{q'(x)}{p(x)}\,\mathrm dx + \lambda_0 (\underbrace{\int_{-\infty}^\infty q'(x)\,\mathrm dx - 1}_{\substack{\enclose{circle}{1}}}) + \lambda_1 (\underbrace{\int_{-\infty}^\infty x^2 q'(x)\,\mathrm dx - \sigma^2}_{\substack{\enclose{circle}{2}}})\tag{3}\] + +

where we have encoded the mean and variance constraints into one term (see why here). +Taking the derivative and letting it equal zero yields:

+ +
+ +\[\begin{align} + 0 &= 1 + \log q'(x) - \log p(x) + \lambda_0 + \lambda_1 x^2\\ + \log q'(x) &\stackrel{1}{=} -\lambda_0 - 1 - (\frac{1}{2} + \lambda_1) x^2\\ + q'(x) &= \exp(-\lambda_0 - 1 - (\frac{1}{2} + \lambda_1) x^2)\tag{4}\\ +\end{align}\] + +

where equality ‘$1$’ holds because $\log p(x) = -\frac{1}{2}x^2 + C$, and the constant $C$ has been absorbed into $\lambda_0$.

+ +

Plugging Equation (4) back into constraint ⓵ of Equation (3) and solving the integral yields:

+ +
+ +\[\frac{\sqrt{\pi}\exp(-\lambda_0 - 1)}{\sqrt{\frac{1}{2} + \lambda_1}} = 1\tag{5.1}\] + +

Likewise, plugging (4) back into constraint ⓶ and solving the integral yields:

+ +
+ +\[\frac{\sqrt{\pi} \exp(-\lambda_0 - 1)}{2\sqrt{(\frac{1}{2} + \lambda_1)^3}} = \sigma^2\tag{5.2}\] + +

Solving Equations (5.1, 5.2) gives:

+ +
+ +\[\begin{cases} + \lambda_0 = -1 + \frac{1}{2} \log 2\pi\sigma^2\\ + \lambda_1 = -\frac{1}{2} + \frac{1}{2\sigma^2}\\ +\end{cases}\tag{6}\] + +

Plugging Equation (6) into Equation (4), it's immediate that

+ +\[q'(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{x^2}{2\sigma^2})\] + +

i.e., a Gaussian $\mathcal N(x \mid 0, \sigma^2)$. +Therefore, according to Soch, Joram, et al. (2024),

+ +\[D_{KL}(q \parallel p) \ge \frac{1}{2}(\sigma^2 - \log\sigma^2 - 1)\] + + +
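To double-check this result numerically, here is a small Monte Carlo sketch in which $q$ is an arbitrary zero-mean Gaussian mixture; the estimated $D_{KL}(q \parallel p)$ should never fall below the bound $\frac{1}{2}(\sigma^2 - \log\sigma^2 - 1)$:

import torch
from torch import distributions as D

# A two-component Gaussian mixture with mean 0 and variance sigma2 = s^2 + a^2.
a, s = 1.0, 0.5
sigma2 = s**2 + a**2
q = D.MixtureSameFamily(
    D.Categorical(torch.tensor([0.5, 0.5])),
    D.Normal(torch.tensor([-a, a]), torch.tensor([s, s])))
p = D.Normal(0.0, 1.0)

# Monte Carlo estimate of D_KL(q || p).
z = q.sample((1_000_000,))
kl_mc = (q.log_prob(z) - p.log_prob(z)).mean()

bound = 0.5 * (sigma2 - torch.log(torch.tensor(sigma2)) - 1.0)
print(kl_mc.item(), bound.item())  # kl_mc should be no smaller than bound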
+ +
+ +
+
+ + + diff --git a/2024/08/27/conditioning-of-cvae-variational-posterior.html b/2024/08/27/conditioning-of-cvae-variational-posterior.html new file mode 100644 index 000000000..d2f158aa5 --- /dev/null +++ b/2024/08/27/conditioning-of-cvae-variational-posterior.html @@ -0,0 +1,192 @@ + + + + + + + + +Conditioning of the variational posterior in CVAE (Sohn, 2015) | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Conditioning of the variational posterior in CVAE (Sohn, 2015)

+ math/probability | machine learning/bayesian + +
+ +
+

Abstract

+ +

This post explores alternative conditioning of the variational posterior $q$ in CVAE (Sohn et al. 2015), and concludes that the conditioning of $q$ on $y$ is important to predictive inference.

+ +

Introduction

+ +

CVAE models the likelihood $p(y \mid x)$ as a continuous mixture of latent $z$:

+ +
+ +\[p(y \mid x) = \int p_\theta(z \mid x) p_\theta(y \mid x,z)\,\mathrm dz\,. \tag{1}\] + +

Since (1) is intractable, Sohn et al. instead optimize its evidence lower bound (ELBO):

+ +\[\mathcal L_{\text{CVAE}}(x,y;\theta,\phi) = \mathbb E_q[\log p_\theta(y \mid x,z) - \log q_\phi(z \mid x,y)]\,. \tag{2}\] + +

Here, the variational posterior $q$ conditions on $x$ and $y$. +At test time, the authors propose to use importance sampling leveraging the trained variational posterior:

+ +
+ +\[p(y \mid x) \approx \frac{1}{S} \sum_{s=1}^S \frac{p_\theta(y \mid x,z_s) p_\theta(z_s \mid x)}{q_\phi(z_s \mid x,y)}\,, \tag{3}\] + +

where $z_s \sim q_\phi(z \mid x,y)$.

+ +
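To make Equation (3) concrete, here is a toy sketch of the estimator with one-dimensional Gaussians standing in for the three densities; in a real CVAE these would be neural networks conditioned on $x$ (and on $y$ for $q$), and all parameter values below are arbitrary:

import math
import torch
from torch import distributions as D

prior = D.Normal(0.0, 1.0)   # stand-in for p_theta(z | x)
q = D.Normal(0.8, 0.6)       # stand-in for q_phi(z | x, y)

def log_likelihood(y, z):
    # stand-in for log p_theta(y | x, z)
    return D.Normal(z, 0.5).log_prob(y)

def estimate_log_p_y_given_x(y, num_samples=10_000):
    z = q.sample((num_samples,))
    log_w = log_likelihood(y, z) + prior.log_prob(z) - q.log_prob(z)
    # log of the average importance weight, i.e. the log of Equation (3)
    return torch.logsumexp(log_w, dim=0) - math.log(num_samples)

print(estimate_log_p_y_given_x(torch.tensor(1.0)).item())

At the optimum derived below, where $q$ equals the true posterior, every importance weight is identical and the estimate has zero variance.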

What if $q$ conditions on $x$ only? This post explores this possibility, and reaches the conclusion that without conditioning on $y$, $q$ at optimum will never attain the true posterior $p(z \mid x,y)$, and hence cannot do better in terms of reducing the variance of the importance-sampling estimate.

+ +

Warm up: proving the efficacy of importance sampling

+ +

We assume that infinite data is available for learning, and that $q$ comes from a flexible enough probability family. The data are drawn from the joint data distribution $p_D(x,y)$, which we emphasize with the subscript $D$. We assume that $x$ is continuous and $y$ is discrete. The goal is to maximize the expected ELBO under $p_D(x,y)$. However, we assume that $p_\theta(y \mid x,z)$ won't approach $p_D(y \mid x)$ whatever value $\theta$ takes. We will drop $\theta$ and $\phi$ below for brevity.

+ +

We may easily pose this setup as a constrained maximization problem: +$\max \mathbb E[\log p(y,z \mid x) - \log q(z \mid x,y)]$ subject to $q$ being a probability, where the expectation is taken with respect to $p_D(x,y) q(z \mid x,y)$.

+ +

The Lagrangian is:

+ +
+ +\[\int \sum_y p_D(x,y) \int q(z \mid x,y) \log \frac{p(y,z \mid x)}{q(z \mid x,y)}\,\mathrm dz\,\mathrm dx + \int \sum_y \mu(x,y) \left(\int q(z \mid x,y)\,\mathrm dz - 1\right)\,\mathrm dx\,, \tag{4}\] + +

where $\mu(x,y)$ is the Lagrange multiplier. +Now find the Gateaux derivative and let it equal zero:

+ +\[0 = p_D(x,y) (\log p(y,z \mid x) - (1 + \log q(z \mid x,y)) + \mu(x,y))\,.\] + +

Absorbing $p_D(x,y) > 0$ and the constant 1 into $\mu(x,y)$ yields:

+ +\[\log q(z \mid x,y) = \mu(x,y) + \log p(y,z \mid x)\,,\] + +

where $\mu(x,y) = -\log \int p(y,z \mid x)\,\mathrm dz = -\log p(y \mid x)$ normalizes $q$ (here $p(y \mid x)$ is the model's marginal likelihood). It thus follows that, at optimum, $q(z \mid x,y) = p(z \mid x,y)$. Hence, when evaluating Equation (3) at optimum, the right hand side equals the left hand side with zero variance.

+ +

Conditioning only on x gives worse approximation

+ +

Following the same setup as the previous section, we start from the Lagrangian (4). +Note that now we assume $q \triangleq q(z \mid x)$, and that the Lagrange multiplier is $\mu(x)$ instead of $\mu(x,y)$. +Rearranging the terms:

+ +\[\begin{multline}
  \int p_D(x) \int q(z \mid x) \left(\sum_y p_D(y \mid x) \log p(y,z \mid x) - \log q(z \mid x)\right)\,\mathrm dz\,\mathrm dx \\
  + \int \mu(x) \left(\int q(z \mid x)\,\mathrm dz - 1\right)\,\mathrm dx\,.
\end{multline}\] + +

Let its Gateaux derivative with respect to $q$ equal zero:

+ +\[0 = p_D(x) \left(\sum_y p_D(y \mid x) \log p(y,z \mid x) - (1 + \log q(z \mid x))\right) + \mu(x)\,.\] + +

Absorbing $p_D(x) > 0$ and the constant 1 into $\mu(x)$ yields:

+ +\[\log q(z \mid x) = \mu(x) + \sum_y p_D(y \mid x) \log p(z \mid x,y) - \mathbb H(p_D(y \mid x))\,,\] + +

where $\mathbb H(p_D(y \mid x)) = -\sum_y p_D(y \mid x) \log p_D(y \mid x)$ is the entropy. +We see immediately that:

+ +
+ +\[q(z \mid x) \propto \exp(\mathbb E_{p_D(y \mid x)}[\log p(z \mid x,y)])\,. \tag{5}\] + +

This means that when not conditioning on $y$, $q(z \mid x)$ can never achieve the true posterior $p(z \mid x,y)$, unless $\mathbb H(p_D(y \mid x)) = 0$, which is unlikely to occur.

+ +
+ +
+ +
+
+ + + diff --git a/2024/11/10/jieba-vim.html b/2024/11/10/jieba-vim.html new file mode 100644 index 000000000..7556e7e7b --- /dev/null +++ b/2024/11/10/jieba-vim.html @@ -0,0 +1,166 @@ + + + + + + + + +vim/nvim 中文分词插件 jieba.vim | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

jieba.vim: a Chinese word segmentation plugin for Vim/Nvim

+ editor/vim + +
+ +
+
+

jieba.vim: aiming to be the best Chinese word segmentation plugin for Vim.

+
+ +

jieba.vim is under active development. Feel free to follow the project, star it ⭐️, download and try the latest progress, and discuss or leave suggestions in the issues.

+ +

The current state of Chinese editing in Vim/Nvim

+ +

It is well known that the Chinese text-editing experience in Vim/Nvim is lacking. Because Chinese does not separate words with spaces, Vim's native word motions cannot work effectively, so users can only rely on the basic h/l keys to move the cursor character by character. As someone who frequently writes in Vim, I know this inconvenience well. After surveying the ecosystem and confirming that no mature project already existed, I started developing jieba.vim in my spare time over a year ago, aiming to solve this problem.

+ +

Introduction to jieba.vim

+ +

jieba.vim is a Chinese word segmentation plugin for Vim/Nvim based on jieba. By enhancing the w/W/e/E/b/B/ge/gE keys, it enables Vim to move the cursor by Chinese words. The initial prototype of jieba.vim was developed in python3 and provided a first solution to word-wise cursor movement. Although it drew little attention, possibly because of its very slow dictionary loading, the modest recognition that its lua port, neo451/jieba.nvim, gained in the Vim Chinese-editing community still demonstrated the effectiveness of jieba.vim's approach and motivated me to keep improving it. I then released an improved version of jieba.vim that solved the speed problem, and in the past month I have started rewriting jieba.vim's core logic to establish its direction more rigorously.

+ +

Positioning and roadmap of jieba.vim

+ +

Features, ordered by importance:

+ +
    +
  1. jieba.vim should remain compatible with Vim. On ASCII text containing no Chinese, the enhanced w/W/e/E/b/B/ge/gE keys should behave exactly like native Vim, including all the special cases (e.g. cw). This is guaranteed by extensive testing (unit tests and property-based tests) and by verifying the correctness of the unit tests themselves (see junegunn/vader.vim).
  2. jieba.vim should be fast (dictionary loading, keystroke response, lazy dictionary loading) to ensure a good user experience.
  3. jieba.vim should be easy to install. The current improved version requires a local Rust toolchain for compilation. In the future this will be avoided by publishing precompiled shared libraries; for neovim, I will also try linking the Rust core crate directly from lua, removing the extra indirection of lua calling python, which in turn calls Rust.
  4. jieba.vim should have broad feature coverage, e.g. mappings for the {count}w/W/e/E/b/B/ge/gE keys in normal/visual/operator-pending modes, and support for the custom Vim options 'iskeyword' and 'virtualedit'. Support for the word text objects iw/iW/aw/aW is also planned.
+ +

Overview of jieba.vim branches

Branch | Description | Language | License
Initial prototype (main) | Word-wise cursor movement works well in normal mode; bugs remain in other modes. | python3 | MIT
Improved version (rust) | Word-wise cursor movement works well in normal mode; bugs remain in other modes; dictionary loading is 60% faster. | Rust + python3 | MIT
Latest progress (dev/rust) | The {count}w/W/e/E/b/B/ge/gE keys are complete in normal/visual/operator-pending modes, with no known bugs; downloads and trials are welcome. | Rust + python3 | Apache v2.0
+ + +
+ +
+ +
+
+ + + diff --git a/2024/12/09/gaussian-kl-div-torch.html b/2024/12/09/gaussian-kl-div-torch.html new file mode 100644 index 000000000..68644965f --- /dev/null +++ b/2024/12/09/gaussian-kl-div-torch.html @@ -0,0 +1,232 @@ + + + + + + + + +KL divergence between two full-rank Gaussians in PyTorch | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

KL divergence between two full-rank Gaussians in PyTorch

+ math/probability | dev/pytorch + +
+ +
+

Abstract

+ +

In this post, we will go through the PyTorch code to compute the Kullback-Leibler divergence between two full-rank Gaussians. The code might be useful if one considers using a full-rank Gaussian as the variational posterior while training a variational autoencoder.

+ +

KL divergence between two full-rank Gaussians

+ +

It's common practice to parameterize the covariance matrix $\boldsymbol\Sigma$ of a $d$-dimensional full-rank Gaussian using a $D$-dimensional vector of the nonzero elements of $\mathbf L$, where $D = d(1+d)/2$ and $\boldsymbol\Sigma = \mathbf L \mathbf L^\top$ is the Cholesky decomposition, so we will adopt this parameterization here. Note that the diagonal of $\mathbf L$ must be positive so that $\boldsymbol\Sigma$ is positive definite. We will enforce this by taking the exponential of the diagonal elements (in the code below, this is done after assembling $\mathbf L$ from its parameterization).

+ +

Let the two Gaussians be $p(\boldsymbol x) = \mathcal N(\boldsymbol x \mid \boldsymbol\mu_1, \boldsymbol\Sigma_1)$ and $q(\boldsymbol x) = \mathcal N(\boldsymbol x \mid \boldsymbol\mu_2, \boldsymbol\Sigma_2)$. +Per The Book of Statistical Proofs, the KL divergence between them is:

+ +\[D_\mathrm{KL}(p \parallel q) = \frac{1}{2}\left((\boldsymbol\mu_2 - \boldsymbol\mu_1)^\top \boldsymbol\Sigma_2^{-1} (\boldsymbol\mu_2 - \boldsymbol\mu_1) + \operatorname{tr}(\boldsymbol\Sigma_2^{-1} \boldsymbol\Sigma_1) - \log \frac{\det \boldsymbol\Sigma_1}{\det \boldsymbol\Sigma_2} - d\right)\,.\] + +

Plugging in our parameterization of the covariance matrices:

+ +\[\begin{aligned} + D_\mathrm{KL}(p \parallel q) + &= \frac{1}{2}\left((\boldsymbol\mu_2 - \boldsymbol\mu_1)^\top \mathbf L_2^{-\top} \mathbf L_2^{-1} (\boldsymbol\mu_2 - \boldsymbol\mu_1) + \operatorname{tr}((\mathbf L_2 \mathbf L_2^\top)^{-1} (\mathbf L_1 \mathbf L_1^\top)) - \log \frac{\det(\mathbf L_1 \mathbf L_1^\top)}{\det(\mathbf L_2 \mathbf L_2^\top)} - d\right)\\ + &= \frac{1}{2}\left((\mathbf L_2^{-1} (\boldsymbol\mu_2 - \boldsymbol\mu_1))^\top (\mathbf L_2^{-1} (\boldsymbol\mu_2 - \boldsymbol\mu_1)) + \operatorname{tr}((\mathbf L_2^{-1} \mathbf L_1)^\top (\mathbf L_2^{-1} \mathbf L_1)) - 2\log\frac{\det\mathbf L_1}{\det\mathbf L_2} - d\right)\,.\\ +\end{aligned}\] + +

We have used the following facts:

+ +
    +
  • the cyclic property of trace;
  • +
  • $\det \mathbf A = \det \mathbf A^\top$;
  • +
  • $\log\det (\mathbf A \mathbf B) = \log\det(\mathbf A) + \log\det(\mathbf B)$.
  • +
+ +

It follows that:

+ +\[D_\mathrm{KL}(p \parallel q) = \frac{1}{2}\big(\boldsymbol y^\top \boldsymbol y + \|\mathbf M\|_F^2 - 2 (\operatorname{tr}(\log \mathbf L_1) - \operatorname{tr}(\log \mathbf L_2)) - d\big)\,,\] + +

where $\mathbf L_2 \boldsymbol y = \boldsymbol\mu_2 - \boldsymbol\mu_1$, and $\mathbf L_2 \mathbf M = \mathbf L_1$.

+ +

We have denoted:

+ +
    +
  • $\|\cdot\|_F$ as the Frobenius norm of a matrix;
  • +
  • $\log \mathbf A$ as the elementwise logarithm of $\mathbf A$.
  • +
+ +

We have used the following facts:

+ +
    +
  • $\operatorname{tr}(\mathbf A^\top \mathbf A) = \|\mathbf A\|_F^2$;
  • +
  • $\log\det \mathbf L = \operatorname{tr}(\log \mathbf L)$ when $\mathbf L$ is a lower triangular matrix.
  • +
+ +

Code

+ +
import torch
+from torch import distributions as D
+
+
+def form_cholesky_tril_from_elements(d, scale_tril_elems):
+    """
+    Form the Cholesky lower triangular matrix from its elements.
+
+    Args:
+        d (int): The number of rows/columns in the square matrix.
+        scale_tril_elems (torch.Tensor): The Cholesky lower triangular
+            elements, of shape (batch_size, (1 + d) * d // 2).
+
+    Returns:
+        torch.Tensor: A tensor of shape (batch_size, d, d).
+    """
+    batch_size = scale_tril_elems.size(0)
+    device = scale_tril_elems.device
+    i, j = torch.tril_indices(d, d, device=device)
+    l_mat = torch.zeros(batch_size, d, d, device=device)
+    l_mat[:, i, j] = scale_tril_elems
+    l_mat_diag = l_mat.diagonal(dim1=1, dim2=2)
+    l_mat_diag.copy_(l_mat_diag.exp())
+    return l_mat
+
+
+d = 3
+batch_size = 5
+
+
+def groundtruth(mean1, scale_tril1, mean2, scale_tril2):
+    p = D.MultivariateNormal(loc=mean1, scale_tril=scale_tril1)
+    q = D.MultivariateNormal(loc=mean2, scale_tril=scale_tril2)
+    return D.kl_divergence(p, q)
+
+
+def ours(mean1, scale_tril1, mean2, scale_tril2):
+    # Solve L2 y = (mu2 - mu1) for y; the first term is then y^T y.
+    y = torch.linalg.solve_triangular(
+        scale_tril2, (mean2 - mean1).unsqueeze(-1), upper=False).squeeze(-1)
+    y2 = y.square().sum(-1)
+    # Solve L2 M = L1 for M; the trace term is the squared Frobenius norm of M.
+    M = torch.linalg.solve_triangular(scale_tril2, scale_tril1, upper=False)
+    M2 = M.square().flatten(-2, -1).sum(-1)
+    # log det Sigma_i = 2 * sum(log diag(L_i)) since L_i is lower triangular.
+    return 0.5 * (y2 + M2 - 2 * (
+        scale_tril1.diagonal(dim1=-2, dim2=-1).log().sum(-1)
+        - scale_tril2.diagonal(dim1=-2, dim2=-1).log().sum(-1)) - d)
+
+
+# Randomize p and q's parameterization.
+mean1 = torch.randn(batch_size, d)
+mean2 = torch.randn(batch_size, d)
+scale_tril1 = form_cholesky_tril_from_elements(
+    d, torch.randn(batch_size, (1 + d) * d // 2))
+scale_tril2 = form_cholesky_tril_from_elements(
+    d, torch.randn(batch_size, (1 + d) * d // 2))
+
+# Assert the correctness.
+assert torch.allclose(groundtruth(mean1, scale_tril1, mean2, scale_tril2),
+                      ours(mean1, scale_tril1, mean2, scale_tril2))
+
+ +

Profile our implementation:

+ +

%timeit groundtruth(mean1, scale_tril1, mean2, scale_tril2) (baseline):

+ +
164 μs ± 178 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
+
+ +

%timeit ours(mean1, scale_tril1, mean2, scale_tril2) (our implementation):

+ +
46.2 μs ± 71.6 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
+
+ +
+ +
+ +
+
+ + + diff --git a/404.html b/404.html new file mode 100644 index 000000000..ce48b7f15 --- /dev/null +++ b/404.html @@ -0,0 +1,109 @@ + + + + + + + + +Kaiwen’s personal website | My blogs and research reports. + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+

404

+ +

Page not found :(

+

The requested page could not be found.

+
+ +
+
+ + + diff --git a/about/index.html b/about/index.html new file mode 100644 index 000000000..22fbe32d2 --- /dev/null +++ b/about/index.html @@ -0,0 +1,101 @@ + + + + + + + + +About | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

About

+
+ +
+

Hi. I'm Kaiwen Wu, graduating with a Master's degree in CS from UC San Diego.

+ +
+ +
+ +
+
+ + + diff --git a/assets/main.css b/assets/main.css new file mode 100644 index 000000000..240b0033f --- /dev/null +++ b/assets/main.css @@ -0,0 +1 @@ +body,h1,h2,h3,h4,h5,h6,p,blockquote,pre,hr,dl,dd,ol,ul,figure{margin:0;padding:0}body{font:400 16px/1.5 -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";color:#111;background-color:#fdfdfd;-webkit-text-size-adjust:100%;-webkit-font-feature-settings:"kern" 1;-moz-font-feature-settings:"kern" 1;-o-font-feature-settings:"kern" 1;font-feature-settings:"kern" 1;font-kerning:normal;display:flex;min-height:100vh;flex-direction:column}h1,h2,h3,h4,h5,h6,p,blockquote,pre,ul,ol,dl,figure,.highlight{margin-bottom:15px}main{display:block}img{max-width:100%;vertical-align:middle}figure>img{display:block}figcaption{font-size:14px}ul,ol{margin-left:30px}li>ul,li>ol{margin-bottom:0}h1,h2,h3,h4,h5,h6{font-weight:400}a{color:#2a7ae2;text-decoration:none}a:visited{color:#1756a9}a:hover{color:#111;text-decoration:underline}.social-media-list a:hover{text-decoration:none}.social-media-list a:hover .username{text-decoration:underline}blockquote{color:#828282;border-left:4px solid #e8e8e8;padding-left:15px;font-size:18px;letter-spacing:-1px;font-style:italic}blockquote>:last-child{margin-bottom:0}pre,code{font-size:15px;border:1px solid #e8e8e8;border-radius:3px;background-color:#eef}code{padding:1px 5px}pre{padding:8px 12px;overflow-x:auto}pre>code{border:0;padding-right:0;padding-left:0}.wrapper{max-width:-webkit-calc(800px - (30px * 2));max-width:calc(800px - (30px * 2));margin-right:auto;margin-left:auto;padding-right:30px;padding-left:30px}@media screen and (max-width: 800px){.wrapper{max-width:-webkit-calc(800px - (30px));max-width:calc(800px - (30px));padding-right:15px;padding-left:15px}}.wrapper:after,.footer-col-wrapper:after{content:"";display:table;clear:both}.svg-icon{width:16px;height:16px;display:inline-block;fill:#828282;padding-right:5px;vertical-align:text-top}.social-media-list li+li{padding-top:5px}table{margin-bottom:30px;width:100%;text-align:left;color:#3f3f3f;border-collapse:collapse;border:1px solid #e8e8e8}table tr:nth-child(even){background-color:#f7f7f7}table th,table td{padding:10px 15px}table th{background-color:#f0f0f0;border:1px solid #dedede;border-bottom-color:#c9c9c9}table td{border:1px solid #e8e8e8}.site-header{border-top:5px solid #424242;border-bottom:1px solid #e8e8e8;min-height:55.95px;position:relative}.site-title{font-size:26px;font-weight:300;line-height:54px;letter-spacing:-1px;margin-bottom:0;float:left}.site-title,.site-title:visited{color:#424242}.site-nav{float:right;line-height:54px}.site-nav .nav-trigger{display:none}.site-nav .menu-icon{display:none}.site-nav .page-link{color:#111;line-height:1.5}.site-nav .page-link:not(:last-child){margin-right:20px}@media screen and (max-width: 600px){.site-nav{position:absolute;top:9px;right:15px;background-color:#fdfdfd;border:1px solid #e8e8e8;border-radius:5px;text-align:right}.site-nav label[for="nav-trigger"]{display:block;float:right;width:36px;height:36px;z-index:2;cursor:pointer}.site-nav .menu-icon{display:block;float:right;width:36px;height:26px;line-height:0;padding-top:10px;text-align:center}.site-nav .menu-icon>svg{fill:#424242}.site-nav input ~ .trigger{clear:both;display:none}.site-nav input:checked ~ .trigger{display:block;padding-bottom:5px}.site-nav .page-link{display:block;padding:5px 10px;margin-left:20px}.site-nav 
.page-link:not(:last-child){margin-right:0}}.site-footer{border-top:1px solid #e8e8e8;padding:30px 0}.footer-heading{font-size:18px;margin-bottom:15px}.contact-list,.social-media-list{list-style:none;margin-left:0}.footer-col-wrapper{font-size:15px;color:#828282;margin-left:-15px}.footer-col{float:left;margin-bottom:15px;padding-left:15px}.footer-col-1{width:-webkit-calc(35% - (30px / 2));width:calc(35% - (30px / 2))}.footer-col-2{width:-webkit-calc(20% - (30px / 2));width:calc(20% - (30px / 2))}.footer-col-3{width:-webkit-calc(45% - (30px / 2));width:calc(45% - (30px / 2))}@media screen and (max-width: 800px){.footer-col-1,.footer-col-2{width:-webkit-calc(50% - (30px / 2));width:calc(50% - (30px / 2))}.footer-col-3{width:-webkit-calc(100% - (30px / 2));width:calc(100% - (30px / 2))}}@media screen and (max-width: 600px){.footer-col{float:none;width:-webkit-calc(100% - (30px / 2));width:calc(100% - (30px / 2))}}.page-content{padding:30px 0;flex:1}.page-heading{font-size:32px}.post-list-heading{font-size:28px}.post-list{margin-left:0;list-style:none}.post-list>li{margin-bottom:30px}.post-meta{font-size:14px;color:#828282}.post-link{display:block;font-size:24px}.post-header{margin-bottom:30px}.post-title{font-size:42px;letter-spacing:-1px;line-height:1}@media screen and (max-width: 800px){.post-title{font-size:36px}}.post-content{margin-bottom:30px}.post-content h2{font-size:32px}@media screen and (max-width: 800px){.post-content h2{font-size:28px}}.post-content h3{font-size:26px}@media screen and (max-width: 800px){.post-content h3{font-size:22px}}.post-content h4{font-size:20px}@media screen and (max-width: 800px){.post-content h4{font-size:18px}}.highlight{background:#fff}.highlighter-rouge .highlight{background:#eef}.highlight .c{color:#998;font-style:italic}.highlight .err{color:#a61717;background-color:#e3d2d2}.highlight .k{font-weight:bold}.highlight .o{font-weight:bold}.highlight .cm{color:#998;font-style:italic}.highlight .cp{color:#999;font-weight:bold}.highlight .c1{color:#998;font-style:italic}.highlight .cs{color:#999;font-weight:bold;font-style:italic}.highlight .gd{color:#000;background-color:#fdd}.highlight .gd .x{color:#000;background-color:#faa}.highlight .ge{font-style:italic}.highlight .gr{color:#a00}.highlight .gh{color:#999}.highlight .gi{color:#000;background-color:#dfd}.highlight .gi .x{color:#000;background-color:#afa}.highlight .go{color:#888}.highlight .gp{color:#555}.highlight .gs{font-weight:bold}.highlight .gu{color:#aaa}.highlight .gt{color:#a00}.highlight .kc{font-weight:bold}.highlight .kd{font-weight:bold}.highlight .kp{font-weight:bold}.highlight .kr{font-weight:bold}.highlight .kt{color:#458;font-weight:bold}.highlight .m{color:#099}.highlight .s{color:#d14}.highlight .na{color:teal}.highlight .nb{color:#0086B3}.highlight .nc{color:#458;font-weight:bold}.highlight .no{color:teal}.highlight .ni{color:purple}.highlight .ne{color:#900;font-weight:bold}.highlight .nf{color:#900;font-weight:bold}.highlight .nn{color:#555}.highlight .nt{color:navy}.highlight .nv{color:teal}.highlight .ow{font-weight:bold}.highlight .w{color:#bbb}.highlight .mf{color:#099}.highlight .mh{color:#099}.highlight .mi{color:#099}.highlight .mo{color:#099}.highlight .sb{color:#d14}.highlight .sc{color:#d14}.highlight .sd{color:#d14}.highlight .s2{color:#d14}.highlight .se{color:#d14}.highlight .sh{color:#d14}.highlight .si{color:#d14}.highlight .sx{color:#d14}.highlight .sr{color:#009926}.highlight .s1{color:#d14}.highlight .ss{color:#990073}.highlight .bp{color:#999}.highlight 
.vc{color:teal}.highlight .vg{color:teal}.highlight .vi{color:teal}.highlight .il{color:#099} diff --git a/assets/minima-social-icons.svg b/assets/minima-social-icons.svg new file mode 100644 index 000000000..fa7399fe2 --- /dev/null +++ b/assets/minima-social-icons.svg @@ -0,0 +1,33 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/posts_imgs/2023-10-04/output_11_0.png b/assets/posts_imgs/2023-10-04/output_11_0.png new file mode 100644 index 000000000..c6e775ef0 Binary files /dev/null and b/assets/posts_imgs/2023-10-04/output_11_0.png differ diff --git a/assets/posts_imgs/2023-10-04/output_13_0.png b/assets/posts_imgs/2023-10-04/output_13_0.png new file mode 100644 index 000000000..3b6263f0f Binary files /dev/null and b/assets/posts_imgs/2023-10-04/output_13_0.png differ diff --git a/assets/posts_imgs/2023-11-03/plate.jpg b/assets/posts_imgs/2023-11-03/plate.jpg new file mode 100644 index 000000000..cd7fabfce Binary files /dev/null and b/assets/posts_imgs/2023-11-03/plate.jpg differ diff --git a/assets/posts_imgs/2023-11-28/classic_empirical_bayes.jpg b/assets/posts_imgs/2023-11-28/classic_empirical_bayes.jpg new file mode 100644 index 000000000..cfcfcc50c Binary files /dev/null and b/assets/posts_imgs/2023-11-28/classic_empirical_bayes.jpg differ diff --git a/assets/posts_imgs/2023-11-28/coin_with_prior.jpg b/assets/posts_imgs/2023-11-28/coin_with_prior.jpg new file mode 100644 index 000000000..c21ad02b9 Binary files /dev/null and b/assets/posts_imgs/2023-11-28/coin_with_prior.jpg differ diff --git a/assets/posts_imgs/2024-02-01/darkgreen.jpg b/assets/posts_imgs/2024-02-01/darkgreen.jpg new file mode 100644 index 000000000..eb21068ec Binary files /dev/null and b/assets/posts_imgs/2024-02-01/darkgreen.jpg differ diff --git a/assets/posts_imgs/2024-02-11/dsigma-maxquad.png b/assets/posts_imgs/2024-02-11/dsigma-maxquad.png new file mode 100644 index 000000000..9da4fb6cc Binary files /dev/null and b/assets/posts_imgs/2024-02-11/dsigma-maxquad.png differ diff --git a/assets/posts_imgs/2024-02-11/dsigma-softplus.png b/assets/posts_imgs/2024-02-11/dsigma-softplus.png new file mode 100644 index 000000000..126494de3 Binary files /dev/null and b/assets/posts_imgs/2024-02-11/dsigma-softplus.png differ diff --git a/assets/posts_imgs/2024-05-17/dataset.png b/assets/posts_imgs/2024-05-17/dataset.png new file mode 100644 index 000000000..e4d4221ea Binary files /dev/null and b/assets/posts_imgs/2024-05-17/dataset.png differ diff --git a/assets/posts_imgs/2024-05-17/precise-biased-db.png b/assets/posts_imgs/2024-05-17/precise-biased-db.png new file mode 100644 index 000000000..71159bd12 Binary files /dev/null and b/assets/posts_imgs/2024-05-17/precise-biased-db.png differ diff --git a/assets/posts_imgs/2024-05-17/precise-biased-unc.png b/assets/posts_imgs/2024-05-17/precise-biased-unc.png new file mode 100644 index 000000000..599747979 Binary files /dev/null and b/assets/posts_imgs/2024-05-17/precise-biased-unc.png differ diff --git a/assets/posts_imgs/2024-05-17/precise-uninformative-db.png b/assets/posts_imgs/2024-05-17/precise-uninformative-db.png new file mode 100644 index 000000000..4137dda0f Binary files /dev/null and b/assets/posts_imgs/2024-05-17/precise-uninformative-db.png differ diff --git a/assets/posts_imgs/2024-05-17/precise-uninformative-unc.png b/assets/posts_imgs/2024-05-17/precise-uninformative-unc.png new file mode 100644 index 000000000..5548ad5ec Binary files /dev/null and b/assets/posts_imgs/2024-05-17/precise-uninformative-unc.png differ diff 
--git a/assets/posts_imgs/2024-05-17/uninformative-db.png b/assets/posts_imgs/2024-05-17/uninformative-db.png new file mode 100644 index 000000000..7e33c6ee4 Binary files /dev/null and b/assets/posts_imgs/2024-05-17/uninformative-db.png differ diff --git a/assets/posts_imgs/2024-05-17/uninformative-rescaled-db.png b/assets/posts_imgs/2024-05-17/uninformative-rescaled-db.png new file mode 100644 index 000000000..683ba5146 Binary files /dev/null and b/assets/posts_imgs/2024-05-17/uninformative-rescaled-db.png differ diff --git a/assets/posts_imgs/2024-05-17/uninformative-rescaled-unc.png b/assets/posts_imgs/2024-05-17/uninformative-rescaled-unc.png new file mode 100644 index 000000000..11bcabe4c Binary files /dev/null and b/assets/posts_imgs/2024-05-17/uninformative-rescaled-unc.png differ diff --git a/assets/posts_imgs/2024-05-17/uninformative-unc.png b/assets/posts_imgs/2024-05-17/uninformative-unc.png new file mode 100644 index 000000000..59b7745b3 Binary files /dev/null and b/assets/posts_imgs/2024-05-17/uninformative-unc.png differ diff --git a/assets/posts_imgs/2024-06-13/iterm2-ai-demo.gif b/assets/posts_imgs/2024-06-13/iterm2-ai-demo.gif new file mode 100644 index 000000000..b85797a1c Binary files /dev/null and b/assets/posts_imgs/2024-06-13/iterm2-ai-demo.gif differ diff --git a/assets/posts_imgs/pizzahut-free-soda.jpg b/assets/posts_imgs/pizzahut-free-soda.jpg new file mode 100644 index 000000000..b642b5029 Binary files /dev/null and b/assets/posts_imgs/pizzahut-free-soda.jpg differ diff --git a/assets/spare-time-research/chem-eq-balance.pdf b/assets/spare-time-research/chem-eq-balance.pdf new file mode 100644 index 000000000..46c3562b4 Binary files /dev/null and b/assets/spare-time-research/chem-eq-balance.pdf differ diff --git a/assets/spare-time-research/covid19-test-analysis.pdf b/assets/spare-time-research/covid19-test-analysis.pdf new file mode 100644 index 000000000..ef5128146 Binary files /dev/null and b/assets/spare-time-research/covid19-test-analysis.pdf differ diff --git a/assets/spare-time-research/cross-walker.pdf b/assets/spare-time-research/cross-walker.pdf new file mode 100644 index 000000000..f0ae95265 Binary files /dev/null and b/assets/spare-time-research/cross-walker.pdf differ diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 000000000..9b212b6fb --- /dev/null +++ b/docs/index.html @@ -0,0 +1,107 @@ + + + + + + + + +Docs | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+
+ + + diff --git a/feed.xml b/feed.xml new file mode 100644 index 000000000..9e2aea796 --- /dev/null +++ b/feed.xml @@ -0,0 +1,937 @@ +Jekyll2024-12-19T08:25:53+00:00https://kkew3.github.io/feed.xmlKaiwen’s personal websiteMy blogs and research reports.KL divergence between two full-rank Gaussians in PyTorch2024-12-09T08:03:00+00:002024-12-09T08:03:00+00:00https://kkew3.github.io/2024/12/09/gaussian-kl-div-torch<h2 id="abstract">Abstract</h2> + +<p>In this post, we will go through the <a href="https://pytorch.org/">PyTorch</a> code to compute the Kullback-Leibler divergence between two full-rank Gaussians. +The code might be useful if one considers using full-rank Gaussian as variational posterior while training a <a href="https://arxiv.org/abs/1312.6114">variational autoencoder</a>.</p> + +<h2 id="kl-divergence-between-two-full-rank-gaussians">KL divergence between two full-rank Gaussians</h2> + +<p>It’s common practice to parameterize the covariance matrix $\boldsymbol\Sigma$ of a $d$-dimensional full-rank Gaussian using a $D$-dimensional vector of nonzero elements of $\mathbf L$, where $D = d(1+d)/2$ and $\boldsymbol\Sigma = \mathbf L \mathbf L^\top$ is the Cholesky decomposition. +So we will assume it here. +Note that the diagonal of $\mathbf L$ must be positive so that $\boldsymbol\Sigma$ is positive definite. +We will enforce this by taking the exponential on the diagonal elements (e.g. the first $d$ elements of our parameterization).</p> + +<p>Let the two Gaussians be $p(\boldsymbol x) = \mathcal N(\boldsymbol x \mid \boldsymbol\mu_1, \boldsymbol\Sigma_1)$ and $q(\boldsymbol x) = \mathcal N(\boldsymbol x \mid \boldsymbol\mu_2, \boldsymbol\Sigma_2)$. +Per <a href="https://statproofbook.github.io/P/mvn-kl">The Book of Statistical Proofs</a>, the KL divergence between them is:</p> + +\[D_\mathrm{KL}(p \parallel q) = \frac{1}{2}\left((\boldsymbol\mu_2 - \boldsymbol\mu_1)^\top \boldsymbol\Sigma_2^{-1} (\boldsymbol\mu_2 - \boldsymbol\mu_1) + \operatorname{tr}(\boldsymbol\Sigma_2^{-1} \boldsymbol\Sigma_1) - \log \frac{\det \boldsymbol\Sigma_1}{\det \boldsymbol\Sigma_2} - d\right)\,.\] + +<p>Plugging in our parameterization of the covariance matrices:</p> + +\[\begin{aligned} + D_\mathrm{KL}(p \parallel q) + &amp;= \frac{1}{2}\left((\boldsymbol\mu_2 - \boldsymbol\mu_1)^\top \mathbf L_2^{-\top} \mathbf L_2^{-1} (\boldsymbol\mu_2 - \boldsymbol\mu_1) + \operatorname{tr}((\mathbf L_2 \mathbf L_2^\top)^{-1} (\mathbf L_1 \mathbf L_1^\top)) - \log \frac{\det(\mathbf L_1 \mathbf L_1^\top)}{\det(\mathbf L_2 \mathbf L_2^\top)} - d\right)\\ + &amp;= \frac{1}{2}\left((\mathbf L_2^{-1} (\boldsymbol\mu_2 - \boldsymbol\mu_1))^\top (\mathbf L_2^{-1} (\boldsymbol\mu_2 - \boldsymbol\mu_1)) + \operatorname{tr}((\mathbf L_2^{-1} \mathbf L_1)^\top (\mathbf L_2^{-1} \mathbf L_1)) - 2\log\frac{\det\mathbf L_1}{\det\mathbf L_2} - d\right)\,.\\ +\end{aligned}\] + +<p>We have used the following facts:</p> + +<ul> + <li>the <a href="https://en.wikipedia.org/wiki/Trace_(linear_algebra)#Cyclic_property">cyclic property of trace</a>;</li> + <li>$\det \mathbf A = \det \mathbf A^\top$;</li> + <li>$\log\det (\mathbf A \mathbf B) = \log\det(\mathbf A) + \log\det(\mathbf B)$.</li> +</ul> + +<p>It follows that:</p> + +\[D_\mathrm{KL}(p \parallel q) = \frac{1}{2}\big(\boldsymbol y^\top \boldsymbol y + \|\mathbf M\|_F^2 - 2 (\operatorname{tr}(\log \mathbf L_1) - \operatorname{tr}(\log \mathbf L_2)) - d\big)\,,\] + +<p>where $\mathbf L_2 \boldsymbol y = \boldsymbol\mu_2 - \boldsymbol\mu_1$, and $\mathbf L_2 \mathbf M = \mathbf 
L_1$.</p> + +<p>We have denoted:</p> + +<ul> + <li>$\|\cdot\|_F$ as the Frobenius norm of a matrix;</li> + <li>$\log \mathbf A$ as the elementwise logarithm of $\mathbf A$.</li> +</ul> + +<p>We have used the following facts:</p> + +<ul> + <li>$\operatorname{tr}(\mathbf A^\top \mathbf A) = \|\mathbf A\|_F^2$;</li> + <li>$\log\det \mathbf L = \operatorname{tr}(\log \mathbf L)$ when $\mathbf L$ is a lower triangular matrix.</li> +</ul> + +<h3 id="code">Code</h3> + +<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">torch</span> +<span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">distributions</span> <span class="k">as</span> <span class="n">D</span> + + +<span class="k">def</span> <span class="nf">form_cholesky_tril_from_elements</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">scale_tril_elems</span><span class="p">):</span> + <span class="s">""" + Form the Cholesky lower triangular matrix from its elements. + + Args: + d (int): The number of rows/columns in the square matrix. + scale_tril_elems (torch.Tensor): The Cholesky lower triangular + elements, of shape (batch_size, (1 + d) * d // 2). + + Returns: + torch.Tensor: A tensor of shape (batch_size, d, d). + """</span> + <span class="n">batch_size</span> <span class="o">=</span> <span class="n">scale_tril_elems</span><span class="p">.</span><span class="n">size</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> + <span class="n">device</span> <span class="o">=</span> <span class="n">scale_tril_elems</span><span class="p">.</span><span class="n">device</span> + <span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">tril_indices</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">)</span> + <span class="n">l_mat</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">)</span> + <span class="n">l_mat</span><span class="p">[:,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">scale_tril_elems</span> + <span class="n">l_mat_diag</span> <span class="o">=</span> <span class="n">l_mat</span><span class="p">.</span><span class="n">diagonal</span><span class="p">(</span><span class="n">dim1</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">dim2</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span> + <span class="n">l_mat_diag</span><span class="p">.</span><span class="n">copy_</span><span class="p">(</span><span class="n">l_mat_diag</span><span class="p">.</span><span class="n">exp</span><span class="p">())</span> + <span class="k">return</span> <span class="n">l_mat</span> + + +<span class="n">d</span> <span class="o">=</span> <span class="mi">3</span> 
+<span class="n">batch_size</span> <span class="o">=</span> <span class="mi">5</span> + + +<span class="k">def</span> <span class="nf">groundtruth</span><span class="p">(</span><span class="n">mean1</span><span class="p">,</span> <span class="n">scale_tril1</span><span class="p">,</span> <span class="n">mean2</span><span class="p">,</span> <span class="n">scale_tril2</span><span class="p">):</span> + <span class="n">p</span> <span class="o">=</span> <span class="n">D</span><span class="p">.</span><span class="n">MultivariateNormal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="n">mean1</span><span class="p">,</span> <span class="n">scale_tril</span><span class="o">=</span><span class="n">scale_tril1</span><span class="p">)</span> + <span class="n">q</span> <span class="o">=</span> <span class="n">D</span><span class="p">.</span><span class="n">MultivariateNormal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="n">mean2</span><span class="p">,</span> <span class="n">scale_tril</span><span class="o">=</span><span class="n">scale_tril2</span><span class="p">)</span> + <span class="k">return</span> <span class="n">D</span><span class="p">.</span><span class="n">kl_divergence</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">q</span><span class="p">)</span> + + +<span class="k">def</span> <span class="nf">ours</span><span class="p">(</span><span class="n">mean1</span><span class="p">,</span> <span class="n">scale_tril1</span><span class="p">,</span> <span class="n">mean2</span><span class="p">,</span> <span class="n">scale_tril2</span><span class="p">):</span> + <span class="n">y</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">solve_triangular</span><span class="p">(</span> + <span class="n">scale_tril2</span><span class="p">,</span> <span class="p">(</span><span class="n">mean2</span> <span class="o">-</span> <span class="n">mean1</span><span class="p">).</span><span class="n">unsqueeze</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="n">upper</span><span class="o">=</span><span class="bp">False</span><span class="p">).</span><span class="n">squeeze</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> + <span class="n">y2</span> <span class="o">=</span> <span class="n">y</span><span class="p">.</span><span class="n">square</span><span class="p">().</span><span class="nb">sum</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> + <span class="n">M</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">solve_triangular</span><span class="p">(</span><span class="n">scale_tril2</span><span class="p">,</span> <span class="n">scale_tril1</span><span class="p">,</span> <span class="n">upper</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> + <span class="n">M2</span> <span class="o">=</span> <span class="n">M</span><span class="p">.</span><span class="n">square</span><span class="p">().</span><span class="n">flatten</span><span class="p">(</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="o">-</span><span 
class="mi">1</span><span class="p">).</span><span class="nb">sum</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> + <span class="k">return</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="p">(</span><span class="n">y2</span> <span class="o">+</span> <span class="n">M2</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">*</span> <span class="p">(</span> + <span class="n">scale_tril1</span><span class="p">.</span><span class="n">diagonal</span><span class="p">(</span><span class="n">dim1</span><span class="o">=-</span><span class="mi">2</span><span class="p">,</span> <span class="n">dim2</span><span class="o">=-</span><span class="mi">1</span><span class="p">).</span><span class="n">log</span><span class="p">().</span><span class="nb">sum</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> + <span class="o">-</span> <span class="n">scale_tril2</span><span class="p">.</span><span class="n">diagonal</span><span class="p">(</span><span class="n">dim1</span><span class="o">=-</span><span class="mi">2</span><span class="p">,</span> <span class="n">dim2</span><span class="o">=-</span><span class="mi">1</span><span class="p">).</span><span class="n">log</span><span class="p">().</span><span class="nb">sum</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span> <span class="o">-</span> <span class="n">d</span><span class="p">)</span> + + +<span class="c1"># Randomize p and q's parameterization. +</span><span class="n">mean1</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> +<span class="n">mean2</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span> +<span class="n">scale_tril1</span> <span class="o">=</span> <span class="n">form_cholesky_tril_from_elements</span><span class="p">(</span> + <span class="n">d</span><span class="p">,</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">d</span><span class="p">)</span> <span class="o">*</span> <span class="n">d</span> <span class="o">//</span> <span class="mi">2</span><span class="p">))</span> +<span class="n">scale_tril2</span> <span class="o">=</span> <span class="n">form_cholesky_tril_from_elements</span><span class="p">(</span> + <span class="n">d</span><span class="p">,</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">d</span><span class="p">)</span> <span class="o">*</span> <span class="n">d</span> <span class="o">//</span> <span class="mi">2</span><span class="p">))</span> + +<span class="c1"># Assert the correctness. 
+</span><span class="k">assert</span> <span class="n">torch</span><span class="p">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">groundtruth</span><span class="p">(</span><span class="n">mean1</span><span class="p">,</span> <span class="n">scale_tril1</span><span class="p">,</span> <span class="n">mean2</span><span class="p">,</span> <span class="n">scale_tril2</span><span class="p">),</span> + <span class="n">ours</span><span class="p">(</span><span class="n">mean1</span><span class="p">,</span> <span class="n">scale_tril1</span><span class="p">,</span> <span class="n">mean2</span><span class="p">,</span> <span class="n">scale_tril2</span><span class="p">))</span> +</code></pre></div></div> + +<p>Profile our implementation:</p> + +<p><code class="language-plaintext highlighter-rouge">%timeit groundtruth(mean1, scale_tril1, mean2, scale_tril2)</code> (baseline):</p> + +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>164 μs ± 178 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) +</code></pre></div></div> + +<p><code class="language-plaintext highlighter-rouge">%timeit ours(mean1, scale_tril1, mean2, scale_tril2)</code> (our implementation):</p> + +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>46.2 μs ± 71.6 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) +</code></pre></div></div>Abstractvim/nvim 中文分词插件 jieba.vim2024-11-10T06:53:06+00:002024-11-10T06:53:06+00:00https://kkew3.github.io/2024/11/10/jieba-vim<blockquote> + <p>jieba.vim,做最好的 Vim 中文分词插件。</p> +</blockquote> + +<p>jieba.vim 正处于积极开发中,欢迎关注、Star ⭐️、下载试用<a href="https://github.com/kkew3/jieba.vim/tree/dev/rust">最新进展</a>,欢迎在 <a href="https://github.com/kkew3/jieba.vim/issues">issue</a> 讨论并提意见建议。</p> + +<h2 id="vimnvim-中文编辑现状">Vim/Nvim 中文编辑现状</h2> + +<p><a href="https://www.vim.org/">Vim</a>/<a href="https://neovim.io/">Nvim</a> 的中文文本编辑体验不足是广为人知的问题。由于中文不用空格分隔词语,Vim 的原生 <a href="https://vimhelp.org/motion.txt.html#word-motions">word motion</a> 功能无法有效发挥作用,因此用户只能依赖基本的 h/l 键位逐字移动光标。作为一名常用 Vim 写作的人,我对这方面的不便深有体会,经过调研确定没有已存在的完善项目后,我于一年多前开始在业余时间开发 jieba.vim,旨在解决这个问题。</p> + +<h2 id="jiebavim-简介">jieba.vim 简介</h2> + +<p>jieba.vim 是一个基于 <a href="https://github.com/fxsjy/jieba">jieba</a> 的 Vim/Nvim 中文分词插件,通过增强 w/W/e/E/b/B/ge/gE 键位,使 Vim 能够按中文词语移动光标。jieba.vim 的<a href="https://github.com/kkew3/jieba.vim/tree/main">初版原型</a>使用 python3 开发,初步解决了按词移动光标的问题。虽然(可能)由于其过于缓慢的词典加载速度并没有获得什么关注,但是基于该原型的 lua 移植版 <a href="https://github.com/neo451/jieba.nvim">neo451/jieba.nvim</a> 在 Vim 中文编辑圈的小为人知还是证明了 jieba.vim 解决方案的有效性,给了我继续改进的动力。随后,我发布了 jieba.vim 的<a href="https://github.com/kkew3/jieba.vim/tree/rust">改进版</a>,解决了其速度问题,并于近一个月开始重写 jieba.vim 的核心逻辑,更严谨地确立其发展方向。</p> + +<h2 id="jiebavim-对自身的定位及路线图">jieba.vim 对自身的定位及路线图</h2> + +<p>Features 依重要性排序:</p> + +<ol> + <li>jieba.vim 应保持其与 Vim 的兼容性。经过增强的 w/W/e/E/b/B/ge/gE 键位在无中文 ASCII 文本上应与 Vim 原生行为<em>完全一致</em>,其中包括各种特殊情况(例如 <a href="https://vimhelp.org/change.txt.html#cw">cw</a>)。这需要通过大量测试(单元测试与 property-based tests),以及对单元测试的正确性验证(见 <a href="https://github.com/junegunn/vader.vim">junegunn/vader.vim</a>)予以保证。</li> + <li>jieba.vim 应具有较快的速度(包括词典加载速度、键位反应速度、词典懒加载),以确保良好的用户体验。</li> + <li>jieba.vim 应易于安装。目前 jieba.vim 的<a href="https://github.com/kkew3/jieba.vim/tree/rust">改进版</a>需要本地安装 Rust 并进行编译。未来将通过发布预编译链接库避免该情况;同时针对 neovim 将尝试直接从 lua 链接 Rust 核心 crate,从而免除 lua 调用 python、python 再调用 Rust 的额外调用关系。</li> + <li>jieba.vim 应具有较广的功能覆盖,例如在 normal/visual/operator-pending 模式下对 
{count}w/W/e/E/b/B/ge/gE 键位的映射支持、以及对自定义 Vim 选项 <a href="https://vimhelp.org/options.txt.html#%27iskeyword%27"><code class="language-plaintext highlighter-rouge">'iskeyword'</code></a>、<a href="https://vimhelp.org/options.txt.html#%27virtualedit%27"><code class="language-plaintext highlighter-rouge">'virtualedit'</code></a> 的支持。此外,对 word <a href="https://vimhelp.org/motion.txt.html#text-objects">text object</a> iw/iW/aw/aW 的支持也处于计划内。</li> +</ol> + +<h2 id="jiebavim-分支一览">jieba.vim 分支一览</h2> + +<table> + <thead> + <tr> + <th>分支</th> + <th>简介</th> + <th>开发语言</th> + <th>许可</th> + </tr> + </thead> + <tbody> + <tr> + <td>[初版原型](https://github.com/kkew3/jieba.vim/tree/main) (main)</td> + <td>normal 模式下按词移动光标完善,其余模式下有 bug。</td> + <td>python3</td> + <td>MIT</td> + </tr> + <tr> + <td>[改进版](https://github.com/kkew3/jieba.vim/tree/rust) (rust)</td> + <td>normal 模式下按词移动光标完善,其余模式下有 bug,词典加载速度快 60%。</td> + <td>Rust + python3</td> + <td>MIT</td> + </tr> + <tr> + <td>[最新进展](https://github.com/kkew3/jieba.vim/tree/dev/rust) (dev/rust)</td> + <td>目前已完成 normal/visual/operator-pending 模式下的 {count}w/W/e/E/b/B/ge/gE 键位,未发现 bug,欢迎下载试用。</td> + <td>Rust + python3</td> + <td>Apache v2.0</td> + </tr> + </tbody> +</table>jieba.vim,做最好的 Vim 中文分词插件。Conditioning of the variational posterior in CVAE (Sohn, 2015)2024-08-27T11:03:07+00:002024-08-27T11:03:07+00:00https://kkew3.github.io/2024/08/27/conditioning-of-cvae-variational-posterior<h2 id="abstract">Abstract</h2> + +<p>This post explores alternative conditioning of the variational posterior $q$ in CVAE (<a href="https://papers.nips.cc/paper_files/paper/2015/hash/8d55a249e6baa5c06772297520da2051-Abstract.html">Sohn et al. 2015</a>), and concludes that the conditioning of $q$ on $y$ is important to predictive inference.</p> + +<h2 id="introduction">Introduction</h2> + +<p>CVAE models the likelihood $p(y \mid x)$ as a continuous mixture of latent $z$:</p> + +<div id="eq-def"></div> + +\[p(y \mid x) = \int p_\theta(z \mid x) p_\theta(y \mid x,z)\,\mathrm dz\,. \tag{1}\] + +<p>Since (<a href="#eq-def">1</a>) is intractable, Sohn et al. instead optimize its evidence lower bound (ELBO):</p> + +\[\mathcal L_{\text{CVAE}}(x,y;\theta,\phi) = \mathbb E_q[\log p_\theta(y \mid x,z) - \log q_\phi(z \mid x,y)]\,. \tag{2}\] + +<p>Here, the variational posterior $q$ conditions on $x$ and $y$. +At test time, the authors propose to use importance sampling leveraging the trained variational posterior:</p> + +<div id="eq-is"></div> + +\[p(y \mid x) \approx \frac{1}{S} \sum_{s=1}^S \frac{p_\theta(y \mid x,z_s) p_\theta(z_s \mid x)}{q_\phi(z_s \mid x,y)}\,, \tag{3}\] + +<p>where $z_s \sim q_\phi(z \mid x,y)$.</p> + +<p>What if $q$ conditions on $x$ only? +This post explores this possibility, and reaches the conclusion that without conditioning on $y$, $q$ at optimum won’t ever attain the true posterior $p(z \mid x,y)$, and should not be otherwise better in terms of reducing the variance in importance sampling.</p> + +<h2 id="warm-up-proving-the-effecacy-of-importance-sampling">Warm up: proving the effecacy of importance sampling</h2> + +<p>We assume that infinite data is available for learning, and $q$ is from a flexible enough probability family. +The data are drawn from the joint data distribution $p_D(x,y)$, where we have stressed with a subscript $D$. +We assume that $x$ is continuous and $y$ is discrete. +The goal is to maximize the expected ELBO in terms of $p_D(x,y)$. +However, we assume that $p_\theta(y \mid x,z)$ won’t approaches to $p_D(y \mid x)$ whatever value $\theta$ picks. 
+We will drop $\theta$ and $\phi$ below for brevity.</p> + +<p>We may easily pose this setup as a constrained maximization problem: +$\max \mathbb E[\log p(y,z \mid x) - \log q(z \mid x,y)]$ subject to $q$ being a probability, where the expectation is taken with respect to $p_D(x,y) q(z \mid x,y)$.</p> + +<p>The Lagrangian is:</p> + +<div id="eq-lagrangian-q-xy"></div> + +\[\int \sum_y p_D(x,y) \int q(z \mid x,y) \log \frac{p(y,z \mid x)}{q(z \mid x,y)}\,\mathrm dz\,\mathrm dx + \int \sum_y \mu(x,y) \left(\int q(z \mid x,y)\,\mathrm dz - 1\right)\,\mathrm dx\,, \tag{4}\] + +<p>where $\mu(x,y)$ is the Lagrange multiplier. +Now find the <a href="https://www.youtube.com/watch?v=6VvmMkAx5Jc&amp;t=982s">Gateaux derivative</a> and let it equal zero:</p> + +\[0 = p_D(x,y) (\log p(y,z \mid x) - (1 + \log q(z \mid x,y)) + \mu(x,y))\,.\] + +<p>Absorbing $p_D(x,y) &gt; 0$ and the constant 1 into $\mu(x,y)$ yields:</p> + +\[\log q(z \mid x,y) = \mu(x,y) + \log p(y,z \mid x)\,,\] + +<p>where $\mu(x,y) = -\log \int p(y,z \mid x)\,\mathrm dz = -\log p_D(y \mid x)$. +It thus follows that, at optimum, $q(z \mid x,y) = p(z \mid x,y)$. +Hence, when evaluating Equation (<a href="#eq-is">3</a>), at optimum, the right hand side equals the left hand side with zero variance.</p> + +<h2 id="conditioning-only-on-x-gives-worse-approximation">Conditioning only on x gives worse approximation</h2> + +<p>Following the same setup as the previous section, we start from the Lagrangian (<a href="#eq-lagrangian-q-xy">4</a>). +Note that now we assume $q \triangleq q(z \mid x)$, and that the Lagrange multiplier is $\mu(x)$ instead of $\mu(x,y)$. +Rearranging the terms:</p> + +\[\begin{multline} + \int p_D(x) \int q(z \mid x) \left(\sum_y p_D(y \mid x) \log p(y,z \mid x) - \log q(z \mid x)\right)\,\mathrm dz\,\mathrm dz \\ + + \int \mu(x) \left(\int q(z \mid x)\,\mathrm dz - 1\right)\,\mathrm dx\,. +\end{multline}\] + +<p>Let its Gateaux derivative with respect to $q$ equal zero:</p> + +\[0 = p_D(x) \left(\sum_y p_D(y \mid x) \log p(y,z \mid x) - (1 + \log q(z \mid x))\right) + \mu(x)\,.\] + +<p>Absorbing $p_D(x) &gt; 0$ and the constant 1 into $\mu(x)$ yields:</p> + +\[\log q(z \mid x) = \mu(x) + \sum_y p_D(y \mid x) \log p(z \mid x,y) - \mathbb H(p_D(y \mid x))\,,\] + +<p>where $\mathbb H(p_D(y \mid x)) = -\sum_y p_D(y \mid x) \log p_D(y \mid x)$ is the entropy. +We see immediately that:</p> + +<div id="eq-main-result"></div> + +\[q(z \mid x) \propto \exp(\mathbb E_{p_D(y \mid x)}[\log p(z \mid x,y)])\,. \tag{5}\] + +<p>This means that when not conditioning on $y$, $q(z \mid x)$ can never achieve the true posterior $p(z \mid x,y)$, unless $\mathbb H(p_D(y \mid x)) = 0$, which is unlikely to occur.</p>AbstractEffect of gamma in BN-VAE2024-08-09T11:00:44+00:002024-08-09T11:00:44+00:00https://kkew3.github.io/2024/08/09/gamma-in-bn-vae<h2 id="abstract">Abstract</h2> + +<p>This post discusses the effect of $\gamma$ in BN-VAE (<a href="https://arxiv.org/abs/2004.12585">Zhu et al., 2020</a>).</p> + +<h2 id="introduction">Introduction</h2> + +<p>BN-VAE (see more about it <a href="https://kexue.fm/archives/7381">here</a> (in Chinese)) attempts to solve KL vanishing problem (a.k.a. 
posterior collapse) in Gaussian-VAE by batch-normalizing the variational posterior mean, which casts a positive lower bound on the <a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback-Leibler divergence</a> term (over the dataset) in <a href="https://en.wikipedia.org/wiki/Evidence_lower_bound">ELBO</a>, thus avoiding KL vanishing problem. +The batch normalization procedure includes a fixed hyperparameter $\gamma \ge 0$, which controls the lower bound of the KL; the larger $\gamma$, the larger the lower bound. +When $\gamma=0$, KL vanishing occurs.</p> + +<p>Zhu et al. (2020) visualizes the distribution of the variational posterior mean when $\gamma$ equals 0.3 and 1. +What will happen if $\gamma &gt; 1$? +How does $\gamma &gt; 0$ solves the KL vanishing problem? +We’ll explore these questions below.</p> + +<h2 id="gamma1-introduces-posterior-hole-problem">$\gamma&gt;1$ introduces posterior hole problem</h2> + +<p>Posterior hole problem happens when the aggregate variational posterior (a.k.a. average encoder distribution (<a href="https://approximateinference.org/2016/accepted/HoffmanJohnson2016.pdf">Hoffman &amp; Johnson, 2016</a>)) does not match the prior. +When measured in KL divergence, this means:</p> + +\[D_{KL}(q_\phi(z) \parallel p(z)) &gt; 0\] + +<p>Here, $q_\phi(z) = \sum_{i=1}^N \frac{1}{N} q_\phi(z \mid x_i)$ where $N$ is the dataset size, is the aggregate variational posterior.</p> + +<p>In Gaussian-VAE, the variational posterior $q_\phi(z \mid x_i) = \mathcal N(z \mid \mu_i, \sigma_i^2)$, where $(\mu_i,\sigma_i^2)$ are typically computed by a neural network called the inference network (<a href="https://arxiv.org/pdf/1312.6114">Kingma &amp; Welling, 2013</a>) parameterized by $\phi$ given $x_i$; and $q_\phi(z \mid x_i)$ can usually be factorized into each dimension $j$ as $q_\phi(z \mid x_i) = \prod_{j=1}^d q_\phi(z_j \mid x_i)$, where each $q_\phi(z_j \mid x_i)$ is an univariate Gaussian parameterized by $(\mu_{ij}, \sigma_{ij}^2)$. +Thus, the aggregate variational posterior is an $N$-mixture of Gaussians whose mean, at each dimension $j$, is $\bar\mu_j = \frac{1}{N}\sum_{i=1}^N \mu_{ij}$ and variance is $\bar\sigma_j^2 = \frac{1}{N}\sum_{i=1}^N \sigma_{ij}^2$.</p> + +<p>If $q_\phi$ is transformed according to BN-VAE, then $\bar\mu_j = \beta$ where $\beta$ is a learnable parameter. +Furthermore, we have variance $\mathbb E_{q_\phi(z_j)}[z_j^2] - \mathbb E_{q_\phi(z_j)}[z_j]^2 = \gamma^2 + \bar\sigma^2$. +If we follow Zhu et al. (2020) to use a standard Gaussian $\mathcal N(z \mid \mathbf 0, \mathbf I)$ as prior $p$, then according to <a href="https://kkew3.github.io/2024/08/09/lower-bound-of-kl-divergence-between-any-density-and-gaussian.html">this post</a>, $D_{KL}(q_\phi(z) \parallel p(z)$, at each dimension $j$, will be lower bounded by $D_{KL}(q_0(z_j) \parallel p(z_j))$ where $q_0(z_j) = \mathcal N(z_j \mid \beta, \gamma^2 + \bar\sigma^2)$, which is consistently greater than zero when $\gamma &gt; 1$ (<a href="https://arxiv.org/pdf/1901.03416">Razavi et al., 2019</a>). 
+It follows immediately (<a href="https://statproofbook.github.io/P/kl-add">Soch, Joram, et al., 2024</a>) that $D_{KL}(q_\phi(z) \parallel p(z)) \ge \sum_{j=1}^d D_{KL}(q_0(z_j) \parallel p(z_j)) &gt; 0$.</p> + +<p><em>TO BE CONTINUED</em></p>AbstractLower bound of KL divergence between any density and Gaussian2024-08-09T09:03:39+00:002024-08-09T09:03:39+00:00https://kkew3.github.io/2024/08/09/lower-bound-of-kl-divergence-between-any-density-and-gaussian<h2 id="abstract">Abstract</h2> + +<p>In this post, I explain how to derive a lower bound of the <a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback-Leibler divergence</a> between any density $q$, e.g. a Gaussian mixture, and a Gaussian $p$.</p> + +<h2 id="framework">Framework</h2> + +<p>We may cast the problem of finding the lower bound as a constrained minimization problem:</p> + +<div id="eq-1"></div> + +\[\begin{aligned} + \min_{q'}\ &amp;D_{KL}(q' \parallel p)\\ + \text{s.t. } &amp;\int_{\mathcal X} q'(x)\,\mathrm dx = 1\\ + &amp;\ldots \ \text{other constraints} +\end{aligned}\tag{1}\] + +<p>where $\mathcal X$ is the support of $q’$, and we’ll fill in “other constraints” with what we know about the density $q$, like its mean and variance. +The solution of Equation (<a href="#eq-1">1</a>) will be the lower bound we’re seeking.</p> + +<p>The <a href="https://en.wikipedia.org/wiki/Lagrange_multiplier">Lagrangian</a> would be:</p> + +\[L = \int_{\mathcal X} q'(x)\log \frac{q'(x)}{p(x)}\,\mathrm dx + \lambda_0 (\int_{\mathcal X} q'(x)\,\mathrm dx - 1) + \ldots \tag{2}\] + +<p>Taking the functional derivative of $L$ with respect to $q’$ and letting it equal zero yields:</p> + +\[\begin{aligned} + 0 &amp;= 1 + \log q'(x) - \log p(x) + \lambda_0 + \ldots\\ + \log q'(x) &amp;= -\lambda_0 - 1 + \log p(x) + \ldots\\ + q'(x) &amp;= \exp(-\lambda_0 -1 + \log p(x) + \ldots) +\end{aligned}\] + +<p>Finally, plug $q’(x)$ back into the constraints and solve for the Lagrange multipliers $\lambda_0$, etc.</p> + +<h2 id="example">Example</h2> + +<p>In this simple example, we assume that $p(x) = \mathcal N(x \mid 0, 1)$ is a standard univariate Gaussian, and assume that $q$ and $p$ have the same support. +Suppose also that we know the mean and variance of $q$ to be: $\mathbb E_q[x] = 0$, $\mathbb E_q[x^2] - \mathbb E_q[x]^2 = \mathbb E_q[x^2] = \sigma^2$.</p> + +<p>The Lagrangian is:</p> + +<div id="eq-3"></div> + +\[\require{enclose} +L = \int_{-\infty}^\infty q'(x) \log \frac{q'(x)}{p(x)}\,\mathrm dx + \lambda_0 (\underbrace{\int_{-\infty}^\infty q'(x)\,\mathrm dx - 1}_{\substack{\enclose{circle}{1}}}) + \lambda_1 (\underbrace{\int_{-\infty}^\infty x^2 q'(x)\,\mathrm dx - \sigma^2}_{\substack{\enclose{circle}{2}}})\tag{3}\] + +<p>where we have encoded the mean and variance constraints into one term: since $\mathbb E_q[x] = 0$, the variance constraint reduces to $\int_{-\infty}^\infty x^2 q'(x)\,\mathrm dx = \sigma^2$, so a single multiplier $\lambda_1$ suffices (see why <a href="https://michael-franke.github.io/intro-data-analysis/the-maximum-entropy-principle.html#example-2-derivation-of-maximum-entropy-pdf-with-given-mean-mu-and-variance-sigma2">here</a>). 
+Taking the derivative and letting it equal zero yields:</p> + +<div id="eq-4"></div> + +\[\begin{align} + 0 &amp;= 1 + \log q'(x) - \log p(x) + \lambda_0 + \lambda_1 x^2\\ + \log q'(x) &amp;\stackrel{1}{=} -\lambda_0 - 1 - (\frac{1}{2} + \lambda_1) x^2\\ + q'(x) &amp;= \exp(-\lambda_0 - 1 - (\frac{1}{2} + \lambda_1) x^2)\tag{4}\\ +\end{align}\] + +<p>where equality ‘$1$’ holds because $\log p(x) = -\frac{1}{2}x^2 + C$, and the constant $C$ has been absorbed into $\lambda_0$.</p> + +<p>Plugging Equation (<a href="#eq-4">4</a>) back into <a href="#eq-3">⓵</a> and solving the integral yields:</p> + +<div id="eq-5.1"></div> + +\[\frac{\sqrt{\pi}\exp(-\lambda_0 - 1)}{\sqrt{\frac{1}{2} + \lambda_1}} = 1\tag{5.1}\] + +<p>Likewise, plugging (<a href="#eq-4">4</a>) back into <a href="#eq-3">⓶</a> and solving the integral yields:</p> + +<div id="eq-5.2"></div> + +\[\frac{\sqrt{\pi} \exp(-\lambda_0 - 1)}{2\sqrt{(\frac{1}{2} + \lambda_1)^3}} = \sigma^2\tag{5.2}\] + +<p>Solving Equations (<a href="#eq-5.1">5.1</a>, <a href="#eq-5.2">5.2</a>) gives:</p> + +<div id="eq-6"></div> + +\[\begin{cases} + \lambda_0 = -1 + \frac{1}{2} \log 2\pi\sigma^2\\ + \lambda_1 = -\frac{1}{2} + \frac{1}{2\sigma^2}\\ +\end{cases}\tag{6}\] + +<p>Plugging Equation (<a href="#eq-6">6</a>) into Equation (<a href="#eq-4">4</a>), it’s immediate that</p> + +\[q'(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{x^2}{2\sigma^2})\] + +<p>i.e., a Gaussian $\mathcal N(x \mid 0, \sigma^2)$. +Therefore, according to <a href="https://statproofbook.github.io/P/norm-kl">Soch, Joram, et al. (2024)</a>,</p> + +\[D_{KL}(q \parallel p) \ge \frac{1}{2}(\sigma^2 - \log\sigma^2 - 1)\]AbstractCompute accuracy from F1 score2024-07-06T01:51:59+00:002024-07-06T01:51:59+00:00https://kkew3.github.io/2024/07/06/compute-accuracy-from-f1-score<p>I encountered a similar problem today as the one in <a href="https://stackoverflow.com/questions/42041078/calculating-accuracy-from-precision-recall-f1-score-scikit-learn">this</a> post, where I wish to find the accuracy given the F1 score only. +F1 score is <a href="https://datascience.stackexchange.com/a/65342/153995">well suited</a> to my imbalanced classification problem, so I compute it during training; but I then find it difficult to interpret. +There was a surprising lack of relevant information when I searched the web. +Luckily, it’s not a difficult task either.</p> + +<p>Since each F1 score corresponds to a range of accuracies, we may regard finding the accuracy given the F1 score as an optimization problem. +The process consists of two steps: 1) find the minimum accuracy; 2) find the maximum accuracy. To find the maximum, we may reduce it to finding the <em>negative</em> of the minimum of the <em>negative</em> accuracy. +Thus we will only handle step 1 below.</p> + +<p>Known constants:</p> + +<ul> + <li>$s_F$: the F1 score.</li> + <li>$r_P$ and $r_N$: the positive and negative class ratios.</li> +</ul> + +<p>Variables:</p> + +<ul> + <li>$r_{TP}$, $r_{TN}$, $r_{FP}$, $r_{FN}$: the true positive, true negative, false positive and false negative ratios (i.e. divided by the total sample count).</li> +</ul> + +<p>Objective: +$s_A = r_{TP} + r_{TN}$.</p> + +<p>Constraints:</p> + +<ul> + <li>$r_{TP} \ge 0$, $r_{TN} \ge 0$, $r_{FP} \ge 0$, $r_{FN} \ge 0$.</li> + <li>$r_{TP} + r_{FN} = r_P$, $r_{TN} + r_{FP} = r_N$.</li> + <li>$\frac{2 \cdot r_{TP} / (r_{TP} + r_{FP}) \cdot r_{TP} / (r_{TP} + r_{FN})}{r_{TP} / (r_{TP} + r_{FP}) + r_{TP} / (r_{TP} + r_{FN})} = s_F$. 
The left hand side is just the F1 score formula.</li> +</ul> + +<p>Python implementation:</p> + +<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># jax is not necessary, just that I don't want to spend time on finding +# partial derivative of the F1 score with respect to true positive, +# etc. +</span><span class="kn">import</span> <span class="nn">jax</span> +<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span> +<span class="kn">from</span> <span class="nn">scipy.special</span> <span class="kn">import</span> <span class="n">softmax</span> +<span class="kn">from</span> <span class="nn">scipy.optimize</span> <span class="kn">import</span> <span class="n">minimize</span> + +<span class="c1"># Used to avoid divid-by-zero error. +</span><span class="n">EPS</span> <span class="o">=</span> <span class="mf">1e-8</span> + +<span class="k">def</span> <span class="nf">f1_score_constraint</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">f1_score</span><span class="p">):</span> + <span class="s">""" + :param x: the array (tp, fp, tn, fn) + :param f1_score: the known F1 score + """</span> + <span class="n">tp</span><span class="p">,</span> <span class="n">fp</span><span class="p">,</span> <span class="n">fn</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> + <span class="n">precision</span> <span class="o">=</span> <span class="n">tp</span> <span class="o">/</span> <span class="p">(</span><span class="n">tp</span> <span class="o">+</span> <span class="n">fp</span><span class="p">)</span> + <span class="n">recall</span> <span class="o">=</span> <span class="n">tp</span> <span class="o">/</span> <span class="p">(</span><span class="n">tp</span> <span class="o">+</span> <span class="n">fn</span><span class="p">)</span> + <span class="k">return</span> <span class="mi">2</span> <span class="o">*</span> <span class="p">(</span><span class="n">precision</span> <span class="o">*</span> <span class="n">recall</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">precision</span> <span class="o">+</span> <span class="n">recall</span><span class="p">)</span> <span class="o">-</span> <span class="n">f1_score</span> + + +<span class="k">def</span> <span class="nf">positive_sum_constraint</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">n_positive</span><span class="p">):</span> + <span class="s">""" + :param x: the array (tp, fp, tn, fn) + :param n_positive: the known positive class ratio + """</span> + <span class="n">tp</span><span class="p">,</span> <span class="n">fn</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> + <span class="k">return</span> <span class="n">tp</span> <span class="o">+</span> <span class="n">fn</span> <span class="o">-</span> <span class="n">n_positive</span> + + +<span class="k">def</span> <span class="nf">negative_sum_constraint</span><span class="p">(</span><span 
class="n">x</span><span class="p">,</span> <span class="n">n_negative</span><span class="p">):</span> + <span class="s">""" + :param x: the array (tp, fp, tn, fn) + :param n_negative: the known negative class ratio + """</span> + <span class="n">tn</span><span class="p">,</span> <span class="n">fp</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> + <span class="k">return</span> <span class="n">tn</span> <span class="o">+</span> <span class="n">fp</span> <span class="o">-</span> <span class="n">n_negative</span> + + +<span class="k">def</span> <span class="nf">accuracy</span><span class="p">(</span><span class="n">x</span><span class="p">):</span> + <span class="s">""" + :param x: the array (tp, fp, tn, fn) + """</span> + <span class="n">tp</span><span class="p">,</span> <span class="n">tn</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> + <span class="k">return</span> <span class="n">tp</span> <span class="o">+</span> <span class="n">tn</span> + + +<span class="c1"># Ideally this should give a feasible solution. But in practice, I +# find it works fine even if it's not feasible. +</span><span class="k">def</span> <span class="nf">rand_init</span><span class="p">():</span> + <span class="k">return</span> <span class="n">softmax</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span> + + +<span class="k">def</span> <span class="nf">find_min_accuracy_from_f1</span><span class="p">(</span><span class="n">f1_score</span><span class="p">,</span> <span class="n">n_positive</span><span class="p">,</span> <span class="n">n_negative</span><span class="p">):</span> + <span class="s">""" + :param f1_score: the known F1 socre + :param n_positive: the known positive class ratio + :param n_negative: the known negative class ratio + """</span> + <span class="n">res</span> <span class="o">=</span> <span class="n">minimize</span><span class="p">(</span> + <span class="n">accuracy</span><span class="p">,</span> + <span class="n">rand_init</span><span class="p">(),</span> + <span class="n">method</span><span class="o">=</span><span class="s">'SLSQP'</span><span class="p">,</span> + <span class="n">jac</span><span class="o">=</span><span class="n">jax</span><span class="p">.</span><span class="n">grad</span><span class="p">(</span><span class="n">accuracy</span><span class="p">),</span> + <span class="n">bounds</span><span class="o">=</span><span class="p">[(</span><span class="n">EPS</span><span class="p">,</span> <span class="bp">None</span><span class="p">),</span> <span class="p">(</span><span class="n">EPS</span><span class="p">,</span> <span class="bp">None</span><span class="p">),</span> <span class="p">(</span><span class="n">EPS</span><span class="p">,</span> <span class="bp">None</span><span class="p">),</span> <span class="p">(</span><span class="n">EPS</span><span class="p">,</span> <span class="bp">None</span><span class="p">)],</span> + <span class="n">constraints</span><span class="o">=</span><span class="p">[</span> + <span class="p">{</span> + <span 
class="s">'type'</span><span class="p">:</span> <span class="s">'eq'</span><span class="p">,</span> + <span class="s">'fun'</span><span class="p">:</span> <span class="n">f1_score_constraint</span><span class="p">,</span> + <span class="s">'jax'</span><span class="p">:</span> <span class="n">jax</span><span class="p">.</span><span class="n">grad</span><span class="p">(</span><span class="n">f1_score_constraint</span><span class="p">),</span> + <span class="s">'args'</span><span class="p">:</span> <span class="p">(</span><span class="n">f1_score</span><span class="p">,),</span> + <span class="p">},</span> + <span class="p">{</span> + <span class="s">'type'</span><span class="p">:</span> <span class="s">'eq'</span><span class="p">,</span> + <span class="s">'fun'</span><span class="p">:</span> <span class="n">positive_sum_constraint</span><span class="p">,</span> + <span class="s">'jac'</span><span class="p">:</span> <span class="n">jax</span><span class="p">.</span><span class="n">grad</span><span class="p">(</span><span class="n">positive_sum_constraint</span><span class="p">),</span> + <span class="s">'args'</span><span class="p">:</span> <span class="p">(</span><span class="n">n_positive</span><span class="p">,),</span> + <span class="p">},</span> + <span class="p">{</span> + <span class="s">'type'</span><span class="p">:</span> <span class="s">'eq'</span><span class="p">,</span> + <span class="s">'fun'</span><span class="p">:</span> <span class="n">negative_sum_constraint</span><span class="p">,</span> + <span class="s">'jac'</span><span class="p">:</span> <span class="n">jax</span><span class="p">.</span><span class="n">grad</span><span class="p">(</span><span class="n">negative_sum_constraint</span><span class="p">),</span> + <span class="s">'args'</span><span class="p">:</span> <span class="p">(</span><span class="n">n_negative</span><span class="p">,),</span> + <span class="p">},</span> + <span class="p">],</span> + <span class="n">options</span><span class="o">=</span><span class="p">{</span><span class="s">'maxiter'</span><span class="p">:</span> <span class="mi">1000</span><span class="p">},</span> + <span class="p">)</span> + <span class="k">return</span> <span class="n">res</span><span class="p">.</span><span class="n">fun</span> +</code></pre></div></div> + +<p>Calling the function <code class="language-plaintext highlighter-rouge">find_min_accuracy_from_f1</code> with data, we get the minimum possible accuracy given F1 score:</p> + +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt;&gt;&gt; find_min_accuracy_from_f1(0.457, 0.044, 0.9559) +0.8953 +</code></pre></div></div>I encountered a similar problem today as the one in this post, where I wish to find the accuracy given F1 score only. F1 score is well suited to my imbalanced classification problem, so I compute it during training; but I then find it difficult to interprete. There’s a surprising lack of relevant information when I searched the web. Luckily, it’s not a difficult task either.Leverage Ollama in iTerm2 AI integration2024-06-13T14:46:53+00:002024-06-13T14:46:53+00:00https://kkew3.github.io/2024/06/13/leverage-ollama-in-iterm2-ai-integration<h2 id="introduction">Introduction</h2> + +<p>Recently, <a href="https://iterm2.com/">iTerm2</a> released version <a href="https://iterm2.com/downloads/stable/iTerm2-3_5_0.changelog">3.5.0</a>, which includes generative AI integration in OpenAI API. 
+<a href="https://ollama.com/">Ollama</a> is an open platform for large language models (LLM). +Starting from February 2024, Ollama has built-in <a href="https://ollama.com/blog/openai-compatibility">support</a> of OpenAI chat completions API. +Putting them together, we can <a href="https://gitlab.com/gnachman/iterm2/-/issues/11455">now</a> ask AI to compose commands for us seamlessly in iTerm2 interface, using Ollama bot locally.</p> + +<h2 id="configuration">Configuration</h2> + +<p>Here are the steps to start using the AI integration in iTerm2:</p> + +<ol> + <li>Install the AI plugin from <a href="https://iterm2.com/ai-plugin.html">iTerm2 site</a>.</li> + <li>In iTerm2 preferences, under <code class="language-plaintext highlighter-rouge">General</code> section and <code class="language-plaintext highlighter-rouge">AI</code> tab, enter “OpenAI API key” with anything non-empty, fill in the <a href="https://gitlab.com/gnachman/iterm2/-/wikis/AI-Prompt">AI prompt</a>, specify the model and the custom URL.</li> +</ol> + +<p>For example, mine is like below:</p> + +<ul> + <li>OpenAI API key: <code class="language-plaintext highlighter-rouge">abc</code></li> + <li>AI prompt: <code class="language-plaintext highlighter-rouge">Return commands suitable for copy/pasting into \(shell) on \(uname). Do NOT include commentary NOR Markdown triple-backtick code blocks as your whole response will be copied into my terminal automatically. If not otherwise specified, you should always give at most one line of command. The command should do this: \(ai.prompt)</code>.</li> + <li>Model: <code class="language-plaintext highlighter-rouge">codegemma:instruct</code>.</li> + <li>Token limit: <code class="language-plaintext highlighter-rouge">16384</code>.</li> + <li>Custom URL: <code class="language-plaintext highlighter-rouge">http://localhost/v1/chat/completions</code>.</li> + <li>Use legacy “completions” API: false.</li> +</ul> + +<p>Remarks:</p> + +<ul> + <li>If your Ollama runs on a server in WLAN, e.g. at IP address <code class="language-plaintext highlighter-rouge">192.168.0.107</code>, just replace the <code class="language-plaintext highlighter-rouge">localhost</code> in custom URL with that IP address.</li> + <li>Don’t forget to start Ollama by <code class="language-plaintext highlighter-rouge">ollama serve</code> before using iTerm2’s AI integration.</li> +</ul> + +<h2 id="workflow">Workflow</h2> + +<p>My favorite iTerm2 workflow after the configuration above:</p> + +<ol> + <li>Press <code class="language-plaintext highlighter-rouge">command + shift + .</code> to activate the composer.</li> + <li>Specify my need in plain English, and press <code class="language-plaintext highlighter-rouge">command + y</code> to send the input text to Ollama.</li> + <li>After a few seconds, the text should be replaced by Ollama’s response.</li> + <li>Press <code class="language-plaintext highlighter-rouge">shift + enter</code> to send the response to the terminal.</li> +</ol> + +<p>A demo:</p> + +<p><img src="/assets/posts_imgs/2024-06-13/iterm2-ai-demo.gif" alt="demo" /></p>IntroductionLearn Bayesian Logistic regression from imbalanced data2024-05-17T03:21:31+00:002024-05-17T03:21:31+00:00https://kkew3.github.io/2024/05/17/learn-bayesian-lr-from-imbalanced-data<h2 id="dataset">Dataset</h2> + +<p><img src="/assets/posts_imgs/2024-05-17/dataset.png" alt="toy 2d dataset" /></p> + +<p>Obviously, this is an imbalanced dataset. 
+A dumb classifier may assign “yellow” to all points and yield apparently satisfactory accuracy.</p> + +<h2 id="bayesian-logistic-regression">Bayesian Logistic regression</h2> + +<p>Denote the $k$-th component of the softmax of $\boldsymbol z$ as:</p> + +\[\mathcal S_k(\boldsymbol z) \triangleq \frac{\exp(z_k)}{\sum_{k'}\exp(z_{k'})}\,.\] + +<p>The likelihood is:</p> + +\[p(y=k \mid \boldsymbol x, \mathbf W, \boldsymbol b) = \mathcal S_k(\mathbf W \boldsymbol x + \boldsymbol b)\,,\] + +<p>where matrix $\mathbf W$ consists of $K$ weight vector $\boldsymbol w_k \in \mathbb R^d$, $\boldsymbol x \in \mathbb R^d$, and $\boldsymbol b \in \mathbb R^K$.</p> + +<p>For now, assign an uninformative Gaussian prior:</p> + +\[\forall k,\ \boldsymbol w_k \sim \mathcal N(0, \mathbf I)\,,\quad b_k \sim \mathcal N(0, 1)\,. +\tag{1}\] + +<p>The posterior (given the dataset $\mathcal D$) is:</p> + +\[p(\mathbf W, \boldsymbol b \mid \mathcal D) \propto \prod_{k=1}^K p(\boldsymbol w_k) p(b_k) \prod_{j=1}^m p(y_j \mid \boldsymbol x_j, \mathbf W, \boldsymbol b)\,. +\tag{2.1}\] + +<p>The predictive posterior is:</p> + +\[p(y \mid \boldsymbol x, \mathcal D) = \int p(y \mid \boldsymbol x, \mathbf W, \boldsymbol b) p(\mathbf W, \boldsymbol b \mid \mathcal D)\,\mathrm d \mathbf W \mathrm d \boldsymbol b\,. +\tag{2.2}\] + +<p>Although both (2.1) and (2.2) are intractable, we may find $q(\mathbf W, \boldsymbol b) \approx p(\mathbf W, \boldsymbol b \mid \mathcal D)$ by variational inference, and estimate the predictive posterior by Monte Carlo after plugging in $q$. +Since such procedure is out of scope, we won’t include details about it.</p> + +<p>Let’s see the decision boundary and the uncertainty (measured by entropy) of the Bayesian LR:</p> + +<p><img src="/assets/posts_imgs/2024-05-17/uninformative-db.png" alt="uninformative decision boundary" /></p> + +<p><img src="/assets/posts_imgs/2024-05-17/uninformative-unc.png" alt="uninformative uncertainty" /></p> + +<p>The model learns to be a dumb classifier!</p> + +<p>We may apply rescaling (a.k.a. threshold shifting) to the learned classifier, by dividing the predictive posterior by the class prior (i.e. the proportion of samples of class $k$ in all samples), and use it to make prediction. +The rescaled decision boundary and uncertainty are:</p> + +<p><img src="/assets/posts_imgs/2024-05-17/uninformative-rescaled-db.png" alt="uninformative rescaled decision boundary" /></p> + +<p><img src="/assets/posts_imgs/2024-05-17/uninformative-rescaled-unc.png" alt="uninformative rescaled uncertainty" /></p> + +<p>This benefits the minority class, but deteriorates the overall accuracy <em>a lot</em>.</p> + +<h2 id="strengthen-the-prior">Strengthen the prior</h2> + +<p>It turns out that if we strengthen the prior (by increasing its precision, or equivalently, decreasing its variance) of the intercepts in (1), things become much better. +The new prior is:</p> + +\[\forall k,\ b_k \sim \mathcal N(0, 10^{-6})\,. 
+\tag{3}\] + +<p>What we just encode into the prior reads:</p> + +<blockquote> + <p>I’m pretty sure that the two class weigh the same, despite the “purple” class appears inferior.</p> +</blockquote> + +<p>The result plots are:</p> + +<p><img src="/assets/posts_imgs/2024-05-17/precise-uninformative-db.png" alt="precise uninformative decision boundary" /></p> + +<p><img src="/assets/posts_imgs/2024-05-17/precise-uninformative-unc.png" alt="precise uninformative uncertainty" /></p> + +<h2 id="bias-the-prior">Bias the prior</h2> + +<p>What if we go further by biasing the classifier a little towards the minority class ($k=0$, “purple”)? +The new prior is:</p> + +\[b_0 \sim \mathcal N(2, 10^{-6})\,,\quad b_1 \sim \mathcal N(0, 10^{-6})\,. +\tag{4}\] + +<p>This prior reads:</p> + +<blockquote> + <p>I’m pretty sure there’re even a bit more “purple” class than “yellow” class a priori, despite they’re not sampled as much in the dataset.</p> +</blockquote> + +<p>The plots are now:</p> + +<p><img src="/assets/posts_imgs/2024-05-17/precise-biased-db.png" alt="precise biased decision boundary" /></p> + +<p><img src="/assets/posts_imgs/2024-05-17/precise-biased-unc.png" alt="precise biased uncertainty" /></p> + +<p>Pefect!</p> + +<h2 id="conclusion">Conclusion</h2> + +<p>In this post, we see that under Bayesian framework, Bayesian LR is able to naturally combat imbalanced dataset by adjusting its prior belief.</p> + +<p>This <a href="https://github.com/kkew3/bayeslr-imbalanced">codebase</a> generates all the figures in the post.</p> + +<h2 id="appendix">Appendix</h2> + +<p>Features and labels of the toy dataset.</p> + +<p>The features:</p> + +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[-0.46601866, 1.18801609], + [ 0.53858625, 0.60716392], + [-0.97431137, 0.69753311], + [-1.09220402, 0.87799492], + [-2.03843356, 0.28665154], + [-0.34062009, 0.79352777], + [-1.16225216, 0.79350459], + [ 0.19419328, 1.60986703], + [ 0.41018415, 1.54828838], + [-0.61113336, 0.99020048], + [ 0.08837677, 0.95373644], + [-1.77183232, -0.12717568], + [-0.54560628, 1.07613052], + [-1.69901425, 0.55489764], + [-0.7449788 , 0.7519103 ], + [-1.84473763, 0.55248995], + [-0.50824943, 1.08964891], + [-1.35655196, 0.7102918 ], + [-0.71295569, 0.38030989], + [ 0.0582823 , 1.35158484], + [-2.74743505, -0.18849513], + [-2.36125827, -0.22542297], + [ 0.28512568, 1.52124326], + [-0.67059538, 0.61188467], + [-1.08310962, 0.57068698], + [-1.59421684, 0.32055693], + [-0.58608561, 0.98441983], + [ 0.91449962, 1.74231742], + [-1.78271812, 0.25676529], + [-0.30880495, 0.98633121], + [-0.80196522, 0.56542478], + [-1.64551419, 0.2527351 ], + [ 0.88404065, 1.80009243], + [ 0.07752252, 1.19103008], + [ 0.01499115, 1.35642701], + [-1.37772455, 0.58176578], + [-0.9893581 , 0.6000557 ], + [-0.20708577, 0.97773425], + [-0.97487675, 0.67788572], + [-0.84898247, 0.76214066], + [-2.87107864, 0.01823837], + [-1.52762479, 0.15224236], + [-1.19066619, 0.61716677], + [-0.78719074, 1.22733157], + [ 0.37887222, 1.38907542], + [-0.29892079, 1.20534091], + [-1.21904812, 0.45126808], + [-0.01954643, 1.00443244], + [-2.7534539 , -0.41174779], + [ 0.00290918, 1.19376387], + [-0.3465645 , 0.97372693], + [-0.38706669, 0.98612011], + [-0.3909804 , 1.1737113 ], + [ 0.67985963, 1.57038317], + [-1.5574845 , 0.38938231], + [-0.70276487, 0.84873314], + [-0.77152456, 1.24328845], + [-0.78685252, 0.71866813], + [-1.58251503, 0.47314274], + [-0.86990291, 1.01246542], + [-0.76296641, 1.03057172], + [-1.46908977, 
0.50048994], + [ 0.41590518, 1.35808005], + [-0.23171796, 0.97466644], + [-0.35599838, 1.05651836], + [-1.86300113, 0.31105633], + [-1.06979785, 0.89343042], + [ 0.89051152, 1.36968058], + [-1.64250124, 0.5395521 ], + [ 0.19072792, 1.39594182], + [-0.68980859, 1.51412568], + [-0.66216014, 0.94064958], + [-1.98324693, 0.36500688], + [-1.77543305, 0.48759471], + [ 0.99143992, 1.53242166], + [-2.03402523, 0.27661546], + [-0.98138839, 0.86047666], + [ 0.86594322, 1.60352598], + [-1.25510995, 0.40788484], + [-1.28207069, 0.55164356], + [-0.50983219, 1.05505834], + [ 0.98003606, 0.56171673], + [-1.86097117, 0.44004685], + [-1.09945843, 0.63380337], + [-1.44294885, 0.18391039], + [-1.60512757, 0.25456073], + [ 0.5505329 , 1.63447114], + [-1.13622159, 0.87658095], + [-0.18029101, 0.98458234], + [-1.48031015, 0.3667454 ], + [ 0.94295697, 1.51965296], + [-1.94413955, 0.257857 ], + [-1.92812486, -0.15406208], + [-0.28437139, 0.8520255 ], + [-0.95551392, 0.28517945], + [-1.44252631, 0.5455637 ], + [-0.22064889, 1.33439538], + [-1.52749019, 0.50443876], + [ 0.757785 , 0.42124458], + [-0.49536512, 0.9627005 ]]) +</code></pre></div></div> + +<p>The labels:</p> + +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([1, + 0, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 0, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 0, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + 0, + 1, + 1, + 1, + 1, + 0, + 1, + 1, + 0, + 1]) +</code></pre></div></div>DatasetA simple numerical method to compute matrix inversion2024-02-26T10:57:01+00:002024-02-26T10:57:01+00:00https://kkew3.github.io/2024/02/26/simple-numerical-matrix-inversion<p>I need to do matrix inversion in C recently; so I did some research on how to implement it. +While the requirement later proves unnecessary, I want to jot down my efforts on this subject for future reference.</p> + +<p>(<a href="https://ntrs.nasa.gov/api/citations/19920002505/downloads/19920002505.pdf">Pan &amp; Schreiber, 1992</a>) proposed CUINV algorithm based on <a href="https://aalexan3.math.ncsu.edu/articles/mat-inv-rep.pdf">Newton’s iteration</a>. +It’s fast and simple to implement. +Here’s my verbatim reimplementation in Python, which is simple(?) 
(see TODO in comment) to translate to C.</p> + +<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span> + +<span class="k">def</span> <span class="nf">cuinv</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">maxiter</span><span class="p">,</span> <span class="n">tol</span><span class="p">):</span> + <span class="n">n</span> <span class="o">=</span> <span class="n">A</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> + <span class="n">I</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">eye</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> + <span class="n">s</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">svd</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">compute_uv</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="c1"># TODO: how to implement this? +</span> <span class="n">a0</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">/</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> + <span class="n">X</span> <span class="o">=</span> <span class="n">a0</span> <span class="o">*</span> <span class="n">A</span><span class="p">.</span><span class="n">T</span> + <span class="n">X_prev</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="n">X</span><span class="p">)</span> + <span class="n">T</span> <span class="o">=</span> <span class="n">X</span> <span class="o">@</span> <span class="n">A</span> + <span class="n">T2</span> <span class="o">=</span> <span class="bp">None</span> + <span class="n">t2_valid</span> <span class="o">=</span> <span class="bp">False</span> + <span class="n">diff</span> <span class="o">=</span> <span class="n">tol</span> <span class="o">+</span> <span class="mi">1</span> <span class="c1"># so that it runs at least one iteration +</span> + <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">maxiter</span><span class="p">):</span> + <span class="k">if</span> <span class="n">diff</span> <span class="o">&lt;</span> <span class="n">tol</span><span class="p">:</span> + <span class="k">break</span> + <span class="n">X</span> <span class="o">=</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">I</span> <span class="o">-</span> <span class="n">T</span><span class="p">)</span> <span class="o">@</span> <span class="n">X</span> + <span class="k">if</span> <span class="n">t2_valid</span><span class="p">:</span> + <span class="n">T</span> <span class="o">=</span> <span 
class="mi">2</span> <span class="o">*</span> <span class="n">T</span> <span class="o">-</span> <span class="n">T2</span> + <span class="k">else</span><span class="p">:</span> + <span class="n">T</span> <span class="o">=</span> <span class="n">X</span> <span class="o">@</span> <span class="n">A</span> + <span class="n">t2_valid</span> <span class="o">=</span> <span class="bp">False</span> + <span class="k">if</span> <span class="n">np</span><span class="p">.</span><span class="n">trace</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">n</span> <span class="o">-</span> <span class="mf">0.5</span><span class="p">:</span> + <span class="n">T2</span> <span class="o">=</span> <span class="n">T</span> <span class="o">@</span> <span class="n">T</span> + <span class="n">delta</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">T</span> <span class="o">-</span> <span class="n">T2</span><span class="p">,</span> <span class="nb">ord</span><span class="o">=</span><span class="s">'fro'</span><span class="p">)</span> + <span class="k">if</span> <span class="n">delta</span> <span class="o">&gt;=</span> <span class="mf">0.25</span><span class="p">:</span> + <span class="n">t2_valid</span> <span class="o">=</span> <span class="bp">True</span> + <span class="k">else</span><span class="p">:</span> + <span class="n">rho</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mf">0.25</span> <span class="o">-</span> <span class="n">delta</span><span class="p">)</span> + <span class="n">X</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">rho</span> <span class="o">*</span> <span class="p">(</span><span class="n">T2</span> <span class="o">-</span> <span class="p">(</span><span class="mi">2</span> <span class="o">+</span> <span class="n">rho</span><span class="p">)</span> <span class="o">*</span> <span class="n">T</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">rho</span><span class="p">)</span> <span class="o">*</span> <span class="n">I</span><span class="p">)</span> <span class="o">@</span> <span class="n">X</span> + <span class="n">T</span> <span class="o">=</span> <span class="n">X</span> <span class="o">@</span> <span class="n">A</span> + <span class="n">diff</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">X</span> <span class="o">-</span> <span class="n">X_prev</span><span class="p">,</span> <span class="nb">ord</span><span class="o">=</span><span class="s">'fro'</span><span class="p">)</span> + <span class="n">X_prev</span> <span class="o">=</span> <span class="n">X</span> + <span class="k">return</span> <span class="n">X</span> +</code></pre></div></div>I need to do matrix inversion in C recently; so I did some research on how to implement it. 
While the requirement later proves unnecessary, I want to jot down my efforts on this subject for future reference.Piecewise quadratic approximation of sigmoid(z) (1-sigmoid(z))2024-02-11T08:52:41+00:002024-02-11T08:52:41+00:00https://kkew3.github.io/2024/02/11/quad-approximate-sigmoid-derivative<p>This post shows an approach that approximates $\sigma(z)(1-\sigma(z))$ using piecewise quadratic function, where $\sigma(z)$ is defined to be $1/(1+\exp(-z))$, a.k.a. the sigmoid function.</p> + +<p>First, notice that $\sigma(z)(1-\sigma(z)) \approx \log(1+\exp(h - a z^2))$ for certain choice of $h$ and $a$:</p> + +<p><img src="/assets/posts_imgs/2024-02-11/dsigma-softplus.png" alt="softplus approximate dsigma" /></p> + +<p>Second, the approximator $\log(1+\exp(\cdot))$ is called a <a href="https://paperswithcode.com/method/softplus">softplus</a>. +So it’s natural to proceed: $\log(1+\exp(h - a z^2)) \approx \max(0, h - a z^2)$. +Our goal, then, is to choose the height parameter $h$ and width parameter $a$ such that $\sigma(z)(1-\sigma(z)) \approx \max(0, h - a z^2)$.</p> + +<p>The height parameter is straightforward to estimate. +We need only to match the max of $\sigma(z)(1-\sigma(z))$ to $h$. +Hence, $h := \sigma(0)(1-\sigma(0))$.</p> + +<p>Noticing that both the original function and the approximator are nonnegative, we may match up their integrals:</p> + +\[\int_{-\infty}^\infty \sigma(z)(1-\sigma(z))\,\mathrm d z = \int_{-\infty}^\infty \max(0, h - a z^2)\,\mathrm d z\] + +<p>where the left hand side is 1. +Plugging in the value of $h$, this equation solves to $a := \frac{16}{9}(\sigma(0)(1-\sigma(0)))^3$.</p> + +<p><img src="/assets/posts_imgs/2024-02-11/dsigma-maxquad.png" alt="max quad approximate dsigma" /></p>This post shows an approach that approximates $\sigma(z)(1-\sigma(z))$ using piecewise quadratic function, where $\sigma(z)$ is defined to be $1/(1+\exp(-z))$, a.k.a. the sigmoid function. \ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 000000000..ed52e5784 --- /dev/null +++ b/index.html @@ -0,0 +1,510 @@ + + + + + + + + +Kaiwen’s personal website | My blogs and research reports. + + + + + + + + + + + + + + + + + + + + + +
+
+
+

Posts

+ + +

subscribe via RSS

+
+ +
+
+ + + diff --git a/pip/alfred-fzf-helper/alfred-fzf-helper-0.2.0.tar.gz b/pip/alfred-fzf-helper/alfred-fzf-helper-0.2.0.tar.gz new file mode 100644 index 000000000..ef83f7722 Binary files /dev/null and b/pip/alfred-fzf-helper/alfred-fzf-helper-0.2.0.tar.gz differ diff --git a/pip/index.html b/pip/index.html new file mode 100644 index 000000000..1fd515ec2 --- /dev/null +++ b/pip/index.html @@ -0,0 +1,96 @@ + + + + + + + + +Kaiwen’s personal website | My blogs and research reports. + + + + + + + + + + + + + + + + + + + + + +
+
+

Index of pip/index.html

+ + +
+
+ + + diff --git a/robots.txt b/robots.txt new file mode 100644 index 000000000..c60155e87 --- /dev/null +++ b/robots.txt @@ -0,0 +1 @@ +Sitemap: https://kkew3.github.io/sitemap.xml diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 000000000..e08154989 --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,211 @@ + + + +https://kkew3.github.io/2016/03/25/manage-vxworks-tornado-executable-project-using-tcl.html +2016-03-25T15:33:31+00:00 + + +https://kkew3.github.io/2016/05/01/banker-algorithm-termination-condition-proof.html +2016-05-01T02:11:59+00:00 + + +https://kkew3.github.io/2016/09/02/validate-xml-via-dtd-using-java.html +2016-09-02T16:26:30+00:00 + + +https://kkew3.github.io/2016/12/27/apache-ant-extension-tutorial.html +2016-12-27T07:23:04+00:00 + + +https://kkew3.github.io/2017/04/23/relation-between-truncated-distribution-and-original-distribution.html +2017-04-23T16:29:35+00:00 + + +https://kkew3.github.io/2017/07/20/matlab-r2011b-neural-network-toolbox-note.html +2017-07-20T09:50:56+00:00 + + +https://kkew3.github.io/2020/05/22/pytorch-crop-images-differentially.html +2020-05-22T18:24:21+00:00 + + +https://kkew3.github.io/2022/02/05/align-strings-in-en-and-zh-like-bsd-ls.html +2022-02-05T10:49:30+00:00 + + +https://kkew3.github.io/2022/02/11/python-align-strings-in-en-and-zh.html +2022-02-11T06:27:19+00:00 + + +https://kkew3.github.io/2022/02/13/list-imported-python-modules-using-ast.html +2022-02-13T14:25:18+00:00 + + +https://kkew3.github.io/2022/02/17/python-tox-usage-note.html +2022-02-17T11:07:47+00:00 + + +https://kkew3.github.io/2022/05/18/python-cannot-import-name-sysconfig-from-distutils.html +2022-05-18T15:02:51+00:00 + + +https://kkew3.github.io/2022/05/24/sync-music-from-mac-to-ipad-without-itunes.html +2022-05-24T12:09:10+00:00 + + +https://kkew3.github.io/2022/05/26/develop-python-cpp-extension-using-cython.html +2022-05-26T14:19:31+00:00 + + +https://kkew3.github.io/2022/06/02/pass-dynamic-array-between-cpp-and-python.html +2022-06-02T08:55:34+00:00 + + +https://kkew3.github.io/2022/07/24/read-hdf5-from-cpp.html +2022-07-24T04:20:10+00:00 + + +https://kkew3.github.io/2022/07/24/set-up-github-pages-macos.html +2022-07-24T10:20:49+00:00 + + +https://kkew3.github.io/2022/08/09/notes-build-cython-using-setup-dot-py.html +2022-08-09T08:24:19+00:00 + + +https://kkew3.github.io/2022/08/31/vae-training-trick.html +2022-08-31T11:48:35+00:00 + + +https://kkew3.github.io/2023/03/05/learn-applescript-for-beginners.html +2023-03-05T07:54:00+00:00 + + +https://kkew3.github.io/2023/03/27/pizzahut-free-soda.html +2023-03-27T06:26:05+00:00 + + +https://kkew3.github.io/2023/04/26/how-to-decide-the-type-of-a-pokemon-quickly.html +2023-04-26T10:12:07+00:00 + + +https://kkew3.github.io/2023/07/05/connect-to-wsl2-from-another-machine-within-wlan.html +2023-07-05T04:59:03+00:00 + + +https://kkew3.github.io/2023/07/05/connect-to-jupyter-notebook-on-wsl2-from-another-machine-within-wlan.html +2023-07-05T07:57:59+00:00 + + +https://kkew3.github.io/2023/08/05/compute-svm-intercept.html +2023-08-05T08:08:26+00:00 + + +https://kkew3.github.io/2023/08/05/dual-of-dual-of-qp.html +2023-08-05T10:54:12+00:00 + + +https://kkew3.github.io/2023/09/10/make-use-of-openmp-via-cython-on-mac.html +2023-09-10T08:49:00+00:00 + + +https://kkew3.github.io/2023/09/24/verify-permutation-equivalence-of-multihead-attention-in-pytorch.html +2023-09-24T08:54:32+00:00 + + +https://kkew3.github.io/2023/10/04/estimate-expectation-of-function-of-random-variable.html +2023-10-04T07:36:32+00:00 + + 
+https://kkew3.github.io/2023/10/06/dimensionality-reduction-by-svd.html +2023-10-06T08:43:30+00:00 + + +https://kkew3.github.io/2023/11/03/map-estimation-cov-gmm.html +2023-11-03T08:03:17+00:00 + + +https://kkew3.github.io/2023/11/28/toss-coin.html +2023-11-28T11:55:37+00:00 + + +https://kkew3.github.io/2024/01/05/type-assertion-numba-trick.html +2024-01-05T09:04:32+00:00 + + +https://kkew3.github.io/2024/01/26/attempt-fully-differentiable-nnomp-alternative.html +2024-01-26T04:14:34+00:00 + + +https://kkew3.github.io/2024/02/01/make-faded-color-wallpaper-for-mac.html +2024-02-01T03:22:51+00:00 + + +https://kkew3.github.io/2024/02/04/host-python-packages-jekyll-github-pages.html +2024-02-04T09:31:26+00:00 + + +https://kkew3.github.io/2024/02/11/quad-approximate-sigmoid-derivative.html +2024-02-11T08:52:41+00:00 + + +https://kkew3.github.io/2024/02/26/simple-numerical-matrix-inversion.html +2024-02-26T10:57:01+00:00 + + +https://kkew3.github.io/2024/05/17/learn-bayesian-lr-from-imbalanced-data.html +2024-05-17T03:21:31+00:00 + + +https://kkew3.github.io/2024/06/13/leverage-ollama-in-iterm2-ai-integration.html +2024-06-13T14:46:53+00:00 + + +https://kkew3.github.io/2024/07/06/compute-accuracy-from-f1-score.html +2024-07-06T01:51:59+00:00 + + +https://kkew3.github.io/2024/08/09/lower-bound-of-kl-divergence-between-any-density-and-gaussian.html +2024-08-09T09:03:39+00:00 + + +https://kkew3.github.io/2024/08/09/gamma-in-bn-vae.html +2024-08-09T11:00:44+00:00 + + +https://kkew3.github.io/2024/08/27/conditioning-of-cvae-variational-posterior.html +2024-08-27T11:03:07+00:00 + + +https://kkew3.github.io/2024/11/10/jieba-vim.html +2024-11-10T06:53:06+00:00 + + +https://kkew3.github.io/2024/12/09/gaussian-kl-div-torch.html +2024-12-09T08:03:00+00:00 + + +https://kkew3.github.io/about/ + + +https://kkew3.github.io/docs/ + + +https://kkew3.github.io/pip/ + + +https://kkew3.github.io/ + + +https://kkew3.github.io/assets/spare-time-research/chem-eq-balance.pdf +2024-12-19T08:21:46+00:00 + + +https://kkew3.github.io/assets/spare-time-research/covid19-test-analysis.pdf +2024-12-19T08:21:46+00:00 + + +https://kkew3.github.io/assets/spare-time-research/cross-walker.pdf +2024-12-19T08:21:46+00:00 + + diff --git a/tags/algorithm/index.html b/tags/algorithm/index.html new file mode 100644 index 000000000..b71e0c264 --- /dev/null +++ b/tags/algorithm/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “algorithm” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "algorithm"

+
+ + +
+ +
+
+ + + diff --git a/tags/dev--applescript/index.html b/tags/dev--applescript/index.html new file mode 100644 index 000000000..5241e0ee6 --- /dev/null +++ b/tags/dev--applescript/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “dev/applescript” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/applescript"

+
+ + +
+ +
+
+ + + diff --git a/tags/dev--cpp/index.html b/tags/dev--cpp/index.html new file mode 100644 index 000000000..c51598d24 --- /dev/null +++ b/tags/dev--cpp/index.html @@ -0,0 +1,120 @@ + + + + + + + + +Posts with tag “dev/c++” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/c++"

+
+ + +
+ +
+
+ + + diff --git a/tags/dev--cython/index.html b/tags/dev--cython/index.html new file mode 100644 index 000000000..d6417d493 --- /dev/null +++ b/tags/dev--cython/index.html @@ -0,0 +1,120 @@ + + + + + + + + +Posts with tag “dev/cython” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/cython"

+
+ + +
+ +
+
+ + + diff --git a/tags/dev--java/index.html b/tags/dev--java/index.html new file mode 100644 index 000000000..e2cfa568c --- /dev/null +++ b/tags/dev--java/index.html @@ -0,0 +1,113 @@ + + + + + + + + +Posts with tag “dev/java” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/java"

+
+ + +
+ +
+
+ + + diff --git a/tags/dev--matlab/index.html b/tags/dev--matlab/index.html new file mode 100644 index 000000000..89a695090 --- /dev/null +++ b/tags/dev--matlab/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “dev/matlab” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/matlab"

+
+ + +
+ +
+
+ + + diff --git a/tags/dev--network/index.html b/tags/dev--network/index.html new file mode 100644 index 000000000..866e3e036 --- /dev/null +++ b/tags/dev--network/index.html @@ -0,0 +1,113 @@ + + + + + + + + +Posts with tag “dev/network” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/network"

+
+ + +
+ +
+
+ + + diff --git a/tags/dev--python/index.html b/tags/dev--python/index.html new file mode 100644 index 000000000..169b35c0f --- /dev/null +++ b/tags/dev--python/index.html @@ -0,0 +1,176 @@ + + + + + + + + +Posts with tag “dev/python” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/python"

+
+ +
+
    +
  1. +

    Host Python packages with Jekyll on GitHub Pages

    + + +
  2. +
  3. +

    使用 matplotlib 制作用于 macOS 的渐变色桌面

    +
    + os/macOS | misc | dev/python +
    + +
  4. +
  5. +

    Assert variable types in numba

    + + +
  6. +
  7. +

    Notes on building Cython using setup.py

    + + +
  8. +
  9. +

    使用Cython在Python和C++间互传大小事先未知的numpy数组

    + + +
  10. +
  11. +

    使用Cython为Python开发C++扩展

    + + +
  12. +
  13. +

    python cannot import name 'sysconfig' from 'distutils'

    + + +
  14. +
  15. +

    Python Tox 使用笔记

    + + +
  16. +
  17. +

    使用抽象语法树ast统计哪些Python包与模块被导入了

    + + +
  18. +
  19. +

    如何在Python中对齐中英文混排字符串

    + + +
  20. +
  21. +

    像BSD ls 一样中英文混排字符串(Python3)

    + + +
  22. +
+
+
+ +
+
+ + + diff --git a/tags/dev--pytorch/index.html b/tags/dev--pytorch/index.html new file mode 100644 index 000000000..092d62f80 --- /dev/null +++ b/tags/dev--pytorch/index.html @@ -0,0 +1,120 @@ + + + + + + + + +Posts with tag “dev/pytorch” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/pytorch"

+
+ + +
+ +
+
+ + + diff --git a/tags/dev--tcl/index.html b/tags/dev--tcl/index.html new file mode 100644 index 000000000..f4cae0442 --- /dev/null +++ b/tags/dev--tcl/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “dev/tcl” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "dev/tcl"

+
+ + +
+ +
+
+ + + diff --git a/tags/editor--vim/index.html b/tags/editor--vim/index.html new file mode 100644 index 000000000..6771f2930 --- /dev/null +++ b/tags/editor--vim/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “editor/vim” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "editor/vim"

+
+ + +
+ +
+
+ + + diff --git a/tags/index.html b/tags/index.html new file mode 100644 index 000000000..8cf3dfe54 --- /dev/null +++ b/tags/index.html @@ -0,0 +1,258 @@ + + + + + + + + +Tags | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+ +
+

Tags

+
+ +
+ + +
+ +
+ +
+
+ + + diff --git a/tags/math--approx/index.html b/tags/math--approx/index.html new file mode 100644 index 000000000..d243fe378 --- /dev/null +++ b/tags/math--approx/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “math/approximation” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "math/approximation"

+
+ + +
+ +
+
+ + + diff --git a/tags/math--la/index.html b/tags/math--la/index.html new file mode 100644 index 000000000..d69b0a2d3 --- /dev/null +++ b/tags/math--la/index.html @@ -0,0 +1,127 @@ + + + + + + + + +Posts with tag “math/linear algebra” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "math/linear algebra"

+
+ + +
+ +
+
+ + + diff --git a/tags/math--prob/index.html b/tags/math--prob/index.html new file mode 100644 index 000000000..69f428bae --- /dev/null +++ b/tags/math--prob/index.html @@ -0,0 +1,148 @@ + + + + + + + + +Posts with tag “math/probability” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "math/probability"

+
+ +
+
    +
  1. +

    KL divergence between two full-rank Gaussians in PyTorch

    + + +
  2. +
  3. +

    Conditioning of the variational posterior in CVAE (Sohn, 2015)

    + + +
  4. +
  5. +

    Lower bound of KL divergence between any density and Gaussian

    + + +
  6. +
  7. +

    Estimate the head probability of a coin

    + + +
  8. +
  9. +

    Maximum a posteriori estimation of the covariance in Gaussian Mixture models

    + + +
  10. +
  11. +

    Estimate the expectation of the function of a random variable

    + + +
  12. +
  13. +

    被截短的随机分布与原分布的关系

    + + +
  14. +
+
+
+ +
+
+ + + diff --git a/tags/misc/index.html b/tags/misc/index.html new file mode 100644 index 000000000..caa608654 --- /dev/null +++ b/tags/misc/index.html @@ -0,0 +1,127 @@ + + + + + + + + +Posts with tag “misc” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "misc"

+
+ + +
+ +
+
+ + + diff --git a/tags/ml--bayes/index.html b/tags/ml--bayes/index.html new file mode 100644 index 000000000..8551443d2 --- /dev/null +++ b/tags/ml--bayes/index.html @@ -0,0 +1,120 @@ + + + + + + + + +Posts with tag “machine learning/bayesian” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "machine learning/bayesian"

+
+ + +
+ +
+
+ + + diff --git a/tags/ml--dict/index.html b/tags/ml--dict/index.html new file mode 100644 index 000000000..e1fdf394e --- /dev/null +++ b/tags/ml--dict/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “machine learning/dictionary learning” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "machine learning/dictionary learning"

+
+ + +
+ +
+
+ + + diff --git a/tags/ml--svm/index.html b/tags/ml--svm/index.html new file mode 100644 index 000000000..9b32599fd --- /dev/null +++ b/tags/ml--svm/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “machine learning/svm” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "machine learning/svm"

+
+ + +
+ +
+
+ + + diff --git a/tags/ml/index.html b/tags/ml/index.html new file mode 100644 index 000000000..c2f504b49 --- /dev/null +++ b/tags/ml/index.html @@ -0,0 +1,120 @@ + + + + + + + + +Posts with tag “machine learning” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "machine learning"

+
+ + +
+ +
+
+ + + diff --git a/tags/os--ios/index.html b/tags/os--ios/index.html new file mode 100644 index 000000000..64f97d12a --- /dev/null +++ b/tags/os--ios/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “os/ios” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "os/ios"

+
+ + +
+ +
+
+ + + diff --git a/tags/os--macos/index.html b/tags/os--macos/index.html new file mode 100644 index 000000000..44fcf28b1 --- /dev/null +++ b/tags/os--macos/index.html @@ -0,0 +1,120 @@ + + + + + + + + +Posts with tag “os/macOS” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "os/macOS"

+
+ + +
+ +
+
+ + + diff --git a/tags/os--ubuntu/index.html b/tags/os--ubuntu/index.html new file mode 100644 index 000000000..babc9fbd6 --- /dev/null +++ b/tags/os--ubuntu/index.html @@ -0,0 +1,104 @@ + + + + + + + + +Posts with tag “os/ubuntu” | Kaiwen’s personal website + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+

Posts tagged with "os/ubuntu"

+
+ + +
+ +
+
+ + +