diff --git a/.yarnrc.yml b/.yarnrc.yml new file mode 100644 index 00000000..72cf3b41 --- /dev/null +++ b/.yarnrc.yml @@ -0,0 +1 @@ +nmHoistingLimits: workspaces diff --git a/README.md b/README.md index 04a7a733..f4bca0c5 100644 --- a/README.md +++ b/README.md @@ -1,67 +1,9 @@ - - # 人人都能用英语 -## 目录 +## 书 -- [简介](README.md) -- [第一章:起点](chapter1.md) -- [第二章:口语](chapter2.md) -- [第三章:语音](chapter3.md) -- [第四章:朗读](chapter4.md) -- [第五章:词典](chapter5.md) -- [第六章:语法](chapter6.md) -- [第七章:精读](chapter7.md) -- [第八章:叮嘱](chapter8.md) -- [后记](end.md) +- [《人人都能用英语》](./book/README.md) +## Enjoy.bot - -## 前言 - - ->有一天,有个人在 Twitter 上提问: - ->> @maozhu1: @xiaolai 还请李老师用 140 字概括一下怎样才能学好英语? - ->我回复说: -> ->>其实一个字就够了:“用”。 - -这本书里的文字,全部的意义,只有两个字:“启发”。 - -有些知识,不仅要了解,还要深入了解。为了深入了解,不仅要学习,还要实践,更要反复试错,在成功中获得激励,在失败中汲取教训,路漫漫其修远,上下求索才可能修成正果。小到开车,大到创业,各种所需要的知识莫不如是。面对这样的知识,我们要了解: - -* What──它究竟是什么? -* Why──为什么它是那个样子? -* How──要掌握它、应用它,必须得遵循什么样的步骤? - -然而,有另外一种知识,往往还是格外重要的知识,在知道它的那一瞬间就可能开始发挥重大的作用,甚至,在知道它(What)的那一瞬间,它所有的重大作用全部都发挥完毕(至于 Why 和 How,甚至可能在了解它的 What 那一瞬间早已经不言自明)。 - -在我个人的记忆里,一路上遇到过很多这种 “只要知道就能够瞬间全部发挥作用” 的知识。学概率统计的时候,遇到 “独立事件” 这个概念,就是这类知识的典型例子。在此之前,我很自然地以为如果连续 9 次抛硬币都是正面朝上,那么第 10 次抛出硬币之后正面朝上的可能性要远远低于背面朝上的可能性 …… 在概率教科书里读到 “独立事件” 的那一瞬间,让我意识到之前的想法是多么的可笑。因为抛硬币正反面的几率是永远都相同的(硬币出现正反面在每次抛时都是相互独立、不受之前结果影响的),各占 50%,所以即便我抛一百次,一万次,甚至更多次都是正面朝上,下一次抛正反面的几率也还是如此,各占 50%。 - -至此,这个知识的所有作用已经全部发挥完毕:它能够彻头彻尾地改变一些人──那些一不小心看到它实际意义的人。无论是谁,在做几乎所有决定的时候,都要考虑 “可能性”(学称 “概率”)。在我不知道 “独立事件” 这个概念之前所做出的很多决定,换在知道 “独立事件” 这个概念之后,我是无论如何都不会那样选择的──这就是改变,并且是质变。 - -另一个令我记忆深刻的例子是很小的时候学习编程语言。多年以来,受影响最深的,并不是当时所学的 BASIC,或者是后来所学的 PASCAL,抑或再后来学的 C、C++什么的;受影响最深的是一种思考方式──在运行程序之前,要反复浏览代码,在脑子里进行预演;而不是写完程序直接运行,出错了再说。这是节省时间提高效率的重要方式。刚开始并不知道 “了解了这种操作方式” 给自己带来了多大的影响;可是,许多年之后,观察到身边大多数人从来都没有 “做事之前先在脑子里预演” 的习惯,才明白很小的时候知道了那样的做法给自己带来的巨大好处──并且是没办法给那些不知道的人讲明白的好处。(也许正因为如此,才总是有人这样无奈罢:会的人,自然会了,不会的人,无论如何也不会。) -更为关键的是,这种知识的获取,是 “不可逆的”。在你知道它的那一瞬间,它就已经改变了一切,你的生活因它而变,再也无法复原。我们再也不可能对这种知识视而不见,听而不闻,置之不理,它瞬间就能根深蒂固,无法铲除。比如,真正理解概率统计常识的人,是绝对不会去买彩票的 …… 因为买彩票这种行为在他们眼里相当于不尊重自己的智商和已学过的知识。可与此同时,彩票是地球上最畅销的商品,可见有多少人一生都未曾有机会了解那些重要的知识。 - -当然,读到这里,读者都会同意:也许最值得传播的(至少应该为之努力的)就是这种知识。传播它们的最大意义,甚至唯一的意义和目的,就是 “启发”。这本书里的文字,全部的意义,只有两个字:“启发”。作者真诚地希望读者在读过这些文字之后,(起码)在英语使用方面有所启发。 - -也许有些读者会发现里面有 “太多的废话”,那仅仅是因为这类知识的特性:它们太 “貌不惊人”,如果非要它们以本来的面目出现,他们是不会注意到的──无论它们实际上有多么重要。否则,他们早就被这类知识彻头彻尾地改变了 …… 可事实上他们不还是原来那个样子么? 
- -也许有些读者自然而然地依照思维惯性,希望看到更多的 “How”。可正如之前所解释的那样,这类知识中的大多数,读者在了解到 “What” 的那一瞬间,它的作用就已经全部发挥完毕,“Why” 也许只不过是多余的解释,而 “How” 或者不言自明,或者因人而异,需要读者自己摸索…… - -这类知识注定属于少数人。除了之前已经说过的原因之外,还有另外一个重要的原因:人们只愿传播自己相信的知识──哪怕那所谓的知识根本就是错的。重要的不是对错,重要的是感觉。尽管 “觉得正确” 和 “正确” 相差可能十万八千里,但对大多数人来说根本就是一回事儿,起码,他们情愿那就是一回事儿。所以,容易传递的是那些 “很自然地就令人相信(感觉)是正确的”,而不是 “事实上正确的” 事情。 - -前些年我写《把时间当作朋友》的时候,多少对它的传播是悲观的,因为我知道那些文字里所传递的绝大多数是这种道理、这类知识──它们的本质决定了其传播的困难程度。所以,那时候,我觉得只要有零星那么几个人能够被那些文字所改变,就已经万幸了。互联网时代充满了奇迹,那些文字在网上以每个月几万点击的数量被浏览。而 2009 年印刷成书出版后,竟然可以在一年之内重印 11 次,再次出乎我的意料。哪怕真正接受那些道理的读者比率再低,由于基数巨大,我知道我还是结结实实地改变了很多人的。 - -正是这样的动力,使我甘心、耐心地写下这本书里的文字。知道它们无论如何都会再次改变一些人──尽管我自己永远没办法知道最终改变的究竟是哪些人。生活中充满了无奈,然而有些人幸运如我,毕竟能够改变点什么。于是,那些无奈,那些不幸,就多少淡了一些。 - -事实上,这本书也是《把时间当做朋友》的具体延续。《把时间当做朋友》的主旨很简单:时间不会听从我们的管理,我们最多只能与时间做朋友;与时间做朋友的方法只不过是 “用正确的方式做正确的事情”。而这本书,只不过是 把 “正确的事情” 聚焦在 “用英语” 上而已,而后再看看可能的 “正确的方式” 究竟是什么。 - -### 李笑来 - -* 2010 年春于上海初稿 -* 2010 年冬于北京修改 -* 2015 年于北京重新制作网络开放版本 -* 2019 年于北京上传至 GitHub +- [Enjoy App](./enjoy/) diff --git a/book/README.md b/book/README.md new file mode 100644 index 00000000..04a7a733 --- /dev/null +++ b/book/README.md @@ -0,0 +1,67 @@ + + +# 人人都能用英语 + +## 目录 + +- [简介](README.md) +- [第一章:起点](chapter1.md) +- [第二章:口语](chapter2.md) +- [第三章:语音](chapter3.md) +- [第四章:朗读](chapter4.md) +- [第五章:词典](chapter5.md) +- [第六章:语法](chapter6.md) +- [第七章:精读](chapter7.md) +- [第八章:叮嘱](chapter8.md) +- [后记](end.md) + + + +## 前言 + + +>有一天,有个人在 Twitter 上提问: + +>> @maozhu1: @xiaolai 还请李老师用 140 字概括一下怎样才能学好英语? + +>我回复说: +> +>>其实一个字就够了:“用”。 + +这本书里的文字,全部的意义,只有两个字:“启发”。 + +有些知识,不仅要了解,还要深入了解。为了深入了解,不仅要学习,还要实践,更要反复试错,在成功中获得激励,在失败中汲取教训,路漫漫其修远,上下求索才可能修成正果。小到开车,大到创业,各种所需要的知识莫不如是。面对这样的知识,我们要了解: + +* What──它究竟是什么? +* Why──为什么它是那个样子? +* How──要掌握它、应用它,必须得遵循什么样的步骤? + +然而,有另外一种知识,往往还是格外重要的知识,在知道它的那一瞬间就可能开始发挥重大的作用,甚至,在知道它(What)的那一瞬间,它所有的重大作用全部都发挥完毕(至于 Why 和 How,甚至可能在了解它的 What 那一瞬间早已经不言自明)。 + +在我个人的记忆里,一路上遇到过很多这种 “只要知道就能够瞬间全部发挥作用” 的知识。学概率统计的时候,遇到 “独立事件” 这个概念,就是这类知识的典型例子。在此之前,我很自然地以为如果连续 9 次抛硬币都是正面朝上,那么第 10 次抛出硬币之后正面朝上的可能性要远远低于背面朝上的可能性 …… 在概率教科书里读到 “独立事件” 的那一瞬间,让我意识到之前的想法是多么的可笑。因为抛硬币正反面的几率是永远都相同的(硬币出现正反面在每次抛时都是相互独立、不受之前结果影响的),各占 50%,所以即便我抛一百次,一万次,甚至更多次都是正面朝上,下一次抛正反面的几率也还是如此,各占 50%。 + +至此,这个知识的所有作用已经全部发挥完毕:它能够彻头彻尾地改变一些人──那些一不小心看到它实际意义的人。无论是谁,在做几乎所有决定的时候,都要考虑 “可能性”(学称 “概率”)。在我不知道 “独立事件” 这个概念之前所做出的很多决定,换在知道 “独立事件” 这个概念之后,我是无论如何都不会那样选择的──这就是改变,并且是质变。 + +另一个令我记忆深刻的例子是很小的时候学习编程语言。多年以来,受影响最深的,并不是当时所学的 BASIC,或者是后来所学的 PASCAL,抑或再后来学的 C、C++什么的;受影响最深的是一种思考方式──在运行程序之前,要反复浏览代码,在脑子里进行预演;而不是写完程序直接运行,出错了再说。这是节省时间提高效率的重要方式。刚开始并不知道 “了解了这种操作方式” 给自己带来了多大的影响;可是,许多年之后,观察到身边大多数人从来都没有 “做事之前先在脑子里预演” 的习惯,才明白很小的时候知道了那样的做法给自己带来的巨大好处──并且是没办法给那些不知道的人讲明白的好处。(也许正因为如此,才总是有人这样无奈罢:会的人,自然会了,不会的人,无论如何也不会。) +更为关键的是,这种知识的获取,是 “不可逆的”。在你知道它的那一瞬间,它就已经改变了一切,你的生活因它而变,再也无法复原。我们再也不可能对这种知识视而不见,听而不闻,置之不理,它瞬间就能根深蒂固,无法铲除。比如,真正理解概率统计常识的人,是绝对不会去买彩票的 …… 因为买彩票这种行为在他们眼里相当于不尊重自己的智商和已学过的知识。可与此同时,彩票是地球上最畅销的商品,可见有多少人一生都未曾有机会了解那些重要的知识。 + +当然,读到这里,读者都会同意:也许最值得传播的(至少应该为之努力的)就是这种知识。传播它们的最大意义,甚至唯一的意义和目的,就是 “启发”。这本书里的文字,全部的意义,只有两个字:“启发”。作者真诚地希望读者在读过这些文字之后,(起码)在英语使用方面有所启发。 + +也许有些读者会发现里面有 “太多的废话”,那仅仅是因为这类知识的特性:它们太 “貌不惊人”,如果非要它们以本来的面目出现,他们是不会注意到的──无论它们实际上有多么重要。否则,他们早就被这类知识彻头彻尾地改变了 …… 可事实上他们不还是原来那个样子么? 
+ +也许有些读者自然而然地依照思维惯性,希望看到更多的 “How”。可正如之前所解释的那样,这类知识中的大多数,读者在了解到 “What” 的那一瞬间,它的作用就已经全部发挥完毕,“Why” 也许只不过是多余的解释,而 “How” 或者不言自明,或者因人而异,需要读者自己摸索…… + +这类知识注定属于少数人。除了之前已经说过的原因之外,还有另外一个重要的原因:人们只愿传播自己相信的知识──哪怕那所谓的知识根本就是错的。重要的不是对错,重要的是感觉。尽管 “觉得正确” 和 “正确” 相差可能十万八千里,但对大多数人来说根本就是一回事儿,起码,他们情愿那就是一回事儿。所以,容易传递的是那些 “很自然地就令人相信(感觉)是正确的”,而不是 “事实上正确的” 事情。 + +前些年我写《把时间当作朋友》的时候,多少对它的传播是悲观的,因为我知道那些文字里所传递的绝大多数是这种道理、这类知识──它们的本质决定了其传播的困难程度。所以,那时候,我觉得只要有零星那么几个人能够被那些文字所改变,就已经万幸了。互联网时代充满了奇迹,那些文字在网上以每个月几万点击的数量被浏览。而 2009 年印刷成书出版后,竟然可以在一年之内重印 11 次,再次出乎我的意料。哪怕真正接受那些道理的读者比率再低,由于基数巨大,我知道我还是结结实实地改变了很多人的。 + +正是这样的动力,使我甘心、耐心地写下这本书里的文字。知道它们无论如何都会再次改变一些人──尽管我自己永远没办法知道最终改变的究竟是哪些人。生活中充满了无奈,然而有些人幸运如我,毕竟能够改变点什么。于是,那些无奈,那些不幸,就多少淡了一些。 + +事实上,这本书也是《把时间当做朋友》的具体延续。《把时间当做朋友》的主旨很简单:时间不会听从我们的管理,我们最多只能与时间做朋友;与时间做朋友的方法只不过是 “用正确的方式做正确的事情”。而这本书,只不过是 把 “正确的事情” 聚焦在 “用英语” 上而已,而后再看看可能的 “正确的方式” 究竟是什么。 + +### 李笑来 + +* 2010 年春于上海初稿 +* 2010 年冬于北京修改 +* 2015 年于北京重新制作网络开放版本 +* 2019 年于北京上传至 GitHub diff --git a/chapter1.md b/book/chapter1.md similarity index 100% rename from chapter1.md rename to book/chapter1.md diff --git a/chapter2.md b/book/chapter2.md similarity index 100% rename from chapter2.md rename to book/chapter2.md diff --git a/chapter3.md b/book/chapter3.md similarity index 100% rename from chapter3.md rename to book/chapter3.md diff --git a/chapter4.md b/book/chapter4.md similarity index 100% rename from chapter4.md rename to book/chapter4.md diff --git a/chapter5.md b/book/chapter5.md similarity index 100% rename from chapter5.md rename to book/chapter5.md diff --git a/chapter6.md b/book/chapter6.md similarity index 100% rename from chapter6.md rename to book/chapter6.md diff --git a/chapter7.md b/book/chapter7.md similarity index 100% rename from chapter7.md rename to book/chapter7.md diff --git a/chapter8.md b/book/chapter8.md similarity index 100% rename from chapter8.md rename to book/chapter8.md diff --git a/end.md b/book/end.md similarity index 100% rename from end.md rename to book/end.md diff --git a/files/GRE-Analytical-Writing-Argument-Task-Topics.md b/book/files/GRE-Analytical-Writing-Argument-Task-Topics.md similarity index 100% rename from files/GRE-Analytical-Writing-Argument-Task-Topics.md rename to book/files/GRE-Analytical-Writing-Argument-Task-Topics.md diff --git a/files/GRE-Analytical-Writing-Issue-Task-Topics.md b/book/files/GRE-Analytical-Writing-Issue-Task-Topics.md similarity index 100% rename from files/GRE-Analytical-Writing-Issue-Task-Topics.md rename to book/files/GRE-Analytical-Writing-Issue-Task-Topics.md diff --git a/files/TOEFL-writing-topics.pdf b/book/files/TOEFL-writing-topics.pdf similarity index 100% rename from files/TOEFL-writing-topics.pdf rename to book/files/TOEFL-writing-topics.pdf diff --git a/files/TOELFL-PartC-93/audios/1.mp3 b/book/files/TOELFL-PartC-93/audios/1.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/1.mp3 rename to book/files/TOELFL-PartC-93/audios/1.mp3 diff --git a/files/TOELFL-PartC-93/audios/10.mp3 b/book/files/TOELFL-PartC-93/audios/10.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/10.mp3 rename to book/files/TOELFL-PartC-93/audios/10.mp3 diff --git a/files/TOELFL-PartC-93/audios/11.mp3 b/book/files/TOELFL-PartC-93/audios/11.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/11.mp3 rename to book/files/TOELFL-PartC-93/audios/11.mp3 diff --git a/files/TOELFL-PartC-93/audios/12.mp3 b/book/files/TOELFL-PartC-93/audios/12.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/12.mp3 rename to 
book/files/TOELFL-PartC-93/audios/12.mp3 diff --git a/files/TOELFL-PartC-93/audios/13.mp3 b/book/files/TOELFL-PartC-93/audios/13.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/13.mp3 rename to book/files/TOELFL-PartC-93/audios/13.mp3 diff --git a/files/TOELFL-PartC-93/audios/14.mp3 b/book/files/TOELFL-PartC-93/audios/14.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/14.mp3 rename to book/files/TOELFL-PartC-93/audios/14.mp3 diff --git a/files/TOELFL-PartC-93/audios/15.mp3 b/book/files/TOELFL-PartC-93/audios/15.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/15.mp3 rename to book/files/TOELFL-PartC-93/audios/15.mp3 diff --git a/files/TOELFL-PartC-93/audios/16.mp3 b/book/files/TOELFL-PartC-93/audios/16.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/16.mp3 rename to book/files/TOELFL-PartC-93/audios/16.mp3 diff --git a/files/TOELFL-PartC-93/audios/17.mp3 b/book/files/TOELFL-PartC-93/audios/17.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/17.mp3 rename to book/files/TOELFL-PartC-93/audios/17.mp3 diff --git a/files/TOELFL-PartC-93/audios/18.mp3 b/book/files/TOELFL-PartC-93/audios/18.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/18.mp3 rename to book/files/TOELFL-PartC-93/audios/18.mp3 diff --git a/files/TOELFL-PartC-93/audios/19.mp3 b/book/files/TOELFL-PartC-93/audios/19.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/19.mp3 rename to book/files/TOELFL-PartC-93/audios/19.mp3 diff --git a/files/TOELFL-PartC-93/audios/2.mp3 b/book/files/TOELFL-PartC-93/audios/2.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/2.mp3 rename to book/files/TOELFL-PartC-93/audios/2.mp3 diff --git a/files/TOELFL-PartC-93/audios/20.mp3 b/book/files/TOELFL-PartC-93/audios/20.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/20.mp3 rename to book/files/TOELFL-PartC-93/audios/20.mp3 diff --git a/files/TOELFL-PartC-93/audios/21.mp3 b/book/files/TOELFL-PartC-93/audios/21.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/21.mp3 rename to book/files/TOELFL-PartC-93/audios/21.mp3 diff --git a/files/TOELFL-PartC-93/audios/22.mp3 b/book/files/TOELFL-PartC-93/audios/22.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/22.mp3 rename to book/files/TOELFL-PartC-93/audios/22.mp3 diff --git a/files/TOELFL-PartC-93/audios/23.mp3 b/book/files/TOELFL-PartC-93/audios/23.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/23.mp3 rename to book/files/TOELFL-PartC-93/audios/23.mp3 diff --git a/files/TOELFL-PartC-93/audios/24.mp3 b/book/files/TOELFL-PartC-93/audios/24.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/24.mp3 rename to book/files/TOELFL-PartC-93/audios/24.mp3 diff --git a/files/TOELFL-PartC-93/audios/25.mp3 b/book/files/TOELFL-PartC-93/audios/25.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/25.mp3 rename to book/files/TOELFL-PartC-93/audios/25.mp3 diff --git a/files/TOELFL-PartC-93/audios/26.mp3 b/book/files/TOELFL-PartC-93/audios/26.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/26.mp3 rename to book/files/TOELFL-PartC-93/audios/26.mp3 diff --git a/files/TOELFL-PartC-93/audios/27.mp3 b/book/files/TOELFL-PartC-93/audios/27.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/27.mp3 rename to book/files/TOELFL-PartC-93/audios/27.mp3 diff --git a/files/TOELFL-PartC-93/audios/28.mp3 b/book/files/TOELFL-PartC-93/audios/28.mp3 similarity index 
100% rename from files/TOELFL-PartC-93/audios/28.mp3 rename to book/files/TOELFL-PartC-93/audios/28.mp3 diff --git a/files/TOELFL-PartC-93/audios/29.mp3 b/book/files/TOELFL-PartC-93/audios/29.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/29.mp3 rename to book/files/TOELFL-PartC-93/audios/29.mp3 diff --git a/files/TOELFL-PartC-93/audios/3.mp3 b/book/files/TOELFL-PartC-93/audios/3.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/3.mp3 rename to book/files/TOELFL-PartC-93/audios/3.mp3 diff --git a/files/TOELFL-PartC-93/audios/30.mp3 b/book/files/TOELFL-PartC-93/audios/30.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/30.mp3 rename to book/files/TOELFL-PartC-93/audios/30.mp3 diff --git a/files/TOELFL-PartC-93/audios/31.mp3 b/book/files/TOELFL-PartC-93/audios/31.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/31.mp3 rename to book/files/TOELFL-PartC-93/audios/31.mp3 diff --git a/files/TOELFL-PartC-93/audios/32.mp3 b/book/files/TOELFL-PartC-93/audios/32.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/32.mp3 rename to book/files/TOELFL-PartC-93/audios/32.mp3 diff --git a/files/TOELFL-PartC-93/audios/33.mp3 b/book/files/TOELFL-PartC-93/audios/33.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/33.mp3 rename to book/files/TOELFL-PartC-93/audios/33.mp3 diff --git a/files/TOELFL-PartC-93/audios/34.mp3 b/book/files/TOELFL-PartC-93/audios/34.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/34.mp3 rename to book/files/TOELFL-PartC-93/audios/34.mp3 diff --git a/files/TOELFL-PartC-93/audios/35.mp3 b/book/files/TOELFL-PartC-93/audios/35.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/35.mp3 rename to book/files/TOELFL-PartC-93/audios/35.mp3 diff --git a/files/TOELFL-PartC-93/audios/36.mp3 b/book/files/TOELFL-PartC-93/audios/36.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/36.mp3 rename to book/files/TOELFL-PartC-93/audios/36.mp3 diff --git a/files/TOELFL-PartC-93/audios/37.mp3 b/book/files/TOELFL-PartC-93/audios/37.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/37.mp3 rename to book/files/TOELFL-PartC-93/audios/37.mp3 diff --git a/files/TOELFL-PartC-93/audios/38.mp3 b/book/files/TOELFL-PartC-93/audios/38.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/38.mp3 rename to book/files/TOELFL-PartC-93/audios/38.mp3 diff --git a/files/TOELFL-PartC-93/audios/39.mp3 b/book/files/TOELFL-PartC-93/audios/39.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/39.mp3 rename to book/files/TOELFL-PartC-93/audios/39.mp3 diff --git a/files/TOELFL-PartC-93/audios/4.mp3 b/book/files/TOELFL-PartC-93/audios/4.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/4.mp3 rename to book/files/TOELFL-PartC-93/audios/4.mp3 diff --git a/files/TOELFL-PartC-93/audios/40.mp3 b/book/files/TOELFL-PartC-93/audios/40.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/40.mp3 rename to book/files/TOELFL-PartC-93/audios/40.mp3 diff --git a/files/TOELFL-PartC-93/audios/41.mp3 b/book/files/TOELFL-PartC-93/audios/41.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/41.mp3 rename to book/files/TOELFL-PartC-93/audios/41.mp3 diff --git a/files/TOELFL-PartC-93/audios/42.mp3 b/book/files/TOELFL-PartC-93/audios/42.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/42.mp3 rename to book/files/TOELFL-PartC-93/audios/42.mp3 diff --git a/files/TOELFL-PartC-93/audios/43.mp3 
b/book/files/TOELFL-PartC-93/audios/43.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/43.mp3 rename to book/files/TOELFL-PartC-93/audios/43.mp3 diff --git a/files/TOELFL-PartC-93/audios/44.mp3 b/book/files/TOELFL-PartC-93/audios/44.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/44.mp3 rename to book/files/TOELFL-PartC-93/audios/44.mp3 diff --git a/files/TOELFL-PartC-93/audios/45.mp3 b/book/files/TOELFL-PartC-93/audios/45.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/45.mp3 rename to book/files/TOELFL-PartC-93/audios/45.mp3 diff --git a/files/TOELFL-PartC-93/audios/46.mp3 b/book/files/TOELFL-PartC-93/audios/46.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/46.mp3 rename to book/files/TOELFL-PartC-93/audios/46.mp3 diff --git a/files/TOELFL-PartC-93/audios/47.mp3 b/book/files/TOELFL-PartC-93/audios/47.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/47.mp3 rename to book/files/TOELFL-PartC-93/audios/47.mp3 diff --git a/files/TOELFL-PartC-93/audios/48.mp3 b/book/files/TOELFL-PartC-93/audios/48.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/48.mp3 rename to book/files/TOELFL-PartC-93/audios/48.mp3 diff --git a/files/TOELFL-PartC-93/audios/49.mp3 b/book/files/TOELFL-PartC-93/audios/49.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/49.mp3 rename to book/files/TOELFL-PartC-93/audios/49.mp3 diff --git a/files/TOELFL-PartC-93/audios/5.mp3 b/book/files/TOELFL-PartC-93/audios/5.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/5.mp3 rename to book/files/TOELFL-PartC-93/audios/5.mp3 diff --git a/files/TOELFL-PartC-93/audios/50.mp3 b/book/files/TOELFL-PartC-93/audios/50.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/50.mp3 rename to book/files/TOELFL-PartC-93/audios/50.mp3 diff --git a/files/TOELFL-PartC-93/audios/51.mp3 b/book/files/TOELFL-PartC-93/audios/51.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/51.mp3 rename to book/files/TOELFL-PartC-93/audios/51.mp3 diff --git a/files/TOELFL-PartC-93/audios/52.mp3 b/book/files/TOELFL-PartC-93/audios/52.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/52.mp3 rename to book/files/TOELFL-PartC-93/audios/52.mp3 diff --git a/files/TOELFL-PartC-93/audios/53.mp3 b/book/files/TOELFL-PartC-93/audios/53.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/53.mp3 rename to book/files/TOELFL-PartC-93/audios/53.mp3 diff --git a/files/TOELFL-PartC-93/audios/54.mp3 b/book/files/TOELFL-PartC-93/audios/54.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/54.mp3 rename to book/files/TOELFL-PartC-93/audios/54.mp3 diff --git a/files/TOELFL-PartC-93/audios/55.mp3 b/book/files/TOELFL-PartC-93/audios/55.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/55.mp3 rename to book/files/TOELFL-PartC-93/audios/55.mp3 diff --git a/files/TOELFL-PartC-93/audios/56.mp3 b/book/files/TOELFL-PartC-93/audios/56.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/56.mp3 rename to book/files/TOELFL-PartC-93/audios/56.mp3 diff --git a/files/TOELFL-PartC-93/audios/57.mp3 b/book/files/TOELFL-PartC-93/audios/57.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/57.mp3 rename to book/files/TOELFL-PartC-93/audios/57.mp3 diff --git a/files/TOELFL-PartC-93/audios/58.mp3 b/book/files/TOELFL-PartC-93/audios/58.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/58.mp3 rename to 
book/files/TOELFL-PartC-93/audios/58.mp3 diff --git a/files/TOELFL-PartC-93/audios/59.mp3 b/book/files/TOELFL-PartC-93/audios/59.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/59.mp3 rename to book/files/TOELFL-PartC-93/audios/59.mp3 diff --git a/files/TOELFL-PartC-93/audios/6.mp3 b/book/files/TOELFL-PartC-93/audios/6.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/6.mp3 rename to book/files/TOELFL-PartC-93/audios/6.mp3 diff --git a/files/TOELFL-PartC-93/audios/60.mp3 b/book/files/TOELFL-PartC-93/audios/60.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/60.mp3 rename to book/files/TOELFL-PartC-93/audios/60.mp3 diff --git a/files/TOELFL-PartC-93/audios/61.mp3 b/book/files/TOELFL-PartC-93/audios/61.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/61.mp3 rename to book/files/TOELFL-PartC-93/audios/61.mp3 diff --git a/files/TOELFL-PartC-93/audios/62.mp3 b/book/files/TOELFL-PartC-93/audios/62.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/62.mp3 rename to book/files/TOELFL-PartC-93/audios/62.mp3 diff --git a/files/TOELFL-PartC-93/audios/63.mp3 b/book/files/TOELFL-PartC-93/audios/63.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/63.mp3 rename to book/files/TOELFL-PartC-93/audios/63.mp3 diff --git a/files/TOELFL-PartC-93/audios/64.mp3 b/book/files/TOELFL-PartC-93/audios/64.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/64.mp3 rename to book/files/TOELFL-PartC-93/audios/64.mp3 diff --git a/files/TOELFL-PartC-93/audios/65.mp3 b/book/files/TOELFL-PartC-93/audios/65.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/65.mp3 rename to book/files/TOELFL-PartC-93/audios/65.mp3 diff --git a/files/TOELFL-PartC-93/audios/66.mp3 b/book/files/TOELFL-PartC-93/audios/66.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/66.mp3 rename to book/files/TOELFL-PartC-93/audios/66.mp3 diff --git a/files/TOELFL-PartC-93/audios/67.mp3 b/book/files/TOELFL-PartC-93/audios/67.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/67.mp3 rename to book/files/TOELFL-PartC-93/audios/67.mp3 diff --git a/files/TOELFL-PartC-93/audios/68.mp3 b/book/files/TOELFL-PartC-93/audios/68.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/68.mp3 rename to book/files/TOELFL-PartC-93/audios/68.mp3 diff --git a/files/TOELFL-PartC-93/audios/69.mp3 b/book/files/TOELFL-PartC-93/audios/69.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/69.mp3 rename to book/files/TOELFL-PartC-93/audios/69.mp3 diff --git a/files/TOELFL-PartC-93/audios/7.mp3 b/book/files/TOELFL-PartC-93/audios/7.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/7.mp3 rename to book/files/TOELFL-PartC-93/audios/7.mp3 diff --git a/files/TOELFL-PartC-93/audios/70.mp3 b/book/files/TOELFL-PartC-93/audios/70.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/70.mp3 rename to book/files/TOELFL-PartC-93/audios/70.mp3 diff --git a/files/TOELFL-PartC-93/audios/71.mp3 b/book/files/TOELFL-PartC-93/audios/71.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/71.mp3 rename to book/files/TOELFL-PartC-93/audios/71.mp3 diff --git a/files/TOELFL-PartC-93/audios/72.mp3 b/book/files/TOELFL-PartC-93/audios/72.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/72.mp3 rename to book/files/TOELFL-PartC-93/audios/72.mp3 diff --git a/files/TOELFL-PartC-93/audios/73.mp3 b/book/files/TOELFL-PartC-93/audios/73.mp3 similarity index 100% 
rename from files/TOELFL-PartC-93/audios/73.mp3 rename to book/files/TOELFL-PartC-93/audios/73.mp3 diff --git a/files/TOELFL-PartC-93/audios/74.mp3 b/book/files/TOELFL-PartC-93/audios/74.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/74.mp3 rename to book/files/TOELFL-PartC-93/audios/74.mp3 diff --git a/files/TOELFL-PartC-93/audios/75.mp3 b/book/files/TOELFL-PartC-93/audios/75.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/75.mp3 rename to book/files/TOELFL-PartC-93/audios/75.mp3 diff --git a/files/TOELFL-PartC-93/audios/76.mp3 b/book/files/TOELFL-PartC-93/audios/76.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/76.mp3 rename to book/files/TOELFL-PartC-93/audios/76.mp3 diff --git a/files/TOELFL-PartC-93/audios/77.mp3 b/book/files/TOELFL-PartC-93/audios/77.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/77.mp3 rename to book/files/TOELFL-PartC-93/audios/77.mp3 diff --git a/files/TOELFL-PartC-93/audios/78.mp3 b/book/files/TOELFL-PartC-93/audios/78.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/78.mp3 rename to book/files/TOELFL-PartC-93/audios/78.mp3 diff --git a/files/TOELFL-PartC-93/audios/79.mp3 b/book/files/TOELFL-PartC-93/audios/79.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/79.mp3 rename to book/files/TOELFL-PartC-93/audios/79.mp3 diff --git a/files/TOELFL-PartC-93/audios/8.mp3 b/book/files/TOELFL-PartC-93/audios/8.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/8.mp3 rename to book/files/TOELFL-PartC-93/audios/8.mp3 diff --git a/files/TOELFL-PartC-93/audios/80.mp3 b/book/files/TOELFL-PartC-93/audios/80.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/80.mp3 rename to book/files/TOELFL-PartC-93/audios/80.mp3 diff --git a/files/TOELFL-PartC-93/audios/81.mp3 b/book/files/TOELFL-PartC-93/audios/81.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/81.mp3 rename to book/files/TOELFL-PartC-93/audios/81.mp3 diff --git a/files/TOELFL-PartC-93/audios/82.mp3 b/book/files/TOELFL-PartC-93/audios/82.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/82.mp3 rename to book/files/TOELFL-PartC-93/audios/82.mp3 diff --git a/files/TOELFL-PartC-93/audios/83.mp3 b/book/files/TOELFL-PartC-93/audios/83.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/83.mp3 rename to book/files/TOELFL-PartC-93/audios/83.mp3 diff --git a/files/TOELFL-PartC-93/audios/84.mp3 b/book/files/TOELFL-PartC-93/audios/84.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/84.mp3 rename to book/files/TOELFL-PartC-93/audios/84.mp3 diff --git a/files/TOELFL-PartC-93/audios/85.mp3 b/book/files/TOELFL-PartC-93/audios/85.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/85.mp3 rename to book/files/TOELFL-PartC-93/audios/85.mp3 diff --git a/files/TOELFL-PartC-93/audios/86.mp3 b/book/files/TOELFL-PartC-93/audios/86.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/86.mp3 rename to book/files/TOELFL-PartC-93/audios/86.mp3 diff --git a/files/TOELFL-PartC-93/audios/87.mp3 b/book/files/TOELFL-PartC-93/audios/87.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/87.mp3 rename to book/files/TOELFL-PartC-93/audios/87.mp3 diff --git a/files/TOELFL-PartC-93/audios/88.mp3 b/book/files/TOELFL-PartC-93/audios/88.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/88.mp3 rename to book/files/TOELFL-PartC-93/audios/88.mp3 diff --git a/files/TOELFL-PartC-93/audios/89.mp3 
b/book/files/TOELFL-PartC-93/audios/89.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/89.mp3 rename to book/files/TOELFL-PartC-93/audios/89.mp3 diff --git a/files/TOELFL-PartC-93/audios/9.mp3 b/book/files/TOELFL-PartC-93/audios/9.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/9.mp3 rename to book/files/TOELFL-PartC-93/audios/9.mp3 diff --git a/files/TOELFL-PartC-93/audios/90.mp3 b/book/files/TOELFL-PartC-93/audios/90.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/90.mp3 rename to book/files/TOELFL-PartC-93/audios/90.mp3 diff --git a/files/TOELFL-PartC-93/audios/91.mp3 b/book/files/TOELFL-PartC-93/audios/91.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/91.mp3 rename to book/files/TOELFL-PartC-93/audios/91.mp3 diff --git a/files/TOELFL-PartC-93/audios/92.mp3 b/book/files/TOELFL-PartC-93/audios/92.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/92.mp3 rename to book/files/TOELFL-PartC-93/audios/92.mp3 diff --git a/files/TOELFL-PartC-93/audios/93.mp3 b/book/files/TOELFL-PartC-93/audios/93.mp3 similarity index 100% rename from files/TOELFL-PartC-93/audios/93.mp3 rename to book/files/TOELFL-PartC-93/audios/93.mp3 diff --git a/files/TOELFL-PartC-93/text/text-1-93.md b/book/files/TOELFL-PartC-93/text/text-1-93.md similarity index 100% rename from files/TOELFL-PartC-93/text/text-1-93.md rename to book/files/TOELFL-PartC-93/text/text-1-93.md diff --git a/images/cover.jpg b/book/images/cover.jpg similarity index 100% rename from images/cover.jpg rename to book/images/cover.jpg diff --git a/images/figure01.png b/book/images/figure01.png similarity index 100% rename from images/figure01.png rename to book/images/figure01.png diff --git a/images/figure02.png b/book/images/figure02.png similarity index 100% rename from images/figure02.png rename to book/images/figure02.png diff --git a/images/figure03.png b/book/images/figure03.png similarity index 100% rename from images/figure03.png rename to book/images/figure03.png diff --git a/images/figure04.png b/book/images/figure04.png similarity index 100% rename from images/figure04.png rename to book/images/figure04.png diff --git a/images/figure05.png b/book/images/figure05.png similarity index 100% rename from images/figure05.png rename to book/images/figure05.png diff --git a/images/figure06.png b/book/images/figure06.png similarity index 100% rename from images/figure06.png rename to book/images/figure06.png diff --git a/images/figure07.png b/book/images/figure07.png similarity index 100% rename from images/figure07.png rename to book/images/figure07.png diff --git a/images/figure08.png b/book/images/figure08.png similarity index 100% rename from images/figure08.png rename to book/images/figure08.png diff --git a/images/figure09.png b/book/images/figure09.png similarity index 100% rename from images/figure09.png rename to book/images/figure09.png diff --git a/images/figure10.png b/book/images/figure10.png similarity index 100% rename from images/figure10.png rename to book/images/figure10.png diff --git a/images/figure11.png b/book/images/figure11.png similarity index 100% rename from images/figure11.png rename to book/images/figure11.png diff --git a/images/figure12.png b/book/images/figure12.png similarity index 100% rename from images/figure12.png rename to book/images/figure12.png diff --git a/images/figure13.png b/book/images/figure13.png similarity index 100% rename from images/figure13.png rename to book/images/figure13.png diff --git 
a/images/figure14.png b/book/images/figure14.png similarity index 100% rename from images/figure14.png rename to book/images/figure14.png diff --git a/images/figure15.png b/book/images/figure15.png similarity index 100% rename from images/figure15.png rename to book/images/figure15.png diff --git a/images/figure16.png b/book/images/figure16.png similarity index 100% rename from images/figure16.png rename to book/images/figure16.png diff --git a/images/figure17.png b/book/images/figure17.png similarity index 100% rename from images/figure17.png rename to book/images/figure17.png diff --git a/images/figure18.png b/book/images/figure18.png similarity index 100% rename from images/figure18.png rename to book/images/figure18.png diff --git a/images/figure19.png b/book/images/figure19.png similarity index 100% rename from images/figure19.png rename to book/images/figure19.png diff --git a/images/figure20.png b/book/images/figure20.png similarity index 100% rename from images/figure20.png rename to book/images/figure20.png diff --git a/images/figure21.png b/book/images/figure21.png similarity index 100% rename from images/figure21.png rename to book/images/figure21.png diff --git a/images/figure22.png b/book/images/figure22.png similarity index 100% rename from images/figure22.png rename to book/images/figure22.png diff --git a/images/figure23.png b/book/images/figure23.png similarity index 100% rename from images/figure23.png rename to book/images/figure23.png diff --git a/images/figure24.png b/book/images/figure24.png similarity index 100% rename from images/figure24.png rename to book/images/figure24.png diff --git a/images/figure25.png b/book/images/figure25.png similarity index 100% rename from images/figure25.png rename to book/images/figure25.png diff --git a/images/figure26.png b/book/images/figure26.png similarity index 100% rename from images/figure26.png rename to book/images/figure26.png diff --git a/images/figure27.png b/book/images/figure27.png similarity index 100% rename from images/figure27.png rename to book/images/figure27.png diff --git a/images/figure272.png b/book/images/figure272.png similarity index 100% rename from images/figure272.png rename to book/images/figure272.png diff --git a/images/figure28.png b/book/images/figure28.png similarity index 100% rename from images/figure28.png rename to book/images/figure28.png diff --git a/images/figure29.png b/book/images/figure29.png similarity index 100% rename from images/figure29.png rename to book/images/figure29.png diff --git a/images/figure30.png b/book/images/figure30.png similarity index 100% rename from images/figure30.png rename to book/images/figure30.png diff --git a/images/figure31.png b/book/images/figure31.png similarity index 100% rename from images/figure31.png rename to book/images/figure31.png diff --git a/images/figure32.png b/book/images/figure32.png similarity index 100% rename from images/figure32.png rename to book/images/figure32.png diff --git a/images/figure33.png b/book/images/figure33.png similarity index 100% rename from images/figure33.png rename to book/images/figure33.png diff --git a/images/figure34.png b/book/images/figure34.png similarity index 100% rename from images/figure34.png rename to book/images/figure34.png diff --git a/images/figure35.png b/book/images/figure35.png similarity index 100% rename from images/figure35.png rename to book/images/figure35.png diff --git a/images/figure36.png b/book/images/figure36.png similarity index 100% rename from images/figure36.png rename to 
book/images/figure36.png diff --git a/images/figure37.png b/book/images/figure37.png similarity index 100% rename from images/figure37.png rename to book/images/figure37.png diff --git a/images/figure38.png b/book/images/figure38.png similarity index 100% rename from images/figure38.png rename to book/images/figure38.png diff --git a/images/figure39.png b/book/images/figure39.png similarity index 100% rename from images/figure39.png rename to book/images/figure39.png diff --git a/images/figure40.png b/book/images/figure40.png similarity index 100% rename from images/figure40.png rename to book/images/figure40.png diff --git a/images/figure41.png b/book/images/figure41.png similarity index 100% rename from images/figure41.png rename to book/images/figure41.png diff --git a/images/figure42.png b/book/images/figure42.png similarity index 100% rename from images/figure42.png rename to book/images/figure42.png diff --git a/images/figure43.png b/book/images/figure43.png similarity index 100% rename from images/figure43.png rename to book/images/figure43.png diff --git a/images/figure44.png b/book/images/figure44.png similarity index 100% rename from images/figure44.png rename to book/images/figure44.png diff --git a/images/figure45.png b/book/images/figure45.png similarity index 100% rename from images/figure45.png rename to book/images/figure45.png diff --git a/images/figure46.png b/book/images/figure46.png similarity index 100% rename from images/figure46.png rename to book/images/figure46.png diff --git a/images/figure47.png b/book/images/figure47.png similarity index 100% rename from images/figure47.png rename to book/images/figure47.png diff --git a/images/figure48.png b/book/images/figure48.png similarity index 100% rename from images/figure48.png rename to book/images/figure48.png diff --git a/images/figure49.png b/book/images/figure49.png similarity index 100% rename from images/figure49.png rename to book/images/figure49.png diff --git a/images/figure50.png b/book/images/figure50.png similarity index 100% rename from images/figure50.png rename to book/images/figure50.png diff --git a/images/figure51.png b/book/images/figure51.png similarity index 100% rename from images/figure51.png rename to book/images/figure51.png diff --git a/images/figure52.png b/book/images/figure52.png similarity index 100% rename from images/figure52.png rename to book/images/figure52.png diff --git a/images/figure53.png b/book/images/figure53.png similarity index 100% rename from images/figure53.png rename to book/images/figure53.png diff --git a/images/figure54.png b/book/images/figure54.png similarity index 100% rename from images/figure54.png rename to book/images/figure54.png diff --git a/images/figure55.png b/book/images/figure55.png similarity index 100% rename from images/figure55.png rename to book/images/figure55.png diff --git a/images/figure56.png b/book/images/figure56.png similarity index 100% rename from images/figure56.png rename to book/images/figure56.png diff --git a/images/figure57.png b/book/images/figure57.png similarity index 100% rename from images/figure57.png rename to book/images/figure57.png diff --git a/images/figure58.png b/book/images/figure58.png similarity index 100% rename from images/figure58.png rename to book/images/figure58.png diff --git a/images/figure59.png b/book/images/figure59.png similarity index 100% rename from images/figure59.png rename to book/images/figure59.png diff --git a/images/figure60.png b/book/images/figure60.png similarity index 100% rename from 
images/figure60.png rename to book/images/figure60.png diff --git a/images/figure61.png b/book/images/figure61.png similarity index 100% rename from images/figure61.png rename to book/images/figure61.png diff --git a/images/figure62.png b/book/images/figure62.png similarity index 100% rename from images/figure62.png rename to book/images/figure62.png diff --git a/images/figure63.png b/book/images/figure63.png similarity index 100% rename from images/figure63.png rename to book/images/figure63.png diff --git a/enjoy/.eslintrc.json b/enjoy/.eslintrc.json new file mode 100644 index 00000000..b1321444 --- /dev/null +++ b/enjoy/.eslintrc.json @@ -0,0 +1,24 @@ +{ + "env": { + "browser": true, + "es6": true, + "node": true + }, + "extends": [ + "eslint:recommended", + "plugin:@typescript-eslint/eslint-recommended", + "plugin:@typescript-eslint/recommended", + "plugin:import/recommended", + "plugin:import/electron", + "plugin:import/typescript" + ], + "parser": "@typescript-eslint/parser", + "settings": { + "import/resolver": { + "typescript": {} + } + }, + "rules": { + "@typescript-eslint/no-explicit-any": "off" + } +} diff --git a/enjoy/.gitattributes b/enjoy/.gitattributes new file mode 100644 index 00000000..647fbc09 --- /dev/null +++ b/enjoy/.gitattributes @@ -0,0 +1,2 @@ +# ffmpeg.exe filter=lfs diff=lfs merge=lfs -text +# ffprobe.exe filter=lfs diff=lfs merge=lfs -text diff --git a/enjoy/.github/dependabot.yml b/enjoy/.github/dependabot.yml new file mode 100644 index 00000000..3a3cce57 --- /dev/null +++ b/enjoy/.github/dependabot.yml @@ -0,0 +1,11 @@ +# To get started with Dependabot version updates, you'll need to specify which +# package ecosystems to update and where the package manifests are located. +# Please see the documentation for all configuration options: +# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates + +version: 2 +updates: + - package-ecosystem: "npm" # See documentation for possible values + directory: "/" # Location of package manifests + schedule: + interval: "weekly" diff --git a/enjoy/.github/workflows/build.yml b/enjoy/.github/workflows/build.yml new file mode 100644 index 00000000..1264920d --- /dev/null +++ b/enjoy/.github/workflows/build.yml @@ -0,0 +1,18 @@ +name: Build +on: workflow_dispatch + +jobs: + build: + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [macos-11, macos-13, macos-latest, windows-latest, ubuntu-latest] + steps: + - uses: actions/checkout@v3 + - uses: actions/setup-node@v3 + with: + node-version: 20 + - name: install dependencies + run: yarn install + - name: build + run: yarn package diff --git a/enjoy/.github/workflows/release.yml b/enjoy/.github/workflows/release.yml new file mode 100644 index 00000000..3416c7c5 --- /dev/null +++ b/enjoy/.github/workflows/release.yml @@ -0,0 +1,24 @@ +name: Release +on: workflow_dispatch + +jobs: + publish: + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [macos-latest, windows-latest, ubuntu-latest] + steps: + - uses: actions/checkout@v3 + - uses: actions/setup-node@v3 + with: + node-version: 20 + - name: install dependencies + run: yarn install + - name: publish + env: + GITHUB_TOKEN: ${{ secrets.PUBLISH_TOKEN }} + run: yarn run publish + - if: matrix.os == 'macos-latest' + env: + GITHUB_TOKEN: ${{ secrets.PUBLISH_TOKEN }} + run: yarn run publish --arch=arm64 diff --git a/enjoy/.gitignore b/enjoy/.gitignore new file mode 100644 index 00000000..3cb2ddef --- /dev/null +++ b/enjoy/.gitignore @@ -0,0 +1,102 @@ +# Logs +logs +*.log 
+npm-debug.log* +yarn-debug.log* +yarn-error.log* +lerna-debug.log* + +# Diagnostic reports (https://nodejs.org/api/report.html) +report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json + +# Runtime data +pids +*.pid +*.seed +*.pid.lock +.DS_Store + +# Directory for instrumented libs generated by jscoverage/JSCover +lib-cov + +# Coverage directory used by tools like istanbul +coverage +*.lcov + +# nyc test coverage +.nyc_output + +# node-waf configuration +.lock-wscript + +# Compiled binary addons (https://nodejs.org/api/addons.html) +build/Release + +# Dependency directories +node_modules/ +jspm_packages/ + +# TypeScript v1 declaration files +typings/ + +# TypeScript cache +*.tsbuildinfo + +# Optional npm cache directory +.npm + +# Optional eslint cache +.eslintcache + +# Optional REPL history +.node_repl_history + +# Output of 'npm pack' +*.tgz + +# Yarn Integrity file +.yarn-integrity + +# dotenv environment variables file +.env +.env.test + +# parcel-bundler cache (https://parceljs.org/) +.cache + +# next.js build output +.next + +# nuxt.js build output +.nuxt + +# vuepress build output +.vuepress/dist + +# Serverless directories +.serverless/ + +# FuseBox cache +.fusebox/ + +# DynamoDB Local files +.dynamodb/ + +# Webpack +.webpack/ + +# Vite +.vite/ + +# Electron-Forge +out/ + +# bun & npm lock file +bun.lockb +package-lock.json + +# vscode +.vscode/ + +# ffmpeg +lib/ffmpeg/* diff --git a/enjoy/.gitmodules b/enjoy/.gitmodules new file mode 100644 index 00000000..e69de29b diff --git a/enjoy/README.md b/enjoy/README.md new file mode 100644 index 00000000..ff85c582 --- /dev/null +++ b/enjoy/README.md @@ -0,0 +1,6 @@ +# Enjoy + +```bash +yarn install +yarn start +``` diff --git a/enjoy/assets/ffmpeg-logo.svg b/enjoy/assets/ffmpeg-logo.svg new file mode 100644 index 00000000..5b0573b7 --- /dev/null +++ b/enjoy/assets/ffmpeg-logo.svg @@ -0,0 +1,34 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/enjoy/assets/github-mark.png b/enjoy/assets/github-mark.png new file mode 100644 index 00000000..6cb3b705 Binary files /dev/null and b/enjoy/assets/github-mark.png differ diff --git a/enjoy/assets/icon.icns b/enjoy/assets/icon.icns new file mode 100644 index 00000000..9104d98d Binary files /dev/null and b/enjoy/assets/icon.icns differ diff --git a/enjoy/assets/icon.ico b/enjoy/assets/icon.ico new file mode 100644 index 00000000..febbdb8d Binary files /dev/null and b/enjoy/assets/icon.ico differ diff --git a/enjoy/assets/icon.png b/enjoy/assets/icon.png new file mode 100644 index 00000000..049c7ce2 Binary files /dev/null and b/enjoy/assets/icon.png differ diff --git a/enjoy/assets/logo-light.svg b/enjoy/assets/logo-light.svg new file mode 100644 index 00000000..dfa77e0b --- /dev/null +++ b/enjoy/assets/logo-light.svg @@ -0,0 +1,33 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/enjoy/assets/logo.png b/enjoy/assets/logo.png new file mode 100644 index 00000000..f024d53a Binary files /dev/null and b/enjoy/assets/logo.png differ diff --git a/enjoy/assets/mixin-logo.png b/enjoy/assets/mixin-logo.png new file mode 100644 index 00000000..28ef2ab6 Binary files /dev/null and b/enjoy/assets/mixin-logo.png differ diff --git a/enjoy/assets/sound-waves.png b/enjoy/assets/sound-waves.png new file mode 100644 index 00000000..3f22ec4b Binary files /dev/null and b/enjoy/assets/sound-waves.png differ diff --git a/enjoy/assets/video.png b/enjoy/assets/video.png new file mode 100644 index 00000000..fbed1e0f Binary files /dev/null and b/enjoy/assets/video.png differ 
diff --git a/enjoy/components.json b/enjoy/components.json new file mode 100644 index 00000000..84987e9b --- /dev/null +++ b/enjoy/components.json @@ -0,0 +1,16 @@ +{ + "$schema": "https://ui.shadcn.com/schema.json", + "style": "new-york", + "rsc": false, + "tsx": true, + "tailwind": { + "config": "tailwind.config.js", + "css": "src/index.css", + "baseColor": "zinc", + "cssVariables": true + }, + "aliases": { + "components": "src/renderer/components", + "utils": "src/renderer/lib/utils" + } +} diff --git a/enjoy/forge.config.ts b/enjoy/forge.config.ts new file mode 100644 index 00000000..a4b31256 --- /dev/null +++ b/enjoy/forge.config.ts @@ -0,0 +1,165 @@ +import type { ForgeConfig } from "@electron-forge/shared-types"; +import { MakerSquirrel } from "@electron-forge/maker-squirrel"; +import { MakerZIP } from "@electron-forge/maker-zip"; +import { MakerDeb } from "@electron-forge/maker-deb"; +import { VitePlugin } from "@electron-forge/plugin-vite"; +import { dirname } from "node:path"; +import { Walker, DepType, type Module } from "flora-colossus"; + +// any packages that you must mark as "external" in vite +const NATIVE_MODULES_TO_PACKAGE = [ + "sequelize", + "umzug", + "sqlite3", + "fluent-ffmpeg", + "electron-squirrel-startup", +]; +const INCLUDE_NESTED_DEPS = true as const; +let nativeModuleDependenciesToPackage: Set; + +const config: ForgeConfig = { + packagerConfig: { + icon: "./assets/icon", + executableName: "enjoy", + protocols: [ + { + name: "Enjoy", + schemes: ["enjoy"], + }, + ], + }, + rebuildConfig: {}, + makers: [ + new MakerSquirrel({ + name: "Enjoy", + setupIcon: "./assets/icon.ico", + }), + new MakerZIP({}, ["darwin", "win32"]), + new MakerDeb({ + options: { + name: "enjoy", + productName: "Enjoy", + icon: "./assets/icon.png", + mimeType: ["x-scheme-handler/enjoy"], + }, + }), + ], + publishers: [ + { + name: "@electron-forge/publisher-github", + config: { + repository: { + owner: "an-lee", + name: "enjoy", + }, + draft: true, + }, + }, + ], + plugins: [ + new VitePlugin({ + // `build` can specify multiple entry builds, which can be Main process, Preload scripts, Worker process, etc. + // If you are familiar with Vite configuration, it will look really familiar. + build: [ + { + // `entry` is just an alias for `build.lib.entry` in the corresponding file of `config`. + entry: "src/main.ts", + config: "vite.main.config.mts", + }, + { + entry: "src/preload.ts", + config: "vite.preload.config.mts", + }, + ], + renderer: [ + { + name: "main_window", + config: "vite.renderer.config.mts", + }, + ], + }), + ], + hooks: { + prePackage: async (forgeConfig) => { + if (forgeConfig.packagerConfig.ignore !== undefined) { + throw new Error( + "forgeConfig.packagerConfig.ignore is already defined. Please remove it from your forge config and instead use the prePackage hook to dynamically set it." 
+ ); + } + + const getExternalNestedDependencies = async ( + nodeModuleNames: string[], + includeNestedDeps = true + ) => { + const foundModules = new Set(nodeModuleNames); + if (includeNestedDeps) { + for (const external of nodeModuleNames) { + type MyPublicClass = { + [P in keyof T]: T[P]; + }; + type MyPublicWalker = MyPublicClass & { + modules: Module[]; + walkDependenciesForModule: ( + moduleRoot: string, + depType: DepType + ) => Promise; + }; + const moduleRoot = dirname( + require.resolve(`${external}/package.json`, { + paths: [__dirname], + }) + ); + const walker = new Walker(moduleRoot) as unknown as MyPublicWalker; + walker.modules = []; + await walker.walkDependenciesForModule(moduleRoot, DepType.PROD); + walker.modules + .filter( + (dep) => (dep.nativeModuleType as number) === DepType.PROD + ) + .map((dep) => dep.name) + .forEach((name) => foundModules.add(name)); + } + } + return foundModules; + }; + + nativeModuleDependenciesToPackage = await getExternalNestedDependencies( + NATIVE_MODULES_TO_PACKAGE, + INCLUDE_NESTED_DEPS + ); + + forgeConfig.packagerConfig.ignore = (path) => { + // .vite bundled build files + if (path.startsWith("/.vite")) { + return false; + } + // main package.json file + if (path === "/package.json") { + return false; + } + if (!path) { + return false; + } + // need to first NOT ignore the root node_modules folder + if (path === "/node_modules") { + return false; + } + // if path is in nativeModuleDependenciesToPackage, return false (to package it) + const foundModules: Set = nativeModuleDependenciesToPackage; + for (const module of foundModules) { + if ( + path.startsWith(`/node_modules/${module}`) || + path.startsWith(`/node_modules/${module.split("/")[0]}`) + ) { + return false; + } + } + + // for everything else, ignore it + return true; + }; + }, + }, +}; + +export default config; diff --git a/enjoy/index.html b/enjoy/index.html new file mode 100644 index 00000000..2d169b67 --- /dev/null +++ b/enjoy/index.html @@ -0,0 +1,12 @@ + + + + + + Enjoy + + + +
+ + diff --git a/enjoy/lib/whisper.cpp/arm64/darwin/bench b/enjoy/lib/whisper.cpp/arm64/darwin/bench new file mode 100755 index 00000000..dc6e5e22 Binary files /dev/null and b/enjoy/lib/whisper.cpp/arm64/darwin/bench differ diff --git a/enjoy/lib/whisper.cpp/arm64/darwin/ggml-metal.metal b/enjoy/lib/whisper.cpp/arm64/darwin/ggml-metal.metal new file mode 100644 index 00000000..5d1357cd --- /dev/null +++ b/enjoy/lib/whisper.cpp/arm64/darwin/ggml-metal.metal @@ -0,0 +1,2929 @@ +#include + +using namespace metal; + +#define MAX(x, y) ((x) > (y) ? (x) : (y)) + +#define QK4_0 32 +#define QR4_0 2 +typedef struct { + half d; // delta + uint8_t qs[QK4_0 / 2]; // nibbles / quants +} block_q4_0; + +#define QK4_1 32 +typedef struct { + half d; // delta + half m; // min + uint8_t qs[QK4_1 / 2]; // nibbles / quants +} block_q4_1; + +#define QK5_0 32 +typedef struct { + half d; // delta + uint8_t qh[4]; // 5-th bit of quants + uint8_t qs[QK5_0 / 2]; // nibbles / quants +} block_q5_0; + +#define QK5_1 32 +typedef struct { + half d; // delta + half m; // min + uint8_t qh[4]; // 5-th bit of quants + uint8_t qs[QK5_1 / 2]; // nibbles / quants +} block_q5_1; + +#define QK8_0 32 +typedef struct { + half d; // delta + int8_t qs[QK8_0]; // quants +} block_q8_0; + +// general-purpose kernel for addition of two tensors +// pros: works for non-contiguous tensors, supports broadcast across dims 1, 2 and 3 +// cons: not very efficient +kernel void kernel_add( + device const char * src0, + device const char * src1, + device char * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant int64_t & nb00, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & nb03, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & nb13, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant int64_t & nb0, + constant int64_t & nb1, + constant int64_t & nb2, + constant int64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig.z; + const int64_t i02 = tgpig.y; + const int64_t i01 = tgpig.x; + + const int64_t i13 = i03 % ne13; + const int64_t i12 = i02 % ne12; + const int64_t i11 = i01 % ne11; + + device const char * src0_ptr = src0 + i03*nb03 + i02*nb02 + i01*nb01 + tpitg.x*nb00; + device const char * src1_ptr = src1 + i13*nb13 + i12*nb12 + i11*nb11 + tpitg.x*nb10; + device char * dst_ptr = dst + i03*nb3 + i02*nb2 + i01*nb1 + tpitg.x*nb0; + + for (int i0 = tpitg.x; i0 < ne0; i0 += ntg.x) { + ((device float *)dst_ptr)[0] = ((device float *)src0_ptr)[0] + ((device float *)src1_ptr)[0]; + + src0_ptr += ntg.x*nb00; + src1_ptr += ntg.x*nb10; + dst_ptr += ntg.x*nb0; + } +} + +// assumption: src1 is a row +// broadcast src1 into src0 +kernel void kernel_add_row( + device const float4 * src0, + device const float4 * src1, + device float4 * dst, + constant int64_t & nb [[buffer(27)]], + uint tpig[[thread_position_in_grid]]) { + dst[tpig] = src0[tpig] + src1[tpig % nb]; +} + +kernel void kernel_mul( + device const float4 * src0, + device const float4 * src1, + device float4 * dst, + uint tpig[[thread_position_in_grid]]) { + dst[tpig] = src0[tpig] * src1[tpig]; +} + +// assumption: src1 is a row +// broadcast src1 into src0 
+kernel void kernel_mul_row( + device const float4 * src0, + device const float4 * src1, + device float4 * dst, + constant int64_t & nb, + uint tpig[[thread_position_in_grid]]) { + dst[tpig] = src0[tpig] * src1[tpig % nb]; +} + +kernel void kernel_scale( + device const float * src0, + device float * dst, + constant float & scale, + uint tpig[[thread_position_in_grid]]) { + dst[tpig] = src0[tpig] * scale; +} + +kernel void kernel_scale_4( + device const float4 * src0, + device float4 * dst, + constant float & scale, + uint tpig[[thread_position_in_grid]]) { + dst[tpig] = src0[tpig] * scale; +} + +kernel void kernel_silu( + device const float4 * src0, + device float4 * dst, + uint tpig[[thread_position_in_grid]]) { + device const float4 & x = src0[tpig]; + dst[tpig] = x / (1.0f + exp(-x)); +} + +kernel void kernel_relu( + device const float * src0, + device float * dst, + uint tpig[[thread_position_in_grid]]) { + dst[tpig] = max(0.0f, src0[tpig]); +} + +kernel void kernel_sqr( + device const float * src0, + device float * dst, + uint tpig[[thread_position_in_grid]]) { + dst[tpig] = src0[tpig] * src0[tpig]; +} + +constant float GELU_COEF_A = 0.044715f; +constant float SQRT_2_OVER_PI = 0.79788456080286535587989211986876f; + +kernel void kernel_gelu( + device const float4 * src0, + device float4 * dst, + uint tpig[[thread_position_in_grid]]) { + device const float4 & x = src0[tpig]; + + // BEWARE !!! + // Simply using "tanh" instead of "precise::tanh" will sometimes results in NaNs! + // This was observed with Falcon 7B and 40B models + // + dst[tpig] = 0.5f*x*(1.0f + precise::tanh(SQRT_2_OVER_PI*x*(1.0f + GELU_COEF_A*x*x))); +} + +kernel void kernel_soft_max( + device const float * src0, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + threadgroup float * buf [[threadgroup(0)]], + uint tgpig[[threadgroup_position_in_grid]], + uint tpitg[[thread_position_in_threadgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint ntg[[threads_per_threadgroup]]) { + const int64_t i03 = (tgpig) / (ne02*ne01); + const int64_t i02 = (tgpig - i03*ne02*ne01) / ne01; + const int64_t i01 = (tgpig - i03*ne02*ne01 - i02*ne01); + + device const float * psrc0 = src0 + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + device float * pdst = dst + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + + // parallel max + float lmax = tpitg < ne00 ? psrc0[tpitg] : -INFINITY; + + for (int i00 = tpitg + ntg; i00 < ne00; i00 += ntg) { + lmax = MAX(lmax, psrc0[i00]); + } + + float max = simd_max(lmax); + if (tiisg == 0) { + buf[sgitg] = max; + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + // broadcast, simd group number is ntg / 32 + for (uint i = ntg / 32 / 2; i > 0; i /= 2) { + if (tpitg < i) { + buf[tpitg] = MAX(buf[tpitg], buf[tpitg + i]); + } + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + max = buf[0]; + + // parallel sum + float lsum = 0.0f; + for (int i00 = tpitg; i00 < ne00; i00 += ntg) { + const float exp_psrc0 = exp(psrc0[i00] - max); + lsum += exp_psrc0; + // Remember the result of exp here. exp is expensive, so we really do not + // wish to compute it twice. 
+ pdst[i00] = exp_psrc0; + } + + float sum = simd_sum(lsum); + if (tiisg == 0) { + buf[sgitg] = sum; + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + // broadcast, simd group number is ntg / 32 + for (uint i = ntg / 32 / 2; i > 0; i /= 2) { + if (tpitg < i) { + buf[tpitg] += buf[tpitg + i]; + } + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + sum = buf[0]; + + for (int i00 = tpitg; i00 < ne00; i00 += ntg) { + pdst[i00] /= sum; + } +} + +kernel void kernel_soft_max_4( + device const float * src0, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + threadgroup float * buf [[threadgroup(0)]], + uint tgpig[[threadgroup_position_in_grid]], + uint tpitg[[thread_position_in_threadgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint ntg[[threads_per_threadgroup]]) { + const int64_t i03 = (tgpig) / (ne02*ne01); + const int64_t i02 = (tgpig - i03*ne02*ne01) / ne01; + const int64_t i01 = (tgpig - i03*ne02*ne01 - i02*ne01); + + device const float4 * psrc4 = (device const float4 *)(src0 + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00); + device float4 * pdst4 = (device float4 *)(dst + i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00); + + // parallel max + float4 lmax4 = tpitg < ne00/4 ? psrc4[tpitg] : -INFINITY; + + for (int i00 = tpitg + ntg; i00 < ne00/4; i00 += ntg) { + lmax4 = fmax(lmax4, psrc4[i00]); + } + + const float lmax = MAX(MAX(lmax4[0], lmax4[1]), MAX(lmax4[2], lmax4[3])); + float max = simd_max(lmax); + if (tiisg == 0) { + buf[sgitg] = max; + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + // broadcast, simd group number is ntg / 32 + for (uint i = ntg / 32 / 2; i > 0; i /= 2) { + if (tpitg < i) { + buf[tpitg] = MAX(buf[tpitg], buf[tpitg + i]); + } + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + max = buf[0]; + + // parallel sum + float4 lsum4 = 0.0f; + for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) { + const float4 exp_psrc4 = exp(psrc4[i00] - max); + lsum4 += exp_psrc4; + pdst4[i00] = exp_psrc4; + } + + const float lsum = lsum4[0] + lsum4[1] + lsum4[2] + lsum4[3]; + float sum = simd_sum(lsum); + if (tiisg == 0) { + buf[sgitg] = sum; + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + // broadcast, simd group number is ntg / 32 + for (uint i = ntg / 32 / 2; i > 0; i /= 2) { + if (tpitg < i) { + buf[tpitg] += buf[tpitg + i]; + } + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + sum = buf[0]; + + for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) { + pdst4[i00] /= sum; + } +} + +kernel void kernel_diag_mask_inf( + device const float * src0, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int & n_past, + uint3 tpig[[thread_position_in_grid]]) { + const int64_t i02 = tpig[2]; + const int64_t i01 = tpig[1]; + const int64_t i00 = tpig[0]; + + if (i00 > n_past + i01) { + dst[i02*ne01*ne00 + i01*ne00 + i00] = -INFINITY; + } else { + dst[i02*ne01*ne00 + i01*ne00 + i00] = src0[i02*ne01*ne00 + i01*ne00 + i00]; + } +} + +kernel void kernel_diag_mask_inf_8( + device const float4 * src0, + device float4 * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int & n_past, + uint3 tpig[[thread_position_in_grid]]) { + + const int64_t i = 2*tpig[0]; + + dst[i+0] = src0[i+0]; + dst[i+1] = src0[i+1]; + int64_t i4 = 4*i; + const int64_t i02 = i4/(ne00*ne01); i4 -= i02*ne00*ne01; + const int64_t i01 = i4/(ne00); i4 -= i01*ne00; + const int64_t i00 = i4; + for (int k = 3; k >= 0; --k) { 
+ if (i00 + 4 + k <= n_past + i01) { + break; + } + dst[i+1][k] = -INFINITY; + if (i00 + k > n_past + i01) { + dst[i][k] = -INFINITY; + } + } +} + +kernel void kernel_norm( + device const void * src0, + device float * dst, + constant int64_t & ne00, + constant uint64_t & nb01, + constant float & eps, + threadgroup float * sum [[threadgroup(0)]], + uint tgpig[[threadgroup_position_in_grid]], + uint tpitg[[thread_position_in_threadgroup]], + uint ntg[[threads_per_threadgroup]]) { + device const float * x = (device const float *) ((device const char *) src0 + tgpig*nb01); + // MEAN + // parallel sum + sum[tpitg] = 0.0f; + for (int i00 = tpitg; i00 < ne00; i00 += ntg) { + sum[tpitg] += x[i00]; + } + // reduce + threadgroup_barrier(mem_flags::mem_threadgroup); + for (uint i = ntg/2; i > 0; i /= 2) { + if (tpitg < i) { + sum[tpitg] += sum[tpitg + i]; + } + threadgroup_barrier(mem_flags::mem_threadgroup); + } + const float mean = sum[0] / ne00; + + // recenter and VARIANCE + threadgroup_barrier(mem_flags::mem_threadgroup); + device float * y = dst + tgpig*ne00; + sum[tpitg] = 0.0f; + for (int i00 = tpitg; i00 < ne00; i00 += ntg) { + y[i00] = x[i00] - mean; + sum[tpitg] += y[i00] * y[i00]; + } + + // reduce + threadgroup_barrier(mem_flags::mem_threadgroup); + for (uint i = ntg/2; i > 0; i /= 2) { + if (tpitg < i) { + sum[tpitg] += sum[tpitg + i]; + } + threadgroup_barrier(mem_flags::mem_threadgroup); + } + const float variance = sum[0] / ne00; + + const float scale = 1.0f/sqrt(variance + eps); + for (int i00 = tpitg; i00 < ne00; i00 += ntg) { + y[i00] = y[i00] * scale; + } +} + +kernel void kernel_rms_norm( + device const void * src0, + device float * dst, + constant int64_t & ne00, + constant uint64_t & nb01, + constant float & eps, + threadgroup float * sum [[threadgroup(0)]], + uint tgpig[[threadgroup_position_in_grid]], + uint tpitg[[thread_position_in_threadgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]], + uint tiisg[[thread_index_in_simdgroup]], + uint ntg[[threads_per_threadgroup]]) { + device const float4 * x = (device const float4 *) ((device const char *) src0 + tgpig*nb01); + device const float * x_scalar = (device const float *) x; + + float4 sumf = 0; + float all_sum = 0; + + // parallel sum + for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) { + sumf += x[i00] * x[i00]; + } + all_sum = sumf[0] + sumf[1] + sumf[2] + sumf[3]; + all_sum = simd_sum(all_sum); + if (tiisg == 0) { + sum[sgitg] = all_sum; + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + // broadcast, simd group number is ntg / 32 + for (uint i = ntg / 32 / 2; i > 0; i /= 2) { + if (tpitg < i) { + sum[tpitg] += sum[tpitg + i]; + } + } + if (tpitg == 0) { + for (int i = 4 * (ne00 / 4); i < ne00; i++) { + sum[0] += x_scalar[i]; + } + sum[0] /= ne00; + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + const float mean = sum[0]; + const float scale = 1.0f/sqrt(mean + eps); + + device float4 * y = (device float4 *) (dst + tgpig*ne00); + device float * y_scalar = (device float *) y; + for (int i00 = tpitg; i00 < ne00/4; i00 += ntg) { + y[i00] = x[i00] * scale; + } + if (tpitg == 0) { + for (int i00 = 4 * (ne00 / 4); i00 < ne00; i00++) { + y_scalar[i00] = x_scalar[i00] * scale; + } + } +} + +// function for calculate inner product between half a q4_0 block and 16 floats (yl), sumy is SUM(yl[i]) +// il indicates where the q4 quants begin (0 or QK4_0/4) +// we assume that the yl's have been multiplied with the appropriate scale factor +// that corresponds to the missing bit shifts (1, 1/16, 1/256, 1/4096) 
+inline float block_q_n_dot_y(device const block_q4_0 * qb_curr, float sumy, thread float * yl, int il) { + float d = qb_curr->d; + + float2 acc = 0.f; + + device const uint16_t * qs = ((device const uint16_t *)qb_curr + 1 + il/2); + + for (int i = 0; i < 8; i+=2) { + acc[0] += yl[i + 0] * (qs[i / 2] & 0x000F) + + yl[i + 1] * (qs[i / 2] & 0x0F00); + acc[1] += yl[i + 8] * (qs[i / 2] & 0x00F0) + + yl[i + 9] * (qs[i / 2] & 0xF000); + } + return d * (sumy * -8.f + acc[0] + acc[1]); +} + +// function for calculate inner product between half a q4_1 block and 16 floats (yl), sumy is SUM(yl[i]) +// il indicates where the q4 quants begin (0 or QK4_0/4) +// we assume that the yl's have been multiplied with the appropriate scale factor +// that corresponds to the missing bit shifts (1, 1/16, 1/256, 1/4096) +inline float block_q_n_dot_y(device const block_q4_1 * qb_curr, float sumy, thread float * yl, int il) { + float d = qb_curr->d; + float m = qb_curr->m; + + float2 acc = 0.f; + + device const uint16_t * qs = ((device const uint16_t *)qb_curr + 2 + il/2); + + for (int i = 0; i < 8; i+=2) { + acc[0] += yl[i + 0] * (qs[i / 2] & 0x000F) + + yl[i + 1] * (qs[i / 2] & 0x0F00); + acc[1] += yl[i + 8] * (qs[i / 2] & 0x00F0) + + yl[i + 9] * (qs[i / 2] & 0xF000); + } + return d * (acc[0] + acc[1]) + sumy * m; +} + +// function for calculate inner product between half a q5_0 block and 16 floats (yl), sumy is SUM(yl[i]) +// il indicates where the q5 quants begin (0 or QK5_0/4) +// we assume that the yl's have been multiplied with the appropriate scale factor +// that corresponds to the missing bit shifts (1, 1/16, 1/256, 1/4096) +inline float block_q_n_dot_y(device const block_q5_0 * qb_curr, float sumy, thread float * yl, int il) { + float d = qb_curr->d; + + float2 acc = 0.f; + + device const uint16_t * qs = ((device const uint16_t *)qb_curr + 3 + il/2); + const uint32_t qh = *((device const uint32_t *)qb_curr->qh); + + for (int i = 0; i < 8; i+=2) { + acc[0] += yl[i + 0] * ((qs[i / 2] & 0x000F) | ((qh >> (i+0+il ) << 4 ) & 0x00010)) + + yl[i + 1] * ((qs[i / 2] & 0x0F00) | ((qh >> (i+1+il ) << 12) & 0x01000)); + acc[1] += yl[i + 8] * ((qs[i / 2] & 0x00F0) | ((qh >> (i+0+il+QK5_0/2) << 8 ) & 0x00100)) + + yl[i + 9] * ((qs[i / 2] & 0xF000) | ((qh >> (i+1+il+QK5_0/2) << 16) & 0x10000)); + } + return d * (sumy * -16.f + acc[0] + acc[1]); +} + +// function for calculate inner product between half a q5_1 block and 16 floats (yl), sumy is SUM(yl[i]) +// il indicates where the q5 quants begin (0 or QK5_1/4) +// we assume that the yl's have been multiplied with the appropriate scale factor +// that corresponds to the missing bit shifts (1, 1/16, 1/256, 1/4096) +inline float block_q_n_dot_y(device const block_q5_1 * qb_curr, float sumy, thread float * yl, int il) { + float d = qb_curr->d; + float m = qb_curr->m; + + float2 acc = 0.f; + + device const uint16_t * qs = ((device const uint16_t *)qb_curr + 4 + il/2); + const uint32_t qh = *((device const uint32_t *)qb_curr->qh); + + for (int i = 0; i < 8; i+=2) { + acc[0] += yl[i + 0] * ((qs[i / 2] & 0x000F) | ((qh >> (i+0+il ) << 4 ) & 0x00010)) + + yl[i + 1] * ((qs[i / 2] & 0x0F00) | ((qh >> (i+1+il ) << 12) & 0x01000)); + acc[1] += yl[i + 8] * ((qs[i / 2] & 0x00F0) | ((qh >> (i+0+il+QK5_0/2) << 8 ) & 0x00100)) + + yl[i + 9] * ((qs[i / 2] & 0xF000) | ((qh >> (i+1+il+QK5_0/2) << 16) & 0x10000)); + } + return d * (acc[0] + acc[1]) + sumy * m; +} + +// putting them in the kernel cause a significant performance penalty +#define N_DST 4 // each SIMD group works on 4 rows 
+#define N_SIMDGROUP 2 // number of SIMD groups in a thread group
+#define N_SIMDWIDTH 32 // assuming SIMD group size is 32
+// Note: This is a template, but strictly speaking it only applies to
+//       quantizations where the block size is 32. It also does not
+//       guard against the number of rows not being divisible by
+//       N_DST, so this is another explicit assumption of the implementation.
+template<typename block_q_type, int nr, int nsg, int nw>
+void mul_vec_q_n_f32(device const void * src0, device const float * src1, device float * dst,
+                     int64_t ne00, int64_t ne01, int64_t ne02, int64_t ne10, int64_t ne12, int64_t ne0, int64_t ne1, uint gqa,
+                     uint3 tgpig, uint tiisg, uint sgitg) {
+    const int nb = ne00/QK4_0;
+
+    const int r0 = tgpig.x;
+    const int r1 = tgpig.y;
+    const int im = tgpig.z;
+
+    const int first_row = (r0 * nsg + sgitg) * nr;
+
+    const uint offset0 = first_row * nb + im/gqa*(nb*ne0);
+
+    device const block_q_type * x = (device const block_q_type *) src0 + offset0;
+    device const float * y = (device const float *) src1 + r1*ne10 + im*ne00*ne1;
+
+    float yl[16]; // src1 vector cache
+    float sumf[nr] = {0.f};
+
+    const int ix = (tiisg/2);
+    const int il = (tiisg%2)*8;
+
+    device const float * yb = y + ix * QK4_0 + il;
+
+    // each thread in a SIMD group deals with half a block.
+    for (int ib = ix; ib < nb; ib += nw/2) {
+        float sumy = 0;
+        for (int i = 0; i < 8; i += 2) {
+            sumy += yb[i] + yb[i+1];
+            yl[i+0] = yb[i+ 0];
+            yl[i+1] = yb[i+ 1]/256.f;
+
+            sumy += yb[i+16] + yb[i+17];
+            yl[i+8] = yb[i+16]/16.f;
+            yl[i+9] = yb[i+17]/4096.f;
+        }
+
+        for (int row = 0; row < nr; row++) {
+            sumf[row] += block_q_n_dot_y(x+ib+row*nb, sumy, yl, il);
+        }
+
+        yb += QK4_0 * 16;
+    }
+
+    for (int row = 0; row < nr; ++row) {
+        const float tot = simd_sum(sumf[row]);
+        if (tiisg == 0 && first_row + row < ne01) {
+            dst[im*ne0*ne1 + r1*ne0 + first_row + row] = tot;
+        }
+    }
+}
+
+kernel void kernel_mul_mv_q4_0_f32(
+        device const void * src0,
+        device const float * src1,
+        device float * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01[[buffer(4)]],
+        constant int64_t & ne02[[buffer(5)]],
+        constant int64_t & ne10[[buffer(9)]],
+        constant int64_t & ne12[[buffer(11)]],
+        constant int64_t & ne0[[buffer(15)]],
+        constant int64_t & ne1[[buffer(16)]],
+        constant uint & gqa[[buffer(17)]],
+        uint3 tgpig[[threadgroup_position_in_grid]],
+        uint tiisg[[thread_index_in_simdgroup]],
+        uint sgitg[[simdgroup_index_in_threadgroup]]) {
+    mul_vec_q_n_f32<block_q4_0, N_DST, N_SIMDGROUP, N_SIMDWIDTH>(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,gqa,tgpig,tiisg,sgitg);
+}
+
+kernel void kernel_mul_mv_q4_1_f32(
+        device const void * src0,
+        device const float * src1,
+        device float * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01[[buffer(4)]],
+        constant int64_t & ne02[[buffer(5)]],
+        constant int64_t & ne10[[buffer(9)]],
+        constant int64_t & ne12[[buffer(11)]],
+        constant int64_t & ne0[[buffer(15)]],
+        constant int64_t & ne1[[buffer(16)]],
+        constant uint & gqa[[buffer(17)]],
+        uint3 tgpig[[threadgroup_position_in_grid]],
+        uint tiisg[[thread_index_in_simdgroup]],
+        uint sgitg[[simdgroup_index_in_threadgroup]]) {
+    mul_vec_q_n_f32<block_q4_1, N_DST, N_SIMDGROUP, N_SIMDWIDTH>(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,gqa,tgpig,tiisg,sgitg);
+}
+
+kernel void kernel_mul_mv_q5_0_f32(
+        device const void * src0,
+        device const float * src1,
+        device float * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01[[buffer(4)]],
+        constant int64_t & ne02[[buffer(5)]],
+        constant int64_t & ne10[[buffer(9)]],
+        constant int64_t & ne12[[buffer(11)]],
+        constant int64_t & ne0[[buffer(15)]],
+        constant int64_t & ne1[[buffer(16)]],
+        constant uint & gqa[[buffer(17)]],
+        uint3 tgpig[[threadgroup_position_in_grid]],
+        uint tiisg[[thread_index_in_simdgroup]],
+        uint sgitg[[simdgroup_index_in_threadgroup]]) {
+    mul_vec_q_n_f32<block_q5_0, N_DST, N_SIMDGROUP, N_SIMDWIDTH>(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,gqa,tgpig,tiisg,sgitg);
+}
+
+kernel void kernel_mul_mv_q5_1_f32(
+        device const void * src0,
+        device const float * src1,
+        device float * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01[[buffer(4)]],
+        constant int64_t & ne02[[buffer(5)]],
+        constant int64_t & ne10[[buffer(9)]],
+        constant int64_t & ne12[[buffer(11)]],
+        constant int64_t & ne0[[buffer(15)]],
+        constant int64_t & ne1[[buffer(16)]],
+        constant uint & gqa[[buffer(17)]],
+        uint3 tgpig[[threadgroup_position_in_grid]],
+        uint tiisg[[thread_index_in_simdgroup]],
+        uint sgitg[[simdgroup_index_in_threadgroup]]) {
+    mul_vec_q_n_f32<block_q5_1, N_DST, N_SIMDGROUP, N_SIMDWIDTH>(src0,src1,dst,ne00,ne01,ne02,ne10,ne12,ne0,ne1,gqa,tgpig,tiisg,sgitg);
+}
+
+
+#define NB_Q8_0 8
+
+kernel void kernel_mul_mv_q8_0_f32(
+        device const void * src0,
+        device const float * src1,
+        device float * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01[[buffer(4)]],
+        constant int64_t & ne02[[buffer(5)]],
+        constant int64_t & ne10[[buffer(9)]],
+        constant int64_t & ne12[[buffer(11)]],
+        constant int64_t & ne0[[buffer(15)]],
+        constant int64_t & ne1[[buffer(16)]],
+        constant uint & gqa[[buffer(17)]],
+        uint3 tgpig[[threadgroup_position_in_grid]],
+        uint tiisg[[thread_index_in_simdgroup]],
+        uint sgitg[[simdgroup_index_in_threadgroup]]) {
+    const int nr = N_DST;
+    const int nsg = N_SIMDGROUP;
+    const int nw = N_SIMDWIDTH;
+
+    const int nb = ne00/QK8_0;
+    const int r0 = tgpig.x;
+    const int r1 = tgpig.y;
+    const int im = tgpig.z;
+    const int first_row = (r0 * nsg + sgitg) * nr;
+    const uint offset0 = first_row * nb + im/gqa*(nb*ne0);
+    device const block_q8_0 * x = (device const block_q8_0 *) src0 + offset0;
+    device const float * y = (device const float *) src1 + r1*ne10 + im*ne00*ne1;
+
+    float yl[NB_Q8_0];
+    float sumf[nr]={0.f};
+
+    const int ix = tiisg/4;
+    const int il = tiisg%4;
+
+    device const float * yb = y + ix * QK8_0 + NB_Q8_0*il;
+
+    // each thread in a SIMD group deals with NB_Q8_0 quants at a time
+    for (int ib = ix; ib < nb; ib += nw/4) {
+        for (int i = 0; i < NB_Q8_0; ++i) {
+            yl[i] = yb[i];
+        }
+
+        for (int row = 0; row < nr; row++) {
+            device const int8_t * qs = x[ib+row*nb].qs + NB_Q8_0*il;
+            float sumq = 0.f;
+            for (int iq = 0; iq < NB_Q8_0; ++iq) {
+                sumq += qs[iq] * yl[iq];
+            }
+            sumf[row] += sumq*x[ib+row*nb].d;
+        }
+
+        yb += NB_Q8_0 * nw;
+    }
+
+    for (int row = 0; row < nr; ++row) {
+        const float tot = simd_sum(sumf[row]);
+        if (tiisg == 0 && first_row + row < ne01) {
+            dst[r1*ne0 + im*ne0*ne1 + first_row + row] = tot;
+        }
+    }
+}
+
+#define N_F32_F32 4
+
+kernel void kernel_mul_mv_f32_f32(
+        device const char * src0,
+        device const char * src1,
+        device float * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01,
+        constant int64_t & ne02,
+        constant uint64_t & nb00,
+        constant uint64_t & nb01,
+        constant uint64_t & nb02,
+        constant int64_t & ne10,
+        constant int64_t & ne11,
+        constant int64_t & ne12,
+        constant uint64_t & nb10,
+        constant uint64_t & nb11,
+        constant uint64_t & nb12,
+        constant int64_t & ne0,
+        constant int64_t & ne1,
+        uint3 tgpig[[threadgroup_position_in_grid]],
+        uint tiisg[[thread_index_in_simdgroup]]) {
+
+    const int64_t r0 = tgpig.x;
+    const int64_t rb = tgpig.y*N_F32_F32;
+    const int64_t im = tgpig.z;
+
+    device const float * x = (device const float *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02);
+
+    if
(ne00 < 128) { + for (int row = 0; row < N_F32_F32; ++row) { + int r1 = rb + row; + if (r1 >= ne11) { + break; + } + + device const float * y = (device const float *) (src1 + r1*nb11 + im*nb12); + + float sumf = 0; + for (int i = tiisg; i < ne00; i += 32) { + sumf += (float) x[i] * (float) y[i]; + } + + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } + } else { + device const float4 * x4 = (device const float4 *)x; + for (int row = 0; row < N_F32_F32; ++row) { + int r1 = rb + row; + if (r1 >= ne11) { + break; + } + + device const float * y = (device const float *) (src1 + r1*nb11 + im*nb12); + device const float4 * y4 = (device const float4 *) y; + + float sumf = 0; + for (int i = tiisg; i < ne00/4; i += 32) { + for (int k = 0; k < 4; ++k) sumf += (float) x4[i][k] * y4[i][k]; + } + + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + for (int i = 4*(ne00/4); i < ne00; ++i) all_sum += (float) x[i] * y[i]; + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } + } +} + +#define N_F16_F16 4 + +kernel void kernel_mul_mv_f16_f16( + device const char * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]]) { + + const int64_t r0 = tgpig.x; + const int64_t rb = tgpig.y*N_F16_F16; + const int64_t im = tgpig.z; + + device const half * x = (device const half *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + + if (ne00 < 128) { + for (int row = 0; row < N_F16_F16; ++row) { + int r1 = rb + row; + if (r1 >= ne11) { + break; + } + + device const half * y = (device const half *) (src1 + r1*nb11 + im*nb12); + + float sumf = 0; + for (int i = tiisg; i < ne00; i += 32) { + sumf += (half) x[i] * (half) y[i]; + } + + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } + } else { + device const half4 * x4 = (device const half4 *)x; + for (int row = 0; row < N_F16_F16; ++row) { + int r1 = rb + row; + if (r1 >= ne11) { + break; + } + + device const half * y = (device const half *) (src1 + r1*nb11 + im*nb12); + device const half4 * y4 = (device const half4 *) y; + + float sumf = 0; + for (int i = tiisg; i < ne00/4; i += 32) { + for (int k = 0; k < 4; ++k) sumf += (half) x4[i][k] * y4[i][k]; + } + + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + for (int i = 4*(ne00/4); i < ne00; ++i) all_sum += (half) x[i] * y[i]; + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } + } +} + +kernel void kernel_mul_mv_f16_f32_1row( + device const char * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]]) { + + const int64_t r0 = tgpig.x; + const int64_t r1 = tgpig.y; + const int64_t im = tgpig.z; 
+ + device const half * x = (device const half *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + device const float * y = (device const float *) (src1 + r1*nb11 + im*nb12); + + float sumf = 0; + if (ne00 < 128) { + for (int i = tiisg; i < ne00; i += 32) { + sumf += (float) x[i] * (float) y[i]; + } + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } else { + device const half4 * x4 = (device const half4 *) x; + device const float4 * y4 = (device const float4 *) y; + for (int i = tiisg; i < ne00/4; i += 32) { + for (int k = 0; k < 4; ++k) sumf += (float)x4[i][k] * y4[i][k]; + } + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + for (int i = 4*(ne00/4); i < ne00; ++i) all_sum += (float) x[i] * y[i]; + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } + +} + +#define N_F16_F32 4 + +kernel void kernel_mul_mv_f16_f32( + device const char * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]]) { + + const int64_t r0 = tgpig.x; + const int64_t rb = tgpig.y*N_F16_F32; + const int64_t im = tgpig.z; + + device const half * x = (device const half *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + + if (ne00 < 128) { + for (int row = 0; row < N_F16_F32; ++row) { + int r1 = rb + row; + if (r1 >= ne11) { + break; + } + + device const float * y = (device const float *) (src1 + r1*nb11 + im*nb12); + + float sumf = 0; + for (int i = tiisg; i < ne00; i += 32) { + sumf += (float) x[i] * (float) y[i]; + } + + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } + } else { + device const half4 * x4 = (device const half4 *)x; + for (int row = 0; row < N_F16_F32; ++row) { + int r1 = rb + row; + if (r1 >= ne11) { + break; + } + + device const float * y = (device const float *) (src1 + r1*nb11 + im*nb12); + device const float4 * y4 = (device const float4 *) y; + + float sumf = 0; + for (int i = tiisg; i < ne00/4; i += 32) { + for (int k = 0; k < 4; ++k) sumf += (float) x4[i][k] * y4[i][k]; + } + + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + for (int i = 4*(ne00/4); i < ne00; ++i) all_sum += (float) x[i] * y[i]; + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } + } +} + +// Assumes row size (ne00) is a multiple of 4 +kernel void kernel_mul_mv_f16_f32_l4( + device const char * src0, + device const char * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]]) { + + const int nrows = ne11; + const int64_t r0 = tgpig.x; + const int64_t im = tgpig.z; + + device const half4 * x4 = (device const half4 *) (src0 + r0*nb01 + im/(ne12/ne02)*nb02); + + for (int r1 = 0; r1 < nrows; ++r1) { + device const float4 * y4 = 
(device const float4 *) (src1 + r1*nb11 + im*nb12); + + float sumf = 0; + for (int i = tiisg; i < ne00/4; i += 32) { + for (int k = 0; k < 4; ++k) sumf += (float) x4[i][k] * y4[i][k]; + } + + float all_sum = simd_sum(sumf); + if (tiisg == 0) { + dst[im*ne1*ne0 + r1*ne0 + r0] = all_sum; + } + } +} + +kernel void kernel_alibi_f32( + device const float * src0, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant uint64_t & nb3, + constant float & m0, + constant float & m1, + constant int & n_heads_log2_floor, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig[2]; + const int64_t i02 = tgpig[1]; + const int64_t i01 = tgpig[0]; + + const int64_t n = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + + const int64_t i3 = n / (ne2*ne1*ne0); + const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); + const int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; + const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0); + + device float * dst_data = (device float *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); + float m_k; + if (i2 < n_heads_log2_floor) { + m_k = pow(m0, i2 + 1); + } else { + m_k = pow(m1, 2 * (i2 - n_heads_log2_floor) + 1); + } + for (int64_t i00 = tpitg.x; i00 < ne00; i00 += ntg.x) { + device const float * src = (device float *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); + dst_data[i00] = src[0] + m_k * (i00 - ne00 + 1); + } +} + +static float rope_yarn_ramp(const float low, const float high, const int i0) { + const float y = (i0 / 2 - low) / max(0.001f, high - low); + return 1.0f - min(1.0f, max(0.0f, y)); +} + +// YaRN algorithm based on LlamaYaRNScaledRotaryEmbedding.py from https://github.com/jquesnelle/yarn +// MIT licensed. Copyright (c) 2023 Jeffrey Quesnelle and Bowen Peng. 
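+//
+// rope_yarn() below blends the position-interpolated angle (freq_scale * theta_extrap) with the
+// extrapolated angle theta_extrap, weighting the mix by rope_yarn_ramp() over the correction
+// range [corr_dims[0], corr_dims[1]] and by ext_factor, then rescales cos/sin by mscale to
+// compensate for interpolation.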
+static void rope_yarn(
+    float theta_extrap, float freq_scale, float corr_dims[2], int64_t i0, float ext_factor, float mscale,
+    thread float * cos_theta, thread float * sin_theta
+) {
+    // Get n-d rotational scaling corrected for extrapolation
+    float theta_interp = freq_scale * theta_extrap;
+    float theta = theta_interp;
+    if (ext_factor != 0.0f) {
+        float ramp_mix = rope_yarn_ramp(corr_dims[0], corr_dims[1], i0) * ext_factor;
+        theta = theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix;
+
+        // Get n-d magnitude scaling corrected for interpolation
+        mscale *= 1.0f + 0.1f * log(1.0f / freq_scale);
+    }
+    *cos_theta = cos(theta) * mscale;
+    *sin_theta = sin(theta) * mscale;
+}
+
+// Apparently solving `n_rot = 2pi * x * base^((2 * max_pos_emb) / n_dims)` for x, we get
+// `corr_fac(n_rot) = n_dims * log(max_pos_emb / (n_rot * 2pi)) / (2 * log(base))`
+static float rope_yarn_corr_factor(int n_dims, int n_orig_ctx, float n_rot, float base) {
+    return n_dims * log(n_orig_ctx / (n_rot * 2 * M_PI_F)) / (2 * log(base));
+}
+
+static void rope_yarn_corr_dims(
+    int n_dims, int n_orig_ctx, float freq_base, float beta_fast, float beta_slow, float dims[2]
+) {
+    // start and end correction dims
+    dims[0] = max(0.0f, floor(rope_yarn_corr_factor(n_dims, n_orig_ctx, beta_fast, freq_base)));
+    dims[1] = min(n_dims - 1.0f, ceil(rope_yarn_corr_factor(n_dims, n_orig_ctx, beta_slow, freq_base)));
+}
+
+typedef void (rope_t)(
+        device const void * src0,
+        device const int32_t * src1,
+        device float * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01,
+        constant int64_t & ne02,
+        constant int64_t & ne03,
+        constant uint64_t & nb00,
+        constant uint64_t & nb01,
+        constant uint64_t & nb02,
+        constant uint64_t & nb03,
+        constant int64_t & ne0,
+        constant int64_t & ne1,
+        constant int64_t & ne2,
+        constant int64_t & ne3,
+        constant uint64_t & nb0,
+        constant uint64_t & nb1,
+        constant uint64_t & nb2,
+        constant uint64_t & nb3,
+        constant int & n_past,
+        constant int & n_dims,
+        constant int & mode,
+        constant int & n_orig_ctx,
+        constant float & freq_base,
+        constant float & freq_scale,
+        constant float & ext_factor,
+        constant float & attn_factor,
+        constant float & beta_fast,
+        constant float & beta_slow,
+        uint tiitg[[thread_index_in_threadgroup]],
+        uint3 tptg[[threads_per_threadgroup]],
+        uint3 tgpig[[threadgroup_position_in_grid]]);
+
+template<typename T>
+kernel void kernel_rope(
+        device const void * src0,
+        device const int32_t * src1,
+        device float * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01,
+        constant int64_t & ne02,
+        constant int64_t & ne03,
+        constant uint64_t & nb00,
+        constant uint64_t & nb01,
+        constant uint64_t & nb02,
+        constant uint64_t & nb03,
+        constant int64_t & ne0,
+        constant int64_t & ne1,
+        constant int64_t & ne2,
+        constant int64_t & ne3,
+        constant uint64_t & nb0,
+        constant uint64_t & nb1,
+        constant uint64_t & nb2,
+        constant uint64_t & nb3,
+        constant int & n_past,
+        constant int & n_dims,
+        constant int & mode,
+        constant int & n_orig_ctx,
+        constant float & freq_base,
+        constant float & freq_scale,
+        constant float & ext_factor,
+        constant float & attn_factor,
+        constant float & beta_fast,
+        constant float & beta_slow,
+        uint tiitg[[thread_index_in_threadgroup]],
+        uint3 tptg[[threads_per_threadgroup]],
+        uint3 tgpig[[threadgroup_position_in_grid]]) {
+    const int64_t i3 = tgpig[2];
+    const int64_t i2 = tgpig[1];
+    const int64_t i1 = tgpig[0];
+
+    const bool is_neox = mode & 2;
+
+    float corr_dims[2];
+    rope_yarn_corr_dims(n_dims, n_orig_ctx, freq_base,
beta_fast, beta_slow, corr_dims);
+
+    device const int32_t * pos = src1;
+
+    const int64_t p = pos[i2];
+
+    const float theta_0 = (float)p;
+    const float inv_ndims = -1.f/n_dims;
+
+    if (!is_neox) {
+        for (int64_t i0 = 2*tiitg; i0 < ne0; i0 += 2*tptg.x) {
+
+            const float theta = theta_0 * pow(freq_base, inv_ndims*i0);
+            float cos_theta, sin_theta;
+            rope_yarn(theta, freq_scale, corr_dims, i0, ext_factor, attn_factor, &cos_theta, &sin_theta);
+
+            device const T * const src = (device T *)((device char *) src0 + i3*nb03 + i2*nb02 + i1*nb01 + i0*nb00);
+            device T * dst_data = (device T *)((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0);
+
+            const T x0 = src[0];
+            const T x1 = src[1];
+
+            dst_data[0] = x0*cos_theta - x1*sin_theta;
+            dst_data[1] = x0*sin_theta + x1*cos_theta;
+        }
+    } else {
+        for (int64_t ib = 0; ib < ne0/n_dims; ++ib) {
+            for (int64_t ic = 2*tiitg; ic < n_dims; ic += 2*tptg.x) {
+
+                // simplified from `(ib * n_dims + ic) * inv_ndims`
+                const float cur_rot = inv_ndims*ic - ib;
+
+                const float theta = theta_0 * pow(freq_base, cur_rot);
+                float cos_theta, sin_theta;
+                rope_yarn(theta, freq_scale, corr_dims, cur_rot, ext_factor, attn_factor, &cos_theta, &sin_theta);
+
+                const int64_t i0 = ib*n_dims + ic/2;
+
+                device const T * const src = (device T *)((device char *) src0 + i3*nb03 + i2*nb02 + i1*nb01 + i0*nb00);
+                device T * dst_data = (device T *)((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0);
+
+                const float x0 = src[0];
+                const float x1 = src[n_dims/2];
+
+                dst_data[0] = x0*cos_theta - x1*sin_theta;
+                dst_data[n_dims/2] = x0*sin_theta + x1*cos_theta;
+            }
+        }
+    }
+}
+
+template [[host_name("kernel_rope_f32")]] kernel rope_t kernel_rope<float>;
+template [[host_name("kernel_rope_f16")]] kernel rope_t kernel_rope<half>;
+
+kernel void kernel_im2col_f16(
+        device const float * x,
+        device half * dst,
+        constant int32_t & ofs0,
+        constant int32_t & ofs1,
+        constant int32_t & IW,
+        constant int32_t & IH,
+        constant int32_t & CHW,
+        constant int32_t & s0,
+        constant int32_t & s1,
+        constant int32_t & p0,
+        constant int32_t & p1,
+        constant int32_t & d0,
+        constant int32_t & d1,
+        uint3 tgpig[[threadgroup_position_in_grid]],
+        uint3 tgpg[[threadgroups_per_grid]],
+        uint3 tpitg[[thread_position_in_threadgroup]],
+        uint3 ntg[[threads_per_threadgroup]]) {
+    const int32_t iiw = tgpig[2] * s0 + tpitg[2] * d0 - p0;
+    const int32_t iih = tgpig[1] * s1 + tpitg[1] * d1 - p1;
+
+    const int32_t offset_dst =
+        (tpitg[0] * tgpg[1] * tgpg[2] + tgpig[1] * tgpg[2] + tgpig[2]) * CHW +
+        (tgpig[0] * (ntg[1] * ntg[2]) + tpitg[1] * ntg[2] + tpitg[2]);
+
+    if (iih < 0 || iih >= IH || iiw < 0 || iiw >= IW) {
+        dst[offset_dst] = 0.0f;
+    } else {
+        const int32_t offset_src = tpitg[0] * ofs0 + tgpig[0] * ofs1;
+        dst[offset_dst] = x[offset_src + iih * IW + iiw];
+    }
+}
+
+kernel void kernel_cpy_f16_f16(
+        device const half * src0,
+        device half * dst,
+        constant int64_t & ne00,
+        constant int64_t & ne01,
+        constant int64_t & ne02,
+        constant int64_t & ne03,
+        constant uint64_t & nb00,
+        constant uint64_t & nb01,
+        constant uint64_t & nb02,
+        constant uint64_t & nb03,
+        constant int64_t & ne0,
+        constant int64_t & ne1,
+        constant int64_t & ne2,
+        constant int64_t & ne3,
+        constant uint64_t & nb0,
+        constant uint64_t & nb1,
+        constant uint64_t & nb2,
+        constant uint64_t & nb3,
+        uint3 tgpig[[threadgroup_position_in_grid]],
+        uint3 tpitg[[thread_position_in_threadgroup]],
+        uint3 ntg[[threads_per_threadgroup]]) {
+    const int64_t i03 = tgpig[2];
+    const int64_t i02 = tgpig[1];
+    const int64_t i01 = tgpig[0];
+ + const int64_t n = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + + const int64_t i3 = n / (ne2*ne1*ne0); + const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); + const int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; + const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0); + + device half * dst_data = (device half *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); + + for (int64_t i00 = tpitg.x; i00 < ne00; i00 += ntg.x) { + device const half * src = (device half *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); + dst_data[i00] = src[0]; + } +} + +kernel void kernel_cpy_f32_f16( + device const float * src0, + device half * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant uint64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig[2]; + const int64_t i02 = tgpig[1]; + const int64_t i01 = tgpig[0]; + + const int64_t n = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + + const int64_t i3 = n / (ne2*ne1*ne0); + const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); + const int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; + const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0); + + device half * dst_data = (device half *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); + + for (int64_t i00 = tpitg.x; i00 < ne00; i00 += ntg.x) { + device const float * src = (device float *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); + + dst_data[i00] = src[0]; + } +} + +kernel void kernel_cpy_f32_f32( + device const float * src0, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant uint64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + const int64_t i03 = tgpig[2]; + const int64_t i02 = tgpig[1]; + const int64_t i01 = tgpig[0]; + + const int64_t n = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; + + const int64_t i3 = n / (ne2*ne1*ne0); + const int64_t i2 = (n - i3*ne2*ne1*ne0) / (ne1*ne0); + const int64_t i1 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0) / ne0; + const int64_t i0 = (n - i3*ne2*ne1*ne0 - i2*ne1*ne0 - i1*ne0); + + device float * dst_data = (device float *) ((device char *) dst + i3*nb3 + i2*nb2 + i1*nb1 + i0*nb0); + + for (int64_t i00 = tpitg.x; i00 < ne00; i00 += ntg.x) { + device const float * src = (device float *)((device char *) src0 + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00); + + dst_data[i00] = src[0]; + } +} + +kernel void kernel_concat( + device const char * src0, + device const char * src1, + device char * dst, + constant int64_t & ne00, + constant int64_t & ne01, + constant int64_t & ne02, + constant int64_t & ne03, + constant uint64_t & 
nb00, + constant uint64_t & nb01, + constant uint64_t & nb02, + constant uint64_t & nb03, + constant int64_t & ne10, + constant int64_t & ne11, + constant int64_t & ne12, + constant int64_t & ne13, + constant uint64_t & nb10, + constant uint64_t & nb11, + constant uint64_t & nb12, + constant uint64_t & nb13, + constant int64_t & ne0, + constant int64_t & ne1, + constant int64_t & ne2, + constant int64_t & ne3, + constant uint64_t & nb0, + constant uint64_t & nb1, + constant uint64_t & nb2, + constant uint64_t & nb3, + uint3 tgpig[[threadgroup_position_in_grid]], + uint3 tpitg[[thread_position_in_threadgroup]], + uint3 ntg[[threads_per_threadgroup]]) { + + const int64_t i03 = tgpig.z; + const int64_t i02 = tgpig.y; + const int64_t i01 = tgpig.x; + + const int64_t i13 = i03 % ne13; + const int64_t i12 = i02 % ne12; + const int64_t i11 = i01 % ne11; + + device const char * src0_ptr = src0 + i03 * nb03 + i02 * nb02 + i01 * nb01 + tpitg.x*nb00; + device const char * src1_ptr = src1 + i13*nb13 + i12*nb12 + i11*nb11 + tpitg.x*nb10; + device char * dst_ptr = dst + i03*nb3 + i02*nb2 + i01*nb1 + tpitg.x*nb0; + + for (int i0 = tpitg.x; i0 < ne0; i0 += ntg.x) { + if (i02 < ne02) { + ((device float *)dst_ptr)[0] = ((device float *)src0_ptr)[0]; + src0_ptr += ntg.x*nb00; + } else { + ((device float *)dst_ptr)[0] = ((device float *)src1_ptr)[0]; + src1_ptr += ntg.x*nb10; + } + dst_ptr += ntg.x*nb0; + } +} + +//============================================ k-quants ====================================================== + +#ifndef QK_K +#define QK_K 256 +#else +static_assert(QK_K == 256 || QK_K == 64, "QK_K must be 256 or 64"); +#endif + +#if QK_K == 256 +#define K_SCALE_SIZE 12 +#else +#define K_SCALE_SIZE 4 +#endif + +typedef struct { + uint8_t scales[QK_K/16]; // scales and mins, quantized with 4 bits + uint8_t qs[QK_K/4]; // quants + half d; // super-block scale for quantized scales + half dmin; // super-block scale for quantized mins +} block_q2_K; +// 84 bytes / block + +typedef struct { + uint8_t hmask[QK_K/8]; // quants - high bit + uint8_t qs[QK_K/4]; // quants - low 2 bits +#if QK_K == 64 + uint8_t scales[2]; +#else + uint8_t scales[K_SCALE_SIZE]; // scales, quantized with 6 bits +#endif + half d; // super-block scale +} block_q3_K; + +#if QK_K == 64 +typedef struct { + half d[2]; // super-block scales/mins + uint8_t scales[2]; + uint8_t qs[QK_K/2]; // 4-bit quants +} block_q4_K; +#else +typedef struct { + half d; // super-block scale for quantized scales + half dmin; // super-block scale for quantized mins + uint8_t scales[K_SCALE_SIZE]; // scales and mins, quantized with 6 bits + uint8_t qs[QK_K/2]; // 4--bit quants +} block_q4_K; +#endif + +#if QK_K == 64 +typedef struct { + half d; // super-block scales/mins + int8_t scales[QK_K/16]; // 8-bit block scales + uint8_t qh[QK_K/8]; // quants, high bit + uint8_t qs[QK_K/2]; // quants, low 4 bits +} block_q5_K; +#else +typedef struct { + half d; // super-block scale for quantized scales + half dmin; // super-block scale for quantized mins + uint8_t scales[3*QK_K/64]; // scales and mins, quantized with 6 bits + uint8_t qh[QK_K/8]; // quants, high bit + uint8_t qs[QK_K/2]; // quants, low 4 bits +} block_q5_K; +// 176 bytes / block +#endif + +typedef struct { + uint8_t ql[QK_K/2]; // quants, lower 4 bits + uint8_t qh[QK_K/4]; // quants, upper 2 bits + int8_t scales[QK_K/16]; // scales, quantized with 8 bits + half d; // super-block scale +} block_q6_K; +// 210 bytes / block + +static inline uchar4 get_scale_min_k4(int j, device const uint8_t * q) { + 
uchar4 r; + if (j < 4) { + r[0] = q[j+0] & 63; + r[2] = q[j+1] & 63; + r[1] = q[j+4] & 63; + r[3] = q[j+5] & 63; + } else { + r[0] = (q[j+4] & 0xF) | ((q[j-4] >> 6) << 4); + r[2] = (q[j+5] & 0xF) | ((q[j-3] >> 6) << 4); + r[1] = (q[j+4] >> 4) | ((q[j-0] >> 6) << 4); + r[3] = (q[j+5] >> 4) | ((q[j+1] >> 6) << 4); + } + return r; +} + +//====================================== dot products ========================= + +kernel void kernel_mul_mv_q2_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0[[buffer(15)]], + constant int64_t & ne1[[buffer(16)]], + constant uint & gqa[[buffer(17)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + const int nb = ne00/QK_K; + const int r0 = tgpig.x; + const int r1 = tgpig.y; + const int r2 = tgpig.z; + + const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST; + const int ib_row = first_row * nb; + const uint offset0 = r2/gqa*(nb*ne0); + device const block_q2_K * x = (device const block_q2_K *) src0 + ib_row + offset0; + device const float * y = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + float yl[32]; + float sumf[N_DST]={0.f}, all_sum; + + const int step = sizeof(block_q2_K) * nb; + +#if QK_K == 256 + const int ix = tiisg/8; // 0...3 + const int it = tiisg%8; // 0...7 + const int im = it/4; // 0 or 1 + const int ir = it%4; // 0...3 + const int is = (8*ir)/16;// 0 or 1 + + device const float * y4 = y + ix * QK_K + 128 * im + 8 * ir; + + for (int ib = ix; ib < nb; ib += 4) { + + float4 sumy = {0.f, 0.f, 0.f, 0.f}; + for (int i = 0; i < 8; ++i) { + yl[i+ 0] = y4[i+ 0]; sumy[0] += yl[i+ 0]; + yl[i+ 8] = y4[i+32]; sumy[1] += yl[i+ 8]; + yl[i+16] = y4[i+64]; sumy[2] += yl[i+16]; + yl[i+24] = y4[i+96]; sumy[3] += yl[i+24]; + } + + device const uint8_t * sc = (device const uint8_t *)x[ib].scales + 8*im + is; + device const uint16_t * qs = (device const uint16_t *)x[ib].qs + 16 * im + 4 * ir; + device const half * dh = &x[ib].d; + + for (int row = 0; row < N_DST; row++) { + + float4 acc1 = {0.f, 0.f, 0.f, 0.f}; + float4 acc2 = {0.f, 0.f, 0.f, 0.f}; + for (int i = 0; i < 8; i += 2) { + acc1[0] += yl[i+ 0] * (qs[i/2] & 0x0003); + acc2[0] += yl[i+ 1] * (qs[i/2] & 0x0300); + acc1[1] += yl[i+ 8] * (qs[i/2] & 0x000c); + acc2[1] += yl[i+ 9] * (qs[i/2] & 0x0c00); + acc1[2] += yl[i+16] * (qs[i/2] & 0x0030); + acc2[2] += yl[i+17] * (qs[i/2] & 0x3000); + acc1[3] += yl[i+24] * (qs[i/2] & 0x00c0); + acc2[3] += yl[i+25] * (qs[i/2] & 0xc000); + } + float dall = dh[0]; + float dmin = dh[1] * 1.f/16.f; + sumf[row] += dall * ((acc1[0] + 1.f/256.f * acc2[0]) * (sc[0] & 0xF) * 1.f/ 1.f + + (acc1[1] + 1.f/256.f * acc2[1]) * (sc[2] & 0xF) * 1.f/ 4.f + + (acc1[2] + 1.f/256.f * acc2[2]) * (sc[4] & 0xF) * 1.f/16.f + + (acc1[3] + 1.f/256.f * acc2[3]) * (sc[6] & 0xF) * 1.f/64.f) - + dmin * (sumy[0] * (sc[0] & 0xF0) + sumy[1] * (sc[2] & 0xF0) + sumy[2] * (sc[4] & 0xF0) + sumy[3] * (sc[6] & 0xF0)); + + qs += step/2; + sc += step; + dh += step/2; + } + + y4 += 4 * QK_K; + } +#else + const int ix = tiisg/2; // 0...15 + const int it = tiisg%2; // 0...1 + + device const float * y4 = y + ix * QK_K + 8 * it; + + for (int ib = ix; ib < nb; ib += 16) { + + float4 sumy = {0.f, 0.f, 0.f, 0.f}; + for (int i = 0; i < 8; ++i) { + yl[i+ 0] = y4[i+ 0]; 
sumy[0] += yl[i+ 0]; + yl[i+ 8] = y4[i+16]; sumy[1] += yl[i+ 8]; + yl[i+16] = y4[i+32]; sumy[2] += yl[i+16]; + yl[i+24] = y4[i+48]; sumy[3] += yl[i+24]; + } + + device const uint8_t * sc = (device const uint8_t *)x[ib].scales; + device const uint16_t * qs = (device const uint16_t *)x[ib].qs + 4 * it; + device const half * dh = &x[ib].d; + + for (int row = 0; row < N_DST; row++) { + + float4 acc1 = {0.f, 0.f, 0.f, 0.f}; + float4 acc2 = {0.f, 0.f, 0.f, 0.f}; + for (int i = 0; i < 8; i += 2) { + acc1[0] += yl[i+ 0] * (qs[i/2] & 0x0003); + acc2[0] += yl[i+ 1] * (qs[i/2] & 0x0300); + acc1[1] += yl[i+ 8] * (qs[i/2] & 0x000c); + acc2[1] += yl[i+ 9] * (qs[i/2] & 0x0c00); + acc1[2] += yl[i+16] * (qs[i/2] & 0x0030); + acc2[2] += yl[i+17] * (qs[i/2] & 0x3000); + acc1[3] += yl[i+24] * (qs[i/2] & 0x00c0); + acc2[3] += yl[i+25] * (qs[i/2] & 0xc000); + } + + float dall = dh[0]; + float dmin = dh[1]; + sumf[row] += dall * ((acc1[0] + 1.f/256.f * acc2[0]) * (sc[0] & 0xF) * 1.f/ 1.f + + (acc1[1] + 1.f/256.f * acc2[1]) * (sc[1] & 0xF) * 1.f/ 4.f + + (acc1[2] + 1.f/256.f * acc2[2]) * (sc[2] & 0xF) * 1.f/16.f + + (acc1[3] + 1.f/256.f * acc2[3]) * (sc[3] & 0xF) * 1.f/64.f) - + dmin * (sumy[0] * (sc[0] >> 4) + sumy[1] * (sc[1] >> 4) + sumy[2] * (sc[2] >> 4) + sumy[3] * (sc[3] >> 4)); + + qs += step/2; + sc += step; + dh += step/2; + } + + y4 += 16 * QK_K; + } +#endif + + for (int row = 0; row < N_DST; ++row) { + all_sum = simd_sum(sumf[row]); + if (tiisg == 0) { + dst[r1*ne0 + r2*ne0*ne1 + first_row + row] = all_sum; + } + } +} + +#if QK_K == 256 +kernel void kernel_mul_mv_q3_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0[[buffer(15)]], + constant int64_t & ne1[[buffer(16)]], + constant uint & gqa[[buffer(17)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + const int nb = ne00/QK_K; + + const int64_t r0 = tgpig.x; + const int64_t r1 = tgpig.y; + const int64_t r2 = tgpig.z; + + const int first_row = (r0 * N_SIMDGROUP + sgitg) * 2; + const uint offset0 = r2/gqa*(nb*ne0); + device const block_q3_K * x = (device const block_q3_K *) src0 + first_row*nb + offset0; + device const float * yy = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + + float yl[32]; + + //const uint16_t kmask1 = 0x3030; + //const uint16_t kmask2 = 0x0f0f; + + const int tid = tiisg/4; + const int ix = tiisg%4; + const int ip = tid/4; // 0 or 1 + const int il = 2*((tid%4)/2); // 0 or 2 + const int ir = tid%2; + const int n = 8; + const int l0 = n*ir; + + // One would think that the Metal compiler would figure out that ip and il can only have + // 4 possible states, and optimize accordingly. Well, no. It needs help, and we do it + // with these two tales. + // + // Possible masks for the high bit + const ushort4 mm[4] = {{0x0001, 0x0100, 0x0002, 0x0200}, // ip = 0, il = 0 + {0x0004, 0x0400, 0x0008, 0x0800}, // ip = 0, il = 2 + {0x0010, 0x1000, 0x0020, 0x2000}, // ip = 1, il = 0 + {0x0040, 0x4000, 0x0080, 0x8000}}; // ip = 1, il = 2 + + // Possible masks for the low 2 bits + const int4 qm[2] = {{0x0003, 0x0300, 0x000c, 0x0c00}, {0x0030, 0x3000, 0x00c0, 0xc000}}; + + const ushort4 hm = mm[2*ip + il/2]; + + const int shift = 2*il; + const float v1 = il == 0 ? 
4.f : 64.f; + const float v2 = 4.f * v1; + + const uint16_t s_shift1 = 4*ip; + const uint16_t s_shift2 = s_shift1 + il; + + const int q_offset = 32*ip + l0; + const int y_offset = 128*ip + 32*il + l0; + + const int step = sizeof(block_q3_K) * nb / 2; + + device const float * y1 = yy + ix*QK_K + y_offset; + + uint32_t scales32, aux32; + thread uint16_t * scales16 = (thread uint16_t *)&scales32; + thread const int8_t * scales = (thread const int8_t *)&scales32; + + float sumf1[2] = {0.f}; + float sumf2[2] = {0.f}; + for (int i = ix; i < nb; i += 4) { + + for (int l = 0; l < 8; ++l) { + yl[l+ 0] = y1[l+ 0]; + yl[l+ 8] = y1[l+16]; + yl[l+16] = y1[l+32]; + yl[l+24] = y1[l+48]; + } + + device const uint16_t * q = (device const uint16_t *)(x[i].qs + q_offset); + device const uint16_t * h = (device const uint16_t *)(x[i].hmask + l0); + device const uint16_t * a = (device const uint16_t *)(x[i].scales); + device const half * dh = &x[i].d; + + for (int row = 0; row < 2; ++row) { + + const float d_all = (float)dh[0]; + + scales16[0] = a[4]; + scales16[1] = a[5]; + aux32 = ((scales32 >> s_shift2) << 4) & 0x30303030; + scales16[0] = a[il+0]; + scales16[1] = a[il+1]; + scales32 = ((scales32 >> s_shift1) & 0x0f0f0f0f) | aux32; + + float s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0, s6 = 0; + for (int l = 0; l < n; l += 2) { + const int32_t qs = q[l/2]; + s1 += yl[l+0] * (qs & qm[il/2][0]); + s2 += yl[l+1] * (qs & qm[il/2][1]); + s3 += ((h[l/2] & hm[0]) ? 0.f : yl[l+0]) + ((h[l/2] & hm[1]) ? 0.f : yl[l+1]); + s4 += yl[l+16] * (qs & qm[il/2][2]); + s5 += yl[l+17] * (qs & qm[il/2][3]); + s6 += ((h[l/2] & hm[2]) ? 0.f : yl[l+16]) + ((h[l/2] & hm[3]) ? 0.f : yl[l+17]); + } + float d1 = d_all * (s1 + 1.f/256.f * s2 - s3*v1); + float d2 = d_all * (s4 + 1.f/256.f * s5 - s6*v2); + sumf1[row] += d1 * (scales[0] - 32); + sumf2[row] += d2 * (scales[2] - 32); + + s1 = s2 = s3 = s4 = s5 = s6 = 0; + for (int l = 0; l < n; l += 2) { + const int32_t qs = q[l/2+8]; + s1 += yl[l+8] * (qs & qm[il/2][0]); + s2 += yl[l+9] * (qs & qm[il/2][1]); + s3 += ((h[l/2+8] & hm[0]) ? 0.f : yl[l+8]) + ((h[l/2+8] & hm[1]) ? 0.f : yl[l+9]); + s4 += yl[l+24] * (qs & qm[il/2][2]); + s5 += yl[l+25] * (qs & qm[il/2][3]); + s6 += ((h[l/2+8] & hm[2]) ? 0.f : yl[l+24]) + ((h[l/2+8] & hm[3]) ? 
0.f : yl[l+25]); + } + d1 = d_all * (s1 + 1.f/256.f * s2 - s3*v1); + d2 = d_all * (s4 + 1.f/256.f * s5 - s6*v2); + sumf1[row] += d1 * (scales[1] - 32); + sumf2[row] += d2 * (scales[3] - 32); + + q += step; + h += step; + a += step; + dh += step; + + } + + y1 += 4 * QK_K; + + } + + for (int row = 0; row < 2; ++row) { + const float sumf = (sumf1[row] + 0.25f * sumf2[row]) / (1 << shift); + sumf1[row] = simd_sum(sumf); + } + if (tiisg == 0) { + for (int row = 0; row < 2; ++row) { + dst[r1*ne0 + r2*ne0*ne1 + first_row + row] = sumf1[row]; + } + } +} +#else +kernel void kernel_mul_mv_q3_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0[[buffer(15)]], + constant int64_t & ne1[[buffer(16)]], + constant uint & gqa[[buffer(17)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + const int nb = ne00/QK_K; + + const int64_t r0 = tgpig.x; + const int64_t r1 = tgpig.y; + const int64_t r2 = tgpig.z; + + const int row = 2 * r0 + sgitg; + const uint offset0 = r2/gqa*(nb*ne0); + device const block_q3_K * x = (device const block_q3_K *) src0 + row*nb + offset0; + device const float * yy = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + const int ix = tiisg/4; + const int il = 4 * (tiisg%4);// 0, 4, 8, 12 + const int im = il/8; // 0, 0, 1, 1 + const int in = il%8; // 0, 4, 0, 4 + + float2 sum = {0.f, 0.f}; + + for (int i = ix; i < nb; i += 8) { + + const float d_all = (float)(x[i].d); + + device const uint16_t * q = (device const uint16_t *)(x[i].qs + il); + device const uint16_t * h = (device const uint16_t *)(x[i].hmask + in); + device const uint16_t * s = (device const uint16_t *)(x[i].scales); + device const float * y = yy + i * QK_K + il; + + const float d1 = d_all * ((int32_t)(s[0] & 0x000F) - 8); + const float d2 = d_all * ((int32_t)(s[0] & 0x00F0) - 128) * 1.f/64.f; + const float d3 = d_all * ((int32_t)(s[0] & 0x0F00) - 2048) * 1.f/4096.f; + const float d4 = d_all * ((int32_t)(s[0] & 0xF000) - 32768) * 1.f/262144.f; + + for (int l = 0; l < 4; l += 2) { + const uint16_t hm = h[l/2] >> im; + sum[0] += y[l+ 0] * d1 * ((int32_t)(q[l/2] & 0x0003) - ((hm & 0x0001) ? 0 : 4)) + + y[l+16] * d2 * ((int32_t)(q[l/2] & 0x000c) - ((hm & 0x0004) ? 0 : 16)) + + y[l+32] * d3 * ((int32_t)(q[l/2] & 0x0030) - ((hm & 0x0010) ? 0 : 64)) + + y[l+48] * d4 * ((int32_t)(q[l/2] & 0x00c0) - ((hm & 0x0040) ? 0 : 256)); + sum[1] += y[l+ 1] * d1 * ((int32_t)(q[l/2] & 0x0300) - ((hm & 0x0100) ? 0 : 1024)) + + y[l+17] * d2 * ((int32_t)(q[l/2] & 0x0c00) - ((hm & 0x0400) ? 0 : 4096)) + + y[l+33] * d3 * ((int32_t)(q[l/2] & 0x3000) - ((hm & 0x1000) ? 0 : 16384)) + + y[l+49] * d4 * ((int32_t)(q[l/2] & 0xc000) - ((hm & 0x4000) ? 
0 : 65536)); + } + + } + const float sumf = sum[0] + sum[1] * 1.f/256.f; + + const float tot = simd_sum(sumf); + if (tiisg == 0) { + dst[r1*ne0 + r2*ne0*ne1 + row] = tot; + } + +} +#endif + +#if QK_K == 256 +kernel void kernel_mul_mv_q4_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01 [[buffer(4)]], + constant int64_t & ne02 [[buffer(5)]], + constant int64_t & ne10 [[buffer(9)]], + constant int64_t & ne12 [[buffer(11)]], + constant int64_t & ne0 [[buffer(15)]], + constant int64_t & ne1 [[buffer(16)]], + constant uint & gqa [[buffer(17)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + const uint16_t kmask1 = 0x3f3f; + const uint16_t kmask2 = 0x0f0f; + const uint16_t kmask3 = 0xc0c0; + + const int ix = tiisg/8; // 0...3 + const int it = tiisg%8; // 0...7 + const int im = it/4; // 0 or 1 + const int ir = it%4; // 0...3 + + const int nb = ne00/QK_K; + const int r0 = tgpig.x; + const int r1 = tgpig.y; + const int r2 = tgpig.z; + //const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST; + const int first_row = r0 * N_DST; + const int ib_row = first_row * nb; + const uint offset0 = r2/gqa*(nb*ne0); + device const block_q4_K * x = (device const block_q4_K *) src0 + ib_row + offset0; + device const float * y = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + float yl[16]; + float yh[16]; + float sumf[N_DST]={0.f}, all_sum; + + const int step = sizeof(block_q4_K) * nb / 2; + + device const float * y4 = y + ix * QK_K + 64 * im + 8 * ir; + + uint16_t sc16[4]; + thread const uint8_t * sc8 = (thread const uint8_t *)sc16; + + for (int ib = ix; ib < nb; ib += 4) { + + float4 sumy = {0.f, 0.f, 0.f, 0.f}; + for (int i = 0; i < 8; ++i) { + yl[i+0] = y4[i+ 0]; sumy[0] += yl[i+0]; + yl[i+8] = y4[i+ 32]; sumy[1] += yl[i+8]; + yh[i+0] = y4[i+128]; sumy[2] += yh[i+0]; + yh[i+8] = y4[i+160]; sumy[3] += yh[i+8]; + } + + device const uint16_t * sc = (device const uint16_t *)x[ib].scales + im; + device const uint16_t * q1 = (device const uint16_t *)x[ib].qs + 16 * im + 4 * ir; + device const half * dh = &x[ib].d; + + for (int row = 0; row < N_DST; row++) { + + sc16[0] = sc[0] & kmask1; + sc16[1] = sc[2] & kmask1; + sc16[2] = ((sc[4] >> 0) & kmask2) | ((sc[0] & kmask3) >> 2); + sc16[3] = ((sc[4] >> 4) & kmask2) | ((sc[2] & kmask3) >> 2); + + device const uint16_t * q2 = q1 + 32; + + float4 acc1 = {0.f, 0.f, 0.f, 0.f}; + float4 acc2 = {0.f, 0.f, 0.f, 0.f}; + for (int i = 0; i < 8; i += 2) { + acc1[0] += yl[i+0] * (q1[i/2] & 0x000F); + acc1[1] += yl[i+1] * (q1[i/2] & 0x0F00); + acc1[2] += yl[i+8] * (q1[i/2] & 0x00F0); + acc1[3] += yl[i+9] * (q1[i/2] & 0xF000); + acc2[0] += yh[i+0] * (q2[i/2] & 0x000F); + acc2[1] += yh[i+1] * (q2[i/2] & 0x0F00); + acc2[2] += yh[i+8] * (q2[i/2] & 0x00F0); + acc2[3] += yh[i+9] * (q2[i/2] & 0xF000); + } + + float dall = dh[0]; + float dmin = dh[1]; + sumf[row] += dall * ((acc1[0] + 1.f/256.f * acc1[1]) * sc8[0] + + (acc1[2] + 1.f/256.f * acc1[3]) * sc8[1] * 1.f/16.f + + (acc2[0] + 1.f/256.f * acc2[1]) * sc8[4] + + (acc2[2] + 1.f/256.f * acc2[3]) * sc8[5] * 1.f/16.f) - + dmin * (sumy[0] * sc8[2] + sumy[1] * sc8[3] + sumy[2] * sc8[6] + sumy[3] * sc8[7]); + + q1 += step; + sc += step; + dh += step; + } + + y4 += 4 * QK_K; + } + + for (int row = 0; row < N_DST; ++row) { + all_sum = simd_sum(sumf[row]); + if (tiisg == 0) { + dst[r1*ne0 + r2*ne0*ne1 + first_row + row] = all_sum; + } + } +} +#else +kernel 
void kernel_mul_mv_q4_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0[[buffer(15)]], + constant int64_t & ne1[[buffer(16)]], + constant uint & gqa[[buffer(17)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + const int ix = tiisg/4; // 0...7 + const int it = tiisg%4; // 0...3 + + const int nb = ne00/QK_K; + const int r0 = tgpig.x; + const int r1 = tgpig.y; + const int r2 = tgpig.z; + const int first_row = (r0 * N_SIMDGROUP + sgitg) * N_DST; + const int ib_row = first_row * nb; + const uint offset0 = r2/gqa*(nb*ne0); + device const block_q4_K * x = (device const block_q4_K *) src0 + ib_row + offset0; + device const float * y = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + float yl[8]; + float yh[8]; + float sumf[N_DST]={0.f}, all_sum; + + const int step = sizeof(block_q4_K) * nb / 2; + + device const float * y4 = y + ix * QK_K + 8 * it; + + uint16_t sc16[4]; + + for (int ib = ix; ib < nb; ib += 8) { + + float2 sumy = {0.f, 0.f}; + for (int i = 0; i < 8; ++i) { + yl[i] = y4[i+ 0]; sumy[0] += yl[i]; + yh[i] = y4[i+32]; sumy[1] += yh[i]; + } + + device const uint16_t * sc = (device const uint16_t *)x[ib].scales; + device const uint16_t * qs = (device const uint16_t *)x[ib].qs + 4 * it; + device const half * dh = x[ib].d; + + for (int row = 0; row < N_DST; row++) { + + sc16[0] = sc[0] & 0x000f; + sc16[1] = sc[0] & 0x0f00; + sc16[2] = sc[0] & 0x00f0; + sc16[3] = sc[0] & 0xf000; + + float2 acc1 = {0.f, 0.f}; + float2 acc2 = {0.f, 0.f}; + for (int i = 0; i < 8; i += 2) { + acc1[0] += yl[i+0] * (qs[i/2] & 0x000F); + acc1[1] += yl[i+1] * (qs[i/2] & 0x0F00); + acc2[0] += yh[i+0] * (qs[i/2] & 0x00F0); + acc2[1] += yh[i+1] * (qs[i/2] & 0xF000); + } + + float dall = dh[0]; + float dmin = dh[1]; + sumf[row] += dall * ((acc1[0] + 1.f/256.f * acc1[1]) * sc16[0] + + (acc2[0] + 1.f/256.f * acc2[1]) * sc16[1] * 1.f/4096.f) - + dmin * 1.f/16.f * (sumy[0] * sc16[2] + sumy[1] * sc16[3] * 1.f/256.f); + + qs += step; + sc += step; + dh += step; + } + + y4 += 8 * QK_K; + } + + for (int row = 0; row < N_DST; ++row) { + all_sum = simd_sum(sumf[row]); + if (tiisg == 0) { + dst[r1*ne0+ r2*ne0*ne1 + first_row + row] = all_sum; + } + } +} +#endif + +kernel void kernel_mul_mv_q5_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0[[buffer(15)]], + constant int64_t & ne1[[buffer(16)]], + constant uint & gqa[[buffer(17)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + const int nb = ne00/QK_K; + + const int64_t r0 = tgpig.x; + const int64_t r1 = tgpig.y; + const int r2 = tgpig.z; + + const int first_row = (r0 * N_SIMDGROUP + sgitg) * 2; + const uint offset0 = r2/gqa*(nb*ne0); + device const block_q5_K * x = (device const block_q5_K *) src0 + first_row*nb + offset0; + device const float * yy = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + + float sumf[2]={0.f}; + + const int step = sizeof(block_q5_K) * nb; + +#if QK_K == 256 +# 
+ float yl[16], yh[16]; + + const uint16_t kmask1 = 0x3f3f; + const uint16_t kmask2 = 0x0f0f; + const uint16_t kmask3 = 0xc0c0; + + const int tid = tiisg/4; + const int ix = tiisg%4; + const int im = tid/4; + const int ir = tid%4; + const int n = 8; + + const int l0 = n*ir; + const int q_offset = 32*im + l0; + const int y_offset = 64*im + l0; + + const uint8_t hm1 = 1u << (2*im); + const uint8_t hm2 = hm1 << 1; + const uint8_t hm3 = hm1 << 4; + const uint8_t hm4 = hm2 << 4; + + uint16_t sc16[4]; + thread const uint8_t * sc8 = (thread const uint8_t *)sc16; + + device const float * y1 = yy + ix*QK_K + y_offset; + + for (int i = ix; i < nb; i += 4) { + + device const uint8_t * q1 = x[i].qs + q_offset; + device const uint8_t * qh = x[i].qh + l0; + device const half * dh = &x[i].d; + device const uint16_t * a = (device const uint16_t *)x[i].scales + im; + + device const float * y2 = y1 + 128; + float4 sumy = {0.f, 0.f, 0.f, 0.f}; + for (int l = 0; l < 8; ++l) { + yl[l+0] = y1[l+ 0]; sumy[0] += yl[l+0]; + yl[l+8] = y1[l+32]; sumy[1] += yl[l+8]; + yh[l+0] = y2[l+ 0]; sumy[2] += yh[l+0]; + yh[l+8] = y2[l+32]; sumy[3] += yh[l+8]; + } + + for (int row = 0; row < 2; ++row) { + + device const uint8_t * q2 = q1 + 64; + + sc16[0] = a[0] & kmask1; + sc16[1] = a[2] & kmask1; + sc16[2] = ((a[4] >> 0) & kmask2) | ((a[0] & kmask3) >> 2); + sc16[3] = ((a[4] >> 4) & kmask2) | ((a[2] & kmask3) >> 2); + + float4 acc1 = {0.f}; + float4 acc2 = {0.f}; + for (int l = 0; l < n; ++l) { + uint8_t h = qh[l]; + acc1[0] += yl[l+0] * (q1[l] & 0x0F); + acc1[1] += yl[l+8] * (q1[l] & 0xF0); + acc1[2] += yh[l+0] * (q2[l] & 0x0F); + acc1[3] += yh[l+8] * (q2[l] & 0xF0); + acc2[0] += h & hm1 ? yl[l+0] : 0.f; + acc2[1] += h & hm2 ? yl[l+8] : 0.f; + acc2[2] += h & hm3 ? yh[l+0] : 0.f; + acc2[3] += h & hm4 ? yh[l+8] : 0.f; + } + const float dall = dh[0]; + const float dmin = dh[1]; + sumf[row] += dall * (sc8[0] * (acc1[0] + 16.f*acc2[0]) + + sc8[1] * (acc1[1]/16.f + 16.f*acc2[1]) + + sc8[4] * (acc1[2] + 16.f*acc2[2]) + + sc8[5] * (acc1[3]/16.f + 16.f*acc2[3])) - + dmin * (sumy[0] * sc8[2] + sumy[1] * sc8[3] + sumy[2] * sc8[6] + sumy[3] * sc8[7]); + + q1 += step; + qh += step; + dh += step/2; + a += step/2; + + } + + y1 += 4 * QK_K; + + } +#else + float yl[8], yh[8]; + + const int il = 4 * (tiisg/8); // 0, 4, 8, 12 + const int ix = tiisg%8; + const int im = il/8; // 0, 0, 1, 1 + const int in = il%8; // 0, 4, 0, 4 + + device const float * y = yy + ix*QK_K + il; + + for (int i = ix; i < nb; i += 8) { + + for (int l = 0; l < 4; ++l) { + yl[l+0] = y[l+ 0]; + yl[l+4] = y[l+16]; + yh[l+0] = y[l+32]; + yh[l+4] = y[l+48]; + } + + device const half * dh = &x[i].d; + device const uint8_t * q = x[i].qs + il; + device const uint8_t * h = x[i].qh + in; + device const int8_t * s = x[i].scales; + + for (int row = 0; row < 2; ++row) { + + const float d = dh[0]; + + float2 acc = {0.f, 0.f}; + for (int l = 0; l < 4; ++l) { + const uint8_t hl = h[l] >> im; + acc[0] += yl[l+0] * s[0] * ((int16_t)(q[l+ 0] & 0x0F) - (hl & 0x01 ? 0 : 16)) + + yl[l+4] * s[1] * ((int16_t)(q[l+16] & 0x0F) - (hl & 0x04 ? 0 : 16)); + acc[1] += yh[l+0] * s[2] * ((int16_t)(q[l+ 0] & 0xF0) - (hl & 0x10 ? 0 : 256)) + + yh[l+4] * s[3] * ((int16_t)(q[l+16] & 0xF0) - (hl & 0x40 ? 
0 : 256)); + } + sumf[row] += d * (acc[0] + 1.f/16.f * acc[1]); + + q += step; + h += step; + s += step; + dh += step/2; + + } + + y += 8 * QK_K; + } +#endif + + for (int row = 0; row < 2; ++row) { + const float tot = simd_sum(sumf[row]); + if (tiisg == 0) { + dst[r1*ne0 + r2*ne0*ne1 + first_row + row] = tot; + } + } + +} + +kernel void kernel_mul_mv_q6_K_f32( + device const void * src0, + device const float * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne01[[buffer(4)]], + constant int64_t & ne02[[buffer(5)]], + constant int64_t & ne10[[buffer(9)]], + constant int64_t & ne12[[buffer(11)]], + constant int64_t & ne0[[buffer(15)]], + constant int64_t & ne1[[buffer(16)]], + constant uint & gqa[[buffer(17)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiisg[[thread_index_in_simdgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + const uint8_t kmask1 = 0x03; + const uint8_t kmask2 = 0x0C; + const uint8_t kmask3 = 0x30; + const uint8_t kmask4 = 0xC0; + + const int nb = ne00/QK_K; + + const int64_t r0 = tgpig.x; + const int64_t r1 = tgpig.y; + const int r2 = tgpig.z; + + const int row = 2 * r0 + sgitg; + const uint offset0 = r2/gqa*(nb*ne0); + device const block_q6_K * x = (device const block_q6_K *) src0 + row * nb + offset0; + device const float * yy = (device const float *) src1 + r1*ne10 + r2*ne00*ne1; + + float sumf = 0; + +#if QK_K == 256 + const int tid = tiisg/2; + const int ix = tiisg%2; + const int ip = tid/8; // 0 or 1 + const int il = tid%8; + const int n = 4; + const int l0 = n*il; + const int is = 8*ip + l0/16; + + const int y_offset = 128*ip + l0; + const int q_offset_l = 64*ip + l0; + const int q_offset_h = 32*ip + l0; + + for (int i = ix; i < nb; i += 2) { + + device const uint8_t * q1 = x[i].ql + q_offset_l; + device const uint8_t * q2 = q1 + 32; + device const uint8_t * qh = x[i].qh + q_offset_h; + device const int8_t * sc = x[i].scales + is; + + device const float * y = yy + i * QK_K + y_offset; + + const float dall = x[i].d; + + float4 sums = {0.f, 0.f, 0.f, 0.f}; + for (int l = 0; l < n; ++l) { + sums[0] += y[l+ 0] * ((int8_t)((q1[l] & 0xF) | ((qh[l] & kmask1) << 4)) - 32); + sums[1] += y[l+32] * ((int8_t)((q2[l] & 0xF) | ((qh[l] & kmask2) << 2)) - 32); + sums[2] += y[l+64] * ((int8_t)((q1[l] >> 4) | ((qh[l] & kmask3) << 0)) - 32); + sums[3] += y[l+96] * ((int8_t)((q2[l] >> 4) | ((qh[l] & kmask4) >> 2)) - 32); + } + + sumf += dall * (sums[0] * sc[0] + sums[1] * sc[2] + sums[2] * sc[4] + sums[3] * sc[6]); + + } + +#else + const int ix = tiisg/4; + const int il = 4*(tiisg%4); + + for (int i = ix; i < nb; i += 8) { + device const float * y = yy + i * QK_K + il; + device const uint8_t * ql = x[i].ql + il; + device const uint8_t * qh = x[i].qh + il; + device const int8_t * s = x[i].scales; + + const float d = x[i].d; + + float4 sums = {0.f, 0.f, 0.f, 0.f}; + for (int l = 0; l < 4; ++l) { + sums[0] += y[l+ 0] * ((int8_t)((ql[l+ 0] & 0xF) | ((qh[l] & kmask1) << 4)) - 32); + sums[1] += y[l+16] * ((int8_t)((ql[l+16] & 0xF) | ((qh[l] & kmask2) << 2)) - 32); + sums[2] += y[l+32] * ((int8_t)((ql[l+ 0] >> 4) | ((qh[l] & kmask3) >> 0)) - 32); + sums[3] += y[l+48] * ((int8_t)((ql[l+16] >> 4) | ((qh[l] & kmask4) >> 2)) - 32); + } + sumf += d * (sums[0] * s[0] + sums[1] * s[1] + sums[2] * s[2] + sums[3] * s[3]); + } + +#endif + + const float tot = simd_sum(sumf); + if (tiisg == 0) { + dst[r1*ne0 + r2*ne0*ne1 + row] = tot; + } +} + +//============================= templates and their specializations ============================= + +// 
NOTE: this is not dequantizing - we are simply fitting the template +template +void dequantize_f32(device const float4x4 * src, short il, thread type4x4 & reg) { + float4x4 temp = *(((device float4x4 *)src)); + for (int i = 0; i < 16; i++){ + reg[i/4][i%4] = temp[i/4][i%4]; + } +} + +template +void dequantize_f16(device const half4x4 * src, short il, thread type4x4 & reg) { + half4x4 temp = *(((device half4x4 *)src)); + for (int i = 0; i < 16; i++){ + reg[i/4][i%4] = temp[i/4][i%4]; + } +} + +template +void dequantize_q4_0(device const block_q4_0 *xb, short il, thread type4x4 & reg) { + device const uint16_t * qs = ((device const uint16_t *)xb + 1); + const float d1 = il ? (xb->d / 16.h) : xb->d; + const float d2 = d1 / 256.f; + const float md = -8.h * xb->d; + const ushort mask0 = il ? 0x00F0 : 0x000F; + const ushort mask1 = mask0 << 8; + + for (int i=0;i<8;i++) { + reg[i/2][2*(i%2)+0] = d1 * (qs[i] & mask0) + md; + reg[i/2][2*(i%2)+1] = d2 * (qs[i] & mask1) + md; + } +} + +template +void dequantize_q4_1(device const block_q4_1 *xb, short il, thread type4x4 & reg) { + device const uint16_t * qs = ((device const uint16_t *)xb + 2); + const float d1 = il ? (xb->d / 16.h) : xb->d; + const float d2 = d1 / 256.f; + const float m = xb->m; + const ushort mask0 = il ? 0x00F0 : 0x000F; + const ushort mask1 = mask0 << 8; + + for (int i=0;i<8;i++) { + reg[i/2][2*(i%2)+0] = ((qs[i] & mask0) * d1) + m; + reg[i/2][2*(i%2)+1] = ((qs[i] & mask1) * d2) + m; + } +} + +template +void dequantize_q5_0(device const block_q5_0 *xb, short il, thread type4x4 & reg) { + device const uint16_t * qs = ((device const uint16_t *)xb + 3); + const float d = xb->d; + const float md = -16.h * xb->d; + const ushort mask = il ? 0x00F0 : 0x000F; + + const uint32_t qh = *((device const uint32_t *)xb->qh); + + const int x_mv = il ? 4 : 0; + + const int gh_mv = il ? 12 : 0; + const int gh_bk = il ? 0 : 4; + + for (int i = 0; i < 8; i++) { + // extract the 5-th bits for x0 and x1 + const uint8_t xh_0 = ((qh >> (gh_mv + 2*i )) << gh_bk) & 0x10; + const uint8_t xh_1 = ((qh >> (gh_mv + 2*i+1)) << gh_bk) & 0x10; + + // combine the 4-bits from qs with the 5th bit + const int32_t x0 = ((((qs[i] ) & mask) >> x_mv) | xh_0); + const int32_t x1 = ((((qs[i] >> 8) & mask) >> x_mv) | xh_1); + + reg[i/2][2*(i%2)+0] = d * x0 + md; + reg[i/2][2*(i%2)+1] = d * x1 + md; + } +} + +template +void dequantize_q5_1(device const block_q5_1 *xb, short il, thread type4x4 & reg) { + device const uint16_t * qs = ((device const uint16_t *)xb + 4); + const float d = xb->d; + const float m = xb->m; + const ushort mask = il ? 0x00F0 : 0x000F; + + const uint32_t qh = *((device const uint32_t *)xb->qh); + + const int x_mv = il ? 4 : 0; + + const int gh_mv = il ? 12 : 0; + const int gh_bk = il ? 
0 : 4; + + for (int i = 0; i < 8; i++) { + // extract the 5-th bits for x0 and x1 + const uint8_t xh_0 = ((qh >> (gh_mv + 2*i )) << gh_bk) & 0x10; + const uint8_t xh_1 = ((qh >> (gh_mv + 2*i+1)) << gh_bk) & 0x10; + + // combine the 4-bits from qs with the 5th bit + const int32_t x0 = ((((qs[i] ) & mask) >> x_mv) | xh_0); + const int32_t x1 = ((((qs[i] >> 8) & mask) >> x_mv) | xh_1); + + reg[i/2][2*(i%2)+0] = d * x0 + m; + reg[i/2][2*(i%2)+1] = d * x1 + m; + } +} + +template +void dequantize_q8_0(device const block_q8_0 *xb, short il, thread type4x4 & reg) { + device const int8_t * qs = ((device const int8_t *)xb->qs); + const half d = xb->d; + + for (int i=0;i<16;i++) { + reg[i/4][i%4] = (qs[i + 16*il] * d); + } +} + +template +void dequantize_q2_K(device const block_q2_K *xb, short il, thread type4x4 & reg) { + const half d = xb->d; + const half min = xb->dmin; + device const uint8_t * q = (device const uint8_t *)xb->qs; + half dl, ml; + uint8_t sc = xb->scales[il]; + +#if QK_K == 256 + q = q + 32*(il/8) + 16*(il&1); + il = (il/2)%4; +#endif + half coef = il>1 ? (il>2 ? 1/64.h : 1/16.h) : (il>0 ? 1/4.h : 1.h); + uchar mask = il>1 ? (il>2 ? 192 : 48) : (il>0 ? 12 : 3); + dl = d * (sc & 0xF) * coef, ml = min * (sc >> 4); + for (int i = 0; i < 16; ++i) { + reg[i/4][i%4] = dl * (q[i] & mask) - ml; + } +} + +template +void dequantize_q3_K(device const block_q3_K *xb, short il, thread type4x4 & reg) { + const half d_all = xb->d; + device const uint8_t * q = (device const uint8_t *)xb->qs; + device const uint8_t * h = (device const uint8_t *)xb->hmask; + device const int8_t * scales = (device const int8_t *)xb->scales; + +#if QK_K == 256 + q = q + 32 * (il/8) + 16 * (il&1); + h = h + 16 * (il&1); + uint8_t m = 1 << (il/2); + uint16_t kmask1 = (il/4)>1 ? ((il/4)>2 ? 192 : 48) : \ + ((il/4)>0 ? 12 : 3); + uint16_t kmask2 = il/8 ? 0xF0 : 0x0F; + uint16_t scale_2 = scales[il%8], scale_1 = scales[8 + il%4]; + int16_t dl_int = (il/4)&1 ? (scale_2&kmask2) | ((scale_1&kmask1) << 2) + : (scale_2&kmask2) | ((scale_1&kmask1) << 4); + half dl = il<8 ? d_all * (dl_int - 32.h) : d_all * (dl_int / 16.h - 32.h); + const half ml = 4.h * dl; + + il = (il/2) & 3; + const half coef = il>1 ? (il>2 ? 1/64.h : 1/16.h) : (il>0 ? 1/4.h : 1.h); + const uint8_t mask = il>1 ? (il>2 ? 192 : 48) : (il>0 ? 12 : 3); + dl *= coef; + + for (int i = 0; i < 16; ++i) { + reg[i/4][i%4] = dl * (q[i] & mask) - (h[i] & m ? 0 : ml); + } +#else + float kcoef = il&1 ? 1.f/16.f : 1.f; + uint16_t kmask = il&1 ? 0xF0 : 0x0F; + float dl = d_all * ((scales[il/2] & kmask) * kcoef - 8); + float coef = il>1 ? (il>2 ? 1/64.h : 1/16.h) : (il>0 ? 1/4.h : 1.h); + uint8_t mask = il>1 ? (il>2 ? 192 : 48) : (il>0 ? 12 : 3); + uint8_t m = 1<<(il*2); + for (int i = 0; i < 16; ++i) { + reg[i/4][i%4] = coef * dl * ((q[i] & mask) - ((h[i%8] & (m * (1 + i/8))) ? 0 : 4.f/coef)); + } +#endif +} + +static inline uchar2 get_scale_min_k4_just2(int j, int k, device const uchar * q) { + return j < 4 ? uchar2{uchar(q[j+0+k] & 63), uchar(q[j+4+k] & 63)} + : uchar2{uchar((q[j+4+k] & 0xF) | ((q[j-4+k] & 0xc0) >> 2)), uchar((q[j+4+k] >> 4) | ((q[j-0+k] & 0xc0) >> 2))}; +} + +template +void dequantize_q4_K(device const block_q4_K *xb, short il, thread type4x4 & reg) { + device const uchar * q = xb->qs; + +#if QK_K == 256 + short is = (il/4) * 2; + q = q + (il/4) * 32 + 16 * (il&1); + il = il & 3; + const uchar2 sc = get_scale_min_k4_just2(is, il/2, xb->scales); + const half d = il < 2 ? 
xb->d : xb->d / 16.h; + const half min = xb->dmin; + const half dl = d * sc[0]; + const half ml = min * sc[1]; +#else + q = q + 16 * (il&1); + device const uint8_t * s = xb->scales; + device const half2 * dh = (device const half2 *)xb->d; + const float2 d = (float2)dh[0]; + const float dl = il<2 ? d[0] * (s[0]&0xF) : d[0] * (s[1]&0xF)/16.h; + const float ml = il<2 ? d[1] * (s[0]>>4) : d[1] * (s[1]>>4); +#endif + const ushort mask = il<2 ? 0x0F : 0xF0; + for (int i = 0; i < 16; ++i) { + reg[i/4][i%4] = dl * (q[i] & mask) - ml; + } +} + +template +void dequantize_q5_K(device const block_q5_K *xb, short il, thread type4x4 & reg) { + device const uint8_t * q = xb->qs; + device const uint8_t * qh = xb->qh; + +#if QK_K == 256 + short is = (il/4) * 2; + q = q + 32 * (il/4) + 16 * (il&1); + qh = qh + 16 * (il&1); + uint8_t ul = 1 << (il/2); + il = il & 3; + const uchar2 sc = get_scale_min_k4_just2(is, il/2, xb->scales); + const half d = il < 2 ? xb->d : xb->d / 16.h; + const half min = xb->dmin; + const half dl = d * sc[0]; + const half ml = min * sc[1]; + + const ushort mask = il<2 ? 0x0F : 0xF0; + const half qh_val = il<2 ? 16.h : 256.h; + for (int i = 0; i < 16; ++i) { + reg[i/4][i%4] = dl * ((q[i] & mask) + (qh[i] & ul ? qh_val : 0)) - ml; + } +#else + q = q + 16 * (il&1); + device const int8_t * s = xb->scales; + const float dl = xb->d * s[il]; + uint8_t m = 1<<(il*2); + const float coef = il<2 ? 1.f : 1.f/16.f; + const ushort mask = il<2 ? 0x0F : 0xF0; + for (int i = 0; i < 16; ++i) { + reg[i/4][i%4] = coef * dl * ((q[i] & mask) - (qh[i%8] & (m*(1+i/8)) ? 0.f : 16.f/coef)); + } +#endif +} + +template +void dequantize_q6_K(device const block_q6_K *xb, short il, thread type4x4 & reg) { + const half d_all = xb->d; + device const uint8_t * ql = (device const uint8_t *)xb->ql; + device const uint8_t * qh = (device const uint8_t *)xb->qh; + device const int8_t * scales = (device const int8_t *)xb->scales; + +#if QK_K == 256 + ql = ql + 64*(il/8) + 32*((il/2)&1) + 16*(il&1); + qh = qh + 32*(il/8) + 16*(il&1); + half sc = scales[(il%2) + 2 * ((il/2))]; + il = (il/2) & 3; +#else + ql = ql + 16 * (il&1); + half sc = scales[il]; +#endif + const uint16_t kmask1 = il>1 ? (il>2 ? 192 : 48) : (il>0 ? 12 : 3); + const uint16_t kmask2 = il>1 ? 0xF0 : 0x0F; + const half coef = il>1 ? 1.f/16.h : 1.h; + const half ml = d_all * sc * 32.h; + const half dl = d_all * sc * coef; + for (int i = 0; i < 16; ++i) { + const half q = il&1 ? 
((ql[i] & kmask2) | ((qh[i] & kmask1) << 2)) + : ((ql[i] & kmask2) | ((qh[i] & kmask1) << 4)); + reg[i/4][i%4] = dl * q - ml; + } +} + +template +kernel void kernel_get_rows( + device const void * src0, + device const int * src1, + device float * dst, + constant int64_t & ne00, + constant uint64_t & nb01, + constant uint64_t & nb1, + uint tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint tptg[[threads_per_threadgroup]]) { + const int i = tgpig; + const int r = ((device int32_t *) src1)[i]; + + for (int ind = tiitg; ind < ne00/16; ind += tptg) { + float4x4 temp; + dequantize_func( + ((device const block_q *) ((device char *) src0 + r*nb01)) + ind/nl, ind%nl, temp); + *(((device float4x4 *) ((device char *) dst + i*nb1)) + ind) = temp; + } +} + +#define BLOCK_SIZE_M 64 // 8 simdgroup matrices from matrix A +#define BLOCK_SIZE_N 32 // 4 simdgroup matrices from matrix B +#define BLOCK_SIZE_K 32 +#define THREAD_MAT_M 4 // each thread take 4 simdgroup matrices from matrix A +#define THREAD_MAT_N 2 // each thread take 2 simdgroup matrices from matrix B +#define THREAD_PER_BLOCK 128 +#define THREAD_PER_ROW 2 // 2 thread for each row in matrix A to load numbers +#define THREAD_PER_COL 4 // 4 thread for each row in matrix B to load numbers +#define SG_MAT_SIZE 64 // simdgroup matrix is of shape 8x8 +#define SG_MAT_ROW 8 + +// each block_q contains 16*nl weights +template +kernel void kernel_mul_mm(device const uchar * src0, + device const uchar * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne02, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & ne12, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & gqa, + threadgroup uchar * shared_memory [[threadgroup(0)]], + uint3 tgpig[[threadgroup_position_in_grid]], + uint tiitg[[thread_index_in_threadgroup]], + uint sgitg[[simdgroup_index_in_threadgroup]]) { + + threadgroup half * sa = (threadgroup half *)(shared_memory); + threadgroup float * sb = (threadgroup float *)(shared_memory + 4096); + + const uint r0 = tgpig.y; + const uint r1 = tgpig.x; + const uint im = tgpig.z; + + // if this block is of 64x32 shape or smaller + short n_rows = (ne0 - r0 * BLOCK_SIZE_M < BLOCK_SIZE_M) ? (ne0 - r0 * BLOCK_SIZE_M) : BLOCK_SIZE_M; + short n_cols = (ne1 - r1 * BLOCK_SIZE_N < BLOCK_SIZE_N) ? (ne1 - r1 * BLOCK_SIZE_N) : BLOCK_SIZE_N; + + // a thread shouldn't load data outside of the matrix + short thread_row = ((short)tiitg/THREAD_PER_ROW) < n_rows ? ((short)tiitg/THREAD_PER_ROW) : n_rows - 1; + short thread_col = ((short)tiitg/THREAD_PER_COL) < n_cols ? 
((short)tiitg/THREAD_PER_COL) : n_cols - 1; + + simdgroup_half8x8 ma[4]; + simdgroup_float8x8 mb[2]; + simdgroup_float8x8 c_res[8]; + for (int i = 0; i < 8; i++){ + c_res[i] = make_filled_simdgroup_matrix(0.f); + } + + short il = (tiitg % THREAD_PER_ROW); + + uint offset0 = im/gqa*nb02; + ushort offset1 = il/nl; + + device const block_q * x = (device const block_q *)(src0 + (r0 * BLOCK_SIZE_M + thread_row) * nb01 + offset0) + offset1; + device const float * y = (device const float *)(src1 + + nb12 * im + + nb11 * (r1 * BLOCK_SIZE_N + thread_col) + + nb10 * (BLOCK_SIZE_K / THREAD_PER_COL * (tiitg % THREAD_PER_COL))); + + for (int loop_k = 0; loop_k < ne00; loop_k += BLOCK_SIZE_K) { + // load data and store to threadgroup memory + half4x4 temp_a; + dequantize_func(x, il, temp_a); + threadgroup_barrier(mem_flags::mem_threadgroup); + + #pragma unroll(16) + for (int i = 0; i < 16; i++) { + *(sa + SG_MAT_SIZE * ((tiitg / THREAD_PER_ROW / 8) \ + + (tiitg % THREAD_PER_ROW) * 16 + (i / 8) * 8) \ + + (tiitg / THREAD_PER_ROW) % 8 + (i & 7) * 8) = temp_a[i/4][i%4]; + } + + *(threadgroup float2x4 *)(sb + (tiitg % THREAD_PER_COL) * 8 * 32 + 8 * (tiitg / THREAD_PER_COL)) = *((device float2x4 *)y); + + il = (il + 2 < nl) ? il + 2 : il % 2; + x = (il < 2) ? x + (2+nl-1)/nl : x; + y += BLOCK_SIZE_K; + + threadgroup_barrier(mem_flags::mem_threadgroup); + + // load matrices from threadgroup memory and conduct outer products + threadgroup half * lsma = (sa + THREAD_MAT_M * SG_MAT_SIZE * (sgitg % 2)); + threadgroup float * lsmb = (sb + THREAD_MAT_N * SG_MAT_SIZE * (sgitg / 2)); + + #pragma unroll(4) + for (int ik = 0; ik < BLOCK_SIZE_K / 8; ik++) { + #pragma unroll(4) + for (int i = 0; i < 4; i++) { + simdgroup_load(ma[i],lsma + SG_MAT_SIZE * i); + } + simdgroup_barrier(mem_flags::mem_none); + #pragma unroll(2) + for (int i = 0; i < 2; i++) { + simdgroup_load(mb[i],lsmb + SG_MAT_SIZE * i); + } + + lsma += BLOCK_SIZE_M / SG_MAT_ROW * SG_MAT_SIZE; + lsmb += BLOCK_SIZE_N / SG_MAT_ROW * SG_MAT_SIZE; + + #pragma unroll(8) + for (int i = 0; i < 8; i++){ + simdgroup_multiply_accumulate(c_res[i], mb[i/4], ma[i%4], c_res[i]); + } + } + } + + if ((r0 + 1) * BLOCK_SIZE_M <= ne0 && (r1 + 1) * BLOCK_SIZE_N <= ne1) { + device float * C = dst + (BLOCK_SIZE_M * r0 + 32 * (sgitg & 1)) \ + + (BLOCK_SIZE_N * r1 + 16 * (sgitg >> 1)) * ne0 + im*ne1*ne0; + for (int i = 0; i < 8; i++) { + simdgroup_store(c_res[i], C + 8 * (i%4) + 8 * ne0 * (i/4), ne0); + } + } else { + // block is smaller than 64x32, we should avoid writing data outside of the matrix + threadgroup_barrier(mem_flags::mem_threadgroup); + threadgroup float * temp_str = ((threadgroup float *)shared_memory) \ + + 32 * (sgitg&1) + (16 * (sgitg>>1)) * BLOCK_SIZE_M; + for (int i = 0; i < 8; i++) { + simdgroup_store(c_res[i], temp_str + 8 * (i%4) + 8 * BLOCK_SIZE_M * (i/4), BLOCK_SIZE_M); + } + + threadgroup_barrier(mem_flags::mem_threadgroup); + + device float * C = dst + (BLOCK_SIZE_M * r0) + (BLOCK_SIZE_N * r1) * ne0 + im*ne1*ne0; + if (sgitg == 0) { + for (int i = 0; i < n_rows; i++) { + for (int j = tiitg; j < n_cols; j += BLOCK_SIZE_N) { + *(C + i + j * ne0) = *(temp_str + i + j * BLOCK_SIZE_M); + } + } + } + } +} + +#if QK_K == 256 +#define QK_NL 16 +#else +#define QK_NL 4 +#endif + +typedef void (get_rows_t)(device const void *, device const int *, device float *, constant int64_t &, \ + constant uint64_t &, constant uint64_t &, uint, uint, uint); + +template [[host_name("kernel_get_rows_f32")]] kernel get_rows_t kernel_get_rows; +template 
[[host_name("kernel_get_rows_f16")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q4_0")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q4_1")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q5_0")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q5_1")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q8_0")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q2_K")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q3_K")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q4_K")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q5_K")]] kernel get_rows_t kernel_get_rows; +template [[host_name("kernel_get_rows_q6_K")]] kernel get_rows_t kernel_get_rows; + +typedef void (mat_mm_t)( + device const uchar * src0, + device const uchar * src1, + device float * dst, + constant int64_t & ne00, + constant int64_t & ne02, + constant int64_t & nb01, + constant int64_t & nb02, + constant int64_t & ne12, + constant int64_t & nb10, + constant int64_t & nb11, + constant int64_t & nb12, + constant int64_t & ne0, + constant int64_t & ne1, + constant uint & gqa, + threadgroup uchar *, uint3, uint, uint); + +template [[host_name("kernel_mul_mm_f32_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_f16_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q4_0_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q4_1_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q5_0_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q5_1_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q8_0_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q2_K_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q3_K_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q4_K_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q5_K_f32")]] kernel mat_mm_t kernel_mul_mm; +template [[host_name("kernel_mul_mm_q6_K_f32")]] kernel mat_mm_t kernel_mul_mm; diff --git a/enjoy/lib/whisper.cpp/arm64/darwin/main b/enjoy/lib/whisper.cpp/arm64/darwin/main new file mode 100755 index 00000000..642b5f06 Binary files /dev/null and b/enjoy/lib/whisper.cpp/arm64/darwin/main differ diff --git a/enjoy/lib/whisper.cpp/arm64/darwin/quantize b/enjoy/lib/whisper.cpp/arm64/darwin/quantize new file mode 100755 index 00000000..ec7ff498 Binary files /dev/null and b/enjoy/lib/whisper.cpp/arm64/darwin/quantize differ diff --git a/enjoy/lib/whisper.cpp/x64/darwin/bench b/enjoy/lib/whisper.cpp/x64/darwin/bench new file mode 100755 index 00000000..689596c1 Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/darwin/bench differ diff --git a/enjoy/lib/whisper.cpp/x64/darwin/main b/enjoy/lib/whisper.cpp/x64/darwin/main new file mode 100755 index 00000000..42426045 Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/darwin/main differ diff --git a/enjoy/lib/whisper.cpp/x64/darwin/quantize b/enjoy/lib/whisper.cpp/x64/darwin/quantize new file mode 100755 index 00000000..0531a529 Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/darwin/quantize differ diff --git a/enjoy/lib/whisper.cpp/x64/linux/bench b/enjoy/lib/whisper.cpp/x64/linux/bench new file mode 100755 index 00000000..09dcafa8 Binary 
files /dev/null and b/enjoy/lib/whisper.cpp/x64/linux/bench differ diff --git a/enjoy/lib/whisper.cpp/x64/linux/main b/enjoy/lib/whisper.cpp/x64/linux/main new file mode 100755 index 00000000..7c4b3ba8 Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/linux/main differ diff --git a/enjoy/lib/whisper.cpp/x64/linux/quantize b/enjoy/lib/whisper.cpp/x64/linux/quantize new file mode 100755 index 00000000..2880bb5f Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/linux/quantize differ diff --git a/enjoy/lib/whisper.cpp/x64/win32/SDL2.dll b/enjoy/lib/whisper.cpp/x64/win32/SDL2.dll new file mode 100644 index 00000000..98b4d6b8 Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/win32/SDL2.dll differ diff --git a/enjoy/lib/whisper.cpp/x64/win32/bench.exe b/enjoy/lib/whisper.cpp/x64/win32/bench.exe new file mode 100644 index 00000000..61534e8d Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/win32/bench.exe differ diff --git a/enjoy/lib/whisper.cpp/x64/win32/main.exe b/enjoy/lib/whisper.cpp/x64/win32/main.exe new file mode 100644 index 00000000..ba81cd26 Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/win32/main.exe differ diff --git a/enjoy/lib/whisper.cpp/x64/win32/quantize.exe b/enjoy/lib/whisper.cpp/x64/win32/quantize.exe new file mode 100644 index 00000000..30dfd4fc Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/win32/quantize.exe differ diff --git a/enjoy/lib/whisper.cpp/x64/win32/whisper.dll b/enjoy/lib/whisper.cpp/x64/win32/whisper.dll new file mode 100644 index 00000000..e94b0f11 Binary files /dev/null and b/enjoy/lib/whisper.cpp/x64/win32/whisper.dll differ diff --git a/enjoy/lib/youtubedr/LICENSE b/enjoy/lib/youtubedr/LICENSE new file mode 100644 index 00000000..d252f533 --- /dev/null +++ b/enjoy/lib/youtubedr/LICENSE @@ -0,0 +1,22 @@ +The MIT License (MIT) + +Copyright (c) 2015 Evan Lin + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
+ diff --git a/enjoy/lib/youtubedr/README.md b/enjoy/lib/youtubedr/README.md new file mode 100644 index 00000000..df5cb5e2 --- /dev/null +++ b/enjoy/lib/youtubedr/README.md @@ -0,0 +1,161 @@ +Download Youtube Video in Golang +================== + +[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/kkdai/youtube/master/LICENSE) +[![Go Reference](https://pkg.go.dev/badge/github.com/kkdai/youtube.svg)](https://pkg.go.dev/github.com/kkdai/youtube/v2) +[![Build Status](https://github.com/kkdai/youtube/workflows/go/badge.svg?branch=master)](https://github.com/kkdai/youtube/actions) +[![Coverage](https://codecov.io/gh/kkdai/youtube/branch/master/graph/badge.svg)](https://codecov.io/gh/kkdai/youtube) +[![](https://goreportcard.com/badge/github.com/kkdai/youtube)](https://goreportcard.com/badge/github.com/kkdai/youtube) + + +This package is a Youtube video download package, for more detail refer [https://github.com/ytdl-org/youtube-dl](https://github.com/ytdl-org/youtube-dl) for more download options. + +This tool is meant to be used to download CC0 licenced content, we do not support nor recommend using it for illegal activities. + +## Overview + * [Install](#installation) + * [Usage](#usage) + * [Example: Download video from \[dotGo 2015 - Rob Pike - Simplicity is Complicated\]](#download-dotGo-2015-rob-pike-video) + +## Installation + +### Install via go get + +Please ensure you have installed Go 1.18 or later. + +```shell +go get github.com/kkdai/youtube/v2 +``` + +### From source code + +```shell +git clone https://github.com/kkdai/youtube.git +cd youtube +go run ./cmd/youtubedr +``` + +### Mac + +```shell +brew install youtubedr +``` + +### in Termux +```shell +pkg install youtubedr +``` +### You can also find this package in +- [archlinux](https://aur.archlinux.org/packages/youtubedr/) (thanks to [cjsthompson](https://github.com/cjsthompson)) +- [Termux package](https://github.com/termux/termux-packages/tree/master/packages/youtubedr) (thanks to [kcubeterm](https://github.com/kcubeterm)) +- [Homebrew](https://formulae.brew.sh/formula/youtubedr) (thanks to [kkc](https://github.com/kkc)) + +## Usage + +### Use the binary directly +It's really simple to use, just get the video id from youtube url - ex: `https://www.youtube.com/watch?v=rFejpH_tAHM`, the video id is `rFejpH_tAHM` + +```shell +$ youtubedr download QAGDGja7kbs +$ youtubedr download https://www.youtube.com/watch?v=rFejpH_tAHM +``` + + +### Use this package in your golang program + +Please check out the [example_test.go](example_test.go) for example code. + + +## Example: + * ### Get information of dotGo-2015-rob-pike video for downloading + + `go get github.com/kkdai/youtube/v2/youtubedr` + + Download video from [dotGo 2015 - Rob Pike - Simplicity is Complicated](https://www.youtube.com/watch?v=rFejpH_tAHM) + + ``` + youtubedr info https://www.youtube.com/watch?v=rFejpH_tAHM + + Title: dotGo 2015 - Rob Pike - Simplicity is Complicated + Author: dotconferences + -----available streams----- + itag: 18 , quality: medium , type: video/mp4; codecs="avc1.42001E, mp4a.40.2" + itag: 22 , quality: hd720 , type: video/mp4; codecs="avc1.64001F, mp4a.40.2" + itag: 137 , quality: hd1080 , type: video/mp4; codecs="avc1.640028" + itag: 248 , quality: hd1080 , type: video/webm; codecs="vp9" + ........ 
+ ``` + * ### Download dotGo-2015-rob-pike-video + + `go get github.com/kkdai/youtube/v2/youtubedr` + + Download video from [dotGo 2015 - Rob Pike - Simplicity is Complicated](https://www.youtube.com/watch?v=rFejpH_tAHM) + + ``` + youtubedr download https://www.youtube.com/watch?v=rFejpH_tAHM + ``` + + * ### Download video to specific folder and name + + `go get github.com/kkdai/youtube/v2/youtubedr` + + Download video from [dotGo 2015 - Rob Pike - Simplicity is Complicated](https://www.youtube.com/watch?v=rFejpH_tAHM) to current directory and name the file to simplicity-is-complicated.mp4 + + ``` + youtubedr download -d ./ -o simplicity-is-complicated.mp4 https://www.youtube.com/watch?v=rFejpH_tAHM + ``` + + * ### Download video with specific quality + + `go get github.com/kkdai/youtube/v2/youtubedr` + + Download video from [dotGo 2015 - Rob Pike - Simplicity is Complicated](https://www.youtube.com/watch?v=rFejpH_tAHM) with specific quality + + ``` + youtubedr download -q medium https://www.youtube.com/watch?v=rFejpH_tAHM + ``` + + #### Special case by quality hd1080: + Installation of ffmpeg is necessary for hd1080 + ``` + ffmpeg //check ffmpeg is installed, if not please download ffmpeg and set to your PATH. + youtubedr download -q hd1080 https://www.youtube.com/watch?v=rFejpH_tAHM + ``` + + + * ### Download video with specific itag + + `go get github.com/kkdai/youtube/v2/youtubedr` + + Download video from [dotGo 2015 - Rob Pike - Simplicity is Complicated](https://www.youtube.com/watch?v=rFejpH_tAHM) + + ``` + youtubedr download -q 18 https://www.youtube.com/watch?v=rFejpH_tAHM + ``` + +## How it works + +- Parse the video ID you input in URL + - ex: `https://www.youtube.com/watch?v=rFejpH_tAHM`, the video id is `rFejpH_tAHM` +- Get video information via video id. + - Use URL: `http://youtube.com/get_video_info?video_id=` +- Parse and decode video information. + - Download URL in "url=" + - title in "title=" +- Download video from URL + - Need the string combination of "url" + +## Inspired +- [https://github.com/ytdl-org/youtube-dl](https://github.com/ytdl-org/youtube-dl) +- [https://github.com/lepidosteus/youtube-dl](https://github.com/lepidosteus/youtube-dl) +- [拆解 Youtube 影片下載位置](http://hkgoldenmra.blogspot.tw/2013/05/youtube.html) +- [iawia002/annie](https://github.com/iawia002/annie) +- [How to get url from obfuscate video info: youtube video downloader with php](https://stackoverflow.com/questions/60607291/youtube-video-downloader-with-php) + + +## Project52 +It is one of my [project 52](https://github.com/kkdai/project52). + + +## License +This package is licensed under MIT license. See LICENSE for details. 
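The README above defers to `example_test.go` for programmatic use. As a rough sketch of what that looks like (assuming the v2 API the README documents — `youtube.Client`, `GetVideo`, `Formats.WithAudioChannels`, `GetStream`; the chosen format index and output filename are illustrative), downloading the talk referenced throughout the examples could be done like this:

```go
package main

import (
	"io"
	"log"
	"os"

	"github.com/kkdai/youtube/v2"
)

func main() {
	// Video ID parsed from https://www.youtube.com/watch?v=rFejpH_tAHM
	videoID := "rFejpH_tAHM"

	client := youtube.Client{}

	video, err := client.GetVideo(videoID)
	if err != nil {
		log.Fatal(err)
	}

	// Prefer a progressive format that carries both audio and video
	// (e.g. itag 18 or 22 from the `youtubedr info` output shown above).
	formats := video.Formats.WithAudioChannels()
	if len(formats) == 0 {
		log.Fatal("no progressive formats found")
	}

	stream, _, err := client.GetStream(video, &formats[0])
	if err != nil {
		log.Fatal(err)
	}
	defer stream.Close()

	out, err := os.Create("simplicity-is-complicated.mp4") // hypothetical output name
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, stream); err != nil {
		log.Fatal(err)
	}
}
```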
diff --git a/enjoy/lib/youtubedr/arm64/darwin/youtubedr b/enjoy/lib/youtubedr/arm64/darwin/youtubedr new file mode 100755 index 00000000..c69e13c7 Binary files /dev/null and b/enjoy/lib/youtubedr/arm64/darwin/youtubedr differ diff --git a/enjoy/lib/youtubedr/x64/darwin/youtubedr b/enjoy/lib/youtubedr/x64/darwin/youtubedr new file mode 100755 index 00000000..c8bd906e Binary files /dev/null and b/enjoy/lib/youtubedr/x64/darwin/youtubedr differ diff --git a/enjoy/lib/youtubedr/x64/linux/youtubedr b/enjoy/lib/youtubedr/x64/linux/youtubedr new file mode 100755 index 00000000..65de2596 Binary files /dev/null and b/enjoy/lib/youtubedr/x64/linux/youtubedr differ diff --git a/enjoy/lib/youtubedr/x64/win32/youtubedr.exe b/enjoy/lib/youtubedr/x64/win32/youtubedr.exe new file mode 100755 index 00000000..24de4a79 Binary files /dev/null and b/enjoy/lib/youtubedr/x64/win32/youtubedr.exe differ diff --git a/enjoy/package.json b/enjoy/package.json new file mode 100644 index 00000000..4255be1b --- /dev/null +++ b/enjoy/package.json @@ -0,0 +1,140 @@ +{ + "private": true, + "name": "enjoy", + "productName": "Enjoy", + "version": "0.1.0", + "description": "Enjoy desktop app", + "main": ".vite/build/main.js", + "types": "./src/types.d.ts", + "scripts": { + "dev": "WEB_API_URL=http://localhost:3000 electron-forge start", + "start": "electron-forge start", + "package": "electron-forge package", + "make": "electron-forge make", + "publish": "electron-forge publish", + "lint": "eslint --ext .ts,.tsx .", + "create-migration": "zx ./src/main/db/create-migration.mjs" + }, + "keywords": [], + "author": { + "name": "an-lee", + "email": "an.lee.work@gmail.com" + }, + "license": "MIT", + "devDependencies": { + "@electron-forge/cli": "^7.2.0", + "@electron-forge/maker-deb": "^7.2.0", + "@electron-forge/maker-rpm": "^7.2.0", + "@electron-forge/maker-squirrel": "^7.2.0", + "@electron-forge/maker-zip": "^7.2.0", + "@electron-forge/plugin-auto-unpack-natives": "^7.2.0", + "@electron-forge/plugin-vite": "^7.2.0", + "@tailwindcss/typography": "^0.5.10", + "@types/adm-zip": "^0.5.5", + "@types/autosize": "^4.0.3", + "@types/command-exists": "^1.2.3", + "@types/fluent-ffmpeg": "^2.1.24", + "@types/html-to-text": "^9.0.4", + "@types/lodash": "^4.14.202", + "@types/mark.js": "^8.11.12", + "@types/node": "^20.10.6", + "@types/react": "^18.2.46", + "@types/react-dom": "^18.2.18", + "@types/validator": "^13.11.7", + "@types/wavesurfer.js": "^6.0.12", + "@typescript-eslint/eslint-plugin": "^6.17.0", + "@typescript-eslint/parser": "^6.17.0", + "@vitejs/plugin-react": "^4.2.1", + "autoprefixer": "^10.4.16", + "electron": "^28.1.1", + "eslint": "^8.56.0", + "eslint-import-resolver-typescript": "^3.6.1", + "eslint-plugin-import": "^2.29.1", + "flora-colossus": "^2.0.0", + "octokit": "^3.1.2", + "tailwind-merge": "^2.2.0", + "tailwindcss": "^3.4.1", + "tailwindcss-animate": "^1.0.7", + "ts-node": "^10.9.2", + "tslib": "^2.6.2", + "typescript": "^5.3.3", + "vite-plugin-static-copy": "^1.0.0", + "zx": "^7.2.3" + }, + "dependencies": { + "@electron-forge/publisher-github": "^7.2.0", + "@hookform/resolvers": "^3.3.4", + "@langchain/google-genai": "^0.0.7", + "@mozilla/readability": "^0.5.0", + "@radix-ui/react-accordion": "^1.1.2", + "@radix-ui/react-alert-dialog": "^1.0.5", + "@radix-ui/react-aspect-ratio": "^1.0.3", + "@radix-ui/react-avatar": "^1.0.4", + "@radix-ui/react-dropdown-menu": "^2.0.6", + "@radix-ui/react-hover-card": "^1.0.7", + "@radix-ui/react-icons": "^1.3.0", + "@radix-ui/react-label": "^2.0.2", + "@radix-ui/react-menubar": 
"^1.0.4", + "@radix-ui/react-popover": "^1.0.7", + "@radix-ui/react-progress": "^1.0.3", + "@radix-ui/react-radio-group": "^1.1.3", + "@radix-ui/react-scroll-area": "^1.0.5", + "@radix-ui/react-select": "^2.0.0", + "@radix-ui/react-separator": "^1.0.3", + "@radix-ui/react-slider": "^1.1.2", + "@radix-ui/react-slot": "^1.0.2", + "@radix-ui/react-switch": "^1.0.3", + "@radix-ui/react-tabs": "^1.0.4", + "@radix-ui/react-toast": "^1.1.5", + "@radix-ui/react-toggle": "^1.0.3", + "@radix-ui/react-tooltip": "^1.0.7", + "@uidotdev/usehooks": "^2.4.1", + "@vidstack/react": "^1.9.8", + "adm-zip": "^0.5.10", + "autosize": "^6.0.1", + "axios": "^1.6.5", + "camelcase": "^8.0.0", + "camelcase-keys": "^9.1.2", + "cheerio": "^1.0.0-rc.12", + "class-variance-authority": "^0.7.0", + "clsx": "^2.1.0", + "command-exists": "^1.2.9", + "compromise": "^14.11.0", + "compromise-paragraphs": "^0.1.0", + "compromise-stats": "^0.1.0", + "dayjs": "^1.11.10", + "decamelize": "^6.0.0", + "decamelize-keys": "^2.0.1", + "electron-log": "^5.0.3", + "electron-settings": "^4.0.2", + "electron-squirrel-startup": "^1.0.0", + "fluent-ffmpeg": "^2.1.2", + "fs-extra": "^11.2.0", + "html-to-text": "^9.0.5", + "i18next": "^23.7.16", + "langchain": "^0.0.214", + "lodash": "^4.17.21", + "lucide-react": "^0.306.0", + "mark.js": "^8.11.1", + "microsoft-cognitiveservices-speech-sdk": "^1.34.0", + "openai": "^4.24.1", + "pitchfinder": "^2.3.2", + "postcss": "^8.4.33", + "react": "^18.2.0", + "react-activity-calendar": "^2.2.1", + "react-dom": "^18.2.0", + "react-hook-form": "^7.49.2", + "react-i18next": "^14.0.0", + "react-markdown": "^9.0.1", + "react-router-dom": "^6.21.1", + "react-tooltip": "^5.25.1", + "reflect-metadata": "^0.2.1", + "sequelize": "^6.35.2", + "sequelize-typescript": "^2.1.6", + "sqlite3": "^5.1.7", + "tailwind-scrollbar-hide": "^1.1.7", + "umzug": "^3.5.0", + "wavesurfer.js": "^7.6.1", + "zod": "^3.22.4" + } +} diff --git a/enjoy/postcss.config.js b/enjoy/postcss.config.js new file mode 100644 index 00000000..33ad091d --- /dev/null +++ b/enjoy/postcss.config.js @@ -0,0 +1,6 @@ +module.exports = { + plugins: { + tailwindcss: {}, + autoprefixer: {}, + }, +} diff --git a/enjoy/src/constants.ts b/enjoy/src/constants.ts new file mode 100644 index 00000000..0b3a869b --- /dev/null +++ b/enjoy/src/constants.ts @@ -0,0 +1,80 @@ +export const DATABASE_NAME = "enjoy_database"; +export const LIBRARY_PATH_SUFFIX = "EnjoyLibrary"; + +export const STORAGE_WORKER_ENDPOINT = "https://enjoy-storage.baizhiheizi.com"; +export const WEB_API_URL = "https://enjoy-web.fly.dev"; + +// https://huggingface.co/ggerganov/whisper.cpp/tree/main +export const WHISPER_MODELS_OPTIONS = [ + { + type: "tiny", + name: "ggml-tiny.en.bin", + size: "77.7 MB", + url: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin", + }, + { + type: "base", + name: "ggml-base.en.bin", + size: "148 MB", + url: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin", + }, + { + type: "small", + name: "ggml-small.en.bin", + size: "488 MB", + url: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.en.bin", + }, + { + type: "medium", + name: "ggml-medium.en.bin", + size: "1.53 GB", + url: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.en.bin", + }, + { + type: "large", + name: "ggml-large.bin", + size: "3.09 GB", + url: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large.bin", + }, +]; + +export const AudioFormats = ["mp3", "wav", "ogg", "flac", "m4a", "wma", "aac"]; + 
+export const VideoFormats = ["mp4", "mkv", "avi", "mov", "wmv", "flv", "webm"]; + +export const PROCESS_TIMEOUT = 1000 * 60 * 15; + +export const AI_GATEWAY_ENDPOINT = + "https://gateway.ai.cloudflare.com/v1/11d43ab275eb7e1b271ba4089ecc3864/enjoy"; + +export const CONVERSATION_PRESET_SCENARIOS: { + scenario: string; + autoSpeech: boolean; + prompt: string; +}[] = [ + { + scenario: "translation", + autoSpeech: false, + prompt: `Act as a translation machine that converts any language input I provide into fluent, idiomatic American English. If the input is already in English, refine it to sound like native American English. + +Suggestions: + +Ensure that the translation maintains the original meaning and tone of the input as much as possible. + +In case of English inputs, focus on enhancing clarity, grammar, and style to match American English standards. + +Return the translation only, no other words needed. + `, + }, + { + scenario: "vocal_coach", + autoSpeech: true, + prompt: `As an AI English vocal coach with an American accent, engage in a conversation with me to help improve my spoken English skills. Use the appropriate tone and expressions that a native American English speaker would use, keeping in mind that your responses will be converted to audio. + +Suggestions: + +Use common American idioms and phrases to give a more authentic experience of American English. +Provide corrections and suggestions for improvement in a supportive and encouraging manner. +Use a variety of sentence structures and vocabulary to expose me to different aspects of the language.`, + }, +]; diff --git a/enjoy/src/i18n/en.json b/enjoy/src/i18n/en.json new file mode 100644 index 00000000..6f9ab3b2 --- /dev/null +++ b/enjoy/src/i18n/en.json @@ -0,0 +1,316 @@ +{ + "models": { + "user": { + "id": "ID", + "name": "Name" + }, + "audio": { + "name": "name", + "namePlaceholder": "name your audio", + "type": "type", + "description": "description", + "descriptionPlaceholder": "describe your audio", + "format": "format", + "duration": "duration", + "size": "size", + "source": "source", + "createdAt": "created at", + "recordingsCount": "recordings count", + "recordingsDuration": "recordings duration", + "isTranscribed": "transcribed", + "added": "Successfully added audio", + "removed": "Successfully removed audio", + "notFound": "Video not found", + "failedToAdd": "Failed to add audio, {{ error }}", + "fileNotFound": "File not found {{file}}", + "fileNotSupported": "File not supported {{file}}", + "failedToCopyFile": "Failed to copy file {{file}}", + "failedToDownloadFile": "Failed to download file {{file}}", + "transcriptionFinished": "Transcription finished" + }, + "video": { + "name": "name", + "namePlaceholder": "name your video", + "type": "type", + "description": "description", + "descriptionPlaceholder": "describe your video", + "format": "format", + "duration": "duration", + "size": "size", + "source": "source", + "createdAt": "created at", + "recordingsCount": "recordings count", + "recordingsDuration": "recordings duration", + "isTranscribed": "transcribed", + "added": "Successfully added video", + "removed": "Successfully removed video", + "notFound": "Video not found", + "failedToAdd": "Failed to add video, {{ error }}", + "fileNotFound": "File not found {{file}}", + "fileNotSupported": "File not supported {{file}}", + "failedToCopyFile": "Failed to copy file {{file}}", + "failedToDownloadFile": "Failed to download file {{file}}", + "transcriptionFinished": "Transcription finished" + }, + "recording": { + 
"segmentIndex": "Segment Index", + "segmentText": "Segment Text", + "duration": "Duration", + "durationTooShort": "Duration too short", + "failedToSave": "Failed to save recording", + "notFound": "Recording not found" + }, + "conversation": { + "name": "Name", + "engine": "AI engine", + "baseUrl": "Request endpoint", + "configuration": "Configuration", + "model": "AI model", + "roleDefinition": "Role definition", + "temperature": "Temperature", + "temperatureDescription": "The higher the temperature, the more creative the result", + "maxTokens": "Max tokens", + "maxTokensDescription": "The maximum number of tokens to generate", + "presencePenalty": "Presence penalty", + "presencePenaltyDescription": "Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.", + "frequencyPenalty": "Frequency penalty", + "frequencyPenaltyDescription": " Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.", + "historyBufferSize": "History buffer size", + "historyBufferSizeDescription": "The number of message history to include in the context", + "numberOfChoices": "Number of choices", + "numberOfChoicesDescription": "The number of results to generate", + "ttsEngine": "TTS engine", + "ttsModel": "TTS model", + "ttsVoice": "TTS voice", + "notFound": "Conversation not found", + "contentRequired": "Content required" + }, + "pronunciationAssessment": { + "pronunciationScore": "Pronunciation Score", + "fluencyScore": "Fluency Score", + "completenessScore": "Completeness Score", + "accuracyScore": "Accuracy Score", + "prosodyScore": "Prosody Score", + "grammarScore": "Grammar Score", + "vocabularyScore": "Vocabulary Score", + "topicScore": "Topic Score", + "errors": { + "omission": "Omission", + "insertion": "Insertion", + "misspronunciation": "Misspronunciation", + "unexpectedBreak": "Unexpected Break", + "missingBreak": "Missing Break", + "monotone": "Monotone" + }, + "explainations": { + "pronunciationScore": "Pronunciation score is the overall score of the pronunciation, fluency, completeness, and accuracy of the speech. The score is calculated by the weighted average of the four scores.", + "accuracyScore": "Accuracy score is the accuracy of the speech. The accuracy indicates how close the speech is to the reference text.", + "fluencyScore": "Fluency score is the fluency of the speech. The fluency indicates how smooth the speech is.", + "completenessScore": "Completeness score is the completeness of the speech. The completeness indicates how complete the speech is.", + "prosodyScore": "Prosody score is the prosody of the speech. The prosody indicates how natural the speech is.", + "omission": "Omission is the words that are in the reference text but not in the speech.", + "insertion": "Insertion is the words that are in the speech but not in the reference text.", + "misspronunciation": "The pronunciation of the word is not correct.", + "unexpectedBreak": "There is a break between two words when there is no punctuation between them.", + "missingBreak": "There is no break between two words when there is a punctuation between them.", + "monotone": "The word is monotone." 
+ } + } + }, + "sidebar": { + "home": "Home", + "audios": "Audios", + "videos": "Videos", + "stories": "Stories", + "books": "Books", + "vocabulary": "Vocabulary", + "library": "Library", + "practice": "Practice", + "reading": "Reading", + "aiAssistant": "AI Assistant", + "aiCoaches": "AI Coaches", + "translator": "Translator", + "mine": "Mine", + "preferences": "Preferences", + "profile": "My Profile" + }, + "form": { + "lengthMustBeAtLeast": "{{field}} must be at least {{length}} characters", + "lengthMustBeLessThan": "{{field}} must be less than {{length}} characters" + }, + "today": "today", + "yesterday": "yesterday", + "play": "play", + "pause": "pause", + "loop": "loop", + "stopLoop": "stop loop", + "playbackSpeed": "playback speed", + "zoomIn": "zoom in", + "zoomOut": "zoom out", + "zoomToFit": "zoom to fit", + "autoCenter": "auto center", + "inlineCaption": "inline caption", + "autoScroll": "auto scroll", + "detail": "detail", + "remove": "remove", + "loadMore": "Load more", + "databaseError": "Failed to connect to database {{url}}", + "somethingWentWrong": "Something went wrong", + "actions": "actions", + "info": "Info", + "success": "Success", + "warning": "Warning", + "error": "Error", + "errors": "Errors", + "cancel": "Cancel", + "confirm": "Confirm", + "continue": "continue", + "save": "Save", + "delete": "Delete", + "edit": "Edit", + "retry": "Retry", + "failedToLogin": "Failed to login", + "invalidRedirectUrl": "Invalid redirect url", + "transcribe": "Transcribe", + "unableToSetLibraryPath": "Unable to set library path to {{path}}", + "nthStep": "{{current}}/{{totalSteps}} Step", + "open": "Open", + "select": "Select", + "libraryPath": "Library Path", + "login": "Login", + "loginBeforeYouStart": "Login before you start", + "loginSuccess": "Login Success", + "whereYourResourcesAreStored": "Where your resources are stored", + "AIModel": "AI Model", + "chooseAIModelToDownload": "Choose AI Model to download", + "ffmpegCheck": "FFmpeg Check", + "checkIfFfmpegIsInstalled": "Check if FFmpeg is installed", + "ffmpegInstalled": "FFmpeg is installed", + "ffmpegNotInstalled": "FFmpeg is not installed.", + "downloadFfmpeg": "Download FFmpeg", + "youAreReadyToGo": "You are ready to go", + "welcomeBack": "Welcome back! 
{{name}}", + "download": "Download", + "chooseAIModelDependingOnYourHardware": "Choose AI Model depending on your hardware", + "areYouSureToDownload": "Are you sure to download {{name}}?", + "yourModelsWillBeDownloadedTo": "Your models will be downloaded to {{path}}", + "logout": "Logout", + "logoutConfirmation": "Are you sure you want to logout?", + "reset": "Reset", + "resetAll": "Reset All", + "resetAllConfirmation": "It will remove all of your personal data, are you sure?", + "logoutAndRemoveAllPersonalData": "Logout and remove all personal data", + "about": "About", + "currentVersion": "Current version", + "checkUpdate": "Check update", + "alreadyLatestVersion": "Already latest version", + "initializingApp": "Initializing APP", + "welcomeTo": "Welcome to", + "startToUse": "Start", + "goBack": "Go back", + "nextStep": "Next", + "previousStep": "Previous", + "finish": "Finish", + "notReadyYet": "Not ready yet", + "commingSoon": "Comming soon", + "pageNotFound": "Page not found", + "audio": "audio", + "video": "video", + "text": "text", + "addRecourse": "add resource", + "addResourseFromUrlOrLocal": "add resource from url or local", + "editRecourse": "edit resource", + "deleteRecourse": "delete resource", + "deleteRecourseConfirmation": "Are you sure to delete {{name}}?", + "transcribeAudioConfirmation": "It will remove the old transcription. Are you sure to transcribe {{name}}", + "transcribeVideoConfirmation": "It will remove the old transcription. Are you sure to transcribe {{name}}", + "localFile": "local file", + "resourcesYouAddedRecently": "resources you added recently", + "recentlyAdded": "recently added", + "recommended": "recommended", + "resourcesRecommendedByEnjoy": "resources recommended by Enjoy Bot", + "fromCommunity": "from commnuity", + "videoResources": "video resources", + "audioResources": "audio resources", + "seeMore": "see more", + "resourcesFromTheCommunity": "resources from the community", + "noResourcesFound": "no resources found", + "allResources": "all resources", + "playbackRate": "playback rate", + "transcription": "transcription", + "regenerate": "regenerate", + "holdAndSpeak": "Hold and speak", + "releaseToStop": "Release to stop", + "deleteRecording": "delete recording", + "deleteRecordingConfirmation": "Are you sure to delete this recording?", + "myRecordings": "my recordings", + "lastYear": "last year", + "less": "less", + "more": "more", + "total": "total", + "totalRecordingsIn": "{{total}} recordings in {{duration}}", + "totalRecordings": "{{total}} recordings", + "totalDuration": "duration {{duration}}", + "recordingActivity": "recording activity", + "recordingDetail": "Recording detail", + "noRecordingActivities": "no recording activities", + "basicSettings": "basic", + "advancedSettings": "advanced", + "sttAiModel": "STT AI model", + "relaunchIsNeededAfterChanged": "Relaunch is needed after changed", + "openaiKeySaved": "OpenAI key saved", + "openaiKeyRequired": "OpenAI key required", + "newConversation": "New conversation", + "startConversation": "Start conversation", + "editConversation": "Edit conversation", + "deleteConversation": "Delete conversation", + "deleteConversationConfirmation": "Are you sure to delete this conversation inclcuding all messages?", + "translation": "Translation", + "pressEnterToSend": "Press enter to send", + "send": "Send", + "sending": "Sending", + "sent": "Sent", + "copy": "Copy", + "copyText": "Copy text", + "resend": "Resend", + "anotherRequestIsPending": "Another request is pending", + "selectScenario": "Select 
scenario", + "selectAiEngine": "Select AI engine", + "selectAiModel": "Select AI model", + "youNeedToSetupApiKeyBeforeUsingOpenAI": "You need to setup API key before using OpenAI", + "ensureYouHaveOllamaRunningLocallyAndHasAtLeastOneModel": "Ensure you have Ollama running locally and has at least one model", + "creatingSpeech": "Speech is creating", + "textToSpeech": "Convert text to speech", + "shadowing": "Shadowing", + "shadowingAudio": "Shadowing audio", + "shadowingVideo": "Shadowing video", + "shadowingExercise": "Shadowing exercise", + "addingResource": "Adding resource", + "pronunciationAssessment": "Pronunciation assessment", + "score": "score", + "inputUrlToStartReading": "Input url to start reading", + "read": "read", + "add_story": "add story", + "context": "context", + "keyVocabulary": "key vocabulary", + "addedStories": "added stories", + "addedAudios": "added audios", + "addedVideos": "added videos", + "frontSide": "front side", + "backSide": "back side", + "aiExtractVocabulary": "AI extract vocabulary", + "toggleReadable": "Toggle readable", + "lookingUp": "Looking up", + "thereAreLookupsPending": "There are {{count}} lookups pending", + "noRecordsFound": "No records found", + "pleaseTryLater": "Please try later", + "author": "author", + "narrator": "narrator", + "downloadSample": "Download sample", + "buy": "Buy", + "from": "from", + "presenter": "presenter", + "downloadAudio": "Download audio", + "downloadVideo": "Download video", + "recordTooShort": "Record too short" +} diff --git a/enjoy/src/i18n/zh-CN.json b/enjoy/src/i18n/zh-CN.json new file mode 100644 index 00000000..86e6bc6e --- /dev/null +++ b/enjoy/src/i18n/zh-CN.json @@ -0,0 +1,316 @@ +{ + "models": { + "user": { + "id": "ID", + "name": "用户名" + }, + "audio": { + "name": "名称", + "namePlaceholder": "音频名称", + "type": "类型", + "description": "描述", + "descriptionPlaceholder": "音频描述", + "format": "格式", + "duration": "时长", + "size": "大小", + "source": "来源", + "createdAt": "创建时间", + "recordingsCount": "练习次数", + "recordingsDuration": "练习时长", + "isTranscribed": "语音文本", + "added": "成功添加音频", + "removed": "成功删除音频", + "notFound": "未找到音频", + "failedToAdd": "添加音频失败, {{error}}", + "fileNotFound": "无法访问文件 {{file}}", + "fileNotSupported": "文件不支持 {{file}}", + "failedToCopyFile": "无法复制文件 {{file}}", + "failedToDownloadFile": "无法下载文件 {{file}}", + "transcriptionFinished": "语音转文本完成" + }, + "video": { + "name": "名称", + "namePlaceholder": "视频名称", + "type": "类型", + "description": "描述", + "descriptionPlaceholder": "视频描述", + "format": "格式", + "duration": "时长", + "size": "大小", + "source": "来源", + "createdAt": "创建时间", + "recordingsCount": "练习次数", + "recordingsDuration": "练习时长", + "isTranscribed": "语音文本", + "added": "成功添加视频", + "removed": "成功删除视频", + "notFound": "未找到视频", + "failedToAdd": "添加视频失败, {{error}}", + "fileNotFound": "无法访问文件 {{file}}", + "fileNotSupported": "文件不支持 {{file}}", + "failedToCopyFile": "无法复制文件 {{file}}", + "failedToDownloadFile": "无法下载文件 {{file}}", + "transcriptionFinished": "语音转文本完成" + }, + "recording": { + "segmentIndex": "原文序号", + "segmentText": "原文", + "duration": "时长", + "durationTooShort": "录音时长太短", + "failedToSave": "保存录音失败", + "notFound": "未找到录音" + }, + "conversation": { + "name": "对话标题", + "engine": "AI 引擎", + "baseUrl": "请求地址", + "configuration": "AI 配置", + "model": "AI 模型", + "roleDefinition": "角色定义", + "temperature": "随机性 (temperature)", + "temperatureDescription": "值越高,生成的文本越具创造性,反之则越稳定", + "maxTokens": "单次回复限制", + "maxTokensDescription": "单次交互消耗的最大 Token 数,-1 表示无限制", + "presencePenalty": "存在惩罚 
(presence_penalty)", + "presencePenaltyDescription": "-2.0 ~ 2.0, 值越大,越有可能扩展到新的话题", + "frequencyPenalty": "频率惩罚(frequency_penalty)", + "frequencyPenaltyDescription": "-2.0 ~ 2.0, 值越大,越有可能降低重复性", + "historyBufferSize": "上下文消息数量", + "historyBufferSizeDescription": "上下文越多,生成的文本越具连贯性,消耗的资源也越多", + "numberOfChoices": "生成版本数量", + "numberOfChoicesDescription": "大于 1 时,将每次生成多版本的文本", + "ttsEngine": "TTS 引擎", + "ttsModel": "TTS 模型", + "ttsVoice": "TTS 声音", + "notFound": "未找到对话", + "contentRequired": "对话内容不能为空" + }, + "pronunciationAssessment": { + "pronunciationScore": "发音得分", + "fluencyScore": "流利度得分", + "completenessScore": "完整度得分", + "accuracyScore": "准确度得分", + "prosodyScore": "韵律得分", + "grammarScore": "语法得分", + "vocabularyScore": "词汇得分", + "topicScore": "主题得分", + "errors": { + "omission": "遗漏", + "insertion": "多余", + "misspronunciation": "发音错误", + "unexpectedBreak": "意外停顿", + "missingBreak": "缺少停顿", + "monotone": "单调" + }, + "explainations": { + "pronunciationScore": "表示给定语音发音质量的总体分数。它是从 准确度、流利度、完整度 按权重聚合的。", + "accuracyScore": "语音的发音准确性。准确性表示音素与母语说话人的发音的匹配程度。字词和全文的准确性得分是由音素级的准确度得分汇总而来。", + "fluencyScore": "给定语音的流畅性。流畅性表示语音与母语说话人在单词间的停顿上有多接近。", + "completenessScore": "语音的完整性,按发音单词与输入引用文本的比率计算。", + "prosodyScore": "给定语音的韵律。韵律指示给定语音的性质,包括重音、语调、语速和节奏。", + "omission": "原文中有的字词,但是在语音中没有发音。", + "insertion": "语音中有的字词,但是在原文中没有。", + "misspronunciation": "说得不正确的字词", + "unexpectedBreak": "同一句子中的单词之间未正确停顿", + "missingBreak": "当两个单词之前存在标点符号时,单词之间没有停顿", + "monotone": "这些单词以平淡且不兴奋的语调发音,没有任何节奏或表达" + } + } + }, + "sidebar": { + "home": "主页", + "audios": "音频", + "videos": "视频", + "stories": "文章", + "books": "电子书", + "vocabulary": "生词本", + "library": "资料库", + "practice": "练习记录", + "reading": "阅读", + "aiAssistant": "智能助手", + "aiCoaches": "AI 教练", + "translator": "翻译助手", + "mine": "我的", + "preferences": "软件设置", + "profile": "个人主页" + }, + "form": { + "lengthMustBeAtLeast": "{{field}} 长度不可超过 {{length}} 个字符", + "lengthMustBeLessThan": "{{field}} 长度必须少于 {{length}} 个字符" + }, + "today": "今天", + "yesterday": "昨天", + "play": "播放", + "pause": "暂停", + "loop": "循环", + "stopLoop": "停止循环", + "playbackSpeed": "播放速度", + "zoomIn": "放大", + "zoomOut": "缩小", + "zoomToFit": "适应窗口", + "autoCenter": "自动居中", + "inlineCaption": "内联字幕", + "autoScroll": "自动滚动", + "detail": "详情", + "remove": "删除", + "loadMore": "加载更多", + "databaseError": "数据库错误 {{url}}", + "somethingWentWrong": "出错了", + "actions": "操作", + "info": "通知", + "success": "操作成功", + "warning": "警告", + "error": "错误", + "errors": "错误", + "cancel": "取消", + "confirm": "确认", + "continue": "继续", + "save": "保存", + "edit": "修改", + "retry": "重试", + "failedToLogin": "登录失败", + "invalidRedirectUrl": "无效的重定向 URL", + "delete": "删除", + "transcribe": "语音转文本", + "unableToSetLibraryPath": "无法设置资源库保存路径 {{path}}", + "nthStep": "第 {{current}}/{{totalSteps}} 步", + "open": "打开", + "select": "选择", + "libraryPath": "资源库保存路径", + "login": "登录", + "loginBeforeYouStart": "登录后开始使用", + "loginSuccess": "登录成功", + "whereYourResourcesAreStored": "选择资源库保存路径", + "AIModel": "AI 模型", + "chooseAIModelToDownload": "选择 AI 模型下载", + "ffmpegCheck": "FFmpeg 检查", + "checkIfFfmpegIsInstalled": "检查 FFmpeg 是否已正确安装", + "ffmpegInstalled": "FFmpeg 已经安装", + "ffmpegNotInstalled": "FFmpeg 未安装,软件部分功能依赖于 FFmpeg。", + "downloadFfmpeg": "下载 FFmpeg", + "youAreReadyToGo": "您已准备就绪", + "welcomeBack": "欢迎回来, {{name}}", + "download": "下载", + "chooseAIModelDependingOnYourHardware": "根据您的硬件选择合适的 AI 模型", + "areYouSureToDownload": "您确定要下载 {{name}} 吗?", + "yourModelsWillBeDownloadedTo": "您的模型将下载到目录 {{path}}", + "logout": "退出登录", + "logoutConfirmation": 
"您确定要退出登录吗?", + "reset": "重置", + "resetAll": "重置所有", + "resetAllConfirmation": "这将删除您的所有个人数据, 您确定要重置吗?", + "logoutAndRemoveAllPersonalData": "退出登录并删除所有个人数据", + "about": "关于", + "currentVersion": "当前版本", + "checkUpdate": "检查更新", + "alreadyLatestVersion": "已经是最新版本", + "initializingApp": "正在初始化应用", + "welcomeTo": "欢迎来到", + "startToUse": "开始使用", + "goBack": "返回", + "nextStep": "下一步", + "previousStep": "上一步", + "finish": "完成", + "notReadyYet": "还没准备好", + "commingSoon": "敬请期待", + "pageNotFound": "页面不存在", + "audio": "音频", + "video": "视频", + "text": "文本", + "addRecourse": "添加资源", + "addResourseFromUrlOrLocal": "添加资源, 可以是 URL 或本地文件", + "editRecourse": "编辑资源", + "deleteRecourse": "删除资源", + "deleteRecourseConfirmation": "您确定要删除资源 {{name}} 吗?", + "transcribeAudioConfirmation": "这将删除原来的语音文本,您确定要重新对 {{name}} 进行语音转文本吗?", + "transcribeVideoConfirmation": "这将删除原来的语音文本,您确定要重新对 {{name}} 进行语音转文本吗?", + "localFile": "本地文件", + "recentlyAdded": "最近添加", + "resourcesYouAddedRecently": "最近添加的资源", + "recommended": "每日推荐", + "resourcesRecommendedByEnjoy": "Enjoy Bot 推荐的资源", + "fromCommunity": "来自社区", + "videoResources": "视频资源", + "audioResources": "音频资源", + "seeMore": "查看更多", + "resourcesFromTheCommunity": "来自社区的资源", + "noResourcesFound": "没有找到资源", + "allResources": "所有资源", + "playbackRate": "播放速度", + "transcription": "语音文本", + "regenerate": "重新生成", + "holdAndSpeak": "按住并说话", + "releaseToStop": "松开停止", + "deleteRecording": "删除录音", + "deleteRecordingConfirmation": "您确定要删除录音吗?", + "myRecordings": "我的练习", + "lastYear": "过去一年", + "less": "更少", + "more": "更多", + "total": "全部", + "totalRecordingsIn": " 在 {{duration}} 共 {{total}} 次练习", + "totalRecordings": "{{total}} 次练习", + "totalDuration": "总时长 {{duration}}", + "recordingActivity": "练习活动", + "recordingDetail": "录音详情", + "noRecordingActivities": "没有练习活动", + "basicSettings": "基本设置", + "advancedSettings": "高级设置", + "sttAiModel": "语音转文本 AI 模型", + "relaunchIsNeededAfterChanged": "更改后需要重新启动", + "openaiKeySaved": "OpenAI 密钥已保存", + "openaiKeyRequired": "未提供 OpenAI 密钥", + "newConversation": "新对话", + "startConversation": "开始对话", + "editConversation": "编辑对话", + "deleteConversation": "删除对话", + "deleteConversationConfirmation": "您确定要删除此对话,以及对话中的所有消息吗?", + "translation": "翻译", + "pressEnterToSend": "按 Enter 发送", + "send": "发送", + "sending": "正在发送", + "sent": "发送成功", + "copy": "复制", + "copyText": "复制文本", + "resend": "重新发送", + "anotherRequestIsPending": "另一个请求正在等待", + "selectScenario": "选择场景", + "selectAiEngine": "选择 AI 引擎", + "selectAiModel": "选择 AI 模型", + "youNeedToSetupApiKeyBeforeUsingOpenAI": "在使用 OpenAI 之前您需要设置 API 密钥", + "ensureYouHaveOllamaRunningLocallyAndHasAtLeastOneModel": "确保您已经在本地运行 Ollama 并且至少有一个模型", + "creatingSpeech": "正在生成语音", + "textToSpeech": "文字转语音", + "shadowing": "跟读", + "shadowingAudio": "跟读音频", + "shadowingVideo": "跟读视频", + "shadowingExercise": "跟读训练", + "addingResource": "正在添加资源", + "pronunciationAssessment": "发音评估", + "score": "得分", + "inputUrlToStartReading": "输入 URL 开始阅读", + "read": "阅读", + "add_story": "添加文章", + "context": "原文", + "keyVocabulary": "关键词汇", + "addedStories": "添加的文章", + "addedAudios": "添加的音频", + "addedVideos": "添加的视频", + "frontSide": "正面", + "backSide": "反面", + "aiExtractVocabulary": "AI 提取生词", + "toggleReadable": "切换阅读模式", + "lookingUp": "正在查询", + "thereAreLookupsPending": "有{{count}}个单词正在查询", + "noRecordsFound": "没有找到记录", + "pleaseTryLater": "请稍后再试", + "author": "作者", + "narrator": "朗读者", + "downloadSample": "下载试听", + "buy": "购买", + "from": "来自", + "presenter": "讲者", + "downloadAudio": "下载音频", + "downloadVideo": "下载视频", + "recordTooShort": 
"录音时长太短" +} diff --git a/enjoy/src/index.css b/enjoy/src/index.css new file mode 100644 index 00000000..f1bc9255 --- /dev/null +++ b/enjoy/src/index.css @@ -0,0 +1,85 @@ +@import "@vidstack/react/player/styles/base.css"; +@import "@vidstack/react/player/styles/default/theme.css"; +@import "@vidstack/react/player/styles/default/layouts/audio.css"; +@import "@vidstack/react/player/styles/default/layouts/video.css"; + +@tailwind base; +@tailwind components; +@tailwind utilities; + +@layer base { + :root { + --background: 0 0% 100%; + --foreground: 240 10% 3.9%; + + --card: 0 0% 100%; + --card-foreground: 240 10% 3.9%; + + --popover: 0 0% 100%; + --popover-foreground: 240 10% 3.9%; + + --primary: 240 5.9% 10%; + --primary-foreground: 0 0% 98%; + + --secondary: 240 4.8% 95.9%; + --secondary-foreground: 240 5.9% 10%; + + --muted: 240 4.8% 95.9%; + --muted-foreground: 240 3.8% 46.1%; + + --accent: 240 4.8% 95.9%; + --accent-foreground: 240 5.9% 10%; + + --destructive: 0 84.2% 60.2%; + --destructive-foreground: 0 0% 98%; + + --border: 240 5.9% 90%; + --input: 240 5.9% 90%; + --ring: 240 10% 3.9%; + + --radius: 0.5rem; + } + + .dark { + --background: 240 10% 3.9%; + --foreground: 0 0% 98%; + + --card: 240 10% 3.9%; + --card-foreground: 0 0% 98%; + + --popover: 240 10% 3.9%; + --popover-foreground: 0 0% 98%; + + --primary: 0 0% 98%; + --primary-foreground: 240 5.9% 10%; + + --secondary: 240 3.7% 15.9%; + --secondary-foreground: 0 0% 98%; + + --muted: 240 3.7% 15.9%; + --muted-foreground: 240 5% 64.9%; + + --accent: 240 3.7% 15.9%; + --accent-foreground: 0 0% 98%; + + --destructive: 0 62.8% 30.6%; + --destructive-foreground: 0 0% 98%; + + --border: 240 3.7% 15.9%; + --input: 240 3.7% 15.9%; + --ring: 240 4.9% 83.9%; + } +} + +@layer base { + * { + @apply border-border; + } + body { + @apply bg-background text-foreground; + } +} + +body { + user-select: none; +} diff --git a/enjoy/src/main.ts b/enjoy/src/main.ts new file mode 100644 index 00000000..db8ef45b --- /dev/null +++ b/enjoy/src/main.ts @@ -0,0 +1,73 @@ +import { app, BrowserWindow, protocol, net } from "electron"; +import path from "path"; +import settings from "@main/settings"; +import "@main/i18n"; +import mainWindow from "@main/window"; +import crypto from "crypto"; +import log from "electron-log/main"; + +log.transports.file.level = "info"; +log.transports.file.resolvePathFn = () => + path.join(settings.libraryPath(), "logs", "main.log"); +log.errorHandler.startCatching(); + +// Fix "getRandomValues() not supported" +global.crypto = crypto; + +// Handle creating/removing shortcuts on Windows when installing/uninstalling. +if (require("electron-squirrel-startup")) { + app.quit(); +} + +protocol.registerSchemesAsPrivileged([ + { + scheme: "enjoy", + privileges: { + standard: true, + secure: true, + bypassCSP: true, + allowServiceWorkers: true, + supportFetchAPI: true, + stream: true, + codeCache: true, + corsEnabled: true, + }, + }, +]); + +// This method will be called when Electron has finished +// initialization and is ready to create browser windows. +// Some APIs can only be used after this event occurs. +app.on("ready", async () => { + protocol.handle("enjoy", (request) => { + let url = request.url.replace("enjoy://", ""); + if (url.startsWith("library")) { + url = url.replace("library/", ""); + url = path.join(settings.userDataPath(), url); + } + + return net.fetch(`file:///${url}`); + }); + + mainWindow.init(); +}); + +// Quit when all windows are closed, except on macOS. 
There, it's common
+// for applications and their menu bar to stay active until the user quits
+// explicitly with Cmd + Q.
+app.on("window-all-closed", () => {
+  if (process.platform !== "darwin") {
+    app.quit();
+  }
+});
+
+app.on("activate", () => {
+  // On OS X it's common to re-create a window in the app when the
+  // dock icon is clicked and there are no other windows open.
+  if (BrowserWindow.getAllWindows().length === 0) {
+    mainWindow.init();
+  }
+});
+
+// In this file you can include the rest of your app's specific main process
+// code. You can also put them in separate files and import them here.
diff --git a/enjoy/src/main/azure-speech-sdk.ts b/enjoy/src/main/azure-speech-sdk.ts
new file mode 100644
index 00000000..60bfc31d
--- /dev/null
+++ b/enjoy/src/main/azure-speech-sdk.ts
@@ -0,0 +1,74 @@
+import * as sdk from "microsoft-cognitiveservices-speech-sdk";
+import fs from "fs-extra";
+import log from "electron-log/main";
+
+const logger = log.scope("AZURE");
+export class AzureSpeechSdk {
+  private config: sdk.SpeechConfig;
+
+  constructor(token: string, region: string) {
+    this.config = sdk.SpeechConfig.fromAuthorizationToken(token, region);
+  }
+
+  pronunciationAssessment(params: {
+    filePath: string;
+    reference: string;
+    language?: string;
+  }): Promise<sdk.PronunciationAssessmentResult> {
+    const { filePath, reference, language = "en-US" } = params;
+
+    const audioConfig = sdk.AudioConfig.fromWavFileInput(
+      fs.readFileSync(filePath)
+    );
+
+    const pronunciationAssessmentConfig = new sdk.PronunciationAssessmentConfig(
+      reference,
+      sdk.PronunciationAssessmentGradingSystem.HundredMark,
+      sdk.PronunciationAssessmentGranularity.Phoneme,
+      true
+    );
+    pronunciationAssessmentConfig.phonemeAlphabet = "IPA";
+
+    // set the recognition language (defaults to en-US).
+    this.config.speechRecognitionLanguage = language;
+
+    // create the speech recognizer.
+    const reco = new sdk.SpeechRecognizer(this.config, audioConfig);
+    pronunciationAssessmentConfig.applyTo(reco);
+
+    logger.debug("Start pronunciation assessment.");
+    return new Promise((resolve, reject) => {
+      reco.recognizeOnceAsync((result) => {
+        reco.close();
+
+        switch (result.reason) {
+          case sdk.ResultReason.RecognizedSpeech:
+            const pronunciationResult =
+              sdk.PronunciationAssessmentResult.fromResult(result);
+            logger.debug(
+              "Received pronunciation assessment result.",
+              pronunciationResult.detailResult
+            );
+            resolve(pronunciationResult);
+            break;
+          case sdk.ResultReason.NoMatch:
+            reject(new Error("No speech could be recognized."));
+            break;
+          case sdk.ResultReason.Canceled:
+            const cancellationDetails =
+              sdk.CancellationDetails.fromResult(result);
+            logger.debug(
+              "CANCELED: Reason=" +
+                cancellationDetails.reason +
+                " ErrorDetails=" +
+                cancellationDetails.errorDetails
+            );
+            reject(new Error(cancellationDetails.errorDetails));
+            break;
+          default:
+            reject(result);
+        }
+      });
+    });
+  }
+}
diff --git a/enjoy/src/main/db/create-migration.mjs b/enjoy/src/main/db/create-migration.mjs
new file mode 100755
index 00000000..8aba65aa
--- /dev/null
+++ b/enjoy/src/main/db/create-migration.mjs
@@ -0,0 +1,27 @@
+#!/usr/bin/env zx
+
+const name = process.argv[3];
+const filename = path.resolve(
+  __dirname,
+  "migrations",
+  `${Date.now()}-${name}.js`
+);
+
+const template = `
+const { DataTypes } = require("sequelize");
+
+async function up({ context: queryInterface }) {
+  // code here
+}
+
+async function down({ context: queryInterface }) {
+  // code here
+}
+
+module.exports = { up, down };
+`;
+
+await fs.mkdir(path.resolve(__dirname, "migrations"), { recursive: true });
+await fs.writeFile(filename, template.trim());
+
+console.log(chalk.green(`Created migration ${name} at ${filename}`));
diff --git a/enjoy/src/main/db/handlers/audios-handler.ts b/enjoy/src/main/db/handlers/audios-handler.ts
new file mode 100644
index 00000000..adf08c10
--- /dev/null
+++ b/enjoy/src/main/db/handlers/audios-handler.ts
@@ -0,0 +1,206 @@
+import { ipcMain, IpcMainEvent } from "electron";
+import { Audio, Transcription } from "@main/db/models";
+import { FindOptions, WhereOptions, Attributes } from "sequelize";
+import downloader from "@main/downloader";
+import log from "electron-log/main";
+import { t } from "i18next";
+import youtubedr from "@main/youtubedr";
+
+const logger = log.scope("db/handlers/audios-handler");
+
+class AudiosHandler {
+  private async findAll(
+    event: IpcMainEvent,
+    options: FindOptions<Attributes<Audio>>
+  ) {
+    return Audio.findAll({
+      order: [["createdAt", "DESC"]],
+      include: [
+        {
+          association: "transcription",
+          model: Transcription,
+          where: { targetType: "Audio" },
+          required: false,
+        },
+      ],
+      ...options,
+    })
+      .then((audios) => {
+        if (!audios) {
+          return [];
+        }
+        return audios.map((audio) => audio.toJSON());
+      })
+      .catch((err) => {
+        event.sender.send("on-notification", {
+          type: "error",
+          message: err.message,
+        });
+      });
+  }
+
+  private async findOne(
+    event: IpcMainEvent,
+    where: WhereOptions<Attributes<Audio>>
+  ) {
+    return Audio.findOne({
+      where: {
+        ...where,
+      },
+    })
+      .then((audio) => {
+        if (!audio) {
+          throw new Error(t("models.audio.notFound"));
+        }
+        if (!audio.isSynced) {
+          audio.sync().catch(() => {});
+        }
+
+        return audio.toJSON();
+      })
+      .catch((err) => {
+        logger.error(err);
+        event.sender.send("on-notification", {
+          type: "error",
+          message: err.message,
+        });
+      });
+  }
+
+  private async transcribe(event: IpcMainEvent, id: string) {
+    const audio = await Audio.findOne({
+      where: {
+        id,
+      },
+    });
+    if (!audio) {
+      event.sender.send("on-notification", {
+        type: "error",
+        message: t("models.audio.notFound"),
+      });
+      // nothing to transcribe if the audio record does not exist
+      return;
+    }
+
+    audio.transcribe().catch((err) => {
+      event.sender.send("on-notification", {
+        type: "error",
+        message: err.message,
+      });
+    });
+  }
+
+  private async create(
+    event: IpcMainEvent,
+    source: string,
+    params: {
+      name?: string;
+      coverUrl?: string;
+    } = {}
+  ) {
+    let file = source;
+    if (source.startsWith("http")) {
+      try {
+        if (youtubedr.validateYtURL(source)) {
+          file = await youtubedr.autoDownload(source);
+        } else {
+          file = await downloader.download(source, {
+            webContents: event.sender,
+          });
+        }
+        if (!file) throw new Error("Failed to download file");
+      } catch (err) {
+        return event.sender.send("on-notification", {
+          type: "error",
+          message: t("models.audio.failedToDownloadFile", { file: source }),
+        });
+      }
+    }
+
+    return Audio.buildFromLocalFile(file, {
+      source,
+      ...params,
+    })
+      .then((audio) => {
+        return audio.toJSON();
+      })
+      .catch((err) => {
+        return event.sender.send("on-notification", {
+          type: "error",
+          message: t("models.audio.failedToAdd", { error: err.message }),
+        });
+      });
+  }
+
+  private async update(
+    event: IpcMainEvent,
+    id: string,
+    params: Attributes