Batch download implementation #125

Merged
merged 4 commits into from
Jul 8, 2024
47 changes: 37 additions & 10 deletions README.md
@@ -1,4 +1,4 @@
# Feishu2Md
# feishu2md

[![Golang - feishu2md](https://img.shields.io/github/go-mod/go-version/wsine/feishu2md?color=%2376e1fe&logo=go)](https://go.dev/)
[![Unittest](https://github.com/Wsine/feishu2md/actions/workflows/unittest.yaml/badge.svg)](https://github.com/Wsine/feishu2md/actions/workflows/unittest.yaml)
@@ -20,13 +20,13 @@
The configuration file requires the APP ID and APP SECRET; see the [official Feishu documentation](https://open.feishu.cn/document/ukTMukTMukTM/ukDNz4SO0MjL5QzM/get-) for how to obtain them. Recommended setup:

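- Go to the Feishu [Developer Console](https://open.feishu.cn/app)
- Create a custom enterprise app; the app details can be anything
- Select a test enterprise and members, create the test enterprise, bind the app, and switch to the test version
- (Important) Open Permission Management > Docs and enable all read-only permissions
- "View, comment on, and export documents" permission `docs:doc:readonly`
- "View DocX documents" permission `docx:document:readonly`
- "View, comment on, and download all files in Drive" permission `drive:drive:readonly`
- "View and download files in Drive" permission `drive:file:readonly`
- Create a custom enterprise app (personal edition); the app details can be anything
- (Important) Open Permission Management and enable the following required permissions (follow each link and check the permission fields under API Explorer > Permission configuration)
- [Get document basic info](https://open.feishu.cn/document/server-docs/docs/docs/docx-v1/document/get): "View new-version documents" permission `docx:document:readonly`
- [Get all blocks of a document](https://open.feishu.cn/document/server-docs/docs/docs/docx-v1/document/list): "View new-version documents" permission `docx:document:readonly`
- [Download media](https://open.feishu.cn/document/server-docs/docs/drive-v1/media/download): "Download images and attachments in cloud documents" permission `docs:document.media:download`
- [List files in a folder](https://open.feishu.cn/document/server-docs/docs/drive-v1/folder/list): "View, comment on, edit, and manage all files in Drive" permission `drive:file:readonly`
- [Get wiki space node info](https://open.feishu.cn/document/server-docs/docs/wiki-v2/space-node/get_node): "View knowledge base" permission `wiki:wiki:readonly`
- Open Credentials & Basic Info and get the App ID and App Secret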

## Usage
@@ -71,6 +71,20 @@
--appId value Set app id for the OPEN API
--appSecret value Set app secret for the OPEN API
--help, -h show help (default: false)

$ feishu2md dl -h
NAME:
feishu2md download - Download feishu/larksuite document to markdown file

USAGE:
feishu2md download [command options] <url>

OPTIONS:
--output value, -o value Specify the output directory for the markdown files (default: "./")
--dump Dump json response of the OPEN API (default: false)
--batch Download all documents under a folder (default: false)
--help, -h show help (default: false)

```

**Generate the configuration file**
@@ -81,15 +95,28 @@

For more configuration options, open the configuration file and edit it manually.

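**Download as Markdown**
**Download a single document as Markdown**

Download directly with `feishu2md dl <your feishu docx url>`. The document link can be obtained via **Share > Enable link sharing > Copy link**.
Download directly with `feishu2md dl <your feishu docx url>`. The document link can be obtained via **Share > Enable link sharing > Anyone on the internet with the link can view > Copy link**.

Example: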

```bash
$ feishu2md dl "https://domain.feishu.cn/docx/docxtoken"
```

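**Batch download all documents in a folder as Markdown**

This feature is not yet supported in the Docker version.

Download directly with `feishu2md dl --batch <your feishu folder url>`. The folder link can be obtained via **Share > Enable link sharing > Anyone on the internet with the link can view > Copy link**.

Example: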

```bash
$ feishu2md dl --batch -o output_directory "https://domain.feishu.cn/drive/folder/foldertoken"
```
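The two link shapes in the examples above (`/docx/<token>` for a single document, `/drive/folder/<token>` for a folder) can be told apart by their URL path alone. A minimal sketch of that idea, assuming only those path layouts plus `/wiki/<token>`; `parseFeishuURL` is a hypothetical helper and is not the repository's `utils.ValidateDocumentURL` or `utils.ValidateFolderURL`, whose bodies this PR does not show:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseFeishuURL is a hypothetical helper that splits a Feishu link into its
// kind ("folder", "docx", "wiki", ...) and token. It assumes the path layouts
// /drive/folder/<token> and /<kind>/<token>; query strings are ignored.
func parseFeishuURL(raw string) (kind, token string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	switch {
	case len(parts) == 3 && parts[0] == "drive" && parts[1] == "folder" && parts[2] != "":
		// e.g. /drive/folder/<token>
		return "folder", parts[2], nil
	case len(parts) == 2 && parts[0] != "" && parts[1] != "":
		// e.g. /docx/<token> or /wiki/<token>
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("unrecognized Feishu URL: %s", raw)
	}
}
```

A caller could then branch on `kind == "folder"` the way `--batch` selects between single-document and folder downloads.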

</details>

<details>
18 changes: 10 additions & 8 deletions cmd/config.go
@@ -15,13 +15,15 @@ type ConfigOpts struct {

var configOpts = ConfigOpts{}

func handleConfigCommand(opts *ConfigOpts) error {
func handleConfigCommand() error {
configPath, err := core.GetConfigFilePath()
utils.CheckErr(err)
if err != nil {
return err
}

fmt.Println("Configuration file on: " + configPath)
if _, err := os.Stat(configPath); os.IsNotExist(err) {
config := core.NewConfig(opts.appId, opts.appSecret)
config := core.NewConfig(configOpts.appId, configOpts.appSecret)
if err = config.WriteConfig2File(configPath); err != nil {
return err
}
@@ -31,13 +33,13 @@ func handleConfigCommand(opts *ConfigOpts) error {
if err != nil {
return err
}
if opts.appId != "" {
config.Feishu.AppId = opts.appId
if configOpts.appId != "" {
config.Feishu.AppId = configOpts.appId
}
if opts.appSecret != "" {
config.Feishu.AppSecret = opts.appSecret
if configOpts.appSecret != "" {
config.Feishu.AppSecret = configOpts.appSecret
}
if opts.appId != "" || opts.appSecret != "" {
if configOpts.appId != "" || configOpts.appSecret != "" {
if err = config.WriteConfig2File(configPath); err != nil {
return err
}
122 changes: 98 additions & 24 deletions cmd/download.go
@@ -6,6 +6,7 @@ import (
"os"
"path/filepath"
"strings"
"sync"

"github.com/88250/lute"
"github.com/Wsine/feishu2md/core"
@@ -17,29 +18,20 @@ import (
type DownloadOpts struct {
outputDir string
dump bool
batch bool
}

var downloadOpts = DownloadOpts{}
var dlOpts = DownloadOpts{}
var dlConfig core.Config

func handleDownloadCommand(url string, opts *DownloadOpts) error {
func downloadDocument(client *core.Client, ctx context.Context, url string, opts *DownloadOpts) error {
// Validate the url to download
docType, docToken, err := utils.ValidateDownloadURL(url)
utils.CheckErr(err)
docType, docToken, err := utils.ValidateDocumentURL(url)
if err != nil {
return err
}
fmt.Println("Captured document token:", docToken)

// Load config
configPath, err := core.GetConfigFilePath()
utils.CheckErr(err)
config, err := core.ReadConfigFromFile(configPath)
utils.CheckErr(err)

// Create client with context
ctx := context.WithValue(context.Background(), "output", config.Output)

client := core.NewClient(
config.Feishu.AppId, config.Feishu.AppSecret,
)

// for a wiki page, we need to renew docType and docToken first
if docType == "wiki" {
node, err := client.GetWikiNodeInfo(ctx, docToken)
@@ -48,24 +40,28 @@ func handleDownloadCommand(url string, opts *DownloadOpts) error {
docToken = node.ObjToken
}
if docType == "docs" {
return errors.Errorf("Feishu Docs is no longer supported. Please refer to the Readme/Release for v1_support.")
return errors.Errorf(
`Feishu Docs is no longer supported. ` +
`Please refer to the Readme/Release for v1_support.`)
}

// Process the download
docx, blocks, err := client.GetDocxContent(ctx, docToken)
utils.CheckErr(err)

parser := core.NewParser(ctx)
parser := core.NewParser(dlConfig.Output)

title := docx.Title
markdown := parser.ParseDocxContent(docx, blocks)

if !config.Output.SkipImgDownload {
if !dlConfig.Output.SkipImgDownload {
for _, imgToken := range parser.ImgTokens {
localLink, err := client.DownloadImage(
ctx, imgToken, filepath.Join(opts.outputDir, config.Output.ImageDir),
ctx, imgToken, filepath.Join(opts.outputDir, dlConfig.Output.ImageDir),
)
utils.CheckErr(err)
if utils.CheckErr(err) != nil {
return err
}
markdown = strings.Replace(markdown, imgToken, localLink, 1)
}
}
@@ -83,7 +79,7 @@ func handleDownloadCommand(url string, opts *DownloadOpts) error {
}
}

if opts.dump {
if dlOpts.dump {
jsonName := fmt.Sprintf("%s.json", docToken)
outputPath := filepath.Join(opts.outputDir, jsonName)
data := struct {
@@ -103,7 +99,7 @@ func handleDownloadCommand(url string, opts *DownloadOpts) error {

// Write to markdown file
mdName := fmt.Sprintf("%s.md", docToken)
if config.Output.TitleAsFilename {
if dlConfig.Output.TitleAsFilename {
mdName = fmt.Sprintf("%s.md", title)
}
outputPath := filepath.Join(opts.outputDir, mdName)
@@ -114,3 +110,81 @@ func handleDownloadCommand(url string, opts *DownloadOpts) error {

return nil
}

func downloadDocuments(client *core.Client, ctx context.Context, url string) error {
// Validate the url to download
folderToken, err := utils.ValidateFolderURL(url)
if err != nil {
return err
}
fmt.Println("Captured folder token:", folderToken)

// Error channel and wait group
errChan := make(chan error)
wg := sync.WaitGroup{}

// Recursively go through the folder and download the documents
var processFolder func(ctx context.Context, folderPath, folderToken string) error
processFolder = func(ctx context.Context, folderPath, folderToken string) error {
files, err := client.GetDriveFolderFileList(ctx, nil, &folderToken)
if err != nil {
return err
}
opts := DownloadOpts{outputDir: folderPath, dump: dlOpts.dump, batch: false}
for _, file := range files {
if file.Type == "folder" {
_folderPath := filepath.Join(folderPath, file.Name)
if err := processFolder(ctx, _folderPath, file.Token); err != nil {
return err
}
} else if file.Type == "docx" {
// concurrently download the document
wg.Add(1)
go func(_url string) {
// defer guarantees Done runs on every return path
defer wg.Done()
if err := downloadDocument(client, ctx, _url, &opts); err != nil {
errChan <- err
}
}(file.URL)
}
}
return nil
}
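// Even if the traversal fails, keep draining errChan below so that
// already-started goroutines are never blocked on a send
folderErr := processFolder(ctx, dlOpts.outputDir, folderToken)

// Wait for all the downloads to finish, then close the channel
go func() {
wg.Wait()
close(errChan)
}()
// Drain every error and keep the first one
var firstErr error
for err := range errChan {
if firstErr == nil {
firstErr = err
}
}
if folderErr != nil {
return folderErr
}
return firstErr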
}
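`downloadDocuments` walks folders recursively and fans out one goroutine per `docx` file. A minimal, stdlib-only sketch of the same walk-and-fan-out shape, under stated assumptions: `node` is a hypothetical in-memory stand-in for the Drive listing, `download` stands in for `downloadDocument`, and the first error is recorded under a `sync.Mutex` (a swapped-in alternative to the PR's error channel) so no goroutine can ever block on a send. The `limit` semaphore that caps parallel downloads is an addition of this sketch, not something the PR does.

```go
package main

import (
	"path/filepath"
	"sync"
)

// node is a hypothetical in-memory stand-in for one Drive folder entry.
type node struct {
	Name     string
	Type     string // "folder", "docx", or anything else (skipped)
	Token    string
	Children []node // entries of a folder; empty for documents
}

// downloadTree walks nodes recursively, mirroring folder names as
// sub-directories of outDir, and calls download concurrently for each docx.
// At most limit downloads run at once; the first error is returned after
// every goroutine has finished.
func downloadTree(nodes []node, outDir string, limit int, download func(dir, token string) error) error {
	if limit < 1 {
		limit = 1
	}
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	sem := make(chan struct{}, limit) // counting semaphore

	var walk func(dir string, entries []node)
	walk = func(dir string, entries []node) {
		for _, n := range entries {
			switch n.Type {
			case "folder":
				walk(filepath.Join(dir, n.Name), n.Children)
			case "docx":
				wg.Add(1)
				go func(dir, token string) {
					defer wg.Done()
					sem <- struct{}{}        // acquire a slot
					defer func() { <-sem }() // release it
					if err := download(dir, token); err != nil {
						mu.Lock()
						if firstErr == nil {
							firstErr = err
						}
						mu.Unlock()
					}
				}(dir, n.Token)
			}
		}
	}

	walk(outDir, nodes)
	wg.Wait() // all Add calls happened during walk, before Wait
	return firstErr
}
```

With a small `limit`, large folders no longer start every download at once; the rate-limiting middleware added in `core.NewClient` appears to throttle API calls independently of this.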

func handleDownloadCommand(url string) error {
// Load config
configPath, err := core.GetConfigFilePath()
if err != nil {
return err
}
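config, err := core.ReadConfigFromFile(configPath)
if err != nil {
return err
}
// Assign to the package-level dlConfig read by downloadDocument;
// "dlConfig, err :=" would declare a shadowing local instead
dlConfig = *config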

// Instantiate the client
client := core.NewClient(
dlConfig.Feishu.AppId, dlConfig.Feishu.AppSecret,
)
ctx := context.Background()

if dlOpts.batch {
return downloadDocuments(client, ctx, url)
}

return downloadDocument(client, ctx, url, &dlOpts)
}
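Because `downloadDocument` reads the package-level `dlConfig`, `handleDownloadCommand` has to assign to that variable with `=` rather than redeclare it with `:=`: in Go, `:=` inside a function declares a new local variable that shadows the package-level one. A minimal illustration of the pitfall, with hypothetical stand-ins (`config`, `loadConfig`, `pkgConfig`) in place of `core.Config`, `core.ReadConfigFromFile`, and `dlConfig`:

```go
package main

// config is a hypothetical stand-in for core.Config.
type config struct{ AppID string }

// loadConfig is a hypothetical stand-in for core.ReadConfigFromFile.
func loadConfig() (*config, error) { return &config{AppID: "demo"}, nil }

// pkgConfig plays the role of the package-level dlConfig.
var pkgConfig config

// loadShadowed uses ":=", which declares a NEW local pkgConfig (a *config)
// inside the function; the package-level variable keeps its zero value.
func loadShadowed() error {
	pkgConfig, err := loadConfig()
	if err != nil {
		return err
	}
	_ = pkgConfig // only the local copy is touched here
	return nil
}

// loadAssigned reads into a differently named local and assigns with "=",
// so the package-level variable is actually updated.
func loadAssigned() error {
	cfg, err := loadConfig()
	if err != nil {
		return err
	}
	pkgConfig = *cfg
	return nil
}
```

`go vet` does not flag this by default; the separate `shadow` analyzer can catch it.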
16 changes: 11 additions & 5 deletions cmd/main.go
@@ -38,7 +38,7 @@ func main() {
},
},
Action: func(ctx *cli.Context) error {
return handleConfigCommand(&configOpts)
return handleConfigCommand()
},
},
{
@@ -51,22 +51,28 @@ func main() {
Aliases: []string{"o"},
Value: "./",
Usage: "Specify the output directory for the markdown files",
Destination: &downloadOpts.outputDir,
Destination: &dlOpts.outputDir,
},
&cli.BoolFlag{
Name: "dump",
Value: false,
Usage: "Dump json response of the OPEN API",
Destination: &downloadOpts.dump,
Destination: &dlOpts.dump,
},
&cli.BoolFlag{
Name: "batch",
Value: false,
Usage: "Download all documents under a folder",
Destination: &dlOpts.batch,
},
},
ArgsUsage: "<url>",
Action: func(ctx *cli.Context) error {
if ctx.NArg() == 0 {
return cli.Exit("Please specify the document url", 1)
return cli.Exit("Please specify the document/folder url", 1)
} else {
url := ctx.Args().First()
return handleDownloadCommand(url, &downloadOpts)
return handleDownloadCommand(url)
}
},
},
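The `download` command declares `--batch` with urfave/cli alongside `--output`/`-o` and `--dump`. For readers more familiar with Go's standard `flag` package, here is a hypothetical equivalent; the project itself uses urfave/cli, and `parseDownloadArgs` and `dlOptions` are illustrative names only. Like the README examples, this sketch expects flags before the URL.

```go
package main

import (
	"errors"
	"flag"
)

// dlOptions mirrors the DownloadOpts fields used in this PR.
type dlOptions struct {
	outputDir string
	dump      bool
	batch     bool
}

// parseDownloadArgs is a hypothetical stdlib-only counterpart of the
// urfave/cli flag definitions for the download command.
func parseDownloadArgs(args []string) (dlOptions, string, error) {
	var opts dlOptions
	fs := flag.NewFlagSet("download", flag.ContinueOnError)
	// "output" and its alias "o" write to the same field.
	fs.StringVar(&opts.outputDir, "output", "./", "Specify the output directory for the markdown files")
	fs.StringVar(&opts.outputDir, "o", "./", "alias for --output")
	fs.BoolVar(&opts.dump, "dump", false, "Dump json response of the OPEN API")
	fs.BoolVar(&opts.batch, "batch", false, "Download all documents under a folder")
	if err := fs.Parse(args); err != nil {
		return opts, "", err
	}
	if fs.NArg() == 0 {
		return opts, "", errors.New("please specify the document/folder url")
	}
	return opts, fs.Arg(0), nil
}
```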
26 changes: 26 additions & 0 deletions core/client.go
@@ -10,6 +10,7 @@ import (
"time"

"github.com/chyroc/lark"
"github.com/chyroc/lark_rate_limiter"
)

type Client struct {
@@ -21,6 +22,7 @@ func NewClient(appID, appSecret string) *Client {
larkClient: lark.New(
lark.WithAppCredential(appID, appSecret),
lark.WithTimeout(60*time.Second),
lark.WithApiMiddleware(lark_rate_limiter.Wait(5, 5)),
),
}
}
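`core.NewClient` now installs `lark_rate_limiter.Wait(5, 5)` as API middleware so that concurrent batch downloads do not flood the OPEN API. The semantics of its two arguments are not shown in this diff. As an illustration of the general technique only, here is a minimal token-bucket limiter; `tokenBucket` and `newTokenBucket(rate, burst)` (refill rate per second, maximum burst) are hypothetical and are not that library's implementation.

```go
package main

import (
	"sync"
	"time"
)

// tokenBucket is a hypothetical token-bucket limiter: it holds at most
// burst tokens and refills rate tokens per second.
type tokenBucket struct {
	mu     sync.Mutex
	rate   float64
	burst  float64
	tokens float64
	last   time.Duration // elapsed time at the previous refill
	start  time.Time
}

func newTokenBucket(rate, burst int) *tokenBucket {
	return &tokenBucket{rate: float64(rate), burst: float64(burst), tokens: float64(burst), start: time.Now()}
}

// allowAt reports whether a request may proceed at elapsed time since the
// bucket was created, consuming one token if so. Keeping the clock as a
// parameter makes the refill arithmetic deterministic to test.
func (b *tokenBucket) allowAt(elapsed time.Duration) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed > b.last {
		b.tokens += (elapsed - b.last).Seconds() * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = elapsed
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// Allow checks the bucket against the wall clock.
func (b *tokenBucket) Allow() bool { return b.allowAt(time.Since(b.start)) }

// Wait blocks, polling at a short fixed interval, until a token is available;
// a middleware would call it before forwarding each request.
func (b *tokenBucket) Wait() {
	for !b.Allow() {
		time.Sleep(10 * time.Millisecond)
	}
}
```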
@@ -104,3 +106,27 @@ func (c *Client) GetWikiNodeInfo(ctx context.Context, token string) (*lark.GetWi
}
return resp.Node, nil
}

func (c *Client) GetDriveFolderFileList(ctx context.Context, pageToken *string, folderToken *string) ([]*lark.GetDriveFileListRespFile, error) {
resp, _, err := c.larkClient.Drive.GetDriveFileList(ctx, &lark.GetDriveFileListReq{
PageSize: nil,
PageToken: pageToken,
FolderToken: folderToken,
})
if err != nil {
return nil, err
}
files := resp.Files
for resp.HasMore {
resp, _, err = c.larkClient.Drive.GetDriveFileList(ctx, &lark.GetDriveFileListReq{
PageSize: nil,
PageToken: &resp.NextPageToken,
FolderToken: folderToken,
})
if err != nil {
return nil, err
}
files = append(files, resp.Files...)
}
return files, nil
}
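`GetDriveFolderFileList` issues one request, then repeats it with `NextPageToken` while `HasMore` is true. The same pagination can be written as a single loop that starts from an empty page token. A sketch with a hypothetical `page` type and `fetch` callback in place of the lark request/response types:

```go
package main

// page is a hypothetical stand-in for one GetDriveFileList response.
type page struct {
	Items         []string
	HasMore       bool
	NextPageToken string
}

// collectAll follows page tokens until the server reports no more pages.
// fetch receives "" for the first page.
func collectAll(fetch func(pageToken string) (page, error)) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		// Defensive assumption: stop if HasMore is set without a token,
		// which would otherwise loop forever on the same page.
		if !p.HasMore || p.NextPageToken == "" {
			return all, nil
		}
		token = p.NextPageToken
	}
}
```

A single loop avoids duplicating the request construction for the first page.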