readme: add macOS instructions for dependencies section #151

Merged: 1 commit, Oct 30, 2023
55 changes: 38 additions & 17 deletions README.md
@@ -7,38 +7,55 @@

A Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text.

> **Note for returning users:** the Go import path for this package changed to `code.sajari.com/docconv`.

## Installation

If you haven't set up Go before, you first need to [install Go](https://golang.org/doc/install).

To fetch and build the code:

```console
$ go install code.sajari.com/docconv/docd@latest
```

See `go help install` for details on the installation location of the installed `docd` executable. Make sure that the full path to the executable is in your `PATH` environment variable.

## Dependencies

- tidy
- wv
- poppler-utils
- unrtf
- https://github.com/JalfResi/justext

### Debian-based Linux

Example install of dependencies (not all systems):
```console
$ sudo apt-get install poppler-utils wv unrtf tidy
$ go get github.com/JalfResi/justext
```

### macOS

```console
$ brew install poppler-qt5 wv unrtf tidy-html5
$ go get github.com/JalfResi/justext
```
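
After installing on either platform, a quick check that the external tools are reachable can save debugging later. The sketch below uses `exec.LookPath` from the Go standard library. The executable names are assumptions that are not stated in this README: `pdftotext` is the binary usually shipped by poppler-utils and `wvText` the one shipped by wv, so adjust the list if your packages install different names.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the names from tools that exec.LookPath cannot
// find in any directory listed in the PATH environment variable.
func missingTools(tools []string) []string {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	// Assumed executable names (not taken from the README):
	// pdftotext (poppler-utils), wvText (wv), unrtf, tidy.
	// docd is included to confirm that the install step put it on PATH.
	tools := []string{"pdftotext", "wvText", "unrtf", "tidy", "docd"}

	missing := missingTools(tools)
	if len(missing) == 0 {
		fmt.Println("all tools found on PATH")
		return
	}
	for _, t := range missing {
		fmt.Printf("not found on PATH: %s\n", t)
	}
}
```

A tool reported as missing is either not installed or installed in a directory that is not in `PATH`.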

### Optional dependencies

To add image support to the `docconv` library you first need to [install and build gosseract](https://github.com/otiai10/gosseract/tree/v2.2.4).

Now you can add `-tags ocr` to any `go` command when building/fetching/testing `docconv` to include support for processing images:

```console
$ go get -tags ocr code.sajari.com/docconv/...
```

This command may fail on macOS, which you can usually fix by installing [tesseract](https://tesseract-ocr.github.io) with Homebrew:

```console
$ brew install tesseract
```

## docd tool

@@ -55,16 +72,18 @@ The `docd` tool runs as either:

Optionally you can build it yourself:

```console
$ cd docd
$ docker build -t docd .
```

3. via the command line.

Documents can be sent as an argument, e.g.

```console
$ docd -input document.pdf
```

### Optional flags

@@ -79,8 +98,10 @@ The `docd` tool runs as either:

### How to start the service

```console
$ # This runs on port 8000
$ docd -addr :8000
```

## Example usage (code)

@@ -135,6 +156,6 @@ func main() {

Alternatively, via `curl`:

```console
$ curl -s -F input=@your-file.pdf http://localhost:8888/convert
```