readme: add macOS instructions for dependencies section
jonathaningram committed Oct 30, 2023
1 parent 07e9902 commit ba2b85a
Showing 1 changed file (`README.md`) with 38 additions and 17 deletions.
@@ -7,38 +7,55 @@

A Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text.

> **Note for returning users:** the Go import path for this package changed to `code.sajari.com/docconv`.

## Installation

If you haven't set up Go before, you first need to [install Go](https://golang.org/doc/install).

To fetch and build the code:

```console
$ go install code.sajari.com/docconv/docd@latest
```

Run `go help install` for details on where the `docd` executable is installed, and make sure that directory is in your `PATH` environment variable.

## Dependencies

- tidy
- wv
- poppler-utils
- unrtf
- https://github.com/JalfResi/justext

### Debian-based Linux

Example install of the dependencies using `apt-get` (package names may differ on other systems):
```console
$ sudo apt-get install poppler-utils wv unrtf tidy
$ go get github.com/JalfResi/justext
```

### macOS

```console
$ brew install poppler-qt5 wv unrtf tidy-html5
$ go get github.com/JalfResi/justext
```
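After either install, a sketch like the following can confirm that the command-line tools are reachable on `PATH`. The executable names (`pdftotext` from poppler, `wvText` from wv, `unrtf`, `tidy`) are assumptions about which programs `docconv` shells out to. They are not a list taken from its source. `justext` is a Go package fetched with `go get`, so it is not checked here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// tool pairs an install-time package with the executable it is assumed
// to provide.
type tool struct {
	pkg string // package name from the install commands
	bin string // executable name (assumption, see the text above)
}

var tools = []tool{
	{pkg: "poppler-utils", bin: "pdftotext"},
	{pkg: "wv", bin: "wvText"},
	{pkg: "unrtf", bin: "unrtf"},
	{pkg: "tidy", bin: "tidy"},
}

// missing returns the tools whose executable cannot be found on PATH.
func missing(ts []tool) []tool {
	var out []tool
	for _, t := range ts {
		if _, err := exec.LookPath(t.bin); err != nil {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	m := missing(tools)
	if len(m) == 0 {
		fmt.Println("all external tools found on PATH")
		return
	}
	for _, t := range m {
		fmt.Printf("missing %s (from %s)\n", t.bin, t.pkg)
	}
	os.Exit(1)
}
```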

### Optional dependencies

To add image support to the `docconv` library, you first need to [install and build gosseract](https://github.com/otiai10/gosseract/tree/v2.2.4).

Now you can add `-tags ocr` to any `go` command when building, fetching, or testing `docconv` to include support for processing images:

```console
$ go get -tags ocr code.sajari.com/docconv/...
```

On macOS this command may fail. You can fix it by installing [tesseract](https://tesseract-ocr.github.io) via Homebrew:

```console
$ brew install tesseract
```
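Build tags decide which files the Go toolchain compiles. The sketch below uses the standard `go/build/constraint` package to show the effect of `-tags ocr` on a file guarded by a `//go:build ocr` line. It is an assumption that `docconv` gates its image support with exactly this constraint. The `included` helper is hypothetical.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// included reports whether a file carrying the given //go:build line would
// be compiled when exactly the listed tags are enabled.
func included(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	const line = "//go:build ocr"
	without, _ := included(line)
	withTag, _ := included(line, "ocr")
	fmt.Println("without -tags ocr:", without) // prints false
	fmt.Println("with -tags ocr:", withTag)    // prints true
}
```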

## docd tool

@@ -55,16 +72,18 @@ The `docd` tool runs as either:

Optionally, you can build the image yourself:

```console
$ cd docd
$ docker build -t docd .
```

3. via the command line.

Documents can be passed as an argument, e.g.:

```console
$ docd -input document.pdf
```
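The same command-line conversion can be driven from a Go program with `os/exec`. This is a minimal sketch. `convertWithDocd` is a hypothetical helper, and it assumes `docd` is on your `PATH` and writes its result to standard output. The exact output format is not covered here.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// convertWithDocd runs `docd -input <file>` and returns whatever docd writes
// to standard output. docdPath may be a bare name resolved via PATH.
func convertWithDocd(docdPath, input string) (string, error) {
	cmd := exec.Command(docdPath, "-input", input)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		// Include docd's own error output to make failures easier to diagnose.
		return "", fmt.Errorf("running %s: %w: %s", docdPath, err, stderr.String())
	}
	return stdout.String(), nil
}

func main() {
	text, err := convertWithDocd("docd", "document.pdf")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```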

### Optional flags

@@ -79,8 +98,10 @@ The `docd` tool runs as either:

### How to start the service

```console
$ # This runs on port 8000
$ docd -addr :8000
```

## Example usage (code)

@@ -135,6 +156,6 @@ func main() {

Alternatively, via `curl` (adjust the port to match the `-addr` value `docd` was started with):

```console
$ curl -s -F input=@your-file.pdf http://localhost:8888/convert
```
