Leiningen: Using goose via a local Maven repository
I’ve been playing around a little bit with goose, an HTML content/article extractor, originally in Java but later in Clojure, where I needed to work out how to include goose and all its dependencies via Leiningen.
goose isn’t available in a public Maven repository, so I needed to create a local repository, something which I’ve got stuck on in the past.
Luckily Paul Gross has written a cool blog post explaining how his team got past this problem.
Following the instructions from Paul’s post, this is how I got goose playing nicely with Clojure:
Inside my Clojure project, I created a directory to hold the local repository:
/Users/mneedham/github/android/text-extraction $ mkdir maven_repository
I then ran the following command from where I had goose checked out on my machine:
mvn install:install-file -Dfile=target/goose-2.1.6.jar -DartifactId=goose -Dversion=2.1.6 -DgroupId=goose -Dpackaging=jar -DlocalRepositoryPath=/Users/mneedham/github/android/text-extraction/maven_repository -DpomFile=pom.xml
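As a rough sanity check on that command, the standard Maven repository layout places an artifact under its groupId (with dots turned into directory separators), then its artifactId, then its version. The sketch below computes where the jar should land relative to the repository root. `artifact_path` is a hypothetical helper name for illustration, not something Maven provides.

```shell
#!/bin/sh
# Hypothetical helper: print the path of an artifact relative to the
# repository root, following the standard Maven repository layout.
artifact_path() {
  group_id=$1
  artifact_id=$2
  version=$3
  packaging=${4:-jar}
  # Dots in the groupId become directory separators.
  group_dir=$(printf '%s' "$group_id" | tr '.' '/')
  printf '%s/%s/%s/%s-%s.%s\n' \
    "$group_dir" "$artifact_id" "$version" "$artifact_id" "$version" "$packaging"
}

# Where the goose jar should appear under maven_repository/
artifact_path goose goose 2.1.6 jar
# prints goose/goose/2.1.6/goose-2.1.6.jar
```

If that file is missing from maven_repository after running the install command, the values passed to -DgroupId, -DartifactId and -Dversion are the first thing to check against the dependency vector in project.clj.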
I added the repository and goose dependency to my project.clj file, which now looks like this:
(defproject textextraction "0.1.0"
  :description "Extract text from urls"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [ring/ring-jetty-adapter "0.3.11"]
                 [compojure "0.6.4"]
                 [goose "2.1.6"]]
  :dev-dependencies [[swank-clojure "1.2.1"]]
  ;; file: URL of the maven_repository directory, resolved relative to where lein is run
  :repositories {"local" ~(str (.toURI (java.io.File. "maven_repository")))}
  :main textextraction.main)
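The :repositories value is worth a closer look: java.io.File resolves the relative path "maven_repository" against the directory the JVM was started from, so lein needs to be run from the project root. The sketch below approximates the URL that expression produces. It assumes a path without spaces or other characters that Java would percent-encode, and `repo_url` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper approximating
# (str (.toURI (java.io.File. "maven_repository")))
repo_url() {
  dir=$1
  # java.io.File resolves relative paths against the working directory.
  case $dir in
    /*) abs=$dir ;;
    *)  abs=$(pwd)/$dir ;;
  esac
  # File.toURI appends a trailing slash when the path is an existing directory.
  if [ -d "$abs" ]; then
    printf 'file:%s/\n' "$abs"
  else
    printf 'file:%s\n' "$abs"
  fi
}

# Run from the project root to see the repository URL
repo_url maven_repository
```

Run from /Users/mneedham/github/android/text-extraction, this should print a file: URL ending in /maven_repository/. Run from anywhere else, it points at a directory that most likely does not contain the goose jar.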
I then ran:
/Users/mneedham/github/android/text-extraction $ lein run
And goose and all its dependencies are included in the 'lib' directory.