Elasticsearch: Importing data into App Search
For a side project that I’m working on I wanted to create a small React application that can query data stored in Elasticsearch, and most of the tutorials I found suggested using a tool called Elastic App Search.
I’d not heard of App Search before, and it took me a while to figure out that it’s the mid-level product between Elasticsearch Service and Elastic Site Search Service, as described on elastic.co/cloud.
Launching Elastic App Search locally
Now that we’ve figured that out, we’re going to set up a local App Search server and import some data into it. I found a Docker Compose file on the Okode blog that I adapted to the following:
docker-compose.yml
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.2
    environment:
      - "node.name=es-node"
      - "discovery.type=single-node"
      - "cluster.name=app-search-docker-cluster"
      - "bootstrap.memory_lock=true"
      - "ES_JAVA_OPTS=-Xms512m -Xmx2048m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200
      - 9300:9300
  appsearch:
    image: docker.elastic.co/app-search/app-search:7.4.2
    depends_on:
      - elasticsearch
    environment:
      - "elasticsearch.host=http://elasticsearch:9200"
      - "allow_es_settings_modification=true"
      - "JAVA_OPTS=-Xmx2048m"
    ports:
      - 3002:3002
We can run the following command to launch App Search:
docker-compose up
Once that command has run, App Search should be available at http://localhost:3002/. If we navigate to that URL in our web browser, we’ll see the following screen:
We need to create an engine, which is App Search’s name for an index. Let’s create one called meals, as in the Okode tutorial mentioned earlier.
Once we’ve done that we’ll see the following screen, which has instructions for importing data into our engine:
But we’re not going to use any of these approaches!
Installing the Python elastic-app-search library
Instead we’ll use the Python elastic-app-search library to import data into App Search. We’ll install the library using Pipenv via the following Pipfile:
Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
[packages]
elastic-app-search = "*"
requests = "*"
stringcase = "*"
[requires]
python_version = "3.7"
We can set everything up by running the following commands:
pipenv shell
pipenv install
Once we’ve run those commands, we can check that the library is installed by executing the following command:
pipenv graph
If we run that we’ll see the following output:
elastic-app-search==7.4.0
  - PyJWT [required: Any, installed: 1.7.1]
  - requests [required: Any, installed: 2.22.0]
    - certifi [required: >=2017.4.17, installed: 2019.9.11]
    - chardet [required: >=3.0.2,<3.1.0, installed: 3.0.4]
    - idna [required: >=2.5,<2.9, installed: 2.8]
    - urllib3 [required: >=1.21.1,<1.26,!=1.25.1,!=1.25.0, installed: 1.25.7]
stringcase==1.2.0
Importing data
We can now write a Python script to import some of the documents from themealdb.com:
from elastic_app_search import Client
import requests as r

engine_name = 'meals'
api_key = "private-kwicp7mhwssdxv54as9buzen"

client = Client(
    api_key=api_key,
    base_endpoint='localhost:3002/api/as/v1',
    use_https=False
)

response = r.get("https://www.themealdb.com/api/json/v1/1/search.php?f=a").json()

documents = []
for entry in response["meals"]:
    documents.append(entry)
    # Send the documents to App Search in batches of 50
    if len(documents) % 50 == 0:
        res = client.index_documents(engine_name, documents)
        print(res)
        documents = []

# Index whatever is left over after the last full batch
if documents:
    res = client.index_documents(engine_name, documents)
    print(res)
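The batching logic in the script above could also be pulled out into a small helper. This is only a sketch using the standard library: batches is a hypothetical name of my own and isn’t part of the elastic-app-search library.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items; never yields an empty list."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


# Hypothetical usage with the client from the script above:
# for batch in batches(response["meals"], 50):
#     print(client.index_documents(engine_name, batch))
```

A generator like this keeps the leftover handling in one place, so the import loop doesn’t need a separate call after it finishes.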
We get the api_key via the Credentials menu item:
If we execute this script we’ll see the following output:
[{'id': None, 'errors': ['Fields can only contain lowercase letters, numbers, and underscores: idMeal.', 'Fields can only contain lowercase letters, numbers, and underscores: strMeal.', 'Fields can only contain lowercase letters, numbers, and underscores: strDrinkAlternate.', 'Fields can only contain lowercase letters, numbers, and underscores: strCategory.', 'Fields can only contain lowercase letters, numbers, and underscores: strArea.', 'Fields can only contain lowercase letters, numbers, and underscores: strInstructions.', 'Fields can only contain lowercase letters, numbers, and underscores: strMealThumb.',
...
]}]
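We could catch this problem before sending anything to App Search by checking the keys ourselves. The sketch below is based only on the wording of the error message above, so the real validation rules may well be stricter. invalid_field_names is a made-up helper name and the sample document contains placeholder values.

```python
import re

# Assumption: this pattern is derived from the error message above
# ("lowercase letters, numbers, and underscores"), not from the App Search docs.
FIELD_NAME = re.compile(r"[a-z0-9_]+")


def invalid_field_names(document: dict) -> list:
    """Return the keys that contain anything other than a-z, 0-9 or underscore."""
    return [key for key in document if not FIELD_NAME.fullmatch(key)]


# Placeholder document, shaped like an entry from the API response
sample = {"idMeal": "1", "strMeal": "Example meal", "id": "1"}
print(invalid_field_names(sample))  # ['idMeal', 'strMeal']
```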
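Field names can’t contain uppercase letters, so we’ll need to rename them. We can use the stringcase library to convert each key to snake case, and we’ll also copy id_meal into the id field so that each document keeps its identifier from the source data. The following script does this: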
from elastic_app_search import Client
import requests as r
import stringcase

engine_name = 'meals'
api_key = "private-kwicp7mhwssdxv54as9buzen"

client = Client(
    api_key=api_key,
    base_endpoint='localhost:3002/api/as/v1',
    use_https=False
)

response = r.get("https://www.themealdb.com/api/json/v1/1/search.php?f=a").json()

documents = []
for entry in response["meals"]:
    # Convert keys such as idMeal to id_meal so that App Search accepts them
    new_entry = {stringcase.snakecase(key): entry[key] for key in entry}
    # Reuse the meal's own identifier as the document id
    new_entry["id"] = new_entry["id_meal"]
    documents.append(new_entry)
    # Send the documents to App Search in batches of 50
    if len(documents) % 50 == 0:
        res = client.index_documents(engine_name, documents)
        print(res)
        documents = []

# Index whatever is left over after the last full batch
if documents:
    res = client.index_documents(engine_name, documents)
    print(res)
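As an aside, if we didn’t want the extra dependency, the key renaming could be approximated with the standard library. This is a sketch rather than a drop-in replacement: to_snake_case and prepare_document are hypothetical names, and I haven’t checked that the regex matches stringcase.snakecase for every possible input, only for camelCase keys like the ones in this dataset.

```python
import re


def to_snake_case(name: str) -> str:
    """Insert an underscore before each uppercase letter (except at the start), then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def prepare_document(entry: dict, id_field: str = "id_meal") -> dict:
    """Rename every key to snake case and copy the chosen field into `id`."""
    doc = {to_snake_case(key): value for key, value in entry.items()}
    doc["id"] = doc[id_field]
    return doc


print(to_snake_case("strMealThumb"))  # str_meal_thumb
```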
If we execute the updated script, we’ll see the following output:
[{'id': '52768', 'errors': []}, {'id': '52893', 'errors': []}]
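Each entry in that list has an errors field, so rather than eyeballing the printed output we could filter for the entries that failed. Here is a minimal sketch, assuming the response always has the list-of-dictionaries shape shown in the two outputs above. failed_results is a name I’ve made up.

```python
def failed_results(results: list) -> list:
    """Return only the entries whose `errors` list is non-empty."""
    return [result for result in results if result.get("errors")]


# The successful response shown above
ok = [{'id': '52768', 'errors': []}, {'id': '52893', 'errors': []}]
print(failed_results(ok))  # []

# A shortened version of the failing response from the first script
bad = [{'id': None, 'errors': ['Fields can only contain lowercase letters, numbers, and underscores: idMeal.']}]
print(len(failed_results(bad)))  # 1
```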
And now let’s navigate to http://localhost:3002/as#/engines/meals/documents to have a look at what we’ve imported:
Success!
About the author
I'm currently working on short-form content at ClickHouse. I publish short 5-minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.