
Java for Elasticsearch, episode 1. Querying the cluster

Diving into Elasticsearch from the code side is surprisingly straightforward once you get the hang of it. In the inaugural article of this series, we're excited to guide you through establishing a robust three-node cluster and connecting to it securely using self-signed certificates.


Elasticsearch is a very powerful search engine, but setting up a stack in a development environment can be a little tricky.

In this article and the following ones, we will progress step by step through the whole process and have a look at many possibilities.

We will cover various subjects such as querying, cluster administration, and vector search.

All the code displayed here is available on Gitlab. The branch matching this article is this one: 01-dev-elastic.

Step 1: Deploying a local Elasticsearch cluster

One of the main characteristics of Elasticsearch is that it can be deployed on a cluster composed of many nodes. It enables several things:

  • Horizontal scalability

  • Node specialization according to needs

  • Service availability

  • etc.

Communication with a cluster is done over the HTTP protocol on port 9200.

In order to remain close to production conditions, we will begin by setting up a three-node cluster, with no specific roles assigned.

We will also set up a Kibana instance so that we can manage the cluster, and query or visualize data.

In order to secure communications with the cluster, we will use TLS and self-signed certificates, because using unsecured communications in a production environment would be unthinkable.

In order to keep the setup simple, we will use Docker.

Elastic provides a docker-compose file on its website.

The one used here is strongly inspired by it, with slight modifications.

Warning: In the .env file, the path referenced in the DATA_PATH variable must be accessible to the user running Docker on your system, especially if your system is a Linux one.

Once the .env file has been updated to match your needs, you can run:

docker compose up -d

When it’s running, you can perform two checks.

First, on the command line, with a curl command:

curl -k -X GET "https://localhost:9200/" --user "elastic:elasticpwd"

The response should look like the following:

{
  "name": "es01",
  "cluster_name": "adelean-cluster",
  "cluster_uuid": "0ku-2DJrRKObuCUMT4nTfA",
  "version": {
    "number": "8.11.3",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "64cf052f3b56b1fd4449f5454cb88aca7e739d9a",
    "build_date": "2023-12-08T11:33:53.634979452Z",
    "build_snapshot": false,
    "lucene_version": "9.8.0",
    "minimum_wire_compatibility_version": "7.17.0",
    "minimum_index_compatibility_version": "7.0.0"
  },
  "tagline": "You Know, for Search"
}

Then, connect to Kibana at the URL http://localhost:5601, which will display the login page.

Once authenticated with the “elastic” user, the home page will display many options. One of them is trying out sample data.

[Image: Kibana home page with the “Try sample data” option]

On the data set selection page, the “Other sample data sets” menu will allow us to choose “Sample eCommerce orders” by clicking on the “Add data” button.

[Image: Choosing the sample data to import]

Step 2: Creating the Java project

We will use a Maven project with Java 21 and Spring Boot 3.2.

To avoid getting distracted by building a UI, we will use unit tests to trigger the requests.

The POM file will look like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.2.0</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.adelean</groupId>
    <artifactId>dev-elastic-01</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>01-dev-elastic</name>
    <description>01-dev-elastic</description>
    <properties>
        <java.version>21</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>8.11.2</version>
        </dependency>

        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.12.7.1</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

Step 3: Connecting to Elasticsearch

To establish the connection, we will use the Java client developed by Elastic.

But, as we are working with a cluster using SSL, we will need to use the certificates.

In the folder matching the DATA_PATH variable in the .env file, we can see that new folders were created.

The certificate we want is in $DATA_PATH/certs/ca/ca.crt

We need its fingerprint, which we can get in two different ways.

The first is by looking at the logs of a node on its first start.

The second is by running the following command:

openssl s_client -connect localhost:9200 -servername localhost -showcerts </dev/null 2>/dev/null \
  | openssl x509 -fingerprint -sha256 -noout -in /dev/stdin
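
If openssl is not at hand, the same SHA-256 fingerprint can also be computed with the Java standard library alone. This is a minimal sketch, not part of the project itself: the CaFingerprint class name and the certificate path passed as an argument are illustrative.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class CaFingerprint {

    // Computes the SHA-256 fingerprint of an X.509 certificate,
    // in the colon-separated hex form printed by openssl.
    static String fingerprint(InputStream certStream) throws Exception {
        X509Certificate cert = (X509Certificate) CertificateFactory
                .getInstance("X.509")
                .generateCertificate(certStream);
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(cert.getEncoded());
        return toColonHex(digest);
    }

    // Formats bytes as uppercase hex pairs separated by colons, e.g. "00:AB:10".
    static String toColonHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(':');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            try (InputStream in = new FileInputStream(args[0])) {
                System.out.println(fingerprint(in));
            }
        }
    }
}
```

Running it with the ca.crt path as its argument prints the same value as the openssl pipeline above (without the "sha256 Fingerprint=" prefix).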

In the Java project, in src/main/resources, we will add some properties to the application.properties file:

elastic.host=localhost
elastic.port=9200
elastic.ca.fingerprint=<previously obtained fingerprint>
elastic.user=elastic
elastic.password=<password defined in the .env file>

Once done, we can create a service class which will instantiate the client and use it.

package com.adelean.develastic01.services.supervision;

/* imports */

@Service("IndicesService")
public class Indices {

    @Value("${elastic.host}")
    private String elasticHost;

    @Value("${elastic.port}")
    private int elasticPort;

    @Value("${elastic.ca.fingerprint}")
    private String fingerPrint;

    @Value("${elastic.user}")
    private String elasticUser;

    @Value("${elastic.password}")
    private String elasticPassword;

    private ElasticsearchClient getClient() {

        SSLContext sslContext = TransportUtils.sslContextFromCaFingerprint(fingerPrint);
        BasicCredentialsProvider credsProv = new BasicCredentialsProvider();
        credsProv.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(elasticUser, elasticPassword));

        RestClient restClient = RestClient
                .builder(new HttpHost(elasticHost, elasticPort, "https"))
                .setHttpClientConfigCallback(ccb -> ccb.setSSLContext(sslContext).setDefaultCredentialsProvider(credsProv))
                .build();

        ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());

        return new ElasticsearchClient(transport);

    }

}

The TransportUtils class is a utility class that allows us to get an SSL context either from a certificate fingerprint or from the certificate itself.
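
For reference, here is what the certificate-file variant could look like; the path below is illustrative and should point at the ca.crt obtained in step 1.

```java
// Alternative to sslContextFromCaFingerprint: build the SSLContext
// directly from the CA certificate file instead of its fingerprint.
SSLContext sslContext = TransportUtils
        .sslContextFromHttpCaCrt(new File("/path/to/certs/ca/ca.crt"));
```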

The first request will allow us to get the list of all the indices stored in the cluster:

public Set<String> listIndices() {

    GetIndexRequest request = new GetIndexRequest.Builder().index("*").build();

    try {
        GetIndexResponse response = getClient().indices().get(request);
        Map<String, IndexState> indices = response.result();

        return indices.keySet();

    } catch (IOException e) {
        throw new RuntimeException(e);
    }

}
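
As a side note, the Java client also offers a fluent, lambda-style builder notation, which we will lean on more later in this series. As a sketch, the same request could be written as:

```java
// Same request as above, using the client's lambda builders
// instead of an explicit GetIndexRequest.Builder.
GetIndexResponse response = getClient().indices().get(g -> g.index("*"));
```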

Once done, all we have to do is create a test class to check what this first request returns.

package com.adelean.develastic01.services.supervision;

/* imports */

@SpringBootTest
class IndicesTest {

    @Autowired
    private Indices indicesService;

    @Test
    void listAllIndices() {

        var result = indicesService.listIndices();

        assertTrue(result.size() > 1);

    }

}

The index list will look like:

  • .internal.alerts-ml.anomaly-detection.alerts-default-000001
  • .internal.alerts-observability.slo.alerts-default-000001
  • kibana_sample_data_ecommerce
  • .internal.alerts-observability.metrics.alerts-default-000001
  • .kibana-observability-ai-assistant-kb-000001
  • .internal.alerts-observability.logs.alerts-default-000001
  • .internal.alerts-observability.uptime.alerts-default-000001
  • .internal.alerts-observability.apm.alerts-default-000001
  • .internal.alerts-stack.alerts-default-000001
  • .internal.alerts-security.alerts-default-000001
  • .internal.alerts-observability.threshold.alerts-default-000001
  • .kibana-observability-ai-assistant-conversations-000001

In this list, we can find kibana_sample_data_ecommerce.

In the Kibana console, which can be accessed through “Management” -> “Dev Tools”, the matching request would be:

GET _cat/indices
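
The Java client exposes the same _cat API through its cat namespace. As a hedged sketch, reusing the getClient() method from our service (the output handling is illustrative):

```java
// Java-client counterpart of "GET _cat/indices": the cat namespace
// returns the same tabular data as the REST _cat API.
IndicesResponse catResponse = getClient().cat().indices();
catResponse.valueBody().forEach(record ->
        System.out.println(record.index() + " " + record.docsCount()));
```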

In the next article, we will see how to enhance this code and run other requests.
