Using Jsoup in API Automation to Handle HTML Response

In the article REST API Automation we discussed API automation using Groovy and the Spock framework. To submit API requests and capture API responses, we created our API clients using Groovy's RestClient library. One shortcoming of this library is that it cannot handle API responses delivered in HTML format. So, if we want to work with APIs that respond in HTML, we need a different library. This is where jsoup, an open-source Java library, comes to our aid. With it we can parse an HTML response body and extract the data we need in our API automation project.

From the jsoup documentation site:

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.

Implementing API Clients:

Before writing our API clients we need to import the necessary jsoup classes:

import org.jsoup.Connection
import org.jsoup.Jsoup

GET Client:

Let's assume we have a directions API that accepts the GET method, two query parameters and one header. We can implement our GET API client like this:

when: "we attempt to get the direction between two places"
Connection.Response response = Jsoup.connect(API_PATH)
                    .method(Connection.Method.GET)
                    .header("key", API_KEY)
                    .data("destination", destination)
                    .data("origin", origin)
                    .execute()

POST Client:

As a POST example, let's assume a login API that accepts a user name, a password and one header. This POST API client can be implemented like this:

when: "we attempt to login the registered user"
Connection.Response response = Jsoup.connect(API_PATH)
                    .method(Connection.Method.POST)
                    .data("username", userName)
                    .data("password", password)
                    .header("authCode", authCode)
                    .execute()

The POST and GET implementations are quite alike. We call the API endpoint with the jsoup connect method and capture the result in a Connection.Response object. The HTTP method (GET, POST and so on) is set with method. Request parameters are added one at a time with data, each as a name and a value; jsoup sends them as query string parameters for a GET and as form data for a POST. There are two options for passing headers: the header method, which accepts a single header name and value, and the headers method, which accepts a Map<String, String> of header names and values. Lastly, we call the execute method to send the request.
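As a minimal sketch of the builder calls described above, the following standalone Groovy script assembles a POST request using the headers method with a Map, then inspects it through request() instead of sending it. The endpoint URL, header names and credentials are made-up placeholders, and the @Grab line is only an assumption that lets the script run outside a Gradle project.

```groovy
@Grab('org.jsoup:jsoup:1.14.3')
import org.jsoup.Connection
import org.jsoup.Jsoup

// Hypothetical endpoint and values, for illustration only
String apiPath = "https://api.example.com/login"
Map<String, String> requestHeaders = [authCode: "abc123", Accept: "text/html"]

// Build the request but do not call execute(), so nothing is sent over the network
Connection connection = Jsoup.connect(apiPath)
        .method(Connection.Method.POST)
        .headers(requestHeaders)            // several headers in one call
        .data("username", "demoUser")       // one data() call per parameter
        .data("password", "demoPass")

// request() exposes what has been configured so far
Connection.Request request = connection.request()
println "Method  : ${request.method()}"
println "Headers : ${request.headers()}"
request.data().each { println "Data    : ${it.key()} = ${it.value()}" }
```

Inspecting the request this way can help when debugging a client, because it shows the method, headers and parameters jsoup is about to send. Note that jsoup may add a few default headers of its own to the printed map.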

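By default jsoup only accepts responses whose content type is text/* or an XML type, and execute throws an exception for anything else. If the API responds with another content type (for example application/json), we need to add one more call before execute: ignoreContentType(true)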

Parsing Response:

Working with a jsoup Connection.Response is pretty straightforward. We can read the HTTP status code, cookies, headers and the HTML body.

HTTP Status Code:

The HTTP status code is returned by the statusCode method.

int statusCode = response.statusCode()

Cookies:

We can extract the cookies from the response by calling the cookies method, which returns a map.

Map<String, String> cookies = response.cookies()
for (Map.Entry<String, String> entry : cookies.entrySet()) {
      logger.info("Cookie Name : " + entry.getKey() + ", Value : " + entry.getValue())
}

Headers:

Extracting headers is similar to extracting cookies. We call the headers method, which also returns a map.

Map<String, String> headers = response.headers()
for (Map.Entry<String, String> entry : headers.entrySet()) {
      logger.info("Header Name : " + entry.getKey() + ", Value : " + entry.getValue())
}

Parsing Body:

To parse the response body we need a Document object. This object gives us access to all parts of an HTML response. We need to import the Document class as below:

import org.jsoup.nodes.Document

Now we can parse the Connection.Response object like this:

Document document = response.parse()

This object lets us extract the HTML page title, head or body content by calling its built-in methods like this:

String title = document.title()
String headText = document.head().text()
String bodyText = document.body().text()
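To try these methods without a live API, a small sketch can parse an HTML string directly with Jsoup.parse, which returns the same Document type as response.parse(). The HTML below is an invented sample, and the @Grab line is only an assumption that lets the script run on its own.

```groovy
@Grab('org.jsoup:jsoup:1.14.3')
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Invented sample page standing in for an API response body
String html = '''
<html>
  <head><title>Test Report</title></head>
  <body><h1>Suites</h1><p>All tests passed.</p></body>
</html>
'''

// Jsoup.parse(String) builds a Document from a string, no network involved
Document document = Jsoup.parse(html)

String title = document.title()            // content of the <title> tag
String headText = document.head().text()   // visible text inside <head>
String bodyText = document.body().text()   // visible text inside <body>, tags stripped

println "title    : ${title}"
println "headText : ${headText}"
println "bodyText : ${bodyText}"
```

Parsing a saved copy of a response in this way can be a convenient way to work out which Document calls are needed before wiring them into a specification.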

The Document class also provides a wide range of DOM-like methods to extract elements by id, class, tag or attribute name.

Let's see an example where we access an HTML element by its id and then access the a tag elements inside it. Consider the following content in our response HTML body:

<body>
    <div id="suites">
        <p>
            <a href="index.html">Suites</a>
            <a href="output.html">Log Output</a>
        </p>

        <div class="test">
            <span style="float: none" class="successIndicator"
                  title="Tests passed.">&#x2714;</span>
            <a href="suite0_test0_results.html">Form Test</a>
        </div>
    </div>
</body>

We will fetch the element with id suites, and for this we need to import the following two classes:

import org.jsoup.nodes.Element
import org.jsoup.select.Elements

The getElementById method returns a single Element matching the given id, and calling getElementsByTag on that element returns an Elements collection containing all the a tags nested inside it.

Element suites = document.getElementById("suites")
Elements links = suites.getElementsByTag("a")
for (Element link : links) {
     logger.info("href : " + link.attr("href"))
     logger.info("linkText : " + link.text())
}

The previous code snippet will result in:

href : index.html
linkText : Suites
href : output.html
linkText : Log Output
href : suite0_test0_results.html
linkText : Form Test
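Lookup by class and by CSS selector works along the same lines. The sketch below reuses the sample HTML from above and parses it from a string, so it runs without a live API. The particular selector is just one possible choice for illustration, and the @Grab line is only an assumption that lets the script run on its own.

```groovy
@Grab('org.jsoup:jsoup:1.14.3')
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.select.Elements

// Same sample body as in the article
String html = '''
<body>
    <div id="suites">
        <p>
            <a href="index.html">Suites</a>
            <a href="output.html">Log Output</a>
        </p>
        <div class="test">
            <span style="float: none" class="successIndicator"
                  title="Tests passed.">&#x2714;</span>
            <a href="suite0_test0_results.html">Form Test</a>
        </div>
    </div>
</body>
'''
Document document = Jsoup.parse(html)

// By class name: the span that marks a passing test
Elements indicators = document.getElementsByClass("successIndicator")
String indicatorTitle = indicators.first().attr("title")

// By CSS selector: links that sit inside div.test under #suites
Elements testLinks = document.select("#suites div.test a[href]")
List<String> hrefs = testLinks.eachAttr("href")

println "indicator title : ${indicatorTitle}"
println "test links      : ${hrefs}"
```

A selector can replace a chain of getElementBy calls when the target is nested several levels deep, at the cost of being slightly less explicit.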

More such methods are described in the jsoup API documentation.

In some cases the HTML body may contain JSON-formatted text. To extract the body and parse it as JSON we can use Groovy's JsonSlurper class.

import groovy.json.JsonSlurper

We call the parseText method, passing in the body text parsed from the jsoup connection response. We can then read the JSON data using ordinary property notation.

def jsonSlurper = new JsonSlurper()
def jsonBody = jsonSlurper.parseText(document.body().text())

String userId = jsonBody.userId
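The same steps can be tried offline with a string standing in for the response. This is a sketch under assumptions: the JSON payload and its field names are invented, and the @Grab line is only there so the script runs on its own.

```groovy
@Grab('org.jsoup:jsoup:1.14.3')
import groovy.json.JsonSlurper
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Invented payload: an HTML body whose only content is a JSON object
String html = '<html><body>{"userId": "u-1001", "roles": ["admin", "tester"]}</body></html>'

// Parse the HTML, then hand the body text to JsonSlurper
Document document = Jsoup.parse(html)
def jsonBody = new JsonSlurper().parseText(document.body().text())

// Parsed JSON objects behave like maps, arrays like lists
String userId = jsonBody.userId
List roles = jsonBody.roles

println "userId : ${userId}"
println "roles  : ${roles}"
```

One caveat: the JSON passes through the HTML parser first, so string values containing characters such as < could be altered. In that case, reading the raw string with response.body() and passing it to parseText may be the safer route.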

Test Assertion:

We capture the API response in the when block and validate it against our expected results in the then block of our Spock specification class. We can assert on the HTTP status code like this:

then: "we receive a response"
response.statusCode() == 200

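If we want to validate our test case against the body content of the API response, we can put each check on its own line. Spock treats every top-level expression in a then block as a separate condition, so a failure report points at the exact check that failed:

then: "we receive a response"
title.contains("Suites")
links.get(0).text() == "Suites"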

Gradle Dependency:

We also discussed how to add Gradle and Logback to our API automation project. Here we add the jsoup dependency to the dependencies block of our build.gradle file like this:

implementation 'org.jsoup:jsoup:1.14.3'
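For context, a minimal build.gradle fragment showing where that line sits might look like the sketch below. It assumes the project resolves its dependencies from Maven Central; any existing repositories and dependencies in the project stay as they are.

```groovy
// build.gradle (fragment)
repositories {
    mavenCentral()   // assumed repository; jsoup is published there
}

dependencies {
    // jsoup for fetching and parsing HTML responses
    implementation 'org.jsoup:jsoup:1.14.3'
}
```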

P.S.: The current version of jsoup at the time of writing this article is 1.14.3.

Bashiul Alam Sabab's Blog