Main menu

Normalized Data - DOH!

Most of us have been there. Working with relational databases, or several different databases, and needing to join data outside of SQL means finding a common key and writing code to iterate through lists of data. While it is not terribly difficult to do, there should be an easier way. Well, let's see how Anypoint Studio and DataWeave can do it.

First, we need a use case, so why don't we start with a simple case. We have two database tables that we need to join. As a side-bar, my actual production use case required us to join data from two SalesForce queries. For this more generic example, we have the following data:

Item Recordset

id name UOM
1001 Apples each
1002 Oranges each
1003 Pears each

Item Attributes

id attribute_name attribute_value
1001 color red
1001 stem_length long
1002 color orange 
1003 color green 
1003 stem_length short 

Now that we have the data source, we need to look at the the target format. We need this data in a single, flat JSON data format that should be:

[ { "itemId": 1001, "name": "Apples", "color": "red", "stem_length": "long" }, { "itemId": 1001, "name": "Apples", "color": "orange" } ]

The DataWeave script is extremely easy, though actually writing it was not as easy as the solution:

    <flow name="test-dataweaveFlow">
        <poll doc:name="Poll">
            <fixed-frequency-scheduler frequency="20000"/>
            <set-payload value="#[[['id': 1, 'name': 'Apples'],['id': 2, 'name': 'Oranges'],['id': 3, 'name': 'Pears']]]" doc:name="Set Payload"/>
        </poll>
        <set-variable variableName="itemAttrs" value="#[[['id': 1, 'attributes': ['color': 'red','stem_length': 'long']],['id': 2, 'attributes': ['color': 'orange']],['id': 3, 'attributes': ['color': 'green', 'stem_length': 'short' ]]]]"/>
        <dw:transform-message doc:name="Transform Message">
            <dw:input-payload mimeType="application/java"/>
            <dw:set-payload><![CDATA[%dw 1.0
                %output application/java
                ---
                payload map ((itemObject, itemIndex) -> {
                    itemId: itemObject.id,
                    name: itemObject.name,
                    (flatten (flowVars.itemAttrs filter $.id == itemObject.id).attributes)
                })
            ]]></dw:set-payload>
        </dw:transform-message>
        <logger message="#[payload]" level="INFO" doc:name="Logger"/>
    </flow>

For this example, we set a poller to generate the payload every twenty seconds. The payload represents the call to the database. To make things interesting, instead of using a regular, flat list of maps, we set a variable called itemAttrs to a more JSON-like (hash map) object representing the item attributes. The magic happens in the DataWeave script. We work with the original payload by calling "payload map". This allows us to "iterate" through the payload, working with individual rows in the database. By using itemObject and itemIndex, we can specifically refer to the item recordset. The first two assignment are easy, they set the itemId and the name to the respective values. The third assignment sets the attributes value to the corresponding attributes of the second list of maps. The filter tells DataWeave to pick only the correct attributes. We then flatten the list to pull the attributes to the top level of the data structure. Finally, we enclose all of it in parentheses in order to allow DataWeave to insert the attributes map into the item map.

Voilà!