
An Alternative to JSON-LD?

For the past few days, I have been reading up on the underlying concepts behind the Linked Data movement [Wikipedia]. And I gotta tell ya, folks: the more I read about it, the more I question whether JSON-LD will ever be adopted as a standard by us (the developers).

In this article I will try to describe what I believe a good Linked Data standard should look like. One that could be useful for ME as a developer and business owner to implement. Please keep in mind that these are just a few thoughts I have had for a while, so don’t take my criticism of JSON-LD as an absolute God-given truth.

Before I dive into my solution to this problem, allow me to first state the reasons why I believe the JSON-LD format is not attractive enough.

1. Verbose Integration:
The main goal/requirement for JSON-LD was ease of integration. The idea here was that we (the developers) would be able to transform our JSON data into JSON-LD without much effort. And boy, did this standard miss that goal. It actually makes it harder for us to migrate our existing APIs into a standardized form. Consider the following:

Our API Endpoint: GET https://someservice.com/event/10105124

Currently produces something like:

{
    "eventID": "10105124",
    "summary": "Lady Gaga Concert",
    "location": "New Orleans Arena, New Orleans, Louisiana, USA",
    "start" : "2015-04-09 12:00"
}

If I wanted to migrate this event data into JSON-LD, I would have to restructure my JSON so that it looks like the following:

{
  "@context": {
    "ical": "http://www.w3.org/2002/12/cal/ical#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ical:dtstart": {
      "@type": "xsd:dateTime"
    }
  },
  "ical:summary": "Lady Gaga Concert",
  "ical:location": "New Orleans Arena, New Orleans, Louisiana, USA",
  "ical:dtstart": "2011-04-09T20:00Z"
}

//This sample was taken from: http://json-ld.org/playground/index.html

Now, I don’t know about you, but this syntax gives me a headache. Not only did we abstract a lot of the information into the @context inner object, but we also changed the naming and the way the keys behave. This is NOT, by any stretch of the imagination, what I would call easy integration. Even with the use of something like Fractal, we would end up breaking the API clients our users have built. Our only option here is to issue a new version of the API with these changes in place. And even if renaming the keys were not a necessity (which it actually isn’t, according to the JSON-LD spec), all our API clients AND API documentation would need to be rewritten to describe the changes in the schema.
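To make that concrete, here is roughly what such a transformation layer could look like with league/fractal. This is just a hypothetical sketch; the event array shape is taken from the sample above:

<?php

use League\Fractal\TransformerAbstract;

//Hypothetical transformer that renames our existing keys to the
//JSON-LD/iCal terms. Every client that reads "summary" today breaks
//the moment this ships.
class EventTransformer extends TransformerAbstract
{
    public function transform(array $event)
    {
        return [
            'ical:summary'  => $event['summary'],
            'ical:location' => $event['location'],
            'ical:dtstart'  => $event['start'],
        ];
    }
}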

Edit – Just to clarify: your JSON payload key-value pairs do not need to change. Another implementation method would be to keep everything related to the schema inside the @context object. It is essentially the same thing; the only difference is where the verbosity is expressed – in the actual keys or in the @context object.
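To illustrate: with JSON-LD term aliasing, the payload below keeps our original key names and pushes all the mapping into @context (I have adjusted the datetime value to the xsd:dateTime format it would need):

{
  "@context": {
    "ical": "http://www.w3.org/2002/12/cal/ical#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "summary": "ical:summary",
    "location": "ical:location",
    "start": {
      "@id": "ical:dtstart",
      "@type": "xsd:dateTime"
    }
  },
  "summary": "Lady Gaga Concert",
  "location": "New Orleans Arena, New Orleans, Louisiana, USA",
  "start": "2015-04-09T12:00:00Z"
}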

2. Payload:
The verbose nature of JSON-LD makes it extremely unattractive for high-payload APIs (I would argue). Just take a look at some of the simple examples here and see how an object containing a few key-value pairs ends up carrying more information about the object than the actual content it serves. Now, I am not saying that this is wrong. My concern is that as your service gets bigger and your endpoints start delivering more complex nested data, this could become a hassle. And yes, I know that nested data is not the best approach. But sometimes it is far better than having to deal with multiple requests. The standard’s syntax should not become a big hurdle that prevents the developer from doing something he knows is in the best interest of his application.

3. Benefits:
What exactly do I get for going through the hassle of implementing JSON-LD? Honestly, I cannot think of a single short-term benefit. The benefits of linked data that I can see emerge only when the entirety of the network uses the same standard. At that point, it becomes attractive for me to use the same standard in order to take advantage of other providers’ data-sets with a common structure/format. However, this becomes a catch-22. Everyone needs to be connected in order for us to benefit, yet no one will connect until everyone else is connected. The only short-term winners in this scenario are search engines. They get to categorize and map your data in relation to the network’s (provided you allow them to crawl it).

I might be wrong about all of this though. My main concern is that it is NOT feasible for most of us and our employers to go through the hassle of implementing this. So, with that in mind, I would like to suggest that we tackle this problem a bit differently.

First off, I would start by creating something similar to Packagist.org. But rather than listing packages, we would list namespaces of pre-defined JSON components. This would be similar, in a sense, to the way JSON-LD currently does things. However, we need to add a twist to it. For the sake of illustration, let’s call our JSON component manager JSONPAC.org.
JSONPAC would start with a bunch of predefined simple components that look like the following:

std/user -->
    id : STRING | REQ
    email : STRING | REQ
    password : STRING | OPT
    reg_date : DATETIME | OPT
    ...

std/event -->
    id: STRING | REQ
    title : STRING | REQ
    location : STRING | REQ 
    start : DATETIME | REQ
    sponsor : []std/user | OPT

...

The std namespace would have a ton of these starting objects that reflect the basic set of objects any API might return to its users. In order to understand how this differs from the JSON-LD approach, let’s look at the following example:

Bob owns an API with the endpoint: GET https://bobservice.com/event/123123. Bob’s response, after implementing this standard, could then look like this:

{
    "@namespace" : "std/event",
    "id": "123123",
    "title": "Lady Gaga Concert",
    "location": "New Orleans Arena, New Orleans, Louisiana, USA",
    "start" : "2015-04-09 12:00"
}

Meanwhile, Alice owns a different API with the following endpoint: GET https://aliceservice.com/event/111222. Alice’s response could then look like this:

{
    "@namespace" : "std/event",
    "id": "111222",
    "title": "John Mayer in RIO",
    "location": "Somewhere in Canada",
    "start" : "2015-04-19 12:00",
    "sponsor" : [
        {
            "id" : 13584,
            "email" : "somedude@asd.com",
            "reg_date": "2011-01-01"
        },
        {
            "id" : 471548,
            "email" : "anotheruser@asd.com"
        }
    ]
}

Notice how Bob’s API has no sponsor field in his response because the sponsor field in JSONPAC std/event is OPTional. Meanwhile, Alice’s API uses an array of sponsors each of which have the @namespace std/user implicitly defined in the JSONPAC schema. If Alice tries to embedd something else into the sponsor field other than what is allowed in the JSONPAC std/event --> sponsor manifest, she should get an error. This way, we enforce the that the integrity of the data goes hand in hand with the standardized schema that is set for us by JSONPAC.
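For illustration, a payload like the following should fail validation, since sponsor holds plain strings instead of std/user objects (the values here are made up):

{
    "@namespace" : "std/event",
    "id": "123123",
    "title": "Lady Gaga Concert",
    "location": "New Orleans Arena, New Orleans, Louisiana, USA",
    "start" : "2015-04-09 12:00",
    "sponsor" : ["Some Corp", "Another Corp"]
}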

Your question now is: but waaaaaaaaaait a minute, what if Alice wants to return her own type of sponsor that isn’t the same as the []std/user schema? Well, my answer to that would be: Extend and Override. You simply create your own namespace at JSONPAC, extend the std/event object, and override whatever you want. The JSONPAC library would then look like the following:

std/user -->
    id : STRING | INT | REQ
    email : STRING | REQ
    password : STRING | OPT
    reg_date : DATETIME | OPT
    ...

std/event -->
    id: STRING | INT | REQ
    title : STRING | REQ
    location : STRING | LONG-LAT | REQ 
    start : DATETIME | REQ
    sponsor : STRING | []std/user | OPT

...

alice/event --> std/event
    sponsor : []alice/sponsor

alice/sponsor -->
   id : STRING | REQ
   name : STRING | REQ

After creating her own namespace at JSONPAC, Alice can now return the following JSON result:

{
    "@namespace" : "alice/event",
    "id": "111222",
    "title": "John Mayer in RIO",
    "location": "Somewhere in Canada",
    "start" : "2015-04-19 12:00",
    "sponsor" : [
        {
            "id" : "userID1",
            "name" : "Dan Gilbert"
        },
        {
            "id" : "userID2",
            "name" : "Dan Gilbert The Second"
        }
    ]
}

As you can see, we extend and override namespaces. This way, JSONPAC can handle all linked-data relationships in one place. However, there is ONE problem with this approach. What if the developer specifies the wrong @namespace, or forgets to include it for whatever reason? What happens then? Well, my suggestion would be to write a tiny little test executable that handles this. Once you are done building your API response, you run the tool from your terminal using:

JSONPAC-TOOL validate https://aliceservice.com/event/111222

The tool fetches the JSON result from the URL and validates it against the namespace definition provided by JSONPAC. If all goes well, the tool displays a success message; otherwise it points the developer to the exact row/place where the error occurs. The tool should even recursively go through the sponsor[] array and check that each object matches alice/sponsor (the provided namespace) in its content. What this adds is another layer of testing, namely output testing. If the validation test passes, you have guaranteed your data structure’s integrity and provided a way for all your API users to look up your objects using JSONPAC. All you need to write in your documentation is: Endpoint X returns alice/event. Notice how this entire testing process needed only one line: @namespace : "alice/event".
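To give an idea of what that validation core might boil down to, here is a rough PHP sketch. JSONPAC does not exist yet, so every name below is hypothetical, and the schema format (field => array(type, required)) is just one possible encoding of the manifests above:

<?php

//Hypothetical core of JSONPAC-TOOL validate. Nested namespaces such as
//[]alice/sponsor are represented as nested schema arrays.
function validateAgainstNamespace(array $data, array $schema, $path = '$')
{
    $errors = [];
    foreach ($schema as $field => $def) {
        list($type, $required) = $def;
        if (!array_key_exists($field, $data)) {
            if ($required) {
                $errors[] = "{$path}.{$field} is required but missing";
            }
            continue;
        }
        $value = $data[$field];
        if (is_array($type)) {
            //A []namespace field: recursively validate every element
            foreach ($value as $i => $item) {
                $errors = array_merge($errors, validateAgainstNamespace($item, $type, "{$path}.{$field}[{$i}]"));
            }
        } elseif ($type === 'DATETIME' && strtotime($value) === false) {
            $errors[] = "{$path}.{$field} is not a valid DATETIME";
        } elseif ($type === 'STRING' && !is_string($value)) {
            $errors[] = "{$path}.{$field} should be a STRING";
        }
    }
    return $errors;
}

The CLI wrapper would then be little more than a json_decode() of the HTTP response followed by a call to validateAgainstNamespace().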

The front-end developer implementing your API can also download the JSONPAC-TOOL and verify your data’s integrity. This way, any developer would know for sure whether an API is reliable or not. In addition, the JSONPAC-TOOL could implement some cool and much-needed functionality such as:

JSONPAC-TOOL extract https://aliceservice.com/event/111222 alice/sponsor
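As a rough sketch, the extraction core might look something like this (same hypothetical schema format as the validator above):

<?php

//Hypothetical core of JSONPAC-TOOL extract: walk the decoded JSON tree
//and collect every object whose shape satisfies the requested namespace.
function extractObjects(array $node, array $schema, array &$found = array())
{
    //Field names the namespace marks as REQ
    $required = array_keys(array_filter($schema, function ($def) {
        return $def[1];
    }));
    if (count(array_diff($required, array_keys($node))) === 0) {
        $found[] = $node;
    }
    foreach ($node as $value) {
        if (is_array($value)) {
            extractObjects($value, $schema, $found);
        }
    }
    return $found;
}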

The command would, in other words, loop through the entire result set from the endpoint, extract all alice/sponsor objects, and hand them to the user of the tool. I can go on and on about what features could be implemented, but the main point is this:

This approach gives us easier integration, centralized management, data-structure integrity testing, output reliability, built-in data-extraction capability, extensibility, and much more. On top of all that, the data structures are all linked in one central place with a very non-verbose syntax.


CRON Error Notifications in Real Time

The ScriptCheck package is available on GitHub.

If you have ever worked with CRON jobs that involve fetching and parsing data from 3rd-party APIs/feeds, you will understand me more than anyone else when I say: handling CRON-related Errors and Exceptions sucks. It is inevitable that your script will fail at some point in the future for one of many reasons. In my particular case, one of the data providers I use tends to fail on me. Once my CRON script tries to parse what it perceives to be the retrieved data, an Exception (or an Error) is thrown.

Out of the box, the LAMP stack offers us two different ways to deal with these unexpected errors:

  1. File Logging: Have PHP log the error into a file that you later on have to check/review manually.
  2. CRON Email: LAMP can also utilize the CRON Email functionality to send you or the server administrator an email regarding the Error incident and what caused it (see the snippet below).
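For reference, option 2 is simply cron’s built-in behaviour: anything your script writes to stdout/stderr gets mailed to the MAILTO address. The address and script path below are placeholders:

MAILTO="admin@example.com"
#Run the import every 10 minutes; any output triggers an email
*/10 * * * * /usr/bin/php /var/www/cron/import_feed.php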

Unfortunately, these two solutions are rather limiting. What if you want an external API to be notified upon script failure? Or perhaps have a record inserted into a SQL DB table with all the possible Error info? These options are not available out of the box, and for that reason I wrote a little library called ScriptCheck.

ScriptCheck is a simple package that allows you to specify a notification method by registering a handler prior to the script’s execution. At the start of the application, ScriptCheck registers itself as the default PHP error handler via set_error_handler and set_exception_handler. Afterwards, the script continues executing as usual. If an error occurs, the error handler catches it and notifies all registered handlers/observers.

Sample code:

//Use composer autoloader or your own!
require 'vendor/autoload.php';

use Lorenum\ScriptCheck\ScriptCheck;
use Lorenum\ScriptCheck\Handlers\FileLoggerHandler;

//Instantiate ScriptCheck and add a FileLoggerHandler to be notified
$sc = new ScriptCheck();
$sc->addHandler(new FileLoggerHandler("test.log"));
$sc->register();

//Your application logic goes below here!
throw new Exception("Test Exception");

As of right now, ScriptCheck allows for 4 different types of handlers to be registered:

  1. FileLoggerHandler: Log the error into a file, similar to the standard LAMP stack error logger.
  2. SQLDBHandler: Insert the error as a row into a SQL DB.
  3. EmailHandler: Emails the registered person all error details.
  4. APICallHandler: Calls an external API via POST or GET.

You can actually combine as many of these as you want. You can also extend any of them, or write your own by implementing the HandlerInterface interface.
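For example, here is a rough sketch of a custom handler that pings a Slack webhook. Note that the exact interface contract (namespace, method name, what gets passed in) is my assumption here; check the actual HandlerInterface in the package before copying this:

<?php

use Lorenum\ScriptCheck\Handlers\HandlerInterface;

//Hypothetical handler: the notify() signature below is assumed, verify
//it against the real HandlerInterface on GitHub.
class SlackHandler implements HandlerInterface
{
    private $webhookUrl;

    public function __construct($webhookUrl)
    {
        $this->webhookUrl = $webhookUrl;
    }

    public function notify($error)
    {
        $ch = curl_init($this->webhookUrl);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode(array(
            'text' => 'Script failed: ' . $error->getMessage()
        )));
        curl_exec($ch);
        curl_close($ch);
    }
}

Registering it works the same as with the built-in handlers: $sc->addHandler(new SlackHandler('https://hooks.slack.com/services/...'));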

More information about this package can be found in the Readme file on GitHub.

Cheers


Hello World!


<?php

echo "Hello World";

?>

As the first post on this blog, I cannot find anything more fitting to write than “Hello World!”.

This blog marks the start of a new chapter in my career as a programmer. I intend for this blog to become a creative outlet where I get to display some of my work, discuss new ideas/concepts, and hopefully create an entertaining and educational environment for beginner programmers to enjoy!


This is going to be fun :)


Cheers!