Smart Filter Overview

Abstract

This specification details the serialization of a request-response structure using the JSON format. The structure may encapsulate a user's request for content or for metadata about other users or objects, or a collection of results for such a request, and may include references to user and content metadata distributed across the Web.

Introduction

Smart Filter technology is a [JSON]-based request and response data specification which incorporates features of [JSON-Schema], [JSON-LD], [SPARQL], and [Freebase/MQL], in a portable, concise format which can neatly encapsulate a variety of user and media profile metadata structures as well as decentralized requests for and subsets of those data structures. Areas of application include:

  • JSON Activity Streams [JSON-AS]
  • Data search and retrieval
  • Content filtering
  • Content and user recommendations
  • Content tagging and rating
  • Dynamic, tag-based media collections
  • Structured message passing

Developed by Kendra Initiative under the auspices of the EU FP7 project P2P-Next, the initial implementation of Smart Filters within the Kendra Signpost project provides a dynamic interface for centralised search and recommendations, backed by an inference engine that traverses the semantic graph of entity relationships and mappings. The project was developed as a Drupal install profile consisting of a suite of Drupal modules, which are available from Kendra Initiative's GitHub repository and can be tested via the Smart Filter Query interface.

Features:

  • faceted search and display
  • wildcard searches
  • date/time and numeric operators including numeric ranges
  • grouped/nested queries

Notational Conventions

The text of this specification provides the sole definition of conformance. Examples in this specification are non-normative.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

This specification allows the use of IRIs [RFC3987]. Every URI [RFC3986] is also an IRI, so a URI may be used wherever below an IRI is named. There are two special considerations: (1) when an IRI that is not also a URI is given for dereferencing, it MUST be mapped to a URI using the steps in Section 3.1 of [RFC3987] and (2) when an IRI is serving as an id value, it MUST NOT be so mapped.
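As a rough illustration of consideration (1), the sketch below percent-encodes every non-ASCII character of an IRI as UTF-8 octets, which is the core of the mapping in Section 3.1 of [RFC3987]. It assumes the IRI is already held as a Unicode string and skips the optional IDNA handling of host names. The function name iriToUri is hypothetical and is not defined by this specification.

```javascript
// Sketch only: maps an IRI (already a Unicode string) to a URI by replacing
// each non-ASCII character with the percent-encoded octets of its UTF-8 form.
// ASCII characters, including existing percent-escapes, are left untouched.
function iriToUri(iri) {
  const encoder = new TextEncoder();
  let uri = "";
  for (const ch of iri) {               // iterates by code point
    if (ch.codePointAt(0) < 0x80) {
      uri += ch;
    } else {
      for (const byte of encoder.encode(ch)) {
        uri += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
      }
    }
  }
  return uri;
}
```

Under consideration (2), an IRI serving as an id value would be stored and compared as-is, without passing through a function like this one.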

Definitions

  • explicit contacts = profiles of users in a user's SARACEN contact list or in a user's linked contact lists
  • explicit peers = profiles of users in a user's selected group or social circle
  • implicit peers = "like-minded users": profiles of users whose interests are similar to the user's own, based on selected items of metadata

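One possible reading of "similar interests" can be sketched as a set-overlap measure. The example below is an assumption for illustration only: it treats interests as plain tag strings and uses Jaccard similarity with a caller-chosen threshold. Neither the measure, the threshold, nor the names jaccard and implicitPeers come from this specification.

```javascript
// Hypothetical sketch: Jaccard similarity = |shared tags| / |all distinct tags|.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const tag of setA) {
    if (setB.has(tag)) shared += 1;
  }
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Returns the ids of candidate profiles whose interests overlap with the
// user's interests at or above the given threshold (assumed profile shape:
// { id: string, interests: string[] }).
function implicitPeers(user, candidates, threshold) {
  return candidates
    .filter((c) => c.id !== user.id)
    .filter((c) => jaccard(user.interests, c.interests) >= threshold)
    .map((c) => c.id);
}
```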
JSON Encoding

JSON Serialization

Smart Filters are serialized using the JSON format, as defined in [RFC4627]. Alternative serializations MAY be used but are outside the scope of this specification.

In the JSON serialization, absent properties MAY be represented either by an explicit declaration of the property whose value is null or by omitting the property declaration altogether, at the option of the publisher; these two representations are semantically equivalent. If a property has a value whose type is a JSON array, the absence of any items in that array MUST be represented by omitting the property entirely or publishing it with the value null, and MUST NOT be represented as an empty array, except as otherwise stated in the definition of a specific property.
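A publisher could enforce the rule above with a small pre-serialization step. The following is a minimal sketch, assuming the publisher chooses to omit absent properties rather than emit null; the helper name pruneAbsent is hypothetical, and the sketch does not model the per-property exceptions that a property definition may state.

```javascript
// Sketch only: returns a copy of the value in which object properties that
// are null, undefined, or an empty array have been omitted. Items inside
// non-empty arrays are processed recursively but never dropped.
function pruneAbsent(value) {
  if (Array.isArray(value)) {
    return value.map(pruneAbsent);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, v] of Object.entries(value)) {
      if (v === null || v === undefined) continue;      // absent property
      if (Array.isArray(v) && v.length === 0) continue; // empty array is not allowed
      out[key] = pruneAbsent(v);
    }
    return out;
  }
  return value;
}
```

Calling JSON.stringify on the pruned copy then yields a serialization with no empty arrays at property level.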

Unless otherwise specified, all properties specifying date and time values within the JSON serialization, including extensions, MUST conform to the "date-time" production in [RFC3339]. In addition, an uppercase "T" character MUST be used to separate date and time, and an uppercase "Z" character MUST be present in the absence of a numeric time zone offset.
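The date-time requirement can be checked mechanically. The sketch below is an assumption-laden illustration: the regular expression tests only the overall shape of the [RFC3339] "date-time" production plus the uppercase "T" and "Z" rules, not calendar validity (for example, it would accept month 13). The names are hypothetical.

```javascript
// Shape check: date, uppercase "T", time, optional fractional seconds, then
// either an uppercase "Z" or a numeric offset such as +01:00.
const DATE_TIME_SHAPE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function isSmartFilterDateTime(text) {
  return typeof text === "string" && DATE_TIME_SHAPE.test(text);
}

// Date.prototype.toISOString always emits UTC with "T" and a trailing "Z",
// so its output satisfies the shape above.
function toSmartFilterDateTime(date) {
  return date.toISOString();
}
```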

Advantages of JSON over RDF/XML

  • Compact
  • Concise
  • Easy to read
  • Cross-domain in-browser usage via standard AJAX methods (XMLHttpRequest, JSONP)
  • Parsers exist for most modern languages

Disadvantages of JSON compared with RDF/XML

  • Requires transformations to and/or from RDF/XML for some implementations (e.g. SPARQL endpoints with no JSON support)
  • TBD

Existing JSON specifications for metadata encoding

JSON-LD and [RDF/JSON]

The two prevalent specifications for encoding RDF-style metadata within a more compact, easily parsed JSON syntax are JSON-LD and RDF/JSON. For a comparison of JSON-LD and RDF/JSON for formatting data structures, see Comparative Serialisation of RDF in JSON and Follow-up to serialising RDF in JSON.

Advantages of JSON-LD over RDF/JSON

  • Clear treatment of namespaces
  • Compact
  • Concise
  • Easy to read
  • Clear separation of object values from predicate values
  • Support for datatype and language metadata
  • Support for CURIEs [CURIE]

Disadvantages of JSON-LD compared with RDF/JSON

  • Metadata cannot be handled as simple key-value sets without additional transformation

Examples

Example user profile data packet including normalization of the "age" attribute in JavaScript:

    var myObj = {
      "@context": {
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "name": "http://xmlns.com/foaf/0.1/name",
        "age": "http://xmlns.com/foaf/0.1/age",
        "homepage": "http://xmlns.com/foaf/0.1/homepage",
        "@coerce": {
          "xsd:nonNegativeInteger": "age",
          "xsd:anyURI": "homepage"
        }
      },
      "name": "Joe Jackson",
      "age": "42",
      "homepage": "http://example.org/people/joe"
    };

    // Map the language-native object to JSON-LD
    var jsonldText = jsonld.normalize(myObj);
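To make the "additional transformation" mentioned among the disadvantages more concrete, here is a simplified sketch that rewrites the short property names of such an object to the IRIs given in its "@context". It is an illustration only: the helper expandKeys is hypothetical, it ignores "@coerce" and prefixed names, and it is not the algorithm performed by jsonld.normalize.

```javascript
// Sketch only: replaces each top-level key with the IRI that "@context" maps
// it to. Keys without a string mapping in the context are kept unchanged.
function expandKeys(doc) {
  const context = doc["@context"] || {};
  const expanded = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context") continue;   // the context itself is not data
    const iri = typeof context[key] === "string" ? context[key] : key;
    expanded[iri] = value;
  }
  return expanded;
}
```

After this step the object can be treated as a flat set of IRI-keyed values, at the cost of the compact names.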

Using jsonGRDDL, JSONT, and JSON-Schema to transform JSON into RDF/JSON

  • TBD: update schema from RDF/JSON to JSON-LD
  • TBD: provide description

Examples

  • TBD: update examples
Here is a simple JSON instance representing a person. It links directly to a JsonT transformation:

    {
      "$transformation": "http://buzzword.org.uk/2008/jsonGRDDL/jsont-sample#Person",
      "name": "Joe Bloggs",
      "mbox": "joe@example.net"
    }

The JsonT transformation maps the object to RDF/JSON using FOAF:

    var Person = {
      "self": function (x) {
        var rv = {
          "_:Contact": {
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [{
              "type": "uri",
              "value": "http://xmlns.com/foaf/0.1/Person"
            }],
            "http://xmlns.com/foaf/0.1/name": [{
              "type": "literal",
              "value": x.name
            }],
            "http://xmlns.com/foaf/0.1/mbox": [{
              "type": "uri",
              "value": "mailto:" + x.mbox
            }]
          }
        };
        return JSON.stringify(rv, 0, 2);
      }
    };

The output would be the following RDF/JSON:

    {
      "_:Contact": {
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [{
          "type": "uri",
          "value": "http://xmlns.com/foaf/0.1/Person"
        }],
        "http://xmlns.com/foaf/0.1/name": [{
          "type": "literal",
          "value": "Joe Bloggs"
        }],
        "http://xmlns.com/foaf/0.1/mbox": [{
          "type": "uri",
          "value": "mailto:joe@example.net"
        }]
      }
    }

Existing JSON specifications for request and response formatting

MQL

The Metaweb Query Language (MQL) [Freebase/MQL], which is used to query and navigate over 12 million topics on Freebase.com (acquired by Google in July 2010 [FREEBASE]), leverages some of the subtleties of the JSON specification to provide a clear and concise syntax for querying large datasets across domain boundaries.
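The central convention is that a query has the same shape as its answer: a literal value acts as a constraint, a null value asks for the property to be filled in, and a one-element array holds a sub-query for nested records. The sketch below imitates that convention against an in-memory array. It is a hypothetical toy matcher, not Freebase's implementation, and the names matchQuery and runQuery are invented for illustration.

```javascript
// Sketch only: matches one query object against one record.
// Returns the filled-in result, or null when a constraint fails.
function matchQuery(query, record) {
  const result = {};
  for (const [key, wanted] of Object.entries(query)) {
    const actual = record[key];
    if (wanted === null) {
      // null means "tell me this value"
      result[key] = actual === undefined ? null : actual;
    } else if (Array.isArray(wanted)) {
      // a one-element array is a sub-query over the nested records
      const matches = (actual || [])
        .map((child) => matchQuery(wanted[0], child))
        .filter((m) => m !== null);
      if (matches.length === 0) return null;
      result[key] = matches;
    } else if (actual === wanted) {
      result[key] = actual;              // literal constraint satisfied
    } else {
      return null;                       // literal constraint failed
    }
  }
  return result;
}

// A top-level query is an array holding a single query object.
function runQuery(query, records) {
  return records
    .map((record) => matchQuery(query[0], record))
    .filter((m) => m !== null);
}
```

Given a list of artist records, a query that constrains only the track name and leaves every other property null would return the matching artists with their albums and track lengths filled in.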

Example request:

    [{
      "type": "/music/artist",
      "name": null,
      "album": [{
        "name": null,
        "track": [{
          "name": "Too Much Information",
          "length": null
        }]
      }]
    }]

Example response:

    [{
      "type": "/music/artist",
      "name": "The Police",
      "album": [{
        "name": "Ghost in the Machine",
        "track": [{ "name": "Too Much Information", "length": 222.733 }]
      }, {
        "name": "Message in a Box (disc 3)",
        "track": [{ "name": "Too Much Information", "length": 222.733 }]
      }]
    }, {
      "type": "/music/artist",
      "name": "Duran Duran",
      "album": [{
        "name": "Duran Duran",
        "track": [{ "name": "Too Much Information", "length": 296.573 }]
      }]
    }, {
      "type": "/music/artist",
      "name": "Quiet Riot",
      "album": [{
        "name": "Alive and Well",
        "track": [{ "name": "Too Much Information", "length": 268 }]
      }]
    }]

Smart Filter Query format

Data are encapsulated in a JSON structure.

Queries are serialized as a set of rules using JSON-LD syntax.

  • [SPARQL] is the W3C recommendation for querying large RDF data stores.

Responses and query result sets are encapsulated using JSON-LD.

Examples

Following is a simple, minimal example of a JSON serialized Smart Filter representing a search executed explicitly by the user against a content database:

* *TBD*

The search server could return a response such as the following, which is also a JSON serialized Smart Filter:

* *TBD*

Methodology
  1. Refer to users unambiguously, using strong identifiers, e.g. by OpenID, WebFinger, SG nodemapper ID, or email address
    • Discuss and document the advantages of each
    • Document use of OpenID or other standard for referring to SARACEN users internally
  2. Building filters
    • Combination of the explicit and implicit rules
    • Including individual items (e.g. users or content) in a set
      • e.g. Smart Filter = set of all users in a user's "work" group plus Alice and Bob minus Foo and Bar that have been blocked
    • Combining Smart Filter rule sets with explicit groups (e.g. of users)
  3. Moving a user's contacts between social circles
    • Moving users from implicit to explicit (i.e. follow or request friendship)
    • Moving users from explicit to implicit (i.e. recommend circle to my contact)
    • Blocking and unblocking users from a set
  4. Creating implicit filters at time of activity
    • e.g. activity stream based on implicit filters which are extracted from the media metadata
  5. Filter groups - group types of filters together
  6. Selecting filters at the time of publishing (ref: Google+, Facebook)
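The filter-building example in step 2 (a group's members, plus individually included users, minus blocked users) amounts to plain set arithmetic. The sketch below shows that combination under the assumption that users are referred to by opaque identifier strings; the function name buildFilter and the sample identifiers are hypothetical.

```javascript
// Sketch only: result = (group members ∪ explicitly included) − blocked.
function buildFilter(groupMembers, included, blocked) {
  const result = new Set([...groupMembers, ...included]);
  for (const user of blocked) {
    result.delete(user);               // blocking wins over inclusion
  }
  return result;
}

// Example: the "work" group plus Alice and Bob, minus Foo and Bar.
const workGroup = ["carol", "dave", "foo"];
const filterMembers = buildFilter(workGroup, ["alice", "bob"], ["foo", "bar"]);
```

Letting the blocked list take precedence is a design choice made here for illustration; a real implementation would need to decide how explicit inclusions and blocks interact.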

Content recommendations based on a user's interests

  • Propose other media for the user to follow
  • Propose changes to the user's security settings based on

Discussion

  • "with whom am I sharing my current activity?"
  • "to whom is this shown, besides my friends?"
  • "How do I know the recommendations are coming from my content-based filtering

Security Considerations

Publishers or Consumers implementing Smart Filters as a stream of public data may also want to consider the potential for unsolicited commercial or malicious content and should take preventative measures to recognize such content and either identify it or not include it in their stream implementations.

Publishers should take reasonable measures to make sure potentially malicious user input such as cross-site scripting attacks are not included in the Smart Filters data they publish.

Consumers that re-emit ingested content to end-users MUST take reasonable measures to make sure potentially malicious ingested input is not re-emitted.

Consumers that re-emit ingested content for crawling by search engines should take reasonable measures to limit any use of their site as a Search Engine Optimization loophole. This may include converting un-trusted hyperlinks to text or including a rel="nofollow" attribute.
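The measures described in the preceding paragraphs might look like the following sketch, which escapes ingested text before it is placed in HTML and renders untrusted hyperlinks either with rel="nofollow" or as plain text. The function names and the decision to allow only http and https links are assumptions made for illustration, not requirements of this specification.

```javascript
// Escapes the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Renders an untrusted link. Absolute http(s) URLs become anchors carrying
// rel="nofollow"; anything else (for example javascript: URLs or relative
// references) is reduced to its escaped label text.
function renderUntrustedLink(href, label) {
  let protocol = "";
  try {
    protocol = new URL(href).protocol;
  } catch (err) {
    protocol = "";                     // not a parseable absolute URL
  }
  if (protocol !== "http:" && protocol !== "https:") {
    return escapeHtml(label);
  }
  return '<a href="' + escapeHtml(href) + '" rel="nofollow">' + escapeHtml(label) + "</a>";
}
```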

Consumers should be aware of the potential for spoofing attacks where the attacker publishes activities or objects with falsified property values with the intent of injecting malicious content, hiding or corrupting legitimate content, or misleading users.

Smart Filters are JSON Documents and are subject to the same security considerations described in [RFC4627].

Smart Filters implementations handle URIs. See Section 7 of [RFC3986].

Smart Filters implementations handle IRIs. See Section 8 of [RFC3987].

License

As of [date], the following persons or entities have made this Specification available under the Open Web Foundation Agreement Version 1.0, which is available at http://www.openwebfoundation.org/legal/.

[List of persons or entities]

You can review the signed copies of the Open Web Foundation Agreement Version 1.0 for this Specification at http://activitystrea.ms/licensing/, which may also include additional parties to those listed above.

Your use of this Specification may be subject to other third party rights. THIS SPECIFICATION IS PROVIDED "AS IS." The contributors expressly disclaim any warranties (express, implied, or otherwise), including implied warranties of merchantability, non-infringement, fitness for a particular purpose, or title, related to the Specification. The entire risk as to implementing or otherwise using the Specification is assumed by the Specification implementer and user. IN NO EVENT WILL ANY PARTY BE LIABLE TO ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND WITH RESPECT TO THIS SPECIFICATION OR ITS GOVERNING AGREEMENT, WHETHER BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR NOT THE OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Normative References
  • [RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” March 1997.
  • [RFC3339] Klyne, G., “Date and Time on the Internet: Timestamps,” July 2002.
  • [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI),” January 2005.
  • [RFC3987] Duerst, M. and M. Suignard, “Internationalized Resource Identifiers (IRIs),” January 2005.
  • [RFC4627] Crockford, D., “The application/json Media Type for JavaScript Object Notation (JSON),” July 2006.
  • [SPARQL] SPARQL Query Language for RDF, Available from http://www.w3.org/TR/rdf-sparql-query/
Non-Normative References
Define filters for searches, display and access control which are portable, shareable and interoperable.
