External source

From Pelennor

An external source is a data provider that attempts to push content into the local repository. There are two types: anonymous and trusted.

Anonymous

Providers of which the user has no prior knowledge, and with which the user has had no prior involvement, are referred to as anonymous sources. These may push content to the local repository provided it meets defined relevance criteria with respect to the data the user currently owns.

Weaknesses

Content seeding attack

Because relevance is judged against the data the user already holds, content pushed by an anonymous source could be used to seed the relevance filter: once accepted, it can make later, unwanted content appear relevant.

Excessive network traffic

Considering that over 90% of global SMTP traffic is now spam-related, it is easy to imagine how a system that accepts anonymous push content would be widely abused. Because the cost of transmission is negligible, filters only cause spammers to try harder, generating even more traffic. This is a strong reason not to allow any form of anonymous push content to a stationary data store. Nevertheless, mobile clients may wish to receive localized push content from short-range wireless stations or other nearby mobile clients.

Convenient attack vector

Allowing for reception of data from unknown sources always increases the risk of attack via exploitable software flaws. Trusted sources may still be compromised, but denying anonymous sources at least buys some time to apply patches and provides a level of accountability. We may also use a simple frontend to authenticate data before it is passed to the backend components, which are more complex and difficult to verify.
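A minimal sketch of such a frontend, assuming each trusted source shares a secret key with the user and tags every payload with an HMAC. The page does not specify an authentication scheme, so HMAC-SHA256, the TRUSTED_KEYS table, and the function names are illustrative assumptions only.

```python
import hashlib
import hmac

# Hypothetical table of shared secrets, keyed by trusted source name.
TRUSTED_KEYS = {"alice": b"alice-shared-secret"}


def sign(key: bytes, payload: bytes) -> bytes:
    """Tag a payload the way a trusted source would before pushing it."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def frontend_accept(source: str, payload: bytes, tag: bytes) -> bool:
    """Return True only if the payload carries a valid tag from a known source."""
    key = TRUSTED_KEYS.get(source)
    if key is None:
        return False  # anonymous or unknown source: never reaches the backend
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, tag)
```

The frontend stays small enough to audit: it parses nothing and only decides whether the bytes may be handed to the more complex backend.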

Possible solutions

Limit rate and type of anonymous data

For example, we may allow a single untrusted source to push one packet of up to 1 kilobyte of text per minute through a heavily filtered channel.
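The limit in this example could be enforced roughly as follows. This is a sketch under assumptions: "text" is taken to mean valid UTF-8, sources are identified by an opaque string, and the AnonymousLimiter class is a hypothetical name. Content filtering itself is not shown.

```python
import time

MAX_BYTES = 1024      # "up to 1 kilobyte"
MIN_INTERVAL = 60.0   # "per minute", in seconds


class AnonymousLimiter:
    """Accept at most one small text packet per source per interval."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable so the limiter can be tested
        self._last_accepted = {}   # source id -> time of last accepted packet

    def allow(self, source_id: str, packet: bytes) -> bool:
        if len(packet) > MAX_BYTES:
            return False
        try:
            packet.decode("utf-8")  # assumption: "text" means valid UTF-8
        except UnicodeDecodeError:
            return False
        now = self._clock()
        last = self._last_accepted.get(source_id)
        if last is not None and now - last < MIN_INTERVAL:
            return False
        # Only accepted packets reset the timer for this source.
        self._last_accepted[source_id] = now
        return True
```

Note that a per-source limit only helps if source identities are costly to mint; otherwise it would need to be paired with one of the other techniques in this section.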

Cryptographic cookies

Spammers rely upon the ability to transmit high volumes of data with minimal computational cost. Cryptographic cookies are computational challenges which must be solved before the source is allowed to transmit (for example, cracking a cipher with a reasonably short key). This would most likely be combined with other techniques.
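One way such a challenge might look is sketched below. Instead of the short-key cipher mentioned above, it substitutes a hash puzzle in the style of Hashcash: the source must find a nonce whose SHA-256 digest, taken together with the receiver's challenge, starts with a given number of zero bits. The function names and the 8-byte nonce encoding are assumptions made for illustration.

```python
import hashlib
import os


def make_challenge() -> bytes:
    """Receiver side: issue a fresh random challenge."""
    return os.urandom(16)


def _leading_zero_bits(digest: bytes) -> int:
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Receiver side: a single hash is enough to check a claimed solution."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return _leading_zero_bits(digest) >= bits


def solve(challenge: bytes, bits: int) -> int:
    """Sender side: brute-force search, about 2**bits hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, bits):
        nonce += 1
    return nonce
```

The asymmetry is the point: solving costs the sender many hash evaluations, while checking costs the receiver one. The difficulty (bits) would need tuning so that legitimate senders barely notice while bulk senders do.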

Human verification

CAPTCHA and other techniques can be used to allow only humans to send anonymous content. This is good for contact with friends and colleagues before a formal trust link is established.

Trusted

Providers may be trusted by the user (utilizing cryptographic verification) to supply meaningful content. Once a source is trusted, the user may subscribe to any number of the channels it provides, confident that the originator will not create bogus channels.

Coarse trust model

Trust is assigned at the source level and not to individual channels, because a whole source must be trusted not to perform a “bait and switch” at the channel level. That is, if a channel is trusted but its source is not fully trusted, that channel could be changed later to push unwanted content. A source should only be trusted if the user is confident it will not corrupt the channels currently providing desirable content. (This may not be valid if we eschew the idea of channels and use sources as a flat concept.)

Levels of trust

Levels of trust may be used to specify finer-grained security.

Web of trust

As a robust web (graph) of trust is developed, users may also decide to allow content from entities within a specified distance. This may be based on degrees of separation (the number of hops between nodes) or on the total edge distance along the shortest trust path to the node. For example, I may elect to moderately trust all of the close friends of my close friends.
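Both distance measures can be sketched over a small directed graph. The graph layout, the names in TRUST, and the convention that a lower edge cost means closer trust are all assumptions for illustration. Degrees of separation use a breadth-first search; total edge distance uses Dijkstra's algorithm.

```python
import heapq
from collections import deque

# Hypothetical trust graph: TRUST[a][b] is the cost of the edge from a to b
# (assumption: a lower cost means a trusts b more closely).
TRUST = {
    "me": {"ann": 1.0, "bob": 4.0},
    "ann": {"cat": 1.0},
    "bob": {"dan": 1.0},
    "cat": {"eve": 5.0},
}


def hops(graph, start, target):
    """Degrees of separation: fewest edges from start to target, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        for nxt in graph.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def edge_distance(graph, start, target):
    """Total edge cost along the cheapest trust path, or None if unreachable."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, {}).items():
            cand = dist + cost
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return None
```

In this graph "cat" and "dan" are both two hops from "me", but their edge distances are 2.0 and 5.0 respectively. A rule such as "close friends of my close friends" could therefore admit "cat" and exclude "dan" by capping edge distance rather than hop count.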

Commonality

Duplicate content

It is likely, and encouraged, that multiple data sources will offer identical content. This increases content availability, much as in BitTorrent and other peer-to-peer protocols.

Challenges

  • Keeping data consistent, or ensuring that the latest revision is in use, will be difficult across an unmanaged environment, especially if UUIDs rather than URLs are used for identification. This may be mitigated by an optional system of “location authority” which allows content to be traced back to a particular originator for validation, updates, or revision control. It is likely that some sort of tracker entity would be used to point to the latest root or catalog node of a logical data set. In the case of sensitive or critical information, cryptographic signatures may be used to ensure that a particular trusted source is the only contributor to a selection of data. For example, we may choose between an official news release and a collaboratively edited version.
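The tracker idea above might be sketched as follows. Everything here is hypothetical: the Tracker and RootPointer names, the integer revision counter, and the rule that the first registrant of a data set becomes its location authority. Originator identity is taken on faith in this sketch; a real system would check a cryptographic signature at that point.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class RootPointer:
    revision: int
    root_id: uuid.UUID  # UUID of the latest root or catalog node


class Tracker:
    """Maps a logical data set to its latest root, per location authority."""

    def __init__(self):
        self._authority = {}  # data set id -> originator name
        self._latest = {}     # data set id -> RootPointer

    def register(self, dataset_id, originator):
        # Assumption: the first registration wins; later ones are ignored.
        self._authority.setdefault(dataset_id, originator)

    def publish(self, dataset_id, originator, revision, root_id) -> bool:
        if self._authority.get(dataset_id) != originator:
            return False  # only the location authority may move the pointer
        current = self._latest.get(dataset_id)
        if current is not None and revision <= current.revision:
            return False  # stale or replayed revision
        self._latest[dataset_id] = RootPointer(revision, root_id)
        return True

    def latest(self, dataset_id):
        return self._latest.get(dataset_id)
```

Copies of the content itself can still come from any source; only the pointer to the current root is controlled by the originator.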