
Re: [tor-talk] Cryptographic social networking project

On Tue, Jan 13, 2015 at 09:48:21PM +0000, contact@sharebook.com wrote:
> It is possible to use pubsub between Alice's SC's third hop and Bob's RP
> but it doesn't improve anything for our application because your pubsub
> is a topological multicast mechanism over application layer but what
> Alice want send to her 167 friends from SC's third hop is 167 completely
> different packet. Multicast is for when she want send same packet to 167
> recipient thus pubsub at Alice's SC's third hop doesn't make bandwidth
> cost of sending Notifications cheaper. 

Well, the truth is she wants to send the same packet to 167 people and
we are discussing the trade-offs between efficiency and anonymity.
Don't include the conclusion of your argument in its premises -
that is a logical fallacy.

> My estimation was accurate but if tens of millions users supposed to
> really install our application then we need more Tor relays because
> exchanging 10 MB data with PseudonymousServer by millions of users cost
> a lot, Today Tor have thousands of relays but they are all busy to
> handle other things, there is no much space left for us. 

If people experience Tor as their Facebook replacement, that will
motivate many new donors and volunteers. The question is whether the
growth of traffic is linear or exponential.

> Sending cipher-text of "Blocks" via hidden services? I think it would
> cost much more for Tor network because hidden service have 6 hop while
> downloading cipher-text from PseudonymousServer is done through 3 hop

Hey wait, now you are saying something that doesn't fit what you said
before. Didn't we say that Alice fans out the 167 notifications from
the 3rd hop? So if we insert a multicast pubsub in there, we still only
have 4 hops per recipient on reception side (RP, intermediate, guard,
end-system), not 6 (actually 7, if you include end-system).

So the maths is, instead of having 167 interactions with a cloud system
in TOR2WEB mode (which requires 5 hops using 4 Tor relays, not 3: guard
node, intermediate, third hop, rendez-vous point) we have one notification 
being distributed in a load-balancable way across a tree of rendez-vous 
points that need to have the info anyway, because they want to forward 
it to the 167 hidden service recipients.
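To make that arithmetic a bit more tangible, here is a rough sketch in
Python. The branching factor of 8 and the function names are assumptions
of mine for illustration only; neither design fixes them. It compares how
many copies the busiest single node has to push out in a round-robin
fan-out with what a node has to do in a distribution tree.

```python
def unicast_sender_load(recipients: int) -> int:
    """Round-robin fan-out: the sending node emits one copy per recipient."""
    return recipients


def tree_depth(recipients: int, fanout: int) -> int:
    """Smallest number of forwarding levels such that fanout**depth >= recipients."""
    depth = 0
    reach = 1
    while reach < recipients:
        reach *= fanout
        depth += 1
    return depth


def tree_max_node_load(recipients: int, fanout: int) -> int:
    """In a distribution tree no single node sends more than `fanout` copies."""
    return min(recipients, fanout)


if __name__ == "__main__":
    friends = 167  # the friend count used throughout this thread
    fanout = 8     # hypothetical branching factor of the tree
    print("round-robin sender load:", unicast_sender_load(friends))
    print("tree depth:", tree_depth(friends, fanout))
    print("tree max node load:", tree_max_node_load(friends, fanout))
```

The total number of deliveries stays in the same order of magnitude; what
changes is that the load is spread over several nodes instead of being
concentrated at the third hop.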

Another advantage is that the pubsub architecture can run at low latency,
so it can be used for anonymized webcam multicasting or other streaming
without hitting the Tor network as hard as running BitTorrent on top
would. Debating in the comments of a status update would also feel more
real-time and chat-like.

Also, I am not so sure running the PseudonymousServer in TOR2WEB mode
is such a great idea. It makes it rather easy for an attacker to see
which machine, rendez-vous point or cloud system they need to take
down to stop conversations in an entire social network. I keep on not
being enthusiastic about this dependency. There could be several such
PseudonymousServers, but still they are easy to figure out and the less
people use them (decentralization), the more their social graph becomes 

Whereas in the multicast architecture it is not obvious which RP serves
which people, and should a certain RP be taken down, the recipients can
resubscribe just the affected pubsubs or simply renegotiate an RP going 
via the DHT. The attacker can't easily take down the entire network,
they can only hope to de-anonymize and attack individuals - and even
that we can address, by using a bit more of GNUnet's sybil-attack
resistant routing capabilities.

> and also friends hidden service might be offline (when friends are
> offline we send Notifications to public pool and whenever they become
> online they grab the Notification from pool and download corresponding

Oh, so we have a bit of Bitmessage-like architecture as well there.
Only that it is being done by cloud technology rather than a distributed,
diverse infrastructure. Then again, it's up to us how we implement that.
I prefer the idea of letting either the rendez-vous points or, even better,
the users' guard nodes maintain a bit of spool space for their users.

> Block from PseudonymousServer) but it is possible to optionally send the
> cipher-text of post directly to each friend who is online through hidden
> service but still user should upload the Block on PseudonymousServer in
> order to be able retrieve it in future times because user at client side
> have limited storage which means might delete things from local cache or
> accidentally lose data on its storage which requires retrieving
> cipher-text again from PseudonymousServer. I don't see any security

I think storage space on user devices is no longer a problem. Even
smartphones can easily have a 16G memory card. We have a choice of
trade-offs to make, and asking users to devote more disk space so they
can enjoy their social network also when they are outside wi-fi range
and spool social interactions until they get back in wi-fi range - I
think that is likely to be experienced more like a cool feature rather
than a burden.

Given how the Internet is evolving for the worse, I think we should make
systems that keep their data to themselves rather than store it in
clouds. If in a distant future the encryption fails us, attackers would
be able to decrypt what they see right there plus whatever they have
been keeping as a "full take" or "Tor snapshot." That I hope is different
from being able to access the entire history of all social network
interactions, because they're all in that cloud. Also, who pays for
Utah-like storage requirements? What is your business model for financing
the sharebook cloud servers?

> improvement on doing this because when we download something from server
> we trust Tor (if attacker can't deanonymize Tor, server only can observe
> numbers of retrievals that is useless when all users approximately have
> between 100-250 friends thus each block shows averagely 100-250
> retrieval, there is no correlation between blocks), for sending data
> through hidden services we still need trust Tor so doing that just
> increase the complexity for no reason. 
> I don't see any bandwidth consuming problem when Paul says "i agree" as
> a comment then encrypt it and upload cipher-text as a block on
> PseudonymousServer then send a Notification to Alice then Alice forward
> same Notification to all her 167 friends because it cost same amount of
> bandwidth for transferring Notifications and same amount of bandwidth
> for downloading the block 167 times from PseudonymousServer. Sending the

That's not true, you are still assuming distribution optimization is

> cipher-text block through hidden services will cost much more than
> downloading it from PseudonymousServer because when users download a
> block from it they are behind 3 hop but when Alice send something to
> friends hidden services, there is 6 hop in between. 

And here you are repeating the theory that we would be making new
circuits for each delivery, which I thought was my misunderstanding
three mails ago. Why are you making it yours now? There are no 6/7 hops
there, in either the sharebook or the RP-multicast scenario.

> I think it's better spell the question of choice of trade-off like this:
> do we want forward secrecy for sending each Notification to each friend
> when we only use Mceliece cryptosystem for asymmetrical encryption? or
> do we want forget about group PQ forward secrecy by encrypting the
> Notification using a common secret (or using Attribute-Based Encryption)
> that is same for all friends to be able multicast the cipher-text value

Don't forget that there is link level encryption between each multicast
node, so an attacker would have to take over the network of relay nodes
to gather significant knowledge.

Also we should devise a multicast ratcheting method by which each
branch of the tree re-encrypts the content with a different ratchet,
thus making it difficult for somebody who p0wns a certain number of
relay nodes to recognize which subtrees belong to the same root.
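One possible reading of that idea, as a toy sketch in Python: each relay
derives a fresh one-way key per child branch and re-encrypts the (already
end-to-end encrypted) notification under it, so the bytes travelling down
two subtrees of the same root do not match. All names here are
hypothetical, and the hash-based XOR keystream is only a standard-library
stand-in for a real authenticated cipher; it is not meant as a secure
construction.

```python
import hashlib
import hmac


def derive_branch_key(parent_key: bytes, branch_id: bytes) -> bytes:
    """One-way step: a child key cannot be turned back into the parent key."""
    return hmac.new(parent_key, b"branch:" + branch_id, hashlib.sha256).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR data with SHA-256 in counter mode.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def reencrypt_for_children(incoming_key: bytes, ciphertext: bytes,
                           child_ids: list) -> dict:
    """Strip the incoming link layer and wrap the payload once per child branch."""
    payload = keystream_xor(incoming_key, ciphertext)
    return {
        child: keystream_xor(derive_branch_key(incoming_key, child), payload)
        for child in child_ids
    }


if __name__ == "__main__":
    root_key = bytes(32)  # placeholder key, for the demonstration only
    wrapped = keystream_xor(root_key, b"opaque end-to-end ciphertext")
    per_branch = reencrypt_for_children(root_key, wrapped, [b"left", b"right"])
    print(per_branch[b"left"] != per_branch[b"right"])
```

Whoever compromises relays in two different subtrees sees different
ciphertexts and needs the parent key to link them, which is the property
being asked for above. How the child learns its branch key is left open
here.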

> which will be same for all friends? The answer is very clear when
> security is priority and sending 167 Notification in size of 60 byte is
> very affordable. If we decide switch to another strategy, it will be
> easy by telling a common secret to all friends using a "Notification"
> that a specific "Mark" at its beginning declare what is going on (if
> desired "Mark" is not defined in application yet, we can ask users to
> update application with a new version that understands the "Mark"), but
> we won't change our strategy. 
> I don't understand why you involved block storage cloud in the question…
> it have 100% efficiency and have nothing to do with cryptography. 

It's a trade-off you are making in order to afford a round-robin fan-out.
I'm trying to come up with a multicast model that respects a reasonable
amount of anonymity and no longer needs a PseudonymousServer cloud.
You are trading away scalability for what you think is the necessary
cryptography, but researchers seem to be of a different opinion, as the
following papers show.

> But the real problem is that multicasting is not metadata friendly. 

That is a bold claim.

> it's not feasible to protect metadata secrecy on multicasting because
> you fundamentally can't send a random packet to each recipient and when
> you multicast same value then you enter one-to-many pseudonyms paradigm
> which means some social graphs between pseudonymous vertices become
> visible to observers in that zone (search social network
> de-anonymization papers for more info). 

2009, "De-anonymizing Social Networks" by Arvind Narayanan and Vitaly
Shmatikov is about correlating Twitter and Flickr users.
Is this really what you mean? Sounds pretty off-topic to me.

Other papers on the topic are these:

- 2000, "Xor-trees for efficient anonymous multicast and reception"
- 2002, "Hordes — A Multicast Based Protocol for Anonymity"
- 2004, "AP3: Cooperative, decentralized anonymous communication"
- 2006, "M2: Multicasting Mixes for Efficient and Anonymous Communication"
- 2006, "Packet coding for strong anonymity in ad hoc networks"
- 2007, "Secure asynchronous change notifications for a distributed file system"
- 2011, "Scalability & Paranoia in a Decentralized Social Network."
- 2013, "Design of a Social Messaging System Using Stateful Multicast."

The last two are our own. I'm afraid I can't find a paper that supports
your bold assertion there. You will have to help me.

Other papers on the topic of distributed social multicast, but without anonymity:

- 2003, "Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh"
- 2003, "SplitStream: high-bandwidth multicast in cooperative environments"
- 2005, "The Feasibility of DHT-based Streaming Multicast"
- 2006, "Minimizing churn in distributed systems"
- 2007, "SpoVNet: An Architecture for Supporting Future Internet Applications"
- 2008, "TRIBLER: a Social-based Peer-to-Peer System." 

Papers about anonymous social networking, without the scalability bit:

- 2010, "The Gossple Anonymous Social Network."
- 2010, "Pisces: Anonymous Communication Using Social Networks"
    by Prateek Mittal, Matthew Wright, and Nikita Borisov isn't about
    multicast, but it elaborates a way how social networks can in theory
    improve onion routing.
- 2011, "X-Vine: Secure and Pseudonymous Routing Using Social Networks."

Papers suggesting the use of social graph data to protect against
sybil attacks:

- 2006, "SybilGuard: defending against sybil attacks via social networks"
- 2013, "Persea: A Sybil-resistant Social DHT"

I think there are more on this specific topic.. ah yes, X-Vine also
proposes social protection against sybil attacks.

Since I'm not a paid researcher I have not read all of these papers,
but it does so far look like there is a majority in favor of our
architecture rather than yours.

> in pubsub, constant connections even between pseudonyms might reveal
> some parts of social graph, what I proposed as Hybrid hidden service is

I believe seeing little pieces of branches will not get you far.
The disadvantages of requiring a storage cloud weigh more heavily.

> discontinuous and packets traveling from SC's third hope to RPs don't
> look relevant to each other, there is no way to draw a social graph
> between one sender and several RPs because when an OR sends 167 packet
> to 167 RP, an observer in between can't separate these packets are from
> same person who sent them to all those RPs, or 167 different person at
> that OR sent those packets to RPs in a linear paradigm as each packet
> looks random without any connection information. everything changes when

I challenge that, at least in the current Tor network. Suppose the
attacker applies traffic shaping to the outgoing notification. Unless the
notification has a fixed size, the third hop will replicate the shaped
traffic and thus allow an observer to see which rendez-vous points are
being addressed - possibly de-anonymizing many of the hidden services
behind them. There is probably a chance of de-anonymization even if
notifications had a fixed size, since the third hop will suddenly be busy
sending out similarly shaped packets to 167 RPs.
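For the fixed-size part, a minimal sketch in Python of what padding every
notification to one size could look like. The 60 byte figure is taken
from the notification size mentioned earlier in this thread; the one-byte
length prefix and the function names are assumptions of mine, not part of
either proposal.

```python
import os

NOTIFICATION_SIZE = 60  # bytes, the size quoted earlier in the thread


def pad_notification(payload: bytes, size: int = NOTIFICATION_SIZE) -> bytes:
    """Return exactly `size` bytes: length byte, payload, random filler."""
    if not 2 <= size <= 256:
        raise ValueError("size must fit a one-byte length prefix")
    if len(payload) > size - 1:
        raise ValueError("payload too large for a fixed-size notification")
    filler = os.urandom(size - 1 - len(payload))
    return bytes([len(payload)]) + payload + filler


def unpad_notification(cell: bytes) -> bytes:
    """Recover the payload from a padded notification."""
    length = cell[0]
    return cell[1:1 + length]


if __name__ == "__main__":
    cell = pad_notification(b"new block available")
    print(len(cell), unpad_notification(cell))
```

This only hides the size. It does nothing about the burst of 167 packets
leaving the third hop at the same moment, which is the second problem
described above.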

> there is a constant identical connection between SC's third hope and 167
> RP that makes entire relations between pseudonyms visible to an
> observers between them without hacking ORs. 

I challenge that as well. Given a high latency packet-oriented multicast
system being fed from the third hop, distributing the content to a network
of reception points, the maximum de-anonymization that can be achieved
is by p0wning some nodes, seeing some fragments of somebody's trees,
still not being able to tell where the stuff came from and where it
will end up.

> I think pubsub is a useful tool for a liberal network when everything is
> centralized but it's not enough for a secure network when Goliaths are
> snooping on everything. 

You may be wrong, and some papers think differently.

> >Yes, because you accept the trade-off of having all those people
> >retrieve their own copy of the data block from the block cloud service.
> >That is the actual bandwidth trade-off here. How much does it cost to
> >have a million people fetch the latest tweet of @ioerror from the block
> >cloud compared to distributing that tweet using a distribution tree?
> >We essentially reduce the traffic by one million GET requests and a
> >million of copies of the tweet being pushed from the block cloud into
> >the Tor network. We only keep the one million outgoing circuit
> >operations. That should make quite a difference in scalability.
> Blocks don't have trade-off on bandwidth. Bandwidth trade-off is for
> Notifications. 

Of course accessing blocks from a third party server is a trade-off
in excessive bandwidth, please.
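As a back-of-the-envelope sketch in Python of the comparison I quoted
above: the message categories and function names are my own
simplification, and per-hop costs are deliberately ignored.

```python
def cloud_model_messages(followers: int) -> dict:
    """Block cloud: notify each follower, who then fetches their own copy."""
    return {
        "notifications": followers,
        "get_requests": followers,
        "block_copies": followers,
    }


def tree_model_messages(followers: int) -> dict:
    """Distribution tree: the content arrives with the delivery itself."""
    return {
        "deliveries": followers,
        "get_requests": 0,
        "block_copies": 0,
    }


if __name__ == "__main__":
    n = 1_000_000  # followers of a popular account, as in the quoted example
    saved = sum(cloud_model_messages(n).values()) - sum(tree_model_messages(n).values())
    print("messages saved by the tree:", saved)
```

Under these assumptions the tree drops the million GET requests and the
million block copies pushed out of the cloud, and keeps only the outgoing
deliveries.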

> We can use twitter's distribution strategy on PseudonymousServer, you
> can consider blocks as tweets, how twitter sends a plaintext tweet to
> 167 different person from different IP addresses who ask it? I guess we
> can use same method to deliver blocks to 167 different person who
> request it. 

Twitter uses a multicast-like replication system, like all cloud
systems. The question is whether it makes sense to access that via
a TOR2WEB gateway or whether it is better to have it built into the
anonymization network. Cloud systems are easier to set up because they
are a well understood thing, but the disadvantages are relevant.

> And in our app we limit numbers of friends to ~250 friends, if someone
> shares something to millions then probably it's not private. 

Yes, but the fact that I am interested in ioerror's tweets says
something about me. That's why I believe anonymization should
happen at any scale. That's why I would rather opt for a system
that can scale with the number of people adopting it, rather
than having to say: Sorry, twitter.com or livestream.com use
cases are unwelcome - you have to give up anonymity for those.

> >That can be achieved by creating suitable motivation. If the social
> >distance can be computed even for anonymous data, people can sponsor relays
> >that offer services to first or second degree friends without knowing 
> >what exactly and who exactly they are working for. The space for ideas
> >in this field is still vast methinks.
> There might be a lots of volunteers who are willing to donate their
> storage for incentives but they are finite not infinite, someday we

Yes, they grow at the same speed as the number of people wanting
to use them - so the principles of scalability are respected.

> finally get ride of them as numbers of blocks rapidly grow without stop
> and we should keep all blocks forever. Just search about how much data
> people post on social media everyday. 

Exactly, so your model with the centralized block cloud is doomed,
as I see it.

> >Why gone? They should already have a copy on their hard disk.
> They are gone because the hard disk itself that kept a copy is gone
> (pirated movies overflow, memory failures, volunteers running away,
> cryptolockers etc) 

Occasional failures can be recovered - we have a recovery scheme
for that. I don't believe people will systematically not provide
disk space to have a great social networking experience. One
that beats Facebook's in many ways, not just privacy.

[OFF-TOPIC: Talking about the lightweb]

> Most of people out there think Tor is only for escaping from
> totalitarians or buying drugs, they call websites behind hidden services
> "Darkweb" that sounds very macabre. Tor team should expand its coverage
> on legitimate applications for ordinary users who are not doing anything
> wrong or escaping from someone. The only upgrade that really helps is
> getting more Tor relays. I don't understand why they don't do PR like
> most of other companies. Incentive systems like Torcoin can be a good
> start. 

ioerror advocates the "lightweb" if I remember correctly.
I frequently find myself explaining to people that even if the
anonymization were imperfect, Tor hidden services are still the most
popular end-to-end public key routing system offering higher authenticity
guarantees of the counterpart you are addressing than TLS/HTTPS. And then
you can always combine both systems. Since onion certificates are useless
if you don't own the private key, certificate authorities could hand out
certifications freely - they just need to check the identity of the
requestor, not care about the name of the onion. In other words, Tor
could market itself as the better HTTPS - if only Hidden services were
healthier. I find it surprising how frequently even people familiar
with Tor haven't looked at it that way. TOR2WEB mode for the win. If
both sides use TOR2WEB mode you have an improved HTTPS over just one proxy
hop. That hop could be optimized away actually, by going directly to
the IP of the onion server and expecting the Tor public key as the HTTPS
identity. But this all has nothing to do with the rest of our discussion.
And in general I am in favor of anonymizing more, not less.


Something which is half-way on-topic instead: should we employ
GNUnet as a distribution infrastructure plugged between the third
outgoing hop and the rendez-vous points, it probably makes sense to
also use GNUnet's sybil attack resistant DHT instead of Tor's, possibly
introducing better look-up privacy. But that is something Christian
and Georges should be working out.

tor-talk mailing list - tor-talk@lists.torproject.org