Spotting influencers and VIPs in LinkedIn with PowerShell – Part 2: “The Dark Side”

What we have seen in our previous post may seem interesting and powerful. Essentially, what we are doing is opening the door to creating local datasets with personally identifiable information coming from our Social Networks. That's a pretty big deal. Therefore, there are a couple of things that we need to understand before going forward.

Privacy, Law and Ethics

Usually, on Digital Media, whenever you can access some information, it is because you have the rights and permissions to do so. However, I would like you to consider the difference between “can” and “should”. Now, you have the chance to “download” datasets to your PC. That's a substantial difference from when data lives fully on the server. Now you are hosting an instance of that data and therefore you are bound by the laws of the jurisdictions applicable to you.

Bear in mind that regulatory bodies may require that you apply high levels of protection and security to datasets containing personally identifiable information about religion, race, sexual orientation, or any sort of affiliation. This could very well be the case of your dataset. However, even if this is not the case, you can't be sure whether the ones you are going to deal with tomorrow will put you in such a situation. Therefore, it is a very good idea to verify that you are working in a truly secure environment.

Having said that, let's take our thoughts a little bit further. Laws and Regulations will define a set of boundaries. Unfortunately, just because something is “legal”, it doesn't mean that it is “right”. Does it? This is the field of values and ethics. This is an area where different people actually have different points of view and where heated debates can take place. Yes, of course, I have my own positions about these issues. However, this time, I would just like to point out the problem and suggest tools that could be helpful in managing these situations.

Let's face it. You might very well find yourself behaving like “your own NSA”. Maybe because you “need” it, but mostly because you “can”. Let's see some examples to make it clear what I mean by this:

you might build “personal profiles” that last for as long as you decide.
you might store “historical records” about topics, people or communities.
you might share, transfer or trade (let's ignore legal implications for now) those datasets with third parties in your organization or, eventually, with partners of some kind.
not to mention that all this can happen without any independent oversight …

Yes, most of the time all these activities will be guided by “good faith” and conducted by well-meaning people … But, what if someone changes their mind? What if goals and intentions suddenly change? Things can get messy very very quickly … I think that you can see where we are going.

As usual, technology challenges the “status quo” by exposing us to unexpected situations. This time is no different. Therefore, it is extremely important that you define policies for all the information states and operations (Retention, Confidentiality, Integrity, Transfer, Query, Access, etc.) and establish processes and procedures to make them effective.

The Social Media Scripting Framework provides some features that can help you with this:

CreationDate, LastUpdateDate and RetainUntilDate are properties present on every single object and can help you implement a Data Retention policy.
OwnerId and OwnerDisplayName are properties present on every single object that can help you track as many Internal “Data Processors” as you might have inside your organization.
Make use of PrivacyLevels in combination with ConvertTo-PrivateTimeLine and ConvertTo-PrivateUserProfiles. These tools will mask personal information and, therefore, help you share it with third parties while preserving privacy at the same time.

Here is an example of how this last feature works:

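$OriginalPrivacyLevel              = $connections.LinkedIn.PrivacyLevel
$connections.LinkedIn.PrivacyLevel = $PRIVACY_LEVEL_HIGH

# Anonymizing the first 10 posts of the LinkedIn Timeline
$PrivateTimeLine                   = ConvertTo-PrivateTimeLine -from $LINPosts[0..9]

# Restoring the privacy level that was saved at the beginning
$connections.LinkedIn.PrivacyLevel = $OriginalPrivacyLevel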

I agree that these may not be bulletproof features because some of them can be tampered with. Nevertheless, at least, they are better than nothing and can be a good starting point.
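
As a rough illustration of the Data Retention point, here is a minimal sketch of how RetainUntilDate could drive a retention check. Only the property names (CreationDate, LastUpdateDate and RetainUntilDate) come from the framework. The sample objects and the Split-ByRetention helper are hypothetical and invented for this example:

```powershell
# Hypothetical sample data: only the three date property names come from the framework.
$now = Get-Date

$posts = @(
    [pscustomobject]@{
        Id              = 'post-1'
        CreationDate    = $now.AddDays(-90)
        LastUpdateDate  = $now.AddDays(-60)
        RetainUntilDate = $now.AddDays(-30)   # retention period already over
    }
    [pscustomobject]@{
        Id              = 'post-2'
        CreationDate    = $now.AddDays(-10)
        LastUpdateDate  = $now.AddDays(-5)
        RetainUntilDate = $now.AddDays(20)    # still within retention
    }
)

# Hypothetical helper (not part of the framework): separates the items that can
# still be kept from the ones whose retention date has passed.
function Split-ByRetention {
    param(
        [Parameter(Mandatory)] [object[]] $Items,
        [datetime] $ReferenceDate = (Get-Date)
    )

    [pscustomobject]@{
        Keep    = @($Items | Where-Object { $_.RetainUntilDate -gt $ReferenceDate })
        Expired = @($Items | Where-Object { $_.RetainUntilDate -le $ReferenceDate })
    }
}

$result = Split-ByRetention -Items $posts -ReferenceDate $now

# Report what should be deleted; continue working only with $result.Keep
$result.Expired | ForEach-Object { "Past retention: $($_.Id)" }
```

What you do with the expired items (delete, anonymize, archive under stricter controls) is a policy decision. The sketch only shows that the dates give you a place to hook that decision in.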

Terms of Service

Beyond what I've discussed so far, and regardless of whether people have shared things publicly or have given explicit consent to make their data available to us through APIs, the truth is that we are all bound by the “Terms and Conditions” of the different Digital Channels we use.

Because we all use more than one Social Network or Cloud Service, these legal terms might be incompatible or inconsistent with each other and may eventually change in unexpected ways. That could leave the status of your existing datasets shady or unclear.

To begin with, we must understand that companies behind Social Media and Public Cloud services constantly explore the boundaries of their relationship with their users, developers and other stakeholders. They try to find a balance among their own interests, those of their users and their legal obligations in different countries. Unfortunately, this situation is very dynamic and still far from settled. In such an environment, every party involved ends up assuming some degree of risk to keep their business going. This is, actually, a defining characteristic of immature industries or market segments where innovations are taking place.

There is indeed a “grey area”, a “paradox” actually, that I think we should also analyze for a moment. There is a significant difference between what Social Platforms let us “technically do” and what they tell us that we “can do”. In fact, it would also be possible to comply with the “letter” of their terms, while not with the “spirit”.

Let's consider the case of LinkedIn as an example. Chapter 3, section B of the LinkedIn APIs Terms of Use reads as follows:

No Copying: Except as expressly permitted in these Terms, you must not copy or store any Content. This restriction includes any derived, hashed, or transformed data, or any method where you capture information expressed by the Content, even if you don’t store the Content itself.

Cache for Performance: To improve the member experience, you may cache LinkedIn Content, but you must not do so for more than 24 hours from your original request. This limited permission to cache is for performance reasons only. You do not have any rights to the LinkedIn Content beyond this limited use.

At this point we can make the following remarks:

They are forbidding something that they know they can't enforce at all: copying. Any digital asset can, by its very nature, be copied/cloned “ad infinitum” without quality loss. The whole industry struggles to avoid it, although there is plenty of evidence that these efforts are futile.
They are limiting caching. The impact of such a limitation goes beyond mere “performance”. It essentially limits the value of an API that is “rate limited”. Again, there is no practical way to verify that this rule is being honored. Nevertheless, they impose that restriction anyway …
APIs can enforce some policies. However, they can't be too defensive against their users and still expect to be attractive and useful.
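
To make the 24-hour caching rule concrete, this is a minimal sketch of a cache that refuses to serve anything older than that. Treat it as an illustration under assumptions: Get-CachedProfile, Remove-ExpiredProfile and the $fetch stub are hypothetical names of my own, and no real LinkedIn call is made:

```powershell
# Hypothetical in-memory cache; none of this belongs to the framework or to any LinkedIn API.
$script:MaxCacheAge  = New-TimeSpan -Hours 24
$script:ProfileCache = @{}

function Get-CachedProfile {
    param(
        [Parameter(Mandatory)] [string]      $ProfileId,
        [Parameter(Mandatory)] [scriptblock] $Fetch,          # invoked only on a miss or an expired entry
        [datetime] $Now = (Get-Date)
    )

    $entry = $script:ProfileCache[$ProfileId]
    if (($null -ne $entry) -and (($Now - $entry.FetchedAt) -lt $script:MaxCacheAge)) {
        return $entry.Data                                    # fresh enough: serve from cache
    }

    $data = & $Fetch $ProfileId                               # otherwise request it again
    $script:ProfileCache[$ProfileId] = @{ Data = $data; FetchedAt = $Now }
    return $data
}

function Remove-ExpiredProfile {
    param([datetime] $Now = (Get-Date))

    # Copy the keys first so the hashtable is not modified while it is enumerated
    foreach ($id in @($script:ProfileCache.Keys)) {
        if (($Now - $script:ProfileCache[$id].FetchedAt) -ge $script:MaxCacheAge) {
            $script:ProfileCache.Remove($id)
        }
    }
}

# Stand-in for a real API request; it just counts how often it is called
$script:FetchCount = 0
$fetch = { param($id) $script:FetchCount++; "profile data for $id" }

$t0   = Get-Date
$null = Get-CachedProfile -ProfileId 'abc' -Fetch $fetch -Now $t0                  # miss: fetches
$null = Get-CachedProfile -ProfileId 'abc' -Fetch $fetch -Now ($t0.AddHours(23))   # hit: served from cache
$null = Get-CachedProfile -ProfileId 'abc' -Fetch $fetch -Now ($t0.AddHours(25))   # expired: fetches again

"Requests actually sent: $script:FetchCount"
```

Note that the age is measured from the original request, not from the last read, which is how I understand the wording of the clause. Remove-ExpiredProfile is there because ignoring a stale entry is not the same as no longer storing it.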

The whole point of participating in a Social Network is, precisely, leveraging the value of that network. Our network is just a subset of the whole data hosted by the Service Provider. Their platform provides APIs whose sole purpose is to empower a community and nurture an ecosystem around that core set of APIs. When that relationship works, the combined value proposition of the core plus the ecosystem can be significantly higher … Everybody wins! (at least in theory) …

Ok, then, why are they imposing limitations that they know they can't enforce and that clearly undermine the usefulness and value of the service as a whole? In my opinion – and I can be wrong, of course – the answer to this question is twofold:

“Because they are forced to do it”. Here we find, on one hand, “Law and Regulation” and, on the other, “Performance and Capacity Management” as main reasons.
“Because they want to do it”. Here we find “Fear and Control”.

Personally speaking, I find that the source of the problem lies in this very last point: they want to establish a clear hierarchy of authority and control. Sometimes it's because they feel the need to protect themselves from ruthless competitors, sometimes because, in the end, they don't trust their own users … It's all about protecting their market position. Unfortunately, there is a point to these arguments. However, recent digital history has shown us that it is precisely by giving up control that you achieve that market position and protection … That's why I think they are fundamentally wrong …

Let's have a look at some additional facts …

Current LinkedIn daily limits allow you to get 500 Public Profiles from other people on a “per user basis”. You can't cache information beyond 24 hours, which means that your query limit is, in practical terms, 500.

A multi-user application has a daily limit of 100k Public Profiles, but each user has a limit of 500. This means that your application has a theoretical limit of 200 users (100,000 / 500), provided that all of them decide to use the service at its maximum capacity … Because we can cache data for 24 hours, an application would, eventually, be able to share 100k user Public Profiles among 200 users. Therefore, these 200 users would have to be a tight community with very disjoint interests to be able to fully leverage that cache … I don't know what you think, but I see this as very, very unlikely.

Note: I know that the number of users can be greater than 200 depending on how many Public Profiles each of them requests. However, this approximation is enough for the purposes of this explanation.
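
For those who prefer to see the numbers, the arithmetic above can be sketched in a few lines. The two limits are the ones quoted in the text. Get-MaxUsers and the 100-profiles-a-day scenario are hypothetical and only there to illustrate the note:

```powershell
# Limits as quoted in the text above
$AppDailyLimit  = 100000   # Public Profiles per application per day
$UserDailyLimit = 500      # Public Profiles per user per day

# Hypothetical helper: how many users fit under the application limit
# if every one of them requests the same number of profiles per day
function Get-MaxUsers {
    param(
        [int] $AppDailyLimit,
        [int] $ProfilesPerUser
    )
    [int][math]::Floor($AppDailyLimit / $ProfilesPerUser)
}

# Every user exhausts the per-user quota: 100000 / 500 = 200 users
Get-MaxUsers -AppDailyLimit $AppDailyLimit -ProfilesPerUser $UserDailyLimit

# Assumed lighter usage of 100 profiles per user per day: 100000 / 100 = 1000 users
Get-MaxUsers -AppDailyLimit $AppDailyLimit -ProfilesPerUser 100
```

This ignores any profiles served from the 24-hour cache, so it is a lower bound on the number of users rather than a precise figure.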

Oh, but, wait a minute, I can't remember any application that doesn't deliver all the information I've ever requested … So, if these are the limitations, how do they manage it? ;-). There are three possible answers to this question:

They have a special agreement with LinkedIn that involves a different set of limits in exchange, of course, for some money.
They are disregarding those limitations, knowingly or not …
These scenarios are so new and “innovative” that they are not correctly managed by the API … (I would be flattered, … but we all know that I am not that smart).

If I were to evaluate which of these possibilities is more likely, I would nominate the first two. The first one, especially, is a beautiful reminder that Social Networks are not “Democracies”, but “Businesses”. We don't have “rights” and we are not “made equal” inside a Social Network. We are “the product”, we are “traded” every day in exchange for our ability to leverage “our Network and Social Relationships” inside that platform … But, wait! Aren't we discussing how they are limiting our capacity to do precisely that? … Congratulations, you've got it ;-).

Everything comes down to a set of Principles …

I would like to stress that the LinkedIn case is just an example of what you could find in many other places. The nature of this problem is pretty generic and, in fact, expands beyond the Social Media domain. It also affects any sort of cloud-based service that is data-related in some way. Multi-sided markets are, actually, the ultimate sophistication of this issue. So, this is not just about LinkedIn.

The point is that this new world is pushing us to make the most of Digital Media for competitive reasons. Sometimes it will provide us with a competitive edge and, at other times, it will just allow us to stay in business. But the problem is that this world is now working by these rules … No matter if we like it or not, that's how it is.

“Ok, so now, what?” “What should we do when the context is so blurry?” Well, to be honest, I don't know for sure. But I think that there isn't a single answer:

If you are just a consumer and a market follower; if you are one of those that go with the masses, maybe your answer would be “play by the rules”. Use well known tools and established practices and follow strictly “Terms & Conditions”. However, you shouldn't expect any competitive edge or some crazy/awesome results.
If you are an early adopter or an innovator, you will find yourself pushing the boundaries of the “status quo” sooner or later.

In this last case, what happens is that we are left alone with our principles to assist and guide us in the decisions we make. Principles are something very intimate and personal. However, I strongly believe that embracing the “don't be evil” motto can be very helpful in these situations. Ok, but what does this exactly mean? Let's try to translate it into something more concrete and in line with the concerns we have discussed so far:

Social relationships are built on mutual trust. Honor that trust and manage the data they have shared with you with the same care and respect you would like for yourself.
Your relationship with Social Networks is symbiotic. We all need each other, so don't be a parasite.
Be honest, open and transparent about what's going on. It is more than likely that you are not alone.
If you are a competitor and your only choice is “to steal data”, you certainly have bigger problems than this one. So, don't do it.
You are responsible for what you do; however, you are not responsible for the blurred and paradoxical environment you operate in. Complex problems often have more than one root cause, but don't fool yourself into thinking that you are not the weakest of the stakeholders involved.

Everybody can be their own NSA

Many say that “Privacy is over”. I am not sure if I am ready to subscribe to such a bold statement. What I can acknowledge is that, without a doubt, our relationship with Privacy has changed a great deal. Our capacity to gather Intelligence on-line has expanded to a point that it is now a commodity, at least at the less sophisticated layers of technology. The Social Media Scripting Framework is just one example that can confirm this.

However, in spite of all the negativity that this conclusion might provoke, I do think that there is a positive angle that we could acknowledge. When everybody has a sort of “nuclear weapon” at home, the awareness about the responsibility that such a thing represents tends to grow and spread proportionally. This involves not only regular individuals, but also institutions and businesses. As a result, a kind of “self-control” and a more stable set of regulations and practices tend to be developed over time.

Unfortunately, this is what I think will happen in the long run. In the meantime, we will live through a transition phase where headaches, nightmares and collateral damage will certainly happen with varying frequency. And that's painful, to say the least.

Additionally, we can't forget that the world is neither a uniform nor a static reality. This affects how we analyze those natural counterbalances we have just described. Let's face it, of those that have “these new nuclear weapons at home”, 80% don't care and don't know, 15% do know and do care and 5% are just evil people. This distribution is just an oversimplification and, of course, these proportions might very well be different. The point is that we should recognize two facts:

The distribution will change over time and will never be perfect.
The ones that “do know and do care” have a significant responsibility, especially if we look in detail at who makes up this group (Institutions, Service Providers, Marketing and Communications Professionals, Activists, Hackers, etc.).

I believe that it is about time to talk about a Code of Ethics and a set of Principles for all of us working in technology. This is going to be tough because, nowadays, many different professions and disciplines deal with technology in many different ways. Nevertheless, its complexity won't make the problem go away and, the sooner we start working on it, the better for us as a society.

Conclusion

Don't get me wrong, I am not inviting you to break the law or ignore the “Terms & Conditions” that you have signed and accepted.

I just want you to understand that the environment is not as clear as many would think, and that the very same companies behind these New Media contribute to that confusion as part of their journey trying to find a balance in their relationship with their users, developers, other stakeholders and, of course, the laws and regulations of different jurisdictions.

I just want you to understand that new technologies, like the Social Media Scripting Framework in this case, may create new situations that might challenge the “status quo” and put you in situations that you didn't expect in the first place.

I think that we have to accept living with a certain degree of ambiguity because of all these things. That means not only being compliant with laws and regulations. It also means going beyond them and implementing additional policies and risk management tools that allow us to move forward and deal with that uncertainty at the same time.

At the beginning of this post I said that things could become “messy very very quickly”. Hopefully, now you have a view of how “messy” this environment actually is and which tools we have to confront it … Unfortunately, this is still too broad and open. Do you think that it would be useful to work on a proposal for an Open Policy Framework of Reference? That way, we all would have a more tangible starting point to translate these concerns into our organizations … What do you think? What is your point of view about these issues?

Picture: “These are not the droids you're looking for” by Johan Larsson. Licensed under CC BY 2.0.