

  • Decomposing the Value Chain in the Cloud


    Back when consumer USB drives were flooding the market, it was common to see people challenge datacenter-grade storage by comparing it with devices they could buy immediately and inexpensively at any retail store. By comparison, IT storage was costly and slower to deliver.

    Perceptive observers might conclude that either there was a scam going on… or, more probably, that those two things were not exactly the same.

    The funny thing is that the same still (yes, still) happens with Cloud Services. Despite their being around for more than a decade, people still suggest that “Cloud Service Providers (CSPs) are quicker and cheaper than the IT department”. There is some merit to that claim (and we will get to it in a minute). However, as in the case of consumer storage, these are different actors doing different things in the IT Service Delivery value chain.

    I know this question may already be clear to some of you, but I keep running into it far too often. The confusion also spans a wide range: from non-tech-savvy people to folks who participate in the Cloud Services business to varying degrees.

    This suggests that, whatever one might think, the topic remains sophisticated and confusing for many people. So let's try to clear it up, highlight the differences, and explore the key implications they carry.

    The tip of the iceberg

    Oftentimes, when people make those claims about purchasing and provisioning technical components quickly and inexpensively, they are referring to Cloud Services that match the core of the computing stack. The context of these conversations is shown in Figures 1 and 2.

    Decomposing The Value Chain In The Cloud - Figure 1
    Figure 1 – Computing Stack
    Decomposing The Value Chain In The Cloud - Figure 2
    Figure 2 – Computing Stack and related Cloud Delivery Models

    Sure, purchasing and provisioning take time in the traditional IT world and, yes, they take just minutes in the Cloud. However, these processes are just two small pieces of the IT Service Delivery value chain. For instance: somebody has to select which components must be provisioned and why; someone must integrate, configure and support them; someone else must make sure that the solution is secure at every stage of its life cycle; the operational model of that solution must fit the ecosystem it belongs to; and medium- and long-term concerns also ought to be taken into consideration…

    In other words, whereas the scope of the initial conversation might seem narrow and self-contained, the reality of IT Service Delivery is much more complex, as highlighted in Figure 3.

    Decomposing The Value Chain In The Cloud - Figure 3
    Figure 3 – IT Service Delivery Architecture

    As you can see, IT Service Delivery is a team effort involving functions and processes that go well beyond purchasing and provisioning.

    In addition, when the discussion focuses on technology, speed of development and the price of specific components, a more important aspect gets lost in the argument: “Business Value”.

    The value chain

    The term “Business Value” is quite overused these days. That is good since everybody is keen to showcase how their different proposals provide a positive contribution to their prospects. Nevertheless, the constant reference to it also tends to erode its actual relevance and meaning.

    As hinted above, the actual value to the business comes from IT Service Delivery as a value chain and not from any single element that participates in it. It is a team effort, not a “solo” play. Yes, local incremental improvements have an impact on the overall process (especially when they accumulate). However, it is critical to pay attention to the whole process to make sure that those improvements do not create unexpected problems somewhere else (as the Theory of Constraints shows).

    Nicholas Carr, in his piece “IT Doesn't Matter”, stated that:

    “as infrastructural technologies' availability increases and their cost decreases – as they become ubiquitous – they become commodity inputs. They may be more essential than ever to society, but from a strategic business standpoint, they become invisible; they no longer matter”.

    Since then, the idea that Cloud Services are commodities has become widely accepted. And, when you put them in the context of the IT Service Delivery value chain, they represent a fraction of the Service Pricing scheme (as depicted in Figure 4) and, generally speaking, carry a minor relative weight in the grand scheme of things.

    Decomposing The Value Chain In The Cloud - Figure 4
    Figure 4 – IT Service Delivery value chain


    Now that we understand the landscape, we are in a much better position to confront the original claim: “CSPs are quicker and cheaper than the IT department”. Of course they are. But they do different things and have different missions. CSPs' services are also commodities with a minor relative weight in the IT Service Delivery value chain. In other words, in my opinion, that statement misses the whole point.

    CSPs optimize provisioning and purchasing in a very immediate way, but we must remember that this just represents a local optimization of the IT Service Delivery pipeline. We can't forget that traditional approaches still respond to business needs and constraints. And, even in these cases, there are optimization strategies that can be explored too, like co-location, renting or outsourcing, to name just a few.

    Cloud Service Management and Operations, practices such as ChatOps and DevSecOps, and new roles such as the SRE, leverage Cloud-based Technologies to improve other parts of that value chain. However, these are not “things” that you can buy. In fact, adopting them takes commitment, time, and effort.

    This is the actual Cloud Transformation story. A story that also takes into consideration traditional environments, existing modes of operation and specific business challenges, constraints, and priorities. In this context, the perspective of an Enterprise Architecture is key: it is not Cloud Transformation, but Business Transformation that matters. Cloud Transformation is just a means to that end! The ultimate challenge is the ability of the whole organization to develop and transform its own capabilities. Because these will ultimately determine its ability to thrive in the marketplace in the medium and long term.

    In summary: the next time you hear “CSPs are quicker and cheaper than the IT department”… I hope you can say to yourself “seriously?” and, maybe, help by explaining why that is not the point.

    Picture: “Chains” by Raul Lieberwirth. Licensed under CC BY-NC-ND 2.0

  • Selecting Components for WordPress


    I have been dealing with WordPress for years now. I have developed and maintained my own site, built a number of experiments and Proofs-of-Concept, and managed blogging platforms for others. Each context has taught me something new, but one of the aspects that keeps me hooked on this environment is the vibrant community that WordPress has.

    The number of plugins, themes, and solutions that get built on top of it is astonishing. WordPress is estimated to run 33% of all websites and 60% of those with a known CMS. There is certainly a virtuous cycle behind this success… but that is a topic for another post 😉

    With so many options come complexity and a certain disutility. There are many choices for doing the same thing; components that get abandoned after some period of time; components with serious quality problems despite being “beautiful”; components that do not respect privacy or local regulation; components that are insecure or contain malware; components that do not get properly supported; components that are not compatible with each other; etc.

    All this raises the need for filtering and curation. Unfortunately, we are left alone with this task. You could try searching the web for articles that share some recipes, and some of them can help you discover useful ingredients for your solution. However, extracting a set of criteria that teaches you how to do your own selection and maintain it over time is a different thing.

    That is why I've decided to share the criteria that I use myself, hoping that they can also be useful to you.

    Dimensions to consider

    There are a number of aspects that help us ensure that we are fully addressing our concerns and that also help turn our check-list into a more actionable one. These are:

    1. Risks and concerns addressed by each criterion.
    2. Metrics and indicators that we can use to compare components.
    3. Rating/Ranking/Priority that can help us weigh situations created by conflicting criteria. I like the MoSCoW method, but you can use any other system, or combine them in whatever way best fits your needs.

    I have also found that including a description for each item and leaving some room for notes and remarks is quite helpful. This is especially true when other teams must interact with that list on their own.

    You can also consider including the rationale for each criterion. However, most of the time, this is implicitly covered by the list of risks and concerns. My take so far has been to include it only when my stakeholders lack some key domain expertise that would help them understand that implicit rationale. This way, I make sure that the outcome is less redundant and more pragmatic.
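    To make the dimensions above concrete, here is a minimal sketch, in Python, of how MoSCoW priorities can be turned into a weighted score for comparing candidate components. The criteria, weights, and ratings are purely illustrative assumptions, not taken from the actual check-list:

```python
# Illustrative mapping of MoSCoW priorities to numeric weights.
MOSCOW_WEIGHTS = {"Must": 4, "Should": 3, "Could": 2, "Wont": 0}

# Hypothetical check-list entries: (criterion, MoSCoW priority).
criteria = [
    ("Actively maintained (recent releases)", "Must"),
    ("Compatible with current WordPress version", "Must"),
    ("No known unpatched vulnerabilities", "Must"),
    ("Documented and supported by the author", "Should"),
    ("Popularity (active installs, ratings)", "Could"),
]

def score(component_ratings):
    """Weight each criterion's rating (0.0-1.0) by its MoSCoW priority."""
    total = 0.0
    for (name, priority), rating in zip(criteria, component_ratings):
        total += MOSCOW_WEIGHTS[priority] * rating
    return total

# Compare two hypothetical plugins against the same check-list.
plugin_a = score([1, 1, 1, 0.5, 0.8])
plugin_b = score([1, 0.5, 1, 1.0, 1.0])
print(plugin_a > plugin_b)  # True: a "Must" gap costs b more than its extras gain
```

    A spreadsheet works just as well, of course; the point is that making the weights explicit forces conflicting criteria to be resolved consciously rather than by gut feeling.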

    Selection Check-list

    With all this in mind, here you have the architecture artifact that I regularly use myself:

    Selecting Components for WordPress
    Component Selection Check-list

    The WordPress ecosystem is great. It provides plenty of opportunities to minimize coding and maintenance efforts by consuming and integrating features delivered by existing components. However, selecting them requires work and discipline to separate the wheat from the chaff.

    The table above is by no means definitive. I wouldn't be surprised if you have to tailor it to your specific needs. What matters, in my opinion, is to be driven by a check-list like this. That way you can be sure that your key concerns will be addressed properly at the earliest stage of the life cycle. Other platforms share the same type of problems, so this approach might be helpful there too.

    Of course, this is my point of view and the strategy I follow to address this curation problem. My question to you is, how do you do it? Do you miss some criteria? Which other concerns do you take into consideration?

    Picture: “Project 365 – Day 18” by Dave Edens. Licensed under CC BY-NC-ND 2.0

  • Additional thoughts about Cloud Portability



    Despite how much I love the Cloud, it would be foolish to ignore the many challenges that it poses. And, when concepts such as Liquid IT or Multi-Cloud become part of the agenda, one of those is, without a doubt, Portability.

    Back when I was a member of the Atos Scientific Community, I was one of the authors of a whitepaper that addressed this very topic. Since then, I have been fortunate enough to encounter other points of view on the subject, and some have certainly got me thinking. And, even though Cloud Portability might be one of those never-ending discussions, I think some aspects are worth additional consideration.

    My starting point

    Portability in the Cloud has multiple facets and there is no easy and single answer to it. That is why we defended the “Architect's Approach” as the way to address it.

    Without a doubt, going “all-in” with a Cloud Service Provider (CSP) and giving up on Portability has immediate returns. It allows you to gain speed and extract value from day zero. However, the same can be said about any other technology adoption choice. This is not a new problem. Why do we care, then? Because there are risks and concerns at multiple levels. Cloud technologies are unique, though, in the scale of the potential impact of such decisions and the speed at which it can materialize.

    In any case, hiding behind Architecture Principles such as “Technology and/or Vendor Independence” or “Technology and/or Vendor Agnosticism” to prevent or delay change does nobody a favor. Especially since inaction or delays can represent a real competitive disadvantage.

    This means that Architects and Organizations alike must find their balance while, at the same time, pushing themselves to be honest and open to challenges that defy their own positions and preconceptions.

    That said, let's discuss some of the points of view that drew my attention:

    A systemic risk means “no risk”

    I've seen this argument take different shapes. But, in essence, it states that the Cloud is now a systemic risk: since everyone is using it, everyone is affected. As a result, and here is the catch, nothing is going to happen because it is in everybody's interest… but, if it does, it doesn't matter either, since everyone will be screwed…

    … I don't know a single serious business that would accept such a statement as a valid Risk Management strategy. Actually, if we were to accept it for just a second, then it would make no sense to outline Disaster Recovery or Business Continuity Plans, would it? Our lives would be so much simpler, that is for sure!

    Risk Management is an individual responsibility that can't be delegated. Sure, we can ignore it, we can fool ourselves, whatever, fine… But this doesn't change our responsibilities: it is one thing to make conscious decisions based on thoughtful consideration, and quite another to claim that company in distress makes the sorrow less…

    Scale mitigates disruptive changes…

    Supporters of this idea suggest that, even though CSPs may take decisions incompatible with customers' needs and concerns, the scale they have and at which they operate prevents them from impacting a large number of users. This way, Scale itself becomes the mitigating factor that forces them to behave and self-regulate.

    Let's use an example to illustrate the point. Assume that AWS has just 1M enterprise customers, and suppose that they make one decision that impacts just 1% of them. We can argue that 1% qualifies as a “small number” of customers or, at least, not a large one… But, oh wait! 1% of 1M represents 10,000 customers! To give you an idea, that is roughly the equivalent of all the companies in Spain with 100 to 500 employees.
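    The arithmetic is trivial, but worth making explicit. A quick sanity check, in Python, using the illustrative figures from the example above:

```python
# Illustrative figures: 1M enterprise customers, a decision
# impacting "just" 1% of them.
customers = 1_000_000
impacted_fraction = 0.01

impacted = int(customers * impacted_fraction)
print(impacted)  # 10000 companies affected by a "small" 1% decision
```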

    This shows that, when talking about planet-scale figures, words like “large” or “small” carry weight and cannot be dismissed or treated lightly. However, the truth is that all of this is completely irrelevant: if you are one of those 10,000, the argument that you are part of a small community of affected companies will not solve your problems at all. That is an argument for CSPs, not for consumers of Cloud Services.

    This brings back the notion of Risk Management as an individual responsibility that I've mentioned above. If it does matter to you, it is on you to do something about it.

    Investment + Talent = Reliability

    This claim attempts to dismiss the need for Risk Management by assuming that the reliability of a cloud-based solution is a direct function of the huge capital and unparalleled talent that CSPs have. If they are investing so much, and have the brightest minds on earth, they can't fail. As a result, it would be crazy not to go all-in, right?

    Needless to say, Investment, Talent and Reliability are three different things that may be related, may even be correlated, but are by no means equivalent. If they were, instead of three words we would certainly have just… one?

    But, anyway, the crux of the argument is Reliability. So, what is it? Reliability is a multi-level quality of an Architecture or a Solution. Cloud Services usually form the foundations upon which we build and deploy our own stuff in order to deliver an application or a service. This means that at least one piece is not owned by the CSP. We share responsibility with the Service Provider and, as a result, externalizing reliability is simply not possible.

    On the other hand, CSPs have been shouting the “Design for Failure” mantra at us for years. With it, they indicate that reliability should be baked into the application code rather than into infrastructure components. “Design for Failure” represents a paradigm shift from traditional Enterprise Architectures, in which reliability was the responsibility of the infrastructure, and it also means the opposite of the original claim.

    A global oligopoly is good …

    This other position claims that prices will always be pushed downwards, since the big players are on a never-ending quest for economies of scale. At the same time, Cloud Service Providers will never be tempted to turn the screws on fees to drive margins up, because the competition is so intense.

    I concur that this is a picture that describes the current situation quite well. However, there is a limit on how much prices can go down since infinite growth doesn't exist. On the other hand, “lock-in” (the opposite to Portability) represents, by definition, a barrier to competition. This means that, the more “lock-in” the less competition and, therefore, the bigger the temptation to raise prices.

    The truth is that we can't tell how prices will evolve. We know, however, three things:

    1. Lock-in is less likely to happen (although not impossible) with commodity services offered by different providers and supported by de-facto or industry standards that facilitate both entries and exits.
    2. Lock-in is more likely to happen in situations where data gravity has become an issue, and it happens more frequently with highly differentiated services for which there is no easy replacement.
    3. A small number of globally dominant players is known as an oligopoly. And this is not good. Period.

    This means that, sooner or later, price evolution will throw us unpleasant surprises. Actually, it has already happened. The question will always be the depth and breadth of the impact.

    Cloud is “just” a way of consuming technology and innovation

    I couldn't agree more with this posture. There is a BIG catch, though: Cloud Computing is much more than that. Above any other consideration, it is a relationship with a Service Provider. As with any other type of relationship, things can go south. This is why contracts, laws, and regulations contemplate exit clauses.

    And here is where we must assess the situation. It is one thing to change the energy provider for your home, or even to end a life-long marriage. That may affect you personally, and even your family, but the scope is limited. On the other hand, ending the relationship that powers a company's systems, data, and processes has deeper and wider implications. The more exposed to technology and the more coupled with the Service Provider the company is, the harder the impact will be.

    Therefore, besides being a way of consuming technology and innovation, Cloud Computing is a relationship that must be managed and taken care of.

    The innovation argument also raises the question of “where” that innovation happens. A single-vendor strategy could place you at a competitive disadvantage. Let's say, for instance, that qualitative leaps of progress happen at a different Cloud Service Provider. Data gravity, licensing or other contractual terms might become barriers to entry and ruin that opportunity.

    In other words, Portability, as an architectural quality, not only supports Vendor Management activities but also ensures that the company is free to act as it needs either at tactical or strategic levels.

    The zero-sum game

    Some people maintain that we must disregard the concern about being agnostic. This is a cost-based argument claiming that either you pay the price at the beginning (to become agnostic) or you pay it at the end (when you want to exit a Cloud Service Provider). The two situations cancel each other out, resulting in a zero-sum game. This reasoning also raises a huge opportunity cost: if there is no “exit”, there is no price to pay… Compelling, isn't it?

    Hopefully, at this stage, we have already debunked the fallacy about ignoring Risk Management or exit strategies.

    In any case, talking about zero-sum games means talking about numbers. Therefore, we cannot discard the possibility that there is indeed a zero-sum game in certain cases. However, we can't claim it as the general case. Actually, given that entrances and exits can't easily be compared, I would expect zero-sum games to be the exception rather than the norm.

    Why do I think they can't be easily compared? Let's see:

    • Entrances are progressive over time. That is, you adopt once, and you grow from there. On the opposite side, exits are bound to a period of time: “I want to exit this year on this date” (hello BREXIT!).
    • When you adopt the cloud, you know both the source and the target environment. This is not the case with exits unless you have a strategy planned in advance.
    • Systems establish relationships with one another over time making the whole bigger than its parts. This means that exits can be expected to be harder and more expensive than entrances.

    Obviously, these aspects grow in importance with scale. The bigger the adoption, the bigger the impact. And, when things get “big enough”, the nature of the problem changes too …

    All in all, despite the initial appeal of the zero-sum game argument, I can't buy into it …

    Last thoughts

    Conventional wisdom minted the expression “better safe than sorry” many moons ago. I subscribe to it and, not surprisingly, my general point of view sits on the side of Portability, Agnosticism, and Independence as core values. However, they must be integrated within an approach driven by Enterprise Architecture. This means framing them in a business context and a business/service strategy. Consequently, technological decisions must be subordinated to them, and not the other way around. This also means that universal or maximalist positions will never work.

    Lock-in risks tend to concentrate in the upper layers of the stack (especially FaaS/BaaS and SaaS), which are usually closer to business services. This suggests that the pressure to leverage them will always be there. In my opinion, lock-in is OK provided that decisions are rational and conscious; that risks are managed accordingly; and that we make sure we are not trapped by fallacies like the ones exposed above. But, more importantly, we must make sure that there is complete alignment with the business at all levels.

    Picture: “Think!” by Christian Weidinger. Licensed under CC BY-NC-ND 2.0

  • Getting into the Enterprise Architecture space


    I don't know what your experience with this is, but I have always had the feeling that Enterprise Architecture is one of those things that everybody talks about without quite knowing what it actually is.

    I myself can be blamed for this very same thing… I have practiced IT Architecture for years, at different levels and across different domains. And, despite always doing my best, learning from brilliant people and working on challenging projects, the same issues emerged over and over again. Here are just a few:

    • The focus on a narrow scope wouldn't let me connect with the global picture or the ultimate vision that was driving what I was doing.
    • Despite using the most widely accepted conventions in the field, I regularly witnessed how interactions with stakeholders from other domains were hard and painful. There was no “lingua franca” and everybody had to do their best to bridge the gap.
    • I was learning incrementally about discrete techniques and approaches and thinking, “this is it”.

    I couldn't have been more wrong… Eventually, it became obvious to me that, in this hyper-sophisticated world, many of these things must already have been solved; that my quest to connect with a global vision (whatever that meant at each moment) had a body of knowledge, a method and a set of techniques; that a community of people concerned with them must be doing something, somewhere. In other words, somebody must have applied the mindset of an Architect to the Architecture discipline itself.

    If all that was true, I had to do something. Otherwise, I would be doomed to fighting the trees without understanding the forest. And, in a sense, I would also be reinventing the wheel over and over again.

    So there I was, staring at Enterprise Architecture and thinking “this is it”. Then, I got certified in TOGAF and thought, “I know Kung-Fu”. But this time was different. Now, I was right.

    Unfortunately, this has a limited impact unless TOGAF becomes a widespread practice in the industry and a de-facto standard. That is why I have decided to share some materials I developed while preparing my certification. I have translated the books (TOGAF 9.1 and 9.2) into a series of interconnected mind maps. There you will find not just the structure and the key elements of each topic, but also links back to the original sources, embedded diagrams, notes, and other online references.

    [metaslider id=2962]

    I have found them useful not only for studying the subject but also as a quick reference for everyday practice. I hope you find them useful too and, if you have any feedback, just let me know! 😀

    Picture: “Fukuoka 2012 Acros” by Arun Katiyar. Licensed under CC BY-SA 2.0

  • The dirty secrets of Speed


    Digital Transformation is a buzzword these days. It is so for good reasons. Many businesses today face fierce pressures from competitors leveraging web scale technologies and digital business models. Many others must also adapt to younger customer bases that are particularly sensitive to engagements via digital means. In this context, Digital is no longer an instrumental item to the business, but a defining one.

    Technology has always had the ability to speed things up. We have been hearing this forever and, precisely because of this, we all tend to normalize our perception about speed and minimize how relevant it can be. Just think for a second how many technology advances have been piling up since Computing was born as a discipline: this accumulation results in the business speed that we perceive today.

    But progress is not linear. Throughout history, different waves of change have brought explosive new developments in relatively short periods of time. We may all remember the impact of some of them: the dawn of Personal Computing, the Internet, Mobile Computing, the Cloud and, now, the Internet of Everything. Therefore, speed is the result of something even more important: non-linear acceleration.

    In the past, we were able to address speed and acceleration without breaking the essentials of our tools and practices. However, we have reached a point at which simple evolutions are not enough and more radical approaches are in play. This level of speed and acceleration calls for doing things differently. We must now use different tools, change our practices, rationalize both assets and teams, optimize communication and collaboration and even change our culture!

    All of this might look bright and shiny. And, if you are like me, you couldn’t be more excited with such a perspective! It’s all opportunity!

    But let’s put now this vision about speed in the Digital Transformation context. In this accelerated world, we are desperate to complete this transformation quickly. And here is the thing: at speed and at scale risks can be deadly, and there are plenty of them. Particularly, there is one that has to do with this notion that claims that we can be “quicker” and “cheaper” by “doing less stuff”. Unfortunately, when this idea is taken too far the consequences can be catastrophic.

    Yes, we must identify what really matters and think seriously about what to do with the rest. However, crossing the line of doing less Quality Assurance, less Testing, less Security or less Automation (to name just a few), means setting up a recipe for disaster. Actually, “at speed” and “at scale”, you may need more of them.

    In other words, rationalization is one piece of the puzzle, and certainly a necessary one. But when thinking about speed, the key is, more than anything else, to do and think about things differently.

    This post was first published on Ascent
    Picture: “Tunnel (cropped)” by “Thunderchild7”. Licensed under CC BY 2.0

  • Spotting influencers and VIPs in LinkedIn with PowerShell – Part 2: “The Dark Side”


    What we saw in our previous post may seem interesting and powerful. Essentially, what we are doing is opening the door to creating local datasets with personally identifiable information coming from our Social Networks. That's a pretty big deal. Therefore, there are a couple of things that we need to understand before going forward.

    Privacy, Law and Ethics

    Usually, on Digital Media, whenever you can access some information, it is because you have the rights and permissions to do so. However, I would like you to consider the difference between “can” and “should”. You now have the chance to “download” datasets to your PC. That's a substantial difference from when the data lives entirely on the server: you are now hosting an instance of that data and, therefore, you are bound by the laws of the jurisdictions applicable to you.

    Bear in mind that regulatory bodies may require you to apply high levels of protection and security to datasets containing personally identifiable information about religion, race, sexual orientation, or any sort of affiliation. This could very well be the case for your dataset. And even if it is not, you can't be sure whether the data you deal with tomorrow will put you in such a situation. Therefore, it is a very good idea to verify that you are working in a truly secure environment.

    Having said that, let's take our thoughts a little bit further. Laws and Regulations define a set of boundaries. Unfortunately, just because something is “legal” doesn't mean that it is “right”, does it? This is the field of values and ethics, an area where different people hold genuinely different points of view and where heated debates can take place. Yes, of course, I have my own positions on these issues. However, this time I would just like to pinpoint the problem and suggest tools that could be helpful in managing these situations.

    Let's face it. You might very well find yourself behaving like “your own NSA”. Maybe because you “need” it, but mostly because you “can”. Let's see some examples to make it clear what I mean by this:

    • you might build “personal profiles” that last for as long as you decide.
    • you might store “historical records” about topics, people or communities.
    • you might share, transfer or trade (let's ignore legal implications for now) those datasets with third parties in your organization or, eventually, with partners of some kind.
    • not to mention that all this can happen without any independent oversight …

    Yes, most of the time all these activities will be guided by “good faith” and conducted by well-meaning people … But what if someone changes their mind? What if goals and intentions suddenly change? Things can get messy very, very quickly … I think you can see where we are going.

    As usual, technology challenges the “status quo” by exposing us to unexpected situations. This time is no different. Therefore, it is extremely important that you define policies for all the information states and operations (Retention, Confidentiality, Integrity, Transfer, Query, Access, etc.) and establish processes and procedures to make them effective.
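    Purely as an illustration of what such policies might look like when captured as data that scripts can check, consider the following sketch. Everything here is hypothetical: the framework does not ship this structure, and the values are made up:

    ```powershell
    # Hypothetical, machine-readable policy definition for a dataset (illustration only)
    $DataSetPolicy = @{
      Retention       = '365 days'
      Confidentiality = 'Internal use only'
      Integrity       = 'Read-only after acquisition'
      Transfer        = 'Anonymized datasets only'
      Access          = 'Marketing team'
    }

    # Scripts could then consult the policy before acting on the data
    if ( $DataSetPolicy.Transfer -eq 'Anonymized datasets only' ) {
      'Anonymize this dataset before sharing it with third parties'
    }
    ```

    The point is not the concrete shape of the structure, but that policies expressed as data can be enforced by the same scripts that handle the datasets.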

    The Social Media Scripting Framework provides some features that can help you with this:

    • CreationDate, LastUpdateDate and RetainUntilDate are properties present on every single object and can help you implement a Data Retention policy.
    • OwnerId and OwnerDisplayName are properties present on every single object that can help you track as many internal “Data Processors” as you might have inside your organization.
    • Make use of PrivacyLevels in combination with ConvertTo-PrivateTimeLine and ConvertTo-PrivateUserProfiles. These tools will mask personal information and, therefore, help you share it with third parties while preserving privacy at the same time.

    Here you have one example of how this last feature works:

    $OriginalPrivacyLevel              = $connections.LinkedIn.PrivacyLevel
    $connections.LinkedIn.PrivacyLevel = $PRIVACY_LEVEL_HIGH
    # Anonymizing the first 10 posts of the LinkedIn Timeline
    $PrivateTimeLine                   = ConvertTo-PrivateTimeLine -from $LINPosts[0..9]

    I agree that these features may not be bulletproof because some of them can be tampered with. Nevertheless, they are better than nothing and can be a good starting point.
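    As a sketch of how the RetainUntilDate property mentioned above could back a Data Retention policy, consider the following periodic sweep. The $campaign variable and the Export-DataSet label are assumptions for illustration:

    ```powershell
    # Hypothetical retention sweep based on the RetainUntilDate property.
    # $campaign is assumed to be a dataset previously loaded with the framework.
    $Today            = Get-Date
    $RetainedCampaign = $campaign | where { ( [datetime] $_.RetainUntilDate ) -gt $Today }

    # Persisting only the retained records effectively "forgets" the expired ones
    $RetainedCampaign | Export-DataSet -SourceType TwPosts -label RetentionSweep
    ```

    Run periodically, a sweep like this keeps your datasets aligned with whatever retention window your policy defines.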

    Terms of Service

    Besides everything I've discussed so far (that people might have shared things publicly, that they have provided explicit consent to make their data available to us through APIs, and so on), the truth is that we are all bound by the “Terms and Conditions” of the different Digital Channels we use.

    Because we all use more than one Social Network or Cloud Service, these legal terms might be incompatible or inconsistent with each other, and they may eventually change in unexpected ways. That can leave the status of your existing datasets shady or unclear.

    To begin with, we must understand that the companies behind Social Media and Public Cloud services constantly explore the boundaries of their relationship with their users, developers and other stakeholders. They try to find a balance among their own interests, those of their users, and their legal obligations in different countries. Unfortunately, this situation is very dynamic and far from settled. In such an environment, every party involved ends up assuming some degree of risk to keep its business going. This is, actually, a defining characteristic of immature industries or market segments where innovation is taking place.

    There is indeed a “grey area”, a “paradox” actually, that I think we should analyze for a moment. There is a significant difference between what Social Platforms let us “technically do” and what they tell us we “can do”. In fact, it would also be possible to comply with the “letter” of their terms while violating their “spirit”.

    Let's consider the case of LinkedIn as an example. The LinkedIn APIs Terms of Use, in chapter 3, section B, read as follows:

    No Copying: Except as expressly permitted in these Terms, you must not copy or store any Content. This restriction includes any derived, hashed, or transformed data, or any method where you capture information expressed by the Content, even if you don’t store the Content itself.

    Cache for Performance: To improve the member experience, you may cache LinkedIn Content, but you must not do so for more than 24 hours from your original request. This limited permission to cache is for performance reasons only. You do not have any rights to the LinkedIn Content beyond this limited use.

    At this point we can make the following remarks:

    • They are forbidding something that they know they can't enforce at all: copying. The capability of being copied and cloned “ad infinitum” without quality loss is in the very essence of any digital asset. The whole industry struggles to prevent it, although there is plenty of proof that these efforts are futile.
    • They are limiting caching. The impact of such a limitation goes beyond mere “performance”: it essentially limits the value of an API that is already “rate limited”. Again, there is no practical way to verify that this rule is being honored. Nevertheless, they impose that restriction anyway …
    • APIs can enforce some policies. However, they can't be too defensive against their users and still expect to be attractive and useful.

    The whole point of participating in a Social Network is, precisely, leveraging the value of that network. Our network is just a subset of the whole dataset hosted by the Service Provider. Its platform provides APIs whose sole purpose is to empower a community and nurture an ecosystem around that core set of APIs. When that relationship works, the combined value proposition of the core plus the ecosystem can be significantly higher … Everybody wins! (at least in theory) …

    Ok, then, why are they imposing limitations that they know they can't enforce and that clearly undermine the usefulness and value of the service as a whole? In my opinion – and I can be wrong, of course – the answer to this question is twofold:

    • “Because they are forced to do it”. Here we find, on one hand, “Law and Regulation” and, on the other, “Performance and Capacity Management” as main reasons.
    • “Because they want to do it”. Here we find “Fear and Control”.

    Personally speaking, I find that the source of the problem lies in this very last point: they want to establish a clear hierarchy of authority and control. Sometimes it's because they feel the need to protect themselves from ruthless competitors; sometimes because, in the end, they don't trust their own users … It's all about protecting their market position. Admittedly, there is a point to these arguments. However, recent digital history has shown us that it is precisely by losing control that you achieve that market position and protection … That's why I think they are fundamentally wrong.

    Let's have a look at some additional facts …

    Current LinkedIn daily limits allow you to get 500 Public Profiles of other people on a “per user” basis. You can't cache information beyond 24 hours, which means that your query limit is, in practical terms, 500.

    A multi-user application has a daily limit of 100k Public Profiles, but each user has a limit of 500. This means that your application has a theoretical limit of 200 users, provided that all of them decide to use the service at its maximum capacity … Because we can cache data for 24 hours, an application would, eventually, be able to share 100k Public Profiles among those 200 users. However, these 200 users would have to be a tight community with completely disjoint interests to be able to fully leverage that cache … I don't know what you think, but I find this very, very unlikely.

    Note: I know that the number of users can be bigger than 200 depending on how many Public Profiles each of them requests. However, this approximation is enough for the purposes of this explanation.
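    To make the arithmetic behind these figures explicit, here is a quick back-of-the-envelope calculation; the numbers come straight from the limits discussed above:

    ```powershell
    # LinkedIn daily limits as discussed above
    $PerUserDailyLimit = 500      # Public Profiles per user, per day
    $PerAppDailyLimit  = 100000   # Public Profiles per multi-user application, per day

    # Theoretical user base when every user consumes the maximum
    $PerAppDailyLimit / $PerUserDailyLimit   # 200 users

    # If each user only requested 100 profiles a day, the ceiling would grow accordingly
    $PerAppDailyLimit / 100                  # 1000 users
    ```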

    Oh, but wait a minute: I can't remember any application that has failed to deliver all the information I've ever requested … So, if these are the limitations, how do they manage it? ;-). There are three possible answers to this question:

    • They have a special agreement with LinkedIn that involves a different set of limits in exchange, of course, for some money.
    • They are disregarding those limitations, knowingly or not …
    • These scenarios are so new and “innovative” that they are not correctly managed by the API … (I would be flattered, … but we all know that I am not that smart).

    If I were to evaluate which of these possibilities is more likely, I would nominate the first two. The first one is, especially, a beautiful reminder that Social Networks are not “Democracies”, but “Businesses”. We don't have “rights” and we are not “made equal” inside a Social Network. We are “the product”; we are “traded” every day in exchange for our ability to leverage “our Network and Social Relationships” inside that platform … But, wait! Aren't we discussing how they are limiting our capacity to do precisely that? … Congratulations, you've got it ;-).

    Everything comes down to a set of Principles …

    I would like to stress that the LinkedIn case is just an example of what you could find in many other places. The nature of this problem is pretty generic and, in fact, extends beyond the Social Media domain: it also affects any sort of cloud-based service that is data-related in some way. Multi-sided markets are, actually, the ultimate sophistication of this issue. So, this is not just about LinkedIn.

    The point is that this new world is pushing us to make the most of Digital Media for competitive reasons. Sometimes it will provide us with a competitive edge; at other times, it will just allow us to stay in business. The problem is that the world now works by these rules … Whether we like it or not, that's how it is.

    “Ok, so now what?” “What should we do when the context is so blurry?” Well, to be honest, I don't know for sure. But I don't think there is a single answer:

    1. If you are just a consumer and a market follower, one of those who go with the masses, maybe your answer is to “play by the rules”. Use well-known tools and established practices, and strictly follow the “Terms & Conditions”. However, you shouldn't expect any competitive edge or crazy/awesome results.
    2. If you are an early adopter or an innovator, you will find yourself pushing the boundaries of the “status quo” sooner or later.

    In this last case, we are left alone with our principles to guide us in the decisions we make. Principles are something very intimate and personal. However, I strongly believe that embracing the “don't be evil” motto can be very helpful in these situations. Ok, but what exactly does this mean? Let's try to translate it into something more concrete and in line with the concerns we have discussed so far:

    • Social relationships are built on mutual trust. Honor that trust and manage the data they have shared with you with the same care and respect you would like for yourself.
    • Your relationship with Social Networks is symbiotic. We all need each other, so don't be a parasite.
    • Be honest, open and transparent about what's going on. It is more than likely that you are not alone.
    • If you are a competitor and your only choice is “to steal data”, you certainly have bigger problems than this one. So, don't do it.
    • You are responsible for what you do; however, you are not responsible for the blurred and paradoxical environment you operate in. Complex problems often have more than one root cause, but don't fool yourself into thinking that you are not the weakest of the stakeholders involved.

    Everybody can be their own NSA

    Many say that “Privacy is over”. I am not sure I am ready to subscribe to such a bold statement. What I can acknowledge, without a doubt, is that our relationship with Privacy has changed a great deal. Our capacity to gather intelligence on-line has expanded to the point that it is now a commodity, at least in the less sophisticated layers of technology. The Social Media Scripting Framework is just one example that confirms this.

    However, in spite of all the negativity that this conclusion might provoke, I do think there is a positive angle we could acknowledge. When everybody has a sort of “nuclear weapon” at home, awareness of the responsibility that such a thing represents tends to grow and spread proportionally. This involves not only regular individuals, but also institutions and businesses. As a result, a kind of “self-control” and a more stable set of regulations and practices tend to develop over time.

    This is what I think will happen in the long run. Unfortunately, in the meantime, we will live through a transition phase where headaches, nightmares and collateral damage will certainly occur with varying frequency. And that's painful, to say the least.

    Additionally, we can't forget that the world is neither a uniform nor a static reality, and this affects how we analyze the natural counterbalances we have just described. Let's face it: of those who have “these new nuclear weapons at home”, 80% don't care and don't know, 15% do know and do care, and 5% are just evil people. This distribution is an oversimplification and, of course, the proportions might very well be different. The point is to realize two facts:

    1. The distribution will change over time and will never be perfect.
    2. The ones who “do know and do care” carry a significant responsibility, especially if we analyze in detail who makes up this group (Institutions, Service Providers, Marketing and Communications Professionals, Activists, Hackers, etc.).

    I believe it is about time to talk about a Code of Ethics and a set of Principles for all of us working in technology. This is going to be tough because, nowadays, many different professions and disciplines deal with technology in many different ways. Nevertheless, its complexity won't make the problem go away and, the sooner we start working on it, the better for us as a society.


    Don't get me wrong, I am not inviting you to break the law or ignore the “Terms & Conditions” that you have signed and accepted.

    I just want you to understand that the environment is not as clear as many would think; that the very same companies behind these New Media contribute to that confusion as part of their journey to find a balance in their relationship with their users, developers, other stakeholders and, of course, the laws and regulations of different jurisdictions.

    I just want you to understand that new technologies, like the Social Media Scripting Framework in this case, may create new situations that challenge the “status quo” and put you in positions you didn't expect in the first place.

    I think we have to accept living with a certain degree of ambiguity because of all these things. That means not only being compliant with laws and regulations, but also going beyond them and implementing additional policies and risk management tools that allow us to move forward while dealing with that uncertainty.

    At the beginning of this post I said that things could become “messy very, very quickly”. Hopefully, you now have a view of how “messy” this environment actually is and which elements we have to confront it with … Unfortunately, this is still too broad and open. Do you think it would be useful to work on a proposal for an Open Policy Framework of Reference? That way, we would all have a more tangible starting point to translate these concerns into our organizations … What do you think? What is your point of view on these issues?

    Picture: “These are not the droids you're looking for” by Johan Larsson. Licensed under CC-BY-2.0.

  • Spotting influencers and VIPs in LinkedIn with PowerShell – Part 1: “The How To”

    Spotting influencers and VIPs in LinkedIn with PowerShell – Part 1: “The How To”

    Liking, commenting, tagging, bookmarking or defining something as favorite are all common on-line activities these days. However, most of us don't realize the depth of information we leave behind each time we perform them and what we can actually do with that information.

    This is perfectly understandable in a world that hides all those details behind APIs that regular people can't use. Fortunately, this is no longer the case. Today I would like to show you how to leverage the Social Media Scripting Framework to extract meaningful information from those who connect with you or your brand on LinkedIn.

    Let's start by getting data from our LinkedIn Timelines …

    $LINPosts = Get-LINTimeLine -results 100 -quick

    Now, let's have a look at what we've got in return:

    # Distribution by SubChannel (Real Names have been masked for privacy reasons)
    $LINPosts.NormalizedPost | group SubChannelName | select Name, Count | sort count -Descending | ft -AutoSize
    Name               Count
    ----               -----
    XXXXXXXXX (Company)   33

    As you can see, we have posts from one LinkedIn Group and from a LinkedIn Company Page. You may also have noticed that we requested 100 posts and were given just 83 … Sorry guys, welcome to the world of the LinkedIn API … In future posts I will show you how to get around this ;-).

    Let's have a look at the interactions that have taken place on these posts:

    # All interactions collected
    $LINPosts.PostConnections.Count # 1573
    # All unique people that have interacted with this brand
    $LINPosts.PostConnections | select UserDisplayName, UserDescription -unique | measure # 987

    Simple, isn't it? Notice that a number of people consistently interact with that brand.

    Now, let's see if we can identify VIPs on that list of connections. For that, we will use a Regular Expression to spot those who claim to work as CxOs, VPs, etc.:

    # All unique potential VIPs
    $LINPosts.PostConnections | where UserDescription -Match "[ /S]VP[ /]|[ /]CIO[ /]|[ /]CTO[ /]|CEO|[ /]CMO[ /]|[ /]COO[ /]|director|head|chairman|principal|fellow|owner|founder|cofounder|co-founder|President|vice president|deputy|mgmt|Sr|Exec|Entrepreneur|Strategist|Strategy|Fundador" | measure # 315

    Not bad, 315 potential VIPs out of 1573 interactions … But we can do better. Let's try to reduce the “noise” even further with the following:

    $LINPosts.PostConnections | where {  ( $_.UserDescription -Match "[ /S]VP[ /]|[ /]CIO[ /]|[ /]CTO[ /]|CEO|[ /]CMO[ /]|[ /]COO[ /]|director|head|chairman|principal|fellow|owner|founder|cofounder|co-founder|President|vice president|deputy|mgmt|Sr|Exec|Entrepreneur|Strategist|Strategy|Fundador" ) -and `
                                        !( $_.UserDescription -Match "ReplaceWithYourCompanyName|ReplaceWithYourBrandName|certified|manager|ITIL|PMP|CCN|CCD|Account|Project|Factory|developer|engineer|MBA|negocios|funcional|sales|Consultant|Assistant" ) -and `
                                        !( $_.UserDisplayName -Match "PersonToExclude1|PersonToExclude2|PersonToExclude3|..." ) } `
                              | select UserDisplayName, UserDescription -unique `
                              | measure # 87

    Much better! Now, 87 out of 1573! If you read the above statement carefully, this is what we are doing:

    1. We select those people that “claim” to have a significant job or role (CxOs, etc.) with a Regular Expression.
    2. We exclude insiders (those who belong to the company or brand we are analyzing) and other terms and roles that can introduce noise in our selection.
    3. We exclude specific names of people that we know are insiders but that, for some reason, still show up after applying our previous filters.
    4. We create a unique list of those names.
    5. And, finally, count them.

    That's fine, but I would like to know who they are, not just count them. Ok, that's easy:

    $LINPosts.PostConnections | where {  ( $_.UserDescription -Match "[ /S]VP[ /]|[ /]CIO[ /]|[ /]CTO[ /]|CEO|[ /]CMO[ /]|[ /]COO[ /]|director|head|chairman|principal|fellow|owner|founder|cofounder|co-founder|President|vice president|deputy|mgmt|Sr|Exec|Entrepreneur|Strategist|Strategy|Fundador" ) -and `
                                        !( $_.UserDescription -Match "ReplaceWithYourCompanyName|ReplaceWithYourBrandName|certified|manager|ITIL|PMP|CCN|CCD|Account|Project|Factory|developer|engineer|MBA|negocios|funcional|sales|Consultant|Assistant" ) -and `
                                        !( $_.UserDisplayName -Match "PersonToExclude1|PersonToExclude2|PersonToExclude3|..." ) } `
                              | select UserDisplayName, UserDescription -unique `
                              | format-table -autoSize

    Notice that this is just the same statement we ran before. The only difference lies in the last step: instead of Measure-Object, we are now issuing a Format-Table command … Similarly, if we were interested in exporting that list to Excel, we would use Export-Csv as our last step:

    $LINPosts.PostConnections | where {  ( $_.UserDescription -Match "[ /S]VP[ /]|[ /]CIO[ /]|[ /]CTO[ /]|CEO|[ /]CMO[ /]|[ /]COO[ /]|director|head|chairman|principal|fellow|owner|founder|cofounder|co-founder|President|vice president|deputy|mgmt|Sr|Exec|Entrepreneur|Strategist|Strategy|Fundador" ) -and `
                                        !( $_.UserDescription -Match "ReplaceWithYourCompanyName|ReplaceWithYourBrandName|certified|manager|ITIL|PMP|CCN|CCD|Account|Project|Factory|developer|engineer|MBA|negocios|funcional|sales|Consultant|Assistant" ) -and `
                                        !( $_.UserDisplayName -Match "PersonToExclude1|PersonToExclude2|PersonToExclude3|..." ) } `
                              | select UserDisplayName, UserDescription -unique `
                              | Export-Csv .\LinkedIn-EngagedVips-201403-0.csv

    Well, we've got it! But, wait a second, wouldn't it be nice if we were able to identify the posts where these VIPs were engaging? That would actually “close the circle” and give us a view of the complete picture, including the content … Ok, let's try the following:

    $LINPosts | where {  ( $_.PostConnections.UserDescription -Match "[ /S]VP[ /]|[ /]CIO[ /]|[ /]CTO[ /]|CEO|[ /]CMO[ /]|[ /]COO[ /]|director|head|chairman|principal|fellow|owner|founder|cofounder|co-founder|President|vice president|deputy|mgmt|Sr|Exec|Entrepreneur|Strategist|Strategy|Fundador" ) } `
              | select @{ Name='Title'; Expression={ $_.NormalizedPost.Title } } -unique `
              | measure # 48 posts selected

    The main difference in this query is that we are using $LINPosts rather than $LINPosts.PostConnections as our data source. Why? Because now we want to select content rather than people. Anyway, you may have noticed that the structure of the query is pretty similar to the previous ones:

    1. We take a collection of objects as a data source.
    2. We apply a number of filters.
    3. We select the properties we need.
    4. We define how we want the data to look on the output.

    Anyway, we now have 48 posts selected and, if we want to view them or take them into an Excel sheet, we proceed as we did before:

    $LINPosts | where {  ( $_.PostConnections.UserDescription -Match "[ /S]VP[ /]|[ /]CIO[ /]|[ /]CTO[ /]|CEO|[ /]CMO[ /]|[ /]COO[ /]|director|head|chairman|principal|fellow|owner|founder|cofounder|co-founder|President|vice president|deputy|mgmt|Sr|Exec|Entrepreneur|Strategist|Strategy|Fundador" ) } `
              | select @{ Name='Title'; Expression={ $_.NormalizedPost.Title } } -unique `
              | format-table -autoSize
    $LINPosts | where {  ( $_.PostConnections.UserDescription -Match "[ /S]VP[ /]|[ /]CIO[ /]|[ /]CTO[ /]|CEO|[ /]CMO[ /]|[ /]COO[ /]|director|head|chairman|principal|fellow|owner|founder|cofounder|co-founder|President|vice president|deputy|mgmt|Sr|Exec|Entrepreneur|Strategist|Strategy|Fundador" ) } `
              | select @{ Name='Title'; Expression={ $_.NormalizedPost.Title } } -unique `
              | Export-Csv .\LinkedIn-EngagedVips-content-201403-0.csv

    Oh, wait! Can we do the same thing in Twitter?

    Of course ;-). Just replace your LinkedIn timeline ($LINPosts in our case) with one coming from Twitter ($TwPosts, for example). And that's it! :D. In fact, it's even better: you can have a Timeline with posts coming from multiple Social Media channels and apply the above queries to that dataset. Therefore, if you build a $FullTimeline based on the contents of $LINPosts and $TwPosts, you can use that as your data source.

    # Building a Timeline with posts coming from more than one Social Channel
    $FullTimeline = $LINPosts + $TwPosts

    Just as a reminder, here is how you build Twitter timelines … As you can see, it's almost the same as in the LinkedIn case:

    # Acquiring a timeline from ...
    $TwPosts = Get-TwTimeLine -name cveira -results 100

    The Future: Scoring Systems

    The methods and techniques discussed here are just the beginning. Future versions of the framework will be able to provide information coming from different on-line reputation scoring systems like Klout, PeerIndex, Kred and TrustCloud. You will also be able to define your own score metric if you wish. That way, your capabilities for identifying influencers will be significantly expanded. Although we have defined the foundations to do it, we are not there yet. But stay tuned! 😀


    As you can see, it is now pretty straightforward to run this sort of query against a Social Media Timeline to get insights that, otherwise, would be impossible or, at least, very hard or expensive to achieve.

    Additionally, the operational pattern followed to get this information is consistent and reusable, which makes it ready for something we are all very familiar with: “copy & paste”. That way, those of you who may not feel prepared for an environment as intimidating as a CLI can hopefully confront it with a little bit more confidence.

    Anyway, I know that there is always room for improvement. What is your point of view about this?

    Picture: “Magic Mirror Eye” by Steve Jurvetson. Licensed under CC-BY-2.0.

  • “Less is more” … Have we achieved it on this new release of the Framework?

    “Less is more” … Have we achieved it on this new release of the Framework?

    Back in 2013, I shipped the Social Media Scripting Framework for the first time. I was excited about it but, at the same time, I realized that some things were clearly too complicated. There is still a lot of work to do to make it simpler and more capable; this is definitely not over. Anyway, I would like to spend some time showing you how the new updates have simplified the way you interact with the framework and how to get the most out of it.

    Working with Excel

    The Excel module has changed a great deal in many ways. First, you no longer need a local copy of Excel on your computer in order to work with data stored in Excel files. Of course, you can have it, but the Social Media Scripting Framework doesn't rely on it any more.

    Let's see what your experience was and how it is today:

    On previous versions …

    # Opening an Excel file ...
    $excel              = New-ExcelInstance
    $book               = Open-ExcelFile         -instance $excel -file .\SMCampaignControl-demo-201303-0.xlsx
    $CampaignSheetId    = Get-ExcelSheetIdByName -book $book -name 'campaign'
    $CampaignDataSource = Get-ExcelSheet         -book $book -id $CampaignSheetId
    $CampaignInfoSchema = Get-ExcelHeaders       -sheet $CampaignDataSource
    $campaign           = Load-ExcelDataSet      -sheet $CampaignDataSource -schema $CampaignInfoSchema  # ETC: 15 minutes.
    Close-ExcelFile -instance $excel -book $book
    # Saving data to an Excel file ...
    $excel              = New-ExcelInstance
    $book               = Open-ExcelFile          -instance $excel -file .\SMCampaignControl-demo-201303-0.xlsx
    $CampaignSheetId    = Get-ExcelSheetIdByName  -book $book -name 'campaign'
    $CampaignDataSource = Get-ExcelSheet          -book $book -id $CampaignSheetId
    $CampaignInfoSchema = Get-ExcelHeaders        -sheet $CampaignDataSource
    Save-ExcelDataSet   -DataSet $UpdatedCampaign -sheet $CampaignDataSource -schema $CampaignInfoSchema # 6 mins.
    Save-ExcelFile      -book $book
    Close-ExcelFile     -instance $excel -book $book

    And, with the new version …

    # Opening an Excel file ...
    $campaign = Import-ExcelDataSet -file .\SMCampaignControl-demo-2014-0.xlsx -sheet campaign -DataStart 3,2
    # Saving data to an Excel file ...
    Export-ExcelDataSet $UpdatedCampaign -file .\SMCampaignControl-demo-2014-0.xlsx -sheet campaign -DataStart 3,2

    Despite being simpler, no previous functionality has been lost. In fact, the new implementation adds the possibility to define the point where your dataset begins.

    The Information and Process Cycle …

    On previous versions of the framework it was still necessary to transform and/or attach external data to what was retrieved from the different APIs. That operation was done during the so-called “Normalization process” which, by the way, couldn't be executed until the main information was received from the service we were interrogating. This separation between “Data Acquisition” and “Data Normalization” also made regular information updates more complex than necessary.

    Nor were there any means to enforce common data structures for each digital channel. As a result, datasets coming from different digital channels were not compatible with each other … Long story short: a little nightmare composed of multistage processes that had to be performed in specific ways …

    But let's make it visual. Let's see how the “before” and the “after” look:

    On previous versions …

    $TwitterTL            = Get-TweetsFromUser cveira -results 250 -quick     # Data Acquisition
    $NormalizedTwitterTL  = Normalize-TwitterTimeLine $TwitterTL -IncludeAll  # Data Normalization
    # (Re)Building a "Virtual Time Line" from a set of permalinks. This was the way of "updating" information ...
    $UpdatedTwitterTL     = Rebuild-TwitterTimeLine -from $campaign                  # Data Acquisition: Timeline with "updated information"
    $NormalizedTwitterTL  = Normalize-TwitterTimeLine $UpdatedTwitterTL -IncludeAll  # Data Normalization
    # Analyze-FBTimeLineProspects can only be used with Facebook data sets ...
    $FBTimeLine | Analyze-FBTimeLineProspects | sort Likes, Comments -descending | select -first 10 | format-table -autoSize

    And, with the new version …

    $TwPosts        = Get-TwTimeLine -name cveira -results 100 -quick # Data Acquisition + Data Normalization. All in one shot.
    $TwUpdatedPosts = Update-TwPosts -from $( $TwPosts | where { ( [datetime] $_.NormalizedPost.PublishingDate ).Year -eq 2014 } ) # Data Update + Data Normalization. All in one shot.
    $TwPosts.PostConnections | Measure-SMPostProspects # This "Prospect Analysis" function now works for any supported platform

    As you can see, we now just focus on “what we want”. We don't need to think about or recall that there is a “Normalization process” or a “Rebuilding process”. We don't need to figure out “how they work”. We just “declare what we want” using very basic primitives: “Get” and “Update”. That's it.

    Notice that we can now define advanced in-line filters to narrow the information we want to update:

    $TwPosts | where { ( [datetime] $_.NormalizedPost.PublishingDate ).Year -eq 2014 }

    Because data structures have now been standardized, you can reuse that filter as a pattern applicable to other digital channels. In other words: learn things once, and profit as many times as you use them ;-).
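    For example, assuming you have already acquired a Facebook timeline with Get-FBTimeLine, the very same date filter applies unchanged:

    ```powershell
    # The same filter pattern, reused against a Facebook dataset
    $FBPosts2014 = $FBPosts | where { ( [datetime] $_.NormalizedPost.PublishingDate ).Year -eq 2014 }
    ```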

    Social Media Campaigns and Social Media Analytics

    In a previous post I already discussed how to run a Social Media campaign with PowerShell. It basically explored ways of building datasets with information coming from different digital sources so that we could, afterwards, merge and correlate them with other business-related data. The goal remains the same. The question, then, is: are there any significant changes? Let's have a look:

    ### Twitter ----------------------------------------------------------------------------------------
    # It is always a good idea to get to know how much time it takes to acquire data from a Social Network or any Digital Channel.
    # Measure-Expression is part of the framework too ;-)
    . Measure-Expression { $TwPosts = Get-TwTimeLine -name cveira -results 100 -quick } # 100 posts - 00:01:44.
    $TwPosts | Export-DataSet -SourceType TwPosts -label MyCampaign # Optional step. But it might save you time later on.
    # A couple of weeks later, let's see how many posts do we have from February ...
    $TwPosts         = Import-DataSet .\DataSet-TwPosts-MyCampaign-2014220-1657-26-392.xml
    $TwPosts | where { ( [datetime] $_.NormalizedPost.PublishingDate ).Month -eq 2 } | measure  # 65 posts
    # Instead of updating our full Timeline, let's update just February, because that's what we want. That would be much quicker!
    $TwUpdatedPosts  = Update-TwPosts -from $( $TwPosts | where { ( [datetime] $_.NormalizedPost.PublishingDate ).Month -eq 2 } )
    $TwUpdatedPosts | Export-DataSet -SourceType TwPosts -label MyCampaign # Let's persist the data, just in case ...
    ### Facebook ---------------------------------------------------------------------------------------
    # Let's run a quick Data Acquisition. It won't attach all the external information to the dataset, but it will be quicker.
    $FBPosts         = Get-FBTimeLine -quick # 250 posts - 00:42:44
    # We only want full data from posts published this year.
    $FBUpdatedPosts  = Update-FBPosts -from $( $FBPosts | where { ( [datetime] $_.NormalizedPost.PublishingDate ).Year -eq 2014 } )
    $FBUpdatedPosts | Export-DataSet -SourceType Facebook -label MyCampaign # Let's persist the data, just in case ...
    ### LinkedIn ---------------------------------------------------------------------------------------
    # Again. Let's run a quick data acquisition.
    $LINPosts        = Get-LINTimeLine -results 100 -quick # 62 posts - 00:13:31 (non cached data)
    # We only want full data from posts published this year.
    $LINUpdatedPosts = Update-LINPosts -from $( $LINPosts | where { ( [datetime] $_.NormalizedPost.PublishingDate ).Year -eq 2014 } )
    $LINUpdatedPosts | Export-DataSet -SourceType LinkedIn -label MyCampaign # Let's persist the data, just in case ...
    ### Merging Digital & Business Data ----------------------------------------------------------------
    # Let's aggregate data coming from all channels.
    $FullDataSet     = $TwUpdatedPosts + $FBUpdatedPosts + $LINUpdatedPosts
    # Let's load the existing campaign data from Excel (if any).
    $campaign        = Import-ExcelDataSet -file .\MyCampaign-2014-0.xlsx -sheet campaign -DataStart 3,2
    # Optional step: let's verify that our campaign hasn't been updated yet ...
    $campaign | where Channel -eq "Facebook" | select -first 10 | format-table Conversations, Likes, Audience, Downloads, Clicks -AutoSize 
    Conversations Likes Audience Downloads Clicks
    ------------- ----- -------- --------- ------
                0     0        0         0      0
                0     0        0         0      0
                0     0        0         0      0
                0     0        0         0      0
                0     0        0         0      0
                0     0        0         0      0
                0     0        0         0      0
                0     0        0         0      0
                0     0        0         0      0
                0     0        0         0      0
    # Let's update our campaign dataset ...
    $UpdatedCampaign = Update-DataSet $campaign -with $FullDataSet.NormalizedPost -using $CampaignRules
    # Optional step: now, let's make sure that our campaign has been updated ...
    $UpdatedCampaign | where Channel -eq "Facebook" | select -first 10 | format-table Conversations, Likes, Audience, Downloads, Clicks -AutoSize 
    Conversations Likes Audience Downloads Clicks
    ------------- ----- -------- --------- ------
                5    40     2426         0      0
                7    42     1873         0      0
                2    45     1706         0      0
                1    34     1608         0      0
               12    44     1886         0      0
                6    75     3368         0      0
                2    58     1948         0      0
                0    37     1672         0      0
                6    55     2098         0      0
               12    34     2378         0      0
    # Finally, let's save our updated dataset into a new Excel file ...
    Export-ExcelDataSet $UpdatedCampaign -file .\MyCampaign-2014-1.xlsx -sheet campaign -DataStart 3,2

    Notice how the process for each digital channel is exactly the same, and how expressive each operation is… Now you are ready to revisit my previous posts and compare for yourself.


    I know that there is still a long way to go. But, fortunately, this new version of the Social Media Scripting Framework is not only simpler, but also more consistent, expressive and declarative. More “PowerShelly” in many ways ;-). In fact:

    • With this new release it is pretty straightforward to write reusable “recipes” that, eventually, you could share with others.
    • Now it is so much easier to build and maintain datasets that you can use to map digital media information with business data.
    • Now it is realistic to think of feeding your analytical models with meaningful on-line information in a sustainable way because the resources needed to do it are significantly lower.
    • Now you can even start considering delegating certain parts of your Social Media Analytics process, or automating them completely, so that you can focus on more value-added business activities.
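    To illustrate the first point, a reusable “recipe” could be as simple as a small function wrapping the acquisition/update/export cycle shown earlier. The function name and parameters below are hypothetical, my own, and built only from the framework cmdlets already shown in this post:

    ```powershell
    # Hypothetical recipe (Update-CampaignTwPosts is my own wrapper name; the
    # cmdlets inside are the framework's): refresh one year's worth of Twitter
    # posts and persist the result.
    function Update-CampaignTwPosts( [string] $user, [string] $label, [int] $year ) {
      $posts   = Get-TwTimeLine -name $user -results 100 -quick
      $updated = Update-TwPosts -from $( $posts | where { ( [datetime] $_.NormalizedPost.PublishingDate ).Year -eq $year } )
      $updated | Export-DataSet -SourceType TwPosts -label $label
      $updated
    }

    $TwUpdatedPosts = Update-CampaignTwPosts -user cveira -label MyCampaign -year 2014
    ```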

    Sure, it is still a prototype with significant functional gaps. Nevertheless, I am confident that we have laid a solid foundation for building a powerful and robust tool capable of unleashing your digital data and boosting your digital assets. But, hey, that's just what I think and, actually, it doesn't matter at all! What matters is what you think! Do you believe that we have succeeded? Do you think that we are on the right track?

    Picture: “Less is more” by othree. Licensed under CC BY 2.0.

  • Social Media Scripting Framework v0.5 BETA, has been finally released!

    Social Media Scripting Framework v0.5 BETA, has been finally released!

    It's been a while since my last blog post. I know. But I have been focusing on “Getting Things Done”. This new version of the Social Media Scripting Framework has been quite a challenge and, in fact, it has taken longer than expected. Anyway, it is finally here!

    But let's start from the beginning. Once the previous version was on the street, I started collecting feedback from different sources. Many people were excited about the possibilities that this tool was bringing to the table. However, it became more and more clear that some aspects had to change in order to make it easier to use and more scalable. In other words, if I wanted this to grow, a much better foundation was needed.

    Therefore, I started working on a full redesign of the architecture in order to achieve these goals. As a result, a number of significant changes were made. Here are the main ones:

    • New Design/Architecture: Unified Data Structures for all Digital Media channels/formats and Unified semantics for all Digital Media channels.
    • Native support for LinkedIn!
    • Native support for RSS feeds!
    • Microsoft Excel is no longer needed in order to work with Excel files!
    • Configuration and Session profiles.
    • Data versioning support.
    • Data privacy policies.
    • Data expiration/aging support.
    • Data tagging/labeling support.
    • Data normalization has been made mandatory and transparent to end users.
    • Improved data quality checks. A hell of a job! 8-D.
    • Significantly enhanced and simplified user experience. The so-called “Timeline rebuilding” concept is no longer needed or used. Long story short: way easier to use! :D.
    • Basic logging and caching capabilities.
    • Improved and enhanced exception handling. In other words, enhanced reliability ;-).
    • Improved Debugging/Tracing capabilities.
    • Function names now conform more closely to PowerShell naming conventions.
    • TimeLine analysis functions are now independent from the Digital Media channel.
    • Basic help-based documentation.
    • Lots of bug fixes.
    • … and all this, without increasing the System Requirements. Just PowerShell, for the most part 😉

    I won't bother you with too much detail for now, because I will cover the relevant aspects of these changes in future, more focused blog posts. However, I hope that, with this short description, you get an idea of the scale and depth of the changes introduced in this release.

    There is still a lot to be done. For example, making something “scalable” is not only about having a design good enough to incorporate new Services over time. It's also about performance when volumes go up significantly (Big Data). Unfortunately, I am afraid we are not there yet. Maybe some day we will go down that path… or maybe we won't. Only you will tell. The fact is that there are so many opportunities in front of us that I am absolutely positive the journey is going to be awesome ;-D.

    Anyway, I'd like to think that this is just another step in the right direction. Nevertheless, it is up to you to decide what you think about it… If you believe this is worth your time, please find the download details here and let me know what you think!

    1000 thx in advance! 😀

  • Running Social Media Campaigns with PowerShell

    Running Social Media Campaigns with PowerShell

    The Social Media Scripting Framework has been out for several weeks now, and the feedback that I’ve been collecting so far has been quite positive. In fact, I’ve learned a lot from the conversations I’ve had with some of you, and I am pretty confident that we are going to see interesting evolutions in future releases thanks to your contributions. Therefore, I would like to start by thanking you all for your help and support.

    However, it is time to start explaining some concepts in more detail and showing more complex examples. So, let’s start with the challenge: let’s run a Social Media Campaign from PowerShell!

    Defining the scope …

    For the purpose of this exercise, we are going to define our “project” as a broadcasting campaign run by someone who needs to know the exact impact of the activities that have been carried out:

    1. under the scope of that campaign
    2. in the Social Media channels – namely, Facebook, Twitter, LinkedIn, etc. – that participate in it.

    In order to achieve this, we have to capture the metrics associated with each post that gets pushed to these channels as a result of every action in our broadcasting campaign. In other words, if there is additional activity on those channels, we need to be able to ignore it and identify only those posts that strictly belong to our campaign. To do so, we will rely on the Permalink that each post will, hopefully, have on each of these channels.
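    Conceptually, identifying campaign posts by Permalink boils down to a set membership test. As a purely illustrative sketch (`$timeline` and its `permalink` property are assumptions made for the example; `Post_PermaLink_URL` is the column name from the campaign workbook used later in this post):

    ```powershell
    # Illustrative sketch: keep only the timeline posts whose permalink appears
    # in our campaign sheet. Property names are assumptions for the example.
    $CampaignLinks = $campaign | ForEach-Object { $_.Post_PermaLink_URL }
    $CampaignPosts = $timeline | where { $CampaignLinks -contains $_.permalink }
    ```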

    What we can get and what we can’t …

    There are many things that we want to know. Once we start looking at Social Media channels, we soon realize that there is data we can take directly from them. However, we will also see that, in order to turn that data into meaningful information, we need to bind it to, or meld it with, other attributes that belong uniquely to our particular context.

    Ok, this is getting too abstract, so let’s consider one simple example. Imagine that we are only broadcasting messages to Twitter:

    • Our campaign might contain messages addressing different “themes”. Maybe we need to aggregate information by that criterion. To do so, we should be able to “label” each tweet with the right theme.
    • Each “Theme” might contain different “stories” and these, might be composed by different “messages” or tweets …
    • Similarly, our campaign might be sharing different types of deliverables: blog posts, images, e-books, whitepapers, polls, videos, webinars, etc. What if we want to analyze our campaign by “Deliverable Type”? … Exactly, we should be able to “label” each tweet with the right “Deliverable Type”.

    I guess you can see a pattern here: you need to manually “label” each post with all the additional contextual attributes that matter to you. The key word here, however, is “manually”. Why? Because, even though you can try to infer some of these attributes or categories automatically, most of the time this is not possible. And, even when it is, you might discover that the way to do it is neither reliable nor consistent, and you have to be ready to deal with the errors by hand.
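    In PowerShell terms, “labeling” a post is just attaching extra properties to the post object. A minimal sketch, where the property names mirror the campaign schema shown later in this post but the values are invented for illustration:

    ```powershell
    # Minimal sketch: attach contextual attributes to a post object by hand.
    # Property names mirror the campaign schema; the values are made up.
    $post | Add-Member -MemberType NoteProperty -Name Theme            -Value 'Cloud Security'
    $post | Add-Member -MemberType NoteProperty -Name Deliverable_Type -Value 'whitepaper'

    # Once labeled, aggregation by those attributes becomes trivial:
    $campaign | group Deliverable_Type | sort Count -descending
    ```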

    Manual work sucks. Fortunately, we can mitigate this significantly by using the right tools. Microsoft Excel or Microsoft Access, to name just two, shine at this. They are both widely available and will help you with features like drag & drop, copy & paste, grouping, filtering, auto-typing, data validation rules, pivot tables, data templating, and many others.

    I know, this is not an ideal situation, but now consider these other scenarios:

    1. Ignoring those context attributes altogether. Ok, I agree, this doesn’t seem to be a viable alternative …
    2. Applying these labels by hand through the web interface of your Social Media tool. My experience with this approach has shown me that it is neither a productive nor a scalable solution. Hopefully this will change in the future but, unfortunately, we are not even close.
    3. Planning all your campaign elements in advance and in detail, with all those attributes defined, so that you can import them into your cloud-based Social Media tool … Oh, wait! To do that you will use Excel! So, why bother doing the job twice? ;-). Additionally, we can’t assume that we can plan for every situation. Planning is great, and mature, highly structured organizations will tend to do it very well. However, we may very well be involved with teams, or face situations, that force us to adapt our ways. One good example of this was Oreo’s reaction to the blackout during the last Super Bowl.

    Anyway, I am not saying that Excel is the right tool for the job, but it is a useful example to show that Excel-like editing facilities are needed to deal with this kind of task.

    The secret sauce …

    Ok, with all these considerations in mind we are ready to begin running our Social Media campaign in PowerShell with the help of the Social Media Scripting Framework.

    However, before we start, I would like to share one little secret with you: no matter what you do, you will find yourself following the cycle below. So, rather than memorizing “commands”, I recommend internalizing this chart. It is way easier, believe me:


    Basically, you can follow one of these strategies:

    1. Go to each Social Network, grab each post with its metrics and persist the results in Excel.
    2. Insert each permalink in Excel, by hand or using the previous strategy, and let PowerShell create a “Virtual Timeline” that will only contain those posts whose permalinks you had in your sheet. Once you have that, you can collect the metrics and persist the results back to your workbook.

    Essentially, these strategies are not mutually exclusive, so you can switch back and forth at your convenience. The only thing you must know in advance is how each Social Network deals with its Permalinks. For example, Facebook has more than one valid representation for a post. That particularity makes the first strategy advisable there.
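    To make the Facebook caveat concrete, here is a sketch with simplified, made-up URL forms (not an exhaustive or authoritative list): the same post can be reachable through more than one URL, so a raw string comparison can silently miss a match.

    ```powershell
    # Two plausible representations of the same Facebook post (simplified,
    # invented examples for illustration only):
    $UrlA = 'https://www.facebook.com/SomePage/posts/1234567890'
    $UrlB = 'https://www.facebook.com/permalink.php?story_fbid=1234567890&id=987654'

    # A naive string comparison misses the match; comparing by post id would not.
    $UrlA -eq $UrlB   # False
    ```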

    Running the campaign …

    Let’s start by opening our Excel campaign workbook:

    $excel              = New-ExcelInstance
    $book               = Open-ExcelFile         -instance $excel -file .\SMCampaignControl-demo-201303-0.xlsx
    $CampaignSheetId    = Get-ExcelSheetIdByName -book $book -name 'campaign'
    $CampaignDataSource = Get-ExcelSheet         -book $book -id $CampaignSheetId
    $CampaignInfoSchema = Get-ExcelHeaders       -sheet $CampaignDataSource
    $campaign           = Load-ExcelDataSet      -sheet $CampaignDataSource -schema $CampaignInfoSchema  # ETC: 15 minutes.
    Close-ExcelFile -instance $excel -book $book

    There are three interesting things to notice here. The first one is the somewhat unavoidable ceremony involved in dealing with Excel. You can blame me for that, because I wanted to provide flexibility when handling different Excel layouts.

    The good news is that, thanks to real-life experience and your feedback, I’ve realized that making this process simpler and more straightforward is well worth the effort. So, hopefully, you can expect significant improvements in this particular area in future releases.

    The second one is how slow Excel operations can be. In this case, it took around 15 minutes to load 141 posts (rows) from my workbook.

    This clearly means that future releases will have to implement different strategies so that these activities become far more “tolerable”… As you can see, my to-do list is growing fast! :-D.

    In fact, if you think that, for some reason, you will have to close your PowerShell window before you have finished your work, I recommend saving your dataset to disk. That way, it will be quicker and easier to restart the process later on:

    $campaign | Save-DataSet –source excel –label demo
    $campaign = Load-DataSet .\DataSet-excel-demo-2013326-1323-4-552.xml

    Finally, in the last statement you can see that I close the Excel file once I have loaded the information from it. This practice will give you a more reliable experience with your Excel operations.

    The reasons for this are quite technical and out of the scope of this post. However, if you are interested in going deeper into this particular topic, just let me know.

    Ok, now that we have our dataset loaded, let’s have a quick look at it. Just remember that you don’t have to do this in order to run your campaign: we are doing it purely for educational purposes.

    $campaign.count   # 141 posts
    $campaign | Get-Member
       TypeName: Deserialized.System.Management.Automation.PSCustomObject
    Name                      MemberType   Definition
    ----                      ----------   ----------
    Equals                    Method       bool Equals(System.Object obj)
    GetHashCode               Method       int GetHashCode()
    GetType                   Method       type GetType()
    ToString                  Method       string ToString()
    Approved                  NoteProperty System.String Approved=
    Audience                  NoteProperty System.String Audience=
    Author                    NoteProperty System.String Author=
    Campaign_Content          NoteProperty System.String Campaign_Content=
    Campaign_Medium           NoteProperty System.String Campaign_Medium=
    Campaign_Name             NoteProperty System.String Campaign_Name=
    Campaign_Source           NoteProperty System.String Campaign_Source=
    Campaign_Term             NoteProperty System.String Campaign_Term=
    Campaign_URL              NoteProperty System.String Campaign_URL=
    Channel                   NoteProperty System.String Channel=
    Clicks                    NoteProperty System.String Clicks=
    Content_Source_Team       NoteProperty System.String Content_Source_Team=
    Conversations             NoteProperty System.String Conversations=
    Deadline                  NoteProperty System.String Deadline=
    Deliverable_Title         NoteProperty System.String Deliverable_Title=
    Deliverable_Tracking_Code NoteProperty System.String Deliverable_Tracking_Code=
    Deliverable_Type          NoteProperty System.String Deliverable_Type=
    Description               NoteProperty System.String Description=
    Done                      NoteProperty System.String Done=
    Downloads                 NoteProperty System.String Downloads=
    Keywords                  NoteProperty System.String Keywords=
    LastUpdateDate            NoteProperty System.String LastUpdateDate=
    Likes                     NoteProperty System.String Likes=
    ObjectId                  NoteProperty System.Int32 ObjectId=3
    Post_PermaLink_URL        NoteProperty System.String Post_PermaLink_URL=
    Publisher                 NoteProperty System.String Publisher=
    Publishing_Date           NoteProperty System.String Publishing_Date=
    Scope                     NoteProperty System.String Scope=
    Short_URL                 NoteProperty System.String Short_URL=
    Short_URL_Stats           NoteProperty System.String Short_URL_Stats=N/D
    SME                       NoteProperty System.String SME=
    Story_Text                NoteProperty System.String Story_Text=
    Subchannel                NoteProperty System.String Subchannel=
    Tags                      NoteProperty System.String Tags=
    Target_Audience           NoteProperty System.String Target_Audience=
    Target_URL                NoteProperty System.String Target_URL=
    Theme                     NoteProperty System.String Theme=
    Time_Scope                NoteProperty System.String Time_Scope=
    $campaign | group channel | sort count -descending | format-table Count, Name -autoSize
    Count Name
    ----- ----
       28 LinkedIn
       28 twitter
       23 facebook
       10 Flickr
        8 Atos Blog
        1 SlideShare

    As you can see, my Social Media campaign contains posts pushed to different channels. So, in this case, instead of downloading a complete timeline from each Social Network, I will create a “Virtual Timeline” out of the Permalinks that each element of my campaign has.

    Here is how it gets done:

    $TwitterTL  = Rebuild-TwitterTimeLine -from $campaign # 7 minutes
    $FacebookTL = Rebuild-FBTimeLine      -from $campaign # 2 minutes
    $LinkedInTL = Rebuild-LINTimeLine     -from $campaign # 5 minutes
    $TwitterTL.count  # 25 posts
    $FacebookTL.count # 13 posts
    $LinkedInTL.count # 16 posts

    Now that we have the Timelines, we are ready to get the full set of metrics associated with them. To do so, we will “Normalize” each timeline. The normalization process involves two main activities:

    • Adjusting the original data structures and formats that each Social Network returns so that it is easier to deal with later on.
    • Including all the additional information that didn’t come with the original request for whatever reason.

    At this point, knowing what happens behind the scenes helps a little bit. For example, on Twitter, getting information about the retweets of a post involves calling a very slow API. Therefore, it is a very good idea to download this information only for those tweets that we know have actually been retweeted.

    $TwitterTL | where { $_.retweet_count -gt 0 } | Measure # 24 posts
    $NormalizedTwitterTL  = Normalize-TwitterTimeLine $TwitterTL  -IncludeAll  # ETC: 53 minutes
    $NormalizedFacebookTL = Normalize-FBTimeLine      $FacebookTL -IncludeAll  # ETC: 1 minute

    You may have noticed that the LinkedIn Timeline doesn’t need to be normalized. This is because the current implementation of the framework takes the information via web scraping rather than through the API.

    I am already working on bringing more consistency to this particular aspect of the user experience. Additionally, I am exploring further abstractions and strategies to address this step of the process. Hopefully it will improve significantly in future releases ;-).

    We are almost there! Now that we have the metrics, we can update our campaign accordingly:

    $UpdatedCampaign  = Update-DataSet $campaign -with $NormalizedTwitterTL  -using $TwitterRules
    $UpdatedCampaign += Update-DataSet $campaign -with $NormalizedFacebookTL -using $FacebookRules
    $UpdatedCampaign += Update-DataSet $campaign -with $LinkedInTL           -using $LinkedInRules
    $UpdatedCampaign  = $UpdatedCampaign -ne $null # workaround for a known issue
    $UpdatedCampaign.count # 54

    At this point you might be wondering what those “rules” are and where they come from. Well, they are defined in the SMSF-settings.ps1 file, and each one contains the mapping rules necessary to automatically match each Social Network’s information with your Excel dataset.

    Going deep into the mapping rules is out of the scope of this blog post. However, they are pretty powerful and flexible. If you are interested in knowing how they work, I invite you to check out the in-line “documentation” included in the same SMSF-settings.ps1 file.
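    Conceptually, a mapping rule just pairs a source property from the Social Network with a destination column in the spreadsheet. The sketch below is purely hypothetical and is not the framework’s actual rule syntax (check SMSF-settings.ps1 for that); it only illustrates the idea:

    ```powershell
    # Purely hypothetical illustration of what a mapping rule conceptually does:
    # copy selected Social Network properties into matching Excel columns.
    $TwitterRulesSketch = @{
      favorite_count  = 'Likes'      # source property -> destination column
      followers_count = 'Audience'   # author's reach  -> Audience column
    }
    ```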

    Now we have all the metrics and Social Media information perfectly mapped to the information that came from Excel. It has taken seconds to do something that usually takes hours of human-intensive work… It looks nice, doesn’t it?

    Anyhow, let’s have a look at the differences between the original campaign and the updated one. You can skip this step, of course, but it is very useful to know how to spot the differences, isn’t it?

    $campaign | Format-Table ObjectId, Channel, Likes, Conversations, Audience, Clicks, Post_PermaLink_URL -auto
    $UpdatedCampaign | Format-Table ObjectId, Likes, Conversations, Audience, Clicks, Post_PermaLink_URL -auto

    Finally, we need to persist the data back to Excel to close the loop. We will do it by following almost the same path that we did when we loaded the information in the first place:

    $excel              = New-ExcelInstance
    $book               = Open-ExcelFile          -instance $excel -file .\SMCampaignControl-demo-201303-0.xlsx
    $CampaignSheetId    = Get-ExcelSheetIdByName  -book $book -name 'campaign'
    $CampaignDataSource = Get-ExcelSheet          -book $book -id $CampaignSheetId
    $CampaignInfoSchema = Get-ExcelHeaders        -sheet $CampaignDataSource
    Save-ExcelDataSet   -DataSet $UpdatedCampaign -sheet $CampaignDataSource -schema $CampaignInfoSchema # 6 mins.
    Save-ExcelFile      -book $book
    Close-ExcelFile     -instance $excel -book $book

    Last thoughts …

    It is obvious that there is still a lot of work to do. This is, actually, an example of how it feels to work with the first working prototype. However, you can also see all the pieces working together to solve a real problem that, until now, has mostly been addressed by labor-intensive processes. I hope I have been successful in showing not only the problem, but also the difference that the tools and techniques described above can make.

    This initial release has taught me a lot, and your feedback has been invaluable. Thanks to your comments and contributions, I now have lots of notes that, hopefully, will make future releases better than I originally expected. Therefore, I would like to thank you all for your contributions and, of course, I would love to keep listening to your feedback.