Tag Archives: Business

How to automate Twitter and make a bit of a tit of yourself at the same time …

Free twitter badge (Photo credit: Wikipedia)

Oh dear, oh dear – post in haste, repent at leisure ! ( If you don’t know what I’m talking about – see here. ) I’m glad to say that I recently read a book [1] on business that suggested that an Agile approach ( release early, release often, and fix your bugs as you go along ) was definitely the way towards successful business, so I’m going to imply that I did it on purpose.

So what’s gone wrong ? I can see that the script ( twitter.py -r ) is running fine from cron ( /var/log/cron shows it running every five minutes ), and I know that it works if I run it from the command line within 5 minutes of creating the schedule, which implies that the logic in the program ( however badly written ) is at least ok … So where is the issue occurring ? I thought initially that it was a path problem, but my real fault so far is that I’ve made no effort to capture any errors. Ok, so I’ll give that a go … Great, nada being reported by cron. That’s not helpful.
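
For the record, “capture any errors” here needn’t be anything clever. The sketch below is just one way of doing it – run_schedule() is a stand-in for the real body of the script and the log path is made up – but wrapping things like this stops cron silently eating the traceback when something does eventually go wrong:

# Sketch only: wrap whatever the script does so that errors raised under
# cron end up in a log file rather than vanishing silently.
import traceback

LOG_FILE = '/tmp/twitter-retweet.log'   # made-up path

def run_schedule():
   # stand-in for the real work ( reading the schedule, tweeting, etc. )
   pass

try:
   run_schedule()
except Exception:
   with open(LOG_FILE, 'a') as log:
      log.write(traceback.format_exc())
   raise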

Ah hah ! Got an error at last.

tweepy.error.TweepError: Status is a duplicate.

Whilst I can’t find any specific references to the error, it seems quite self-explanatory: you can’t keep re-tweeting the same message – it needs to differ. That explains why the HootSuite interface was such a pain in the neck – they offload this onto the user, who has to populate the CSV file with unique messages. I guess the outstanding question then is “How much does a Tweet need to differ by _not_ to be considered a duplicate ?” By definition this should be a single char, so, for my 24 scheduled tweets I need to add 24 unique chars to the tweet. The simplest way would be to either count up or count down; this should give sufficient change to be different, as well as indicating easily to me, if not the casual observer, how far through the re-Tweet lifecycle it currently is.
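
In other words, prefixing each scheduled copy with its position in the run should be enough to make every status unique. A quick sketch of the idea ( the message text here is made up ):

# Prefix each copy with "n/24" so that no two status updates are
# byte-for-byte identical. The message itself is just an example.
message = "Read my latest blog post at http://www.biles.net"
variants = ["%d/24 %s" % (n, message) for n in range(1, 25)]
print variants[0]     # 1/24 Read my latest blog post at http://www.biles.net
print variants[23]    # 24/24 Read my latest blog post at http://www.biles.net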

The code now reads as follows:

#!/usr/bin/env python
##########################
# Python Auto Re-Tweeter #
# (C) Simon Biles 2012   #
# http://www.biles.net   #
##########################
# Version 0.01 -         #
# A first stab at it !   #
##########################
# Version 0.02 -         #
# A working version !    #
##########################

# All those tasty Python imports
import argparse
import datetime
import struct
import sys
import tweepy

from ConfigParser import SafeConfigParser

# Get the command line arguments
parser = argparse.ArgumentParser(description='Regular Tweet Generator.')
parser.add_argument('-s','--schedule', action='store_true', help='Schedule a Tweet for the next 7 days')
parser.add_argument('-r', '--run', action='store_true', help='Run the schedule')
parser.add_argument('-u','--update', action='store_true', help='Update Status Tweet immediately')
parser.add_argument('tweet', nargs='?')
args = parser.parse_args()

# Global variable
time_fmt = "%Y-%m-%d %H:%M"

# Get the config file data
config = SafeConfigParser()
config.read('twitter.conf')
CONSUMER_KEY = config.get('consumer_keys','CONSUMER_KEY')
CONSUMER_SECRET = config.get('consumer_keys','CONSUMER_SECRET')
ACCESS_KEY = config.get('access_keys','ACCESS_KEY')
ACCESS_SECRET = config.get('access_keys','ACCESS_SECRET')
FILE_NAME = config.get('file_name', 'SCHEDULE_FILE')

# Main body

# Quick Command Line Update
if args.update and not args.run and not args.schedule:
   auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
   auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
   api = tweepy.API(auth)
   # Use the parsed positional argument rather than sys.argv[1],
   # which would be the "-u" flag itself.
   api.update_status(args.tweet)
   sys.exit()
# Schedule a Tweet by adding it to the schedule file
elif args.schedule and not args.run and not args.update:
   file_obj = open(FILE_NAME, 'a')
   current = datetime.datetime.now()
   nexttweet = current
   count = 0
   while (count < 24):
      # The gap added grows by an hour each time round, so the schedule
      # is front loaded ( offsets of 0, 1, 3, 6, 10 ... hours from now ).
      diff = datetime.timedelta(hours=count)
      nexttweet = nexttweet + diff
      # Prefix each entry with "n/24" so that no two tweets are identical.
      tweettime = nexttweet.strftime(time_fmt) + " " + str(count+1) + "/24 " + args.tweet + "\n"
      file_obj.write(tweettime)
      count = count + 1
   file_obj.close()
   sys.exit()
# Parse the schedule file and see if anything should have happened within 5 minutes of now.
elif args.run and not args.schedule and not args.update:
   file_obj = open(FILE_NAME, 'r')
   auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
   auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
   api = tweepy.API(auth)
   current = datetime.datetime.now()
   # Each line starts with a fixed 16 character timestamp ( "YYYY-MM-DD HH:MM" ),
   # followed by a space that we skip, then the tweet itself.
   baseformat = "16s 1x"
   for line in file_obj:
      line = line.rstrip('\n')
      numremain = len(line) - struct.calcsize(baseformat)
      lformat = "%s %ds" % (baseformat, numremain)
      tweettime, tweet = struct.unpack(lformat, line)
      linetime = datetime.datetime.strptime(tweettime, time_fmt)
      delta = linetime - current
      # Post anything scheduled within five minutes either side of now.
      if delta <= datetime.timedelta(minutes=5) and delta >= datetime.timedelta(minutes=-5):
         api.update_status(tweet)
   file_obj.close()
   sys.exit()
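
For reference, the schedule file that the -s option writes ends up looking something like this ( the times and message here are invented, but the layout – a fixed 16 character timestamp, a space, then the counter and the message – is exactly what the "16s 1x" struct format above unpacks ):

2012-07-23 09:00 1/24 Read my latest blog post at http://www.biles.net
2012-07-23 10:00 2/24 Read my latest blog post at http://www.biles.net
2012-07-23 12:00 3/24 Read my latest blog post at http://www.biles.net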

So there you have it, a working version ! I’ve watched 10/24 Tweets fly by over the weekend, and the other 14 will play out over the next week and a bit – I must admit though that it is a bit front loaded at the moment, and behaves a little too “spamily” for my liking. I think before I unleash it again, I might start it off at 6 hour intervals and let it grow from there for 18 Tweets. I’m thinking of how to track its success, and I have an idea, but more of that later !
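
The front loading, incidentally, falls straight out of the scheduling loop above: the timedelta added each time round grows by an hour, so the offsets from the start are running totals. A quick check of the arithmetic:

# Why the schedule is front loaded: the gap between tweets grows by an
# hour each iteration, so the offsets from the start are cumulative.
offset = 0
offsets = []
for count in range(24):
   offset += count
   offsets.append(offset)
print offsets[:10]   # [0, 1, 3, 6, 10, 15, 21, 28, 36, 45] - ten tweets inside a 48 hour weekend
print offsets[-1]    # 276 hours, or roughly eleven and a half days in total

Which squares with the ten tweets I saw over the weekend, and the “week and a bit” still to run.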

 


1. The book in question was ReWork: Change the Way You Work Forever, which I rather enjoyed – it was short and to the point. I don’t think that it is necessarily a “how-to” guide, but it did get me thinking about a few things and gave me some inspiration to go out on the web and make a tit of myself like this 😉


Why are statistics useful in Security ? ( Part 1 )

I have a fascination with Statistics. To be honest, it tends to be a fascination with its misuse, but it is a fascination none the less. I was reminded of this over the weekend twice – once on Sunday morning, before coffee, when I retweeted a statistic:

@MarkMazza1 93% of companies that lost their data center for 10 days or more due to a disaster, filed for bankruptcy within one year. @dpoecompany

It may, or may not, be true – I have no idea – but because it sounds good, I retweeted it anyway ( it doesn’t actually harm my business case either ). A few minutes later, halfway down the first coffee of the day, it occurred to me that this wasn’t quite right, and I tweeted the following in penance:

“Nothing like retweeting an unsubstantiated statistic first thing on a Sunday morning. 95% of people agree ;-)”

The second thing was courtesy of my son, who forwarded me the following – genuine and true – statistic:

[ Image omitted – I’m sorry for the lack of attribution, I don’t know whence it came – if anyone wants to tell me, I’ll happily give a credit ]

The statistics of advertising fascinate me too – the variable and selective sample sizes that return just the right percentage of “dogs that prefer” muttfood™. ( That involves finding the 8 dogs who have no sense of smell and 2 who do – to give a believable 80% … )

The point is that, as someone said ( generally attributed to Disraeli – but apparently not his ):

“There are three kinds of lies: lies, damned lies, and statistics.”

When selling things, statistics are warped and presented in such a way as to scare us, to emphasise our need for the product or even to show us that our peers are using it, so why aren’t we ? Obviously there is a huge overlap between psychology and statistics here, but none the less, the point stands.

When we know about statistics though, we can turn them to our advantage. Not only are we in a position to treat what we are told with more care, but we can start to ask questions that might actually shed some light on the reality. Let’s go back to our first example:

“93% of companies that lost their data center for 10 days or more due to a disaster, filed for bankruptcy within one year.”

What can we ask about this data ? Well, let’s start by asking where it came from. Who has admitted that their data centre was down for 10 days ? What happens to those whose data centre was only down for 9 days ? 5 days ? 2 days ? Is there a direct linear correlation between data centre down time and probability of bankruptcy ? Were all the companies in good financial shape beforehand ? Were they skimping on data-centre maintenance because of poor cash flow ? Was their main stock warehouse in the same building as the data-centre when it burnt down ?

A key thing to remember is that correlation isn’t causation. This is important, and it is why the placebo effect is an issue in medical trials. If A rises and B rises, is A the cause of B – or is there an unseen, or more to the point unmeasured, C that is causing the rise of B ?

However, we can bring forward even more questions. Ok, so ( more or less ) 1 in 10 companies that have a 10 day outage survive – what are they doing right that we can emulate ? Did they have a better business continuity plan ? ( Almost certainly – but I don’t have the data to back that statement up. ) Is there a commonality amongst the companies that survived ? ( Are they all in the same industry – all consultancies, for example ? Does this mean that my business is at less risk ? )

I hope you see my point: too little information about the data is, whilst not a bad thing per se, not exactly conducive to sensible decision making.

So where does that leave us within our own organisations ? Well, it leaves us with a necessity to collect the right data. That’s easier said than done, to be honest, because we’re back up against the correlation/causation barrier again – we need to be sure that we are gathering data that actually relates to what we are seeking to study. Ensuring that A is related to B involves verifying that C has nothing to do with it – nothing acts in isolation, so excluding C can save a wild goose chase and a waste of money pursuing the wrong track.

Much as it may seem unscientific, I really do recommend the idea of getting together a few people and brainstorming possible data sources and other connections between the possible influencing factors. Everyone has a perspective, and often it is the perspectives of others that add the most value !

Don’t forget the human factor in this – it could be that there are fewer viruses during the summer, not only because of your new AV product, but because the staff are away, surfing the net less and bringing less into the network – in which case your trial data is useless, because the product may actually be worse; it simply had less to find, and thus looks more effective … More effective decision making is enabled by good statistics, and effective decision making saves money.

This is where historical data has a value – don’t discard old reports and metrics, use them to show year on year growth and annual, monthly, weekly, daily and hourly trends. You’ll be able to make more sense of any new data in light of this information. You can also spot anomalies in the data, and, if you get to the stage of doing this in real time, you can find problems and security incidents as they happen, and that is the holy grail of information systems and security management.

In part two, next week, we’ll start to decompose some basic things to collect, potential sources and analysis of the data.

Why don’t you subscribe, either to my Twitter feed (@si_biles) or to the Blog, and you’ll be notified of that post and other things of interest as time goes on ?
