A rip roaring affair

Sometimes you get asked unusual questions. In my inbox this morning was a letter from a nice person called Neyma. I won’t publish all of it, just the bit relevant to this post:

but for now, I have a theoretical question that you may be inclined to
answer -

Is it possible to rip the entire SU database? Can it be done in
one-click or would a bot have to be set? How hard would this be? what
sort of information could we get?
what about for some of the other Social sites?

For Neyma: not all, bot, not easy, stuff, yes.

Now, I’m sure my emailer has a perfectly reasonable reason for theorising about ripping the entire StumbleUpon database, but I have a feeling eBay would not be impressed. So I thought I would discuss content scraping: how it’s done, and some ways to prevent it.

Re-captcha your comments

Comment spam sucks, it’s as simple as that. Here at the Venture Skills blog we get hundreds of spam comments a day; most are caught by Akismet, but not all. If you’re hosting your own blog or site then no doubt you have tried many combinations of countermeasures. One simple method of reducing spam is a CAPTCHA (“Completely Automated Public Turing test to tell Computers and Humans Apart”). These provide puzzles which are easy for a human to solve but much more difficult for a machine.

User agents and referrers – who are you anyway?

We rely on knowing who is coming to our site and how they got there. Our conversion goals are set by this, and sometimes our authentication systems rely on it. But what are these concepts, and how can we use (or abuse) them?

User Agents

A user agent is the client application used with a particular network protocol; the phrase is most commonly used in reference to those which access the World Wide Web. Web user agents range from web browsers to search engine crawlers (“spiders”), as well as mobile phones, screen readers and Braille browsers used by people with disabilities. When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This forms part of the HTTP request, prefixed with User-agent: or User-Agent:, and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a URL and/or e-mail address so that the webmaster can contact the operator of the bot.
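To make that concrete, here is a minimal Python sketch of how a server might pull the user agent string out of a raw HTTP request and look for a bot’s contact details. The function names (extract_user_agent, bot_contacts), the ExampleBot string and the example.com addresses are all made up for illustration, and the contact pattern is deliberately loose rather than a full URL or e-mail validator.

```python
import re

# Loose pattern for a contact URL or e-mail address inside a user agent
# string. This is an illustrative assumption, not a complete validator.
CONTACT_RE = re.compile(r"(https?://[^\s;)]+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+)")


def extract_user_agent(raw_request):
    """Return the User-Agent header value from a raw HTTP request, or None."""
    # The header section ends at the first blank line.
    head = raw_request.split("\r\n\r\n", 1)[0]
    # Skip the request line ("GET / HTTP/1.1") and scan the headers.
    for line in head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        # Header names are case-insensitive, so "User-agent" and
        # "User-Agent" are treated the same.
        if sep and name.strip().lower() == "user-agent":
            return value.strip()
    return None


def bot_contacts(user_agent):
    """Return any URLs or e-mail addresses found in a user agent string."""
    return CONTACT_RE.findall(user_agent)


if __name__ == "__main__":
    # Hypothetical request from a made-up crawler.
    request = (
        "GET / HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "User-Agent: ExampleBot/1.0 (+http://example.com/bot)\r\n"
        "\r\n"
    )
    ua = extract_user_agent(request)
    print(ua)
    print(bot_contacts(ua))
```

Bear in mind the string is whatever the client chooses to send, so anything read from it is a hint about the visitor, not proof.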


Stumbling voyeur

I wish to tell you a tale of how the online world can suck you in. It all started when I joined the StumbleUpon beta group (you can too!). The beta group gets to try out all the great new things, like the new user interface. It also gives you new buttons to press! And here is where it all went wrong…
[Screenshot of the new StumbleUpon interface]
As you can see, the new interface has a much more polished feel to it than the old one. It has also made some features that were perhaps buried more easily accessible. One of them is the “stumble sites they like” button, so now you can start going through other stumblers’ likes and dislikes ;) All your secrets are mine, muhuhhhhaaaa

Google Analytics and Privacy laws plus P3P

I thought I would clear up some misconceptions regarding the use of third-party cookies on a site, and in particular statistics-tracking cookies such as the Google Analytics Urchin cookie. There are two misunderstandings regarding the use of cookies: 1, that it’s illegal to store a third-party cookie on a person’s computer; 2, that it breaches accessibility guidelines. Neither of these is true. You do, however, need to declare the use of third-party cookies both in your privacy document and in your P3P document and compact headers.
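As a rough illustration of what a compact header looks like, here is a small Python sketch that builds a P3P response header from a policy reference and a list of compact policy tokens. The function name p3p_header and the /w3c/p3p.xml path are assumptions for the example, and the tokens shown are placeholders only: the ones you actually send have to match what your own P3P policy document says.

```python
def p3p_header(policy_ref, tokens):
    """Build a (name, value) pair for a P3P compact policy header.

    policy_ref: URL or path of the full P3P policy reference file.
    tokens: compact policy tokens, which must reflect your real policy.
    """
    compact_policy = " ".join(tokens)
    value = 'policyref="{}", CP="{}"'.format(policy_ref, compact_policy)
    return ("P3P", value)


if __name__ == "__main__":
    # Placeholder tokens for illustration only, not a recommended policy.
    name, value = p3p_header("/w3c/p3p.xml", ["NOI", "DSP", "COR"])
    print("{}: {}".format(name, value))
```

The returned pair can be added to whatever response headers your server or framework sends alongside the cookie.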

Google Analytics holds your data on machines in the United States, and as such you need to state that it is held under US law, which may differ from the local laws of the host site.

If you need further proof that it’s perfectly legal to use Google Analytics in the UK and elsewhere, have a look at www.ico.gov.uk, the site of the Information Commissioner’s Office, the people who regulate and enforce privacy in the UK, and who track their stats using, guess what, Google Analytics!