BLog

Google’s Blind Spot

Who made the Web a commercial place? Whose licentiousness provoked countermeasures of all kinds from regulatory authorities, measures which in the end affect all of us? Whose gross income has created so much envy, which in turn inflamed the ass holes of so many netizens, and disbanded large parts of the web? I won’t take part in all this anymore; I have freed my BLog from Google.

As of April 1st, 2018, I had already removed the AdSense placements from my BLog. On October 1st, 2018, I locked Google's bots out of my BLog at the firewall. Contrary to common belief, it is not sufficient to disallow the Googlebot in a robots.txt file as follows:

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:

Google deliberately ignores this directive if our site is linked to from third-party sites. However, this is almost always the case; we are talking about the web, aren’t we?


Blinded by the Firewall

A while ago, I published on GitHub a set of tools for IP-based geo-blocking and geo-routing at the firewall on FreeBSD. For the present purpose of locking out the Googlebot, I added to the lookup tool ipup(1) the generation of IP tables from Unique Network Segment Owner IDs, which can be fed into the firewall tool ipfw(8); see: https://github.com/cyclaero/ipdb
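
As a rough illustration of where such owner IDs come from, the following sketch assumes the extended delegation statistics format published by the RIRs, in which each pipe-separated record ends with an opaque owner ID. The function name segments_of_owner and the sample records (including their dates and the second owner ID) are made up for illustration; only the Google owner IDs are taken from the lookups shown further down. It is not meant to reflect how ipup(1) works internally.

```shell
#!/bin/sh
# Sketch: list the network segments that belong to one owner ID in an RIR
# extended delegation statistics file. Assumed record layout:
#   registry|cc|type|start|value|date|status|opaque-id
# For ipv4 records "value" is an address count, for ipv6 a prefix length.

segments_of_owner() {
    # $1 = owner ID (opaque-id); records are read from stdin
    awk -F'|' -v id="$1" '
        ($3 == "ipv4" || $3 == "ipv6") && $8 == id {
            print $2, $3, $4, $5
        }'
}

# Illustrative records only; the dates are placeholders.
segments_of_owner 9d99e3f7d38d1b8026f2ebbea4017c9f <<'EOF'
arin|US|ipv4|172.217.0.0|65536|20000101|allocated|9d99e3f7d38d1b8026f2ebbea4017c9f
arin|US|ipv4|192.0.2.0|256|20000101|assigned|0123456789abcdef0123456789abcdef
lacnic|AR|ipv6|2800:3f0::|32|20000101|allocated|58353
EOF
# prints: US ipv4 172.217.0.0 65536
```

If the .dat files fetched by ipdb-update.sh are unmodified delegation files (an assumption), the same function could be pointed at one of them, for example with input redirected from /usr/local/etc/ipdb/IPRanges/arin.dat.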

Every record in the delegation statistics files of the RIRs carries the UNSO, the Unique Network Segment Owner ID, and this can be employed as follows:

  1. Install the latest ipdb tools from my GitHub repository, since the FreeBSD port has not yet been updated to look up the Unique Network Segment Owner IDs:
    mkdir -p ~/install
    git clone https://github.com/cyclaero/ipdb.git ~/install/ipdbtools
    cd ~/install/ipdbtools
    make install clean
  2. Set up the local IP Geo-location tables by downloading and processing the delegation statistics files of the 5 Regional Internet Registries (RIRs: AFRINIC, APNIC, ARIN, LACNIC, RIPENCC):
    ipdb-update.sh
    >>>>>
    /usr/local/etc/ipdb/IPRanges/afrinic.md5                74  B 2886 kBps    00s
    /usr/local/etc/ipdb/IPRanges/afrinic.dat               490 kB  285 kBps    02s
    /usr/local/etc/ipdb/IPRanges/apnic.md5                  73  B 3250 kBps    00s
    /usr/local/etc/ipdb/IPRanges/apnic.dat                5571 kB  845 kBps    07s
    /usr/local/etc/ipdb/IPRanges/arin.md5                   67  B  238 kBps    00s
    /usr/local/etc/ipdb/IPRanges/arin.dat                 9226 kB  840 kBps    11s
    /usr/local/etc/ipdb/IPRanges/lacnic.md5                 74  B 3001 kBps    00s
    /usr/local/etc/ipdb/IPRanges/lacnic.dat               2823 kB  577 kBps    05s
    /usr/local/etc/ipdb/IPRanges/ripencc.md5                74  B 6163  Bps    00s
    /usr/local/etc/ipdb/IPRanges/ripencc.dat                12 MB  974 kBps    13s
    ipdb v1.2b (), Copyright © 2016-2018 Dr. Rolf Jansen
    Processing RIR data files ...
    
     afrinic.dat  apnic.dat  arin.dat  lacnic.dat  ripencc.dat 
    
    Total number of processed IP-Ranges = 144623
    Total number of processed Segments  = 220269
  3. Look up the IP addresses of the Googlebot:
    $ host googlebot.com
    >>>>>
    googlebot.com has address 172.217.29.196
    googlebot.com has IPv6 address 2800:3f0:4001:807::2004
    googlebot.com mail is handled by 30 alt2.aspmx.l.google.com.
    googlebot.com mail is handled by 50 alt4.aspmx.l.google.com.
    googlebot.com mail is handled by 10 aspmx.l.google.com.
    googlebot.com mail is handled by 40 alt3.aspmx.l.google.com.
    googlebot.com mail is handled by 20 alt1.aspmx.l.google.com.
  4. Look up the UNSOs for each of the reported IPv4 and IPv6 addresses:
    $ ipup 172.217.29.196
    >>>>>
    172.217.29.196 -> 172.200.0.0 - 172.217.255.255 in US
          net segment 172.217.0.0 - 172.217.255.255
             owned by 9d99e3f7d38d1b8026f2ebbea4017c9f
    
    $ ipup 2800:3f0:4001:807::2004
    >>>>>
    2800:3f0:4001:807::2004 -> 2800:3f0:0:0:0:0:0:0 - 2800:3f0:ffff:ffff:ffff:ffff:ffff:ffff in AR
                   net segment 2800:3f0:0:0:0:0:0:0 - 2800:3f0:ffff:ffff:ffff:ffff:ffff:ffff
                   owned by 58353
  5. Verify that the reported UNSOs of Google yield a reasonable table of network segments, ready to be fed into the ipfw(8) tool:
    $ ipup -t 9d99e3f7d38d1b8026f2ebbea4017c9f:58353
    >>>>>
    table 0 add 64.233.160.0/19
    table 0 add 66.102.0.0/20
    table 0 add 66.249.64.0/19
    table 0 add 70.32.128.0/19
    table 0 add 72.14.192.0/18
    table 0 add 74.114.24.0/21
    table 0 add 74.125.0.0/16
    table 0 add 108.170.192.0/18
    table 0 add 108.177.0.0/17
    table 0 add 142.250.0.0/15
    table 0 add 172.217.0.0/16
    table 0 add 172.253.0.0/16
    table 0 add 173.194.0.0/16
    table 0 add 192.178.0.0/15
    table 0 add 199.36.152.0/21
    table 0 add 207.223.160.0/20
    table 0 add 208.68.108.0/22
    table 0 add 208.81.188.0/22
    table 0 add 209.85.128.0/17
    table 0 add 216.58.192.0/19
    table 0 add 216.239.32.0/19
    table 0 add 2001:4860:0:0:0:0:0:0/32
    table 0 add 2604:31c0:0:0:0:0:0:0/32
    table 0 add 2607:f8b0:0:0:0:0:0:0/32
    table 0 add 2800:3f0:0:0:0:0:0:0/32
  6. Add the above table generation command to the ipfw configuration script:
    ...
    /sbin/ipfw -q table all destroy
    ...
    ...
    # Google’s blind spot:
    /sbin/ipfw -q table 0 create
    /usr/local/bin/ipup -t 9d99e3f7d38d1b8026f2ebbea4017c9f:58353 | /sbin/ipfw -q /dev/stdin
    /sbin/ipfw -q add 71 deny tcp from table\(0\) to any 80,443 in recv em0 setup
    ...
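
Since step 6 pipes the output of ipup straight into ipfw, a failed lookup would leave table 0 empty without any notice. The following is a minimal sketch of a guard, assuming that the generated lines always have the form shown in step 5; the function names table_lines_ok and load_checked_table are hypothetical and not part of the ipdb tools.

```shell
#!/bin/sh
# Sketch: only load the table if every generated line is a plain
# "table 0 add <prefix>/<length>" command and at least one line exists.

table_lines_ok() {
    # reads candidate lines from stdin; succeeds only if all lines match
    awk '
        /^table 0 add [0-9A-Fa-f:.]+\/[0-9]+$/ { good++; next }
        { bad++ }
        END { exit !(good > 0 && bad == 0) }'
}

load_checked_table() {
    # $1 = colon-separated owner IDs, as passed to "ipup -t" in step 5
    rules=$(/usr/local/bin/ipup -t "$1") || return 1
    if printf '%s\n' "$rules" | table_lines_ok; then
        printf '%s\n' "$rules" | /sbin/ipfw -q /dev/stdin
    else
        echo "unexpected ipup output, table not loaded" >&2
        return 1
    fi
}
```

In the configuration script, a call such as load_checked_table 9d99e3f7d38d1b8026f2ebbea4017c9f:58353 could then take the place of the direct pipe, and /sbin/ipfw table 0 list shows afterwards what was actually loaded.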


Goodbye Google

My personal search engine of choice is now Qwant Lite.

Copyright © Dr. Rolf Jansen - 2018-10-02 09:27:18
