Preventing bots from scraping my Magento site
Muhammad Faizan

Not so New
October 30, 2014 - 10:32 pm
Member Since: August 8, 2014
Forum Posts: 4

We have found that bots are accessing our website on a daily basis and running thousands of searches by manufacturer part number to scrape our product data and prices for competitive analysis.

We have blocked the IPs in the past, but it appears they are using proxy IPs to crawl our website.

We are thinking of limiting each IP to a maximum of 100 searches, but we are wondering whether there is an extension available for this, or whether there would be any harm in doing it.
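For illustration: a per-IP search limit does not strictly need an extension, because nginx's limit_req module can enforce one at the web-server level. The following is a minimal sketch, not a tested production config. It assumes the store runs on nginx and that Magento 1 serves search results under /catalogsearch/; the names search_limited, search_limit_key and search_limit, the 192.0.2.0/24 allow-list range and the rate values are hypothetical placeholders. Note that limit_req enforces a rate (a leaky bucket), not a fixed quota such as 100 searches per day.

```nginx
# ---- http {} context ----

# Hypothetical allow-list: clients mapped to 0 are never rate limited.
# 192.0.2.0/24 is a documentation placeholder (e.g. for your office).
geo $search_limited {
    default        1;
    192.0.2.0/24   0;
}

# limit_req does not count requests whose key is empty,
# so allow-listed clients bypass the limit entirely.
map $search_limited $search_limit_key {
    0   "";
    1   $binary_remote_addr;
}

# One shared-memory zone keyed per client IP; 10m tracks tens of thousands
# of addresses. 30r/m is an illustrative rate: tune it against real shoppers.
limit_req_zone $search_limit_key zone=search_limit:10m rate=30r/m;

# ---- server {} context ----

# Assumption: Magento 1 search results are served under /catalogsearch/.
location /catalogsearch/ {
    # Serve up to 10 requests above the rate without delay, reject the rest.
    limit_req zone=search_limit burst=10 nodelay;

    # Answer rejected requests with 429 Too Many Requests (the default is 503).
    limit_req_status 429;

    # Pass the request on to Magento's front controller; assumes a
    # PHP location elsewhere in the server block handles /index.php.
    try_files $uri $uri/ /index.php?$args;
}
```

On the question of harm: a per-IP limit can also catch genuine shoppers who share one public address, such as an office network, which is why the sketch lets allow-listed addresses through with an empty key. It also does little against a bot that rotates through many proxy IPs.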

Thanks

Dave Furness

Founder
October 30, 2014 - 11:27 pm
Member Since: July 19, 2013
Forum Posts: 4606

Hi Muhammad,

I haven’t come across an extension for this yet, but I bet there is one in the Magento Connect marketplace that would do it.

Dave

Every expert was once a beginner

Muhammad Faizan

Not so New
October 30, 2014 - 11:38 pm
Member Since: August 8, 2014
Forum Posts: 4

Hi Dave.

Thanks for the quick response. I will check the Connect marketplace and update this thread with my findings. BTW, I was analysing my server logs and found 300-400 search requests coming in per second, which amounts to a kind of DDoS.

This is very scary for us, as it pushes the server to maximum utilisation and the site becomes unavailable to ordinary visitors.

Thanks
Faizan

Dave Furness

Founder
October 30, 2014 - 11:40 pm
Member Since: July 19, 2013
Forum Posts: 4606

Wow, yeah. I know there are things you can do server side… Have you tried contacting your host about this, or are you self-hosted?

Dave


Muhammad Faizan

Not so New
October 30, 2014 - 11:47 pm
Member Since: August 8, 2014
Forum Posts: 4

I would be grateful if you could share what can be done on the server side. We are using N E X C E S S (spaces intentional). I don’t want to make them look bad, but so far I am disappointed with how they have handled this problem.

Matthew Ogborne

Founder
October 31, 2014 - 8:37 am
Member Since: July 18, 2013
Forum Posts: 4565

Hi Faizan,

That must be most annoying. They’re probably using Tor or similar, so tracking by IP address is nigh-on impossible: if they have done it correctly, the bot will switch IP addresses every few minutes or spoof them.

If you are using your own dedicated server, then you have several options available to you. Here are the first two; I use them both myself.

Fail2ban – http://www.fail2ban.org/wiki/i…../Main_Page
ModSecurity – https://www.modsecurity.org/

Some light reading here: http://www.proxiblue.com.au/bl…..2ban-ddos/

CloudFlare is also an option here; they’re pretty smart and pick up on naughty IPs quickly.
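One practical detail if CloudFlare is placed in front of the server: nginx then sees CloudFlare's addresses rather than the visitors', so per-IP limits, logs and bans would treat everyone as the same few clients. A minimal sketch using nginx's realip module, assuming nginx was built with it (--with-http_realip_module); 203.0.113.0/24 is a documentation placeholder to be replaced by the ranges CloudFlare publishes.

```nginx
# http {} or server {} context; needs ngx_http_realip_module.

# Trust the client-IP header only on connections from these addresses.
# Placeholder range: add one set_real_ip_from line per CloudFlare range.
set_real_ip_from 203.0.113.0/24;

# CloudFlare passes the original visitor's address in this request header.
real_ip_header CF-Connecting-IP;
```

With this in place, $remote_addr and $binary_remote_addr carry the visitor's address again, so per-IP rules keep working.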

Also, it’s highly likely that their bot follows a specific pattern, and that will be its weakness. Look at the URLs it starts from, as they will be the quickest way to identify a bot (so you can do something like this).
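One way to find that pattern is to give search requests a log file of their own. A minimal nginx sketch, assuming the /catalogsearch/ search path of Magento 1; the format name search_log and the /var/log/nginx/ paths are hypothetical. Because $request includes the query string, every line shows the part number that was searched, which makes scripted runs of queries easy to spot.

```nginx
# ---- http {} context ----
# Compact format: client IP, time, full request line (including ?q=...),
# referrer and user agent.
log_format search_log '$remote_addr [$time_local] "$request" '
                      '"$http_referer" "$http_user_agent"';

# ---- server {} context ----
# If a /catalogsearch/ location already exists, add the access_log lines
# to it instead; nginx refuses to start with duplicate location blocks.
location /catalogsearch/ {
    # An access_log set here replaces the inherited ones, so the main
    # log is declared again alongside the dedicated search log.
    access_log /var/log/nginx/access.log combined;
    access_log /var/log/nginx/search.log search_log;

    # Same hand-off to Magento as the rest of the site (assumption).
    try_files $uri $uri/ /index.php?$args;
}
```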

Personally, I would have some fun with it: on the page where they scrape the price, show a known set of IPs a price so low that they lose money. It won’t do much, but it would cheer me up :D

Matt

"Selling an item online is easy, but making living from a business that sells online, well that’s something different entirely!"

Ultimo Magento Theme

Jim @ Moogento
Global

Partner
January 2, 2015 - 11:07 am
Member Since: November 7, 2013
Forum Posts: 688

Fail2Ban is great; it banned something like 35,000 IPs on its first day running.

I also use a server-side conf file to filter out the obvious ones. When I first set this up (in .htaccess, years back), server bandwidth dropped by about 95%!

## These map blocks belong in the http {} context (e.g. nginx.conf), outside any server block.
##
## Add here all HTTP methods allowed (exact matches; anything else is flagged)
map $request_method $bad_method {
    default 1;
    GET     0;
    HEAD    0;
    POST    0;
}
## Flag known scrapers and download tools. These are substring matches, so short
## tokens such as "da" or "memo" are worth checking against your own logs.
map $http_user_agent $is_bad_bot {
    default 0;
    ~*(LWP::Simple|Purebot|Baiduspider|Lipperhey|Mail.Ru|scrapbot|wget|libwww-perl|BBBike) 1;
    ~(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) 1;
    ~(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) 1;
    ~(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) 1;
    ~(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) 1;
    ~(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) 1;
    ~(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) 1;
    ~(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) 1;
    ~(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) 1;
    ~(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) 1;
    ~(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) 1;
    ~(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) 1;
    ~(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) 1;
    ~(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) 1;
    ~(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) 1;
    ~(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) 1;
    ~(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) 1;
    ~(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) 1;
    ~(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) 1;
    ~(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) 1;
    ~(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) 1;
    ~(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) 1;
    ~(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) 1;
    ~(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) 1;
    ~(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) 1;
    ~web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) 1;
    ~*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ 1;
}
## Flag spam referrers
map $http_referer $is_bad_referrer {
    default 0;
    ~*(jewelry|viagra|nude|girl|nudit|casino|poker|porn|sex|teen|babes|cialis|levitra|semalt.com|pills|spyfu|poker|texas.?hold-?em|diet|loan|cash) 1;
}
##

server {
    # listen stuff here

    ##
    # Deny access based on HTTP method (444 closes the connection without a response)
    if ($bad_method = 1) {
        return 444;
    }
    # Deny flagged user agents and referrers
    if ($is_bad_bot = 1) {
        return 403;
    }
    if ($is_bad_referrer = 1) {
        return 403;
    }
    # deny scripts inside writable directories (the dot before the extension is escaped)
    location ~* /(images|var|media|skin|logs|tmp)/.*\.(php|pl|py|jsp|asp|sh|cgi)$ {
        error_page 403 /403_error.html;
        return 403;
    }
    # On with normal stuff here
}

Those patterns were taken from analysing access logs, but that was years ago. There are probably some new words to ban these days, but I still use this.

The config above is for nginx, but you can adapt it for .htaccess. I can dig out my .htaccess version for you if you like.

This file also used to save the IPs to another file as part of the ban. I stopped that for some reason, but it could be a good add-on.
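Saving the blocked IPs to another file can be sketched with nginx's conditional logging (the if= parameter of access_log, available from nginx 1.7.0). This assumes the $bad_method, $is_bad_bot and $is_bad_referrer maps from the config above; the $log_blocked variable and the log paths are hypothetical.

```nginx
# ---- http {} context, next to the existing maps ----
# Concatenate the three flags; "000" means nothing was flagged.
map "$bad_method$is_bad_bot$is_bad_referrer" $log_blocked {
    default   1;
    "000"     0;
}

# ---- server {} context ----
# An access_log here replaces any inherited from http {}, so keep the main log.
access_log /var/log/nginx/access.log combined;

# Written only when $log_blocked is not "0" (or empty).
access_log /var/log/nginx/blocked.log combined if=$log_blocked;
```

A tool such as Fail2ban could then watch blocked.log and ban repeat offenders at the firewall.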

  • pickPack - smarter Magento packing sheets and warehouse picklists
  • shipEasy - process multiple orders with no sweat & get a visual sales overview easily

 

Forum Timezone: Europe/London
